* [PATCH v2 00/11] Add network namespace support in the RDMA-CM
@ 2015-04-20  9:03 Haggai Eran
       [not found] ` <1429520622-10303-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                   ` (4 more replies)
  0 siblings, 5 replies; 39+ messages in thread
From: Haggai Eran @ 2015-04-20  9:03 UTC (permalink / raw)
  To: Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth, Haggai Eran

On 4/15/2015 3:39 PM, Doug Ledford wrote:
> For instance, the namespace patches aren't included, and that's at least partially because they didn't apply cleanly any more.

Here's an updated series on top of your tree. I've also included the fix for
IPv4 connections to IPv6 listeners.

Regards,
Haggai

Changes from v1:
- Include patch 1 in this series.
- Rebase for v4.1.

Changes from v0:
- Fix code review comments by Yann
- Rebase on top of linux-3.19

RDMA-CM uses IP-based addressing and routing to set up RDMA connections between
hosts. Currently, all of the IP interfaces and addresses used by the RDMA-CM
must reside in the init_net namespace. This restricts containers that use RDMA
to the host network namespace (that is, the kernel's init_net namespace
instance).

This patchset allows using network namespaces with the RDMA-CM.

Each RDMA-CM and CM id keeps a reference to a network namespace.

This reference is based on the process network namespace at the time of the
creation of the object or inherited from the listener.
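
The reference rule above can be sketched in userspace C. This is only an illustrative model, not code from the series: `net_model`, `cm_id_model` and the helper functions are hypothetical names, and the plain integer counter stands in for the kernel's namespace reference counting (`get_net()`/`put_net()`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct net. */
struct net_model {
	int refcount;
};

/* Hypothetical stand-in for an RDMA-CM or CM id that pins its namespace. */
struct cm_id_model {
	struct net_model *net;
};

/* Take a reference, in the spirit of the kernel's get_net(). */
static struct net_model *net_model_get(struct net_model *net)
{
	net->refcount++;
	return net;
}

/* Drop a reference, in the spirit of the kernel's put_net(). */
static void net_model_put(struct net_model *net)
{
	assert(net->refcount > 0);
	net->refcount--;
}

/* A newly created id pins the namespace of the creating process. */
static void cm_id_create(struct cm_id_model *id, struct net_model *current_net)
{
	id->net = net_model_get(current_net);
}

/* An id created for an incoming request inherits the listener's namespace. */
static void cm_id_inherit(struct cm_id_model *id,
			  const struct cm_id_model *listener)
{
	id->net = net_model_get(listener->net);
}

/* Destroying an id releases its namespace reference. */
static void cm_id_destroy(struct cm_id_model *id)
{
	net_model_put(id->net);
	id->net = NULL;
}
```

In this model every id holds exactly one reference for its whole lifetime, so the namespace cannot go away while an id still points at it.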

This network namespace is used to perform all IP and network related
operations. Specifically, the local device lookup, as well as the remote GID
address resolution are done in the context of the RDMA-CM object's namespace.
This allows outgoing connections to reach the right target, even if the same
IP address exists in multiple network namespaces. This can happen if each
network namespace resides on a different pkey.

Additionally, the network namespace is used to split the listener service ID
table. From the user point of view, each network namespace has a unique,
completely independent table of service IDs. This allows running multiple
instances of a single service on the same machine, using containers. To
implement this, the CM layer now parses the IP address from the CM connect
requests, and searches for the matching networking device. The namespace of
the device found is used when looking up the service ID in the listener table.
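
As a rough illustration of the lookup described above, the following userspace sketch routes a request from destination IP to device, then to namespace, then to listener. All types and function names here are hypothetical and simplified (IPv4 only, linear search); the actual series works on kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct net_model { int id; };

struct netdev_model {
	uint32_t ip;		/* IPv4 address of the IPoIB interface */
	uint16_t pkey;		/* partition the interface lives on */
	const struct net_model *net;
};

struct listener_model {
	const struct net_model *net;
	uint64_t service_id;
};

/* Find the interface matching the request's pkey and destination IP. */
static const struct netdev_model *
find_netdev(const struct netdev_model *devs, size_t n,
	    uint16_t pkey, uint32_t ip)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].pkey == pkey && devs[i].ip == ip)
			return &devs[i];
	return NULL;
}

/* Match on the (namespace, service ID) pair, not the service ID alone. */
static const struct listener_model *
find_listener(const struct listener_model *tbl, size_t n,
	      const struct net_model *net, uint64_t service_id)
{
	assert(net != NULL);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].net == net && tbl[i].service_id == service_id)
			return &tbl[i];
	return NULL;
}

/* Destination IP -> device -> namespace -> listener. */
static const struct listener_model *
route_request(const struct netdev_model *devs, size_t ndevs,
	      const struct listener_model *tbl, size_t nlisteners,
	      uint16_t pkey, uint32_t dst_ip, uint64_t service_id)
{
	const struct netdev_model *dev = find_netdev(devs, ndevs, pkey, dst_ip);

	if (!dev)
		return NULL;	/* no interface owns this address */
	return find_listener(tbl, nlisteners, dev->net, service_id);
}
```

With two namespaces that own the same IP address on different pkeys, two listeners registered under the same service ID remain distinct, because the namespace is part of the lookup key.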

The functionality introduced by this series would come into play when the
transport is InfiniBand and IPoIB interfaces are assigned to each namespace.
Multiple IPoIB interfaces can be created and assigned to different RDMA-CM
capable containers, for example using pipework [1].

Full support for RoCE will be introduced at a later stage.

The patches apply against Roland's/Doug's tree for v4.1.

The patchset is structured as follows:

Patch 1 is a resend of the patch that fixes IPv4 connections to an IPv4/IPv6
listener.

Patches 2 and 3 are relatively trivial API extensions, requiring the callers
of certain ib_addr and ib_core functions to provide a network namespace, as
needed.

Patches 4 and 5 add the ability to look up a network namespace according to
the IP address, device and pkey. The lookup finds the matching IPoIB interface
and safely takes a reference on its network namespace before returning to the
caller.

Patch 6 moves the logic that extracts the IP address from a connect request
into the CM layer. This is needed for the upcoming listener lookup by
namespace.

Patch 7 adds support for network namespaces in the CM layer. All callers are
still passing init_net as the namespace, to maintain backward compatibility.
For incoming requests, the namespace of the relevant IPoIB device is used.

Patches 8 and 9 add proper namespace support to the RDMA-CM module.

Patches 10 and 11 add namespace support to the relevant user facing modules in
the IB stack.

[1] https://github.com/jpetazzo/pipework/pull/108

Guy Shapiro (7):
  IB/addr: Pass network namespace as a parameter
  IB/core: Pass network namespace as a parameter to relevant functions
  IB/ipoib: Return IPoIB devices as possible matches to
    get_net_device_by_port_pkey_ip
  IB/cm, cma: Move RDMA IP CM private-data parsing code from ib_cma to
    ib_cm
  IB/cm: Add network namespace support
  IB/cma: Add support for network namespaces
  IB/ucma: Take the network namespace from the process

Shachar Raindel (1):
  IB/ucm: Add partial support for network namespaces

Yotam Kenneth (3):
  RDMA/CMA: Mark IPv4 addresses correctly when the listener is IPv6
  IB/core: Find the network namespace matching connection parameters
  IB/cma: Separate port allocation to network namespaces

 drivers/infiniband/core/addr.c                     |  31 +-
 drivers/infiniband/core/agent.c                    |   4 +-
 drivers/infiniband/core/cm.c                       | 287 ++++++++++++++++--
 drivers/infiniband/core/cma.c                      | 332 +++++++++------------
 drivers/infiniband/core/device.c                   |  57 ++++
 drivers/infiniband/core/mad_rmpp.c                 |  10 +-
 drivers/infiniband/core/ucm.c                      |   4 +-
 drivers/infiniband/core/ucma.c                     |   4 +-
 drivers/infiniband/core/user_mad.c                 |   4 +-
 drivers/infiniband/core/verbs.c                    |  22 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c           |   3 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c            |  21 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c          | 122 +++++++-
 drivers/infiniband/ulp/iser/iser_verbs.c           |   2 +-
 drivers/infiniband/ulp/isert/ib_isert.c            |   2 +-
 drivers/infiniband/ulp/srp/ib_srp.c                |   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c              |   5 +-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |   4 +-
 include/rdma/ib_addr.h                             |  44 ++-
 include/rdma/ib_cm.h                               |  63 +++-
 include/rdma/ib_verbs.h                            |  44 ++-
 include/rdma/rdma_cm.h                             |   6 +-
 net/9p/trans_rdma.c                                |   2 +-
 net/rds/ib.c                                       |   2 +-
 net/rds/ib_cm.c                                    |   2 +-
 net/rds/iw.c                                       |   2 +-
 net/rds/iw_cm.c                                    |   2 +-
 net/rds/rdma_transport.c                           |   2 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c           |   2 +-
 net/sunrpc/xprtrdma/verbs.c                        |   3 +-
 30 files changed, 822 insertions(+), 268 deletions(-)

-- 
1.7.11.2


* [PATCH v2 01/11] RDMA/CMA: Mark IPv4 addresses correctly when the listener is IPv6
       [not found] ` <1429520622-10303-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-04-20  9:03   ` Haggai Eran
       [not found]     ` <1429520622-10303-2-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-04-20  9:03   ` [PATCH v2 02/11] IB/addr: Pass network namespace as a parameter Haggai Eran
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Haggai Eran @ 2015-04-20  9:03 UTC (permalink / raw)
  To: Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth, Or Gerlitz

From: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When accepting a new connection on an IPv6 listener, the family of the
new connection is set to IPv6, even when the incoming connection uses
IPv4. This causes the cma_zero_addr function to return true for a
non-zero address. As a result, the wrong code path is taken and the
connection request is rejected, because the RDMA-CM code looks for the
wrong type of device.

Since the IP address is copied by a different function depending on the
family (cma_save_ip4_info/cma_save_ip6_info), fix this by hard-coding
the address family in each function.
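
The failure mode can be reproduced with ordinary userspace socket structures. The sketch below is a simplified model, not the kernel code: `addr_model`, `is_zero_addr` and `save_ip4` are hypothetical names, and the IPv4 check is reduced to a comparison with INADDR_ANY. It relies on the usual layout in which the IPv4 address field does not overlap `sin6_addr`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Storage shared by both families, similar in spirit to sockaddr_storage. */
union addr_model {
	struct sockaddr sa;
	struct sockaddr_in in4;
	struct sockaddr_in6 in6;
};

/* Simplified, family-dispatched "is this the wildcard address?" check. */
static int is_zero_addr(const union addr_model *a)
{
	switch (a->sa.sa_family) {
	case AF_INET:
		return a->in4.sin_addr.s_addr == htonl(INADDR_ANY);
	case AF_INET6:
		return IN6_IS_ADDR_UNSPECIFIED(&a->in6.sin6_addr);
	default:
		return 0;
	}
}

/* Store an IPv4 address, tagged with whatever family the caller chooses. */
static void save_ip4(union addr_model *a, uint32_t addr_be, sa_family_t family)
{
	assert(a != NULL);
	memset(a, 0, sizeof(*a));
	a->in4.sin_family = family;
	a->in4.sin_addr.s_addr = addr_be;
}
```

Storing 10.0.0.1 with the family copied from an IPv6 listener makes the check read the untouched, all-zero `sin6_addr` bytes and report a zero address. Storing the same address with AF_INET, as the patch does, gives the expected non-zero result.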

Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d570030d899c..6e5e11ca7702 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -866,12 +866,12 @@ static void cma_save_ip4_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_i
 
 	listen4 = (struct sockaddr_in *) &listen_id->route.addr.src_addr;
 	ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
-	ip4->sin_family = listen4->sin_family;
+	ip4->sin_family = AF_INET;
 	ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
 	ip4->sin_port = listen4->sin_port;
 
 	ip4 = (struct sockaddr_in *) &id->route.addr.dst_addr;
-	ip4->sin_family = listen4->sin_family;
+	ip4->sin_family = AF_INET;
 	ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
 	ip4->sin_port = hdr->port;
 }
@@ -883,12 +883,12 @@ static void cma_save_ip6_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_i
 
 	listen6 = (struct sockaddr_in6 *) &listen_id->route.addr.src_addr;
 	ip6 = (struct sockaddr_in6 *) &id->route.addr.src_addr;
-	ip6->sin6_family = listen6->sin6_family;
+	ip6->sin6_family = AF_INET6;
 	ip6->sin6_addr = hdr->dst_addr.ip6;
 	ip6->sin6_port = listen6->sin6_port;
 
 	ip6 = (struct sockaddr_in6 *) &id->route.addr.dst_addr;
-	ip6->sin6_family = listen6->sin6_family;
+	ip6->sin6_family = AF_INET6;
 	ip6->sin6_addr = hdr->src_addr.ip6;
 	ip6->sin6_port = hdr->port;
 }
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v2 02/11] IB/addr: Pass network namespace as a parameter
       [not found] ` <1429520622-10303-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-04-20  9:03   ` [PATCH v2 01/11] RDMA/CMA: Mark IPv4 addresses correctly when the listener is IPv6 Haggai Eran
@ 2015-04-20  9:03   ` Haggai Eran
       [not found]     ` <1429520622-10303-3-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-04-20  9:03   ` [PATCH v2 04/11] IB/core: Find the network namespace matching connection parameters Haggai Eran
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Haggai Eran @ 2015-04-20  9:03 UTC (permalink / raw)
  To: Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth, Haggai Eran

From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add network namespace support to the ib_addr module. For that, all the address
resolution and matching should be done using the appropriate namespace instead
of init_net.

This is achieved by:

1. Adding an explicit network namespace argument to exported functions that
   require a namespace.
2. Saving the namespace in the rdma_addr_client structure.
3. Using it when calling networking functions.

In order to preserve the behavior of calling modules, &init_net is
passed as the parameter in calls from other modules. These calls are
updated as namespace support is added at more levels.
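
A minimal userspace sketch of this pattern follows. The types and the `translate_ip` helper are hypothetical simplifications (IPv4 only, a flat address table per namespace); they only illustrate how carrying the namespace in the address structure changes the result of a lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one local address inside a namespace. */
struct local_addr_model {
	uint32_t ip;
	int ifindex;
};

/* Hypothetical stand-in for struct net. */
struct net_model {
	const struct local_addr_model *addrs;
	size_t count;
};

/* Models struct rdma_dev_addr after the patch: the namespace travels with it. */
struct dev_addr_model {
	int bound_dev_if;
	const struct net_model *net;
};

/*
 * Resolve a local IP to an interface index inside dev_addr->net.
 * Returns 0 on success, -1 if the namespace does not own the address.
 */
static int translate_ip(uint32_t ip, struct dev_addr_model *dev_addr)
{
	const struct net_model *net = dev_addr->net;

	assert(net != NULL);	/* the caller must initialize the net field */
	for (size_t i = 0; i < net->count; i++) {
		if (net->addrs[i].ip == ip) {
			dev_addr->bound_dev_if = net->addrs[i].ifindex;
			return 0;
		}
	}
	return -1;
}
```

A caller that keeps passing the host namespace gets the old behavior, while a caller that fills in a container's namespace resolves the same IP address to that container's interface.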

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/addr.c           | 31 ++++++++++++----------
 drivers/infiniband/core/cma.c            |  4 ++-
 drivers/infiniband/core/verbs.c          | 14 +++++++---
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |  3 ++-
 include/rdma/ib_addr.h                   | 44 ++++++++++++++++++++++++++++----
 5 files changed, 72 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f80da50d84a5..95beaef6b66d 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -128,7 +128,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 	int ret = -EADDRNOTAVAIL;
 
 	if (dev_addr->bound_dev_if) {
-		dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
+		dev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
 		if (!dev)
 			return -ENODEV;
 		ret = rdma_copy_addr(dev_addr, dev, NULL);
@@ -137,9 +137,10 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 	}
 
 	switch (addr->sa_family) {
-	case AF_INET:
-		dev = ip_dev_find(&init_net,
-			((struct sockaddr_in *) addr)->sin_addr.s_addr);
+	case AF_INET: {
+		struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
+
+		dev = ip_dev_find(dev_addr->net, addr_in->sin_addr.s_addr);
 
 		if (!dev)
 			return ret;
@@ -149,12 +150,12 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 			*vlan_id = rdma_vlan_dev_vlan_id(dev);
 		dev_put(dev);
 		break;
-
+	}
 #if IS_ENABLED(CONFIG_IPV6)
 	case AF_INET6:
 		rcu_read_lock();
-		for_each_netdev_rcu(&init_net, dev) {
-			if (ipv6_chk_addr(&init_net,
+		for_each_netdev_rcu(dev_addr->net, dev) {
+			if (ipv6_chk_addr(dev_addr->net,
 					  &((struct sockaddr_in6 *) addr)->sin6_addr,
 					  dev, 1)) {
 				ret = rdma_copy_addr(dev_addr, dev, NULL);
@@ -236,7 +237,7 @@ static int addr4_resolve(struct sockaddr_in *src_in,
 	fl4.daddr = dst_ip;
 	fl4.saddr = src_ip;
 	fl4.flowi4_oif = addr->bound_dev_if;
-	rt = ip_route_output_key(&init_net, &fl4);
+	rt = ip_route_output_key(addr->net, &fl4);
 	if (IS_ERR(rt)) {
 		ret = PTR_ERR(rt);
 		goto out;
@@ -278,12 +279,13 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 	fl6.saddr = src_in->sin6_addr;
 	fl6.flowi6_oif = addr->bound_dev_if;
 
-	dst = ip6_route_output(&init_net, NULL, &fl6);
+	dst = ip6_route_output(addr->net, NULL, &fl6);
 	if ((ret = dst->error))
 		goto put;
 
 	if (ipv6_addr_any(&fl6.saddr)) {
-		ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
+		ret = ipv6_dev_get_saddr(addr->net,
+					 ip6_dst_idev(dst)->dev,
 					 &fl6.daddr, 0, &fl6.saddr);
 		if (ret)
 			goto put;
@@ -458,7 +460,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr,
 }
 
 int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
-			       u16 *vlan_id)
+			       u16 *vlan_id, struct net *net)
 {
 	int ret = 0;
 	struct rdma_dev_addr dev_addr;
@@ -481,6 +483,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
 		return ret;
 
 	memset(&dev_addr, 0, sizeof(dev_addr));
+	dev_addr.net = net;
 
 	ctx.addr = &dev_addr;
 	init_completion(&ctx.comp);
@@ -492,7 +495,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
 	wait_for_completion(&ctx.comp);
 
 	memcpy(dmac, dev_addr.dst_dev_addr, ETH_ALEN);
-	dev = dev_get_by_index(&init_net, dev_addr.bound_dev_if);
+	dev = dev_get_by_index(net, dev_addr.bound_dev_if);
 	if (!dev)
 		return -ENODEV;
 	if (vlan_id)
@@ -502,7 +505,8 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
 }
 EXPORT_SYMBOL(rdma_addr_find_dmac_by_grh);
 
-int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id)
+int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id,
+				struct net *net)
 {
 	int ret = 0;
 	struct rdma_dev_addr dev_addr;
@@ -517,6 +521,7 @@ int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id)
 	if (ret)
 		return ret;
 	memset(&dev_addr, 0, sizeof(dev_addr));
+	dev_addr.net = net;
 	ret = rdma_translate_ip(&gid_addr._sockaddr, &dev_addr, vlan_id);
 	if (ret)
 		return ret;
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 6e5e11ca7702..aeb2417ec928 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -512,6 +512,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
 	INIT_LIST_HEAD(&id_priv->listen_list);
 	INIT_LIST_HEAD(&id_priv->mc_list);
 	get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
+	id_priv->id.route.addr.dev_addr.net = &init_net;
 
 	return &id_priv->id;
 }
@@ -637,7 +638,8 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
 	    == RDMA_TRANSPORT_IB &&
 	    rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)
 	    == IB_LINK_LAYER_ETHERNET) {
-		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);
+		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL,
+						  &init_net);
 
 		if (ret)
 			goto out;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f93eb8da7b5a..ca5c4dd8a67a 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -212,7 +212,9 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 			ah_attr->vlan_id = wc->vlan_id;
 		} else {
 			ret = rdma_addr_find_dmac_by_grh(&grh->dgid, &grh->sgid,
-					ah_attr->dmac, &ah_attr->vlan_id);
+							 ah_attr->dmac,
+							 &ah_attr->vlan_id,
+							 &init_net);
 			if (ret)
 				return ret;
 		}
@@ -882,11 +884,15 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
 			if (!(*qp_attr_mask & IB_QP_VID))
 				qp_attr->vlan_id = rdma_get_vlan_id(&sgid);
 		} else {
-			ret = rdma_addr_find_dmac_by_grh(&sgid, &qp_attr->ah_attr.grh.dgid,
-					qp_attr->ah_attr.dmac, &qp_attr->vlan_id);
+			ret = rdma_addr_find_dmac_by_grh(
+				&sgid,
+				&qp_attr->ah_attr.grh.dgid,
+				qp_attr->ah_attr.dmac, &qp_attr->vlan_id,
+				&init_net);
 			if (ret)
 				goto out;
-			ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr->smac, NULL);
+			ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr->smac,
+							  NULL, &init_net);
 			if (ret)
 				goto out;
 		}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index d812904f3984..cc0626d4f1d4 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -123,7 +123,8 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
 
 	if (pd->uctx) {
 		status = rdma_addr_find_dmac_by_grh(&sgid, &attr->grh.dgid,
-                                        attr->dmac, &attr->vlan_id);
+						    attr->dmac, &attr->vlan_id,
+						    &init_net);
 		if (status) {
 			pr_err("%s(): Failed to resolve dmac from gid." 
 				"status = %d\n", __func__, status);
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index ce55906b54a0..40ccf8b83755 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -47,6 +47,7 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_pack.h>
 #include <net/ipv6.h>
+#include <net/net_namespace.h>
 
 struct rdma_addr_client {
 	atomic_t refcount;
@@ -64,6 +65,16 @@ void rdma_addr_register_client(struct rdma_addr_client *client);
  */
 void rdma_addr_unregister_client(struct rdma_addr_client *client);
 
+/**
+ * struct rdma_dev_addr - Contains resolved RDMA hardware addresses
+ * @src_dev_addr:	Source MAC address.
+ * @dst_dev_addr:	Destination MAC address.
+ * @broadcast:		Broadcast address of the device.
+ * @dev_type:		The interface hardware type of the device.
+ * @bound_dev_if:	An optional device interface index.
+ * @transport:		The transport type used.
+ * @net:		Network namespace containing the bound_dev_if net_dev.
+ */
 struct rdma_dev_addr {
 	unsigned char src_dev_addr[MAX_ADDR_LEN];
 	unsigned char dst_dev_addr[MAX_ADDR_LEN];
@@ -71,11 +82,14 @@ struct rdma_dev_addr {
 	unsigned short dev_type;
 	int bound_dev_if;
 	enum rdma_transport_type transport;
+	struct net *net;
 };
 
 /**
  * rdma_translate_ip - Translate a local IP address to an RDMA hardware
  *   address.
+ *
+ * The dev_addr->net field must be initialized.
  */
 int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 		      u16 *vlan_id);
@@ -90,7 +104,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
  * @dst_addr: The destination address to resolve.
  * @addr: A reference to a data location that will receive the resolved
  *   addresses.  The data location must remain valid until the callback has
- *   been invoked.
+ *   been invoked. The net field of the addr struct must be valid.
  * @timeout_ms: Amount of time to wait for the address resolution to complete.
  * @callback: Call invoked once address resolution has completed, timed out,
  *   or been canceled.  A status of 0 indicates success.
@@ -110,9 +124,29 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
 
 int rdma_addr_size(struct sockaddr *addr);
 
-int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id);
-int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *smac,
-			       u16 *vlan_id);
+/** rdma_addr_find_smac_by_sgid() - Find the src MAC and VLAN ID for a src GID
+ * @sgid:	Source GID to find the MAC and VLAN for.
+ * @smac:	A buffer to contain the resulting MAC address.
+ * @vlan_id:	Will contain the resulting VLAN ID.
+ * @net:	Network namespace to use for the address resolution.
+ *
+ * It is the caller's responsibility to keep the network namespace alive until
+ * the function returns.
+ */
+int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id,
+				struct net *net);
+/** rdma_addr_find_dmac_by_grh() - Find the dst MAC and VLAN ID for a GID pair
+ * @sgid:	Source GID to use for the search.
+ * @dgid:	Destination GID to find the details for.
+ * @dmac:	Contains the resulting destination MAC address.
+ * @vlan_id:	Contains the resulting VLAN ID.
+ * @net:	Network namespace to use for the address resolution.
+ *
+ * It is the caller's responsibility to keep the network namespace alive until
+ * the function returns.
+ */
+int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
+			       u16 *vlan_id, struct net *net);
 
 static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr)
 {
@@ -182,7 +216,7 @@ static inline void iboe_addr_get_sgid(struct rdma_dev_addr *dev_addr,
 	struct net_device *dev;
 	struct in_device *ip4;
 
-	dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
+	dev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
 	if (dev) {
 		ip4 = (struct in_device *)dev->ip_ptr;
 		if (ip4 && ip4->ifa_list && ip4->ifa_list->ifa_address)
-- 
1.7.11.2



* [PATCH v2 03/11] IB/core: Pass network namespace as a parameter to relevant functions
  2015-04-20  9:03 [PATCH v2 00/11] Add network namespace support in the RDMA-CM Haggai Eran
       [not found] ` <1429520622-10303-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-04-20  9:03 ` Haggai Eran
  2015-04-20  9:03 ` [PATCH v2 05/11] IB/ipoib: Return IPoIB devices as possible matches to get_net_device_by_port_pkey_ip Haggai Eran
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 39+ messages in thread
From: Haggai Eran @ 2015-04-20  9:03 UTC (permalink / raw)
  To: Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth, Haggai Eran

From: Guy Shapiro <guysh@mellanox.com>

Add a network namespace parameter to the address-related ib_core
functions. The parameter is passed to the lower-level functions instead
of &init_net, so that operations are done in the correct namespace.

For now, pass &init_net in every caller.
Callers that will pass &init_net permanently are marked with an
appropriate comment.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
---
 drivers/infiniband/core/agent.c       |  4 +++-
 drivers/infiniband/core/cm.c          |  9 +++++++--
 drivers/infiniband/core/mad_rmpp.c    | 10 ++++++++--
 drivers/infiniband/core/user_mad.c    |  4 +++-
 drivers/infiniband/core/verbs.c       | 10 ++++++----
 drivers/infiniband/ulp/srpt/ib_srpt.c |  3 ++-
 include/rdma/ib_verbs.h               | 15 +++++++++++++--
 7 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index f6d29614cb01..539378d64041 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -99,7 +99,9 @@ void agent_send_response(struct ib_mad *mad, struct ib_grh *grh,
 	}
 
 	agent = port_priv->agent[qpn];
-	ah = ib_create_ah_from_wc(agent->qp->pd, wc, grh, port_num);
+	/* Physical devices (and their MAD replies) always reside in the host
+	 * network namespace */
+	ah = ib_create_ah_from_wc(agent->qp->pd, wc, grh, port_num, &init_net);
 	if (IS_ERR(ah)) {
 		dev_err(&device->dev, "ib_create_ah_from_wc error %ld\n",
 			PTR_ERR(ah));
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index e28a494e2a3a..5a45cb76c43e 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -290,8 +290,13 @@ static int cm_alloc_response_msg(struct cm_port *port,
 	struct ib_mad_send_buf *m;
 	struct ib_ah *ah;
 
+	/* For IB, the network namespace doesn't affect the created address
+	 * handle, so we use &init_net. In the future, RoCE support will
+	 * require finding a specific network namespace to send the response
+	 * from. */
 	ah = ib_create_ah_from_wc(port->mad_agent->qp->pd, mad_recv_wc->wc,
-				  mad_recv_wc->recv_buf.grh, port->port_num);
+				  mad_recv_wc->recv_buf.grh, port->port_num,
+				  &init_net);
 	if (IS_ERR(ah))
 		return PTR_ERR(ah);
 
@@ -346,7 +351,7 @@ static void cm_init_av_for_response(struct cm_port *port, struct ib_wc *wc,
 	av->port = port;
 	av->pkey_index = wc->pkey_index;
 	ib_init_ah_from_wc(port->cm_dev->ib_device, port->port_num, wc,
-			   grh, &av->ah_attr);
+			   grh, &av->ah_attr, &init_net);
 }
 
 static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index f37878c9c06e..6c1576202965 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -157,8 +157,11 @@ static struct ib_mad_send_buf *alloc_response_msg(struct ib_mad_agent *agent,
 	struct ib_ah *ah;
 	int hdr_len;
 
+	/* Physical devices (and their MAD replies) always reside in the host
+	 * network namespace */
 	ah = ib_create_ah_from_wc(agent->qp->pd, recv_wc->wc,
-				  recv_wc->recv_buf.grh, agent->port_num);
+				  recv_wc->recv_buf.grh, agent->port_num,
+				  &init_net);
 	if (IS_ERR(ah))
 		return (void *) ah;
 
@@ -287,10 +290,13 @@ create_rmpp_recv(struct ib_mad_agent_private *agent,
 	if (!rmpp_recv)
 		return NULL;
 
+	/* Physical devices (and their MAD replies) always reside in the host
+	 * network namespace */
 	rmpp_recv->ah = ib_create_ah_from_wc(agent->agent.qp->pd,
 					     mad_recv_wc->wc,
 					     mad_recv_wc->recv_buf.grh,
-					     agent->agent.port_num);
+					     agent->agent.port_num,
+					     &init_net);
 	if (IS_ERR(rmpp_recv->ah))
 		goto error;
 
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 928cdd20e2d1..f34c6077759d 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -239,7 +239,9 @@ static void recv_handler(struct ib_mad_agent *agent,
 
 		ib_init_ah_from_wc(agent->device, agent->port_num,
 				   mad_recv_wc->wc, mad_recv_wc->recv_buf.grh,
-				   &ah_attr);
+				   &ah_attr, &init_net);
+		/* Note that network namespace separation isn't supported on
+		 * umad yet. */
 
 		packet->mad.hdr.gid_index = ah_attr.grh.sgid_index;
 		packet->mad.hdr.hop_limit = ah_attr.grh.hop_limit;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index ca5c4dd8a67a..a51d5d642fb7 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -193,7 +193,8 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 EXPORT_SYMBOL(ib_create_ah);
 
 int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
-		       struct ib_grh *grh, struct ib_ah_attr *ah_attr)
+		       struct ib_grh *grh, struct ib_ah_attr *ah_attr,
+		       struct net *net)
 {
 	u32 flow_class;
 	u16 gid_index;
@@ -214,7 +215,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 			ret = rdma_addr_find_dmac_by_grh(&grh->dgid, &grh->sgid,
 							 ah_attr->dmac,
 							 &ah_attr->vlan_id,
-							 &init_net);
+							 net);
 			if (ret)
 				return ret;
 		}
@@ -247,12 +248,13 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 EXPORT_SYMBOL(ib_init_ah_from_wc);
 
 struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, struct ib_wc *wc,
-				   struct ib_grh *grh, u8 port_num)
+				   struct ib_grh *grh, u8 port_num,
+				   struct net *net)
 {
 	struct ib_ah_attr ah_attr;
 	int ret;
 
-	ret = ib_init_ah_from_wc(pd->device, port_num, wc, grh, &ah_attr);
+	ret = ib_init_ah_from_wc(pd->device, port_num, wc, grh, &ah_attr, net);
 	if (ret)
 		return ERR_PTR(ret);
 
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 4b9b866e6b0d..a95e7d51cd8b 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -467,7 +467,8 @@ static void srpt_mad_recv_handler(struct ib_mad_agent *mad_agent,
 		return;
 
 	ah = ib_create_ah_from_wc(mad_agent->qp->pd, mad_wc->wc,
-				  mad_wc->recv_buf.grh, mad_agent->port_num);
+				  mad_wc->recv_buf.grh, mad_agent->port_num,
+				  &init_net);
 	if (IS_ERR(ah))
 		goto err;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 65994a19e840..f4a85decc60f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -48,6 +48,7 @@
 #include <linux/rwsem.h>
 #include <linux/scatterlist.h>
 #include <linux/workqueue.h>
+#include <net/net_namespace.h>
 #include <uapi/linux/if_ether.h>
 
 #include <linux/atomic.h>
@@ -1798,9 +1799,14 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
  *   ignored unless the work completion indicates that the GRH is valid.
  * @ah_attr: Returned attributes that can be used when creating an address
  *   handle for replying to the message.
+ * @net: The network namespace to use for address resolution.
+ *
+ * It is the caller's responsibility to make sure the network namespace is
+ * alive until the function returns.
  */
 int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
-		       struct ib_grh *grh, struct ib_ah_attr *ah_attr);
+		       struct ib_grh *grh, struct ib_ah_attr *ah_attr,
+		       struct net *net);
 
 /**
  * ib_create_ah_from_wc - Creates an address handle associated with the
@@ -1810,12 +1816,17 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
  * @grh: References the received global route header.  This parameter is
  *   ignored unless the work completion indicates that the GRH is valid.
  * @port_num: The outbound port number to associate with the address.
+ * @net: The network namespace to use for address resolution.
  *
  * The address handle is used to reference a local or global destination
  * in all UD QP post sends.
+ *
+ * It is the caller's responsibility to make sure the network namespace is
+ * alive until the function returns.
  */
 struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, struct ib_wc *wc,
-				   struct ib_grh *grh, u8 port_num);
+				   struct ib_grh *grh, u8 port_num,
+				   struct net *net);
 
 /**
  * ib_modify_ah - Modifies the address vector associated with an address
-- 
1.7.11.2


* [PATCH v2 04/11] IB/core: Find the network namespace matching connection parameters
       [not found] ` <1429520622-10303-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-04-20  9:03   ` [PATCH v2 01/11] RDMA/CMA: Mark IPv4 addresses correctly when the listener is IPv6 Haggai Eran
  2015-04-20  9:03   ` [PATCH v2 02/11] IB/addr: Pass network namespace as a parameter Haggai Eran
@ 2015-04-20  9:03   ` Haggai Eran
  2015-04-20  9:03   ` [PATCH v2 06/11] IB/cm, cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm Haggai Eran
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Haggai Eran @ 2015-04-20  9:03 UTC (permalink / raw)
  To: Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth, Haggai Eran

From: Yotam Kenneth <yotamke@mellanox.com>

In the case of IPoIB, and maybe in other cases, the network device is
managed by an upper-layer protocol (ULP). In order to expose this
network device to other users of the IB device, let ULPs implement
a callback that returns the network device matching the given connection
parameters.

The IB device and port, together with the P_Key and the IP address, should be
enough to uniquely identify the ULP net device.

This function is passed to ib_core as part of the ib_client
registration.

Using this functionality, add a way to get the network namespace
corresponding to a work completion. This is needed so that responses to CM
requests can be sent from the same network namespace as the request.
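
The lookup described above can be sketched in userspace C. This is only an
illustration under stated assumptions: struct netns, struct netdev,
struct client, default_ns and the demo_* objects are hypothetical stand-ins,
not kernel types, and reference counting and locking are left out. Each
client may provide a lookup callback; the first match decides the namespace,
and a default (playing the role of init_net) is used when nothing matches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct netns { int id; };
struct netdev { struct netns *ns; uint16_t pkey; uint32_t ip; };

/* A client may provide a lookup callback, like the new ib_client hook. */
struct client {
	struct netdev *(*lookup)(uint8_t port, uint16_t pkey, uint32_t ip);
};

static struct netns default_ns = { 0 };	/* plays the role of init_net */

/* Ask each client in turn; the first match wins, else use the default. */
static struct netns *ns_by_port_pkey_ip(const struct client *clients, size_t n,
					 uint8_t port, uint16_t pkey,
					 uint32_t ip)
{
	for (size_t i = 0; i < n; i++) {
		struct netdev *dev;

		if (!clients[i].lookup)
			continue;	/* client manages no net devices */
		dev = clients[i].lookup(port, pkey, ip);
		if (dev)
			return dev->ns;
	}
	return &default_ns;
}

/* Demo data: one device on port 1, P_Key 0x8001, address 10.0.0.1. */
static struct netns container_ns = { 1 };
static struct netdev demo_dev = { &container_ns, 0x8001, 0x0a000001 };

static struct netdev *demo_lookup(uint8_t port, uint16_t pkey, uint32_t ip)
{
	if (port == 1 && pkey == demo_dev.pkey && ip == demo_dev.ip)
		return &demo_dev;
	return NULL;
}

/* The first client has no callback and is skipped. */
static const struct client demo_clients[] = { { NULL }, { demo_lookup } };
```

With the demo data, a request for 10.0.0.1 on port 1 resolves to
container_ns, while any other address or port falls back to default_ns.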

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
---
 drivers/infiniband/core/device.c | 57 ++++++++++++++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h          | 29 ++++++++++++++++++++
 2 files changed, 86 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece765f2..2f06be5b0b59 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -38,6 +38,7 @@
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/mutex.h>
+#include <linux/netdevice.h>
 #include <rdma/rdma_netlink.h>
 
 #include "core_priv.h"
@@ -733,6 +734,62 @@ int ib_find_pkey(struct ib_device *device,
 }
 EXPORT_SYMBOL(ib_find_pkey);
 
+static struct net_device *ib_get_net_dev_by_port_pkey_ip(struct ib_device *dev,
+							 u8 port,
+							 u16 pkey,
+							 struct sockaddr *addr)
+{
+	struct net_device *ret = NULL;
+	struct ib_client *client;
+
+	mutex_lock(&device_mutex);
+	list_for_each_entry(client, &client_list, list)
+		if (client->get_net_device_by_port_pkey_ip) {
+			ret = client->get_net_device_by_port_pkey_ip(dev, port,
+								     pkey,
+								     addr);
+			if (ret)
+				break;
+		}
+
+	mutex_unlock(&device_mutex);
+	return ret;
+}
+
+struct net *ib_get_net_ns_by_port_pkey_ip(struct ib_device *dev,
+					  u8 port,
+					  u16 pkey,
+					  struct sockaddr *addr)
+{
+	struct net_device *ndev = NULL;
+	struct net *ns;
+
+	switch (rdma_port_get_link_layer(dev, port)) {
+	case IB_LINK_LAYER_INFINIBAND:
+		if (!addr)
+			goto not_found;
+		ndev = ib_get_net_dev_by_port_pkey_ip(dev, port, pkey, addr);
+		break;
+	default:
+		goto not_found;
+	}
+
+	if (!ndev)
+		goto not_found;
+
+	rcu_read_lock();
+	ns = maybe_get_net(dev_net(ndev));
+	dev_put(ndev);
+	rcu_read_unlock();
+	if (!ns)
+		goto not_found;
+	return ns;
+
+not_found:
+	return get_net(&init_net);
+}
+EXPORT_SYMBOL(ib_get_net_ns_by_port_pkey_ip);
+
 static int __init ib_core_init(void)
 {
 	int ret;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f4a85decc60f..74b239410562 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1683,6 +1683,21 @@ struct ib_client {
 	void (*add)   (struct ib_device *);
 	void (*remove)(struct ib_device *);
 
+	/* Returns the net_dev belonging to this ib_client and matching the
+	 * given parameters.
+	 * @dev:	An RDMA device that the net_dev uses for communication.
+	 * @port:	A physical port number on the RDMA device.
+	 * @pkey:	P_Key that the net_dev uses if applicable.
+	 * @addr:	An IP address the net_dev is configured with.
+	 *
+	 * An ib_client that implements a net_dev on top of RDMA devices
+	 * (such as IP over IB) should implement this callback, allowing the
+	 * rdma_cm module to find the right net_dev for a given request. */
+	struct net_device *(*get_net_device_by_port_pkey_ip)(
+			struct ib_device *dev,
+			u8 port,
+			u16 pkey,
+			struct sockaddr *addr);
 	struct list_head list;
 };
 
@@ -2679,4 +2694,18 @@ static inline int ib_check_mr_access(int flags)
 int ib_check_mr_status(struct ib_mr *mr, u32 check_mask,
 		       struct ib_mr_status *mr_status);
 
+/**
+ * ib_get_net_ns_by_port_pkey_ip() - Return the appropriate net namespace
+ * for a received CM request
+ * @dev:	An RDMA device on which the request has been received.
+ * @port:	Port number on the RDMA device.
+ * @pkey:	The Pkey the request came on.
+ * @addr:	Contains the IP address that the request specified as its
+ *		destination.
+ */
+struct net *ib_get_net_ns_by_port_pkey_ip(struct ib_device *dev,
+					  u8 port,
+					  u16 pkey,
+					  struct sockaddr *addr);
+
 #endif /* IB_VERBS_H */
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v2 05/11] IB/ipoib: Return IPoIB devices as possible matches to get_net_device_by_port_pkey_ip
  2015-04-20  9:03 [PATCH v2 00/11] Add network namespace support in the RDMA-CM Haggai Eran
       [not found] ` <1429520622-10303-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-04-20  9:03 ` [PATCH v2 03/11] IB/core: Pass network namespace as a parameter to relevant functions Haggai Eran
@ 2015-04-20  9:03 ` Haggai Eran
  2015-04-20 23:09   ` ira.weiny
  2015-04-20  9:03 ` [PATCH v2 07/11] IB/cm: Add network namespace support Haggai Eran
  2015-04-20  9:03 ` [PATCH v2 10/11] IB/ucma: Take the network namespace from the process Haggai Eran
  4 siblings, 1 reply; 39+ messages in thread
From: Haggai Eran @ 2015-04-20  9:03 UTC (permalink / raw)
  To: Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth, Haggai Eran

From: Guy Shapiro <guysh@mellanox.com>

Implement the callback that returns a network device to ib_core according to
connection parameters. Check the ipoib device and iterate over all child
devices to look for a match.

For each ipoib device we iterate through all upper devices when searching for
a matching IP, in order to support bonding.
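
The upper-device search can be pictured with a small userspace C sketch. It
is a simplification under stated assumptions: struct ndev and its single
"upper" link are hypothetical (the kernel walks all upper devices under RCU
and takes a reference on the result), and addresses are plain 32-bit values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a net_device with one optional upper device. */
struct ndev {
	uint32_t ip;			/* configured address, 0 if none */
	const struct ndev *upper;	/* e.g. a bond stacked on this device */
};

/* Check the base device first, then every device stacked above it. */
static const struct ndev *find_dev_by_ip(const struct ndev *base, uint32_t ip)
{
	for (const struct ndev *d = base; d; d = d->upper)
		if (d->ip == ip)
			return d;
	return NULL;
}

/* Demo: ib0 carries no address itself; bond0 on top of it owns 10.0.0.1. */
static const struct ndev demo_bond0 = { 0x0a000001, NULL };
static const struct ndev demo_ib0 = { 0, &demo_bond0 };
```

Starting from demo_ib0, a search for 10.0.0.1 returns demo_bond0, which is
the bonding case the commit message refers to.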

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 122 +++++++++++++++++++++++++++++-
 1 file changed, 121 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 7cad4dd87469..89a59a0e17e6 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -48,6 +48,9 @@
 
 #include <linux/jhash.h>
 #include <net/arp.h>
+#include <net/addrconf.h>
+#include <linux/inetdevice.h>
+#include <rdma/ib_cache.h>
 
 #define DRV_VERSION "1.0.0"
 
@@ -91,11 +94,15 @@ struct ib_sa_client ipoib_sa_client;
 static void ipoib_add_one(struct ib_device *device);
 static void ipoib_remove_one(struct ib_device *device);
 static void ipoib_neigh_reclaim(struct rcu_head *rp);
+static struct net_device *ipoib_get_net_device_by_port_pkey_ip(
+		struct ib_device *dev, u8 port, u16 pkey,
+		struct sockaddr *addr);
 
 static struct ib_client ipoib_client = {
 	.name   = "ipoib",
 	.add    = ipoib_add_one,
-	.remove = ipoib_remove_one
+	.remove = ipoib_remove_one,
+	.get_net_device_by_port_pkey_ip = ipoib_get_net_device_by_port_pkey_ip,
 };
 
 int ipoib_open(struct net_device *dev)
@@ -222,6 +229,119 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+static bool ipoib_is_dev_match_addr(struct sockaddr *addr,
+				    struct net_device *dev)
+{
+	struct net *net = dev_net(dev);
+
+	if (addr->sa_family == AF_INET) {
+		struct in_device *in_dev = in_dev_get(dev);
+		struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
+		__be32 ret_addr;
+
+		if (!in_dev)
+			return false;
+
+		ret_addr = inet_confirm_addr(net, in_dev, 0,
+					     addr_in->sin_addr.s_addr,
+					     RT_SCOPE_HOST);
+		in_dev_put(in_dev);
+		if (ret_addr)
+			return true;
+	}
+#if IS_ENABLED(CONFIG_IPV6)
+	else if (addr->sa_family == AF_INET6) {
+		struct sockaddr_in6 *addr_in6 = (struct sockaddr_in6 *)addr;
+
+		if (ipv6_chk_addr(net, &addr_in6->sin6_addr, dev, 1))
+			return true;
+	}
+#endif
+	return false;
+}
+
+/**
+ * Find a net_device matching the given address, which is an upper device of
+ * the given net_device.
+ * @addr: IP address to look for.
+ * @dev: base IPoIB net_device
+ *
+ * If found, returns the net_device with a reference held. Otherwise return
+ * NULL.
+ */
+static struct net_device *ipoib_get_net_dev_match_addr(struct sockaddr *addr,
+						       struct net_device *dev)
+{
+	struct net_device *upper,
+			  *result = NULL;
+	struct list_head *iter;
+
+	if (ipoib_is_dev_match_addr(addr, dev)) {
+		dev_hold(dev);
+		return dev;
+	}
+
+	rcu_read_lock();
+	netdev_for_each_all_upper_dev_rcu(dev, upper, iter) {
+		if (ipoib_is_dev_match_addr(addr, upper)) {
+			dev_hold(upper);
+			result = upper;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	return result;
+}
+
+static struct net_device *ipoib_get_net_device_by_port_pkey_ip(
+		struct ib_device *dev, u8 port, u16 pkey, struct sockaddr *addr)
+{
+	struct ipoib_dev_priv *priv;
+	struct list_head *dev_list;
+	u16 pkey_index;
+
+	ib_find_cached_pkey(dev, port, pkey, &pkey_index);
+	if (pkey_index == (u16)-1)
+		return NULL;
+
+	if (rdma_node_get_transport(dev->node_type) != RDMA_TRANSPORT_IB)
+		return NULL;
+
+	dev_list = ib_get_client_data(dev, &ipoib_client);
+	if (!dev_list)
+		return NULL;
+
+	list_for_each_entry(priv, dev_list, list) {
+		struct net_device *net_dev = NULL;
+		struct ipoib_dev_priv *child_priv;
+
+		if (priv->port != port)
+			continue;
+
+		if (priv->pkey_index == pkey_index) {
+			net_dev = ipoib_get_net_dev_match_addr(addr, priv->dev);
+			if (net_dev)
+				return net_dev;
+		}
+
+		down_read(&priv->vlan_rwsem);
+		list_for_each_entry(child_priv,
+				    &priv->child_intfs, list) {
+			if (child_priv->pkey_index != pkey_index)
+				continue;
+
+			net_dev = ipoib_get_net_dev_match_addr(
+					addr, child_priv->dev);
+			if (net_dev)
+				break;
+		}
+		up_read(&priv->vlan_rwsem);
+		if (net_dev)
+			return net_dev;
+	}
+	return NULL;
+}
+
 int ipoib_set_mode(struct net_device *dev, const char *buf)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-- 
1.7.11.2


* [PATCH v2 06/11] IB/cm, cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm
       [not found] ` <1429520622-10303-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-04-20  9:03   ` [PATCH v2 04/11] IB/core: Find the network namespace matching connection parameters Haggai Eran
@ 2015-04-20  9:03   ` Haggai Eran
       [not found]     ` <1429520622-10303-7-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-04-20  9:03   ` [PATCH v2 08/11] IB/cma: Separate port allocation to network namespaces Haggai Eran
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Haggai Eran @ 2015-04-20  9:03 UTC (permalink / raw)
  To: Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth, Haggai Eran

From: Guy Shapiro <guysh@mellanox.com>

When receiving a connection request, ib_cm needs to associate the request with
a network namespace. To do this, it needs to know the request's destination
IP. For this the RDMA IP CM packet formatting functionality needs to be
exposed to ib_cm.

This patch merely moves the RDMA IP CM data formatting and parsing functions
to be part of ib_cm. The following patch will use the parsed destination
address to look up the appropriate namespace. Each namespace maintains an
independent table of RDMA CM service IDs, allowing isolation and separation
between the network namespaces.

When creating a new incoming connection ID, the code in cm_save_ip_info can no
longer rely on the listener's private data to find the port number, so it
reads it from the requested service ID. This required saving the service ID in
cm_format_paths_from_req.
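
As a rough userspace illustration of the two pieces of parsing involved,
the sketch below mirrors the nibble packing of the ip_version field and the
extraction of the port from the low 16 bits of the service ID. The function
names are hypothetical, values are kept in host byte order for simplicity
(the kernel code works with __be64 and __be16), and the service ID used in
the usage note is only an example value.

```c
#include <stdint.h>

/* The IP version lives in bits 7:4 of the ip_version byte. */
static uint8_t get_ip_ver(uint8_t ip_version)
{
	return ip_version >> 4;
}

/* Replace the upper nibble with ver and keep the lower nibble untouched. */
static uint8_t set_ip_ver(uint8_t ip_version, uint8_t ver)
{
	return (uint8_t)((ver << 4) | (ip_version & 0xF));
}

/* The port is taken from the low 16 bits of a host-order service ID. */
static uint16_t port_from_service_id(uint64_t service_id)
{
	return (uint16_t)(service_id & 0xffff);
}
```

For example, port_from_service_id(0x0000000001064E20ULL) yields 0x4E20
(port 20000), and set_ip_ver(0x0F, 6) yields 0x6F.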

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 drivers/infiniband/core/cm.c  | 156 +++++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/cma.c | 166 +++++-------------------------------------
 include/rdma/ib_cm.h          |  56 ++++++++++++++
 3 files changed, 230 insertions(+), 148 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 5a45cb76c43e..efc5cffb675a 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -51,6 +51,7 @@
 
 #include <rdma/ib_cache.h>
 #include <rdma/ib_cm.h>
+#include <rdma/ib.h>
 #include "cm_msgs.h"
 
 MODULE_AUTHOR("Sean Hefty");
@@ -701,6 +702,159 @@ static void cm_reject_sidr_req(struct cm_id_private *cm_id_priv,
 	ib_send_cm_sidr_rep(&cm_id_priv->id, &param);
 }
 
+int cm_format_hdr(void *hdr, int family,
+		  struct sockaddr *src_addr,
+		  struct sockaddr *dst_addr)
+{
+	struct cm_hdr *cm_hdr;
+
+	cm_hdr = hdr;
+	cm_hdr->cm_version = RDMA_IP_CM_VERSION;
+	if (family == AF_INET) {
+		struct sockaddr_in *src4, *dst4;
+
+		src4 = (struct sockaddr_in *)src_addr;
+		dst4 = (struct sockaddr_in *)dst_addr;
+
+		cm_set_ip_ver(cm_hdr, 4);
+		cm_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
+		cm_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
+		cm_hdr->port = src4->sin_port;
+	} else if (family == AF_INET6) {
+		struct sockaddr_in6 *src6, *dst6;
+
+		src6 = (struct sockaddr_in6 *)src_addr;
+		dst6 = (struct sockaddr_in6 *)dst_addr;
+
+		cm_set_ip_ver(cm_hdr, 6);
+		cm_hdr->src_addr.ip6 = src6->sin6_addr;
+		cm_hdr->dst_addr.ip6 = dst6->sin6_addr;
+		cm_hdr->port = src6->sin6_port;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(cm_format_hdr);
+
+static void cm_save_ib_info(struct sockaddr *src_addr,
+			    struct sockaddr *dst_addr,
+			    struct ib_sa_path_rec *path)
+{
+	struct sockaddr_ib  *ib;
+
+	if (src_addr) {
+		ib = (struct sockaddr_ib *)src_addr;
+		ib->sib_family = AF_IB;
+		ib->sib_pkey = path->pkey;
+		ib->sib_flowinfo = path->flow_label;
+		memcpy(&ib->sib_addr, &path->sgid, 16);
+		ib->sib_sid = path->service_id;
+		ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
+		ib->sib_scope_id = 0;
+	}
+	if (dst_addr) {
+		ib = (struct sockaddr_ib *)dst_addr;
+		ib->sib_family = AF_IB;
+		ib->sib_pkey = path->pkey;
+		ib->sib_flowinfo = path->flow_label;
+		memcpy(&ib->sib_addr, &path->dgid, 16);
+	}
+}
+
+static void cm_save_ip6_info(struct sockaddr *src_addr,
+			     struct sockaddr *dst_addr,
+			     struct cm_hdr *hdr,
+			     __be16 local_port)
+{
+	struct sockaddr_in6 *ip6;
+
+	if (src_addr) {
+		ip6 = (struct sockaddr_in6 *)src_addr;
+		ip6->sin6_family = AF_INET6;
+		ip6->sin6_addr = hdr->dst_addr.ip6;
+		ip6->sin6_port = local_port;
+	}
+
+	if (dst_addr) {
+		ip6 = (struct sockaddr_in6 *)dst_addr;
+		ip6->sin6_family = AF_INET6;
+		ip6->sin6_addr = hdr->src_addr.ip6;
+		ip6->sin6_port = hdr->port;
+	}
+}
+
+static void cm_save_ip4_info(struct sockaddr *src_addr,
+			     struct sockaddr *dst_addr,
+			     struct cm_hdr *hdr,
+			     __be16 local_port)
+{
+	struct sockaddr_in *ip4;
+
+	if (src_addr) {
+		ip4 = (struct sockaddr_in *)src_addr;
+		ip4->sin_family = AF_INET;
+		ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
+		ip4->sin_port = local_port;
+	}
+
+	if (dst_addr) {
+		ip4 = (struct sockaddr_in *)dst_addr;
+		ip4->sin_family = AF_INET;
+		ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
+		ip4->sin_port = hdr->port;
+	}
+}
+
+static __be16 cm_port_from_service_id(__be64 service_id)
+{
+	return htons(be64_to_cpu(service_id));
+}
+
+static int cm_save_ip_info(struct sockaddr *src_addr,
+			   struct sockaddr *dst_addr,
+			   struct cm_work *work)
+{
+	struct cm_hdr *hdr;
+	__be16 port;
+
+	hdr = work->cm_event.private_data;
+	if (hdr->cm_version != RDMA_IP_CM_VERSION)
+		return -EINVAL;
+
+	port = cm_port_from_service_id(work->path->service_id);
+
+	switch (cm_get_ip_ver(hdr)) {
+	case 4:
+		cm_save_ip4_info(src_addr, dst_addr, hdr, port);
+		break;
+	case 6:
+		cm_save_ip6_info(src_addr, dst_addr, hdr, port);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int cm_save_net_info(struct sockaddr *src_addr,
+		     struct sockaddr *dst_addr,
+		     struct ib_cm_event *ib_event)
+{
+	struct cm_work *work = container_of(ib_event, struct cm_work, cm_event);
+
+	if ((rdma_port_get_link_layer(work->port->cm_dev->ib_device,
+				      work->port->port_num) ==
+	     IB_LINK_LAYER_INFINIBAND) &&
+	    (ib_event->event == IB_CM_REQ_RECEIVED)) {
+		cm_save_ib_info(src_addr, dst_addr,
+				ib_event->param.req_rcvd.primary_path);
+		return 0;
+	}
+
+	return cm_save_ip_info(src_addr, dst_addr, work);
+}
+EXPORT_SYMBOL(cm_save_net_info);
+
 struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 				 ib_cm_handler cm_handler,
 				 void *context)
@@ -1260,6 +1414,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 	primary_path->packet_life_time =
 		cm_req_get_primary_local_ack_timeout(req_msg);
 	primary_path->packet_life_time -= (primary_path->packet_life_time > 0);
+	primary_path->service_id = req_msg->service_id;
 
 	if (req_msg->alt_local_lid) {
 		memset(alt_path, 0, sizeof *alt_path);
@@ -1281,6 +1436,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 		alt_path->packet_life_time =
 			cm_req_get_alt_local_ack_timeout(req_msg);
 		alt_path->packet_life_time -= (alt_path->packet_life_time > 0);
+		alt_path->service_id = req_msg->service_id;
 	}
 }
 
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index aeb2417ec928..9f6faeb1de5f 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -179,23 +179,8 @@ struct iboe_mcast_work {
 	struct cma_multicast	*mc;
 };
 
-union cma_ip_addr {
-	struct in6_addr ip6;
-	struct {
-		__be32 pad[3];
-		__be32 addr;
-	} ip4;
-};
 
-struct cma_hdr {
-	u8 cma_version;
-	u8 ip_version;	/* IP version: 7:4 */
-	__be16 port;
-	union cma_ip_addr src_addr;
-	union cma_ip_addr dst_addr;
-};
 
-#define CMA_VERSION 0x00
 
 static int cma_comp(struct rdma_id_private *id_priv, enum rdma_cm_state comp)
 {
@@ -234,16 +219,6 @@ static enum rdma_cm_state cma_exch(struct rdma_id_private *id_priv,
 	return old;
 }
 
-static inline u8 cma_get_ip_ver(struct cma_hdr *hdr)
-{
-	return hdr->ip_version >> 4;
-}
-
-static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
-{
-	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
-}
-
 static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 			      struct cma_device *cma_dev)
 {
@@ -839,93 +814,9 @@ static inline int cma_any_port(struct sockaddr *addr)
 	return !cma_port(addr);
 }
 
-static void cma_save_ib_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			     struct ib_sa_path_rec *path)
-{
-	struct sockaddr_ib *listen_ib, *ib;
-
-	listen_ib = (struct sockaddr_ib *) &listen_id->route.addr.src_addr;
-	ib = (struct sockaddr_ib *) &id->route.addr.src_addr;
-	ib->sib_family = listen_ib->sib_family;
-	ib->sib_pkey = path->pkey;
-	ib->sib_flowinfo = path->flow_label;
-	memcpy(&ib->sib_addr, &path->sgid, 16);
-	ib->sib_sid = listen_ib->sib_sid;
-	ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
-	ib->sib_scope_id = listen_ib->sib_scope_id;
-
-	ib = (struct sockaddr_ib *) &id->route.addr.dst_addr;
-	ib->sib_family = listen_ib->sib_family;
-	ib->sib_pkey = path->pkey;
-	ib->sib_flowinfo = path->flow_label;
-	memcpy(&ib->sib_addr, &path->dgid, 16);
-}
-
-static void cma_save_ip4_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			      struct cma_hdr *hdr)
-{
-	struct sockaddr_in *listen4, *ip4;
-
-	listen4 = (struct sockaddr_in *) &listen_id->route.addr.src_addr;
-	ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
-	ip4->sin_family = AF_INET;
-	ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
-	ip4->sin_port = listen4->sin_port;
-
-	ip4 = (struct sockaddr_in *) &id->route.addr.dst_addr;
-	ip4->sin_family = AF_INET;
-	ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
-	ip4->sin_port = hdr->port;
-}
-
-static void cma_save_ip6_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			      struct cma_hdr *hdr)
-{
-	struct sockaddr_in6 *listen6, *ip6;
-
-	listen6 = (struct sockaddr_in6 *) &listen_id->route.addr.src_addr;
-	ip6 = (struct sockaddr_in6 *) &id->route.addr.src_addr;
-	ip6->sin6_family = AF_INET6;
-	ip6->sin6_addr = hdr->dst_addr.ip6;
-	ip6->sin6_port = listen6->sin6_port;
-
-	ip6 = (struct sockaddr_in6 *) &id->route.addr.dst_addr;
-	ip6->sin6_family = AF_INET6;
-	ip6->sin6_addr = hdr->src_addr.ip6;
-	ip6->sin6_port = hdr->port;
-}
-
-static int cma_save_net_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			     struct ib_cm_event *ib_event)
-{
-	struct cma_hdr *hdr;
-
-	if ((listen_id->route.addr.src_addr.ss_family == AF_IB) &&
-	    (ib_event->event == IB_CM_REQ_RECEIVED)) {
-		cma_save_ib_info(id, listen_id, ib_event->param.req_rcvd.primary_path);
-		return 0;
-	}
-
-	hdr = ib_event->private_data;
-	if (hdr->cma_version != CMA_VERSION)
-		return -EINVAL;
-
-	switch (cma_get_ip_ver(hdr)) {
-	case 4:
-		cma_save_ip4_info(id, listen_id, hdr);
-		break;
-	case 6:
-		cma_save_ip6_info(id, listen_id, hdr);
-		break;
-	default:
-		return -EINVAL;
-	}
-	return 0;
-}
-
 static inline int cma_user_data_offset(struct rdma_id_private *id_priv)
 {
-	return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cma_hdr);
+	return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cm_hdr);
 }
 
 static void cma_cancel_route(struct rdma_id_private *id_priv)
@@ -1195,7 +1086,9 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
 		return NULL;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	if (cma_save_net_info(id, listen_id, ib_event))
+	if (cm_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
+			     (struct sockaddr *)&id->route.addr.dst_addr,
+			     ib_event))
 		goto err;
 
 	rt = &id->route;
@@ -1241,7 +1134,9 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
 		return NULL;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	if (cma_save_net_info(id, listen_id, ib_event))
+	if (cm_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
+			     (struct sockaddr *)&id->route.addr.dst_addr,
+			     ib_event))
 		goto err;
 
 	if (!cma_any_addr((struct sockaddr *) &id->route.addr.src_addr)) {
@@ -1369,7 +1264,7 @@ EXPORT_SYMBOL(rdma_get_service_id);
 static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
 				 struct ib_cm_compare_data *compare)
 {
-	struct cma_hdr *cma_data, *cma_mask;
+	struct cm_hdr *cma_data, *cma_mask;
 	__be32 ip4_addr;
 	struct in6_addr ip6_addr;
 
@@ -1380,8 +1275,8 @@ static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
 	switch (addr->sa_family) {
 	case AF_INET:
 		ip4_addr = ((struct sockaddr_in *) addr)->sin_addr.s_addr;
-		cma_set_ip_ver(cma_data, 4);
-		cma_set_ip_ver(cma_mask, 0xF);
+		cm_set_ip_ver(cma_data, 4);
+		cm_set_ip_ver(cma_mask, 0xF);
 		if (!cma_any_addr(addr)) {
 			cma_data->dst_addr.ip4.addr = ip4_addr;
 			cma_mask->dst_addr.ip4.addr = htonl(~0);
@@ -1389,8 +1284,8 @@ static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
 		break;
 	case AF_INET6:
 		ip6_addr = ((struct sockaddr_in6 *) addr)->sin6_addr;
-		cma_set_ip_ver(cma_data, 6);
-		cma_set_ip_ver(cma_mask, 0xF);
+		cm_set_ip_ver(cma_data, 6);
+		cm_set_ip_ver(cma_mask, 0xF);
 		if (!cma_any_addr(addr)) {
 			cma_data->dst_addr.ip6 = ip6_addr;
 			memset(&cma_mask->dst_addr.ip6, 0xFF,
@@ -2615,35 +2510,6 @@ err1:
 }
 EXPORT_SYMBOL(rdma_bind_addr);
 
-static int cma_format_hdr(void *hdr, struct rdma_id_private *id_priv)
-{
-	struct cma_hdr *cma_hdr;
-
-	cma_hdr = hdr;
-	cma_hdr->cma_version = CMA_VERSION;
-	if (cma_family(id_priv) == AF_INET) {
-		struct sockaddr_in *src4, *dst4;
-
-		src4 = (struct sockaddr_in *) cma_src_addr(id_priv);
-		dst4 = (struct sockaddr_in *) cma_dst_addr(id_priv);
-
-		cma_set_ip_ver(cma_hdr, 4);
-		cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
-		cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
-		cma_hdr->port = src4->sin_port;
-	} else if (cma_family(id_priv) == AF_INET6) {
-		struct sockaddr_in6 *src6, *dst6;
-
-		src6 = (struct sockaddr_in6 *) cma_src_addr(id_priv);
-		dst6 = (struct sockaddr_in6 *) cma_dst_addr(id_priv);
-
-		cma_set_ip_ver(cma_hdr, 6);
-		cma_hdr->src_addr.ip6 = src6->sin6_addr;
-		cma_hdr->dst_addr.ip6 = dst6->sin6_addr;
-		cma_hdr->port = src6->sin6_port;
-	}
-	return 0;
-}
 
 static int cma_sidr_rep_handler(struct ib_cm_id *cm_id,
 				struct ib_cm_event *ib_event)
@@ -2731,7 +2597,9 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv,
 		       conn_param->private_data_len);
 
 	if (private_data) {
-		ret = cma_format_hdr(private_data, id_priv);
+		ret = cm_format_hdr(private_data, cma_family(id_priv),
+				    cma_src_addr(id_priv),
+				    cma_dst_addr(id_priv));
 		if (ret)
 			goto out;
 		req.private_data = private_data;
@@ -2796,7 +2664,9 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
 
 	route = &id_priv->id.route;
 	if (private_data) {
-		ret = cma_format_hdr(private_data, id_priv);
+		ret = cm_format_hdr(private_data, cma_family(id_priv),
+				    cma_src_addr(id_priv),
+				    cma_dst_addr(id_priv));
 		if (ret)
 			goto out;
 		req.private_data = private_data;
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 0e3ff30647d5..0e49933c7b2a 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -274,6 +274,62 @@ struct ib_cm_event {
 #define CM_LAP_ATTR_ID		cpu_to_be16(0x0019)
 #define CM_APR_ATTR_ID		cpu_to_be16(0x001A)
 
+union cm_ip_addr {
+	struct in6_addr ip6;
+	struct {
+		__be32 pad[3];
+		__be32 addr;
+	} ip4;
+};
+
+struct cm_hdr {
+	u8 cm_version;
+	u8 ip_version;	/* IP version: 7:4 */
+	__be16 port;
+	union cm_ip_addr src_addr;
+	union cm_ip_addr dst_addr;
+};
+
+#define RDMA_IP_CM_VERSION 0x00
+
+static inline u8 cm_get_ip_ver(struct cm_hdr *hdr)
+{
+	return hdr->ip_version >> 4;
+}
+
+static inline void cm_set_ip_ver(struct cm_hdr *hdr, u8 ip_ver)
+{
+	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
+}
+
+/**
+ * cm_format_hdr - Fill in a cm_hdr struct according to connection details
+ * @hdr:      cm_hdr struct to fill
+ * @family:   ip family of the addresses - AF_INET or AF_INET6
+ * @src_addr: source address of the connection
+ * @dst_addr: destination address of the connection
+ **/
+int cm_format_hdr(void *hdr, int family,
+		  struct sockaddr *src_addr,
+		  struct sockaddr *dst_addr);
+
+/**
+ * cm_save_net_info - saves ib connection event details
+ * @src_addr: source address of the connection
+ * @dst_addr: destination address of the connection
+ * @ib_event: ib event to take connection details from
+ **/
+int cm_save_net_info(struct sockaddr *src_addr,
+		     struct sockaddr *dst_addr,
+		     struct ib_cm_event *ib_event);
+
+/**
+ * cm_set_ip_ver - sets the ip version of a cm_hdr struct
+ * @hdr:    cm_hdr struct to change
+ * @ip_ver: ip version to set - a 4 bit value
+ **/
+void cm_set_ip_ver(struct cm_hdr *hdr, u8 ip_ver);
+
 /**
  * ib_cm_handler - User-defined callback to process communication events.
  * @cm_id: Communication identifier associated with the reported event.
-- 
1.7.11.2



* [PATCH v2 07/11] IB/cm: Add network namespace support
  2015-04-20  9:03 [PATCH v2 00/11] Add network namespace support in the RDMA-CM Haggai Eran
                   ` (2 preceding siblings ...)
  2015-04-20  9:03 ` [PATCH v2 05/11] IB/ipoib: Return IPoIB devices as possible matches to get_net_device_by_port_pkey_ip Haggai Eran
@ 2015-04-20  9:03 ` Haggai Eran
       [not found]   ` <1429520622-10303-8-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-04-20  9:03 ` [PATCH v2 10/11] IB/ucma: Take the network namespace from the process Haggai Eran
  4 siblings, 1 reply; 39+ messages in thread
From: Haggai Eran @ 2015-04-20  9:03 UTC (permalink / raw)
  To: Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth, Haggai Eran

From: Guy Shapiro <guysh@mellanox.com>

Add namespace support to the IB-CM layer.

- Each CM-ID now has a network namespace it is associated with, assigned at
  creation. This namespace is used as needed during subsequent actions on the
  CM-ID or related objects.

- All of the relevant calls to ib_addr and ib_core were updated to use the
  namespace from the CM-ID. External APIs were extended as needed to allow
  specifying the namespace where relevant.

- The listening service ID table is now also indexed by the CM-ID namespace.

- For incoming connection requests, we use the connection parameters to select
  the namespace. The namespace is matched when looking for a listening
  service ID.

To preserve current behavior, init_net is passed to ib_cm wherever network
namespace function parameters were added.

The ib_create_cm_id interface now takes a reference to the relevant network
namespace. CM-IDs created by accepting a connection for a listening CM-ID will
also take a reference to the namespace. When the ID is destroyed, the
namespace reference is released.
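
The effect of indexing the listen table by namespace can be sketched as a
key comparison in userspace C. This is an assumption-laden illustration:
struct listen_key, listen_key_cmp and the demo_* objects are hypothetical,
the private-data comparison is omitted, and the service ID is compared as a
plain host-order integer. The namespace is compared first, then the device,
so identical service IDs in different namespaces are distinct entries.

```c
#include <stdint.h>

/* Hypothetical key for one listening ID. */
struct listen_key {
	const void *net;	/* namespace the ID belongs to */
	const void *device;	/* device the ID listens on */
	uint64_t service_id;	/* host byte order in this sketch */
};

/* Three-way comparison of two addresses converted to integers. */
static int cmp_addr(uintptr_t a, uintptr_t b)
{
	return (a > b) - (a < b);
}

/* Order by namespace, then device, then service ID. */
static int listen_key_cmp(const struct listen_key *a,
			  const struct listen_key *b)
{
	int c = cmp_addr((uintptr_t)a->net, (uintptr_t)b->net);

	if (c)
		return c;
	c = cmp_addr((uintptr_t)a->device, (uintptr_t)b->device);
	if (c)
		return c;
	return (a->service_id > b->service_id) -
	       (a->service_id < b->service_id);
}

/* Demo objects; array elements give addresses with a known order. */
static const char demo_ns[2] = { 0 };
static const char demo_dev[2] = { 0 };
static const struct listen_key demo_a = { &demo_ns[0], &demo_dev[1], 7 };
static const struct listen_key demo_b = { &demo_ns[1], &demo_dev[0], 7 };
static const struct listen_key demo_c = { &demo_ns[0], &demo_dev[1], 9 };
```

Here demo_a sorts before demo_b even though its device address is higher,
because the namespace comparison takes precedence.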

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 drivers/infiniband/core/cm.c            | 124 ++++++++++++++++++++++++--------
 drivers/infiniband/core/cma.c           |   8 ++-
 drivers/infiniband/core/ucm.c           |   3 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |  21 +++++-
 drivers/infiniband/ulp/srp/ib_srp.c     |   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c   |   2 +-
 include/rdma/ib_cm.h                    |   7 +-
 7 files changed, 130 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index efc5cffb675a..75c6ac9a4aee 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -241,6 +241,8 @@ struct cm_id_private {
 	u8 service_timeout;
 	u8 target_ack_delay;
 
+	struct net *net; /* A network namespace that the ID belongs to */
+
 	struct list_head work_list;
 	atomic_t work_count;
 };
@@ -347,12 +349,13 @@ static void cm_set_private_data(struct cm_id_private *cm_id_priv,
 }
 
 static void cm_init_av_for_response(struct cm_port *port, struct ib_wc *wc,
-				    struct ib_grh *grh, struct cm_av *av)
+				    struct ib_grh *grh, struct cm_av *av,
+				    struct net *net)
 {
 	av->port = port;
 	av->pkey_index = wc->pkey_index;
 	ib_init_ah_from_wc(port->cm_dev->ib_device, port->port_num, wc,
-			   grh, &av->ah_attr, &init_net);
+			   grh, &av->ah_attr, net);
 }
 
 static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
@@ -521,10 +524,15 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 		if ((cur_cm_id_priv->id.service_mask & service_id) ==
 		    (service_mask & cur_cm_id_priv->id.service_id) &&
 		    (cm_id_priv->id.device == cur_cm_id_priv->id.device) &&
-		    !data_cmp)
+		    !data_cmp &&
+		    net_eq(cm_id_priv->net, cur_cm_id_priv->net))
 			return cur_cm_id_priv;
 
-		if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
+		if (cm_id_priv->net < cur_cm_id_priv->net)
+			link = &(*link)->rb_left;
+		else if (cm_id_priv->net > cur_cm_id_priv->net)
+			link = &(*link)->rb_right;
+		else if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
 			link = &(*link)->rb_left;
 		else if (cm_id_priv->id.device > cur_cm_id_priv->id.device)
 			link = &(*link)->rb_right;
@@ -544,7 +552,8 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 
 static struct cm_id_private * cm_find_listen(struct ib_device *device,
 					     __be64 service_id,
-					     u8 *private_data)
+					     u8 *private_data,
+					     struct net *net)
 {
 	struct rb_node *node = cm.listen_service_table.rb_node;
 	struct cm_id_private *cm_id_priv;
@@ -556,10 +565,14 @@ static struct cm_id_private * cm_find_listen(struct ib_device *device,
 						   cm_id_priv->compare_data);
 		if ((cm_id_priv->id.service_mask & service_id) ==
 		     cm_id_priv->id.service_id &&
-		    (cm_id_priv->id.device == device) && !data_cmp)
+		    (cm_id_priv->id.device == device) && !data_cmp &&
+		    net_eq(cm_id_priv->net, net))
 			return cm_id_priv;
-
-		if (device < cm_id_priv->id.device)
+		if (net < cm_id_priv->net)
+			node = node->rb_left;
+		else if (net > cm_id_priv->net)
+			node = node->rb_right;
+		else if (device < cm_id_priv->id.device)
 			node = node->rb_left;
 		else if (device > cm_id_priv->id.device)
 			node = node->rb_right;
@@ -857,7 +870,8 @@ EXPORT_SYMBOL(cm_save_net_info);
 
 struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 				 ib_cm_handler cm_handler,
-				 void *context)
+				 void *context,
+				 struct net *net)
 {
 	struct cm_id_private *cm_id_priv;
 	int ret;
@@ -875,6 +889,8 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 	if (ret)
 		goto error;
 
+	cm_id_priv->net = get_net(net);
+
 	spin_lock_init(&cm_id_priv->lock);
 	init_completion(&cm_id_priv->comp);
 	INIT_LIST_HEAD(&cm_id_priv->work_list);
@@ -1078,6 +1094,7 @@ retest:
 		cm_free_work(work);
 	kfree(cm_id_priv->compare_data);
 	kfree(cm_id_priv->private_data);
+	put_net(cm_id_priv->net);
 	kfree(cm_id_priv);
 }
 
@@ -1597,7 +1614,8 @@ free:	cm_free_msg(msg);
 }
 
 static struct cm_id_private * cm_match_req(struct cm_work *work,
-					   struct cm_id_private *cm_id_priv)
+					   struct cm_id_private *cm_id_priv,
+					   struct net *net)
 {
 	struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
 	struct cm_timewait_info *timewait_info;
@@ -1633,7 +1651,8 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
 	/* Find matching listen request. */
 	listen_cm_id_priv = cm_find_listen(cm_id_priv->id.device,
 					   req_msg->service_id,
-					   req_msg->private_data);
+					   req_msg->private_data,
+					   net);
 	if (!listen_cm_id_priv) {
 		cm_cleanup_timewait(cm_id_priv->timewait_info);
 		spin_unlock_irq(&cm.lock);
@@ -1679,24 +1698,58 @@ static void cm_process_routed_req(struct cm_req_msg *req_msg, struct ib_wc *wc)
 	}
 }
 
+static int cm_is_cma_service_id(__be64 service_id)
+{
+	return (IB_CMA_SERVICE_ID_MASK & service_id) == IB_CMA_SERVICE_ID;
+}
+
+static struct net *cm_get_net_ns(struct cm_work *work, __be64 service_id,
+				 __be16 pkey)
+{
+	struct sockaddr_storage addr_storage;
+	struct sockaddr *listen_addr;
+
+	if (cm_is_cma_service_id(service_id)) {
+		listen_addr = (struct sockaddr *)&addr_storage;
+		cm_save_ip_info(listen_addr, NULL, work);
+	} else {
+		/* On RoCE we could extend this branch to determine the
+		 * destination IP from the incoming packet headers, and add
+		 * support for services that are not RDMA IP CM compliant. */
+		listen_addr = NULL;
+	}
+
+	return ib_get_net_ns_by_port_pkey_ip(work->port->cm_dev->ib_device,
+					     work->port->port_num,
+					     be16_to_cpu(pkey),
+					     listen_addr);
+}
 static int cm_req_handler(struct cm_work *work)
 {
 	struct ib_cm_id *cm_id;
 	struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
 	struct cm_req_msg *req_msg;
+	struct net *net;
 	int ret;
 
 	req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
+	work->cm_event.private_data = req_msg->private_data;
 
-	cm_id = ib_create_cm_id(work->port->cm_dev->ib_device, NULL, NULL);
-	if (IS_ERR(cm_id))
-		return PTR_ERR(cm_id);
+	net = cm_get_net_ns(work, req_msg->service_id, req_msg->pkey);
+
+	cm_id = ib_create_cm_id(work->port->cm_dev->ib_device, NULL, NULL, net);
+	/* cm_id took a reference to net, so no need to hold it anymore */
+	put_net(net);
+	if (IS_ERR(cm_id)) {
+		ret = PTR_ERR(cm_id);
+		goto out;
+	}
 
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
 	cm_id_priv->id.remote_id = req_msg->local_comm_id;
 	cm_init_av_for_response(work->port, work->mad_recv_wc->wc,
 				work->mad_recv_wc->recv_buf.grh,
-				&cm_id_priv->av);
+				&cm_id_priv->av, net);
 	cm_id_priv->timewait_info = cm_create_timewait_info(cm_id_priv->
 							    id.local_id);
 	if (IS_ERR(cm_id_priv->timewait_info)) {
@@ -1707,7 +1760,7 @@ static int cm_req_handler(struct cm_work *work)
 	cm_id_priv->timewait_info->remote_ca_guid = req_msg->local_ca_guid;
 	cm_id_priv->timewait_info->remote_qpn = cm_req_get_local_qpn(req_msg);
 
-	listen_cm_id_priv = cm_match_req(work, cm_id_priv);
+	listen_cm_id_priv = cm_match_req(work, cm_id_priv, net);
 	if (!listen_cm_id_priv) {
 		ret = -EINVAL;
 		kfree(cm_id_priv->timewait_info);
@@ -1766,6 +1819,7 @@ rejected:
 	cm_deref_id(listen_cm_id_priv);
 destroy:
 	ib_destroy_cm_id(cm_id);
+out:
 	return ret;
 }
 
@@ -2900,7 +2954,7 @@ static int cm_lap_handler(struct cm_work *work)
 	cm_id_priv->tid = lap_msg->hdr.tid;
 	cm_init_av_for_response(work->port, work->mad_recv_wc->wc,
 				work->mad_recv_wc->recv_buf.grh,
-				&cm_id_priv->av);
+				&cm_id_priv->av, cm_id_priv->net);
 	cm_init_av_by_path(param->alternate_path, &cm_id_priv->alt_av);
 	ret = atomic_inc_and_test(&cm_id_priv->work_count);
 	if (!ret)
@@ -3150,21 +3204,31 @@ static int cm_sidr_req_handler(struct cm_work *work)
 	struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
 	struct cm_sidr_req_msg *sidr_req_msg;
 	struct ib_wc *wc;
+	struct net *net;
+	int retval;
+
+	sidr_req_msg = (struct cm_sidr_req_msg *)
+				work->mad_recv_wc->recv_buf.mad;
+	work->cm_event.private_data = sidr_req_msg->private_data;
+
+	net = cm_get_net_ns(work, sidr_req_msg->service_id, sidr_req_msg->pkey);
 
-	cm_id = ib_create_cm_id(work->port->cm_dev->ib_device, NULL, NULL);
-	if (IS_ERR(cm_id))
-		return PTR_ERR(cm_id);
+	cm_id = ib_create_cm_id(work->port->cm_dev->ib_device, NULL, NULL, net);
+	/* cm_id took a reference to net, so no need to hold it anymore */
+	put_net(net);
+	if (IS_ERR(cm_id)) {
+		retval = PTR_ERR(cm_id);
+		goto out;
+	}
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
 
 	/* Record SGID/SLID and request ID for lookup. */
-	sidr_req_msg = (struct cm_sidr_req_msg *)
-				work->mad_recv_wc->recv_buf.mad;
 	wc = work->mad_recv_wc->wc;
 	cm_id_priv->av.dgid.global.subnet_prefix = cpu_to_be64(wc->slid);
 	cm_id_priv->av.dgid.global.interface_id = 0;
 	cm_init_av_for_response(work->port, work->mad_recv_wc->wc,
 				work->mad_recv_wc->recv_buf.grh,
-				&cm_id_priv->av);
+				&cm_id_priv->av, net);
 	cm_id_priv->id.remote_id = sidr_req_msg->request_id;
 	cm_id_priv->tid = sidr_req_msg->hdr.tid;
 	atomic_inc(&cm_id_priv->work_count);
@@ -3175,16 +3239,19 @@ static int cm_sidr_req_handler(struct cm_work *work)
 		spin_unlock_irq(&cm.lock);
 		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
 				counter[CM_SIDR_REQ_COUNTER]);
-		goto out; /* Duplicate message. */
+		retval = -EINVAL; /* Duplicate message. */
+		goto out_id;
 	}
 	cm_id_priv->id.state = IB_CM_SIDR_REQ_RCVD;
 	cur_cm_id_priv = cm_find_listen(cm_id->device,
 					sidr_req_msg->service_id,
-					sidr_req_msg->private_data);
+					sidr_req_msg->private_data,
+					net);
 	if (!cur_cm_id_priv) {
 		spin_unlock_irq(&cm.lock);
 		cm_reject_sidr_req(cm_id_priv, IB_SIDR_UNSUPPORTED);
-		goto out; /* No match. */
+		retval = -EINVAL; /* No match. */
+		goto out_id;
 	}
 	atomic_inc(&cur_cm_id_priv->refcount);
 	atomic_inc(&cm_id_priv->refcount);
@@ -3199,9 +3266,10 @@ static int cm_sidr_req_handler(struct cm_work *work)
 	cm_process_work(cm_id_priv, work);
 	cm_deref_id(cur_cm_id_priv);
 	return 0;
-out:
+out_id:
 	ib_destroy_cm_id(&cm_id_priv->id);
-	return -EINVAL;
+out:
+	return retval;
 }
 
 static void cm_format_sidr_rep(struct cm_sidr_rep_msg *sidr_rep_msg,
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9f6faeb1de5f..1ce84a03c883 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1456,7 +1456,8 @@ static int cma_ib_listen(struct rdma_id_private *id_priv)
 	__be64 svc_id;
 	int ret;
 
-	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv);
+	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv,
+			     &init_net);
 	if (IS_ERR(id))
 		return PTR_ERR(id);
 
@@ -2606,7 +2607,7 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv,
 	}
 
 	id = ib_create_cm_id(id_priv->id.device, cma_sidr_rep_handler,
-			     id_priv);
+			     id_priv, &init_net);
 	if (IS_ERR(id)) {
 		ret = PTR_ERR(id);
 		goto out;
@@ -2655,7 +2656,8 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
 		memcpy(private_data + offset, conn_param->private_data,
 		       conn_param->private_data_len);
 
-	id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, id_priv);
+	id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, id_priv,
+			     &init_net);
 	if (IS_ERR(id)) {
 		ret = PTR_ERR(id);
 		goto out;
diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index f2f63933e8a9..9604ab068984 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -489,7 +489,8 @@ static ssize_t ib_ucm_create_id(struct ib_ucm_file *file,
 
 	ctx->uid = cmd.uid;
 	ctx->cm_id = ib_create_cm_id(file->device->ib_dev,
-				     ib_ucm_event_handler, ctx);
+				     ib_ucm_event_handler, ctx,
+				     &init_net);
 	if (IS_ERR(ctx->cm_id)) {
 		result = PTR_ERR(ctx->cm_id);
 		goto err1;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 56959adb6c7d..65dbe4523bf5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -846,7 +846,15 @@ int ipoib_cm_dev_open(struct net_device *dev)
 	if (!IPOIB_CM_SUPPORTED(dev->dev_addr))
 		return 0;
 
-	priv->cm.id = ib_create_cm_id(priv->ca, ipoib_cm_rx_handler, dev);
+	/*
+	 * The IPoIB CM ID should always be in the init_net namespace.
+	 * It is using a service ID which is not in the RDMA IP CM
+	 * range.  Furthermore, it is guaranteed that this service ID
+	 * will be unique in the machine, as it is based on the UD QP
+	 * number.
+	 */
+	priv->cm.id = ib_create_cm_id(priv->ca, ipoib_cm_rx_handler, dev,
+				      &init_net);
 	if (IS_ERR(priv->cm.id)) {
 		printk(KERN_WARNING "%s: failed to create CM ID\n", priv->ca->name);
 		ret = PTR_ERR(priv->cm.id);
@@ -1130,7 +1138,16 @@ static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn,
 		goto err_qp;
 	}
 
-	p->id = ib_create_cm_id(priv->ca, ipoib_cm_tx_handler, p);
+	/*
+	 * The IPoIB CM ID should always be in the init_net namespace.
+	 *
+	 * The target for connection is specified by an explicit GID,
+	 * which is machine global and not specific for the namespace
+	 * the device resides at. The service ID is also guaranteed to
+	 * be per machine unique, and therefore init_net is the right
+	 * namespace.
+	 */
+	p->id = ib_create_cm_id(priv->ca, ipoib_cm_tx_handler, p, &init_net);
 	if (IS_ERR(p->id)) {
 		ret = PTR_ERR(p->id);
 		ipoib_warn(priv, "failed to create tx cm id: %d\n", ret);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 918814cd0f80..b9b5c3f9ce11 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -295,7 +295,7 @@ static int srp_new_cm_id(struct srp_rdma_ch *ch)
 	struct ib_cm_id *new_cm_id;
 
 	new_cm_id = ib_create_cm_id(target->srp_host->srp_dev->dev,
-				    srp_cm_handler, ch);
+				    srp_cm_handler, ch, &init_net);
 	if (IS_ERR(new_cm_id))
 		return PTR_ERR(new_cm_id);
 
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index a95e7d51cd8b..5b1a48052d35 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -3242,7 +3242,7 @@ static void srpt_add_one(struct ib_device *device)
 	if (!srpt_service_guid)
 		srpt_service_guid = be64_to_cpu(device->node_guid);
 
-	sdev->cm_id = ib_create_cm_id(device, srpt_cm_handler, sdev);
+	sdev->cm_id = ib_create_cm_id(device, srpt_cm_handler, sdev, &init_net);
 	if (IS_ERR(sdev->cm_id))
 		goto err_srq;
 
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 0e49933c7b2a..a22ffa370fb3 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -369,13 +369,18 @@ struct ib_cm_id {
  * @cm_handler: Callback invoked to notify the user of CM events.
  * @context: User specified context associated with the communication
  *   identifier.
+ * @net: Network namespace associated with the cm_id.
  *
  * Communication identifiers are used to track connection states, service
  * ID resolution requests, and listen requests.
+ *
+ * The created CM ID will hold a reference on the network namespace until its
+ * destruction.
  */
 struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 				 ib_cm_handler cm_handler,
-				 void *context);
+				 void *context,
+				 struct net *net);
 
 /**
  * ib_destroy_cm_id - Destroy a connection identifier.
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 08/11] IB/cma: Separate port allocation to network namespaces
       [not found] ` <1429520622-10303-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-04-20  9:03   ` [PATCH v2 06/11] IB/cm, cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm Haggai Eran
@ 2015-04-20  9:03   ` Haggai Eran
  2015-04-20  9:03   ` [PATCH v2 09/11] IB/cma: Add support for " Haggai Eran
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Haggai Eran @ 2015-04-20  9:03 UTC (permalink / raw)
  To: Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth, Haggai Eran

From: Yotam Kenneth <yotamke@mellanox.com>

Keep a radix-tree of the network namespaces in use for each port-space.
Dynamically allocate an idr for a network namespace upon the first bind
request for a port in the (ps, net) tuple.
Destroy the idr when the (ps, net) tuple no longer contains any bound ports.
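
The lifecycle above can be sketched in userspace as follows. This is a
simplified illustration, not the patch's implementation: a small fixed array
stands in for the radix-tree, a per-namespace pointer table stands in for the
idr, and all names (port_space, ns_ports, ps_bind and so on) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_NS    8      /* namespaces tracked per port-space (sketch limit) */
#define MAX_PORTS 65536  /* one slot per 16-bit port number */

/* Per-(ps, net) state: stands in for the dynamically allocated idr. */
struct ns_ports {
	const void *net;         /* opaque namespace identity */
	unsigned int in_use;     /* number of currently bound ports */
	void *owner[MAX_PORTS];  /* owner of each bound port, or NULL */
};

/* One port-space: stands in for the per-port-space radix-tree. */
struct port_space {
	struct ns_ports *slot[MAX_NS];
};

static struct ns_ports *ps_lookup(struct port_space *ps, const void *net)
{
	for (int i = 0; i < MAX_NS; i++)
		if (ps->slot[i] && ps->slot[i]->net == net)
			return ps->slot[i];
	return NULL;
}

/* Bind a port; the per-namespace table is allocated on first use. */
static int ps_bind(struct port_space *ps, const void *net,
		   unsigned short port, void *owner)
{
	struct ns_ports *np = ps_lookup(ps, net);

	assert(owner != NULL);
	if (!np) {
		int i;

		for (i = 0; i < MAX_NS && ps->slot[i]; i++)
			;
		if (i == MAX_NS)
			return -1;
		np = calloc(1, sizeof(*np));
		if (!np)
			return -1;
		np->net = net;
		ps->slot[i] = np;
	}
	if (np->owner[port])
		return -1; /* port already bound in this namespace */
	np->owner[port] = owner;
	np->in_use++;
	return 0;
}

static void *ps_find(struct port_space *ps, const void *net,
		     unsigned short port)
{
	struct ns_ports *np = ps_lookup(ps, net);

	return np ? np->owner[port] : NULL;
}

/* Unbind a port; the table is freed once its last port is released. */
static void ps_unbind(struct port_space *ps, const void *net,
		      unsigned short port)
{
	struct ns_ports *np = ps_lookup(ps, net);

	if (!np || !np->owner[port])
		return;
	np->owner[port] = NULL;
	if (--np->in_use == 0) {
		for (int i = 0; i < MAX_NS; i++)
			if (ps->slot[i] == np)
				ps->slot[i] = NULL;
		free(np);
	}
}
```

In this sketch the same port number can be bound once per namespace, and a
namespace with no bound ports consumes no memory in the port-space.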

This patch is internal infrastructure work for the following patch. In
this patch, init_net is statically used as the network namespace for
the new port-space API.

The radix-tree is protected under the same locking that protects the
rest of the port space data. This locking is practically a big, static
mutex lock for the entire module.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
---
 drivers/infiniband/core/cma.c | 122 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 99 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 1ce84a03c883..022b0d0a51cc 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -39,11 +39,13 @@
 #include <linux/mutex.h>
 #include <linux/random.h>
 #include <linux/idr.h>
+#include <linux/radix-tree.h>
 #include <linux/inetdevice.h>
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <net/route.h>
 
+#include <net/netns/hash.h>
 #include <net/tcp.h>
 #include <net/ipv6.h>
 
@@ -80,10 +82,83 @@ static LIST_HEAD(dev_list);
 static LIST_HEAD(listen_any_list);
 static DEFINE_MUTEX(lock);
 static struct workqueue_struct *cma_wq;
-static DEFINE_IDR(tcp_ps);
-static DEFINE_IDR(udp_ps);
-static DEFINE_IDR(ipoib_ps);
-static DEFINE_IDR(ib_ps);
+static RADIX_TREE(tcp_ps, GFP_KERNEL);
+static RADIX_TREE(udp_ps, GFP_KERNEL);
+static RADIX_TREE(ipoib_ps, GFP_KERNEL);
+static RADIX_TREE(ib_ps, GFP_KERNEL);
+
+static LIST_HEAD(idrs_list);
+
+struct idr_ll {
+	unsigned net_val;
+	struct net *net;
+	struct radix_tree_root *ps;
+	struct idr idr;
+};
+
+static void zap_ps_idr(struct idr_ll *idr_ll)
+{
+	radix_tree_delete(idr_ll->ps, idr_ll->net_val);
+	idr_destroy(&idr_ll->idr);
+	kfree(idr_ll);
+}
+
+static int cma_ps_alloc(struct radix_tree_root *ps, struct net *net, void *ptr,
+			int snum)
+{
+	struct idr_ll *idr_ll;
+	int err;
+	int res;
+
+	idr_ll = radix_tree_lookup(ps, net_hash_mix(net));
+	if (!idr_ll) {
+		idr_ll = kmalloc(sizeof(*idr_ll), GFP_KERNEL);
+		if (!idr_ll)
+			return -ENOMEM;
+		idr_init(&idr_ll->idr);
+		idr_ll->net_val = net_hash_mix(net);
+		idr_ll->net = net;
+		idr_ll->ps = ps;
+		err = radix_tree_insert(ps, idr_ll->net_val, idr_ll);
+		if (err) {
+			idr_destroy(&idr_ll->idr);
+			kfree(idr_ll);
+			return err;
+		}
+	}
+	res = idr_alloc(&idr_ll->idr, ptr, snum, snum + 1, GFP_KERNEL);
+	if (unlikely((res < 0) && idr_is_empty(&idr_ll->idr))) {
+		zap_ps_idr(idr_ll);
+		return res;
+	}
+	return res;
+}
+
+static void *cma_ps_find(struct radix_tree_root *ps, struct net *net, int snum)
+{
+	struct idr_ll *idr_ll;
+
+	idr_ll = radix_tree_lookup(ps, net_hash_mix(net));
+	if (!idr_ll)
+		return NULL;
+	return idr_find(&idr_ll->idr, snum);
+}
+
+static void cma_ps_remove(struct radix_tree_root *ps, struct net *net, int snum)
+{
+	struct idr_ll *idr_ll;
+
+	idr_ll = radix_tree_lookup(ps, net_hash_mix(net));
+	if (unlikely(!idr_ll)) {
+		WARN(1, "cma_ps_remove can't find expected net ns 0x%lx\n",
+		     (unsigned long)net);
+		return;
+	}
+	idr_remove(&idr_ll->idr, snum);
+	if (idr_is_empty(&idr_ll->idr)) {
+		zap_ps_idr(idr_ll);
+	}
+}
 
 struct cma_device {
 	struct list_head	list;
@@ -94,9 +169,9 @@ struct cma_device {
 };
 
 struct rdma_bind_list {
-	struct idr		*ps;
-	struct hlist_head	owners;
-	unsigned short		port;
+	struct radix_tree_root	*ps;
+	struct hlist_head		owners;
+	unsigned short			port;
 };
 
 enum {
@@ -885,7 +960,7 @@ static void cma_release_port(struct rdma_id_private *id_priv)
 	mutex_lock(&lock);
 	hlist_del(&id_priv->node);
 	if (hlist_empty(&bind_list->owners)) {
-		idr_remove(bind_list->ps, bind_list->port);
+		cma_ps_remove(bind_list->ps, &init_net, bind_list->port);
 		kfree(bind_list);
 	}
 	mutex_unlock(&lock);
@@ -2198,8 +2273,8 @@ static void cma_bind_port(struct rdma_bind_list *bind_list,
 	hlist_add_head(&id_priv->node, &bind_list->owners);
 }
 
-static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
-			  unsigned short snum)
+static int cma_alloc_port(struct radix_tree_root *ps,
+			  struct rdma_id_private *id_priv, unsigned short snum)
 {
 	struct rdma_bind_list *bind_list;
 	int ret;
@@ -2208,7 +2283,7 @@ static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
 	if (!bind_list)
 		return -ENOMEM;
 
-	ret = idr_alloc(ps, bind_list, snum, snum + 1, GFP_KERNEL);
+	ret = cma_ps_alloc(ps, &init_net, bind_list, snum);
 	if (ret < 0)
 		goto err;
 
@@ -2221,7 +2296,8 @@ err:
 	return ret == -ENOSPC ? -EADDRNOTAVAIL : ret;
 }
 
-static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
+static int cma_alloc_any_port(struct radix_tree_root *ps,
+			      struct rdma_id_private *id_priv)
 {
 	static unsigned int last_used_port;
 	int low, high, remaining;
@@ -2232,7 +2308,7 @@ static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 	rover = prandom_u32() % remaining + low;
 retry:
 	if (last_used_port != rover &&
-	    !idr_find(ps, (unsigned short) rover)) {
+	    !cma_ps_find(ps, &init_net, (unsigned short)rover)) {
 		int ret = cma_alloc_port(ps, id_priv, rover);
 		/*
 		 * Remember previously used port number in order to avoid
@@ -2257,6 +2333,8 @@ retry:
  * bind to a specific port, or when trying to listen on a bound port.  In
  * the latter case, the provided id_priv may already be on the bind_list, but
  * we still need to check that it's okay to start listening.
+ *
+ * Assume the bind_list contains only services from the correct name space.
  */
 static int cma_check_port(struct rdma_bind_list *bind_list,
 			  struct rdma_id_private *id_priv, uint8_t reuseaddr)
@@ -2287,7 +2365,8 @@ static int cma_check_port(struct rdma_bind_list *bind_list,
 	return 0;
 }
 
-static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
+static int cma_use_port(struct radix_tree_root *ps,
+			struct rdma_id_private *id_priv)
 {
 	struct rdma_bind_list *bind_list;
 	unsigned short snum;
@@ -2297,7 +2376,7 @@ static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
 	if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
-	bind_list = idr_find(ps, snum);
+	bind_list = cma_ps_find(ps, &init_net, snum);
 	if (!bind_list) {
 		ret = cma_alloc_port(ps, id_priv, snum);
 	} else {
@@ -2320,7 +2399,8 @@ static int cma_bind_listen(struct rdma_id_private *id_priv)
 	return ret;
 }
 
-static struct idr *cma_select_inet_ps(struct rdma_id_private *id_priv)
+static struct radix_tree_root *cma_select_inet_ps(
+		struct rdma_id_private *id_priv)
 {
 	switch (id_priv->id.ps) {
 	case RDMA_PS_TCP:
@@ -2336,9 +2416,9 @@ static struct idr *cma_select_inet_ps(struct rdma_id_private *id_priv)
 	}
 }
 
-static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
+static struct radix_tree_root *cma_select_ib_ps(struct rdma_id_private *id_priv)
 {
-	struct idr *ps = NULL;
+	struct radix_tree_root *ps = NULL;
 	struct sockaddr_ib *sib;
 	u64 sid_ps, mask, sid;
 
@@ -2369,7 +2449,7 @@ static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
 
 static int cma_get_port(struct rdma_id_private *id_priv)
 {
-	struct idr *ps;
+	struct radix_tree_root *ps;
 	int ret;
 
 	if (cma_family(id_priv) != AF_IB)
@@ -3567,10 +3647,6 @@ static void __exit cma_cleanup(void)
 	rdma_addr_unregister_client(&addr_client);
 	ib_sa_unregister_client(&sa_client);
 	destroy_workqueue(cma_wq);
-	idr_destroy(&tcp_ps);
-	idr_destroy(&udp_ps);
-	idr_destroy(&ipoib_ps);
-	idr_destroy(&ib_ps);
 }
 
 module_init(cma_init);
-- 
1.7.11.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 09/11] IB/cma: Add support for network namespaces
       [not found] ` <1429520622-10303-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-04-20  9:03   ` [PATCH v2 08/11] IB/cma: Separate port allocation to network namespaces Haggai Eran
@ 2015-04-20  9:03   ` Haggai Eran
  2015-04-20  9:03   ` [PATCH v2 11/11] IB/ucm: Add partial " Haggai Eran
  2015-04-20 14:53   ` [PATCH v2 00/11] Add network namespace support in the RDMA-CM Steve Wise
  7 siblings, 0 replies; 39+ messages in thread
From: Haggai Eran @ 2015-04-20  9:03 UTC (permalink / raw)
  To: Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth, Haggai Eran

From: Guy Shapiro <guysh@mellanox.com>

Add support for network namespaces in the ib_cma module. This is
accomplished by:

1. Adding a network namespace parameter to rdma_create_id. This parameter is
   used to populate the network namespace field in rdma_id_private.
   rdma_create_id keeps a reference on the network namespace.
2. Using the network namespace from the rdma_id instead of init_net inside of
   ib_cma.
3. Decrementing the reference count for the appropriate network namespace when
   calling rdma_destroy_id.

To preserve the current behavior, init_net is passed when calling from other
modules.
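
As a rough userspace illustration of point 2 above, the sketch below reads
per-namespace settings through the ID's own namespace rather than a global
one, and lets an ID created for an incoming connection inherit the listener's
namespace. The types and helpers (net_sketch, rdma_id_sketch, id_init and so
on) are hypothetical; only the bindv6only and local port range settings are
taken from the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical namespace carrying two per-namespace settings. */
struct net_sketch {
	int bindv6only;           /* IPv6-only bind default */
	int port_low, port_high;  /* local (ephemeral) port range */
};

/* Hypothetical RDMA ID that remembers the namespace it was created in. */
struct rdma_id_sketch {
	const struct net_sketch *net;
	int afonly;
};

/* Creation records the namespace supplied by the caller. */
static void id_init(struct rdma_id_sketch *id, const struct net_sketch *net)
{
	assert(net != NULL);
	id->net = net;
	id->afonly = 0;
}

/* An ID for an incoming connection inherits the listener's namespace. */
static void id_init_from_listener(struct rdma_id_sketch *child,
				  const struct rdma_id_sketch *listener)
{
	id_init(child, listener->net);
}

/* An IPv6 bind takes its default from the ID's namespace, not a global. */
static void id_bind_ipv6(struct rdma_id_sketch *id)
{
	id->afonly = id->net->bindv6only;
}

/* Size of the ephemeral port range as seen by this ID. */
static int id_ephemeral_ports(const struct rdma_id_sketch *id)
{
	return id->net->port_high - id->net->port_low + 1;
}
```

A caller that keeps the old behavior would pass the equivalent of init_net to
id_init(); a namespace-aware caller would pass the namespace of the calling
process.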

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
---
 drivers/infiniband/core/cma.c                      | 52 +++++++++++++---------
 drivers/infiniband/core/ucma.c                     |  3 +-
 drivers/infiniband/ulp/iser/iser_verbs.c           |  2 +-
 drivers/infiniband/ulp/isert/ib_isert.c            |  2 +-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |  4 +-
 include/rdma/rdma_cm.h                             |  6 ++-
 net/9p/trans_rdma.c                                |  2 +-
 net/rds/ib.c                                       |  2 +-
 net/rds/ib_cm.c                                    |  2 +-
 net/rds/iw.c                                       |  2 +-
 net/rds/iw_cm.c                                    |  2 +-
 net/rds/rdma_transport.c                           |  2 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c           |  2 +-
 net/sunrpc/xprtrdma/verbs.c                        |  3 +-
 14 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 022b0d0a51cc..9ea42fe2853b 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -540,7 +540,8 @@ static int cma_disable_callback(struct rdma_id_private *id_priv,
 
 struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
 				  void *context, enum rdma_port_space ps,
-				  enum ib_qp_type qp_type)
+				  enum ib_qp_type qp_type,
+				  struct net *net)
 {
 	struct rdma_id_private *id_priv;
 
@@ -562,7 +563,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
 	INIT_LIST_HEAD(&id_priv->listen_list);
 	INIT_LIST_HEAD(&id_priv->mc_list);
 	get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
-	id_priv->id.route.addr.dev_addr.net = &init_net;
+	id_priv->id.route.addr.dev_addr.net = get_net(net);
 
 	return &id_priv->id;
 }
@@ -689,7 +690,7 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
 	    rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)
 	    == IB_LINK_LAYER_ETHERNET) {
 		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL,
-						  &init_net);
+				id_priv->id.route.addr.dev_addr.net);
 
 		if (ret)
 			goto out;
@@ -953,6 +954,7 @@ static void cma_cancel_operation(struct rdma_id_private *id_priv,
 static void cma_release_port(struct rdma_id_private *id_priv)
 {
 	struct rdma_bind_list *bind_list = id_priv->bind_list;
+	struct net *net = id_priv->id.route.addr.dev_addr.net;
 
 	if (!bind_list)
 		return;
@@ -960,7 +962,7 @@ static void cma_release_port(struct rdma_id_private *id_priv)
 	mutex_lock(&lock);
 	hlist_del(&id_priv->node);
 	if (hlist_empty(&bind_list->owners)) {
-		cma_ps_remove(bind_list->ps, &init_net, bind_list->port);
+		cma_ps_remove(bind_list->ps, net, bind_list->port);
 		kfree(bind_list);
 	}
 	mutex_unlock(&lock);
@@ -1029,6 +1031,7 @@ void rdma_destroy_id(struct rdma_cm_id *id)
 		cma_deref_id(id_priv->id.context);
 
 	kfree(id_priv->id.route.path_rec);
+	put_net(id_priv->id.route.addr.dev_addr.net);
 	kfree(id_priv);
 }
 EXPORT_SYMBOL(rdma_destroy_id);
@@ -1156,7 +1159,8 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
 	int ret;
 
 	id = rdma_create_id(listen_id->event_handler, listen_id->context,
-			    listen_id->ps, ib_event->param.req_rcvd.qp_type);
+			    listen_id->ps, ib_event->param.req_rcvd.qp_type,
+			    listen_id->route.addr.dev_addr.net);
 	if (IS_ERR(id))
 		return NULL;
 
@@ -1201,10 +1205,11 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
 {
 	struct rdma_id_private *id_priv;
 	struct rdma_cm_id *id;
+	struct net *net = listen_id->route.addr.dev_addr.net;
 	int ret;
 
 	id = rdma_create_id(listen_id->event_handler, listen_id->context,
-			    listen_id->ps, IB_QPT_UD);
+			    listen_id->ps, IB_QPT_UD, net);
 	if (IS_ERR(id))
 		return NULL;
 
@@ -1455,7 +1460,8 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
 	/* Create a new RDMA id for the new IW CM ID */
 	new_cm_id = rdma_create_id(listen_id->id.event_handler,
 				   listen_id->id.context,
-				   RDMA_PS_TCP, IB_QPT_RC);
+				   RDMA_PS_TCP, IB_QPT_RC,
+				   listen_id->id.route.addr.dev_addr.net);
 	if (IS_ERR(new_cm_id)) {
 		ret = -ENOMEM;
 		goto out;
@@ -1528,11 +1534,11 @@ static int cma_ib_listen(struct rdma_id_private *id_priv)
 	struct ib_cm_compare_data compare_data;
 	struct sockaddr *addr;
 	struct ib_cm_id	*id;
+	struct net *net = id_priv->id.route.addr.dev_addr.net;
 	__be64 svc_id;
 	int ret;
 
-	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv,
-			     &init_net);
+	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv, net);
 	if (IS_ERR(id))
 		return PTR_ERR(id);
 
@@ -1596,6 +1602,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
 {
 	struct rdma_id_private *dev_id_priv;
 	struct rdma_cm_id *id;
+	struct net *net = id_priv->id.route.addr.dev_addr.net;
 	int ret;
 
 	if (cma_family(id_priv) == AF_IB &&
@@ -1603,7 +1610,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
 		return;
 
 	id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps,
-			    id_priv->id.qp_type);
+			    id_priv->id.qp_type, net);
 	if (IS_ERR(id))
 		return;
 
@@ -2283,7 +2290,8 @@ static int cma_alloc_port(struct radix_tree_root *ps,
 	if (!bind_list)
 		return -ENOMEM;
 
-	ret = cma_ps_alloc(ps, &init_net, bind_list, snum);
+	ret = cma_ps_alloc(ps, id_priv->id.route.addr.dev_addr.net, bind_list,
+			   snum);
 	if (ret < 0)
 		goto err;
 
@@ -2302,13 +2310,14 @@ static int cma_alloc_any_port(struct radix_tree_root *ps,
 	static unsigned int last_used_port;
 	int low, high, remaining;
 	unsigned int rover;
+	struct net *net = id_priv->id.route.addr.dev_addr.net;
 
-	inet_get_local_port_range(&init_net, &low, &high);
+	inet_get_local_port_range(net, &low, &high);
 	remaining = (high - low) + 1;
 	rover = prandom_u32() % remaining + low;
 retry:
 	if (last_used_port != rover &&
-	    !cma_ps_find(ps, &init_net, (unsigned short)rover)) {
+	    !cma_ps_find(ps, net, (unsigned short)rover)) {
 		int ret = cma_alloc_port(ps, id_priv, rover);
 		/*
 		 * Remember previously used port number in order to avoid
@@ -2376,7 +2385,7 @@ static int cma_use_port(struct radix_tree_root *ps,
 	if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
-	bind_list = cma_ps_find(ps, &init_net, snum);
+	bind_list = cma_ps_find(ps, id_priv->id.route.addr.dev_addr.net, snum);
 	if (!bind_list) {
 		ret = cma_alloc_port(ps, id_priv, snum);
 	} else {
@@ -2573,8 +2582,11 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 		if (addr->sa_family == AF_INET)
 			id_priv->afonly = 1;
 #if IS_ENABLED(CONFIG_IPV6)
-		else if (addr->sa_family == AF_INET6)
-			id_priv->afonly = init_net.ipv6.sysctl.bindv6only;
+		else if (addr->sa_family == AF_INET6) {
+			struct net *net = id_priv->id.route.addr.dev_addr.net;
+
+			id_priv->afonly = net->ipv6.sysctl.bindv6only;
+		}
 #endif
 	}
 	ret = cma_get_port(id_priv);
@@ -2687,7 +2699,7 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv,
 	}
 
 	id = ib_create_cm_id(id_priv->id.device, cma_sidr_rep_handler,
-			     id_priv, &init_net);
+			     id_priv, id_priv->id.route.addr.dev_addr.net);
 	if (IS_ERR(id)) {
 		ret = PTR_ERR(id);
 		goto out;
@@ -2737,7 +2749,7 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
 		       conn_param->private_data_len);
 
 	id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, id_priv,
-			     &init_net);
+			     id_priv->id.route.addr.dev_addr.net);
 	if (IS_ERR(id)) {
 		ret = PTR_ERR(id);
 		goto out;
@@ -3387,6 +3399,7 @@ static int cma_netdev_change(struct net_device *ndev, struct rdma_id_private *id
 	dev_addr = &id_priv->id.route.addr.dev_addr;
 
 	if ((dev_addr->bound_dev_if == ndev->ifindex) &&
+	    (net_eq(dev_net(ndev), dev_addr->net)) &&
 	    memcmp(dev_addr->src_dev_addr, ndev->dev_addr, ndev->addr_len)) {
 		printk(KERN_INFO "RDMA CM addr change for ndev %s used by id %p\n",
 		       ndev->name, &id_priv->id);
@@ -3412,9 +3425,6 @@ static int cma_netdev_callback(struct notifier_block *self, unsigned long event,
 	struct rdma_id_private *id_priv;
 	int ret = NOTIFY_DONE;
 
-	if (dev_net(ndev) != &init_net)
-		return NOTIFY_DONE;
-
 	if (event != NETDEV_BONDING_FAILOVER)
 		return NOTIFY_DONE;
 
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 45d67e9228d7..2f7fad84f933 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -391,7 +391,8 @@ static ssize_t ucma_create_id(struct ucma_file *file, const char __user *inbuf,
 		return -ENOMEM;
 
 	ctx->uid = cmd.uid;
-	ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps, qp_type);
+	ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps, qp_type,
+				    &init_net);
 	if (IS_ERR(ctx->cm_id)) {
 		ret = PTR_ERR(ctx->cm_id);
 		goto err1;
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index cc2dd35ffbc0..e658f31079b8 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -962,7 +962,7 @@ int iser_connect(struct iser_conn   *iser_conn,
 
 	ib_conn->cma_id = rdma_create_id(iser_cma_handler,
 					 (void *)iser_conn,
-					 RDMA_PS_TCP, IB_QPT_RC);
+					 RDMA_PS_TCP, IB_QPT_RC, &init_net);
 	if (IS_ERR(ib_conn->cma_id)) {
 		err = PTR_ERR(ib_conn->cma_id);
 		iser_err("rdma_create_id failed: %d\n", err);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 075b19cc78e8..745f79c1f498 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2973,7 +2973,7 @@ isert_setup_id(struct isert_np *isert_np)
 	isert_dbg("ksockaddr: %p, sa: %p\n", &np->np_sockaddr, sa);
 
 	id = rdma_create_id(isert_cma_handler, isert_np,
-			    RDMA_PS_TCP, IB_QPT_RC);
+			    RDMA_PS_TCP, IB_QPT_RC, &init_net);
 	if (IS_ERR(id)) {
 		isert_err("rdma_create_id() failed: %ld\n", PTR_ERR(id));
 		ret = PTR_ERR(id);
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
index cd664d025f41..d9c07e942326 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
@@ -125,7 +125,9 @@ extern kib_tunables_t  kiblnd_tunables;
 				     IBLND_CREDIT_HIGHWATER_V1 : \
 				     *kiblnd_tunables.kib_peercredits_hiw) /* when eagerly to return credits */
 
-#define kiblnd_rdma_create_id(cb, dev, ps, qpt) rdma_create_id(cb, dev, ps, qpt)
+#define kiblnd_rdma_create_id(cb, dev, ps, qpt) rdma_create_id(cb, dev, \
+							       ps, qpt, \
+							       &init_net)
 
 static inline int
 kiblnd_concurrent_sends_v1(void)
diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
index 1ed2088dc9f5..3953e9c8bc94 100644
--- a/include/rdma/rdma_cm.h
+++ b/include/rdma/rdma_cm.h
@@ -163,10 +163,14 @@ struct rdma_cm_id {
  * @context: User specified context associated with the id.
  * @ps: RDMA port space.
  * @qp_type: type of queue pair associated with the id.
+ * @net: The network namespace in which to create the new id.
+ *
+ * The id holds a reference on the network namespace until it is destroyed.
  */
 struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
 				  void *context, enum rdma_port_space ps,
-				  enum ib_qp_type qp_type);
+				  enum ib_qp_type qp_type,
+				  struct net *net);
 
 /**
   * rdma_destroy_id - Destroys an RDMA identifier.
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 14ad43b5cf89..577fd3129bcf 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -635,7 +635,7 @@ rdma_create_trans(struct p9_client *client, const char *addr, char *args)
 
 	/* Create the RDMA CM ID */
 	rdma->cm_id = rdma_create_id(p9_cm_event_handler, client, RDMA_PS_TCP,
-				     IB_QPT_RC);
+				     IB_QPT_RC, &init_net);
 	if (IS_ERR(rdma->cm_id))
 		goto error;
 
diff --git a/net/rds/ib.c b/net/rds/ib.c
index ba2dffeff608..cc137f523248 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -326,7 +326,7 @@ static int rds_ib_laddr_check(__be32 addr)
 	/* Create a CMA ID and try to bind it. This catches both
 	 * IB and iWARP capable NICs.
 	 */
-	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
+	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC, &init_net);
 	if (IS_ERR(cm_id))
 		return PTR_ERR(cm_id);
 
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index 31b74f5e61ad..d19b91296ddc 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -584,7 +584,7 @@ int rds_ib_conn_connect(struct rds_connection *conn)
 	/* XXX I wonder what affect the port space has */
 	/* delegate cm event handler to rdma_transport */
 	ic->i_cm_id = rdma_create_id(rds_rdma_cm_event_handler, conn,
-				     RDMA_PS_TCP, IB_QPT_RC);
+				     RDMA_PS_TCP, IB_QPT_RC, &init_net);
 	if (IS_ERR(ic->i_cm_id)) {
 		ret = PTR_ERR(ic->i_cm_id);
 		ic->i_cm_id = NULL;
diff --git a/net/rds/iw.c b/net/rds/iw.c
index 589935661d66..8501b73ed12f 100644
--- a/net/rds/iw.c
+++ b/net/rds/iw.c
@@ -227,7 +227,7 @@ static int rds_iw_laddr_check(__be32 addr)
 	/* Create a CMA ID and try to bind it. This catches both
 	 * IB and iWARP capable NICs.
 	 */
-	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
+	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC, &init_net);
 	if (IS_ERR(cm_id))
 		return PTR_ERR(cm_id);
 
diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
index a6c2bea9f8f9..06406bc7aabb 100644
--- a/net/rds/iw_cm.c
+++ b/net/rds/iw_cm.c
@@ -521,7 +521,7 @@ int rds_iw_conn_connect(struct rds_connection *conn)
 	/* XXX I wonder what affect the port space has */
 	/* delegate cm event handler to rdma_transport */
 	ic->i_cm_id = rdma_create_id(rds_rdma_cm_event_handler, conn,
-				     RDMA_PS_TCP, IB_QPT_RC);
+				     RDMA_PS_TCP, IB_QPT_RC, &init_net);
 	if (IS_ERR(ic->i_cm_id)) {
 		ret = PTR_ERR(ic->i_cm_id);
 		ic->i_cm_id = NULL;
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index 6cd9d1deafc3..066b60b27b12 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -160,7 +160,7 @@ static int rds_rdma_listen_init(void)
 	int ret;
 
 	cm_id = rdma_create_id(rds_rdma_cm_event_handler, NULL, RDMA_PS_TCP,
-			       IB_QPT_RC);
+			       IB_QPT_RC, &init_net);
 	if (IS_ERR(cm_id)) {
 		ret = PTR_ERR(cm_id);
 		printk(KERN_ERR "RDS/RDMA: failed to setup listener, "
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index f609c1c2d38d..dbf9013d9667 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -705,7 +705,7 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
 		return ERR_PTR(-ENOMEM);
 
 	listen_id = rdma_create_id(rdma_listen_handler, cma_xprt, RDMA_PS_TCP,
-				   IB_QPT_RC);
+				   IB_QPT_RC, &init_net);
 	if (IS_ERR(listen_id)) {
 		ret = PTR_ERR(listen_id);
 		dprintk("svcrdma: rdma_create_id failed = %d\n", ret);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index e28909fddd30..b2e3a0515fd7 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -519,7 +519,8 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
 
 	init_completion(&ia->ri_done);
 
-	id = rdma_create_id(rpcrdma_conn_upcall, xprt, RDMA_PS_TCP, IB_QPT_RC);
+	id = rdma_create_id(rpcrdma_conn_upcall, xprt, RDMA_PS_TCP, IB_QPT_RC,
+			    &init_net);
 	if (IS_ERR(id)) {
 		rc = PTR_ERR(id);
 		dprintk("RPC:       %s: rdma_create_id() failed %i\n",
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 10/11] IB/ucma: Take the network namespace from the process
  2015-04-20  9:03 [PATCH v2 00/11] Add network namespace support in the RDMA-CM Haggai Eran
                   ` (3 preceding siblings ...)
  2015-04-20  9:03 ` [PATCH v2 07/11] IB/cm: Add network namespace support Haggai Eran
@ 2015-04-20  9:03 ` Haggai Eran
  4 siblings, 0 replies; 39+ messages in thread
From: Haggai Eran @ 2015-04-20  9:03 UTC (permalink / raw)
  To: Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth, Haggai Eran

From: Guy Shapiro <guysh@mellanox.com>

Add support for network namespaces from user space. This is done by passing
the network namespace of the process instead of init_net.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 2f7fad84f933..0ccdf2b05153 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -42,6 +42,7 @@
 #include <linux/slab.h>
 #include <linux/sysctl.h>
 #include <linux/module.h>
+#include <linux/nsproxy.h>
 
 #include <rdma/rdma_user_cm.h>
 #include <rdma/ib_marshall.h>
@@ -392,7 +393,7 @@ static ssize_t ucma_create_id(struct ucma_file *file, const char __user *inbuf,
 
 	ctx->uid = cmd.uid;
 	ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps, qp_type,
-				    &init_net);
+				    current->nsproxy->net_ns);
 	if (IS_ERR(ctx->cm_id)) {
 		ret = PTR_ERR(ctx->cm_id);
 		goto err1;
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 11/11] IB/ucm: Add partial support for network namespaces
       [not found] ` <1429520622-10303-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-04-20  9:03   ` [PATCH v2 09/11] IB/cma: Add support for " Haggai Eran
@ 2015-04-20  9:03   ` Haggai Eran
  2015-04-20 23:46     ` ira.weiny
  2015-04-20 14:53   ` [PATCH v2 00/11] Add network namespace support in the RDMA-CM Steve Wise
  7 siblings, 1 reply; 39+ messages in thread
From: Haggai Eran @ 2015-04-20  9:03 UTC (permalink / raw)
  To: Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth, Haggai Eran

From: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

It is impossible to completely support network namespaces for UCM, as
we cannot identify the target IPoIB device. However, we add support
which will work if the user is following the IB-Spec Annex 11 (RDMA IP
CM Services) with the service ID and private data formatting.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/ucm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index 9604ab068984..424421091dae 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -45,6 +45,7 @@
 #include <linux/idr.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
+#include <linux/nsproxy.h>
 
 #include <asm/uaccess.h>
 
@@ -490,7 +491,7 @@ static ssize_t ib_ucm_create_id(struct ib_ucm_file *file,
 	ctx->uid = cmd.uid;
 	ctx->cm_id = ib_create_cm_id(file->device->ib_dev,
 				     ib_ucm_event_handler, ctx,
-				     &init_net);
+				     current->nsproxy->net_ns);
 	if (IS_ERR(ctx->cm_id)) {
 		result = PTR_ERR(ctx->cm_id);
 		goto err1;
-- 
1.7.11.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 00/11] Add network namespace support in the RDMA-CM
       [not found] ` <1429520622-10303-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2015-04-20  9:03   ` [PATCH v2 11/11] IB/ucm: Add partial " Haggai Eran
@ 2015-04-20 14:53   ` Steve Wise
  2015-04-21  6:36       ` Haggai Eran
  7 siblings, 1 reply; 39+ messages in thread
From: Steve Wise @ 2015-04-20 14:53 UTC (permalink / raw)
  To: Haggai Eran, Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth


Hey Haggai,

Did you check for changes needed in drivers/infiniband/core/iwcm.c? I 
notice that it uses init_net here:

static int __init iw_cm_init(void)
{
         iwcm_wq = create_singlethread_workqueue("iw_cm_wq");
         if (!iwcm_wq)
                 return -ENOMEM;

         iwcm_ctl_table_hdr = register_net_sysctl(&init_net, "net/iw_cm",
                                                  iwcm_ctl_table);
         if (!iwcm_ctl_table_hdr) {
                 pr_err("iw_cm: couldn't register sysctl paths\n");
                 destroy_workqueue(iwcm_wq);
                 return -ENOMEM;
         }

         return 0;
}


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 01/11] RDMA/CMA: Mark IPv4 addresses correctly when the listener is IPv6
       [not found]     ` <1429520622-10303-2-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-04-20 16:41       ` Jason Gunthorpe
       [not found]         ` <20150420164140.GC7676-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2015-04-20 16:41 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth,
	Or Gerlitz

On Mon, Apr 20, 2015 at 12:03:32PM +0300, Haggai Eran wrote:
> From: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> When accepting a new connection with the listener being IPv6, the
> family of the new connection is set as IPv6. This causes cma_zero_addr
> function to return true on a non-zero address. As a result, the wrong
> code path is taken. This causes the connection request to be rejected,
> as the RDMA-CM code looks for the wrong type of device.

This description doesn't really make sense as to what the problem is.

> @@ -866,12 +866,12 @@ static void cma_save_ip4_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_i
>  
>  	listen4 = (struct sockaddr_in *) &listen_id->route.addr.src_addr;
>  	ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
> -	ip4->sin_family = listen4->sin_family;
> +	ip4->sin_family = AF_INET;

If listen_id->route.addr.src_addr.ss_family != AF_INET then it is
invalid to cast to sockaddr_in.

So listen4->sin_family MUST be AF_INET or this function MUST NOT be
called.

Forcing to AF_INET cannot be correct here.
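
To illustrate the point about casting, here is a minimal userspace sketch
(not kernel code, and not part of the patch under review) that checks
ss_family before viewing a sockaddr_storage as a concrete sockaddr type.
The names storage_get_port() and make_ip6_listener() are made up for this
example; the only assumption is a POSIX sockets environment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from the patch): fetch the port, in network
 * byte order, from a sockaddr_storage. The family is checked before the
 * storage is viewed as sockaddr_in or sockaddr_in6.
 * Returns 0 on success, -1 for an unsupported family.
 */
static int storage_get_port(const struct sockaddr_storage *ss, in_port_t *port)
{
	assert(ss != NULL && port != NULL);

	switch (ss->ss_family) {
	case AF_INET:
		*port = ((const struct sockaddr_in *)ss)->sin_port;
		return 0;
	case AF_INET6:
		*port = ((const struct sockaddr_in6 *)ss)->sin6_port;
		return 0;
	default:
		return -1;
	}
}

/* Build an IPv6 wildcard listener address for a host-order port. */
static struct sockaddr_storage make_ip6_listener(uint16_t host_port)
{
	struct sockaddr_storage ss;
	struct sockaddr_in6 *ip6 = (struct sockaddr_in6 *)&ss;

	memset(&ss, 0, sizeof(ss));
	ip6->sin6_family = AF_INET6;
	ip6->sin6_port = htons(host_port);
	ip6->sin6_addr = in6addr_any;
	return ss;
}
```

With an AF_INET6 listener the helper takes the sockaddr_in6 branch, so the
caller never has to assume which concrete type sits behind the storage.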

What does this patch have to do with this series?

Jason

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 07/11] IB/cm: Add network namespace support
       [not found]   ` <1429520622-10303-8-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-04-20 17:06     ` Jason Gunthorpe
       [not found]       ` <20150420170659.GD7676-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2015-04-20 17:06 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth

On Mon, Apr 20, 2015 at 12:03:38PM +0300, Haggai Eran wrote:
> From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Add namespace support to the IB-CM layer.

> - Each CM-ID now has a network namespace it is associated with, assigned at
>   creation. This namespace is used as needed during subsequent action on the
>   CM-ID or related objects.

There is really something weird about this layering. At the CM layer
there should be no concept of an IP address, it only deals with GIDs.

So how can a CM object have a network namespace associated with it?

>  {
>  	av->port = port;
>  	av->pkey_index = wc->pkey_index;
>  	ib_init_ah_from_wc(port->cm_dev->ib_device, port->port_num, wc,
> -			   grh, &av->ah_attr, &init_net);
> +			   grh, &av->ah_attr, net);

There is something deeply wrong with adding network namespace
arguments to verbs.

For RoCE the GID index clearly specifies the network namespace
to use, so much of this should go away and have RoCE get the
namespace from the GID index.

I.e., in ib_init_ah_from_wc we have the ib_wc which contains the sgid
index.

I'm really not excited at how many places are gaining a net when those
layers shouldn't even need to care about IP layer details. Just acting
as a pass-through for RoCE doesn't make sense.

Jason

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 02/11] IB/addr: Pass network namespace as a parameter
       [not found]     ` <1429520622-10303-3-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-04-20 17:09       ` Jason Gunthorpe
       [not found]         ` <20150420170925.GE7676-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-04-20 22:05       ` Doug Ledford
  1 sibling, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2015-04-20 17:09 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth

On Mon, Apr 20, 2015 at 12:03:33PM +0300, Haggai Eran wrote:
> +/** rdma_addr_find_smac_by_sgid() - Find the src MAC and VLAN ID for a src GID
> + * @sgid:	Source GID to find the MAC and VLAN for.
> + * @smac:	A buffer to contain the resulting MAC address.
> + * @vlan_id:	Will contain the resulting VLAN ID.
> + * @net:	Network namespace to use for the address resolution.
> + *
> + * It is the caller's responsibility to keep the network namespace alive until
> + * the function returns.
> + */
> +int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id,
> +				struct net *net);

Kernel-doc comments are typically placed with the body of the function,
not at the prototype.
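
As a rough illustration of that convention, the sketch below keeps the
prototype bare and attaches the kernel-doc comment to the definition. It is
plain userspace C, and demo_count_set_bits() is a made-up function that has
nothing to do with the series; only the comment layout is the point.

```c
#include <assert.h>
#include <stddef.h>

/* What a header would carry: the bare prototype, no kernel-doc. */
int demo_count_set_bits(unsigned int value, unsigned int *count);

/**
 * demo_count_set_bits() - Count the bits set in a value.
 * @value:	Value to inspect.
 * @count:	Will contain the number of set bits.
 *
 * Return: 0 on success, -1 if @count is NULL.
 */
int demo_count_set_bits(unsigned int value, unsigned int *count)
{
	unsigned int n = 0;

	if (!count)
		return -1;

	while (value) {
		value &= value - 1;	/* clear the lowest set bit */
		n++;
	}
	*count = n;
	return 0;
}
```

Keeping the comment next to the body means it is in view whenever the
implementation is edited.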

Jason

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 01/11] RDMA/CMA: Mark IPv4 addresses correctly when the listener is IPv6
       [not found]         ` <20150420164140.GC7676-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-04-20 18:38           ` Or Gerlitz
  2015-04-20 20:01             ` Jason Gunthorpe
       [not found]             ` <CAJ3xEMgKFdr68Qt0vNCaf1p4YjPK2KUSn2FdtQVP0SZQ+Y7atg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 2 replies; 39+ messages in thread
From: Or Gerlitz @ 2015-04-20 18:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Haggai Eran, Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Linux Netdev List, Liran Liss,
	Guy Shapiro, Shachar Raindel, Yotam Kenneth, Or Gerlitz

On Mon, Apr 20, 2015 at 7:41 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Mon, Apr 20, 2015 at 12:03:32PM +0300, Haggai Eran wrote:
>> From: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> When accepting a new connection with the listener being IPv6, the
>> family of the new connection is set as IPv6. This causes cma_zero_addr
>> function to return true on a non-zero address. As a result, the wrong
>> code path is taken. This causes the connection request to be rejected,
>> as the RDMA-CM code looks for the wrong type of device.
>
> This description doesn't really make sense as to what the problem is.
>
>> @@ -866,12 +866,12 @@ static void cma_save_ip4_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_i
>>
>>       listen4 = (struct sockaddr_in *) &listen_id->route.addr.src_addr;
>>       ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
>> -     ip4->sin_family = listen4->sin_family;
>> +     ip4->sin_family = AF_INET;
>
> If listen_id->route.addr.src_addr.ss_family != AF_INET then it is
> invalid to cast to sockaddr_in.
>
> So listen4->sin_family MUST be AF_INET or this function MUST NOT be
> called.
>
> Forcing to AF_INET cannot be correct here.

Jason, could you take a look @ this thread
http://marc.info/?t=141589395000004&r=1&w=2 where the authors
addressed some comments from Sean and he eventually Acked the patch?

> What does this patch have to do with this series?

I believe it's either a preparatory patch addressing some assumption, or
something they stepped on while testing.

Or.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 01/11] RDMA/CMA: Mark IPv4 addresses correctly when the listener is IPv6
  2015-04-20 18:38           ` Or Gerlitz
@ 2015-04-20 20:01             ` Jason Gunthorpe
       [not found]               ` <20150420200111.GA32449-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
       [not found]             ` <CAJ3xEMgKFdr68Qt0vNCaf1p4YjPK2KUSn2FdtQVP0SZQ+Y7atg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2015-04-20 20:01 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Haggai Eran, Doug Ledford, Roland Dreier, Sean Hefty, linux-rdma,
	Linux Netdev List, Liran Liss, Guy Shapiro, Shachar Raindel,
	Yotam Kenneth, Or Gerlitz

On Mon, Apr 20, 2015 at 09:38:02PM +0300, Or Gerlitz wrote:
> On Mon, Apr 20, 2015 at 7:41 PM, Jason Gunthorpe
> <jgunthorpe@obsidianresearch.com> wrote:
> > On Mon, Apr 20, 2015 at 12:03:32PM +0300, Haggai Eran wrote:
> >> From: Yotam Kenneth <yotamke@mellanox.com>
> >>
> >> When accepting a new connection with the listener being IPv6, the
> >> family of the new connection is set as IPv6. This causes cma_zero_addr
> >> function to return true on a non-zero address. As a result, the wrong
> >> code path is taken. This causes the connection request to be rejected,
> >> as the RDMA-CM code looks for the wrong type of device.
> >
> > This description doesn't really make sense as to what the problem is.

> Jason, could you take a look @ this thread
> http://marc.info/?t=141589395000004&r=1&w=2 where the authors
> addressed some comments from Sean and he eventually Acked the patch?

Please actually read my comments:

 If listen_id->route.addr.src_addr.ss_family != AF_INET then it is
 invalid to cast to sockaddr_in.

Sean asked basically the same thing, and his question was ignored too.

This should take care of it; testing and figuring out the Fixes tag are
left as an exercise for the reader.

From 24cdf029349c9ffad0b2aab37058048ab422960f Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Date: Mon, 20 Apr 2015 13:48:52 -0600
Subject: [PATCH] RDMA/CMA: Canonize IPv4 on IPV6 sockets properly

When accepting a new IPv4 connection on an IPv6 socket, the CMA tries to
canonize the address family to IPv4, but does not properly process
the listening sockaddr to get the listening port, and does not properly
set the address family of the canonized sockaddr.

Cc: <stable@vger.kernel.org>
Reported-By: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/infiniband/core/cma.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d570030d899c..e8d492eceff3 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -859,19 +859,27 @@ static void cma_save_ib_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id
 	memcpy(&ib->sib_addr, &path->dgid, 16);
 }
 
+static unsigned int ss_get_port(const struct sockaddr_storage *ss)
+{
+	if (ss->ss_family == AF_INET)
+		return ((struct sockaddr_in *)ss)->sin_port;
+	else if (ss->ss_family == AF_INET6)
+		return ((struct sockaddr_in6 *)ss)->sin6_port;
+	BUG();
+}
+
 static void cma_save_ip4_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
 			      struct cma_hdr *hdr)
 {
-	struct sockaddr_in *listen4, *ip4;
+	struct sockaddr_in *ip4;
 
-	listen4 = (struct sockaddr_in *) &listen_id->route.addr.src_addr;
 	ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
-	ip4->sin_family = listen4->sin_family;
+	ip4->sin_family = AF_INET;
 	ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
-	ip4->sin_port = listen4->sin_port;
+	ip4->sin_port = ss_get_port(&listen_id->route.addr.src_addr);
 
 	ip4 = (struct sockaddr_in *) &id->route.addr.dst_addr;
-	ip4->sin_family = listen4->sin_family;
+	ip4->sin_family = AF_INET;
 	ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
 	ip4->sin_port = hdr->port;
 }
@@ -879,16 +887,15 @@ static void cma_save_ip4_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_i
 static void cma_save_ip6_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
 			      struct cma_hdr *hdr)
 {
-	struct sockaddr_in6 *listen6, *ip6;
+	struct sockaddr_in6 *ip6;
 
-	listen6 = (struct sockaddr_in6 *) &listen_id->route.addr.src_addr;
 	ip6 = (struct sockaddr_in6 *) &id->route.addr.src_addr;
-	ip6->sin6_family = listen6->sin6_family;
+	ip6->sin6_family = AF_INET6;
 	ip6->sin6_addr = hdr->dst_addr.ip6;
-	ip6->sin6_port = listen6->sin6_port;
+	ip6->sin6_port = ss_get_port(&listen_id->route.addr.src_addr);
 
 	ip6 = (struct sockaddr_in6 *) &id->route.addr.dst_addr;
-	ip6->sin6_family = listen6->sin6_family;
+	ip6->sin6_family = AF_INET6;
 	ip6->sin6_addr = hdr->src_addr.ip6;
 	ip6->sin6_port = hdr->port;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 02/11] IB/addr: Pass network namespace as a parameter
       [not found]     ` <1429520622-10303-3-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-04-20 17:09       ` Jason Gunthorpe
@ 2015-04-20 22:05       ` Doug Ledford
       [not found]         ` <1429567530.45956.31.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: Doug Ledford @ 2015-04-20 22:05 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Roland Dreier, Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On Mon, 2015-04-20 at 12:03 +0300, Haggai Eran wrote:
> From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Add network namespace support to the ib_addr module. For that, all the address
> resolution and matching should be done using the appropriate namespace instead
> of init_net.
> 
> This is achieved by:
> 
> 1. Adding an explicit network namespace argument to exported function that
>    require a namespace.
> 2. Saving the namespace in the rdma_addr_client structure.
> 3. Using it when calling networking functions.
> 
> In order to preserve the behavior of calling modules, &init_net is
> passed as the parameter in calls from other modules. This is modified as
> namespace support is added on more levels.
> 
> Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>  drivers/infiniband/core/addr.c           | 31 ++++++++++++----------
>  drivers/infiniband/core/cma.c            |  4 ++-
>  drivers/infiniband/core/verbs.c          | 14 +++++++---
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c |  3 ++-
>  include/rdma/ib_addr.h                   | 44 ++++++++++++++++++++++++++++----
>  5 files changed, 72 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index f80da50d84a5..95beaef6b66d 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -128,7 +128,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
>  	int ret = -EADDRNOTAVAIL;
>  
>  	if (dev_addr->bound_dev_if) {
> -		dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
> +		dev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
>  		if (!dev)
>  			return -ENODEV;
>  		ret = rdma_copy_addr(dev_addr, dev, NULL);
> @@ -137,9 +137,10 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
>  	}
>  
>  	switch (addr->sa_family) {
> -	case AF_INET:
> -		dev = ip_dev_find(&init_net,
> -			((struct sockaddr_in *) addr)->sin_addr.s_addr);
> +	case AF_INET: {
                       ^ Please don't add brackets just so you can
convert a cast into a variable declaration that's unnecessary

> +		struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
> +
> +		dev = ip_dev_find(dev_addr->net, addr_in->sin_addr.s_addr);
>  
>  		if (!dev)
>  			return ret;
> @@ -149,12 +150,12 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
>  			*vlan_id = rdma_vlan_dev_vlan_id(dev);
>  		dev_put(dev);
>  		break;
> -
> +	}
>  #if IS_ENABLED(CONFIG_IPV6)
>  	case AF_INET6:
>  		rcu_read_lock();
> -		for_each_netdev_rcu(&init_net, dev) {
> -			if (ipv6_chk_addr(&init_net,
> +		for_each_netdev_rcu(dev_addr->net, dev) {
> +			if (ipv6_chk_addr(dev_addr->net,
>  					  &((struct sockaddr_in6 *) addr)->sin6_addr,
>  					  dev, 1)) {
>  				ret = rdma_copy_addr(dev_addr, dev, NULL);
> @@ -236,7 +237,7 @@ static int addr4_resolve(struct sockaddr_in *src_in,
>  	fl4.daddr = dst_ip;
>  	fl4.saddr = src_ip;
>  	fl4.flowi4_oif = addr->bound_dev_if;
> -	rt = ip_route_output_key(&init_net, &fl4);
> +	rt = ip_route_output_key(addr->net, &fl4);
>  	if (IS_ERR(rt)) {
>  		ret = PTR_ERR(rt);
>  		goto out;
> @@ -278,12 +279,13 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
>  	fl6.saddr = src_in->sin6_addr;
>  	fl6.flowi6_oif = addr->bound_dev_if;
>  
> -	dst = ip6_route_output(&init_net, NULL, &fl6);
> +	dst = ip6_route_output(addr->net, NULL, &fl6);
>  	if ((ret = dst->error))
>  		goto put;
>  
>  	if (ipv6_addr_any(&fl6.saddr)) {
> -		ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
> +		ret = ipv6_dev_get_saddr(addr->net,
> +					 ip6_dst_idev(dst)->dev,
>  					 &fl6.daddr, 0, &fl6.saddr);
>  		if (ret)
>  			goto put;
> @@ -458,7 +460,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr,
>  }
>  
>  int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
> -			       u16 *vlan_id)
> +			       u16 *vlan_id, struct net *net)

In the core networking code, the net namespace is always first.  Please
stick with that paradigm.
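
For illustration only, a userspace sketch of the argument order being asked
for. The function name and parameter list come from the hunk above; the stub
types and the placeholder body are assumptions made so the fragment compiles
outside the kernel.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for the kernel types (assumptions, userspace only). */
struct net { int id; };
union ib_gid { uint8_t raw[16]; };

/*
 * Namespace first, mirroring core networking calls such as
 * dev_get_by_index(net, ifindex) and ip_route_output_key(net, &fl4).
 */
int rdma_addr_find_dmac_by_grh(struct net *net, union ib_gid *sgid,
			       union ib_gid *dgid, uint8_t *dmac,
			       uint16_t *vlan_id)
{
	/* Placeholder body (assumption): the real resolution is not shown. */
	(void)net;
	(void)sgid;
	(void)dgid;
	memset(dmac, 0, 6);
	*vlan_id = 0xffff;
	return 0;
}
```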


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/11] IB/ipoib: Return IPoIB devices as possible matches to get_net_device_by_port_pkey_ip
  2015-04-20  9:03 ` [PATCH v2 05/11] IB/ipoib: Return IPoIB devices as possible matches to get_net_device_by_port_pkey_ip Haggai Eran
@ 2015-04-20 23:09   ` ira.weiny
  0 siblings, 0 replies; 39+ messages in thread
From: ira.weiny @ 2015-04-20 23:09 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, Roland Dreier, Sean Hefty, linux-rdma, netdev,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth

On Mon, Apr 20, 2015 at 12:03:36PM +0300, Haggai Eran wrote:
> From: Guy Shapiro <guysh@mellanox.com>
> 
> Implement callback that returns network device to ib_core according to
> connection parameters. Check the ipoib device and iterate over all child
> devices to look for a match.
> 
> For each ipoib device we iterate through all upper devices when searching for
> a matching IP, in order to support bonding.
> 
> Signed-off-by: Guy Shapiro <guysh@mellanox.com>
> Signed-off-by: Haggai Eran <haggaie@mellanox.com>
> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 122 +++++++++++++++++++++++++++++-
>  1 file changed, 121 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 7cad4dd87469..89a59a0e17e6 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -48,6 +48,9 @@
>  
>  #include <linux/jhash.h>
>  #include <net/arp.h>
> +#include <net/addrconf.h>
> +#include <linux/inetdevice.h>
> +#include <rdma/ib_cache.h>
>  
>  #define DRV_VERSION "1.0.0"
>  
> @@ -91,11 +94,15 @@ struct ib_sa_client ipoib_sa_client;
>  static void ipoib_add_one(struct ib_device *device);
>  static void ipoib_remove_one(struct ib_device *device);
>  static void ipoib_neigh_reclaim(struct rcu_head *rp);
> +static struct net_device *ipoib_get_net_device_by_port_pkey_ip(
> +		struct ib_device *dev, u8 port, u16 pkey,
> +		struct sockaddr *addr);
>  
>  static struct ib_client ipoib_client = {
>  	.name   = "ipoib",
>  	.add    = ipoib_add_one,
> -	.remove = ipoib_remove_one
> +	.remove = ipoib_remove_one,
> +	.get_net_device_by_port_pkey_ip = ipoib_get_net_device_by_port_pkey_ip,
>  };
>  
>  int ipoib_open(struct net_device *dev)
> @@ -222,6 +229,119 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu)
>  	return 0;
>  }
>  
> +static bool ipoib_is_dev_match_addr(struct sockaddr *addr,
> +				    struct net_device *dev)
> +{
> +	struct net *net = dev_net(dev);
> +
> +	if (addr->sa_family == AF_INET) {
> +		struct in_device *in_dev = in_dev_get(dev);
> +		struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
> +		__be32 ret_addr;
> +
> +		if (!in_dev)
> +			return false;
> +
> +		ret_addr = inet_confirm_addr(net, in_dev, 0,
> +					     addr_in->sin_addr.s_addr,
> +					     RT_SCOPE_HOST);
> +		in_dev_put(in_dev);
> +		if (ret_addr)
> +			return true;
> +	}
> +#if IS_ENABLED(CONFIG_IPV6)
> +	else if (addr->sa_family == AF_INET6) {
> +		struct sockaddr_in6 *addr_in6 = (struct sockaddr_in6 *)addr;
> +
> +		if (ipv6_chk_addr(net, &addr_in6->sin6_addr, dev, 1))
> +			return true;
> +	}
> +#endif
> +	return false;
> +}
> +
> +/**
> + * Find a net_device matching the given address, which is an upper device of
> + * the given net_device.
> + * @addr: IP address to look for.
> + * @dev: base IPoIB net_device
> + *
> + * If found, returns the net_device with a reference held. Otherwise return
> + * NULL.
> + */
> +static struct net_device *ipoib_get_net_dev_match_addr(struct sockaddr *addr,
> +						       struct net_device *dev)
> +{
> +	struct net_device *upper,
> +			  *result = NULL;
> +	struct list_head *iter;
> +
> +	if (ipoib_is_dev_match_addr(addr, dev)) {
> +		dev_hold(dev);
> +		return dev;
> +	}
> +
> +	rcu_read_lock();
> +	netdev_for_each_all_upper_dev_rcu(dev, upper, iter) {
> +		if (ipoib_is_dev_match_addr(addr, upper)) {
> +			dev_hold(upper);
> +			result = upper;
> +			break;
> +		}
> +	}
> +	rcu_read_unlock();
> +	return result;
> +}
> +
> +static struct net_device *ipoib_get_net_device_by_port_pkey_ip(
> +		struct ib_device *dev, u8 port, u16 pkey, struct sockaddr *addr)
> +{
> +	struct ipoib_dev_priv *priv;
> +	struct list_head *dev_list;
> +	u16 pkey_index;
> +
> +	ib_find_cached_pkey(dev, port, pkey, &pkey_index);
> +	if (pkey_index == (u16)-1)
> +		return NULL;

Why not check the return value of ib_find_cached_pkey?
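
Roughly what I have in mind, as a userspace sketch. mock_find_cached_pkey()
is a hypothetical stand-in for ib_find_cached_pkey() that only assumes the
usual convention of 0 on success and a negative errno on failure; the exact
matching rules of the real helper are not modelled here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for ib_find_cached_pkey(): returns 0 and sets
 * *index on success, or a negative errno and leaves *index untouched.
 */
static int mock_find_cached_pkey(const uint16_t *table, size_t len,
				 uint16_t pkey, uint16_t *index)
{
	for (size_t i = 0; i < len; i++) {
		if (table[i] == pkey) {
			*index = (uint16_t)i;
			return 0;
		}
	}
	return -ENOENT;
}

/* Returns the matching index, or -1 if the lookup failed. */
int lookup_pkey_index(const uint16_t *table, size_t len, uint16_t pkey)
{
	uint16_t pkey_index;
	int ret;

	ret = mock_find_cached_pkey(table, len, pkey, &pkey_index);
	if (ret)
		return -1;	/* pkey_index was never written, do not read it */

	return pkey_index;
}
```

Testing the return code avoids reading pkey_index when the lookup failed and
never stored anything in it.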

> +
> +	if (rdma_node_get_transport(dev->node_type) != RDMA_TRANSPORT_IB)
> +		return NULL;

The use of Link Layer and Transport in this series will need to be reevaluated
based on Michael's work:

https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg24140.html

Ira

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 06/11] IB/cm, cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm
       [not found]     ` <1429520622-10303-7-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-04-20 23:29       ` ira.weiny
  0 siblings, 0 replies; 39+ messages in thread
From: ira.weiny @ 2015-04-20 23:29 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth

On Mon, Apr 20, 2015 at 12:03:37PM +0300, Haggai Eran wrote:
> From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> When receiving a connection request, ib_cm needs to associate the request with
> a network namespace. To do this, it needs to know the request's destination
> IP. For this the RDMA IP CM packet formatting functionality needs to be
> exposed to ib_cm.
> 

[snip]

> +
> +int cm_save_net_info(struct sockaddr *src_addr,
> +		     struct sockaddr *dst_addr,
> +		     struct ib_cm_event *ib_event)
> +{
> +	struct cm_work *work = container_of(ib_event, struct cm_work, cm_event);
> +
> +	if ((rdma_port_get_link_layer(work->port->cm_dev->ib_device,
> +				      work->port->port_num) ==
> +	     IB_LINK_LAYER_INFINIBAND) &&
> +	    (ib_event->event == IB_CM_REQ_RECEIVED)) {

The original code in the RDMA CM had a check for AF_IB.  Isn't that needed here
as well?

Ira

> +		cm_save_ib_info(src_addr, dst_addr,
> +				ib_event->param.req_rcvd.primary_path);
> +		return 0;
> +	}
> +
> +	return cm_save_ip_info(src_addr, dst_addr, work);
> +}
> +EXPORT_SYMBOL(cm_save_net_info);
> +
>  struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
>  				 ib_cm_handler cm_handler,
>  				 void *context)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 07/11] IB/cm: Add network namespace support
       [not found]       ` <20150420170659.GD7676-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-04-20 23:35         ` ira.weiny
       [not found]           ` <55363D93.10706@mellanox.com>
  2015-04-21 11:59           ` Haggai Eran
  1 sibling, 1 reply; 39+ messages in thread
From: ira.weiny @ 2015-04-20 23:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Haggai Eran, Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth

On Mon, Apr 20, 2015 at 11:06:59AM -0600, Jason Gunthorpe wrote:
> On Mon, Apr 20, 2015 at 12:03:38PM +0300, Haggai Eran wrote:
> > From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > 
> > Add namespace support to the IB-CM layer.
> 
> > - Each CM-ID now has a network namespace it is associated with, assigned at
> >   creation. This namespace is used as needed during subsequent action on the
> >   CM-ID or related objects.
> 
> There is really something weird about this layering. At the CM layer
> there should be no concept of an IP address, it only deals with GIDs.
> 
> So how can a CM object have a network namespace associated with it?
> 
> >  {
> >  	av->port = port;
> >  	av->pkey_index = wc->pkey_index;
> >  	ib_init_ah_from_wc(port->cm_dev->ib_device, port->port_num, wc,
> > -			   grh, &av->ah_attr, &init_net);
> > +			   grh, &av->ah_attr, net);
> 
> There is something deeply wrong with adding network namespace
> arguments to verbs.
> 
> For rocee the gid index clearly specifies the network namespace
> to use, so much of this should go away and have rocee get the
> namespace from the gid index.
> 
> Ie in ib_init_ah_from_wc we have the ib_wc which contains the sgid
> index.
> 
> I'm really not excited at how many places are gaining a net when those
> layers shouldn't even need to care about IP layer details. Just acting
> as a pass through for rocee doesn't make sense.
> 

I had the same feeling when I saw the addition of the network namespace to the
MAD code, especially the RMPP code.

It seems like there should be a better way to deal with this.  My gut says that
the namespace should be handled separately from ib_init_ah_from_wc, perhaps as
a secondary call used only when the namespace is needed.  But I'm not sure
when that is appropriate or needed.

Ira

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 11/11] IB/ucm: Add partial support for network namespaces
  2015-04-20  9:03   ` [PATCH v2 11/11] IB/ucm: Add partial " Haggai Eran
@ 2015-04-20 23:46     ` ira.weiny
  0 siblings, 0 replies; 39+ messages in thread
From: ira.weiny @ 2015-04-20 23:46 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, Roland Dreier, Sean Hefty, linux-rdma, netdev,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth

On Mon, Apr 20, 2015 at 12:03:42PM +0300, Haggai Eran wrote:
> From: Shachar Raindel <raindel@mellanox.com>
> 
> It is impossible to completely support network namespaces for UCM, as
> we cannot identify the target IPoIB device.
>

As Jason said, it seems like the use of namespaces should be limited to the
RDMA CM layer.  If so, I _think_ this patch would not be needed?

Ira


>
> However, we add support
> which will work if the user is following the IB-Spec Annex 11 (RDMA IP
> CM Services) with the service ID and private data formatting.
> 
> Signed-off-by: Haggai Eran <haggaie@mellanox.com>
> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> Signed-off-by: Guy Shapiro <guysh@mellanox.com>
> ---
>  drivers/infiniband/core/ucm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
> index 9604ab068984..424421091dae 100644
> --- a/drivers/infiniband/core/ucm.c
> +++ b/drivers/infiniband/core/ucm.c
> @@ -45,6 +45,7 @@
>  #include <linux/idr.h>
>  #include <linux/mutex.h>
>  #include <linux/slab.h>
> +#include <linux/nsproxy.h>
>  
>  #include <asm/uaccess.h>
>  
> @@ -490,7 +491,7 @@ static ssize_t ib_ucm_create_id(struct ib_ucm_file *file,
>  	ctx->uid = cmd.uid;
>  	ctx->cm_id = ib_create_cm_id(file->device->ib_dev,
>  				     ib_ucm_event_handler, ctx,
> -				     &init_net);
> +				     current->nsproxy->net_ns);
>  	if (IS_ERR(ctx->cm_id)) {
>  		result = PTR_ERR(ctx->cm_id);
>  		goto err1;
> -- 
> 1.7.11.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v2 01/11] RDMA/CMA: Mark IPv4 addresses correctly when the listener is IPv6
       [not found]             ` <CAJ3xEMgKFdr68Qt0vNCaf1p4YjPK2KUSn2FdtQVP0SZQ+Y7atg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-04-21  5:18               ` Shachar Raindel
  0 siblings, 0 replies; 39+ messages in thread
From: Shachar Raindel @ 2015-04-21  5:18 UTC (permalink / raw)
  To: Or Gerlitz, Jason Gunthorpe
  Cc: Haggai Eran, Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Linux Netdev List, Liran Liss,
	Guy Shapiro, Yotam Kenneth, Or Gerlitz



> -----Original Message-----
> From: Or Gerlitz [mailto:gerlitz.or@gmail.com]
> Sent: Monday, April 20, 2015 9:38 PM
> 
> On Mon, Apr 20, 2015 at 7:41 PM, Jason Gunthorpe
> <jgunthorpe@obsidianresearch.com> wrote:
> > On Mon, Apr 20, 2015 at 12:03:32PM +0300, Haggai Eran wrote:
> >> From: Yotam Kenneth <yotamke@mellanox.com>
> >>
> >> When accepting a new connection with the listener being IPv6, the
> >> family of the new connection is set as IPv6. This causes
> cma_zero_addr
> >> function to return true on an non-zero address. As a result, the
> wrong
> >> code path is taken. This causes the connection request to be
> rejected,
> >> as the RDMA-CM code looks for the wrong type of device.
> >
> > This description doesn't really make sense as to what the problem is.
> >
> >> @@ -866,12 +866,12 @@ static void cma_save_ip4_info(struct rdma_cm_id
> *id, struct rdma_cm_id *listen_i
> >>
> >>       listen4 = (struct sockaddr_in *) &listen_id-
> >route.addr.src_addr;
> >>       ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
> >> -     ip4->sin_family = listen4->sin_family;
> >> +     ip4->sin_family = AF_INET;
> >
> > If listen_id->route.addr.src_addr.ss_family != AF_INET then it is
> > invalid to cast to sockaddr_in.
> >
> > So listen4->sin_family MUST be AF_INET or this function MUST NOT be
> > called.
> >
> > Forcing to AF_INET cannot be correct here.
> 
> Jason, could you take a look @ this thread
> http://marc.info/?t=141589395000004&r=1&w=2 where the authors
> addressed some comments from Sean and he eventually Acked the patch?
> 
> > What does this patch have to do with this series?
> 
> I believe it's either a pre-patch to address some assumption or
> something they stepped on while testing
> 

We ran into this issue while testing the container support we are
submitting here. When a new network namespace is created, the kernel sets
net->ipv6.sysctl.bindv6only to 0. As a result, the IPv6 listening ID
accepted IPv4 connections. This is fixed by the above patch.
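
As a plain-sockets analogy only (this is not the RDMA-CM code path): with
bindv6only off, an AF_INET6 listener sees IPv4 peers as IPv4-mapped IPv6
addresses, so the family of the peer has to be derived from the address
rather than copied from the listener. The helper names below are
hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Effective family of a peer address seen on an AF_INET6 socket. */
int effective_family(const struct in6_addr *peer)
{
	return IN6_IS_ADDR_V4MAPPED(peer) ? AF_INET : AF_INET6;
}

/* Same check on a textual address; returns -1 if it does not parse. */
int effective_family_str(const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return -1;

	return effective_family(&addr);
}
```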

Thanks,
--Shachar


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 00/11] Add network namespace support in the RDMA-CM
  2015-04-20 14:53   ` [PATCH v2 00/11] Add network namespace support in the RDMA-CM Steve Wise
@ 2015-04-21  6:36       ` Haggai Eran
  0 siblings, 0 replies; 39+ messages in thread
From: Haggai Eran @ 2015-04-21  6:36 UTC (permalink / raw)
  To: Steve Wise, Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 20/04/2015 17:53, Steve Wise wrote:
> 
> Hey Haggai,
> 
> Did you check for changes needed in drivers/infiniband/core/iwcm.c? 

We focused on namespace support for InfiniBand alone in this series. We
didn't handle iWARP, nor did we implement support for RoCE or other
transports.

> I notice that it uses init_net here:
> 
> static int __init iw_cm_init(void)
> {
>         iwcm_wq = create_singlethread_workqueue("iw_cm_wq");
>         if (!iwcm_wq)
>                 return -ENOMEM;
> 
>         iwcm_ctl_table_hdr = register_net_sysctl(&init_net, "net/iw_cm",
>                                                  iwcm_ctl_table);
>         if (!iwcm_ctl_table_hdr) {
>                 pr_err("iw_cm: couldn't register sysctl paths\n");
>                 destroy_workqueue(iwcm_wq);
>                 return -ENOMEM;
>         }
> 
>         return 0;
> }
> 

I see the only thing in the iWARP sysctl registered here is the default
backlog. If you want to control this parameter per namespace, we could
store it per network namespace, and add a namespace parameter to
iw_cm_listen. I'm not sure how important this is though.
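
A rough userspace model of that idea, not a proposed patch. struct net_model
and its field are hypothetical, and the sketch assumes the default is only
used as a fallback when the caller passes a backlog of 0.

```c
/* Hypothetical per-namespace state; a stand-in for data hung off struct net. */
struct net_model {
	int iwcm_default_backlog;
};

/*
 * Pick the backlog for a listen request made in a given namespace.
 * A caller-supplied value wins; 0 means "use this namespace's default".
 */
int effective_backlog(const struct net_model *net, int backlog)
{
	return backlog ? backlog : net->iwcm_default_backlog;
}
```

With this shape, two containers can carry different defaults without
affecting each other, at the cost of threading a namespace argument down to
the listen call.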

Haggai

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 01/11] RDMA/CMA: Mark IPv4 addresses correctly when the listener is IPv6
       [not found]               ` <20150420200111.GA32449-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-04-21 10:15                 ` Haggai Eran
       [not found]                   ` <5536232F.3050707-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Haggai Eran @ 2015-04-21 10:15 UTC (permalink / raw)
  To: Jason Gunthorpe, Or Gerlitz
  Cc: Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Linux Netdev List, Liran Liss,
	Guy Shapiro, Shachar Raindel, Yotam Kenneth, Or Gerlitz

On 20/04/2015 23:01, Jason Gunthorpe wrote:
> On Mon, Apr 20, 2015 at 09:38:02PM +0300, Or Gerlitz wrote:
>> On Mon, Apr 20, 2015 at 7:41 PM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>>> On Mon, Apr 20, 2015 at 12:03:32PM +0300, Haggai Eran wrote:
>>>> From: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>
>>>> When accepting a new connection with the listener being IPv6, the
>>>> family of the new connection is set as IPv6. This causes cma_zero_addr
>>>> function to return true on an non-zero address. As a result, the wrong
>>>> code path is taken. This causes the connection request to be rejected,
>>>> as the RDMA-CM code looks for the wrong type of device.
>>>
>>> This description doesn't really make sense as to what the problem is.
> 
>> Jason, could you take a look @ this thread
>> http://marc.info/?t=141589395000004&r=1&w=2 where the authors
>> addressed some comments from Sean and he eventually Acked the patch?
> 
> Please actually read my comments:
> 
>  If listen_id->route.addr.src_addr.ss_family != AF_INET then it is
>  invalid to cast to sockaddr_in.

That's correct. We didn't address it because it was part of the existing
code. Anyway, in a later patch in this series we move this code from the
CMA to the CM module. Then we get the port number from the service ID
instead of from the listener ID, since the listener ID's port isn't
available.

> 
> Sean asked basically the same thing, and his question was ignored too.
> 
> This should take care of it, testing, and figuring the fixes tag is
> left as an exercise to the reader..
> 

Fixes: e51060f08a61 ("IB: IP address based RDMA connection manager")
Tested-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Haggai

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 02/11] IB/addr: Pass network namespace as a parameter
       [not found]         ` <20150420170925.GE7676-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-04-21 10:29             ` Haggai Eran
  0 siblings, 0 replies; 39+ messages in thread
From: Haggai Eran @ 2015-04-21 10:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth

On 20/04/2015 20:09, Jason Gunthorpe wrote:
> On Mon, Apr 20, 2015 at 12:03:33PM +0300, Haggai Eran wrote:
>> +/** rdma_addr_find_smac_by_sgid() - Find the src MAC and VLAN ID for a src GID
>> + * @sgid:	Source GID to find the MAC and VLAN for.
>> + * @smac:	A buffer to contain the resulting MAC address.
>> + * @vlan_id:	Will contain the resulting VLAN ID.
>> + * @net:	Network namespace to use for the address resolution.
>> + *
>> + * It is the caller's responsibility to keep the network namespace alive until
>> + * the function returns.
>> + */
>> +int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id,
>> +				struct net *net);
> 
> kdocs are typically placed with the body of the function, not at the
> prototype.

I'll move it in the next revision. We did that because other functions
(rdma_translate_ip, rdma_resolve_ip) are documented inside ib_addr.h
this way.
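
Something along these lines, with the comment sitting on the definition in
the .c file and the header carrying only the prototype. The comment text and
signature are taken from the quoted hunk; the stub types and the placeholder
body are assumptions so that the fragment compiles in userspace.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for the kernel types (assumptions, userspace only). */
struct net { int id; };
union ib_gid { uint8_t raw[16]; };

/**
 * rdma_addr_find_smac_by_sgid() - Find the src MAC and VLAN ID for a src GID
 * @sgid:	Source GID to find the MAC and VLAN for.
 * @smac:	A buffer to contain the resulting MAC address.
 * @vlan_id:	Will contain the resulting VLAN ID.
 * @net:	Network namespace to use for the address resolution.
 *
 * It is the caller's responsibility to keep the network namespace alive until
 * the function returns.
 */
int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, uint8_t *smac,
				uint16_t *vlan_id, struct net *net)
{
	/* Placeholder body (assumption): the real lookup is not shown. */
	(void)sgid;
	(void)net;
	memset(smac, 0, 6);
	*vlan_id = 0xffff;
	return 0;
}
```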

Haggai

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 02/11] IB/addr: Pass network namespace as a parameter
       [not found]         ` <1429567530.45956.31.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-04-21 10:34             ` Haggai Eran
  0 siblings, 0 replies; 39+ messages in thread
From: Haggai Eran @ 2015-04-21 10:34 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Roland Dreier, Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 21/04/2015 01:05, Doug Ledford wrote:
> On Mon, 2015-04-20 at 12:03 +0300, Haggai Eran wrote:
>> From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> Add network namespace support to the ib_addr module. For that, all the address
>> resolution and matching should be done using the appropriate namespace instead
>> of init_net.
>>
>> This is achieved by:
>>
>> 1. Adding an explicit network namespace argument to exported function that
>>    require a namespace.
>> 2. Saving the namespace in the rdma_addr_client structure.
>> 3. Using it when calling networking functions.
>>
>> In order to preserve the behavior of calling modules, &init_net is
>> passed as the parameter in calls from other modules. This is modified as
>> namespace support is added on more levels.
>>
>> Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> ---
>>  drivers/infiniband/core/addr.c           | 31 ++++++++++++----------
>>  drivers/infiniband/core/cma.c            |  4 ++-
>>  drivers/infiniband/core/verbs.c          | 14 +++++++---
>>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c |  3 ++-
>>  include/rdma/ib_addr.h                   | 44 ++++++++++++++++++++++++++++----
>>  5 files changed, 72 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
>> index f80da50d84a5..95beaef6b66d 100644
>> --- a/drivers/infiniband/core/addr.c
>> +++ b/drivers/infiniband/core/addr.c
>> @@ -128,7 +128,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
>>  	int ret = -EADDRNOTAVAIL;
>>  
>>  	if (dev_addr->bound_dev_if) {
>> -		dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
>> +		dev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
>>  		if (!dev)
>>  			return -ENODEV;
>>  		ret = rdma_copy_addr(dev_addr, dev, NULL);
>> @@ -137,9 +137,10 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
>>  	}
>>  
>>  	switch (addr->sa_family) {
>> -	case AF_INET:
>> -		dev = ip_dev_find(&init_net,
>> -			((struct sockaddr_in *) addr)->sin_addr.s_addr);
>> +	case AF_INET: {
>                        ^ Please don't add brackets just so you can
> convert a cast into a variable declaration that's unnecessary
> 
>> +		struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
>> +
>> +		dev = ip_dev_find(dev_addr->net, addr_in->sin_addr.s_addr);
>>  
>>  		if (!dev)
>>  			return ret;
>> @@ -149,12 +150,12 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
>>  			*vlan_id = rdma_vlan_dev_vlan_id(dev);
>>  		dev_put(dev);
>>  		break;
>> -
>> +	}
>>  #if IS_ENABLED(CONFIG_IPV6)
>>  	case AF_INET6:
>>  		rcu_read_lock();
>> -		for_each_netdev_rcu(&init_net, dev) {
>> -			if (ipv6_chk_addr(&init_net,
>> +		for_each_netdev_rcu(dev_addr->net, dev) {
>> +			if (ipv6_chk_addr(dev_addr->net,
>>  					  &((struct sockaddr_in6 *) addr)->sin6_addr,
>>  					  dev, 1)) {
>>  				ret = rdma_copy_addr(dev_addr, dev, NULL);
>> @@ -236,7 +237,7 @@ static int addr4_resolve(struct sockaddr_in *src_in,
>>  	fl4.daddr = dst_ip;
>>  	fl4.saddr = src_ip;
>>  	fl4.flowi4_oif = addr->bound_dev_if;
>> -	rt = ip_route_output_key(&init_net, &fl4);
>> +	rt = ip_route_output_key(addr->net, &fl4);
>>  	if (IS_ERR(rt)) {
>>  		ret = PTR_ERR(rt);
>>  		goto out;
>> @@ -278,12 +279,13 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
>>  	fl6.saddr = src_in->sin6_addr;
>>  	fl6.flowi6_oif = addr->bound_dev_if;
>>  
>> -	dst = ip6_route_output(&init_net, NULL, &fl6);
>> +	dst = ip6_route_output(addr->net, NULL, &fl6);
>>  	if ((ret = dst->error))
>>  		goto put;
>>  
>>  	if (ipv6_addr_any(&fl6.saddr)) {
>> -		ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
>> +		ret = ipv6_dev_get_saddr(addr->net,
>> +					 ip6_dst_idev(dst)->dev,
>>  					 &fl6.daddr, 0, &fl6.saddr);
>>  		if (ret)
>>  			goto put;
>> @@ -458,7 +460,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr,
>>  }
>>  
>>  int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
>> -			       u16 *vlan_id)
>> +			       u16 *vlan_id, struct net *net)
> 
> In the core networking code, the net namespace is always first.  Please
> stick with that paradigm.
> 

I'll fix these comments in the next revision.

Thanks,
Haggai

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 07/11] IB/cm: Add network namespace support
       [not found]       ` <20150420170659.GD7676-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-04-21 11:59           ` Haggai Eran
  2015-04-21 11:59           ` Haggai Eran
  1 sibling, 0 replies; 39+ messages in thread
From: Haggai Eran @ 2015-04-21 11:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth

On 20/04/2015 20:06, Jason Gunthorpe wrote:
> On Mon, Apr 20, 2015 at 12:03:38PM +0300, Haggai Eran wrote:
>> From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> Add namespace support to the IB-CM layer.
> 
>> - Each CM-ID now has a network namespace it is associated with, assigned at
>>   creation. This namespace is used as needed during subsequent action on the
>>   CM-ID or related objects.
> 
> There is really something weird about this layering. At the CM layer
> there should be no concept of an IP address, it only deals with GIDs.

Using the GID alone is not enough to distinguish between namespaces,
because you can have multiple IPoIB interfaces, all using the same GID (and
possibly the same P_Key), and each belonging to a different namespace.

> So how can a CM object have a network namespace associated with it?

The listener rbtree's key is currently the service ID, for instance. Now
with namespaces, you can have multiple listeners listening on the same
service ID, so we need to use (service ID, namespace) as the key.
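
A minimal sketch of such a composite key comparison, assuming a stand-in `struct net` and a hypothetical `listen_key` structure (neither is taken from the patch). Listeners are ordered first by service ID and then by namespace pointer, so two listeners on the same service ID in different namespaces compare as distinct tree nodes:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's struct net, illustrative only. */
struct net { int id; };

/* Hypothetical lookup key: (service ID, namespace). */
struct listen_key {
	uint64_t service_id;
	const struct net *net;
};

/* Returns <0, 0 or >0, suitable for walking an ordered tree. */
static int listen_key_cmp(const struct listen_key *a,
			  const struct listen_key *b)
{
	if (a->service_id != b->service_id)
		return a->service_id < b->service_id ? -1 : 1;
	if (a->net != b->net)
		return (uintptr_t)a->net < (uintptr_t)b->net ? -1 : 1;
	return 0;
}
```

Comparing the namespace by pointer value is only a tie-breaker here; any stable per-namespace identifier would serve the same purpose.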

> 
>>  {
>>  	av->port = port;
>>  	av->pkey_index = wc->pkey_index;
>>  	ib_init_ah_from_wc(port->cm_dev->ib_device, port->port_num, wc,
>> -			   grh, &av->ah_attr, &init_net);
>> +			   grh, &av->ah_attr, net);
> 
> There is something deeply wrong with adding network namespace
> arguments to verbs.
> 
> For rocee the gid index clearly specifies the network namespace
> to use, so much of this should go away and have rocee get the
> namespace from the gid index.
> 
> Ie in ib_init_ah_from_wc we have the ib_wc which contains the sgid
> index.

I don't see it there. The code seems to fetch the GID from the GRH.
Because the IP address in the source GID can be the same for different
namespaces, this is not enough to pick the right namespace.

Regards,
Haggai

* Re: [PATCH v2 00/11] Add network namespace support in the RDMA-CM
       [not found]       ` <5535EFE9.3000106-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-04-21 14:11         ` Steve Wise
       [not found]           ` <55365AAD.6020100-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Steve Wise @ 2015-04-21 14:11 UTC (permalink / raw)
  To: Haggai Eran, Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 4/21/2015 1:36 AM, Haggai Eran wrote:
> On 20/04/2015 17:53, Steve Wise wrote:
>> Hey Haggai,
>>
>> Did you check for changes needed in drivers/infiniband/core/iwcm.c?
> We focused on namespace support for InfiniBand alone in this series. We
> didn't handle iWARP, nor did we implement support for RoCE or other
> transports.
>
>> I notice that it uses init_net here:
>>
>> static int __init iw_cm_init(void)
>> {
>>          iwcm_wq = create_singlethread_workqueue("iw_cm_wq");
>>          if (!iwcm_wq)
>>                  return -ENOMEM;
>>
>>          iwcm_ctl_table_hdr = register_net_sysctl(&init_net, "net/iw_cm",
>>                                                   iwcm_ctl_table);
>>          if (!iwcm_ctl_table_hdr) {
>>                  pr_err("iw_cm: couldn't register sysctl paths\n");
>>                  destroy_workqueue(iwcm_wq);
>>                  return -ENOMEM;
>>          }
>>
>>          return 0;
>> }
>>
> I see the only thing in the iWARP sysctl registered here is the default
> backlog. If you want to control this parameter per namespace, we could
> store it per network namespace, and add a namespace parameter to
> iw_cm_listen. I'm not sure how important this is though.

I don't think it needs to be per namespace, as long as it still applies
across all namespaces.

Steve.

* Re: [PATCH v2 00/11] Add network namespace support in the RDMA-CM
       [not found]           ` <55365AAD.6020100-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2015-04-21 14:21               ` Haggai Eran
  0 siblings, 0 replies; 39+ messages in thread
From: Haggai Eran @ 2015-04-21 14:21 UTC (permalink / raw)
  To: Steve Wise, Doug Ledford, Roland Dreier
  Cc: Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 21/04/2015 17:11, Steve Wise wrote:
> On 4/21/2015 1:36 AM, Haggai Eran wrote:
>> On 20/04/2015 17:53, Steve Wise wrote:
>>> Hey Haggai,
>>>
>>> Did you check for changes needed in drivers/infiniband/core/iwcm.c?
>> We focused on namespace support for InfiniBand alone in this series. We
>> didn't handle iWARP, nor did we implement support for RoCE or other
>> transports.
>>
>>> I notice that it uses init_net here:
>>>
>>> static int __init iw_cm_init(void)
>>> {
>>>          iwcm_wq = create_singlethread_workqueue("iw_cm_wq");
>>>          if (!iwcm_wq)
>>>                  return -ENOMEM;
>>>
>>>          iwcm_ctl_table_hdr = register_net_sysctl(&init_net,
>>> "net/iw_cm",
>>>                                                   iwcm_ctl_table);
>>>          if (!iwcm_ctl_table_hdr) {
>>>                  pr_err("iw_cm: couldn't register sysctl paths\n");
>>>                  destroy_workqueue(iwcm_wq);
>>>                  return -ENOMEM;
>>>          }
>>>
>>>          return 0;
>>> }
>>>
>> I see the only thing in the iWARP sysctl registered here is the default
>> backlog. If you want to control this parameter per namespace, we could
>> store it per network namespace, and add a namespace parameter to
>> iw_cm_listen. I'm not sure how important this is though.
> 
> I don't think it needs to be per namespace, as long as it still applies
> across all namespaces.

It will, but it will currently only be visible and controllable through
init's namespace.



* Re: [PATCH v2 07/11] IB/cm: Add network namespace support
       [not found]             ` <55363D93.10706-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-04-21 15:54               ` Jason Gunthorpe
  0 siblings, 0 replies; 39+ messages in thread
From: Jason Gunthorpe @ 2015-04-21 15:54 UTC (permalink / raw)
  To: Haggai Eran
  Cc: ira.weiny, Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth

On Tue, Apr 21, 2015 at 03:07:47PM +0300, Haggai Eran wrote:

> Namespace is needed for RoCE address resolution, in cases where the
> driver doesn't report the MAC as part of the ib_wc.

This patch explicitly says it doesn't deal with RoCE, so why are we
adding namespaces to support RoCE paths in this series? Especially
since we have no idea how that should fit into verbs.

Frankly, that stuff is the most objectionable part of this series.

I suggest:
 1) Focus only on the RDMA-CM, and only on IB support, as the title
    says
 2) Drop all changes to verbs and cm and otherwise that are not
    directly related to IB
 3) Very, very, strongly justify why the remaining layer violations are
    necessary, and think very carefully about doing something else.

For IB, it is very clear to me that only the RDMA-CM can possibly have
the knowledge to find the namespace, so only the RDMA-CM should be
touching it.

If the interface between the RDMA-CM and IB-CM layers is preventing
something, then extend the interface, don't drop RDMA-CM code into
IB-CM.

From that point, with working IB, we can revisit what is needed to
make iWarp and RoCE work at the verbs layer and ultimately at the CM
layer, in steps.

Your other questions:

> Using the GID alone is not enough to distinguish between namespaces,
> because you can have multiple IPoIB interfaces, all using the same GID (and
> possibly the same P_Key), and each belonging to a different namespace.

Exactly, this is why IB GID layers can't possibly need to touch the net
namespace.

> The listener rbtree's key is currently the service ID, for
> instance. Now with namespaces, you can have multiple listeners
> listening on the same service ID, so we need to use (service ID,
> namespace) as the key.

CM doesn't care, a service ID is registered by RDMA-CM and RDMA-CM can
demux the (service ID,IP) tuple to the right namespace. Having CM
snoop private data is a huge layering violation!
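
A rough sketch of the demux described above, kept entirely on the RDMA-CM side. Everything in it is hypothetical: the `cma_binding` table, the `cma_demux_net` helper and the stand-in `struct net` are assumptions for illustration and do not correspond to existing kernel structures. IPv4 addresses are plain integers to keep the fragment short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct net, illustrative only. */
struct net { int id; };

/* Hypothetical RDMA-CM binding: a listener's service ID and local IP. */
struct cma_binding {
	uint64_t service_id;
	uint32_t ipv4;
	struct net *net;
};

/*
 * Map an incoming request's (service ID, destination IP), the latter
 * parsed from the private data by the RDMA-CM, to a namespace.
 * Returns NULL when no listener matches.
 */
static struct net *cma_demux_net(const struct cma_binding *tbl, size_t n,
				 uint64_t service_id, uint32_t dst_ipv4)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].service_id == service_id &&
		    tbl[i].ipv4 == dst_ipv4)
			return tbl[i].net;
	return NULL;
}
```

This sketch returns the first match, so it does not cover the case raised in the cover letter where the same IP address exists in more than one namespace; handling that would presumably need more in the key, such as the port and P_Key.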

> looks at its private data and demuxes it to a net namespace.
> I don't see it there. The code seems to fetch the GID from the GRH.
> Because the IP address in the source GID can be the same for different
> namespaces, this is not enough to pick the right namespace.

For IB, ib_init_ah_from_wc does not need a namespace.

For RoCEE, the GID *MUST* be enough to find the namespace because each
namespace will create a unique GID table entry.

RoCEE and IB are going to be totally different in how this is
implemented...

I expect RoCEE to have namespace constraints at the verbs QP level,
while IB cannot - that feels like a huge journey...

Jason

* Re: [PATCH v2 01/11] RDMA/CMA: Mark IPv4 addresses correctly when the listener is IPv6
       [not found]                   ` <5536232F.3050707-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-04-22 11:26                     ` Haggai Eran
  2015-04-22 17:29                     ` Jason Gunthorpe
  1 sibling, 0 replies; 39+ messages in thread
From: Haggai Eran @ 2015-04-22 11:26 UTC (permalink / raw)
  To: Jason Gunthorpe, Or Gerlitz
  Cc: Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Linux Netdev List, Liran Liss,
	Guy Shapiro, Shachar Raindel, Yotam Kenneth, Or Gerlitz

On Tuesday, April 21, 2015 1:15 PM, Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On 20/04/2015 23:01, Jason Gunthorpe wrote:
>> This should take care of it, testing, and figuring the fixes tag is
>> left as an exercise to the reader..
> 
> Fixes: e51060f08a61 ("IB: IP address based RDMA connection manager")
> Tested-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 

Roland, Doug,

Could you pick Jason's patch without the rest of the series?

It seems the namespace series will need more work, but I don't think this patch should be delayed as well.

Thanks,
Haggai

* Re: [PATCH v2 01/11] RDMA/CMA: Mark IPv4 addresses correctly when the listener is IPv6
       [not found]                   ` <5536232F.3050707-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-04-22 11:26                     ` Haggai Eran
@ 2015-04-22 17:29                     ` Jason Gunthorpe
  1 sibling, 0 replies; 39+ messages in thread
From: Jason Gunthorpe @ 2015-04-22 17:29 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Or Gerlitz, Doug Ledford, Roland Dreier, Sean Hefty,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Linux Netdev List, Liran Liss,
	Guy Shapiro, Shachar Raindel, Yotam Kenneth, Or Gerlitz

On Tue, Apr 21, 2015 at 01:15:11PM +0300, Haggai Eran wrote:

> That's correct. We didn't address it because it was part of the existing
> code. Anyway, in a later patch in this series we move this code from the
> CMA to the CM module.

Just so we are all on the same page in the future:
 - Don't half fix bugs: 'part of the existing code' is not an excuse.
 - Don't mix clearly independent bug fixes into a patch series.

> Fixes: e51060f08a61 ("IB: IP address based RDMA connection manager")
> Tested-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Thanks

Jason

end of thread, other threads:[~2015-04-22 17:29 UTC | newest]

Thread overview: 39+ messages
2015-04-20  9:03 [PATCH v2 00/11] Add network namespace support in the RDMA-CM Haggai Eran
     [not found] ` <1429520622-10303-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-04-20  9:03   ` [PATCH v2 01/11] RDMA/CMA: Mark IPv4 addresses correctly when the listener is IPv6 Haggai Eran
     [not found]     ` <1429520622-10303-2-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-04-20 16:41       ` Jason Gunthorpe
     [not found]         ` <20150420164140.GC7676-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-04-20 18:38           ` Or Gerlitz
2015-04-20 20:01             ` Jason Gunthorpe
     [not found]               ` <20150420200111.GA32449-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-04-21 10:15                 ` Haggai Eran
     [not found]                   ` <5536232F.3050707-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-04-22 11:26                     ` Haggai Eran
2015-04-22 17:29                     ` Jason Gunthorpe
     [not found]             ` <CAJ3xEMgKFdr68Qt0vNCaf1p4YjPK2KUSn2FdtQVP0SZQ+Y7atg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-04-21  5:18               ` Shachar Raindel
2015-04-20  9:03   ` [PATCH v2 02/11] IB/addr: Pass network namespace as a parameter Haggai Eran
     [not found]     ` <1429520622-10303-3-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-04-20 17:09       ` Jason Gunthorpe
     [not found]         ` <20150420170925.GE7676-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-04-21 10:29           ` Haggai Eran
2015-04-21 10:29             ` Haggai Eran
2015-04-20 22:05       ` Doug Ledford
     [not found]         ` <1429567530.45956.31.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-04-21 10:34           ` Haggai Eran
2015-04-21 10:34             ` Haggai Eran
2015-04-20  9:03   ` [PATCH v2 04/11] IB/core: Find the network namespace matching connection parameters Haggai Eran
2015-04-20  9:03   ` [PATCH v2 06/11] IB/cm, cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm Haggai Eran
     [not found]     ` <1429520622-10303-7-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-04-20 23:29       ` ira.weiny
2015-04-20  9:03   ` [PATCH v2 08/11] IB/cma: Separate port allocation to network namespaces Haggai Eran
2015-04-20  9:03   ` [PATCH v2 09/11] IB/cma: Add support for " Haggai Eran
2015-04-20  9:03   ` [PATCH v2 11/11] IB/ucm: Add partial " Haggai Eran
2015-04-20 23:46     ` ira.weiny
2015-04-20 14:53   ` [PATCH v2 00/11] Add network namespace support in the RDMA-CM Steve Wise
2015-04-21  6:36     ` Haggai Eran
2015-04-21  6:36       ` Haggai Eran
     [not found]       ` <5535EFE9.3000106-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-04-21 14:11         ` Steve Wise
     [not found]           ` <55365AAD.6020100-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2015-04-21 14:21             ` Haggai Eran
2015-04-21 14:21               ` Haggai Eran
2015-04-20  9:03 ` [PATCH v2 03/11] IB/core: Pass network namespace as a parameter to relevant functions Haggai Eran
2015-04-20  9:03 ` [PATCH v2 05/11] IB/ipoib: Return IPoIB devices as possible matches to get_net_device_by_port_pkey_ip Haggai Eran
2015-04-20 23:09   ` ira.weiny
2015-04-20  9:03 ` [PATCH v2 07/11] IB/cm: Add network namespace support Haggai Eran
     [not found]   ` <1429520622-10303-8-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-04-20 17:06     ` Jason Gunthorpe
     [not found]       ` <20150420170659.GD7676-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-04-20 23:35         ` ira.weiny
     [not found]           ` <55363D93.10706@mellanox.com>
     [not found]             ` <55363D93.10706-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-04-21 15:54               ` Jason Gunthorpe
2015-04-21 11:59         ` Haggai Eran
2015-04-21 11:59           ` Haggai Eran
2015-04-20  9:03 ` [PATCH v2 10/11] IB/ucma: Take the network namespace from the process Haggai Eran
