All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
@ 2009-10-28  5:42 Jason Gunthorpe
       [not found] ` <20091028054253.GA22882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2009-10-28  5:42 UTC (permalink / raw)
  To: David J. Wilder, leo.tominna-QHcLZuEGTsvQT0dZR+AlfA
  Cc: Sean Hefty, linux-rdma

**COMPILE TESTED ONLY**

Convert the address resolution process for outgoing connections
to be very similar to the way the TCP stack does the same operations.
This fixes many corner case bugs that can crop up.

- Process link local scope IPv6 addreses and use sin6_scope_id
  to choose the binding device
- Use routing lookups to choose the device and the source address
- Use ipv6_addr_copy
- Make the interaction of rdma_bind_addr and rdma_resolve_addr smoother
  when both specify a source address
- Replace the unlocked struct net_device pointer in rdma_dev_addr with
  an ifindex value. This is easier to work with when processing
  sin6_scope_id.
- Make addr4_resolve and addr6_resolve have the same flow, for easier
  review.
- Fixup error checking in the v6 routing lookup, dst is never 0, errno
  is returned via dst.error. Corrects cases where routing would return
  ENETUNREACH, etc
- Fold addr_send_arp into addr_resolve so that it uses the correct
  dst structure.

Based on work from "David J. Wilder" <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Signed-off-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Reported-by: "David J. Wilder" <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Reported-by: leo.tominna-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org
---
 drivers/infiniband/core/addr.c |  242 ++++++++++++++--------------------------
 drivers/infiniband/core/cma.c  |   51 +++++++--
 include/rdma/ib_addr.h         |    2 +-
 3 files changed, 127 insertions(+), 168 deletions(-)

So.. I suddenly had a notion how to fit this all together when I was
writing about that SO_BINDTODEVICE stuff.

Here is how I think this should all go. I can't steal some machines to
test it right now (prolly can't till after SC|09), but David/Sean maybe
you can have a look at it?

This should be split up into probably 3-4 patches, but I've included it
as one to make discussions as to the whole flow simpler..

There are still a few areas I'm not certain about. There is something
wonky about how the old code was handling neighbours, I'm not sure if 
addr_send_arp was correct to assume dst->neighbour is always non zero,
or if addrX_resolve_remote is correct to check that it could be
zero. I'm also not certain the neigh_lookup is necessary, if the
dst->neighbour is already correct and non-zero. Hmm.

I left the network namespace stuff alone and kept with the init_net
situation..

The approach to solve the niggly problem Sean pointed out WRT to
rdma_bind_addr was to treat rdma_bind_addr as the combination of
SO_BINDTODEVICE and bind() in the normal inet case, and copy that
algorithm exactly. If rdma_bind_addr() is called with non-any IP
then the cm_id is bound to that device via the new bound_dev_if
index. All other checks then refer to that, and bound_dev_if is
fed into the route lookup to set the source device, and if the source
device is set then it is copied into the route output by the route
routines and everything works the way you'd expect. For the loopback
case, if no source device is bound then the destination IP is used
as the RDMA device, if one is bound then it is used instead.

There is also now checking for IPv6 link local addresses, and like in
the real stack specifying link local is basically the same flow as
SO_BINDTODEVICE, stick the scope_id into bound_dev_if and let the
routing sort it out.

I think the change for the arps should be done, but I'd definately
split it out into a new patch. Doing the neigh_event with the dst from
the single route lookup is really necessary for correct operation in
all cases - the old code could jump to another net_device in certain
situations due to the incomplete second lookup (Leo noticed this). It
may also be a good idea to set bound_dev_if when the neigh_event_send
is called so that if the timers run through addrX_resolve again we are
guarenteed to be bound on the right device.

This also fixes rdma_bind_addr with a v6 link local address, and v6
source address selection cases when unbound.

I'm guessing there are probably more problems with the v6 support,
like how is the port space being shared between v6/v4 and do incoming
v4 connections get mapped to v6 listeners? But that has nothing to do
with the routing... :)

AH, there is still another oopsie I just noticed, rdma_resolve_addr
does not check that dst_addr and src_addr are the same AF, and does
not check if src_addr == 0 that the bind_addr value is the same
AF. Not fixed in this version, just noting it here..

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index bd07803..1b6d419 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -107,7 +107,7 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
 	memcpy(dev_addr->broadcast, dev->broadcast, MAX_ADDR_LEN);
 	if (dst_dev_addr)
 		memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN);
-	dev_addr->src_dev = dev;
+	dev_addr->bound_dev_if = dev->ifindex;
 	return 0;
 }
 EXPORT_SYMBOL(rdma_copy_addr);
@@ -117,6 +117,16 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
 	struct net_device *dev;
 	int ret = -EADDRNOTAVAIL;
 
+	/* Someone has specified we must bind to a specific interface.. */
+	if (dev_addr->bound_dev_if) {
+		dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
+		if (!dev)
+			return -ENODEV;
+		ret = rdma_copy_addr(dev_addr, dev, NULL);
+		dev_put(dev);
+		return ret;
+	}
+
 	switch (addr->sa_family) {
 	case AF_INET:
 		dev = ip_dev_find(&init_net,
@@ -176,48 +186,9 @@ static void queue_req(struct addr_req *req)
 	mutex_unlock(&lock);
 }
 
-static void addr_send_arp(struct sockaddr *dst_in)
-{
-	struct rtable *rt;
-	struct flowi fl;
-
-	memset(&fl, 0, sizeof fl);
-
-	switch (dst_in->sa_family) {
-	case AF_INET:
-		fl.nl_u.ip4_u.daddr =
-			((struct sockaddr_in *) dst_in)->sin_addr.s_addr;
-
-		if (ip_route_output_key(&init_net, &rt, &fl))
-			return;
-
-		neigh_event_send(rt->u.dst.neighbour, NULL);
-		ip_rt_put(rt);
-		break;
-
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
-	case AF_INET6:
-	{
-		struct dst_entry *dst;
-
-		fl.nl_u.ip6_u.daddr =
-			((struct sockaddr_in6 *) dst_in)->sin6_addr;
-
-		dst = ip6_route_output(&init_net, NULL, &fl);
-		if (!dst)
-			return;
-
-		neigh_event_send(dst->neighbour, NULL);
-		dst_release(dst);
-		break;
-	}
-#endif
-	}
-}
-
-static int addr4_resolve_remote(struct sockaddr_in *src_in,
-			       struct sockaddr_in *dst_in,
-			       struct rdma_dev_addr *addr)
+static int addr4_resolve(struct sockaddr_in *src_in,
+			 struct sockaddr_in *dst_in,
+			 struct rdma_dev_addr *addr)
 {
 	__be32 src_ip = src_in->sin_addr.s_addr;
 	__be32 dst_ip = dst_in->sin_addr.s_addr;
@@ -229,10 +200,22 @@ static int addr4_resolve_remote(struct sockaddr_in *src_in,
 	memset(&fl, 0, sizeof fl);
 	fl.nl_u.ip4_u.daddr = dst_ip;
 	fl.nl_u.ip4_u.saddr = src_ip;
+	fl.oif = addr->bound_dev_if;
 	ret = ip_route_output_key(&init_net, &rt, &fl);
 	if (ret)
 		goto out;
 
+	src_in->sin_family = AF_INET;
+	src_in->sin_addr.s_addr = rt->rt_src;
+
+	if (rt->idev->dev == init_net.loopback_dev) {
+		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr);
+		if (!ret)
+			memcpy(addr->dst_dev_addr, addr->src_dev_addr,
+			       MAX_ADDR_LEN);
+		goto put;
+	}
+
 	/* If the device does ARP internally, return 'done' */
 	if (rt->idev->dev->flags & IFF_NOARP) {
 		rdma_copy_addr(addr, rt->idev->dev, NULL);
@@ -240,24 +223,16 @@ static int addr4_resolve_remote(struct sockaddr_in *src_in,
 	}
 
 	neigh = neigh_lookup(&arp_tbl, &rt->rt_gateway, rt->idev->dev);
-	if (!neigh) {
+	if (!neigh || !(neigh->nud_state & NUD_VALID)) {
+		neigh_event_send(rt->u.dst.neighbour, NULL);
 		ret = -ENODATA;
+		if (neigh)
+			neigh_release(neigh);
 		goto put;
 	}
-
-	if (!(neigh->nud_state & NUD_VALID)) {
-		ret = -ENODATA;
-		goto release;
-	}
-
-	if (!src_ip) {
-		src_in->sin_family = dst_in->sin_family;
-		src_in->sin_addr.s_addr = rt->rt_src;
-	}
+	neigh_release(neigh);
 
 	ret = rdma_copy_addr(addr, neigh->dev, neigh->ha);
-release:
-	neigh_release(neigh);
 put:
 	ip_rt_put(rt);
 out:
@@ -265,36 +240,64 @@ out:
 }
 
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
-static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
-			       struct sockaddr_in6 *dst_in,
-			       struct rdma_dev_addr *addr)
+static int addr6_resolve(struct sockaddr_in6 *src_in,
+			 struct sockaddr_in6 *dst_in,
+			 struct rdma_dev_addr *addr)
 {
 	struct flowi fl;
 	struct neighbour *neigh;
 	struct dst_entry *dst;
-	int ret = -ENODATA;
+	int ret;
 
 	memset(&fl, 0, sizeof fl);
-	fl.nl_u.ip6_u.daddr = dst_in->sin6_addr;
-	fl.nl_u.ip6_u.saddr = src_in->sin6_addr;
+	ipv6_addr_copy(&fl.fl6_dst, &dst_in->sin6_addr);
+	ipv6_addr_copy(&fl.fl6_src, &src_in->sin6_addr);
+	fl.oif = addr->bound_dev_if;
 
 	dst = ip6_route_output(&init_net, NULL, &fl);
-	if (!dst)
-		return ret;
+	if ((ret = dst->error))
+		goto put;
+
+	if (ipv6_addr_any(&fl.fl6_src)) {
+		ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
+					 &fl.fl6_dst, 0,
+					 &fl.fl6_src);
+		if (!ret)
+			goto put;
+		src_in->sin6_family = AF_INET6;
+		ipv6_addr_copy(&src_in->sin6_addr, &fl.fl6_src);
+	}
+
+	if (ip6_dst_idev(dst)->dev == init_net.loopback_dev) {
+		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr);
+		if (!ret)
+			memcpy(addr->dst_dev_addr, addr->src_dev_addr,
+			       MAX_ADDR_LEN);
+		goto put;
+	}
 
-	if (dst->dev->flags & IFF_NOARP) {
-		ret = rdma_copy_addr(addr, dst->dev, NULL);
-	} else {
-		neigh = dst->neighbour;
-		if (neigh && (neigh->nud_state & NUD_VALID))
-			ret = rdma_copy_addr(addr, neigh->dev, neigh->ha);
+	/* If the device does ARP internally, return 'done' */
+	if (ip6_dst_idev(dst)->dev->flags & IFF_NOARP) {
+		ret = rdma_copy_addr(addr, ip6_dst_idev(dst)->dev, NULL);
+		goto put;
 	}
 
+	/* XXX why is this different from the addr4_resolve case?
+	 Is the neigh_lookup redundant? */
+	neigh = dst->neighbour;
+	if (!neigh || !(neigh->nud_state & NUD_VALID)) {
+		neigh_event_send(dst->neighbour, NULL);
+		ret = -ENODATA;
+		goto put;
+	}
+
+	ret = rdma_copy_addr(addr, neigh->dev, neigh->ha);
+put:
 	dst_release(dst);
 	return ret;
 }
 #else
-static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
+static int addr6_resolve(struct sockaddr_in6 *src_in,
 			       struct sockaddr_in6 *dst_in,
 			       struct rdma_dev_addr *addr)
 {
@@ -302,15 +305,15 @@ static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
 }
 #endif
 
-static int addr_resolve_remote(struct sockaddr *src_in,
-				struct sockaddr *dst_in,
-				struct rdma_dev_addr *addr)
+static int addr_resolve(struct sockaddr *src_in,
+			struct sockaddr *dst_in,
+			struct rdma_dev_addr *addr)
 {
 	if (src_in->sa_family == AF_INET) {
-		return addr4_resolve_remote((struct sockaddr_in *) src_in,
+		return addr4_resolve((struct sockaddr_in *) src_in,
 			(struct sockaddr_in *) dst_in, addr);
 	} else
-		return addr6_resolve_remote((struct sockaddr_in6 *) src_in,
+		return addr6_resolve((struct sockaddr_in6 *) src_in,
 			(struct sockaddr_in6 *) dst_in, addr);
 }
 
@@ -327,8 +330,7 @@ static void process_req(struct work_struct *work)
 		if (req->status == -ENODATA) {
 			src_in = (struct sockaddr *) &req->src_addr;
 			dst_in = (struct sockaddr *) &req->dst_addr;
-			req->status = addr_resolve_remote(src_in, dst_in,
-							  req->addr);
+			req->status = addr_resolve(src_in, dst_in, req->addr);
 			if (req->status && time_after_eq(jiffies, req->timeout))
 				req->status = -ETIMEDOUT;
 			else if (req->status == -ENODATA)
@@ -352,82 +354,6 @@ static void process_req(struct work_struct *work)
 	}
 }
 
-static int addr_resolve_local(struct sockaddr *src_in,
-			      struct sockaddr *dst_in,
-			      struct rdma_dev_addr *addr)
-{
-	struct net_device *dev;
-	int ret;
-
-	switch (dst_in->sa_family) {
-	case AF_INET:
-	{
-		__be32 src_ip = ((struct sockaddr_in *) src_in)->sin_addr.s_addr;
-		__be32 dst_ip = ((struct sockaddr_in *) dst_in)->sin_addr.s_addr;
-
-		dev = ip_dev_find(&init_net, dst_ip);
-		if (!dev)
-			return -EADDRNOTAVAIL;
-
-		if (ipv4_is_zeronet(src_ip)) {
-			src_in->sa_family = dst_in->sa_family;
-			((struct sockaddr_in *) src_in)->sin_addr.s_addr = dst_ip;
-			ret = rdma_copy_addr(addr, dev, dev->dev_addr);
-		} else if (ipv4_is_loopback(src_ip)) {
-			ret = rdma_translate_ip(dst_in, addr);
-			if (!ret)
-				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
-		} else {
-			ret = rdma_translate_ip(src_in, addr);
-			if (!ret)
-				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
-		}
-		dev_put(dev);
-		break;
-	}
-
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
-	case AF_INET6:
-	{
-		struct in6_addr *a;
-
-		for_each_netdev(&init_net, dev)
-			if (ipv6_chk_addr(&init_net,
-					  &((struct sockaddr_in6 *) dst_in)->sin6_addr,
-					  dev, 1))
-				break;
-
-		if (!dev)
-			return -EADDRNOTAVAIL;
-
-		a = &((struct sockaddr_in6 *) src_in)->sin6_addr;
-
-		if (ipv6_addr_any(a)) {
-			src_in->sa_family = dst_in->sa_family;
-			((struct sockaddr_in6 *) src_in)->sin6_addr =
-				((struct sockaddr_in6 *) dst_in)->sin6_addr;
-			ret = rdma_copy_addr(addr, dev, dev->dev_addr);
-		} else if (ipv6_addr_loopback(a)) {
-			ret = rdma_translate_ip(dst_in, addr);
-			if (!ret)
-				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
-		} else  {
-			ret = rdma_translate_ip(src_in, addr);
-			if (!ret)
-				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
-		}
-		break;
-	}
-#endif
-
-	default:
-		ret = -EADDRNOTAVAIL;
-		break;
-	}
-
-	return ret;
-}
-
 int rdma_resolve_ip(struct rdma_addr_client *client,
 		    struct sockaddr *src_addr, struct sockaddr *dst_addr,
 		    struct rdma_dev_addr *addr, int timeout_ms,
@@ -455,10 +381,11 @@ int rdma_resolve_ip(struct rdma_addr_client *client,
 	src_in = (struct sockaddr *) &req->src_addr;
 	dst_in = (struct sockaddr *) &req->dst_addr;
 
-	req->status = addr_resolve_local(src_in, dst_in, addr);
-	if (req->status == -EADDRNOTAVAIL)
-		req->status = addr_resolve_remote(src_in, dst_in, addr);
+	req->status = addr_resolve(src_in, dst_in, addr);
 
+	/* XXX this scares me, ENODATA is not special, there is no guarentee
+	   that any of the functions in the IP stack called by addr_resolve
+	   don't return ENODATA in some cases.. */
 	switch (req->status) {
 	case 0:
 		req->timeout = jiffies;
@@ -467,7 +394,6 @@ int rdma_resolve_ip(struct rdma_addr_client *client,
 	case -ENODATA:
 		req->timeout = msecs_to_jiffies(timeout_ms) + jiffies;
 		queue_req(req);
-		addr_send_arp(dst_in);
 		break;
 	default:
 		ret = req->status;
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 0753178..a0fa241 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1875,13 +1875,25 @@ err:
 	return ret;
 }
 
-static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
-			 struct sockaddr *dst_addr)
+/* IPv6 link local addresses must specify a consistent set of devices
+   to bind to. */
+static int check_ip6_ll(struct rdma_dev_addr *dev_addr,
+			const struct sockaddr *addr)
 {
-	if (src_addr && src_addr->sa_family)
-		return rdma_bind_addr(id, src_addr);
-	else
-		return cma_bind_any(id, dst_addr->sa_family);
+	const struct sockaddr_in6 *addr6 = (const struct sockaddr_in6 *)addr;
+
+	if (addr->sa_family != AF_INET6)
+		return 0;
+
+	if (ipv6_addr_type(&addr6->sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+		if (dev_addr->bound_dev_if &&
+		    dev_addr->bound_dev_if != addr6->sin6_scope_id)
+			return -EINVAL;
+		dev_addr->bound_dev_if = addr6->sin6_scope_id;
+		if (!dev_addr->bound_dev_if)
+			return -EINVAL;
+	}
+	return 0;
 }
 
 int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
@@ -1891,8 +1903,16 @@ int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 	int ret;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
+
+	if (src_addr) {
+		if ((ret = check_ip6_ll(&id->route.addr.dev_addr, src_addr)))
+			return ret;
+	}
+	if ((ret = check_ip6_ll(&id->route.addr.dev_addr, dst_addr)))
+		return ret;
+
 	if (id_priv->state == CMA_IDLE) {
-		ret = cma_bind_addr(id, src_addr, dst_addr);
+		ret = cma_bind_any(id, dst_addr->sa_family);
 		if (ret)
 			return ret;
 	}
@@ -1900,12 +1920,18 @@ int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 	if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_ADDR_QUERY))
 		return -EINVAL;
 
+	/* NOTE: If the user has called rdma_bind_addr() and then passes
+	   a src_addr value here the interpretation is that they wished
+	   to use whatever address they gave to bind to choose a device,
+	   and this address as the source IP for the connection. */
+
 	atomic_inc(&id_priv->refcount);
 	memcpy(&id->route.addr.dst_addr, dst_addr, ip_addr_size(dst_addr));
 	if (cma_any_addr(dst_addr))
 		ret = cma_resolve_loopback(id_priv);
 	else
-		ret = rdma_resolve_ip(&addr_client, (struct sockaddr *) &id->route.addr.src_addr,
+		ret = rdma_resolve_ip(&addr_client,
+				      src_addr ? src_addr : (struct sockaddr *) &id->route.addr.src_addr,
 				      dst_addr, &id->route.addr.dev_addr,
 				      timeout_ms, addr_handler, id_priv);
 	if (ret)
@@ -2077,6 +2103,8 @@ static int cma_get_port(struct rdma_id_private *id_priv)
 	return ret;
 }
 
+/* rdma_bind_addr does a combination of what socket bind() and
+   SO_BINDTODEVICE would do. */
 int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 {
 	struct rdma_id_private *id_priv;
@@ -2085,10 +2113,14 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6)
 		return -EAFNOSUPPORT;
 
+	if ((ret = check_ip6_ll(&id->route.addr.dev_addr, addr)))
+		return ret;
+
 	id_priv = container_of(id, struct rdma_id_private, id);
 	if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_BOUND))
 		return -EINVAL;
 
+	/* Do socket SO_BINDTODEVICE like behavior */
 	if (!cma_any_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
 		if (ret)
@@ -2101,6 +2133,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 			goto err1;
 	}
 
+	/* Do socket bind() like behavior */
 	memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr));
 	ret = cma_get_port(id_priv);
 	if (ret)
@@ -2815,7 +2848,7 @@ static int cma_netdev_change(struct net_device *ndev, struct rdma_id_private *id
 
 	dev_addr = &id_priv->id.route.addr.dev_addr;
 
-	if ((dev_addr->src_dev == ndev) &&
+	if ((dev_addr->bound_dev_if == ndev->ifindex) &&
 	    memcmp(dev_addr->src_dev_addr, ndev->dev_addr, ndev->addr_len)) {
 		printk(KERN_INFO "RDMA CM addr change for ndev %s used by id %p\n",
 		       ndev->name, &id_priv->id);
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index 483057b..27f17cc 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -61,7 +61,7 @@ struct rdma_dev_addr {
 	unsigned char dst_dev_addr[MAX_ADDR_LEN];
 	unsigned char broadcast[MAX_ADDR_LEN];
 	enum rdma_node_type dev_type;
-	struct net_device *src_dev;
+	int bound_dev_if;
 };
 
 /**
-- 
1.5.4.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found] ` <20091028054253.GA22882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-28  8:08   ` Sean Hefty
       [not found]     ` <4E89386C612E46C9A33501CEC2521BAF-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  2009-10-28 14:49   ` Or Gerlitz
  2009-10-29 19:56   ` David J. Wilder
  2 siblings, 1 reply; 20+ messages in thread
From: Sean Hefty @ 2009-10-28  8:08 UTC (permalink / raw)
  To: 'Jason Gunthorpe',
	David J. Wilder, leo.tominna-QHcLZuEGTsvQT0dZR+AlfA
  Cc: linux-rdma

>Here is how I think this should all go. I can't steal some machines to
>test it right now (prolly can't till after SC|09), but David/Sean maybe
>you can have a look at it?

I will try to test it some this week.

>AH, there is still another oopsie I just noticed, rdma_resolve_addr
>does not check that dst_addr and src_addr are the same AF

I was actually considering making use of this to allow AF_INET or AF_INET6 to
communicate with AF_IB.  There's not a protocol limitation to supporting this.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found] ` <20091028054253.GA22882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2009-10-28  8:08   ` Sean Hefty
@ 2009-10-28 14:49   ` Or Gerlitz
       [not found]     ` <4AE859F8.2030708-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
  2009-10-29 19:56   ` David J. Wilder
  2 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2009-10-28 14:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: David J. Wilder, leo.tominna-QHcLZuEGTsvQT0dZR+AlfA, Sean Hefty,
	linux-rdma

Jason Gunthorpe wrote:
> **COMPILE TESTED ONLY**

any reason why other people have to test for you?

> Convert the address resolution process for outgoing connections
> to be very similar to the way the TCP stack does the same operations.
> This fixes many corner case bugs that can crop up.

rdma_join_multicast(3) states that "before  joining  a  multicast  group,  the  rdma_cm_id  must  be  bound to an RDMA device by calling rdma_bind_addr or rdma_resolve_addr", please make sure that this flow isn't broken by your patch.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]     ` <4AE859F8.2030708-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
@ 2009-10-28 16:46       ` Jason Gunthorpe
       [not found]         ` <20091028164615.GC1966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2009-10-28 16:46 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: David J. Wilder, leo.tominna-QHcLZuEGTsvQT0dZR+AlfA, Sean Hefty,
	linux-rdma

On Wed, Oct 28, 2009 at 04:49:28PM +0200, Or Gerlitz wrote:
> Jason Gunthorpe wrote:
> > **COMPILE TESTED ONLY**
> 
> any reason why other people have to test for you?

Just a preliminary proposal - other people can do whatever they
want. Look at the list of things it fixes and decide your self if you
want to help or not <shrug>

> > Convert the address resolution process for outgoing connections
> > to be very similar to the way the TCP stack does the same operations.
> > This fixes many corner case bugs that can crop up.
> 
> rdma_join_multicast(3) states that "before joining a multicast
> group, the rdma_cm_id must be bound to an RDMA device by calling
> rdma_bind_addr or rdma_resolve_addr", please make sure that this
> flow isn't broken by your patch.

Can you explain how rdma_resolve_addr is used in conjunction with
multicast? I do not understand what the dest would be. Is it just a man
page typo?

I belive the rdma_bind_addr path is unaffected.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]     ` <4E89386C612E46C9A33501CEC2521BAF-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-28 16:50       ` Jason Gunthorpe
       [not found]         ` <20091028165043.GD1966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2009-10-28 16:50 UTC (permalink / raw)
  To: Sean Hefty
  Cc: David J. Wilder, leo.tominna-QHcLZuEGTsvQT0dZR+AlfA, linux-rdma

On Wed, Oct 28, 2009 at 01:08:33AM -0700, Sean Hefty wrote:
> >Here is how I think this should all go. I can't steal some machines to
> >test it right now (prolly can't till after SC|09), but David/Sean maybe
> >you can have a look at it?
> 
> I will try to test it some this week.

Thanks, probably won't work right away, but I feel it is closer..

Do you have any idea about the ARP things? Is whoever did this
originally still around?

> >AH, there is still another oopsie I just noticed, rdma_resolve_addr
> >does not check that dst_addr and src_addr are the same AF
> 
> I was actually considering making use of this to allow AF_INET or AF_INET6 to
> communicate with AF_IB.  There's not a protocol limitation to supporting this.

Hum.. Not sure how that would work? src = AF_INET, dst = AF_IB?

I guess I was hoping you'd be able to do the private data in userspace
and avoid those kind of special cases..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]         ` <20091028164615.GC1966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-28 17:05           ` Sean Hefty
       [not found]             ` <D775590A8D92454491BD2034E85B2B36-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Sean Hefty @ 2009-10-28 17:05 UTC (permalink / raw)
  To: 'Jason Gunthorpe', Or Gerlitz
  Cc: David J. Wilder, leo.tominna-QHcLZuEGTsvQT0dZR+AlfA, linux-rdma

>Can you explain how rdma_resolve_addr is used in conjunction with
>multicast? I do not understand what the dest would be. Is it just a man
>page typo?

A UD endpoint can communicate using multicast and to other UD endpoints.  A user
could resolve a UD endpoint before joining a multicast group.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]             ` <D775590A8D92454491BD2034E85B2B36-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-28 17:32               ` Jason Gunthorpe
       [not found]                 ` <20091028173229.GI14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2009-10-28 17:32 UTC (permalink / raw)
  To: Sean Hefty
  Cc: Or Gerlitz, David J. Wilder, leo.tominna-QHcLZuEGTsvQT0dZR+AlfA,
	linux-rdma

On Wed, Oct 28, 2009 at 10:05:19AM -0700, Sean Hefty wrote:
> >Can you explain how rdma_resolve_addr is used in conjunction with
> >multicast? I do not understand what the dest would be. Is it just a man
> >page typo?
> 
> A UD endpoint can communicate using multicast and to other UD
> endpoints.  A user could resolve a UD endpoint before joining a
> multicast group.

So the IP world analog would be:

fd = socket(AF_INET,SOCK_DGRAM);
connect(fd,'Some Unicast Address');
setsockopt(fd,IP_MULITCAST_ADD_MEMBERSHIP,'Some Multicast Address');
sendto(fd,...,'Some Multicast Address');

?

I think that is still OK. The routines still bind the rdma cm_id to
the devices via rdma_translate_ip pretty much like they did before.

There is no support for Linux IP multicast routing though..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]         ` <20091028165043.GD1966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-28 17:58           ` Sean Hefty
       [not found]             ` <E27512B9FE294380956C074D2C35D235-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Sean Hefty @ 2009-10-28 17:58 UTC (permalink / raw)
  To: 'Jason Gunthorpe'; +Cc: linux-rdma, Or Gerlitz

>Hum.. Not sure how that would work? src = AF_INET, dst = AF_IB?

yes - conceptually, these both map to PF_IB, which I think is really the
limiting factor

I'm guessing that you could make the reverse work as well, but it requires more
work and probably wouldn't be used by apps anyway.

>I guess I was hoping you'd be able to do the private data in userspace
>and avoid those kind of special cases..

The private data would still be done in user space.  This allows apps that make
use of ADDR_CHANGE events to continue working as normal, while still allowing
for address resolution to be done in user space.  This should meet Or's request.
User space may not know if they can resolve a particular address before binding.

I need to write the kernel patches to see if supporting this causes undo grief.
At this point, I'm guessing that the only thing that would prevent this from
working is if an explicit check is added.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]             ` <E27512B9FE294380956C074D2C35D235-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-28 18:58               ` Jason Gunthorpe
       [not found]                 ` <20091028185847.GK14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2009-10-28 18:58 UTC (permalink / raw)
  To: Sean Hefty; +Cc: linux-rdma, Or Gerlitz

On Wed, Oct 28, 2009 at 10:58:58AM -0700, Sean Hefty wrote:

> >I guess I was hoping you'd be able to do the private data in userspace
> >and avoid those kind of special cases..
> 
> The private data would still be done in user space.  This allows
> apps that make use of ADDR_CHANGE events to continue working as
> normal, while still allowing for address resolution to be done in
> user space.  This should meet Or's request.  User space may not know
> if they can resolve a particular address before binding.

Well, looking at this, I see that ADDR_CHANGE is only generated on
netdev events NETDEV_BONDING_FAILOVER - so what are the semantics
WRT to UCM here? As I said before, defeating the ND process seriously
affects how bonding works.

I think if UCM is going to have bonding-like HA it has to do the work
itself. It has to record the bonding IP is bound to multiple GID/LIDs
and it has to attempt connecting to multiple addrs.

So, I don't see ADDR_CHANGE fitting into this. An IB centric IB
connection lost event to trigger reconnect would be more
appropriate. This would be desirable anyhow for AF_IB only apps, so
multi-protocol apps can handle both cases - the same recovery
action is probably required anyhow...

FWIW, I think HA + UCM is very doable, if UCM can construct multiple
PRs based on bonding data it collects it should pass them into the
kernel using the mechanism we talked about for APM. The kernel does
not have to implement APM but it certainly can run through the
combinations and use the first to reply to the GMP as the path to
establish a connection. For typical 2 port bonding there would be 4
combinations, 2 of which the kernel could eliminate immediately by
detecting the local port is downed.

Then again, is bonding + ucm going to be a common usage model? Without
the faster address failover provided by the unsolicted ND packet the
recovery time will be worse. On the other hand, if all the above is
done UCM would actually have enough information to setup multi-port
APM...

Anyhow, I don't suggest you try to implement HA + UCM out of the gate,
but as a direction for work in that area to take I think AF_IB only
plus IB_LINK_LOST event, plus 'APM-lite' kernel PR round robin is a
viable approach.

Keeping AF_IB uniformly as AF_IB seems like it would make things
simpler to implement and describe WTF it is doing.

> At this point, I'm guessing that the only thing that would prevent
> this from working is if an explicit check is added.

Some kinds of checks are needed, passing  AF_INET and AF_INET6 into
rdma_resolve_ip will mis-function. dst=AF_IB will have to
branch off before this.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found] ` <20091028054253.GA22882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2009-10-28  8:08   ` Sean Hefty
  2009-10-28 14:49   ` Or Gerlitz
@ 2009-10-29 19:56   ` David J. Wilder
       [not found]     ` <1256846162.12957.27.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
  2 siblings, 1 reply; 20+ messages in thread
From: David J. Wilder @ 2009-10-29 19:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: leo.tominna-QHcLZuEGTsvQT0dZR+AlfA, Sean Hefty, linux-rdma


On Tue, 2009-10-27 at 23:42 -0600, Jason Gunthorpe wrote:

> 
> I left the network namespace stuff alone and kept with the init_net
> situation..

Another possible solution ;)

 if (rt->idev->dev->flags & IFF_LOOPBACK){


> 
> -static int addr_resolve_remote(struct sockaddr *src_in,
> -				struct sockaddr *dst_in,
> -				struct rdma_dev_addr *addr)
> +static int addr_resolve(struct sockaddr *src_in,
> +			struct sockaddr *dst_in,
> +			struct rdma_dev_addr *addr)
>  {

A problem here, if a source address has not been specified then this
test is invalid.  I had to change it to use dst_in->sa_family. But as
you said, we should have validated sa_family before this point.    

>  	if (src_in->sa_family == AF_INET) {
> -		return addr4_resolve_remote((struct sockaddr_in *) src_in,
> +		return addr4_resolve((struct sockaddr_in *) src_in,
>  			(struct sockaddr_in *) dst_in, addr);
>  	} else
> -		return addr6_resolve_remote((struct sockaddr_in6 *) src_in,
> +		return addr6_resolve((struct sockaddr_in6 *) src_in,
>  			(struct sockaddr_in6 *) dst_in, addr);
>  }


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]     ` <1256846162.12957.27.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
@ 2009-10-29 20:21       ` Jason Gunthorpe
       [not found]         ` <20091029202106.GF1966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2009-10-29 20:21 UTC (permalink / raw)
  To: David J. Wilder; +Cc: Sean Hefty, linux-rdma

On Thu, Oct 29, 2009 at 12:56:01PM -0700, David J. Wilder wrote:
> 
> On Tue, 2009-10-27 at 23:42 -0600, Jason Gunthorpe wrote:
> 
> > 
> > I left the network namespace stuff alone and kept with the init_net
> > situation..
> 
> Another possible solution ;)
> 
>  if (rt->idev->dev->flags & IFF_LOOPBACK){

Oh! That is probbly much better
 
> 
> > 
> > -static int addr_resolve_remote(struct sockaddr *src_in,
> > -				struct sockaddr *dst_in,
> > -				struct rdma_dev_addr *addr)
> > +static int addr_resolve(struct sockaddr *src_in,
> > +			struct sockaddr *dst_in,
> > +			struct rdma_dev_addr *addr)
> >  {
> 
> A problem here, if a source address has not been specified then this
> test is invalid.  I had to change it to use dst_in->sa_family. But as
> you said, we should have validated sa_family before this point.    

Yes, that should have been dst, dst is not optional, src is at this
point.

The sa_family validation of src should only be to check that if it is
specified it is is the same..

Thanks!
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]                 ` <20091028173229.GI14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-11-01  6:15                   ` Or Gerlitz
  0 siblings, 0 replies; 20+ messages in thread
From: Or Gerlitz @ 2009-11-01  6:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sean Hefty, David J. Wilder, leo.tominna-QHcLZuEGTsvQT0dZR+AlfA,
	linux-rdma

Jason Gunthorpe wrote:
> On Wed, Oct 28, 2009 at 10:05:19AM -0700, Sean Hefty wrote:
>> A UD endpoint can communicate using multicast and to other UD endpoints.  A user could resolve a UD endpoint before joining a multicast group.
>
> So the IP world analog would be:
> fd = socket(AF_INET,SOCK_DGRAM);
> connect(fd,'Some Unicast Address');
> setsockopt(fd,IP_MULITCAST_ADD_MEMBERSHIP,'Some Multicast Address');
> sendto(fd,...,'Some Multicast Address');
IP multicast senders don't call IP_ADD_MEMBERSHIP, only receivers

Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]                 ` <20091028185847.GK14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-11-05 12:21                   ` Or Gerlitz
       [not found]                     ` <4AF2C361.1080307-smomgflXvOZWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2009-11-05 12:21 UTC (permalink / raw)
  To: Jason Gunthorpe, Sean Hefty; +Cc: linux-rdma

Jason Gunthorpe wrote:
> Well, looking at this, I see that ADDR_CHANGE is only generated on netdev events NETDEV_BONDING_FAILOVER - so what are the semantics WRT to UCM here? As I said before, defeating the ND process seriously affects how bonding works.
Again, much of the rdma-cm essence is to link/bridge/glue the netdev and 
rdma stacks. I don't see what it has to do with UCM which you keep 
mentioning/repeating over this email/thread and elsewhere.

Sean, I don't see why ACM can't be implemented without introducing 
AF_IB/PS_IB, is it really a must or more of something you see as making 
the implementation somehow easier? Now, even if you are fine with AF_IB, 
I don't think that UCM should take over the rdma-ucm and fill the kernel 
rdma-cm code with tons of IB specifics code. If people need UCM, lets 
them use libibcm.

Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]                     ` <4AF2C361.1080307-smomgflXvOZWk0Htik3J/w@public.gmane.org>
@ 2009-11-05 17:04                       ` Jason Gunthorpe
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2009-11-05 17:04 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Sean Hefty, linux-rdma

On Thu, Nov 05, 2009 at 02:21:53PM +0200, Or Gerlitz wrote:
> Jason Gunthorpe wrote:
> >Well, looking at this, I see that ADDR_CHANGE is only generated on netdev 
> >events NETDEV_BONDING_FAILOVER - so what are the semantics WRT to UCM 
> >here? As I said before, defeating the ND process seriously affects how 
> >bonding works.

> Again, much of the rdma-cm essence is to link/bridge/glue the netdev and 
> rdma stacks. I don't see what it has to do with UCM which you keep 
> mentioning/repeating over this email/thread and elsewhere.

Oops, brian fart, I ment ACM

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]         ` <20091029202106.GF1966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-11-10 21:05           ` Pradeep Satyanarayana
       [not found]             ` <4AF9D5A1.6030309-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Pradeep Satyanarayana @ 2009-11-10 21:05 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: David J. Wilder, Sean Hefty, linux-rdma

Jason Gunthorpe wrote:
> On Thu, Oct 29, 2009 at 12:56:01PM -0700, David J. Wilder wrote:
>> On Tue, 2009-10-27 at 23:42 -0600, Jason Gunthorpe wrote:
>>
>>> I left the network namespace stuff alone and kept with the init_net
>>> situation..
>> Another possible solution ;)
>>
>>  if (rt->idev->dev->flags & IFF_LOOPBACK){
> 
> Oh! That is probbly much better
> 
>>> -static int addr_resolve_remote(struct sockaddr *src_in,
>>> -				struct sockaddr *dst_in,
>>> -				struct rdma_dev_addr *addr)
>>> +static int addr_resolve(struct sockaddr *src_in,
>>> +			struct sockaddr *dst_in,
>>> +			struct rdma_dev_addr *addr)
>>>  {
>> A problem here, if a source address has not been specified then this
>> test is invalid.  I had to change it to use dst_in->sa_family. But as
>> you said, we should have validated sa_family before this point.    
> 
> Yes, that should have been dst, dst is not optional, src is at this
> point.
> 
> The sa_family validation of src should only be to check that if it is
> specified it is is the same..

Sean,

I know Jason and David came up with different sets of patches. Is it possible that
we can come to some agreement, so that this can be pushed to OFED-1.5? 

I am concerned that if we miss the OFED-1.5 boat, this feature (IPv6 support for RDMA CM)
may be available in the various distributions only in late 2010 or maybe even 2011.

We got side tracked with some higher priority stuff, but now should be able to help
with patches and testing them too. How can we make progress? What are your thoughts?

Pradeep


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]             ` <4AF9D5A1.6030309-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2009-11-10 21:12               ` Sean Hefty
       [not found]                 ` <E34EB5B2A00C459EB20F307ED1C5B437-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Sean Hefty @ 2009-11-10 21:12 UTC (permalink / raw)
  To: 'Pradeep Satyanarayana', Jason Gunthorpe
  Cc: David J. Wilder, linux-rdma

>How can we make progress? What are your thoughts?

I'm actually working on this today.  I've taken Jason's patch as a starting
point, and I'm breaking it into separate patches for merging and testing.  I
hope to post the patches within the next couple of days.  I'll compare my final
patches against the ones submitted by David to see if anything got missed.

Help with testing would be great.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]                 ` <E34EB5B2A00C459EB20F307ED1C5B437-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-11-10 21:25                   ` Jason Gunthorpe
  2009-11-10 23:03                   ` David J. Wilder
  2009-11-11 10:16                   ` Or Gerlitz
  2 siblings, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2009-11-10 21:25 UTC (permalink / raw)
  To: Sean Hefty; +Cc: 'Pradeep Satyanarayana', David J. Wilder, linux-rdma

On Tue, Nov 10, 2009 at 01:12:17PM -0800, Sean Hefty wrote:
> >How can we make progress? What are your thoughts?
> 
> I'm actually working on this today.  I've taken Jason's patch as a
> starting point, and I'm breaking it into separate patches for
> merging and testing.  I hope to post the patches within the next
> couple of days.  I'll compare my final patches against the ones
> submitted by David to see if anything got missed.

Oh, thanks, I didn't mean for you to do all that work..

But I won't be able to get to it till after SC|09 which is like the
23rd, so maybe it is best if there is some OFED related deadline.

Aside from testing and breaking up the remaining things I wanted to
review/adjust:
- David's notes on the patch
- The rdma_resolve_ip should require src/dst have the same family
- The arp checking paths needs to be examinined as per my post
- The arp reply waiting code may be better done by holding the dst
  and not polling on route lookups.
- Review the private data handling and check that mixed family cases
  work as they should

It was my intent to capture everything David had done in that big
patch, but I may have missed something.

My intuition is that the fixups to the ARP/ND code will matter for
IPv6, particularly multi-port link local cases, but I'm not 100% sure
on that.

For full IPv6 support the 2 multicast patches you already reviewed will
need to included in OFED as well.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]                 ` <E34EB5B2A00C459EB20F307ED1C5B437-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  2009-11-10 21:25                   ` Jason Gunthorpe
@ 2009-11-10 23:03                   ` David J. Wilder
  2009-11-11 10:16                   ` Or Gerlitz
  2 siblings, 0 replies; 20+ messages in thread
From: David J. Wilder @ 2009-11-10 23:03 UTC (permalink / raw)
  To: Sean Hefty; +Cc: 'Pradeep Satyanarayana', Jason Gunthorpe, linux-rdma


On Tue, 2009-11-10 at 13:12 -0800, Sean Hefty wrote:
> >How can we make progress? What are your thoughts?

> 
> Help with testing would be great.

Thanks Sean,
I would be glad to do some testing when patches are ready!
 
> 
> - Sean
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]                 ` <E34EB5B2A00C459EB20F307ED1C5B437-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  2009-11-10 21:25                   ` Jason Gunthorpe
  2009-11-10 23:03                   ` David J. Wilder
@ 2009-11-11 10:16                   ` Or Gerlitz
       [not found]                     ` <4AFA8EF7.6010408-smomgflXvOZWk0Htik3J/w@public.gmane.org>
  2 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2009-11-11 10:16 UTC (permalink / raw)
  To: Sean Hefty
  Cc: 'Pradeep Satyanarayana',
	Jason Gunthorpe, David J. Wilder, linux-rdma

Sean Hefty wrote:
> I'll compare my final patches against the ones submitted by David to see if anything got missed
>   
Are Jason's patches a superset of David's patches? or they need to be 
applied and only then David's work can be re-reviewed/merged, etc?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
       [not found]                     ` <4AFA8EF7.6010408-smomgflXvOZWk0Htik3J/w@public.gmane.org>
@ 2009-11-11 17:12                       ` Sean Hefty
  0 siblings, 0 replies; 20+ messages in thread
From: Sean Hefty @ 2009-11-11 17:12 UTC (permalink / raw)
  To: 'Or Gerlitz'
  Cc: 'Pradeep Satyanarayana',
	Jason Gunthorpe, David J. Wilder, linux-rdma

>Are Jason's patches a superset of David's patches? or they need to be
>applied and only then David's work can be re-reviewed/merged, etc?

I believe that Jason's patches are a superset of David's.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-11-11 17:12 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-28  5:42 [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM Jason Gunthorpe
     [not found] ` <20091028054253.GA22882-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-28  8:08   ` Sean Hefty
     [not found]     ` <4E89386C612E46C9A33501CEC2521BAF-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-28 16:50       ` Jason Gunthorpe
     [not found]         ` <20091028165043.GD1966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-28 17:58           ` Sean Hefty
     [not found]             ` <E27512B9FE294380956C074D2C35D235-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-28 18:58               ` Jason Gunthorpe
     [not found]                 ` <20091028185847.GK14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-11-05 12:21                   ` Or Gerlitz
     [not found]                     ` <4AF2C361.1080307-smomgflXvOZWk0Htik3J/w@public.gmane.org>
2009-11-05 17:04                       ` Jason Gunthorpe
2009-10-28 14:49   ` Or Gerlitz
     [not found]     ` <4AE859F8.2030708-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
2009-10-28 16:46       ` Jason Gunthorpe
     [not found]         ` <20091028164615.GC1966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-28 17:05           ` Sean Hefty
     [not found]             ` <D775590A8D92454491BD2034E85B2B36-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-28 17:32               ` Jason Gunthorpe
     [not found]                 ` <20091028173229.GI14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-11-01  6:15                   ` Or Gerlitz
2009-10-29 19:56   ` David J. Wilder
     [not found]     ` <1256846162.12957.27.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-29 20:21       ` Jason Gunthorpe
     [not found]         ` <20091029202106.GF1966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-11-10 21:05           ` Pradeep Satyanarayana
     [not found]             ` <4AF9D5A1.6030309-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2009-11-10 21:12               ` Sean Hefty
     [not found]                 ` <E34EB5B2A00C459EB20F307ED1C5B437-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-11-10 21:25                   ` Jason Gunthorpe
2009-11-10 23:03                   ` David J. Wilder
2009-11-11 10:16                   ` Or Gerlitz
     [not found]                     ` <4AFA8EF7.6010408-smomgflXvOZWk0Htik3J/w@public.gmane.org>
2009-11-11 17:12                       ` Sean Hefty

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.