All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH libibverbs V5 0/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
@ 2014-08-20  6:49 Matan Barak
       [not found] ` <1408517381-17523-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Matan Barak @ 2014-08-20  6:49 UTC (permalink / raw)
  To: Roland Dreier, Jason Gunthorpe
  Cc: Yishai Hadas, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, Or Gerlitz, Matan Barak

Hi Roland,

This series adds support for Ethernet L2 address resolution for RoCE UD QPs,
whose L2 address-handles, unlike RC/UC/XRC/etc QPs are set from user space
without going through uverbs and the kernel IB core. The code is also
compatible both with old kernels that don't run IP based addressing.

Matan and Or.

changes from V4 -- addressed feedback from Jason
  - Instead of an extended create_ah_ex provider call, use a utlity
    function in libibverbs.
  - Remove the extended query port that was mainly used for caching.

changes from V3 -- addressed feedback from Jason:
  - replaced the char* that used to hold the MAC address provided to
    the provider library (e.g libmlx4) with sockaddr_storage and use ARPHRD
    for the af_family (similar to the practice used in the kernel)

  - Dropped mask1 from ibv_query_port_ex and put all fields in comp_mask
  - Added a man page for ibv_query_port_ex
  - Made the code c99 friendly by dropping the anonymous union and struct
  - Splitted the patches to patch that adds the interface and patch that uses it
  - Removed the usage of drv_ and lib_ schemas
  - Removed the VERBS_CONTEXT_XXXXXXX

changes from V2:
  - when using libnl3, link libibverbs itself with libnl3 and not the utilities
  - fix memory leak which took place when nl_addr_cmp_prefix_msb returns non zero.
  - rebased against libibverbs-1.1.8

changes from V1, addressed feedback from Yann Droneaud:
  - ported the code to libnl3 with automatic fallback option to libnl1
  - better error handling (more checks to erroneous flows)
  - fixed typos and removed blank lines
  - set close-on-exec open flag for the relevant fds
  - remove wrong comment from the change-log of patch #1

changes from V0:
 - don't change struct ibv_port_attr flags field from uint32_t
   to enum (size could change)
 - ibv_ah_attr_ex is extendable and contains ibv_ah_attr

Matan Barak (2):
  Add ibv_port_cap_flags
  Use neighbour lookup for RoCE UD QPs Eth L2 resolution

 Makefile.am                |   24 +-
 configure.ac               |   31 ++
 include/infiniband/verbs.h |   29 ++
 src/libibverbs.map         |    2 +
 src/neigh.c                | 1003 ++++++++++++++++++++++++++++++++++++++++++++
 src/neigh.h                |   53 +++
 src/verbs.c                |  143 +++++++
 7 files changed, 1275 insertions(+), 10 deletions(-)
 create mode 100644 src/neigh.c
 create mode 100644 src/neigh.h

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH libibverbs V5 1/2] Add ibv_port_cap_flags
       [not found] ` <1408517381-17523-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2014-08-20  6:49   ` Matan Barak
  2014-08-20  6:49   ` [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution Matan Barak
  2014-08-20 16:26   ` [PATCH libibverbs V5 0/2] " Jason Gunthorpe
  2 siblings, 0 replies; 16+ messages in thread
From: Matan Barak @ 2014-08-20  6:49 UTC (permalink / raw)
  To: Roland Dreier, Jason Gunthorpe
  Cc: Yishai Hadas, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, Or Gerlitz, Matan Barak

Add an enum that describes ibv_port_cap_flags that complies
with the respective kernel enum.

This value could be fetched when using ibv_query_port in
port_cap_flags.


Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 include/infiniband/verbs.h |   26 ++++++++++++++++++++++++++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 9826b72..4eb68f1 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -191,6 +191,32 @@ enum {
 	IBV_LINK_LAYER_ETHERNET,
 };
 
+enum ibv_port_cap_flags {
+	IBV_PORT_SM				= 1 <<  1,
+	IBV_PORT_NOTICE_SUP			= 1 <<  2,
+	IBV_PORT_TRAP_SUP			= 1 <<  3,
+	IBV_PORT_OPT_IPD_SUP			= 1 <<  4,
+	IBV_PORT_AUTO_MIGR_SUP			= 1 <<  5,
+	IBV_PORT_SL_MAP_SUP			= 1 <<  6,
+	IBV_PORT_MKEY_NVRAM			= 1 <<  7,
+	IBV_PORT_PKEY_NVRAM			= 1 <<  8,
+	IBV_PORT_LED_INFO_SUP			= 1 <<  9,
+	IBV_PORT_SYS_IMAGE_GUID_SUP		= 1 << 11,
+	IBV_PORT_PKEY_SW_EXT_PORT_TRAP_SUP	= 1 << 12,
+	IBV_PORT_EXTENDED_SPEEDS_SUP		= 1 << 14,
+	IBV_PORT_CM_SUP				= 1 << 16,
+	IBV_PORT_SNMP_TUNNEL_SUP		= 1 << 17,
+	IBV_PORT_REINIT_SUP			= 1 << 18,
+	IBV_PORT_DEVICE_MGMT_SUP		= 1 << 19,
+	IBV_PORT_VENDOR_CLASS_SUP		= 1 << 20,
+	IBV_PORT_DR_NOTICE_SUP			= 1 << 21,
+	IBV_PORT_CAP_MASK_NOTICE_SUP		= 1 << 22,
+	IBV_PORT_BOOT_MGMT_SUP			= 1 << 23,
+	IBV_PORT_LINK_LATENCY_SUP		= 1 << 24,
+	IBV_PORT_CLIENT_REG_SUP			= 1 << 25,
+	IBV_PORT_IP_BASED_GIDS			= 1 << 26
+};
+
 struct ibv_port_attr {
 	enum ibv_port_state	state;
 	enum ibv_mtu		max_mtu;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found] ` <1408517381-17523-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2014-08-20  6:49   ` [PATCH libibverbs V5 1/2] Add ibv_port_cap_flags Matan Barak
@ 2014-08-20  6:49   ` Matan Barak
       [not found]     ` <1408517381-17523-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2014-08-20 16:26   ` [PATCH libibverbs V5 0/2] " Jason Gunthorpe
  2 siblings, 1 reply; 16+ messages in thread
From: Matan Barak @ 2014-08-20  6:49 UTC (permalink / raw)
  To: Roland Dreier, Jason Gunthorpe
  Cc: Yishai Hadas, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, Or Gerlitz, Matan Barak

In order to implement RoCE IP based addressing for UD QPs, without introducing
uverbs changes, we need a way to resolve the L2 Ethernet addresses from user-space.

This is done with netlink through libnl, and in libibverbs such that multiple
vendor provider libraries can use the code.

This is implemented as a helper function ibv_resolve_eth_l2_from_gid.
If the GID is IP based, the provider shall call this utility function,
which resolves the IP to MAC and vlan.

The steps for resolution:

1. get sgid

2. from sgid, get the interface

3. query route from interface to the destination

4. query the neigh table, if the MAC for the destination is already known, done.

5. if not, loop until timeout:
    a. send a UDP packet to that IP targeted to the "discard" port
    b. listen to events from the neigh table
    c. query neigh table in order to know if the neigh has shown up between
       query until we started monitoring it

6. query vlan id from the interface

This solution depends on libnl-3 with backports to libnl-1.


Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 Makefile.am                |   24 +-
 configure.ac               |   31 ++
 include/infiniband/verbs.h |    3 +
 src/libibverbs.map         |    2 +
 src/neigh.c                | 1003 ++++++++++++++++++++++++++++++++++++++++++++
 src/neigh.h                |   53 +++
 src/verbs.c                |  143 +++++++
 7 files changed, 1249 insertions(+), 10 deletions(-)
 create mode 100644 src/neigh.c
 create mode 100644 src/neigh.h

diff --git a/Makefile.am b/Makefile.am
index 3a40f3e..3104d5a 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -5,13 +5,17 @@ lib_LTLIBRARIES = src/libibverbs.la
 ACLOCAL_AMFLAGS = -I config
 AM_CFLAGS = -g -Wall
 
-src_libibverbs_la_CFLAGS = $(AM_CFLAGS) -DIBV_CONFIG_DIR=\"$(sysconfdir)/libibverbs.d\"
-
+src_libibverbs_la_CFLAGS = $(AM_CFLAGS) -DIBV_CONFIG_DIR=\"$(sysconfdir)/libibverbs.d\" \
+			   $(LIBNL_CFLAGS)
 libibverbs_version_script = @LIBIBVERBS_VERSION_SCRIPT@
+src_libibverbs_la_LIBADD = $(LIBNL_LIBS)
 
 src_libibverbs_la_SOURCES = src/cmd.c src/compat-1_0.c src/device.c src/init.c \
 			    src/marshall.c src/memory.c src/sysfs.c src/verbs.c \
 			    src/enum_strs.c
+if ! NO_RESOLVE_NEIGH
+src_libibverbs_la_SOURCES += src/neigh.c
+endif
 src_libibverbs_la_LDFLAGS = -version-info 1 -export-dynamic \
     $(libibverbs_version_script)
 src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map
@@ -20,21 +24,21 @@ bin_PROGRAMS = examples/ibv_devices examples/ibv_devinfo \
     examples/ibv_asyncwatch examples/ibv_rc_pingpong examples/ibv_uc_pingpong \
     examples/ibv_ud_pingpong examples/ibv_srq_pingpong examples/ibv_xsrq_pingpong
 examples_ibv_devices_SOURCES = examples/device_list.c
-examples_ibv_devices_LDADD = $(top_builddir)/src/libibverbs.la
+examples_ibv_devices_LDADD = $(top_builddir)/src/libibverbs.la $(LIBNL_LIBS)
 examples_ibv_devinfo_SOURCES = examples/devinfo.c
-examples_ibv_devinfo_LDADD = $(top_builddir)/src/libibverbs.la
+examples_ibv_devinfo_LDADD = $(top_builddir)/src/libibverbs.la $(LIBNL_LIBS)
 examples_ibv_rc_pingpong_SOURCES = examples/rc_pingpong.c examples/pingpong.c
-examples_ibv_rc_pingpong_LDADD = $(top_builddir)/src/libibverbs.la
+examples_ibv_rc_pingpong_LDADD = $(top_builddir)/src/libibverbs.la $(LIBNL_LIBS)
 examples_ibv_uc_pingpong_SOURCES = examples/uc_pingpong.c examples/pingpong.c
-examples_ibv_uc_pingpong_LDADD = $(top_builddir)/src/libibverbs.la
+examples_ibv_uc_pingpong_LDADD = $(top_builddir)/src/libibverbs.la $(LIBNL_LIBS)
 examples_ibv_ud_pingpong_SOURCES = examples/ud_pingpong.c examples/pingpong.c
-examples_ibv_ud_pingpong_LDADD = $(top_builddir)/src/libibverbs.la
+examples_ibv_ud_pingpong_LDADD = $(top_builddir)/src/libibverbs.la $(LIBNL_LIBS)
 examples_ibv_srq_pingpong_SOURCES = examples/srq_pingpong.c examples/pingpong.c
-examples_ibv_srq_pingpong_LDADD = $(top_builddir)/src/libibverbs.la
+examples_ibv_srq_pingpong_LDADD = $(top_builddir)/src/libibverbs.la $(LIBNL_LIBS)
 examples_ibv_xsrq_pingpong_SOURCES = examples/xsrq_pingpong.c examples/pingpong.c
-examples_ibv_xsrq_pingpong_LDADD = $(top_builddir)/src/libibverbs.la
+examples_ibv_xsrq_pingpong_LDADD = $(top_builddir)/src/libibverbs.la $(LIBNL_LIBS)
 examples_ibv_asyncwatch_SOURCES = examples/asyncwatch.c
-examples_ibv_asyncwatch_LDADD = $(top_builddir)/src/libibverbs.la
+examples_ibv_asyncwatch_LDADD = $(top_builddir)/src/libibverbs.la $(LIBNL_LIBS)
 
 libibverbsincludedir = $(includedir)/infiniband
 
diff --git a/configure.ac b/configure.ac
index b5f6dfc..bef0e35 100644
--- a/configure.ac
+++ b/configure.ac
@@ -28,6 +28,37 @@ else
     fi
 fi
 
+AC_ARG_WITH([resolve-neigh],
+    AC_HELP_STRING([--with-resolve-neigh],
+        [Enable neighbour resolution in Ethernet (default YES)]))
+have_libnl=no
+if  test x$with_resolve_neigh = x || test x$with_resolve_neigh = xyes; then
+	PKG_CHECK_MODULES([LIBNL],[libnl-3.0],[
+			   have_libnl=yes
+			   AC_DEFINE([HAVE_LIBNL3], [1], [Use libnl-3.0])
+			   AC_DEFINE([HAVE_LIBNL], [1],  [Use libnl])
+			   PKG_CHECK_MODULES([LIBNL_ROUTE3], [libnl-route-3.0])
+			   LIBNL_CFLAGS="$LIBNL_CFLAGS $LIBNL_ROUTE3_CFLAGS"
+			   LIBNL_LIBS="$LIBNL_LIBS $LIBNL_ROUTE3_LIBS"], [:]
+			  );
+	if test "$have_libnl" = no; then
+		PKG_CHECK_MODULES([LIBNL], [libnl-1], [have_libnl=yes
+				  AC_DEFINE([HAVE_LIBNL1], [1], [Use libnl-1])
+				  AC_DEFINE([HAVE_LIBNL], [1],  [Use libnl])
+				  AC_CHECK_LIB(nl, rtnl_link_vlan_get_id, [],
+					AC_MSG_ERROR([rtnl_link_vlan_get_id not found.  libibverbs requires libnl.]))
+				 ],[
+					AC_MSG_ERROR([libibverbs requires libnl.])
+				 ])
+	fi
+else
+    AC_DEFINE([NRESOLVE_NEIGH], 1, [Define to 1 to disable resovle neigh annotations.])
+fi
+AM_CONDITIONAL([HAVE_LIBNL], [test "$have_libnl" = "yes"])
+AC_SUBST([LIBNL_CFLAGS])
+AC_SUBST([LIBNL_LIBS])
+AM_CONDITIONAL(NO_RESOLVE_NEIGH, test x$with_resolve_neigh = xno)
+
 dnl Checks for libraries
 AC_CHECK_LIB(dl, dlsym, [],
     AC_MSG_ERROR([dlsym() not found.  libibverbs requires libdl.]))
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 4eb68f1..ebcfc03 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -1539,6 +1539,9 @@ const char *ibv_port_state_str(enum ibv_port_state port_state);
  */
 const char *ibv_event_type_str(enum ibv_event_type event);
 
+#define ETHERNET_LL_SIZE 6
+int ibv_resolve_eth_l2_from_gid(struct ibv_context *context, struct ibv_ah_attr *attr,
+				uint8_t eth_mac[ETHERNET_LL_SIZE], uint16_t *vid);
 END_C_DECLS
 
 #  undef __attribute_const
diff --git a/src/libibverbs.map b/src/libibverbs.map
index 30212f3..9f0ec69 100644
--- a/src/libibverbs.map
+++ b/src/libibverbs.map
@@ -103,6 +103,8 @@ IBVERBS_1.1 {
 		ibv_rate_to_mbps;
 		mbps_to_ibv_rate;
 
+		ibv_resolve_eth_l2_from_gid;
+
 		ibv_cmd_open_xrcd;
 		ibv_cmd_close_xrcd;
 		ibv_cmd_create_srq_ex;
diff --git a/src/neigh.c b/src/neigh.c
new file mode 100644
index 0000000..afa2281
--- /dev/null
+++ b/src/neigh.c
@@ -0,0 +1,1003 @@
+
+#include "config.h"
+#include <net/if_packet.h>
+#include <linux/netlink.h>
+#include <linux/rtnetlink.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <netlink/route/rtnl.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/neighbour.h>
+#ifndef HAVE_LIBNL1
+#include <netlink/route/link/vlan.h>
+#endif
+
+
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/timerfd.h>
+#include <netinet/in.h>
+#include <errno.h>
+#include <unistd.h>
+#include <ifaddrs.h>
+#include <netdb.h>
+#ifndef _LINUX_IF_H
+#include <net/if.h>
+#else
+/*Workaround when there's a collision between the includes */
+extern unsigned int if_nametoindex(__const char *__ifname) __THROW;
+#endif
+
+/* for PFX */
+#include "ibverbs.h"
+
+#include "neigh.h"
+
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+#define MIN(a, b) ((a) < (b) ? (a) : (b))
+#define print_hdr PFX "resolver: "
+#define print_err(...) fprintf(stderr, print_hdr __VA_ARGS__)
+#ifdef _DEBUG_
+#define print_dbg_s(stream, ...) fprintf((stream), __VA_ARGS__)
+#define print_address(stream, msg, addr) \
+	{							      \
+		char buff[100];					      \
+		print_dbg_s((stream), (msg), nl_addr2str((addr), buff,\
+			    sizeof(buff)));			      \
+	}
+#else
+#define print_dbg_s(stream, args...)
+#define print_address(stream, msg, ...)
+#endif
+#define print_dbg(...) print_dbg_s(stderr, print_hdr __VA_ARGS__)
+
+#ifdef HAVE_LIBNL1
+#define nl_geterror(x) nl_geterror()
+#endif
+
+/* Workaround - declaration missing */
+extern int		rtnl_link_vlan_get_id(struct rtnl_link *);
+
+static pthread_once_t device_neigh_alloc = PTHREAD_ONCE_INIT;
+#ifdef HAVE_LIBNL1
+static struct nl_handle *zero_socket;
+#else
+static struct nl_sock *zero_socket;
+#endif
+
+union sktaddr {
+	struct sockaddr s;
+	struct sockaddr_in s4;
+	struct sockaddr_in6 s6;
+};
+
+struct skt {
+	union sktaddr sktaddr;
+	socklen_t len;
+};
+
+
+static int set_link_port(union sktaddr *s, int port, int oif)
+{
+	switch (s->s.sa_family) {
+	case AF_INET:
+		s->s4.sin_port = port;
+		break;
+	case AF_INET6:
+		s->s6.sin6_port = port;
+		s->s6.sin6_scope_id = oif;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int cmp_address(const struct sockaddr *s1,
+		       const struct sockaddr *s2) {
+	if (s1->sa_family != s2->sa_family)
+		return s1->sa_family ^ s2->sa_family;
+
+	switch (s1->sa_family) {
+	case AF_INET:
+		return ((struct sockaddr_in *)s1)->sin_addr.s_addr ^
+		       ((struct sockaddr_in *)s2)->sin_addr.s_addr;
+	case AF_INET6:
+		return memcmp(
+			((struct sockaddr_in6 *)s1)->sin6_addr.s6_addr,
+			((struct sockaddr_in6 *)s2)->sin6_addr.s6_addr,
+			sizeof(((struct sockaddr_in6 *)s1)->sin6_addr.s6_addr));
+	default:
+		return -ENOTSUP;
+	}
+}
+
+
+static int get_ifindex(const struct sockaddr *s)
+{
+	struct ifaddrs *ifaddr, *ifa;
+	int name2index = -ENODEV;
+
+	if (-1 == getifaddrs(&ifaddr))
+		return errno;
+
+	for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
+		if (NULL == ifa->ifa_addr)
+			continue;
+
+		if (!cmp_address(ifa->ifa_addr, s)) {
+			name2index = if_nametoindex(ifa->ifa_name);
+			break;
+		}
+	}
+
+	freeifaddrs(ifaddr);
+
+	return name2index;
+}
+
+static struct nl_addr *get_neigh_mac(struct get_neigh_handler *neigh_handler)
+{
+	struct rtnl_neigh *neigh;
+	struct nl_addr *ll_addr = NULL;
+
+	/* future optimization - if link local address - parse address and
+	 * return mac */
+	neigh = rtnl_neigh_get(neigh_handler->neigh_cache,
+			       neigh_handler->oif,
+			       neigh_handler->dst);
+	if (NULL == neigh) {
+		print_dbg("Neigh isn't at cache\n");
+		return NULL;
+	}
+
+	ll_addr = rtnl_neigh_get_lladdr(neigh);
+	if (NULL == ll_addr)
+		print_err("Failed to get hw address from neighbour\n");
+	else
+		ll_addr = nl_addr_clone(ll_addr);
+
+	rtnl_neigh_put(neigh);
+	return ll_addr;
+}
+
+static void get_neigh_cb_event(struct nl_object *obj, void *arg)
+{
+	struct get_neigh_handler *neigh_handler =
+		(struct get_neigh_handler *)arg;
+	/* assumed serilized callback (no parallel execution of function) */
+	if (nl_object_match_filter(
+		obj,
+		(struct nl_object *)neigh_handler->filter_neigh)) {
+		struct rtnl_neigh *neigh = (struct rtnl_neigh *)obj;
+		print_dbg("Found a match for neighbour\n");
+		/* check that we didn't set it already */
+		if (NULL == neigh_handler->found_ll_addr) {
+			if (NULL == rtnl_neigh_get_lladdr(neigh)) {
+				print_err("Neighbour doesn't have a hw addr\n");
+				return;
+			}
+			neigh_handler->found_ll_addr =
+				nl_addr_clone(rtnl_neigh_get_lladdr(neigh));
+			if (NULL == neigh_handler->found_ll_addr)
+				print_err("Couldn't copy neighbour hw addr\n");
+		}
+	}
+}
+
+static int get_neigh_cb(struct nl_msg *msg, void *arg)
+{
+	struct get_neigh_handler *neigh_handler =
+		(struct get_neigh_handler *)arg;
+
+	if (nl_msg_parse(msg, &get_neigh_cb_event, neigh_handler) < 0)
+		print_err("Unknown event\n");
+
+	return NL_OK;
+}
+
+static void set_neigh_filter(struct get_neigh_handler *neigh_handler,
+			     struct rtnl_neigh *filter) {
+	neigh_handler->filter_neigh = filter;
+}
+
+static struct rtnl_neigh *create_filter_neigh_for_dst(struct nl_addr *dst_addr,
+						      int oif)
+{
+	struct rtnl_neigh *filter_neigh;
+
+	filter_neigh = rtnl_neigh_alloc();
+	if (NULL == filter_neigh) {
+		print_err("Couldn't allocate filter neigh\n");
+		return NULL;
+	}
+	rtnl_neigh_set_ifindex(filter_neigh, oif);
+	rtnl_neigh_set_dst(filter_neigh, dst_addr);
+
+	return filter_neigh;
+}
+
+#define PORT_DISCARD htons(9)
+#define SEND_PAYLOAD "H"
+
+static int create_socket(struct get_neigh_handler *neigh_handler,
+			 struct skt *addr_dst, int *psock_fd)
+{
+	int err;
+	struct skt addr_src;
+	int sock_fd;
+
+	memset(addr_dst, 0, sizeof(*addr_dst));
+	memset(&addr_src, 0, sizeof(addr_src));
+	addr_src.len = sizeof(addr_src.sktaddr);
+
+	err = nl_addr_fill_sockaddr(neigh_handler->src,
+				    &addr_src.sktaddr.s,
+				    &addr_src.len);
+	if (err) {
+		print_err("couldn't convert src to sockaddr\n");
+		return err;
+	}
+
+	addr_dst->len = sizeof(addr_dst->sktaddr);
+	err = nl_addr_fill_sockaddr(neigh_handler->dst,
+				    &addr_dst->sktaddr.s,
+				    &addr_dst->len);
+	if (err) {
+		print_err("couldn't convert dst to sockaddr\n");
+		return err;
+	}
+
+	err = set_link_port(&addr_dst->sktaddr, PORT_DISCARD,
+			    neigh_handler->oif);
+	if (err)
+		return err;
+
+	sock_fd = socket(addr_dst->sktaddr.s.sa_family,
+			 SOCK_DGRAM | SOCK_CLOEXEC, 0);
+	if (sock_fd == -1)
+		return errno ? -errno : -1;
+	err = bind(sock_fd, &addr_src.sktaddr.s, addr_src.len);
+	if (err) {
+		int bind_err = -errno;
+		print_err("Couldn't bind socket\n");
+		close(sock_fd);
+		return bind_err ?: err;
+	}
+
+	*psock_fd = sock_fd;
+
+	return 0;
+}
+
+#define NUM_OF_RETRIES 10
+#define NUM_OF_TRIES ((NUM_OF_RETRIES) + 1)
+#if NUM_OF_TRIES < 1
+#error "neigh: invalid value of NUM_OF_RETRIES"
+#endif
+static int create_timer(struct get_neigh_handler *neigh_handler)
+{
+	int user_timeout = neigh_handler->timeout/NUM_OF_TRIES;
+	struct timespec timeout = {
+		.tv_sec = user_timeout / 1000,
+		.tv_nsec = (user_timeout % 1000) * 1000000
+	};
+	struct itimerspec timer_time = {.it_value = timeout};
+	int timer_fd;
+
+	timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
+	if (-1 == timer_fd) {
+		print_err("Couldn't create timer\n");
+		return timer_fd;
+	}
+
+	if (neigh_handler->timeout) {
+		if (NUM_OF_TRIES <= 1)
+			bzero(&timer_time.it_interval,
+			      sizeof(timer_time.it_interval));
+		else
+			timer_time.it_interval = timeout;
+		if (timerfd_settime(timer_fd, 0, &timer_time, NULL)) {
+			print_err("Couldn't set timer\n");
+			return -1;
+		}
+	}
+
+	return timer_fd;
+}
+
+#define UDP_SOCKET_MAX_SENDTO 100000ULL
+
+static struct nl_addr *process_get_neigh_mac(
+		struct get_neigh_handler *neigh_handler)
+{
+	int err;
+	struct nl_addr *ll_addr = get_neigh_mac(neigh_handler);
+	struct rtnl_neigh *neigh_filter;
+	fd_set fdset;
+	int sock_fd;
+	int fd;
+	int nfds;
+	int timer_fd;
+	int ret;
+	struct skt addr_dst;
+	char buff[sizeof(SEND_PAYLOAD)] = SEND_PAYLOAD;
+	int retries = 0;
+	uint64_t max_count = UDP_SOCKET_MAX_SENDTO;
+
+	if (NULL != ll_addr)
+		return ll_addr;
+
+	err = nl_socket_add_membership(neigh_handler->sock,
+				       RTNLGRP_NEIGH);
+	if (err < 0) {
+		print_err("%s\n", nl_geterror(err));
+		return NULL;
+	}
+
+	neigh_filter = create_filter_neigh_for_dst(neigh_handler->dst,
+						   neigh_handler->oif);
+	if (NULL == neigh_filter)
+		return NULL;
+
+	set_neigh_filter(neigh_handler, neigh_filter);
+
+#ifdef HAVE_LIBNL1
+	nl_disable_sequence_check(neigh_handler->sock);
+#else
+	nl_socket_disable_seq_check(neigh_handler->sock);
+#endif
+	nl_socket_modify_cb(neigh_handler->sock, NL_CB_VALID, NL_CB_CUSTOM,
+			    &get_neigh_cb, neigh_handler);
+
+	fd = nl_socket_get_fd(neigh_handler->sock);
+
+	err = create_socket(neigh_handler, &addr_dst, &sock_fd);
+
+	if (err)
+		return NULL;
+
+	do {
+		err = sendto(sock_fd, buff, sizeof(buff), 0,
+			     &addr_dst.sktaddr.s,
+			     addr_dst.len);
+		if (err > 0)
+			err = 0;
+	} while (-1 == err && EADDRNOTAVAIL == errno && --max_count);
+
+	if (err) {
+		print_err("Failed to send packet %d", err);
+		goto close_socket;
+	}
+	timer_fd = create_timer(neigh_handler);
+	if (timer_fd < 0)
+		goto close_socket;
+
+	nfds = MAX(fd, timer_fd) + 1;
+
+
+	while (1) {
+		FD_ZERO(&fdset);
+		FD_SET(fd, &fdset);
+		FD_SET(timer_fd, &fdset);
+
+		/* wait for an incoming message on the netlink socket */
+		ret = select(nfds, &fdset, NULL, NULL, NULL);
+
+		if (ret) {
+			if (FD_ISSET(fd, &fdset)) {
+				nl_recvmsgs_default(neigh_handler->sock);
+				if (neigh_handler->found_ll_addr)
+					break;
+			} else {
+				nl_cache_refill(neigh_handler->sock,
+						neigh_handler->neigh_cache);
+				ll_addr = get_neigh_mac(neigh_handler);
+				if (NULL != ll_addr) {
+					break;
+				} else if (FD_ISSET(timer_fd, &fdset) &&
+					   retries < NUM_OF_RETRIES) {
+					if (sendto(sock_fd, buff, sizeof(buff),
+						   0, &addr_dst.sktaddr.s,
+						   addr_dst.len) < 0)
+						print_err("Failed to send "
+							  "packet while waiting"
+							  " for events\n");
+				}
+			}
+
+			if (FD_ISSET(timer_fd, &fdset)) {
+				uint64_t read_val;
+				if (read(timer_fd, &read_val, sizeof(read_val))
+				    < 0)
+					print_dbg("Read timer_fd failed\n");
+				if (++retries >=  NUM_OF_TRIES) {
+					print_dbg("Timeout while trying to fetch "
+						  "neighbours\n");
+					break;
+				}
+			}
+		}
+	}
+	close(timer_fd);
+close_socket:
+	close(sock_fd);
+	return ll_addr ? ll_addr : neigh_handler->found_ll_addr;
+}
+
+
+static int get_mcast_mac_ipv4(struct nl_addr *dst, struct nl_addr **ll_addr)
+{
+	char mac_addr[6] = {0x01, 0x00, 0x5E};
+	uint32_t addr = ntohl(*(uint32_t *)nl_addr_get_binary_addr(dst));
+	mac_addr[5] = addr & 0xFF;
+	addr >>= 8;
+	mac_addr[4] = addr & 0xFF;
+	addr >>= 8;
+	mac_addr[3] = addr & 0x7F;
+
+	*ll_addr = nl_addr_build(AF_LLC, mac_addr, sizeof(mac_addr));
+
+	return *ll_addr == NULL ? -EINVAL : 0;
+}
+
+static int get_mcast_mac_ipv6(struct nl_addr *dst, struct nl_addr **ll_addr)
+{
+	char mac_addr[6] = {0x33, 0x33};
+	memcpy(mac_addr + 2, (char *)nl_addr_get_binary_addr(dst) + 12, 4);
+
+	*ll_addr = nl_addr_build(AF_LLC, mac_addr, sizeof(mac_addr));
+
+	return *ll_addr == NULL ? -EINVAL : 0;
+}
+
+static int get_link_local_mac_ipv6(struct nl_addr *dst,
+				   struct nl_addr **ll_addr)
+{
+	char mac_addr[6];
+
+	memcpy(mac_addr + 3, (char *)nl_addr_get_binary_addr(dst) + 13, 3);
+	memcpy(mac_addr, (char *)nl_addr_get_binary_addr(dst) + 8, 3);
+	mac_addr[0] ^= 2;
+
+	*ll_addr = nl_addr_build(AF_LLC, mac_addr, sizeof(mac_addr));
+	return *ll_addr == NULL ? -EINVAL : 0;
+}
+
+const struct encoded_l3_addr {
+	short family;
+	uint8_t prefix_bits;
+	const char data[16];
+	int (*getter)(struct nl_addr *dst, struct nl_addr **ll_addr);
+} encoded_prefixes[] = {
+	{.family = AF_INET,
+	 .prefix_bits = 4,
+	 .data = {0xe0},
+	 .getter = &get_mcast_mac_ipv4},
+	{.family = AF_INET6,
+	 .prefix_bits = 8,
+	 .data = {0xff},
+	 .getter = &get_mcast_mac_ipv6},
+	{.family = AF_INET6,
+	 .prefix_bits = 64,
+	 .data = {0xfe, 0x80},
+	 .getter = get_link_local_mac_ipv6},
+};
+
+int nl_addr_cmp_prefix_msb(void *addr1, int len1, void *addr2, int len2)
+{
+	int len = MIN(len1, len2);
+	int bytes = len / 8;
+	int d = memcmp(addr1, addr2, bytes);
+
+	if (d == 0) {
+		int mask = ((1UL << (len % 8)) - 1UL) << (8 - len);
+
+		d = (((char *)addr1)[bytes] & mask) -
+		    (((char *)addr2)[bytes] & mask);
+	}
+
+	return d;
+}
+static int handle_encoded_mac(struct nl_addr *dst, struct nl_addr **ll_addr)
+{
+	uint32_t family = nl_addr_get_family(dst);
+	struct nl_addr *prefix = NULL;
+	int i;
+	int ret = 1;
+
+	for (i = 0;
+	     i < sizeof(encoded_prefixes)/sizeof(encoded_prefixes[0]) &&
+	     ret; prefix = NULL, i++) {
+		if (encoded_prefixes[i].family != family)
+			continue;
+
+		prefix = nl_addr_build(
+				family,
+				(void *)encoded_prefixes[i].data,
+				MIN(encoded_prefixes[i].prefix_bits/8 +
+				    !!(encoded_prefixes[i].prefix_bits % 8),
+				    sizeof(encoded_prefixes[i].data)));
+
+		if (NULL == prefix)
+			return -ENOMEM;
+		nl_addr_set_prefixlen(prefix,
+				      encoded_prefixes[i].prefix_bits);
+
+		if (nl_addr_cmp_prefix_msb(nl_addr_get_binary_addr(dst),
+					   nl_addr_get_prefixlen(dst),
+					   nl_addr_get_binary_addr(prefix),
+					   nl_addr_get_prefixlen(prefix)))
+			continue;
+
+		ret = encoded_prefixes[i].getter(dst, ll_addr);
+#ifdef HAVE_LIBNL1
+		nl_addr_destroy(prefix);
+#else
+		nl_addr_put(prefix);
+#endif
+	}
+
+	return ret;
+}
+
+static void get_route_cb_parser(struct nl_object *obj, void *arg)
+{
+	struct get_neigh_handler *neigh_handler =
+		(struct get_neigh_handler *)arg;
+
+	struct rtnl_route *route = (struct rtnl_route *)obj;
+	struct nl_addr *gateway;
+	struct nl_addr *src = rtnl_route_get_pref_src(route);
+	int oif;
+	int type = rtnl_route_get_type(route);
+	struct rtnl_link *link;
+
+#ifdef HAVE_LIBNL1
+	gateway = rtnl_route_get_gateway(route);
+	oif = rtnl_route_get_oif(route);
+#else
+	struct rtnl_nexthop *nh = rtnl_route_nexthop_n(route, 0);
+	if (!nh)
+		print_err("Out of memory\n");
+	gateway = rtnl_route_nh_get_gateway(nh);
+	oif = rtnl_route_nh_get_ifindex(nh);
+#endif
+
+	if (gateway) {
+#ifdef HAVE_LIBNL1
+		nl_addr_destroy(neigh_handler->dst);
+#else
+		nl_addr_put(neigh_handler->dst);
+#endif
+		neigh_handler->dst = nl_addr_clone(gateway);
+		print_dbg("Found gateway\n");
+	}
+
+	if (RTN_BLACKHOLE == type ||
+	    RTN_UNREACHABLE == type ||
+	    RTN_PROHIBIT == type ||
+	    RTN_THROW == type) {
+		print_err("Destination unrechable (type %d)\n", type);
+		goto err;
+	}
+
+	if (!neigh_handler->src && src)
+		neigh_handler->src = nl_addr_clone(src);
+
+	if (neigh_handler->oif < 0 && oif > 0)
+		neigh_handler->oif = oif;
+
+	/* Link Local */
+	if (RTN_LOCAL == type) {
+		struct nl_addr *lladdr;
+
+		link = rtnl_link_get(neigh_handler->link_cache,
+				     neigh_handler->oif);
+
+		if (NULL == link)
+			goto err;
+
+		lladdr = rtnl_link_get_addr(link);
+
+		if (NULL == lladdr)
+			goto err_link;
+
+		neigh_handler->found_ll_addr = nl_addr_clone(lladdr);
+		rtnl_link_put(link);
+	} else {
+		if (!handle_encoded_mac(
+				neigh_handler->dst,
+				&neigh_handler->found_ll_addr))
+			print_address(stderr, "calculated address %s\n",
+				      neigh_handler->found_ll_addr);
+	}
+
+	print_address(stderr, "Current dst %s\n", neigh_handler->dst);
+	return;
+
+err_link:
+	rtnl_link_put(link);
+err:
+	if (neigh_handler->src) {
+#ifdef HAVE_LIBNL1
+		nl_addr_put(neigh_handler->src);
+#else
+		nl_addr_put(neigh_handler->src);
+#endif
+		neigh_handler->src = NULL;
+	}
+}
+
+static int get_route_cb(struct nl_msg *msg, void *arg)
+{
+	struct get_neigh_handler *neigh_handler =
+		(struct get_neigh_handler *)arg;
+	int err;
+
+	err = nl_msg_parse(msg, &get_route_cb_parser, neigh_handler);
+	if (err < 0) {
+		print_err("Unable to parse object: %s", nl_geterror(err));
+		return err;
+	}
+
+	if (!neigh_handler->dst || !neigh_handler->src ||
+	    neigh_handler->oif <= 0) {
+		print_err("Missing params\n");
+		return -1;
+	}
+
+	if (NULL != neigh_handler->found_ll_addr)
+		goto found;
+
+	neigh_handler->found_ll_addr =
+		process_get_neigh_mac(neigh_handler);
+	if (neigh_handler->found_ll_addr)
+		print_address(stderr, "Neigh is %s\n",
+			      neigh_handler->found_ll_addr);
+
+found:
+	return neigh_handler->found_ll_addr ? 0 : -1;
+}
+
+int neigh_get_oif_from_src(struct get_neigh_handler *neigh_handler)
+{
+	int oif = -ENODEV;
+	struct addrinfo *src_info;
+
+#ifdef HAVE_LIBNL1
+	src_info = nl_addr_info(neigh_handler->src);
+	if (NULL == src_info) {
+#else
+	int err;
+	err = nl_addr_info(neigh_handler->src, &src_info);
+	if (err) {
+#endif
+		print_err("Unable to get address info %s\n",
+			  nl_geterror(err));
+		return oif;
+	}
+
+	oif = get_ifindex(src_info->ai_addr);
+	if (oif <= 0)
+		goto free;
+
+	print_dbg("IF index is %d\n", oif);
+
+free:
+	freeaddrinfo(src_info);
+	return oif;
+}
+
+static void destroy_zero_based_socket(void)
+{
+	if (NULL != zero_socket) {
+		print_dbg("destroying zero based socket\n");
+#ifdef HAVE_LIBNL1
+		nl_handle_destroy(zero_socket);
+#else
+		nl_socket_free(zero_socket);
+#endif
+	}
+}
+
+static void alloc_zero_based_socket(void)
+{
+#ifdef HAVE_LIBNL1
+	zero_socket = nl_handle_alloc();
+#else
+	zero_socket = nl_socket_alloc();
+#endif
+	print_dbg("creating zero based socket\n");
+	atexit(&destroy_zero_based_socket);
+}
+
+int neigh_init_resources(struct get_neigh_handler *neigh_handler, int timeout)
+{
+	int err;
+
+	pthread_once(&device_neigh_alloc, &alloc_zero_based_socket);
+#ifdef HAVE_LIBNL1
+	neigh_handler->sock = nl_handle_alloc();
+#else
+	neigh_handler->sock = nl_socket_alloc();
+#endif
+	if (NULL == neigh_handler->sock) {
+		print_err("Unable to allocate netlink socket\n");
+		return -ENOMEM;
+	}
+
+	err = nl_connect(neigh_handler->sock, NETLINK_ROUTE);
+	if (err < 0) {
+		print_err("Unable to connect netlink socket: %s\n",
+			  nl_geterror(err));
+		goto free_socket;
+	}
+
+#ifdef HAVE_LIBNL1
+	neigh_handler->link_cache =
+		rtnl_link_alloc_cache(neigh_handler->sock);
+	if (NULL == neigh_handler->link_cache) {
+		err = -ENOMEM;
+#else
+
+	err = rtnl_link_alloc_cache(neigh_handler->sock, AF_UNSPEC,
+				    &neigh_handler->link_cache);
+	if (err) {
+#endif
+		print_err("Unable to allocate link cache: %s\n",
+			  nl_geterror(err));
+		err = -ENOMEM;
+		goto close_connection;
+	}
+
+	nl_cache_mngt_provide(neigh_handler->link_cache);
+
+#ifdef HAVE_LIBNL1
+	neigh_handler->route_cache =
+		rtnl_route_alloc_cache(neigh_handler->sock);
+	if (NULL == neigh_handler->route_cache) {
+		err = -ENOMEM;
+#else
+	err = rtnl_route_alloc_cache(neigh_handler->sock, AF_UNSPEC, 0,
+				     &neigh_handler->route_cache);
+	if (err) {
+#endif
+		print_err("Unable to allocate route cache: %s\n",
+			  nl_geterror(err));
+		goto free_link_cache;
+	}
+
+	nl_cache_mngt_provide(neigh_handler->route_cache);
+
+#ifdef HAVE_LIBNL1
+	neigh_handler->neigh_cache =
+		rtnl_neigh_alloc_cache(neigh_handler->sock);
+	if (NULL == neigh_handler->neigh_cache) {
+		err = -ENOMEM;
+#else
+	err = rtnl_neigh_alloc_cache(neigh_handler->sock,
+				     &neigh_handler->neigh_cache);
+	if (err) {
+#endif
+		print_err("Couldn't allocate neigh cache %s\n",
+			  nl_geterror(err));
+		goto free_route_cache;
+	}
+
+	nl_cache_mngt_provide(neigh_handler->neigh_cache);
+
+	/* init structure */
+	neigh_handler->timeout = timeout;
+	neigh_handler->oif = -1;
+	neigh_handler->filter_neigh = NULL;
+	neigh_handler->found_ll_addr = NULL;
+	neigh_handler->dst = NULL;
+	neigh_handler->src = NULL;
+	neigh_handler->vid = -1;
+	neigh_handler->neigh_status = GET_NEIGH_STATUS_OK;
+
+	return 0;
+
+free_route_cache:
+	nl_cache_mngt_unprovide(neigh_handler->route_cache);
+	nl_cache_free(neigh_handler->route_cache);
+	neigh_handler->route_cache = NULL;
+free_link_cache:
+	nl_cache_mngt_unprovide(neigh_handler->link_cache);
+	nl_cache_free(neigh_handler->link_cache);
+	neigh_handler->link_cache = NULL;
+close_connection:
+	nl_close(neigh_handler->sock);
+free_socket:
+#ifdef HAVE_LIBNL1
+	nl_handle_destroy(neigh_handler->sock);
+		nl_handle_destroy(zero_socket);
+#else
+	nl_socket_free(neigh_handler->sock);
+#endif
+	neigh_handler->sock = NULL;
+	return err;
+}
+
+uint16_t neigh_get_vlan_id_from_dev(struct get_neigh_handler *neigh_handler)
+{
+	struct rtnl_link *link;
+	int vid = 0xffff;
+
+	link = rtnl_link_get(neigh_handler->link_cache, neigh_handler->oif);
+	if (NULL == link) {
+		print_err("Can't find link in cache\n");
+		return -EINVAL;
+	}
+
+#ifndef HAVE_LIBNL1
+	if (rtnl_link_is_vlan(link))
+#endif
+		vid = rtnl_link_vlan_get_id(link);
+	rtnl_link_put(link);
+	return vid >= 0 && vid <= 0xfff ? vid : 0xffff;
+}
+
+void neigh_set_vlan_id(struct get_neigh_handler *neigh_handler, uint16_t vid)
+{
+	if (vid >= 0 && vid <= 0xfff)
+		neigh_handler->vid = vid;
+}
+
+int neigh_set_dst(struct get_neigh_handler *neigh_handler,
+		  int family, void *buf, size_t size)
+{
+	neigh_handler->dst = nl_addr_build(family, buf, size);
+	return NULL == neigh_handler->dst;
+}
+
+int neigh_set_src(struct get_neigh_handler *neigh_handler,
+		  int family, void *buf, size_t size)
+{
+	neigh_handler->src = nl_addr_build(family, buf, size);
+	return NULL == neigh_handler->src;
+}
+
+void neigh_set_oif(struct get_neigh_handler *neigh_handler, int oif)
+{
+	neigh_handler->oif = oif;
+}
+
+int neigh_get_ll(struct get_neigh_handler *neigh_handler, void *addr_buff,
+		 int addr_size) {
+	int neigh_len;
+
+	if (NULL == neigh_handler->found_ll_addr)
+		return -EINVAL;
+
+	 neigh_len = nl_addr_get_len(neigh_handler->found_ll_addr);
+
+	if (neigh_len > addr_size)
+		return -EINVAL;
+
+	memcpy(addr_buff, nl_addr_get_binary_addr(neigh_handler->found_ll_addr),
+	       neigh_len);
+
+	return neigh_len;
+}
+
+void neigh_free_resources(struct get_neigh_handler *neigh_handler)
+{
+	/* Should be released first because it's holding a reference to dst */
+	if (NULL != neigh_handler->filter_neigh) {
+		rtnl_neigh_put(neigh_handler->filter_neigh);
+		neigh_handler->filter_neigh = NULL;
+	}
+
+	if (NULL != neigh_handler->src) {
+#ifdef HAVE_LIBNL1
+		nl_addr_put(neigh_handler->src);
+#else
+		nl_addr_put(neigh_handler->src);
+#endif
+		neigh_handler->src = NULL;
+	}
+
+	if (NULL != neigh_handler->dst) {
+#ifdef HAVE_LIBNL1
+		nl_addr_destroy(neigh_handler->dst);
+#else
+		nl_addr_put(neigh_handler->dst);
+#endif
+		neigh_handler->dst = NULL;
+	}
+
+	if (NULL != neigh_handler->found_ll_addr) {
+#ifdef HAVE_LIBNL1
+		nl_addr_destroy(neigh_handler->found_ll_addr);
+#else
+		nl_addr_put(neigh_handler->found_ll_addr);
+#endif
+		neigh_handler->found_ll_addr = NULL;
+	}
+
+	if (NULL != neigh_handler->neigh_cache) {
+		nl_cache_mngt_unprovide(neigh_handler->neigh_cache);
+		nl_cache_free(neigh_handler->neigh_cache);
+		neigh_handler->neigh_cache = NULL;
+	}
+
+	if (NULL != neigh_handler->route_cache) {
+		nl_cache_mngt_unprovide(neigh_handler->route_cache);
+		nl_cache_free(neigh_handler->route_cache);
+		neigh_handler->route_cache = NULL;
+	}
+
+	if (NULL != neigh_handler->link_cache) {
+		nl_cache_mngt_unprovide(neigh_handler->link_cache);
+		nl_cache_free(neigh_handler->link_cache);
+		neigh_handler->link_cache = NULL;
+	}
+
+	if (NULL != neigh_handler->sock) {
+		nl_close(neigh_handler->sock);
+#ifdef HAVE_LIBNL1
+		nl_handle_destroy(neigh_handler->sock);
+#else
+		nl_socket_free(neigh_handler->sock);
+#endif
+		neigh_handler->sock = NULL;
+	}
+}
+
+int process_get_neigh(struct get_neigh_handler *neigh_handler)
+{
+	struct nl_msg *m;
+	struct rtmsg rmsg = {
+		.rtm_family = nl_addr_get_family(neigh_handler->dst),
+		.rtm_dst_len = nl_addr_get_prefixlen(neigh_handler->dst),
+	};
+	int err;
+	int last_status;
+
+	last_status = __sync_fetch_and_or(
+			&neigh_handler->neigh_status,
+			GET_NEIGH_STATUS_IN_PROCESS);
+
+	if (last_status != GET_NEIGH_STATUS_OK)
+		return -EINPROGRESS;
+
+	m = nlmsg_alloc_simple(RTM_GETROUTE, 0);
+
+	if (NULL == m)
+		return -ENOMEM;
+
+	nlmsg_append(m, &rmsg, sizeof(rmsg), NLMSG_ALIGNTO);
+
+	nla_put_addr(m, RTA_DST, neigh_handler->dst);
+
+	if (neigh_handler->oif > 0)
+		nla_put_u32(m, RTA_OIF, neigh_handler->oif);
+
+	err = nl_send_auto_complete(neigh_handler->sock, m);
+	nlmsg_free(m);
+	if (err < 0) {
+		print_err("%s", nl_geterror(err));
+		return err;
+	}
+
+	nl_socket_modify_cb(neigh_handler->sock, NL_CB_VALID,
+			    NL_CB_CUSTOM, &get_route_cb, neigh_handler);
+
+	err = nl_recvmsgs_default(neigh_handler->sock);
+	if (err < 0) {
+		print_err("%s", nl_geterror(err));
+		__sync_fetch_and_or(
+			&neigh_handler->neigh_status,
+			GET_NEIGH_STATUS_ERR);
+	}
+
+	__sync_fetch_and_and(
+		&neigh_handler->neigh_status,
+		~GET_NEIGH_STATUS_IN_PROCESS);
+
+	return err;
+}
diff --git a/src/neigh.h b/src/neigh.h
new file mode 100644
index 0000000..09e01e1
--- /dev/null
+++ b/src/neigh.h
@@ -0,0 +1,53 @@
+#ifndef _GET_NEIGH_
+#define _GET_NEIGH_
+
+#include <stddef.h>
+#include <stdint.h>
+#include "config.h"
+#ifdef HAVE_LIBNL1
+#include <netlink/object.h>
+#else
+#include <netlink/object-api.h>
+#endif
+
+enum get_neigh_status {
+	GET_NEIGH_STATUS_OK = 0,
+	GET_NEIGH_STATUS_IN_PROCESS = 1 << 0,
+	GET_NEIGH_STATUS_ERR = 1 << 1,
+};
+
+struct get_neigh_handler {
+#ifdef HAVE_LIBNL1
+	struct nl_handle *sock;
+#else
+	struct nl_sock *sock;
+#endif
+	struct nl_cache *link_cache;
+	struct nl_cache	*neigh_cache;
+	struct nl_cache *route_cache;
+	int32_t oif;
+	int vid;
+	struct rtnl_neigh *filter_neigh;
+	struct nl_addr *found_ll_addr;
+	struct nl_addr *dst;
+	struct nl_addr *src;
+	enum get_neigh_status neigh_status;
+	uint64_t timeout;
+};
+
+int process_get_neigh(struct get_neigh_handler *neigh_handler);
+void neigh_free_resources(struct get_neigh_handler *neigh_handler);
+void neigh_set_vlan_id(struct get_neigh_handler *neigh_handler, uint16_t vid);
+uint16_t neigh_get_vlan_id_from_dev(struct get_neigh_handler *neigh_handler);
+int neigh_init_resources(struct get_neigh_handler *neigh_handler, int timeout);
+
+int neigh_set_src(struct get_neigh_handler *neigh_handler,
+		  int family, void *buf, size_t size);
+void neigh_set_oif(struct get_neigh_handler *neigh_handler, int oif);
+int neigh_set_dst(struct get_neigh_handler *neigh_handler,
+		  int family, void *buf, size_t size);
+int neigh_get_oif_from_src(struct get_neigh_handler *neigh_handler);
+int neigh_get_ll(struct get_neigh_handler *neigh_handler, void *addr_buf,
+		 int addr_size);
+
+#endif
diff --git a/src/verbs.c b/src/verbs.c
index a6aae70..22bb996 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -43,6 +43,11 @@
 #include <string.h>
 
 #include "ibverbs.h"
+#ifndef NRESOLVE_NEIGH
+#include <net/if.h>
+#include <net/if_arp.h>
+#include "neigh.h"
+#endif
 
 int ibv_rate_to_mult(enum ibv_rate rate)
 {
@@ -591,3 +596,141 @@ int __ibv_detach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid
 	return qp->context->ops.detach_mcast(qp, gid, lid);
 }
 default_symver(__ibv_detach_mcast, ibv_detach_mcast);
+
+static inline int ipv6_addr_v4mapped(const struct in6_addr *a)
+{
+	return ((a->s6_addr32[0] | a->s6_addr32[1]) |
+		(a->s6_addr32[2] ^ htonl(0x0000ffff))) == 0UL ||
+		/* IPv4 encoded multicast addresses */
+	       (a->s6_addr32[0]  == htonl(0xff0e0000) &&
+		((a->s6_addr32[1] |
+		  (a->s6_addr32[2] ^ htonl(0x0000ffff))) == 0UL));
+}
+
+
+struct peer_address {
+	void *address;
+	uint32_t size;
+};
+
+static inline int create_peer_from_gid(int family, void *raw_gid,
+				       struct peer_address *peer_address)
+{
+	switch (family) {
+	case AF_INET:
+		peer_address->address = raw_gid + 12;
+		peer_address->size = 4;
+		break;
+	case AF_INET6:
+		peer_address->address = raw_gid;
+		peer_address->size = 16;
+		break;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+#define NEIGH_GET_DEFAULT_TIMEOUT_MS 3000
+int ibv_resolve_eth_l2_from_gid(struct ibv_context *context,
+				struct ibv_ah_attr *attr,
+				uint8_t eth_mac[ETHERNET_LL_SIZE], uint16_t *vid)
+{
+#ifndef NRESOLVE_NEIGH
+	int dst_family;
+	int src_family;
+	int oif;
+	struct get_neigh_handler neigh_handler;
+	union ibv_gid sgid;
+	int ether_len;
+	struct peer_address src;
+	struct peer_address dst;
+	uint16_t ret_vid;
+	int ret = -EINVAL;
+	int err;
+
+	err = ibv_query_gid(context, attr->port_num,
+			    attr->grh.sgid_index, &sgid);
+
+	if (err) {
+		fprintf(stderr, PFX "ibv_create_ah failed to query sgid.\n");
+		return err;
+	}
+
+	err = neigh_init_resources(&neigh_handler, NEIGH_GET_DEFAULT_TIMEOUT_MS);
+
+	if (err)
+		return err;
+
+	dst_family = ipv6_addr_v4mapped((struct in6_addr *)attr->grh.dgid.raw) ?
+			AF_INET : AF_INET6;
+	src_family = ipv6_addr_v4mapped((struct in6_addr *)sgid.raw) ?
+			AF_INET : AF_INET6;
+
+	if (create_peer_from_gid(dst_family, attr->grh.dgid.raw, &dst)) {
+		fprintf(stderr, PFX
+			"ibv_create_ah failed to create dst peer\n");
+		goto free_resources;
+	}
+	if (create_peer_from_gid(src_family, &sgid.raw, &src)) {
+		fprintf(stderr, PFX
+			"ibv_create_ah failed to create src peer\n");
+		goto free_resources;
+	}
+	if (neigh_set_dst(&neigh_handler, dst_family, dst.address,
+			  dst.size)) {
+		fprintf(stderr, PFX
+			"ibv_create_ah failed to create dst addr\n");
+		goto free_resources;
+	}
+
+	if (neigh_set_src(&neigh_handler, src_family, src.address,
+			  src.size)) {
+		fprintf(stderr, PFX
+			"ibv_create_ah failed to create src addr\n");
+		goto free_resources;
+	}
+
+	oif = neigh_get_oif_from_src(&neigh_handler);
+
+	if (oif > 0) {
+		neigh_set_oif(&neigh_handler, oif);
+	} else {
+		fprintf(stderr, PFX "ibv_create_ah failed to get output IF\n");
+		goto free_resources;
+	}
+
+	ret = -EHOSTUNREACH;
+
+	/* blocking call */
+	if (process_get_neigh(&neigh_handler)) {
+		fprintf(stderr, PFX "Neigh resolution process failed\n");
+		goto free_resources;
+	}
+
+	ret_vid = neigh_get_vlan_id_from_dev(&neigh_handler);
+
+	if (ret_vid <= 0xfff)
+		neigh_set_vlan_id(&neigh_handler, ret_vid);
+
+	/* We are using only Ethernet here */
+	ether_len = neigh_get_ll(&neigh_handler,
+				 eth_mac,
+				 sizeof(eth_mac));
+
+	if (ether_len <= 0)
+		goto free_resources;
+
+	*vid = ret_vid;
+
+	ret = 0;
+
+free_resources:
+	neigh_free_resources(&neigh_handler);
+
+	return ret;
+#else
+	return -ENOIMPL;
+#endif
+}
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH libibverbs V5 0/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found] ` <1408517381-17523-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2014-08-20  6:49   ` [PATCH libibverbs V5 1/2] Add ibv_port_cap_flags Matan Barak
  2014-08-20  6:49   ` [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution Matan Barak
@ 2014-08-20 16:26   ` Jason Gunthorpe
  2 siblings, 0 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2014-08-20 16:26 UTC (permalink / raw)
  To: Matan Barak
  Cc: Roland Dreier, Yishai Hadas, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, Or Gerlitz

On Wed, Aug 20, 2014 at 09:49:39AM +0300, Matan Barak wrote:
> Hi Roland,
> 
> This series adds support for Ethernet L2 address resolution for RoCE UD QPs,
> whose L2 address-handles, unlike RC/UC/XRC/etc QPs are set from user space
> without going through uverbs and the kernel IB core. The code is also
> compatible both with old kernels that don't run IP based addressing.

I think this looks much better now, avoiding churning the public API
has made this much less risky..

I'm not sure Roland's feeling on the subject, but I would like to see
your code be more consistent with the rest of verbs:
 - No 'reversed equal' ie verbs doesn't do: 'NULL == foo'
 - Pointless '^' equality 'optimization'
 - Wouldn't hurt to fix the various checkpatch warnings (for the most
   part verbs is in the style of Linux)
 - System libraries should not be randomly printing things in error
   cases. Use errno properly.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found]     ` <1408517381-17523-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2014-08-20 17:01       ` Jason Gunthorpe
       [not found]         ` <20140820170142.GC12605-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2014-08-20 17:01 UTC (permalink / raw)
  To: Matan Barak
  Cc: Roland Dreier, Yishai Hadas, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, Or Gerlitz

On Wed, Aug 20, 2014 at 09:49:41AM +0300, Matan Barak wrote:

> +#define MAX(a, b) ((a) > (b) ? (a) : (b))
> +#define MIN(a, b) ((a) < (b) ? (a) : (b))
> +#define print_hdr PFX "resolver: "
> +#define print_err(...) fprintf(stderr, print_hdr __VA_ARGS__)

System libraries should really not be printing things on error
paths...

> +static int set_link_port(union sktaddr *s, int port, int oif)
> +{
> +	switch (s->s.sa_family) {
> +	case AF_INET:
> +		s->s4.sin_port = port;
> +		break;
> +	case AF_INET6:
> +		s->s6.sin6_port = port;
> +		s->s6.sin6_scope_id = oif;

Does flowlabel get initialized someplace?

> +static int cmp_address(const struct sockaddr *s1,
> +		       const struct sockaddr *s2) {

'cmp' functions that only test equality should return bool for
clarity. Otherwise people will assume the usual -1,0,1 return style,
which this does not implement.

> +	if (s1->sa_family != s2->sa_family)
> +		return s1->sa_family ^ s2->sa_family;

Ditch the ^ for compare, pointless.

> +static struct nl_addr *get_neigh_mac(struct get_neigh_handler *neigh_handler)
> +{
> +	struct rtnl_neigh *neigh;
> +	struct nl_addr *ll_addr = NULL;
> +
> +	/* future optimization - if link local address - parse address and
> +	 * return mac */

What does this comment refer to? MACs should never be parsed out of
IPv6 addresses.

> +	sock_fd = socket(addr_dst->sktaddr.s.sa_family,
> +			 SOCK_DGRAM | SOCK_CLOEXEC, 0);
> +	if (sock_fd == -1)
> +		return errno ? -errno : -1;

If socket fails then errno is always set, not sure the safety test is
necessary, if it is, then -1 is not the right result, it should be
-EINVAL or something.

> +	err = bind(sock_fd, &addr_src.sktaddr.s, addr_src.len);
> +	if (err) {
> +		int bind_err = -errno;
> +		print_err("Couldn't bind socket\n");
> +		close(sock_fd);
> +		return bind_err ?: err;

bind does not return an errno code, so it should never be the return
result.

> +	timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
> +	if (-1 == timer_fd) {
> +		print_err("Couldn't create timer\n");
> +		return timer_fd;
> +	}

The use of timerfd will impact the minimum OS version, have you
checked this is OK? Does RHEL5 still work?

> +	while (1) {
> +		FD_ZERO(&fdset);
> +		FD_SET(fd, &fdset);
> +		FD_SET(timer_fd, &fdset);
> +
> +		/* wait for an incoming message on the netlink socket */
> +		ret = select(nfds, &fdset, NULL, NULL, NULL);

poll is a better choice here, it would also be fairly simple to remove
timerfd by using the timeout arg.

> +					if (sendto(sock_fd, buff, sizeof(buff),
> +						   0, &addr_dst.sktaddr.s,
> +						   addr_dst.len) < 0)
> +						print_err("Failed to send "
> +							  "packet while waiting"
> +							  " for
> events\n");

If the earlier sendto needs to be protected by a loop looking at
EADDRNOTAVAIL then so does this one.

I'm also not sure all this looping and complexity is needed... Why
override the kernel timers for ND?

I would think you'd check the ND table, if there is no good entry then
do the send, and then watch the ND table for a FAILED/REACHABLE
result. The kernel timers will always transition through INCOMPLETE to
one of those terminal states.

No need for more timers and retry and things, the kernel already retried.

> +static int get_mcast_mac_ipv6(struct nl_addr *dst, struct nl_addr **ll_addr)
> +{
> +	char mac_addr[6] = {0x33, 0x33};
> +	memcpy(mac_addr + 2, (char *)nl_addr_get_binary_addr(dst) + 12, 4);

(char *) should be (uint8_t *), you are not working with a string
here. Similarly for nearly all char casts. Proper use of const
throughout would be nice too..

> +const struct encoded_l3_addr {

Missing static

> +int nl_addr_cmp_prefix_msb(void *addr1, int len1, void *addr2, int len2)
> +{

Missing static

> +	struct rtnl_nexthop *nh = rtnl_route_nexthop_n(route, 0);
> +	if (!nh)
> +		print_err("Out of memory\n");
> +	gateway = rtnl_route_nh_get_gateway(nh);

And then crash?

> +err_link:
> +	rtnl_link_put(link);
> +err:
> +	if (neigh_handler->src) {
> +#ifdef HAVE_LIBNL1
> +		nl_addr_put(neigh_handler->src);

Should this be nl_addr_destroy ?

Can you do a better job with these ifdefs? I think you can get away
with a nl1_compat.h which has simple wrapper things like this:

static inline void emulate_nl_addr_put(x) {return nl_addr_destroy(x);}
#define nl_addr_put(x) emulate_nl_addr_put(x)

And purge the ifdefery from the main source.
> +
> +	last_status = __sync_fetch_and_or(
> +			&neigh_handler->neigh_status,
> +			GET_NEIGH_STATUS_IN_PROCESS);

Is there a thread running around in here someplace?

Callina raw gcc intrinsic like this should really be avoided,
especially since this isn't a performance path. Use pthreads.

> +static inline int ipv6_addr_v4mapped(const struct in6_addr *a)
> +{
> +	return ((a->s6_addr32[0] | a->s6_addr32[1]) |
> +		(a->s6_addr32[2] ^ htonl(0x0000ffff))) == 0UL ||
> +		/* IPv4 encoded multicast addresses */
> +	       (a->s6_addr32[0]  == htonl(0xff0e0000) &&
> +		((a->s6_addr32[1] |
> +		  (a->s6_addr32[2] ^ htonl(0x0000ffff))) == 0UL));
> +}

Don't duplicate IN6_IS_ADDR_V4MAPPED

> +	if (err) {
> +		fprintf(stderr, PFX "ibv_create_ah failed to query
> sgid.\n");

Again, system libraries should not be printing on error, use errno
properly.

> +#else
> +	return -ENOIMPL;

Is that a typo of ENOSYS?

Be sure you have tested compiling with each and every variation of the
#ifdefs added.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found]         ` <20140820170142.GC12605-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2014-08-24 13:51           ` Matan Barak
       [not found]             ` <53F9EDCD.9060105-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Matan Barak @ 2014-08-24 13:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Roland Dreier, Yishai Hadas, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, Or Gerlitz

On 20/8/2014 8:01 PM, Jason Gunthorpe wrote:
> On Wed, Aug 20, 2014 at 09:49:41AM +0300, Matan Barak wrote:
>
>> +#define MAX(a, b) ((a) > (b) ? (a) : (b))
>> +#define MIN(a, b) ((a) < (b) ? (a) : (b))
>> +#define print_hdr PFX "resolver: "
>> +#define print_err(...) fprintf(stderr, print_hdr __VA_ARGS__)
>
> System libraries should really not be printing things on error
> paths...

Ok, I'll only set the errno value.

>
>> +static int set_link_port(union sktaddr *s, int port, int oif)
>> +{
>> +	switch (s->s.sa_family) {
>> +	case AF_INET:
>> +		s->s4.sin_port = port;
>> +		break;
>> +	case AF_INET6:
>> +		s->s6.sin6_port = port;
>> +		s->s6.sin6_scope_id = oif;
>
> Does flowlabel get initialized someplace?
>

Do you refer to flowinfo?
If so, it isn't initialized, but should be.
0 should be good enough, so I'll memset the whole struct.

>> +static int cmp_address(const struct sockaddr *s1,
>> +		       const struct sockaddr *s2) {
>
> 'cmp' functions that only test equality should return bool for
> clarity. Otherwise people will assume the usual -1,0,1 return style,
> which this does not implement.
>

Ok, will be fixed.

>> +	if (s1->sa_family != s2->sa_family)
>> +		return s1->sa_family ^ s2->sa_family;
>
> Ditch the ^ for compare, pointless.
>

Ok

>> +static struct nl_addr *get_neigh_mac(struct get_neigh_handler *neigh_handler)
>> +{
>> +	struct rtnl_neigh *neigh;
>> +	struct nl_addr *ll_addr = NULL;
>> +
>> +	/* future optimization - if link local address - parse address and
>> +	 * return mac */
>
> What does this comment refer to? MACs should never be parsed out of
> IPv6 addresses.
>

GID0 should always be valid, even if no IPv4 and IPv6 are configured.
IPv6 link-local and multicast addresses should be parsed from the IPv6 
address.

>> +	sock_fd = socket(addr_dst->sktaddr.s.sa_family,
>> +			 SOCK_DGRAM | SOCK_CLOEXEC, 0);
>> +	if (sock_fd == -1)
>> +		return errno ? -errno : -1;
>
> If socket fails then errno is always set, not sure the safety test is
> necessary, if it is, then -1 is not the right result, it should be
> -EINVAL or something.
>
>> +	err = bind(sock_fd, &addr_src.sktaddr.s, addr_src.len);
>> +	if (err) {
>> +		int bind_err = -errno;
>> +		print_err("Couldn't bind socket\n");
>> +		close(sock_fd);
>> +		return bind_err ?: err;
>
> bind does not return an errno code, so it should never be the return
> result.
>

Ok

>> +	timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
>> +	if (-1 == timer_fd) {
>> +		print_err("Couldn't create timer\n");
>> +		return timer_fd;
>> +	}
>
> The use of timerfd will impact the minimum OS version, have you
> checked this is OK? Does RHEL5 still work?
>

It was added in linux v2.6.25. I think that an API that's more than 6.5 
years old is valid.


>> +	while (1) {
>> +		FD_ZERO(&fdset);
>> +		FD_SET(fd, &fdset);
>> +		FD_SET(timer_fd, &fdset);
>> +
>> +		/* wait for an incoming message on the netlink socket */
>> +		ret = select(nfds, &fdset, NULL, NULL, NULL);
>
> poll is a better choice here, it would also be fairly simple to remove
> timerfd by using the timeout arg.
>

The purpose is to have a timeout on the whole process and not per loop.

>> +					if (sendto(sock_fd, buff, sizeof(buff),
>> +						   0, &addr_dst.sktaddr.s,
>> +						   addr_dst.len) < 0)
>> +						print_err("Failed to send "
>> +							  "packet while waiting"
>> +							  " for
>> events\n");
>
> If the earlier sendto needs to be protected by a loop looking at
> EADDRNOTAVAIL then so does this one.
>

If the sendto failed, we would just retry it in the next loop.

> I'm also not sure all this looping and complexity is needed... Why
> override the kernel timers for ND?
>
> I would think you'd check the ND table, if there is no good entry then
> do the send, and then watch the ND table for a FAILED/REACHABLE
> result. The kernel timers will always transition through INCOMPLETE to
> one of those terminal states.
>
> No need for more timers and retry and things, the kernel already retried.
>

Watching just for the ND event is problematic.
If the user sent a packet just after we looked at the ND and before we 
waited for an event, we would probably just fail without finding any 
neighbor. Regarding the retries, because we use UD and switches could
sometimes take a few seconds to learn the network, I chose to do a few 
reties before giving up.


>> +static int get_mcast_mac_ipv6(struct nl_addr *dst, struct nl_addr **ll_addr)
>> +{
>> +	char mac_addr[6] = {0x33, 0x33};
>> +	memcpy(mac_addr + 2, (char *)nl_addr_get_binary_addr(dst) + 12, 4);
>
> (char *) should be (uint8_t *), you are not working with a string
> here. Similarly for nearly all char casts. Proper use of const
> throughout would be nice too..
>
>> +const struct encoded_l3_addr {
>
> Missing static
>

Ok

>> +int nl_addr_cmp_prefix_msb(void *addr1, int len1, void *addr2, int len2)
>> +{
>
> Missing static
>

Right

>> +	struct rtnl_nexthop *nh = rtnl_route_nexthop_n(route, 0);
>> +	if (!nh)
>> +		print_err("Out of memory\n");
>> +	gateway = rtnl_route_nh_get_gateway(nh);
>
> And then crash?
>

Corret, I'll add a return when an error is detected.

>> +err_link:
>> +	rtnl_link_put(link);
>> +err:
>> +	if (neigh_handler->src) {
>> +#ifdef HAVE_LIBNL1
>> +		nl_addr_put(neigh_handler->src);
>
> Should this be nl_addr_destroy ?
>

Only if it was allocated by this function - should be fixed.

> Can you do a better job with these ifdefs? I think you can get away
> with a nl1_compat.h which has simple wrapper things like this:
>
> static inline void emulate_nl_addr_put(x) {return nl_addr_destroy(x);}
> #define nl_addr_put(x) emulate_nl_addr_put(x)
>

Ok

> And purge the ifdefery from the main source.
>> +
>> +	last_status = __sync_fetch_and_or(
>> +			&neigh_handler->neigh_status,
>> +			GET_NEIGH_STATUS_IN_PROCESS);
>
> Is there a thread running around in here someplace?
>
> Callina raw gcc intrinsic like this should really be avoided,
> especially since this isn't a performance path. Use pthreads.
>

The current implementation is synchronous.
For the usage of the neigh.c entry function, those guards could be deleted.

>> +static inline int ipv6_addr_v4mapped(const struct in6_addr *a)
>> +{
>> +	return ((a->s6_addr32[0] | a->s6_addr32[1]) |
>> +		(a->s6_addr32[2] ^ htonl(0x0000ffff))) == 0UL ||
>> +		/* IPv4 encoded multicast addresses */
>> +	       (a->s6_addr32[0]  == htonl(0xff0e0000) &&
>> +		((a->s6_addr32[1] |
>> +		  (a->s6_addr32[2] ^ htonl(0x0000ffff))) == 0UL));
>> +}
>
> Don't duplicate IN6_IS_ADDR_V4MAPPED
>

I wasn't aware the macro exists, I'll use it.

>> +	if (err) {
>> +		fprintf(stderr, PFX "ibv_create_ah failed to query
>> sgid.\n");
>
> Again, system libraries should not be printing on error, use errno
> properly.
>

I'll remove all prints.

>> +#else
>> +	return -ENOIMPL;
>
> Is that a typo of ENOSYS?
>
> Be sure you have tested compiling with each and every variation of the
> #ifdefs added.
>
> Jason
>

Thanks for the comments and the thorough code review Jason.
I'll send a fixed version.

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found]             ` <53F9EDCD.9060105-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2014-08-25 18:33               ` Jason Gunthorpe
       [not found]                 ` <20140825183325.GC1298-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2014-08-25 18:33 UTC (permalink / raw)
  To: Matan Barak
  Cc: Roland Dreier, Yishai Hadas, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, Or Gerlitz

On Sun, Aug 24, 2014 at 04:51:09PM +0300, Matan Barak wrote:

> Do you refer to flowinfo?
> If so, it isn't initialized, but should be.
> 0 should be good enough, so I'll memset the whole struct.

K
 
> >What does this comment refer to? MACs should never be parsed out of
> >IPv6 addresses.
> 
> GID0 should always be valid, even if no IPv4 and IPv6 are configured.
> IPv6 link-local and multicast addresses should be parsed from the
> IPv6 address.

Might want to just clarify this is dealing with a GID, not a IPv6
address - they are not interchangable terms. I'm not sure how you are
getting a GID from a neigh though?

> >>+	timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
> >>+	if (-1 == timer_fd) {
> >>+		print_err("Couldn't create timer\n");
> >>+		return timer_fd;
> >>+	}
> >
> >The use of timerfd will impact the minimum OS version, have you
> >checked this is OK? Does RHEL5 still work?

> It was added in linux v2.6.25. I think that an API that's more than
> 6.5 years old is valid.

RHEL5 is using 2.6.18 as their base kernel. You should at least
consult with the OFED people to determine if this is a problem for
them.

 > 
> >>+	while (1) {
> >>+		FD_ZERO(&fdset);
> >>+		FD_SET(fd, &fdset);
> >>+		FD_SET(timer_fd, &fdset);
> >>+
> >>+		/* wait for an incoming message on the netlink socket */
> >>+		ret = select(nfds, &fdset, NULL, NULL, NULL);
> >
> >poll is a better choice here, it would also be fairly simple to remove
> >timerfd by using the timeout arg.
>
> The purpose is to have a timeout on the whole process and not per loop.

Yes, I know - it is not too hard to use poll to do that, you need to
call clock_gettime(CLOCK_MONOTONIC) to compute the relative timeout.
 
> >>+					if (sendto(sock_fd, buff, sizeof(buff),
> >>+						   0, &addr_dst.sktaddr.s,
> >>+						   addr_dst.len) < 0)
> >>+						print_err("Failed to send "
> >>+							  "packet while waiting"
> >>+							  " for
> >>events\n");
> >
> >If the earlier sendto needs to be protected by a loop looking at
> >EADDRNOTAVAIL then so does this one.
> >
> 
> If the sendto failed, we would just retry it in the next loop.

Except the error handing flow is a bit different depending on
EADDRNOTAVAIL.

> >I'm also not sure all this looping and complexity is needed... Why
> >override the kernel timers for ND?
> >
> >I would think you'd check the ND table, if there is no good entry then
> >do the send, and then watch the ND table for a FAILED/REACHABLE
> >result. The kernel timers will always transition through INCOMPLETE to
> >one of those terminal states.
> >
> >No need for more timers and retry and things, the kernel already retried.
> 
> Watching just for the ND event is problematic.
> If the user sent a packet just after we looked at the ND and before
> we waited for an event, we would probably just fail without finding
> any neighbor. 

Hmm, I don't think there is a race like that.

You start listening for ND updates, send your packet, and wait for a
transition into FAILED or REACHABLE. Whenever you send a packet a
FAILED entry will transition back to INCOMPLETE.

If someone else already has sent a packet then you start in the
INCOMPLETE state, and are still waiting to transition to FAILED or
REACHABLE.

> Regarding the retries, because we use UD and switches could
> sometimes take a few seconds to learn the network, I chose to do a
> few reties before giving up.

The kernel already has retires and timers for ND.

> >>+
> >>+	last_status = __sync_fetch_and_or(
> >>+			&neigh_handler->neigh_status,
> >>+			GET_NEIGH_STATUS_IN_PROCESS);
> >
> >Is there a thread running around in here someplace?
> >
> >Callina raw gcc intrinsic like this should really be avoided,
> >especially since this isn't a performance path. Use pthreads.
> >
> 
> The current implementation is synchronous.
> For the usage of the neigh.c entry function, those guards could be deleted.

K

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found]                 ` <20140825183325.GC1298-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2014-08-25 19:33                   ` Doug Ledford
       [not found]                     ` <5C3BB7B1-07E6-431F-98E5-13543AAA897D-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Doug Ledford @ 2014-08-25 19:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, Roland Dreier, Yishai Hadas,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz

> On Aug 25, 2014, at 1:33 PM, Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> 
>>>> +    timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
>>>> +    if (-1 == timer_fd) {
>>>> +        print_err("Couldn't create timer\n");
>>>> +        return timer_fd;
>>>> +    }
>>> 
>>> The use of timerfd will impact the minimum OS version, have you
>>> checked this is OK? Does RHEL5 still work?
> 
>> It was added in linux v2.6.25. I think that an API that's more than
>> 6.5 years old is valid.
> 
> RHEL5 is using 2.6.18 as their base kernel. You should at least
> consult with the OFED people to determine if this is a problem for
> them.

Please don't.  This code should not be changed for something as ancient as rhel5.

Sent from my iPad

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found]                     ` <5C3BB7B1-07E6-431F-98E5-13543AAA897D-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-08-26 12:18                       ` Or Gerlitz
       [not found]                         ` <53FC7B1D.6050501-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Or Gerlitz @ 2014-08-26 12:18 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Matan Barak, Roland Dreier, Yishai Hadas,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 25/08/2014 22:33, Doug Ledford wrote:
>> On Aug 25, 2014, at 1:33 PM, Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>>
>>>>> +    timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
>>>>> +    if (-1 == timer_fd) {
>>>>> +        print_err("Couldn't create timer\n");
>>>>> +        return timer_fd;
>>>>> +    }
>>>>
>>>> The use of timerfd will impact the minimum OS version, have you
>>>> checked this is OK? Does RHEL5 still work?
>>
>>> It was added in linux v2.6.25. I think that an API that's more than
>>> 6.5 years old is valid.
>>
>> RHEL5 is using 2.6.18 as their base kernel. You should at least
>> consult with the OFED people to determine if this is a problem for
>> them.
>
> Please don't.  This code should not be changed for something as ancient as rhel5.


Indeed. Telling people to avoid using constructs/mechanisms ~6-7 years 
after they were introduced isn't something we want nor need to do.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found]                         ` <53FC7B1D.6050501-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2014-08-26 16:08                           ` Jason Gunthorpe
       [not found]                             ` <20140826160853.GA31127-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2014-08-26 16:08 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, Matan Barak, Roland Dreier, Yishai Hadas,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 26, 2014 at 03:18:37PM +0300, Or Gerlitz wrote:
> On 25/08/2014 22:33, Doug Ledford wrote:
> >>On Aug 25, 2014, at 1:33 PM, Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> >>
> >>>>>+    timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
> >>>>>+    if (-1 == timer_fd) {
> >>>>>+        print_err("Couldn't create timer\n");
> >>>>>+        return timer_fd;
> >>>>>+    }
> >>>>
> >>>>The use of timerfd will impact the minimum OS version, have you
> >>>>checked this is OK? Does RHEL5 still work?
> >>
> >>>It was added in linux v2.6.25. I think that an API that's more than
> >>>6.5 years old is valid.
> >>
> >>RHEL5 is using 2.6.18 as their base kernel. You should at least
> >>consult with the OFED people to determine if this is a problem for
> >>them.
> >
> >Please don't.  This code should not be changed for something as ancient as rhel5.
> 
> Indeed. Telling people to avoid using constructs/mechanisms ~6-7
> years after they were introduced isn't something we want nor need to
> do.

I looked myself and it looks like OFED has dropped support for these
old distros so there isn't any problem.

However, I still think this use of timerfd is fairly gratuitous, and
looking closer, causes little bugs:

+		if (timerfd_settime(timer_fd, 0, &timer_time, NULL)) {
+		   			      print_err("Couldn't set timer\n");
+			return -1;
                     ^^^^^^^^^^^^^^
leaks timer_fd


Alos, I noticed:

+     	/* wait for an incoming message on the netlink socket */
+	   ret = select(nfds, &fdset, NULL, NULL, NULL);
+
+		if (ret) {

Fails to detect error return from select.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found]                             ` <20140826160853.GA31127-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2014-08-28  9:48                               ` Devesh Sharma
       [not found]                                 ` <EE7902D3F51F404C82415C4803930ACD3FE48403-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
  2014-08-28 16:27                               ` Matan Barak
  1 sibling, 1 reply; 16+ messages in thread
From: Devesh Sharma @ 2014-08-28  9:48 UTC (permalink / raw)
  To: Jason Gunthorpe, Or Gerlitz
  Cc: Doug Ledford, Matan Barak, Roland Dreier, Yishai Hadas,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Matan,

I have been watching this thread for quite some time. I have a
Basic question, do you think ib_uverbs_create_ah() in uverbs_cmd.c
Should resolve to l2 address? Presently it is not calling rdma_addr_find_dmac_by_grh(), am I missing something here?

-Regards
 Devesh

> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
> Sent: Tuesday, August 26, 2014 9:39 PM
> To: Or Gerlitz
> Cc: Doug Ledford; Matan Barak; Roland Dreier; Yishai Hadas; linux-
> rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD
> QPs Eth L2 resolution
> 
> On Tue, Aug 26, 2014 at 03:18:37PM +0300, Or Gerlitz wrote:
> > On 25/08/2014 22:33, Doug Ledford wrote:
> > >>On Aug 25, 2014, at 1:33 PM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> > >>
> > >>>>>+    timer_fd = timerfd_create(CLOCK_MONOTONIC,
> TFD_NONBLOCK | TFD_CLOEXEC);
> > >>>>>+    if (-1 == timer_fd) {
> > >>>>>+        print_err("Couldn't create timer\n");
> > >>>>>+        return timer_fd;
> > >>>>>+    }
> > >>>>
> > >>>>The use of timerfd will impact the minimum OS version, have you
> > >>>>checked this is OK? Does RHEL5 still work?
> > >>
> > >>>It was added in linux v2.6.25. I think that an API that's more than
> > >>>6.5 years old is valid.
> > >>
> > >>RHEL5 is using 2.6.18 as their base kernel. You should at least
> > >>consult with the OFED people to determine if this is a problem for
> > >>them.
> > >
> > >Please don't.  This code should not be changed for something as ancient
> as rhel5.
> >
> > Indeed. Telling people to avoid using constructs/mechanisms ~6-7 years
> > after they were introduced isn't something we want nor need to do.
> 
> I looked myself and it looks like OFED has dropped support for these old
> distros so there isn't any problem.
> 
> However, I still think this use of timerfd is fairly gratuitous, and looking closer,
> causes little bugs:
> 
> +		if (timerfd_settime(timer_fd, 0, &timer_time, NULL)) {
> +		   			      print_err("Couldn't set timer\n");
> +			return -1;
>                      ^^^^^^^^^^^^^^
> leaks timer_fd
> 
> 
> Alos, I noticed:
> 
> +     	/* wait for an incoming message on the netlink socket */
> +	   ret = select(nfds, &fdset, NULL, NULL, NULL);
> +
> +		if (ret) {
> 
> Fails to detect error return from select.
> 
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found]                                 ` <EE7902D3F51F404C82415C4803930ACD3FE48403-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
@ 2014-08-28 16:20                                   ` Matan Barak
       [not found]                                     ` <53FF56DF.1050207-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Matan Barak @ 2014-08-28 16:20 UTC (permalink / raw)
  To: Devesh Sharma, Jason Gunthorpe, Or Gerlitz
  Cc: Doug Ledford, Roland Dreier, Yishai Hadas,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 28/8/2014 12:48 PM, Devesh Sharma wrote:
> Hi Matan,
>
> I have been watching this thread for quite some time. I have a
> Basic question, do you think ib_uverbs_create_ah() in uverbs_cmd.c
> Should resolve to l2 address? Presently it is not calling rdma_addr_find_dmac_by_grh(), am I missing something here?
>
> -Regards
>   Devesh
>

Hi Devesh,

Some vendors don't call ib_uverbs_create_ah and do all this creation in 
userspace only. It's true that it might be a lot easier to do that 
resolution in kernel, but it could create dependency of new versions of 
libibverbs and the provider library tn new kernels only.
I would like to avoid creating such a dependency.

Matan

>> -----Original Message-----
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
>> Sent: Tuesday, August 26, 2014 9:39 PM
>> To: Or Gerlitz
>> Cc: Doug Ledford; Matan Barak; Roland Dreier; Yishai Hadas; linux-
>> rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD
>> QPs Eth L2 resolution
>>
>> On Tue, Aug 26, 2014 at 03:18:37PM +0300, Or Gerlitz wrote:
>>> On 25/08/2014 22:33, Doug Ledford wrote:
>>>>> On Aug 25, 2014, at 1:33 PM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>>>>>
>>>>>>>> +    timer_fd = timerfd_create(CLOCK_MONOTONIC,
>> TFD_NONBLOCK | TFD_CLOEXEC);
>>>>>>>> +    if (-1 == timer_fd) {
>>>>>>>> +        print_err("Couldn't create timer\n");
>>>>>>>> +        return timer_fd;
>>>>>>>> +    }
>>>>>>>
>>>>>>> The use of timerfd will impact the minimum OS version, have you
>>>>>>> checked this is OK? Does RHEL5 still work?
>>>>>
>>>>>> It was added in linux v2.6.25. I think that an API that's more than
>>>>>> 6.5 years old is valid.
>>>>>
>>>>> RHEL5 is using 2.6.18 as their base kernel. You should at least
>>>>> consult with the OFED people to determine if this is a problem for
>>>>> them.
>>>>
>>>> Please don't.  This code should not be changed for something as ancient
>> as rhel5.
>>>
>>> Indeed. Telling people to avoid using constructs/mechanisms ~6-7 years
>>> after they were introduced isn't something we want nor need to do.
>>
>> I looked myself and it looks like OFED has dropped support for these old
>> distros so there isn't any problem.
>>
>> However, I still think this use of timerfd is fairly gratuitous, and looking closer,
>> causes little bugs:
>>
>> +		if (timerfd_settime(timer_fd, 0, &timer_time, NULL)) {
>> +		   			      print_err("Couldn't set timer\n");
>> +			return -1;
>>                       ^^^^^^^^^^^^^^
>> leaks timer_fd
>>
>>
>> Alos, I noticed:
>>
>> +     	/* wait for an incoming message on the netlink socket */
>> +	   ret = select(nfds, &fdset, NULL, NULL, NULL);
>> +
>> +		if (ret) {
>>
>> Fails to detect error return from select.
>>
>> Jason
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
>> body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found]                             ` <20140826160853.GA31127-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2014-08-28  9:48                               ` Devesh Sharma
@ 2014-08-28 16:27                               ` Matan Barak
  1 sibling, 0 replies; 16+ messages in thread
From: Matan Barak @ 2014-08-28 16:27 UTC (permalink / raw)
  To: Jason Gunthorpe, Or Gerlitz
  Cc: Doug Ledford, Roland Dreier, Yishai Hadas,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 26/8/2014 7:08 PM, Jason Gunthorpe wrote:
> On Tue, Aug 26, 2014 at 03:18:37PM +0300, Or Gerlitz wrote:
>> On 25/08/2014 22:33, Doug Ledford wrote:
>>>> On Aug 25, 2014, at 1:33 PM, Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>>>>
>>>>>>> +    timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
>>>>>>> +    if (-1 == timer_fd) {
>>>>>>> +        print_err("Couldn't create timer\n");
>>>>>>> +        return timer_fd;
>>>>>>> +    }
>>>>>>
>>>>>> The use of timerfd will impact the minimum OS version, have you
>>>>>> checked this is OK? Does RHEL5 still work?
>>>>
>>>>> It was added in linux v2.6.25. I think that an API that's more than
>>>>> 6.5 years old is valid.
>>>>
>>>> RHEL5 is using 2.6.18 as their base kernel. You should at least
>>>> consult with the OFED people to determine if this is a problem for
>>>> them.
>>>
>>> Please don't.  This code should not be changed for something as ancient as rhel5.
>>
>> Indeed. Telling people to avoid using constructs/mechanisms ~6-7
>> years after they were introduced isn't something we want nor need to
>> do.
>
> I looked myself and it looks like OFED has dropped support for these
> old distros so there isn't any problem.
>
> However, I still think this use of timerfd is fairly gratuitous, and
> looking closer, causes little bugs:
>
> +		if (timerfd_settime(timer_fd, 0, &timer_time, NULL)) {
> +		   			      print_err("Couldn't set timer\n");
> +			return -1;
>                       ^^^^^^^^^^^^^^
> leaks timer_fd
>
>
> Alos, I noticed:
>
> +     	/* wait for an incoming message on the netlink socket */
> +	   ret = select(nfds, &fdset, NULL, NULL, NULL);
> +
> +		if (ret) {
>
> Fails to detect error return from select.
>
> Jason
>

Thanks, I'll fix those issues too.

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found]                                     ` <53FF56DF.1050207-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2014-08-28 17:48                                       ` Devesh Sharma
       [not found]                                         ` <EE7902D3F51F404C82415C4803930ACD3FE48637-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Devesh Sharma @ 2014-08-28 17:48 UTC (permalink / raw)
  To: Matan Barak, Jason Gunthorpe, Or Gerlitz
  Cc: Doug Ledford, Roland Dreier, Yishai Hadas,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Matan,

Thanks for quick response, my further comments are inline below:

> -----Original Message-----
> From: Matan Barak [mailto:matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org]
> Sent: Thursday, August 28, 2014 9:51 PM
> To: Devesh Sharma; Jason Gunthorpe; Or Gerlitz
> Cc: Doug Ledford; Roland Dreier; Yishai Hadas; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD
> QPs Eth L2 resolution
> 
> On 28/8/2014 12:48 PM, Devesh Sharma wrote:
> > Hi Matan,
> >
> > I have been watching this thread for quite some time. I have a Basic
> > question, do you think ib_uverbs_create_ah() in uverbs_cmd.c Should
> > resolve to l2 address? Presently it is not calling
> rdma_addr_find_dmac_by_grh(), am I missing something here?
> >
> > -Regards
> >   Devesh
> >
> 
> Hi Devesh,
> 
> Some vendors don't call ib_uverbs_create_ah and do all this creation in
> userspace only. It's true that it might be a lot easier to do that resolution in
> kernel, but it could create dependency of new versions of libibverbs and the
Which dependency you are specifying here, I see the scheme posted in this series adds dependency on libibverbs.
Do you anticipate modification even when uverbs interface is used?
> provider library tn new kernels only.
> I would like to avoid creating such a dependency.

Okay, got it, any suggestions for those who use uverbs interface.
I think another patch is needed for such vendors?

> 
> Matan
> 
> >> -----Original Message-----
> >> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> >> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
> >> Sent: Tuesday, August 26, 2014 9:39 PM
> >> To: Or Gerlitz
> >> Cc: Doug Ledford; Matan Barak; Roland Dreier; Yishai Hadas; linux-
> >> rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Subject: Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE
> >> UD QPs Eth L2 resolution
> >>
> >> On Tue, Aug 26, 2014 at 03:18:37PM +0300, Or Gerlitz wrote:
> >>> On 25/08/2014 22:33, Doug Ledford wrote:
> >>>>> On Aug 25, 2014, at 1:33 PM, Jason Gunthorpe
> >> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> >>>>>
> >>>>>>>> +    timer_fd = timerfd_create(CLOCK_MONOTONIC,
> >> TFD_NONBLOCK | TFD_CLOEXEC);
> >>>>>>>> +    if (-1 == timer_fd) {
> >>>>>>>> +        print_err("Couldn't create timer\n");
> >>>>>>>> +        return timer_fd;
> >>>>>>>> +    }
> >>>>>>>
> >>>>>>> The use of timerfd will impact the minimum OS version, have you
> >>>>>>> checked this is OK? Does RHEL5 still work?
> >>>>>
> >>>>>> It was added in linux v2.6.25. I think that an API that's more
> >>>>>> than
> >>>>>> 6.5 years old is valid.
> >>>>>
> >>>>> RHEL5 is using 2.6.18 as their base kernel. You should at least
> >>>>> consult with the OFED people to determine if this is a problem for
> >>>>> them.
> >>>>
> >>>> Please don't.  This code should not be changed for something as
> >>>> ancient
> >> as rhel5.
> >>>
> >>> Indeed. Telling people to avoid using constructs/mechanisms ~6-7
> >>> years after they were introduced isn't something we want nor need to
> do.
> >>
> >> I looked myself and it looks like OFED has dropped support for these
> >> old distros so there isn't any problem.
> >>
> >> However, I still think this use of timerfd is fairly gratuitous, and
> >> looking closer, causes little bugs:
> >>
> >> +		if (timerfd_settime(timer_fd, 0, &timer_time, NULL)) {
> >> +		   			      print_err("Couldn't set timer\n");
> >> +			return -1;
> >>                       ^^^^^^^^^^^^^^
> >> leaks timer_fd
> >>
> >>
> >> Alos, I noticed:
> >>
> >> +     	/* wait for an incoming message on the netlink socket */
> >> +	   ret = select(nfds, &fdset, NULL, NULL, NULL);
> >> +
> >> +		if (ret) {
> >>
> >> Fails to detect error return from select.
> >>
> >> Jason
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> >> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More
> majordomo
> >> info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found]                                         ` <EE7902D3F51F404C82415C4803930ACD3FE48637-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
@ 2014-08-31  8:08                                           ` Matan Barak
       [not found]                                             ` <5402D813.4020204-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Matan Barak @ 2014-08-31  8:08 UTC (permalink / raw)
  To: Devesh Sharma, Jason Gunthorpe, Or Gerlitz
  Cc: Doug Ledford, Roland Dreier, Yishai Hadas,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 28/8/2014 8:48 PM, Devesh Sharma wrote:
> Hi Matan,
>
> Thanks for quick response, my further comments are inline below:
>
>> -----Original Message-----
>> From: Matan Barak [mailto:matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org]
>> Sent: Thursday, August 28, 2014 9:51 PM
>> To: Devesh Sharma; Jason Gunthorpe; Or Gerlitz
>> Cc: Doug Ledford; Roland Dreier; Yishai Hadas; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD
>> QPs Eth L2 resolution
>>
>> On 28/8/2014 12:48 PM, Devesh Sharma wrote:
>>> Hi Matan,
>>>
>>> I have been watching this thread for quite some time. I have a Basic
>>> question, do you think ib_uverbs_create_ah() in uverbs_cmd.c Should
>>> resolve to l2 address? Presently it is not calling
>> rdma_addr_find_dmac_by_grh(), am I missing something here?
>>>
>>> -Regards
>>>    Devesh
>>>
>>
>> Hi Devesh,
>>
>> Some vendors don't call ib_uverbs_create_ah and do all this creation in
>> userspace only. It's true that it might be a lot easier to do that resolution in
>> kernel, but it could create dependency of new versions of libibverbs and the
> Which dependency you are specifying here, I see the scheme posted in this series adds dependency on libibverbs.
> Do you anticipate modification even when uverbs interface is used?

If we had resolved the address handle in kernel space, we would have 
created a dependency between the user libraries to and new version odf 
the uverbs module. The scheme posted here only adds a dependency between 
the provider's library to libibverbs.
Resolving in kernel space would still require creating a dependency on a 
new provider library - as currently some providers don't even call the 
kernel when a new address handle is created.

>> provider library tn new kernels only.
>> I would like to avoid creating such a dependency.
>
> Okay, got it, any suggestions for those who use uverbs interface.
> I think another patch is needed for such vendors?
>

There are 2 options here - if all the required information (including 
the MAC address) is contained in the address handle user-space 
structure, you could just use the proposed method. However, if you 
register some kind of an address handle with the hardware and then only 
use it via user-space, we'll need another patch for uverbs.

>>
>> Matan
>>
>>>> -----Original Message-----
>>>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>>>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
>>>> Sent: Tuesday, August 26, 2014 9:39 PM
>>>> To: Or Gerlitz
>>>> Cc: Doug Ledford; Matan Barak; Roland Dreier; Yishai Hadas; linux-
>>>> rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>> Subject: Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE
>>>> UD QPs Eth L2 resolution
>>>>
>>>> On Tue, Aug 26, 2014 at 03:18:37PM +0300, Or Gerlitz wrote:
>>>>> On 25/08/2014 22:33, Doug Ledford wrote:
>>>>>>> On Aug 25, 2014, at 1:33 PM, Jason Gunthorpe
>>>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>>>>>>>
>>>>>>>>>> +    timer_fd = timerfd_create(CLOCK_MONOTONIC,
>>>> TFD_NONBLOCK | TFD_CLOEXEC);
>>>>>>>>>> +    if (-1 == timer_fd) {
>>>>>>>>>> +        print_err("Couldn't create timer\n");
>>>>>>>>>> +        return timer_fd;
>>>>>>>>>> +    }
>>>>>>>>>
>>>>>>>>> The use of timerfd will impact the minimum OS version, have you
>>>>>>>>> checked this is OK? Does RHEL5 still work?
>>>>>>>
>>>>>>>> It was added in linux v2.6.25. I think that an API that's more
>>>>>>>> than
>>>>>>>> 6.5 years old is valid.
>>>>>>>
>>>>>>> RHEL5 is using 2.6.18 as their base kernel. You should at least
>>>>>>> consult with the OFED people to determine if this is a problem for
>>>>>>> them.
>>>>>>
>>>>>> Please don't.  This code should not be changed for something as
>>>>>> ancient
>>>> as rhel5.
>>>>>
>>>>> Indeed. Telling people to avoid using constructs/mechanisms ~6-7
>>>>> years after they were introduced isn't something we want nor need to
>> do.
>>>>
>>>> I looked myself and it looks like OFED has dropped support for these
>>>> old distros so there isn't any problem.
>>>>
>>>> However, I still think this use of timerfd is fairly gratuitous, and
>>>> looking closer, causes little bugs:
>>>>
>>>> +		if (timerfd_settime(timer_fd, 0, &timer_time, NULL)) {
>>>> +		   			      print_err("Couldn't set timer\n");
>>>> +			return -1;
>>>>                        ^^^^^^^^^^^^^^
>>>> leaks timer_fd
>>>>
>>>>
>>>> Alos, I noticed:
>>>>
>>>> +     	/* wait for an incoming message on the netlink socket */
>>>> +	   ret = select(nfds, &fdset, NULL, NULL, NULL);
>>>> +
>>>> +		if (ret) {
>>>>
>>>> Fails to detect error return from select.
>>>>
>>>> Jason
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
>>>> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More
>> majordomo
>>>> info at http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
       [not found]                                             ` <5402D813.4020204-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2014-09-01 11:56                                               ` Devesh Sharma
  0 siblings, 0 replies; 16+ messages in thread
From: Devesh Sharma @ 2014-09-01 11:56 UTC (permalink / raw)
  To: Matan Barak, Jason Gunthorpe, Or Gerlitz
  Cc: Doug Ledford, Roland Dreier, Yishai Hadas,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

> -----Original Message-----
> From: Matan Barak [mailto:matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org]
> Sent: Sunday, August 31, 2014 1:39 PM
> To: Devesh Sharma; Jason Gunthorpe; Or Gerlitz
> Cc: Doug Ledford; Roland Dreier; Yishai Hadas; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD
> QPs Eth L2 resolution
> 
> On 28/8/2014 8:48 PM, Devesh Sharma wrote:
> > Hi Matan,
> >
> > Thanks for quick response, my further comments are inline below:
> >
> >> -----Original Message-----
> >> From: Matan Barak [mailto:matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org]
> >> Sent: Thursday, August 28, 2014 9:51 PM
> >> To: Devesh Sharma; Jason Gunthorpe; Or Gerlitz
> >> Cc: Doug Ledford; Roland Dreier; Yishai Hadas;
> >> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Subject: Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE
> >> UD QPs Eth L2 resolution
> >>
> >> On 28/8/2014 12:48 PM, Devesh Sharma wrote:
> >>> Hi Matan,
> >>>
> >>> I have been watching this thread for quite some time. I have a Basic
> >>> question, do you think ib_uverbs_create_ah() in uverbs_cmd.c Should
> >>> resolve to l2 address? Presently it is not calling
> >> rdma_addr_find_dmac_by_grh(), am I missing something here?
> >>>
> >>> -Regards
> >>>    Devesh
> >>>
> >>
> >> Hi Devesh,
> >>
> >> Some vendors don't call ib_uverbs_create_ah and do all this creation
> >> in userspace only. It's true that it might be a lot easier to do that
> >> resolution in kernel, but it could create dependency of new versions
> >> of libibverbs and the
> > Which dependency you are specifying here, I see the scheme posted in this
> series adds dependency on libibverbs.
> > Do you anticipate modification even when uverbs interface is used?
> 
> If we had resolved the address handle in kernel space, we would have
> created a dependency between the user libraries to and new version odf the
> uverbs module. The scheme posted here only adds a dependency between
> the provider's library to libibverbs.
> Resolving in kernel space would still require creating a dependency on a new
> provider library - as currently some providers don't even call the kernel when
> a new address handle is created.
> 

Okay, got it.

> >> provider library tn new kernels only.
> >> I would like to avoid creating such a dependency.
> >
> > Okay, got it, any suggestions for those who use uverbs interface.
> > I think another patch is needed for such vendors?
> >
> 
> There are 2 options here - if all the required information (including the MAC
> address) is contained in the address handle user-space structure, you could
> just use the proposed method. However, if you register some kind of an
> address handle with the hardware and then only use it via user-space, we'll
> need another patch for uverbs.

Since, libocrdma does ah creation purely in kernel through uverbs interface, such patch is required to be posted.

> 
> >>
> >> Matan
> >>
> >>>> -----Original Message-----
> >>>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> >>>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
> >>>> Sent: Tuesday, August 26, 2014 9:39 PM
> >>>> To: Or Gerlitz
> >>>> Cc: Doug Ledford; Matan Barak; Roland Dreier; Yishai Hadas; linux-
> >>>> rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >>>> Subject: Re: [PATCH libibverbs V5 2/2] Use neighbour lookup for
> >>>> RoCE UD QPs Eth L2 resolution
> >>>>
> >>>> On Tue, Aug 26, 2014 at 03:18:37PM +0300, Or Gerlitz wrote:
> >>>>> On 25/08/2014 22:33, Doug Ledford wrote:
> >>>>>>> On Aug 25, 2014, at 1:33 PM, Jason Gunthorpe
> >>>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> >>>>>>>
> >>>>>>>>>> +    timer_fd = timerfd_create(CLOCK_MONOTONIC,
> >>>> TFD_NONBLOCK | TFD_CLOEXEC);
> >>>>>>>>>> +    if (-1 == timer_fd) {
> >>>>>>>>>> +        print_err("Couldn't create timer\n");
> >>>>>>>>>> +        return timer_fd;
> >>>>>>>>>> +    }
> >>>>>>>>>
> >>>>>>>>> The use of timerfd will impact the minimum OS version, have
> >>>>>>>>> you checked this is OK? Does RHEL5 still work?
> >>>>>>>
> >>>>>>>> It was added in linux v2.6.25. I think that an API that's more
> >>>>>>>> than
> >>>>>>>> 6.5 years old is valid.
> >>>>>>>
> >>>>>>> RHEL5 is using 2.6.18 as their base kernel. You should at least
> >>>>>>> consult with the OFED people to determine if this is a problem
> >>>>>>> for them.
> >>>>>>
> >>>>>> Please don't.  This code should not be changed for something as
> >>>>>> ancient
> >>>> as rhel5.
> >>>>>
> >>>>> Indeed. Telling people to avoid using constructs/mechanisms ~6-7
> >>>>> years after they were introduced isn't something we want nor need
> >>>>> to
> >> do.
> >>>>
> >>>> I looked myself and it looks like OFED has dropped support for
> >>>> these old distros so there isn't any problem.
> >>>>
> >>>> However, I still think this use of timerfd is fairly gratuitous,
> >>>> and looking closer, causes little bugs:
> >>>>
> >>>> +		if (timerfd_settime(timer_fd, 0, &timer_time, NULL)) {
> >>>> +		   			      print_err("Couldn't set timer\n");
> >>>> +			return -1;
> >>>>                        ^^^^^^^^^^^^^^ leaks timer_fd
> >>>>
> >>>>
> >>>> Alos, I noticed:
> >>>>
> >>>> +     	/* wait for an incoming message on the netlink socket */
> >>>> +	   ret = select(nfds, &fdset, NULL, NULL, NULL);
> >>>> +
> >>>> +		if (ret) {
> >>>>
> >>>> Fails to detect error return from select.
> >>>>
> >>>> Jason
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> >>>> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More
> >> majordomo
> >>>> info at http://vger.kernel.org/majordomo-info.html
> >

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-09-01 11:56 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-20  6:49 [PATCH libibverbs V5 0/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution Matan Barak
     [not found] ` <1408517381-17523-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-08-20  6:49   ` [PATCH libibverbs V5 1/2] Add ibv_port_cap_flags Matan Barak
2014-08-20  6:49   ` [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution Matan Barak
     [not found]     ` <1408517381-17523-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-08-20 17:01       ` Jason Gunthorpe
     [not found]         ` <20140820170142.GC12605-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2014-08-24 13:51           ` Matan Barak
     [not found]             ` <53F9EDCD.9060105-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-08-25 18:33               ` Jason Gunthorpe
     [not found]                 ` <20140825183325.GC1298-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2014-08-25 19:33                   ` Doug Ledford
     [not found]                     ` <5C3BB7B1-07E6-431F-98E5-13543AAA897D-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-26 12:18                       ` Or Gerlitz
     [not found]                         ` <53FC7B1D.6050501-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-08-26 16:08                           ` Jason Gunthorpe
     [not found]                             ` <20140826160853.GA31127-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2014-08-28  9:48                               ` Devesh Sharma
     [not found]                                 ` <EE7902D3F51F404C82415C4803930ACD3FE48403-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-08-28 16:20                                   ` Matan Barak
     [not found]                                     ` <53FF56DF.1050207-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-08-28 17:48                                       ` Devesh Sharma
     [not found]                                         ` <EE7902D3F51F404C82415C4803930ACD3FE48637-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-08-31  8:08                                           ` Matan Barak
     [not found]                                             ` <5402D813.4020204-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-09-01 11:56                                               ` Devesh Sharma
2014-08-28 16:27                               ` Matan Barak
2014-08-20 16:26   ` [PATCH libibverbs V5 0/2] " Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.