* [PATCH v3 0/4] Sending kernel pathrecord query to user cache server
@ 2015-06-04 12:26 kaike.wan-ral2JQCrhuEAvxtiuMwx3w
       [not found] ` <1433420809-13529-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-04 12:26 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan

From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

An SA cache is critical for fabric scalability and performance. In user
space, the ibacm application provides a good example of a pathrecord
cache for address and route resolution, and with the recent provider
architecture it offers even more extensibility as an SA cache. In the
kernel, ipoib implements its own small pathrecord cache, but it is not
available for general use. Furthermore, implementing an SA cache in user
space offers better flexibility, larger capacity, and more robustness for
the system.

This patch series implements a mechanism that allows ib_sa to send
pathrecord queries to a user-space application (e.g., ibacm) through
netlink. The mechanism could easily be extended to other SA queries.
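
Roughly, the exchange looks like this (an illustrative sketch; the exact
layout is given by the defines added in patch 1):

  kernel -> user:  nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_SA, RDMA_NL_LS_OP_RESOLVE)
                   LS_NLA_TYPE_PATH_RECORD attribute describing the query
  user -> kernel:  same nlmsg_seq, nlmsg_flags set to RDMA_NL_LS_F_OK or
                   RDMA_NL_LS_F_ERR, LS_NLA_TYPE_PATH_RECORD attribute
                   carrying the resolved pathrecord

If no netlink listener is present, or the netlink query fails or times
out, the query falls back to the normal SA MAD path.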

With a customized test implemented in the rdma_cm module (not included in
this series), the time to retrieve 1 million pathrecords on a two-node
system dropped from 46660 jiffies (46.66 seconds) to 16119 jiffies (16.119
seconds), a reduction of more than 60%.

Changes since v2:
- Redesigned the communication protocol between the kernel and the user
  space application. Instead of the MAD packet format, the new protocol
  uses a netlink message header and attributes to exchange requests and
  responses between the kernel and user space. The design is described
  here:
  http://www.spinics.net/lists/linux-rdma/msg25621.html

Changes since v1:
- Move the kzalloc changes into a separate patch (Patch 3).
- Remove a redundant include line (Patch 4).
- Rename struct rdma_nl_resp_msg to struct ib_nl_resp_msg (Patch 4).

Kaike Wan (4):
  IB/netlink: Add defines for local service requests through netlink
  IB/core: Check the presence of netlink multicast group listeners
  IB/sa: Allocate SA query with kzalloc
  IB/sa: Route SA pathrecord query through netlink

 drivers/infiniband/core/netlink.c  |    8 +
 drivers/infiniband/core/sa_query.c |  496 +++++++++++++++++++++++++++++++++++-
 include/rdma/rdma_netlink.h        |    7 +
 include/uapi/rdma/rdma_netlink.h   |   56 ++++
 4 files changed, 562 insertions(+), 5 deletions(-)


* [PATCH v3 1/4] IB/netlink: Add defines for local service requests through netlink
       [not found] ` <1433420809-13529-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-06-04 12:26   ` kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  2015-06-04 12:26   ` [PATCH v3 2/4] IB/core: Check the presence of netlink multicast group listeners kaike.wan-ral2JQCrhuEAvxtiuMwx3w
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-04 12:26 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan, John Fleck, Ira Weiny

From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This patch adds netlink defines for the SA client, the local service
group, local service operations, and related attributes.

Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 include/uapi/rdma/rdma_netlink.h |   56 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 6e4bb42..bc30b90 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -7,12 +7,14 @@ enum {
 	RDMA_NL_RDMA_CM = 1,
 	RDMA_NL_NES,
 	RDMA_NL_C4IW,
+	RDMA_NL_SA,
 	RDMA_NL_NUM_CLIENTS
 };
 
 enum {
 	RDMA_NL_GROUP_CM = 1,
 	RDMA_NL_GROUP_IWPM,
+	RDMA_NL_GROUP_LS,
 	RDMA_NL_NUM_GROUPS
 };
 
@@ -128,5 +130,59 @@ enum {
 	IWPM_NLA_ERR_MAX
 };
 
+/* Local service group opcodes */
+enum {
+	RDMA_NL_LS_OP_RESOLVE = 0,
+	RDMA_NL_LS_OP_SET_TIMEOUT,
+	RDMA_NL_LS_NUM_OPS
+};
+
+/* Local service netlink message flags */
+#define RDMA_NL_LS_F_OK		0x0100	/* Success response */
+#define RDMA_NL_LS_F_ERR	0x0200	/* Failed response */
+
+/* Local service attribute type */
+enum {
+	LS_NLA_TYPE_STATUS = 0,
+	LS_NLA_TYPE_PATH_RECORD,
+	LS_NLA_TYPE_TIMEOUT,
+	LS_NLA_TYPE_MAX
+};
+
+/* Local service status attribute */
+enum {
+	LS_NLA_STATUS_SUCCESS = 0,
+	LS_NLA_STATUS_EINVAL,
+	LS_NLA_STATUS_ENODATA,
+	LS_NLA_STATUS_MAX
+};
+
+struct rdma_nla_ls_status {
+	__u32		status;
+};
+
+/* Local service pathrecord attribute */
+/* Flags for pathrecord */
+enum {
+	LS_NLA_PATH_F_GMP		= 1,
+	LS_NLA_PATH_F_PRIMARY		= (1<<1),
+	LS_NLA_PATH_F_ALTERNATE		= (1<<2),
+	LS_NLA_PATH_F_OUTBOUND		= (1<<3),
+	LS_NLA_PATH_F_INBOUND		= (1<<4),
+	LS_NLA_PATH_F_INBOUND_REVERSE	= (1<<5),
+	LS_NLA_PATH_F_BIDIRECTIONAL	= (LS_NLA_PATH_F_OUTBOUND |
+					  LS_NLA_PATH_F_INBOUND_REVERSE),
+	LS_NLA_PATH_F_USER		= (1<<6)
+};
+
+struct rdma_nla_ls_path_rec {
+	__u32	flags;
+	__u32	path_rec[0];
+};
+
+/* Local service timeout attribute */
+struct rdma_nla_ls_timeout {
+	__u32		timeout;
+};
 
 #endif /* _UAPI_RDMA_NETLINK_H */
-- 
1.7.1

* [PATCH v3 2/4] IB/core: Check the presence of netlink multicast group listeners
       [not found] ` <1433420809-13529-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-06-04 12:26   ` [PATCH v3 1/4] IB/netlink: Add defines for local service requests through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
@ 2015-06-04 12:26   ` kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  2015-06-04 12:26   ` [PATCH v3 3/4] IB/sa: Allocate SA query with kzalloc kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  2015-06-04 12:26   ` [PATCH v3 4/4] IB/sa: Route SA pathrecord query through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  3 siblings, 0 replies; 14+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-04 12:26 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan, John Fleck, Ira Weiny

From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This patch adds a function to check if listeners for a netlink multicast
group are present.

Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/netlink.c |    8 ++++++++
 include/rdma/rdma_netlink.h       |    7 +++++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c
index 23dd5a5..e0fc913 100644
--- a/drivers/infiniband/core/netlink.c
+++ b/drivers/infiniband/core/netlink.c
@@ -49,6 +49,14 @@ static DEFINE_MUTEX(ibnl_mutex);
 static struct sock *nls;
 static LIST_HEAD(client_list);
 
+int ibnl_chk_listeners(unsigned int group)
+{
+	if (netlink_has_listeners(nls, group) == 0)
+		return -1;
+	return 0;
+}
+EXPORT_SYMBOL(ibnl_chk_listeners);
+
 int ibnl_add_client(int index, int nops,
 		    const struct ibnl_client_cbs cb_table[])
 {
diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
index 0790882..5852661 100644
--- a/include/rdma/rdma_netlink.h
+++ b/include/rdma/rdma_netlink.h
@@ -77,4 +77,11 @@ int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh,
 int ibnl_multicast(struct sk_buff *skb, struct nlmsghdr *nlh,
 			unsigned int group, gfp_t flags);
 
+/**
+ * Check if there are any listeners to the netlink group
+ * @group: the netlink group ID
+ * Returns 0 if listeners are present, or a negative value if there are none.
+ */
+int ibnl_chk_listeners(unsigned int group);
+
 #endif /* _RDMA_NETLINK_H */
-- 
1.7.1


* [PATCH v3 3/4] IB/sa: Allocate SA query with kzalloc
       [not found] ` <1433420809-13529-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-06-04 12:26   ` [PATCH v3 1/4] IB/netlink: Add defines for local service requests through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  2015-06-04 12:26   ` [PATCH v3 2/4] IB/core: Check the presence of netlink multicast group listeners kaike.wan-ral2JQCrhuEAvxtiuMwx3w
@ 2015-06-04 12:26   ` kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  2015-06-04 12:26   ` [PATCH v3 4/4] IB/sa: Route SA pathrecord query through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  3 siblings, 0 replies; 14+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-04 12:26 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan, John Fleck, Ira Weiny

From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Replace kmalloc with kzalloc so that all uninitialized fields in the SA
query are zeroed out, avoiding unintended consequences. This prepares the
SA query structure to accept new fields in the future.

Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/sa_query.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index c38f030..17e1cf7 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -739,7 +739,7 @@ int ib_sa_path_rec_get(struct ib_sa_client *client,
 	port  = &sa_dev->port[port_num - sa_dev->start_port];
 	agent = port->agent;
 
-	query = kmalloc(sizeof *query, gfp_mask);
+	query = kzalloc(sizeof(*query), gfp_mask);
 	if (!query)
 		return -ENOMEM;
 
@@ -861,7 +861,7 @@ int ib_sa_service_rec_query(struct ib_sa_client *client,
 	    method != IB_SA_METHOD_DELETE)
 		return -EINVAL;
 
-	query = kmalloc(sizeof *query, gfp_mask);
+	query = kzalloc(sizeof(*query), gfp_mask);
 	if (!query)
 		return -ENOMEM;
 
@@ -953,7 +953,7 @@ int ib_sa_mcmember_rec_query(struct ib_sa_client *client,
 	port  = &sa_dev->port[port_num - sa_dev->start_port];
 	agent = port->agent;
 
-	query = kmalloc(sizeof *query, gfp_mask);
+	query = kzalloc(sizeof(*query), gfp_mask);
 	if (!query)
 		return -ENOMEM;
 
@@ -1050,7 +1050,7 @@ int ib_sa_guid_info_rec_query(struct ib_sa_client *client,
 	port  = &sa_dev->port[port_num - sa_dev->start_port];
 	agent = port->agent;
 
-	query = kmalloc(sizeof *query, gfp_mask);
+	query = kzalloc(sizeof(*query), gfp_mask);
 	if (!query)
 		return -ENOMEM;
 
-- 
1.7.1


* [PATCH v3 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found] ` <1433420809-13529-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-06-04 12:26   ` [PATCH v3 3/4] IB/sa: Allocate SA query with kzalloc kaike.wan-ral2JQCrhuEAvxtiuMwx3w
@ 2015-06-04 12:26   ` kaike.wan-ral2JQCrhuEAvxtiuMwx3w
       [not found]     ` <1433420809-13529-5-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  3 siblings, 1 reply; 14+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-04 12:26 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan, John Fleck, Ira Weiny

From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This patch routes an SA pathrecord query to netlink first and processes
the response appropriately. If a failure is returned, the request is sent
through IB. Whether to route the request to netlink first is determined
by the presence of a listener on the local service netlink multicast
group. If no user-space listener is present, the request is sent through
IB, as is currently done.

Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/sa_query.c |  488 +++++++++++++++++++++++++++++++++++-
 1 files changed, 487 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 17e1cf7..205a419 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -45,12 +45,21 @@
 #include <uapi/linux/if_ether.h>
 #include <rdma/ib_pack.h>
 #include <rdma/ib_cache.h>
+#include <rdma/rdma_netlink.h>
+#include <net/netlink.h>
+#include <uapi/rdma/ib_user_sa.h>
+#include <rdma/ib_marshall.h>
 #include "sa.h"
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand subnet administration query support");
 MODULE_LICENSE("Dual BSD/GPL");
 
+#define IB_SA_LOCAL_SVC_TIMEOUT_MIN		100
+#define IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT		2000
+#define IB_SA_LOCAL_SVC_TIMEOUT_MAX		200000
+static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;
+
 struct ib_sa_sm_ah {
 	struct ib_ah        *ah;
 	struct kref          ref;
@@ -80,8 +89,25 @@ struct ib_sa_query {
 	struct ib_mad_send_buf *mad_buf;
 	struct ib_sa_sm_ah     *sm_ah;
 	int			id;
+	u32			flags;
+	void			*input;
 };
 
+#define IB_SA_ENABLE_LOCAL_SERVICE	0x00000001
+#define IB_SA_CANCEL			0x00000002
+
+#define IB_SA_LOCAL_SVC_ENABLED(query) \
+	((query)->flags & IB_SA_ENABLE_LOCAL_SERVICE)
+#define IB_SA_ENABLE_LOCAL_SVC(query) \
+	((query)->flags |= IB_SA_ENABLE_LOCAL_SERVICE)
+#define IB_SA_DISABLE_LOCAL_SVC(query) \
+	((query)->flags &= ~IB_SA_ENABLE_LOCAL_SERVICE)
+
+#define IB_SA_QUERY_CANCELLED(query) \
+	((query)->flags & IB_SA_CANCEL)
+#define IB_SA_CANCEL_QUERY(query) \
+	((query)->flags |= IB_SA_CANCEL)
+
 struct ib_sa_service_query {
 	void (*callback)(int, struct ib_sa_service_rec *, void *);
 	void *context;
@@ -106,6 +132,26 @@ struct ib_sa_mcmember_query {
 	struct ib_sa_query sa_query;
 };
 
+struct ib_nl_request_info {
+	struct list_head list;
+	u32 seq;
+	unsigned long timeout;
+	struct ib_sa_query *query;
+};
+
+struct ib_nl_attr_info {
+	u16 type;
+	u16 len;	/* Attribute payload length */
+	void *input;
+	void (*set_attr)(struct sk_buff *skb, struct ib_nl_attr_info *info);
+};
+
+static LIST_HEAD(ib_nl_request_list);
+static DEFINE_SPINLOCK(ib_nl_request_lock);
+static atomic_t ib_nl_sa_request_seq;
+static struct workqueue_struct *ib_nl_wq;
+static struct delayed_work ib_nl_timed_work;
+
 static void ib_sa_add_one(struct ib_device *device);
 static void ib_sa_remove_one(struct ib_device *device);
 
@@ -381,6 +427,405 @@ static const struct ib_field guidinfo_rec_table[] = {
 	  .size_bits    = 512 },
 };
 
+static int ib_nl_send_msg(int opcode, struct ib_nl_attr_info *attrs,
+			  int num_attrs, u32 seq)
+{
+	struct sk_buff *skb = NULL;
+	struct nlmsghdr *nlh;
+	void *data;
+	int ret = 0;
+	int i, len = 0;
+
+	for (i = 0; i < num_attrs; i++)
+		len += nla_total_size(attrs[i].len);
+
+	if (len <= 0)
+		return -EMSGSIZE;
+
+	skb = nlmsg_new(len, GFP_KERNEL);
+	if (!skb) {
+		pr_err("alloc failed ret=%d\n", ret);
+		return -ENOMEM;
+	}
+
+	/* Put nlmsg header only for now */
+	data = ibnl_put_msg(skb, &nlh, seq, 0, RDMA_NL_SA,
+			    opcode, GFP_KERNEL);
+	if (!data) {
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	/* Add attributes */
+	for (i = 0; i < num_attrs; i++)
+		attrs[i].set_attr(skb, &attrs[i]);
+
+	/* Repair the nlmsg header length */
+	nlmsg_end(skb, nlh);
+
+	ret = ibnl_multicast(skb, nlh, RDMA_NL_GROUP_LS, GFP_KERNEL);
+	if (!ret) {
+		ret = len;
+	} else {
+		if (ret != -ESRCH)
+			pr_err("ibnl_multicast failed l=%d, r=%d\n", len, ret);
+		ret = 0;
+	}
+	return ret;
+}
+
+static struct ib_nl_request_info *
+ib_nl_alloc_request(struct ib_sa_query *query)
+{
+	struct ib_nl_request_info *rinfo;
+
+	rinfo = kzalloc(sizeof(*rinfo), GFP_ATOMIC);
+	if (rinfo == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&rinfo->list);
+	rinfo->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
+	rinfo->query = query;
+
+	return rinfo;
+}
+
+static void ib_nl_set_path_rec_attr(struct sk_buff *skb,
+				    struct ib_nl_attr_info *info)
+{
+	struct nlattr *nla;
+	struct rdma_nla_ls_path_rec *nla_rec;
+
+	nla = (struct nlattr *) skb_put(skb, nla_total_size(info->len));
+	nla->nla_type = info->type;
+	nla->nla_len = nla_attr_size(info->len);
+	memset((unsigned char *) nla + nla->nla_len, 0, nla_padlen(info->len));
+
+	nla_rec = (struct rdma_nla_ls_path_rec *) nla_data(nla);
+	nla_rec->flags = LS_NLA_PATH_F_USER;
+
+	/*
+	 * We know that the input is of type struct ib_sa_path_rec while
+	 * the output is of type struct ib_user_path_rec.
+	 */
+	ib_copy_path_rec_to_user((struct ib_user_path_rec *) nla_rec->path_rec,
+				 (struct ib_sa_path_rec *) info->input);
+}
+
+static int ib_nl_send_request(struct ib_nl_request_info *rinfo)
+{
+	struct ib_nl_attr_info info;
+	int opcode;
+	struct ib_sa_mad *mad;
+	unsigned long flags;
+	unsigned long delay;
+	int ret;
+
+	mad = rinfo->query->mad_buf->mad;
+	switch (mad->mad_hdr.attr_id) {
+	case cpu_to_be16(IB_SA_ATTR_PATH_REC):
+		opcode = RDMA_NL_LS_OP_RESOLVE;
+		info.type = LS_NLA_TYPE_PATH_RECORD;
+		info.len = sizeof(struct rdma_nla_ls_path_rec) +
+			sizeof(struct ib_user_path_rec);
+		info.input = rinfo->query->input;
+		info.set_attr = ib_nl_set_path_rec_attr;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	spin_lock_irqsave(&ib_nl_request_lock, flags);
+	ret = ib_nl_send_msg(opcode, &info, 1, rinfo->seq);
+	if (ret <= 0) {
+		ret = -EIO;
+		goto request_out;
+	} else {
+		ret = 0;
+	}
+
+	delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
+	rinfo->timeout = delay + jiffies;
+	list_add_tail(&rinfo->list, &ib_nl_request_list);
+	/* Start the timeout if this is the only request */
+	if (ib_nl_request_list.next == &rinfo->list)
+		queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
+
+request_out:
+	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+
+	return ret;
+}
+
+static int ib_nl_make_request(struct ib_sa_query *query)
+{
+	struct ib_nl_request_info *rinfo;
+	int ret;
+
+	rinfo = ib_nl_alloc_request(query);
+	if (IS_ERR(rinfo))
+		return -ENOMEM;
+
+	ret = ib_nl_send_request(rinfo);
+	if (ret)
+		kfree(rinfo);
+
+	return ret;
+}
+
+static int ib_nl_cancel_request(struct ib_sa_query *query)
+{
+	unsigned long flags;
+	struct ib_nl_request_info *rinfo;
+	int found = 0;
+
+	spin_lock_irqsave(&ib_nl_request_lock, flags);
+	list_for_each_entry(rinfo, &ib_nl_request_list, list) {
+		/* Let the timeout to take care of the callback */
+		if (query == rinfo->query) {
+			IB_SA_CANCEL_QUERY(query);
+			rinfo->timeout = jiffies;
+			list_move(&rinfo->list, &ib_nl_request_list);
+			found = 1;
+			mod_delayed_work(ib_nl_wq, &ib_nl_timed_work, 1);
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+
+	return found;
+}
+
+static void send_handler(struct ib_mad_agent *agent,
+			 struct ib_mad_send_wc *mad_send_wc);
+
+static void ib_nl_process_good_resolve_rsp(struct ib_sa_query *query,
+					   const struct nlmsghdr *nlh)
+{
+	struct ib_mad_send_wc mad_send_wc;
+	struct ib_sa_mad *mad = NULL;
+	const struct nlattr *attr;
+	struct rdma_nla_ls_path_rec *rec;
+	struct ib_user_path_rec *user_rec;
+	struct ib_sa_path_rec sa_rec;
+	struct ib_sa_path_query *path_query =
+		container_of(query, struct ib_sa_path_query, sa_query);
+
+	if (query->callback) {
+		attr = (const struct nlattr *) nlmsg_data(nlh);
+		rec = (struct rdma_nla_ls_path_rec *) nla_data(attr);
+		if (rec->flags & LS_NLA_PATH_F_USER) {
+			user_rec = (struct ib_user_path_rec *)
+				   rec->path_rec;
+			memset(&sa_rec, 0, sizeof(sa_rec));
+			ib_copy_path_rec_from_user(&sa_rec, user_rec);
+			path_query->callback(0, &sa_rec, path_query->context);
+		} else {
+			mad = query->mad_buf->mad;
+			mad->mad_hdr.method |= IB_MGMT_METHOD_RESP;
+			memcpy(mad->data, rec->path_rec,
+			       nla_len(attr) - sizeof(*rec));
+			query->callback(query, 0, mad);
+		}
+	}
+
+	mad_send_wc.send_buf = query->mad_buf;
+	mad_send_wc.status = IB_WC_SUCCESS;
+	send_handler(query->mad_buf->mad_agent, &mad_send_wc);
+}
+
+static void ib_nl_request_timeout(struct work_struct *work)
+{
+	unsigned long flags;
+	struct ib_nl_request_info *rinfo;
+	struct ib_sa_query *query;
+	unsigned long delay;
+	struct ib_mad_send_wc mad_send_wc;
+	int ret;
+
+	spin_lock_irqsave(&ib_nl_request_lock, flags);
+	while (!list_empty(&ib_nl_request_list)) {
+		rinfo = list_entry(ib_nl_request_list.next,
+				   struct ib_nl_request_info, list);
+
+		if (time_after(rinfo->timeout, jiffies)) {
+			delay = rinfo->timeout - jiffies;
+			if ((long)delay <= 0)
+				delay = 1;
+			queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
+			break;
+		}
+
+		list_del(&rinfo->list);
+		query = rinfo->query;
+		IB_SA_DISABLE_LOCAL_SVC(query);
+		/* Hold the lock to protect against query cancellation */
+		if (IB_SA_QUERY_CANCELLED(query))
+			ret = -1;
+		else
+			ret = ib_post_send_mad(query->mad_buf, NULL);
+		if (ret) {
+			mad_send_wc.send_buf = query->mad_buf;
+			mad_send_wc.status = IB_WC_WR_FLUSH_ERR;
+			spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+			send_handler(query->port->agent, &mad_send_wc);
+			spin_lock_irqsave(&ib_nl_request_lock, flags);
+		}
+		kfree(rinfo);
+	}
+	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+}
+
+static inline int ib_nl_is_good_resolve_resp(const struct nlmsghdr *nlh)
+{
+	const struct nlattr *attr;
+	struct rdma_nla_ls_path_rec *rec;
+	struct ib_path_rec_data rec_data;
+
+	if (nlh->nlmsg_flags & RDMA_NL_LS_F_ERR)
+		return 0;
+
+	if (!(nlh->nlmsg_flags & RDMA_NL_LS_F_OK))
+		return 0;
+
+	if (nlmsg_len(nlh) < nla_attr_size(sizeof(*rec)))
+		return 0;
+
+	attr = (const struct nlattr *) nlmsg_data(nlh);
+	if (attr->nla_type != LS_NLA_TYPE_PATH_RECORD)
+		return 0;
+
+	rec = (struct rdma_nla_ls_path_rec *) nla_data(attr);
+	if (((rec->flags & LS_NLA_PATH_F_USER) &&
+	    nla_len(attr) < sizeof(struct ib_user_path_rec)) ||
+	    (!(rec->flags & LS_NLA_PATH_F_USER) &&
+	    nla_len(attr) < sizeof(rec_data.path_rec)))
+		return 0;
+
+	return 1;
+}
+
+static int ib_nl_handle_set_timeout(struct sk_buff *skb,
+				    struct netlink_callback *cb)
+{
+	const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
+	int timeout, delta, abs_delta;
+	const struct nlattr *attr;
+	struct rdma_nla_ls_timeout *to_attr;
+	unsigned long flags;
+	struct ib_nl_request_info *rinfo;
+	long delay = 0;
+
+	if (nlmsg_len(nlh) < nla_attr_size(sizeof(*to_attr)))
+		goto settimeout_out;
+
+	attr = (const struct nlattr *) nlmsg_data(nlh);
+	if (attr->nla_type != LS_NLA_TYPE_TIMEOUT ||
+	    nla_len(attr) != sizeof(*to_attr))
+		goto settimeout_out;
+
+	to_attr = (struct rdma_nla_ls_timeout *) nla_data(attr);
+	timeout = (int) to_attr->timeout;
+	if (timeout < IB_SA_LOCAL_SVC_TIMEOUT_MIN)
+		timeout = IB_SA_LOCAL_SVC_TIMEOUT_MIN;
+	if (timeout > IB_SA_LOCAL_SVC_TIMEOUT_MAX)
+		timeout = IB_SA_LOCAL_SVC_TIMEOUT_MAX;
+
+	delta = timeout - sa_local_svc_timeout_ms;
+	if (delta < 0)
+		abs_delta = -delta;
+	else
+		abs_delta = delta;
+
+	if (delta != 0) {
+		spin_lock_irqsave(&ib_nl_request_lock, flags);
+		sa_local_svc_timeout_ms = timeout;
+		list_for_each_entry(rinfo, &ib_nl_request_list, list) {
+			if (delta < 0 && abs_delta > rinfo->timeout)
+				rinfo->timeout = 0;
+			else
+				rinfo->timeout += delta;
+
+			/* Get the new delay from the first entry */
+			if (!delay) {
+				delay = rinfo->timeout - jiffies;
+				if (delay <= 0)
+					delay = 1;
+			}
+		}
+		if (delay)
+			mod_delayed_work(ib_nl_wq, &ib_nl_timed_work,
+					 (unsigned long)delay);
+		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+	}
+
+settimeout_out:
+	return skb->len;
+}
+
+static int ib_nl_handle_resolve_resp(struct sk_buff *skb,
+				     struct netlink_callback *cb)
+{
+	const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
+	unsigned long flags;
+	struct ib_nl_request_info *rinfo;
+	struct ib_sa_query *query;
+	struct ib_mad_send_buf *send_buf;
+	struct ib_mad_send_wc mad_send_wc;
+	int found = 0;
+	int ret;
+
+	spin_lock_irqsave(&ib_nl_request_lock, flags);
+	list_for_each_entry(rinfo, &ib_nl_request_list, list) {
+		/*
+		 * If the query is cancelled, let the timeout routine
+		 * take care of it.
+		 */
+		if (nlh->nlmsg_seq == rinfo->seq) {
+			found = !IB_SA_QUERY_CANCELLED(rinfo->query);
+			if (found)
+				list_del(&rinfo->list);
+			break;
+		}
+	}
+
+	if (!found) {
+		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+		goto resp_out;
+	}
+
+	query = rinfo->query;
+	send_buf = query->mad_buf;
+
+	if (!ib_nl_is_good_resolve_resp(nlh)) {
+		/* if the result is a failure, send out the packet via IB */
+		IB_SA_DISABLE_LOCAL_SVC(query);
+		ret = ib_post_send_mad(query->mad_buf, NULL);
+		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+		if (ret) {
+			mad_send_wc.send_buf = send_buf;
+			mad_send_wc.status = IB_WC_GENERAL_ERR;
+			send_handler(query->port->agent, &mad_send_wc);
+		}
+	} else {
+		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+		ib_nl_process_good_resolve_rsp(query, nlh);
+	}
+
+	kfree(rinfo);
+resp_out:
+	return skb->len;
+}
+
+static struct ibnl_client_cbs ib_sa_cb_table[] = {
+	[RDMA_NL_LS_OP_RESOLVE] = {
+		.dump = ib_nl_handle_resolve_resp,
+		.module = THIS_MODULE },
+	[RDMA_NL_LS_OP_SET_TIMEOUT] = {
+		.dump = ib_nl_handle_set_timeout,
+		.module = THIS_MODULE },
+};
+
 static void free_sm_ah(struct kref *kref)
 {
 	struct ib_sa_sm_ah *sm_ah = container_of(kref, struct ib_sa_sm_ah, ref);
@@ -502,7 +947,13 @@ void ib_sa_cancel_query(int id, struct ib_sa_query *query)
 	mad_buf = query->mad_buf;
 	spin_unlock_irqrestore(&idr_lock, flags);
 
-	ib_cancel_mad(agent, mad_buf);
+	/*
+	 * If the query is still on the netlink request list, schedule
+	 * it to be cancelled by the timeout routine. Otherwise, it has been
+	 * sent to the MAD layer and has to be cancelled from there.
+	 */
+	if (!ib_nl_cancel_request(query))
+		ib_cancel_mad(agent, mad_buf);
 }
 EXPORT_SYMBOL(ib_sa_cancel_query);
 
@@ -638,6 +1089,14 @@ static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask)
 	query->mad_buf->context[0] = query;
 	query->id = id;
 
+	if (IB_SA_LOCAL_SVC_ENABLED(query)) {
+		if (!ibnl_chk_listeners(RDMA_NL_GROUP_LS)) {
+			if (!ib_nl_make_request(query))
+				return id;
+		}
+		IB_SA_DISABLE_LOCAL_SVC(query);
+	}
+
 	ret = ib_post_send_mad(query->mad_buf, NULL);
 	if (ret) {
 		spin_lock_irqsave(&idr_lock, flags);
@@ -766,6 +1225,9 @@ int ib_sa_path_rec_get(struct ib_sa_client *client,
 
 	*sa_query = &query->sa_query;
 
+	IB_SA_ENABLE_LOCAL_SVC(&query->sa_query);
+	query->sa_query.input = rec;
+
 	ret = send_mad(&query->sa_query, timeout_ms, gfp_mask);
 	if (ret < 0)
 		goto err2;
@@ -1250,6 +1712,8 @@ static int __init ib_sa_init(void)
 
 	get_random_bytes(&tid, sizeof tid);
 
+	atomic_set(&ib_nl_sa_request_seq, 0);
+
 	ret = ib_register_client(&sa_client);
 	if (ret) {
 		printk(KERN_ERR "Couldn't register ib_sa client\n");
@@ -1262,7 +1726,25 @@ static int __init ib_sa_init(void)
 		goto err2;
 	}
 
+	ib_nl_wq = create_singlethread_workqueue("ib_nl_sa_wq");
+	if (!ib_nl_wq) {
+		ret = -ENOMEM;
+		goto err3;
+	}
+
+	if (ibnl_add_client(RDMA_NL_SA, RDMA_NL_LS_NUM_OPS,
+			    ib_sa_cb_table)) {
+		pr_err("Failed to add netlink callback\n");
+		ret = -EINVAL;
+		goto err4;
+	}
+	INIT_DELAYED_WORK(&ib_nl_timed_work, ib_nl_request_timeout);
+
 	return 0;
+err4:
+	destroy_workqueue(ib_nl_wq);
+err3:
+	mcast_cleanup();
 err2:
 	ib_unregister_client(&sa_client);
 err1:
@@ -1271,6 +1753,10 @@ err1:
 
 static void __exit ib_sa_cleanup(void)
 {
+	ibnl_remove_client(RDMA_NL_SA);
+	cancel_delayed_work(&ib_nl_timed_work);
+	flush_workqueue(ib_nl_wq);
+	destroy_workqueue(ib_nl_wq);
 	mcast_cleanup();
 	ib_unregister_client(&sa_client);
 	idr_destroy(&query_idr);
-- 
1.7.1


* Re: [PATCH v3 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found]     ` <1433420809-13529-5-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-06-04 20:19       ` Jason Gunthorpe
       [not found]         ` <20150604201953.GA7227-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2015-06-04 20:19 UTC (permalink / raw)
  To: kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, John Fleck, Ira Weiny

On Thu, Jun 04, 2015 at 08:26:49AM -0400, kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> This patch routes a SA pathrecord query to netlink first and processes the
> response appropriately. If a failure is returned, the request will be sent
> through IB. The decision whether to route the request to netlink first is
> determined by the presence of a listener for the local service netlink
> multicast group. If the user-space local service netlink multicast group
> listener is not present, the request will be sent through IB, just like
> what is currently being done.

Why doesn't this follow the usual netlink usage idioms? I don't see a
single nla_put or nla_nest_start, which, based on the RFC, is what I
expect to see. Don't home-brew netlink message construction without a
really great reason.
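
For example, something roughly like this (untested sketch, ignoring the
flags word for the moment) instead of the open-coded skb_put() in
ib_nl_set_path_rec_attr():

	struct ib_user_path_rec upath;

	/* marshal into the user-visible layout, then let the netlink
	 * helper take care of the attribute header and padding */
	ib_copy_path_rec_to_user(&upath, sa_rec);
	if (nla_put(skb, LS_NLA_TYPE_PATH_RECORD, sizeof(upath), &upath))
		goto nla_put_failure;

(sa_rec stands for whatever the struct ib_sa_path_rec input is called at
the call site.)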

I am very surprised to see this layered under send_mad; all the
context is lost at that point, and it makes it impossible to provide
all the non-IB context (IP addrs, netdevs, QoS, etc.).

Confused why all sorts of LS_NLA_PATH_F_X are defined but only
LS_NLA_PATH_F_USER is referenced. What does F_USER even mean? I'm
pretty sure that should be the same as cma:
  (IB_PATH_GMP | IB_PATH_PRIMARY | IB_PATH_BIDIRECTIONAL))

Why define new path flags when this is already done as a UAPI?

Not sure the reply parse is done right either, where is the for loop?

The entire point of the original comment was to move away from using
IB SA Path query MADs in netlink, but this really only changes the
reply format. Better, but wondering if we should go further.

I was hoping to see the three callers of ib_sa_path_rec_get contribute
their information to the netlink message.

Suggest putting all the netlink stuff in its own file..

Bunch of smaller stuff, can look at v4..

Overall, much improved, I think.

Jason

* RE: [PATCH v3 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found]         ` <20150604201953.GA7227-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-04 20:31           ` Hefty, Sean
  2015-06-05 10:57           ` Wan, Kaike
  1 sibling, 0 replies; 14+ messages in thread
From: Hefty, Sean @ 2015-06-04 20:31 UTC (permalink / raw)
  To: Jason Gunthorpe, Wan, Kaike
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fleck, John, Weiny, Ira

> I am very surprised to see this layered under send_mad, all the
> context is lost at that point, and it makes it impossible to provide
> all the non-IB context (IP addrs, netdevs, QOS, etc)

It's layered under query path record.  This should provide the most benefit to the most applications, including the existing kernel clients.  You could extend this by hooking into the rdma cm directly, but IMO, that can be viewed as another optimization.  Basically, if one were to hook in only one place, this seems the most reasonable.

- Sean

* RE: [PATCH v3 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found]         ` <20150604201953.GA7227-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-06-04 20:31           ` Hefty, Sean
@ 2015-06-05 10:57           ` Wan, Kaike
       [not found]             ` <3F128C9216C9B84BB6ED23EF16290AFB0CAB4FD7-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: Wan, Kaike @ 2015-06-05 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fleck, John, Weiny, Ira, Hefty, Sean


> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]
> Sent: Thursday, June 04, 2015 4:20 PM
> 
> On Thu, Jun 04, 2015 at 08:26:49AM -0400, kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> > From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >
> > This patch routes a SA pathrecord query to netlink first and processes
> > the response appropriately. If a failure is returned, the request will
> > be sent through IB. The decision whether to route the request to
> > netlink first is determined by the presence of a listener for the
> > local service netlink multicast group. If the user-space local service
> > netlink multicast group listener is not present, the request will be
> > sent through IB, just like what is currently being done.
> 
> Why doesn't this follow the usual netlink usage idioms? I don't see a single
> nla_put or nla_nest_start, which, based on the RFC, is what I expect to see.
> Don't home brew net link message construction without a really great reason.

The only reason why we did not use nla_put is to avoid data copying. For example, to translate the input struct ib_sa_path_rec into struct ib_user_path_rec, we directly use the skb buffer reserved for the pathrecord attribute as the destination (ib_nl_set_path_rec_attr()) of ib_copy_path_rec_to_user, which is essentially the guts of nla_put. To use nla_put directly, we would have to use a temp buffer to hold ib_user_path_rec and then copy it to the skb.

> 
> I am very surprised to see this layered under send_mad, all the context is lost
> at that point, and it makes it impossible to provide all the non-IB context (IP
> addrs, netdevs, QOS, etc)

As Sean pointed out in a separate e-mail, this provides benefits to more applications. Currently, rdma_cm, ipoib, and SRP call ib_sa_path_rec_get() to query pathrecords. In addition, this does not preclude optimization in other locations in the kernel in the future.

> 
> Confused why all sorts of LS_NLA_PATH_F_X are defined but only
> LS_NLA_PATH_F_USER is referenced. What does F_USER even mean? 

It indicates that the format of the attribute is struct ib_user_path_rec instead of the packed MAD wire format (or struct ibv_path_rec in user space). This was documented in the original RFC.

> I'm pretty sure that should be the same as cma:
>   (IB_PATH_GMP | IB_PATH_PRIMARY | IB_PATH_BIDIRECTIONAL))
> 
> Why define new path flags when this is already done as a UAPI?

Yes, the flags are really copied from the pathrecord flags for struct ib_path_rec_data in include/uapi/rdma/ib_user_sa.h, because our pathrecord attribute (struct rdma_nla_ls_path_rec) is similar to struct ib_path_rec_data but not exactly the same. That's why it defines its own flags as a separate entity.

> 
> Not sure the reply parse is done right either, where is the for loop?

Well, for ib_sa_path_rec_get(), we are expecting to see only one pathrecord in the response. For other kernel locations where multiple pathrecords are expected, one may have to iterate through all returned attributes.

> 
> The entire point of the original comment was to move away from using IB SA
> Path query MADs in netlink, but this really only changes the reply format.
> Better, but wondering if we should go further.
> 
> I was hoping to see the three callers of ib_sa_path_rec_get contribute their
> information to the netlink message.
> 
> Suggest putting all the netlink stuff in its own file..

All our netlink stuff is based on the rdma_netlink.h definitions (client, multicast group, opcode, attributes) and I think that the include/uapi/rdma/rdma_netlink.h file is its natural habitat.

> 
> Bunch of smaller stuff, can look at v4..
> 
> Overall, much improved, I think.
> 
> Jason

* Re: [PATCH v3 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found]             ` <3F128C9216C9B84BB6ED23EF16290AFB0CAB4FD7-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-05 17:05               ` Jason Gunthorpe
       [not found]                 ` <20150605170521.GB7776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2015-06-05 17:05 UTC (permalink / raw)
  To: Wan, Kaike
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fleck, John, Weiny, Ira, Hefty, Sean

On Fri, Jun 05, 2015 at 10:57:03AM +0000, Wan, Kaike wrote:

> > Why doesn't this follow the usual netlink usage idioms? I don't
> > see a single nla_put or nla_nest_start, which, based on the RFC,
> > is what I expect to see.  Don't home brew net link message
> > construction without a really great reason.
> 
> The only reason why we did not use nla_put is to avoid data
> copying. For example, to translate the input struct ib_sa_path_rec
> into struct ib_user_path_rec, we directly use the skb buffer
> reserved for the pathrecord attribute as the destination
> (ib_nl_set_path_rec_attr()) of ib_copy_path_rec_to_user, which is
> essentially the guts of nla_put. To use nla_put directly, we would
> have to use a temp buffer to hold ib_user_path_rec and then copy it
> to the skb.

Okay, but isn't that what nla_reserve is for?

> > I am very surprised to see this layered under send_mad, all the
> > context is lost at that point, and it makes it impossible to
> > provide all the non-IB context (IP addrs, netdevs, QOS, etc)
> 
> As Sean pointed out in a separate e-mail, this provides benefits to
> more applications. Currently, rdma_cm, ipoib, and SRP calls
> ib_sa_path_rec_get() to query pathrecord.

Yes, I saw Sean's comment, and I am glad we all understand there are
only 3 callers. And looking at the callers there are only a small
number of elements being used:

rdma cm:
 IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
                    IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
                    IB_SA_PATH_REC_REVERSIBLE | IB_SA_PATH_REC_SERVICE_ID;
 IB_SA_PATH_REC_QOS_CLASS;
 IB_SA_PATH_REC_TRAFFIC_CLASS

ipoib:
                                   IB_SA_PATH_REC_DGID          |
                                   IB_SA_PATH_REC_SGID          |
                                   IB_SA_PATH_REC_NUMB_PATH     |
                                   IB_SA_PATH_REC_TRAFFIC_CLASS |
                                   IB_SA_PATH_REC_PKEY,

srp:
                                               IB_SA_PATH_REC_SERVICE_ID |
                                               IB_SA_PATH_REC_DGID       |
                                               IB_SA_PATH_REC_SGID       |
                                               IB_SA_PATH_REC_NUMB_PATH  |
                                               IB_SA_PATH_REC_PKEY,

So, rather than a SA mad in netlink, I'd suggest actually using
extensible netlink here and define the query cleanly using nested
attributes:

RDMA_OP_RESOLVE (request)
  GIDS
  SERVICE_ID
  QOS/TCLASS
  PKEY
  QP TYPE (aka reversible)
  
RDMA_OP_RESOLVE (reply)
  IB_PATH
  IB_PATH
  IB_PATH

This could probably still be done under send_mad, and if you don't
want to finish the job then that is probably fine.
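
A rough sketch of what building such a request could look like (untested;
the LS_NLA_TYPE_* names below are made up and error handling is omitted):

	/* rec is the struct ib_sa_path_rec handed to ib_sa_path_rec_get;
	 * only attributes covered by the caller's comp_mask would be put */
	nla_put(skb, LS_NLA_TYPE_SGID, sizeof(rec->sgid), &rec->sgid);
	nla_put(skb, LS_NLA_TYPE_DGID, sizeof(rec->dgid), &rec->dgid);
	nla_put_u16(skb, LS_NLA_TYPE_PKEY, be16_to_cpu(rec->pkey));
	nla_put_be64(skb, LS_NLA_TYPE_SERVICE_ID, rec->service_id);
	nla_put_u8(skb, LS_NLA_TYPE_TCLASS, rec->traffic_class);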

The reason to prefer native netlink format as a UAPI is that it is
more specific. It is very hard to process a general SA PathRecord
query MAD; there are many combinations and many wonky attributes. That
makes implementing a correct, future-proof user space client
unnecessarily hard. With native netlink we can define our own
conventions for UAPI extension.

.. and with only 3 callers we don't have a huge burden to shift if a
kAPI needs to change to make this natural ..

> In addition, this does not preclude optimization in other locations
> in the kernel in the future.

Confused. Adding new netlink attributes is not an optimization, it is
a UAPI enhancement.

> > Confused why all sorts of LS_NLA_PATH_F_X are defined but only
> > LS_NLA_PATH_F_USER is referenced. What does F_USER even mean? 
> 
> it indicates that the format of the attribute is struct
> ib_user_path_rec instead of packed MAD wire format (or struct
> ibv_path_rec in user space). This has been documented in the
> original RFC.

Why do we need two formats in the uapi? Drop the !user one.

Even so, it isn't the netlink way to bury two different structs in the
same attribute; you need to use netlink headers to tell them apart,
not buried flags.

> > I'm pretty sure that should be the same as cma:
> >   (IB_PATH_GMP | IB_PATH_PRIMARY | IB_PATH_BIDIRECTIONAL))

You missed this one.

You can't just ignore the flag bits; they have to be checked. It is
very important to have an extensible UAPI.

> > Why define new path flags when this is already done as a UAPI?
> 
> yes, the flag is really copied from the pathrecord flags for struct
> ib_path_rec_data in include/uapi/rdma/ib_user_sa.h because our
> pathrecord attribute (struct rdma_nla_ls_path_rec) is similar to
> struct ib_path_rec_data, but not exactly the same.  That's why we
> should define its own flags as a separate entity.

I would keep the original flags for consistency, especially once the
!user case is dropped. This is because it should be exactly the same
protocol and system as the other users.

> > Not sure the reply parse is done right either, where is the for loop?
> 
> Well, for ib_sa_path_rec_get(), we are expecting to see only one
> pathrecord in the response. For other kernel locations where
> multiple pathrecords are expected, one may have to iterate through
> all returned attributes.

No, you must accept multiple attributes and loop to find a supported
one (today's kernel only supports (IB_PATH_GMP | IB_PATH_PRIMARY |
IB_PATH_BIDIRECTIONAL)). Review the other implementations of IB_PATH_*
flags in the kernel to see how this works.

Otherwise future extensibility is compromised.
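
Roughly (untested), the same flag-check-and-pick pattern that
ucma_set_ib_path() uses over an array of struct ib_path_rec_data:

	const struct nlattr *attr;
	const u32 need = IB_PATH_GMP | IB_PATH_PRIMARY |
			 IB_PATH_BIDIRECTIONAL;
	int rem;

	nla_for_each_attr(attr, nlmsg_data(nlh), nlmsg_len(nlh), rem) {
		const struct ib_path_rec_data *path = nla_data(attr);

		/* skip path records this kernel doesn't understand */
		if ((path->flags & need) == need) {
			/* use path->path_rec */
			break;
		}
	}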

> > Suggest putting all the netlink stuff in its own file..
> 
> All our netlink stuff is based on the rdma_netlink.h definitions
> (client, multicast group, opcode, attributes) and I think that the
> include/uapi/rdma/rdma_netlink.h file is its natural habitat.

I mean the .c file stuff.

Jason

* RE: [PATCH v3 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found]                 ` <20150605170521.GB7776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-05 20:53                   ` Hefty, Sean
       [not found]                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE54EC-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-06-05 21:01                   ` Wan, Kaike
  1 sibling, 1 reply; 14+ messages in thread
From: Hefty, Sean @ 2015-06-05 20:53 UTC (permalink / raw)
  To: Jason Gunthorpe, Wan, Kaike
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fleck, John, Weiny, Ira

> Yes, I saw Sean's comment, and I am glad we all understand there are
> only 3 callers. And looking at the callers there are only a small
> number of elements being used:
> 
> rdma cm:
>  IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
>                     IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
>                     IB_SA_PATH_REC_REVERSIBLE | IB_SA_PATH_REC_SERVICE_ID;
>  IB_SA_PATH_REC_QOS_CLASS;
>  IB_SA_PATH_REC_TRAFFIC_CLASS
> 
> ipoib:
>                                    IB_SA_PATH_REC_DGID          |
>                                    IB_SA_PATH_REC_SGID          |
>                                    IB_SA_PATH_REC_NUMB_PATH     |
>                                    IB_SA_PATH_REC_TRAFFIC_CLASS |
>                                    IB_SA_PATH_REC_PKEY,
> 
> srp:
>                                                IB_SA_PATH_REC_SERVICE_ID |
>                                                IB_SA_PATH_REC_DGID       |
>                                                IB_SA_PATH_REC_SGID       |
>                                                IB_SA_PATH_REC_NUMB_PATH  |
>                                                IB_SA_PATH_REC_PKEY,
> 
> So, rather than a SA mad in netlink, I'd suggest actually using
> extensible netlink here and define the query cleanly using nested
> attributes:
> 
> RDMA_OP_RESOLVE (request)
>   GIDS
>   SERVICE_ID
>   QOS/TCLASS
>   PKEY
>   QP TYPE (aka reversible)
> 
> RDMA_OP_RESOLVE (reply)
>   IB_PATH
>   IB_PATH
>   IB_PATH

I thought you were asking to move the hook site.  This approach seems reasonable.  We should just need to add a check against the compmask.
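
E.g. (sketch only; IB_NL_SUPPORTED_COMP_MASK is a made-up name for
whatever set of component-mask bits the user-space responder is expected
to handle), somewhere before the query is handed off to netlink:

	if (comp_mask & ~IB_NL_SUPPORTED_COMP_MASK)
		IB_SA_DISABLE_LOCAL_SVC(&query->sa_query);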

* Re: [PATCH v3 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found]                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE54EC-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-05 20:59                       ` Jason Gunthorpe
       [not found]                         ` <20150605205946.GC12504-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2015-06-05 20:59 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Wan, Kaike, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fleck, John, Weiny, Ira

On Fri, Jun 05, 2015 at 08:53:01PM +0000, Hefty, Sean wrote:
> > RDMA_OP_RESOLVE (reply)
> >   IB_PATH
> >   IB_PATH
> >   IB_PATH
> 
> I thought you were asking to move the hook site.  This approach
> seems reasonable.  We should just need to add a check against the
> compmask.

I've been asking if we can include more meta-information, like the
netdev for rdma-cm and ipoib. Having the hook site in send_mad makes
that seem hard to reach. (maybe not?)

On one hand, it would be nice to launch a new UAPI with broad feature
completeness; on the other hand, if it is a lot of work then go
incremental. That is why I only said I was surprised :)

Jason

* RE: [PATCH v3 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found]                 ` <20150605170521.GB7776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-06-05 20:53                   ` Hefty, Sean
@ 2015-06-05 21:01                   ` Wan, Kaike
       [not found]                     ` <3F128C9216C9B84BB6ED23EF16290AFB0CAB5071-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: Wan, Kaike @ 2015-06-05 21:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fleck, John, Weiny, Ira, Hefty, Sean

> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]
> Sent: Friday, June 05, 2015 1:05 PM
> 
> > > Why doesn't this follow the usual netlink usage idioms? I don't see
> > > a single nla_put or nla_nest_start, which, based on the RFC, is what
> > > I expect to see.  Don't home brew net link message construction
> > > without a really great reason.
> >
> > The only reason why we did not use nla_put is to avoid data copying.
> > For example, to translate the input struct ib_sa_path_rec into struct
> > ib_user_path_rec, we directly use the skb buffer reserved for the
> > pathrecord attribute as the destination
> > (ib_nl_set_path_rec_attr()) of ib_copy_path_rec_to_user, which is
> > essentially the guts of nla_put. To use nla_put directly, we would
> > have to use a temp buffer to hold ib_user_path_rec and then copy it to
> > the skb.
> 
> Okay, but isn't that what nla_reserve is for?

Yes. I will replace part of the code with nla_reserve. Thank you for pointing it out.
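
Something like this, perhaps (untested, and assuming the attribute ends up
carrying a bare struct ib_user_path_rec):

	struct nlattr *nla;

	nla = nla_reserve(skb, LS_NLA_TYPE_PATH_RECORD,
			  sizeof(struct ib_user_path_rec));
	if (!nla)
		return -EMSGSIZE;
	/* marshal directly into the skb, no intermediate buffer needed */
	ib_copy_path_rec_to_user(nla_data(nla), query->input);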

> 
> > > I am very surprised to see this layered under send_mad, all the
> > > context is lost at that point, and it makes it impossible to provide
> > > all the non-IB context (IP addrs, netdevs, QOS, etc)
> >
> > As Sean pointed out in a separate e-mail, this provides benefits to
> > more applications. Currently, rdma_cm, ipoib, and SRP calls
> > ib_sa_path_rec_get() to query pathrecord.
> 
> Yes, I saw Sean's comment, and I am glad we all understand there are only 3
> callers. And looking at the callers there are only a small number of elements
> being used:
> 
> rdma cm:
>  IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
>                     IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
>                     IB_SA_PATH_REC_REVERSIBLE | IB_SA_PATH_REC_SERVICE_ID;
> IB_SA_PATH_REC_QOS_CLASS;  IB_SA_PATH_REC_TRAFFIC_CLASS
> 
> ipoib:
>                                    IB_SA_PATH_REC_DGID          |
>                                    IB_SA_PATH_REC_SGID          |
>                                    IB_SA_PATH_REC_NUMB_PATH     |
>                                    IB_SA_PATH_REC_TRAFFIC_CLASS |
>                                    IB_SA_PATH_REC_PKEY,
> 
> srp:
>                                                IB_SA_PATH_REC_SERVICE_ID |
>                                                IB_SA_PATH_REC_DGID       |
>                                                IB_SA_PATH_REC_SGID       |
>                                                IB_SA_PATH_REC_NUMB_PATH  |
>                                                IB_SA_PATH_REC_PKEY,
> 
> So, rather than a SA mad in netlink, I'd suggest actually using extensible
> netlink here and define the query cleanly using nested
> attributes:
> 
> RDMA_OP_RESOLVE (request)
>   GIDS
>   SERVICE_ID
>   QOS/TCLASS
>   PKEY
>   QP TYPE (aka reversible)
> 
> RDMA_OP_RESOLVE (reply)
>   IB_PATH
>   IB_PATH
>   IB_PATH

Are you saying that instead of defining only the pathrecord attribute, we can define the following simpler attributes: GID, SERVICE_ID, QOS/TCLASS, PKEY, QP TYPE, IB_PATH, and then use them in the request or response? In this case, we need more attribute types (e.g., SGID and DGID as two different attribute types instead of a single GID attribute type with a flags field in the attribute data) to avoid using flags as attribute modifiers.

For the response, we would still like to keep the struct ibv_path_rec format (MAD wire format) for this patch series. In the future, it might be possible to use individual attributes, just like the request.


> 
> This could probably still be done under send_mad, and if you don't want to
> finish the job then that is probably fine.

We would like to put the netlink call under send_mad() for two reasons:

(1) It can potentially be used by other SA requests (e.g., getting a ServiceRecord);
(2) send_mad() performs other operations (storing the query pointer in the idr map, setting the timeout) in addition to calling the MAD stack. Theoretically, we could do the netlink call entirely inside ib_sa_path_rec_get() before calling send_mad(). However, since we need to handle the timeout and the netlink response in a separate thread, we need everything set up properly first.

> 
> The reason to prefer native netlink format as a UAPI is because it is more
> specific. It is very hard to process a general SA Path Record query mad, there
> are many combinations and many wonky attributes. That makes
> implementing a correct/future proof user space client unnecessarily hard.
> With native netlink we can define our own conventions for UAPI extension.

Agree. That's why we are using struct ib_user_path_rec as the request input currently. As a matter of fact, we are using only some of the parameters in ibacm (sgid, dgid, slid, dlid, etc).

> 
> .. and with only 3 callers we don't have a huge burden to shift if a kAPI needs
> to change to make this natural ..
> 
> > In addition, this does not preclude optimization in other locations in
> > the kernel in the future.
> 
> Confused. Adding new netlink attributes is not an optimization, it is a UAPI
> enhancement.
> 
> > > Confused why all sorts of LS_NLA_PATH_F_X are defined but only
> > > LS_NLA_PATH_F_USER is referenced. What does F_USER even mean?
> >
> > it indicates that the format of the attribute is struct
> > ib_user_path_rec instead of packed MAD wire format (or struct
> > ibv_path_rec in user space). This has been documented in the original
> > RFC.
> 
> Why do we need two formats in the uapi? Drop the !user one.

If we don't use struct ib_user_path_rec format, we don't need it any more.

> 
> Even so, it isn't the netlink way to bury two different structs in the same
> attribute, you need to use netlink headers to tell them apart, not buried
> flags..

I went back and forth during the implementation on whether to use more attribute types or a single attribute type with flags to distinguish different structures. I can switch to using different attributes for different structures.

> 
> > I'm
> > > pretty sure that should be the same as cma:
> > >   (IB_PATH_GMP | IB_PATH_PRIMARY | IB_PATH_BIDIRECTIONAL))
> 
> You missed this one
> 
> You can't just ignore the flag bit, they have to be checked. It is very important
> to have an extensible uapi.
> 
> > > Why define new path flags when this is already done as a UAPI?

> >
> > yes, the flag is really copied from the pathrecord flags for struct
> > ib_path_rec_data in include/uapi/rdma/ib_user_sa.h because our
> > pathrecord attribute (struct rdma_nla_ls_path_rec) is similar to
> > struct ib_path_rec_data, but not exactly the same.  That's why we
> > should define its own flags as a separate entity.
> 
> I would keep the original flags for consistency, especially once the !user case
> is dropped. This is because it should be exactly the same protocol and system
> as the other users.

I will do that.

> 
> > > Not sure the reply parse is done right either, where is the for loop?
> >
> > Well, for ib_sa_path_rec_get(), we are expecting to see only one
> > pathrecord in the response. For other kernel locations where multiple
> > pathrecords are expected, one may have to iterate through all returned
> > attributes.
> 
> No, you must accept multiple attributes and loop to find a supported one
> (today's kernel only supports (IB_PATH_GMP | IB_PATH_PRIMARY |
> IB_PATH_BIDIRECTIONAL)). Review the other implementations of
> IB_PATH_* flags in the kernel to see how this works.
> 
> Otherwise future extensibility is compromised.

OK. I will loop through the returned attributes to find a supported one; a rough sketch follows.
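
Assuming the reply carries an array of path attributes; LS_NLA_TYPE_PATH_RECORD
is a placeholder name here, while struct ib_path_rec_data and the IB_PATH_*
flags come from include/uapi/rdma/ib_user_sa.h:

#include <net/netlink.h>
#include <rdma/ib_user_sa.h>	/* struct ib_path_rec_data, IB_PATH_* */

#define LS_NLA_TYPE_PATH_RECORD	1	/* placeholder for this sketch */

static const struct ib_path_rec_data *
ib_nl_find_usable_path(const struct nlmsghdr *nlh)
{
	const u32 want = IB_PATH_GMP | IB_PATH_PRIMARY | IB_PATH_BIDIRECTIONAL;
	const struct nlattr *attr;
	int rem;

	/* Walk every returned attribute instead of assuming a single
	 * path record; skip anything the kernel does not support today.
	 */
	nla_for_each_attr(attr, nlmsg_attrdata(nlh, 0),
			  nlmsg_attrlen(nlh, 0), rem) {
		const struct ib_path_rec_data *rec;

		if (nla_type(attr) != LS_NLA_TYPE_PATH_RECORD ||
		    nla_len(attr) < sizeof(*rec))
			continue;

		rec = nla_data(attr);
		if ((rec->flags & want) == want)
			return rec;	/* reversible GMP path we can use */
	}
	return NULL;
}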

Thanks,

Kaike

* RE: [PATCH v3 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found]                         ` <20150605205946.GC12504-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-05 21:02                           ` Wan, Kaike
  0 siblings, 0 replies; 14+ messages in thread
From: Wan, Kaike @ 2015-06-05 21:02 UTC (permalink / raw)
  To: Jason Gunthorpe, Hefty, Sean
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fleck, John, Weiny, Ira

> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
> Sent: Friday, June 05, 2015 5:00 PM
> 
> On Fri, Jun 05, 2015 at 08:53:01PM +0000, Hefty, Sean wrote:
> > > RDMA_OP_RESOLVE (reply)
> > >   IB_PATH
> > >   IB_PATH
> > >   IB_PATH
> >
> > I thought you were asking to move the hook site.  This approach seems
> > reasonable.  We should just need to add a check against the compmask.
> 
> I've been asking if we can include more meta-information, like the netdev for
> rdma-cm and ipoib. Having the hook site in send_mad makes that seem hard
> to reach. (maybe not?)
> 
> On one hand, it would be nice to launch a new UAPI with broad feature
> completeness; on the other hand, if it is a lot of work then go incremental.
> That is why I only said I was surprised :)

I would like the incremental approach.

Kaike


* Re: [PATCH v3 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found]                     ` <3F128C9216C9B84BB6ED23EF16290AFB0CAB5071-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-05 21:19                       ` Jason Gunthorpe
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2015-06-05 21:19 UTC (permalink / raw)
  To: Wan, Kaike
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fleck, John, Weiny, Ira, Hefty, Sean

On Fri, Jun 05, 2015 at 09:01:11PM +0000, Wan, Kaike wrote:
> > So, rather than a SA mad in netlink, I'd suggest actually using extensible
> > netlink here and define the query cleanly using nested
> > attributes:
> > 
> > RDMA_OP_RESOLVE (request)
> >   GIDS
> >   SERVICE_ID
> >   QOS/TCLASS
> >   PKEY
> >   QP TYPE (aka reversible)
> > 
> > RDMA_OP_RESOLVE (reply)
> >   IB_PATH
> >   IB_PATH
> >   IB_PATH
> 
> Are you saying that instead of defining the pathrecord attribute
> only, we can define the following simpler attributes: GID,
> SERVICE_ID, QOS/TCLASS, PKEY, QP TYPE, IB_PATH, and then use them in
> the request or response?

Request only.

The response will be an array of ib_path_rec_data (and sorry if I was
confusing about !user vs. user; the goal is to match ucma's existing API
for this, see ucma_set_ib_path).

Please check if we can extend the answer array too, or if it needs to
be more like:

RDMA_OP_RESOLVE (reply)
  RESOLVED_PATH (#1, REVERSIBLE, GMP)
     FLAGS
     IB_PATH (ib_path_rec_data)
     FUTURE_EXTENSION
  RESOLVED_PATH (#2 FORWARD ONLY)
   ..
  RESOLVED_PATH (#3 REVERSE ONLY)
   ..
  RESOLVED_PATH (#4 Alternate FORWARD ONLY)
   ..
  RESOLVED_PATH (#5 Alternate REVERSE ONLY)
   ..
  RESOLVED_PATH (#6 Alternate REVERSIBLE, GMP)
   ..

To elaborate again: this is the best API I've seen for adding APM - userspace
can own the 'policy' side that APM needs and plug into the kernel here with
this netlink interface. Obviously the kernel side is missing, but let's plan
ahead :) A rough sketch of building such a nested reply follows.
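
The attribute enum below is purely hypothetical; it only shows how the
standard netlink nesting helpers would express one RESOLVED_PATH entry:

#include <linux/skbuff.h>
#include <net/netlink.h>

enum {
	LS_NLA_RESOLVED_PATH = 1,	/* nested container, one per path */
	LS_NLA_PATH_FLAGS,		/* u32: REVERSIBLE/GMP/FORWARD/... */
	LS_NLA_PATH_IB,			/* struct ib_path_rec_data payload */
};

static int put_resolved_path(struct sk_buff *skb, u32 flags,
			     const void *path_rec, int len)
{
	struct nlattr *nest = nla_nest_start(skb, LS_NLA_RESOLVED_PATH);

	if (!nest)
		return -EMSGSIZE;
	/* Each nested path carries its own flags plus the raw record, so
	 * new per-path attributes (the FUTURE_EXTENSION above) can be
	 * added later without breaking existing parsers.
	 */
	if (nla_put_u32(skb, LS_NLA_PATH_FLAGS, flags) ||
	    nla_put(skb, LS_NLA_PATH_IB, len, path_rec)) {
		nla_nest_cancel(skb, nest);
		return -EMSGSIZE;
	}
	nla_nest_end(skb, nest);
	return 0;
}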

> In this case, we need more attribute types
> (eg, SGID and DGID as two different attribute types instead of a
> single GID attribute type with a flags field in the attribute data)

Well, think about it and slice it up as you feel best, using netlink
conventions. I.e., every address should be carried similarly to
IFLA_ADDRESS, with the appropriate type (AF_*) and size.

SGID and DGID are mandatory, but the overhead of splitting them is
probably small?

I'd *really* like to use QP TYPE instead of reversible; a UD QP has
different path-set needs than an RC QP, for instance. This is
forward-looking toward APM. A rough request sketch follows.
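
The LS_NLA_TYPE_* attribute names below are made up for illustration, and
the exact netlink helper signatures vary a little between kernel versions:

#include <linux/skbuff.h>
#include <net/netlink.h>
#include <rdma/ib_verbs.h>	/* union ib_gid */

/* Hypothetical attribute types, one per request field. */
enum {
	LS_NLA_TYPE_SGID = 1,
	LS_NLA_TYPE_DGID,
	LS_NLA_TYPE_SERVICE_ID,
	LS_NLA_TYPE_PKEY,
	LS_NLA_TYPE_QP_TYPE,
	LS_NLA_TYPE_MAX = LS_NLA_TYPE_QP_TYPE,
};

static int fill_resolve_request(struct sk_buff *skb,
				const union ib_gid *sgid,
				const union ib_gid *dgid,
				__be64 service_id, u16 pkey, u8 qp_type)
{
	/* Each address attribute carries its own length, in the spirit of
	 * IFLA_ADDRESS; QP type stands in for a bare "reversible" bit.
	 */
	if (nla_put(skb, LS_NLA_TYPE_SGID, sizeof(*sgid), sgid) ||
	    nla_put(skb, LS_NLA_TYPE_DGID, sizeof(*dgid), dgid) ||
	    nla_put(skb, LS_NLA_TYPE_SERVICE_ID, sizeof(service_id),
		    &service_id) ||
	    nla_put_u16(skb, LS_NLA_TYPE_PKEY, pkey) ||
	    nla_put_u8(skb, LS_NLA_TYPE_QP_TYPE, qp_type))
		return -EMSGSIZE;
	return 0;
}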

> > This could probably still be done under send_mad, and if you don't want to
> > finish the job then that is probably fine.
> 
> We would like to put netlink call under send_mad() for two reasons:
> 
> (1) It can potentially be used by other SA request (eg get
> ServiceRecord);
> (2) send_mad has other operations (store query point in idr map, set
> timeout) in addition to call the MAD stack. Theoretically, we could
> do all netlink call inside ib_sa_path_rec_get() before calling
> send_mad. However, since we need to handle timeout and netlink
> response in separate thread, we need to set everything up properly.

Yes, I appreciate this.

> > Even so, it isn't the netlink way to bury two different structs in the same
> > attribute, you need to use netlink headers to tell them apart, not buried
> > flags..
> 
> I was back and forth during the implementation on whether to use
> more attribute types or use a single attribute type and flags to
> define different structures. I can switch to use different
> attributes for different structures.

Generally speaking, with netlink we want each attribute to be
self-describing via the common length field. Different things get
different attributes. This lets you use the existing mechanics for
handling attributes, and the code reads cleanly; a minimal parsing
sketch follows.
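
Reusing the hypothetical LS_NLA_TYPE_* values from the request sketch above
(helper signatures may differ slightly across kernel versions):

static const struct nla_policy ls_policy[LS_NLA_TYPE_MAX + 1] = {
	[LS_NLA_TYPE_SGID]	 = { .len = sizeof(union ib_gid) },
	[LS_NLA_TYPE_DGID]	 = { .len = sizeof(union ib_gid) },
	[LS_NLA_TYPE_SERVICE_ID] = { .len = sizeof(__be64) },
	[LS_NLA_TYPE_PKEY]	 = { .type = NLA_U16 },
	[LS_NLA_TYPE_QP_TYPE]	 = { .type = NLA_U8 },
};

static int parse_resolve_request(const struct nlmsghdr *nlh,
				 struct nlattr **tb)
{
	/* nla_parse() checks each attribute's length against the policy
	 * and fills tb[], so a new attribute added later does not break
	 * existing parsing.
	 */
	return nla_parse(tb, LS_NLA_TYPE_MAX, nlmsg_attrdata(nlh, 0),
			 nlmsg_attrlen(nlh, 0), ls_policy);
}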

Jason
