* [RFC] IB/sa: Route SA pathrecord query through netlink
@ 2015-05-21 13:52 Wan, Kaike
       [not found] ` <3F128C9216C9B84BB6ED23EF16290AFB0CAB2E96-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Wan, Kaike @ 2015-05-21 13:52 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Hefty, Sean, Weiny, Ira, Jason Gunthorpe,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org),
	Or Gerlitz


In our previous posting to the mailing list, we proposed sending a MAD request from the kernel (more
specifically, from the ib_sa module) to a user space application (ibacm in this case) through netlink.
The user space application sends back the response. This simple scheme achieves the goal
of a local SA cache in user space.

The format of the request and response is diagrammed below:

  ------------------
  | netlink header |
  ------------------
  |     MAD        |
  ------------------

The kernel requests a pathrecord, and the user application finds it in its local cache and sends
it to the kernel. If the netlink request fails, the kernel sends the request to the SA through the
normal IB path (ib_mad -> hca driver -> wire).

Jason pointed out that this message format was limited to a lower stack format (MAD) and that its use
could not be readily extended to upper layer modules like rdma_cm. After lengthy discussions, we
came up with a new, modified scheme, as described below.

The general format of the request and response will be the same:

  ------------------
  | netlink header |
  ------------------
  |  Data header   |
  ------------------
  |      Data      |
  ------------------

The data header contains information about the type of request/response, the status (for response),
the type (format) of the data, the total length of the data header + data, and a flags field about
the request/response or data.

Depending on the type of the data, the data section may take different formats: a string holding the host
name to resolve, an IPv4/IPv6 address, a pathrecord, a user pathrecord (struct ib_user_path_rec),
or simply a MAD (as in our posted patches), etc. Essentially it can be of any format, as determined by the
data type. The key is to document the format so that the kernel and user space can communicate
correctly.

The details are described below:

#define IB_NL_VERSION		0x01

#define IB_NL_OP_MASK		0x0F
#define IB_NL_OP_RESOLVE	0x01
#define IB_NL_OP_QUERY_PATH	0x02
#define IB_NL_OP_SET_TIMEOUT	0x03
#define IB_NL_OP_ACK		0x80

#define IB_NL_STATUS_SUCCESS	0x0000
#define IB_NL_STATUS_ENODATA	0x0001

#define IB_NL_DATA_TYPE_INVALID			0x0000
#define IB_NL_DATA_TYPE_NAME			0x0001
#define IB_NL_DATA_TYPE_ADDRESS_IP		0x0002
#define IB_NL_DATA_TYPE_ADDRESS_IP6		0x0003
#define IB_NL_DATA_TYPE_PATH_RECORD		0x0004
#define IB_NL_DATA_TYPE_USER_PATH_REC		0x0005
#define IB_NL_DATA_TYPE_MAD			0x0006

#define IB_NL_FLAGS_PATH_GMP			1
#define IB_NL_FLAGS_PATH_PRIMARY		(1<<1)
#define IB_NL_FLAGS_PATH_ALTERNATE		(1<<2)
#define IB_NL_FLAGS_PATH_OUTBOUND		(1<<3)
#define IB_NL_FLAGS_PATH_INBOUND		(1<<4)
#define IB_NL_FLAGS_PATH_INBOUND_REVERSE 	(1<<5)
#define IB_NL_FLAGS_PATH_BIDIRECTIONAL		(IB_NL_FLAGS_PATH_OUTBOUND | IB_NL_FLAGS_PATH_INBOUND_REVERSE)
#define IB_NL_FLAGS_QUERY_SA			(1<<31)
#define IB_NL_FLAGS_NODELAY			(1<<30)

struct ib_nl_data_hdr {
	__u8	version;
	__u8	opcode;
	__u16	status;
	__u16	type;
	__u16	reserved;
	__u32	flags;
	__u32	length;
};

struct ib_nl_data {
	struct ib_nl_data_hdr		hdr;
	__u8				data[0];
};
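
As a rough illustration of the layout above, the sketch below packs a data header and its payload
into one buffer in user space. It uses <stdint.h> types as stand-ins for __u8/__u16/__u32, and the
helper name ib_nl_pack() is hypothetical, not part of the proposal.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IB_NL_VERSION		0x01
#define IB_NL_OP_QUERY_PATH	0x02
#define IB_NL_DATA_TYPE_NAME	0x0001

/* User-space mirror of the proposed header (fixed-width stand-ins for __u8 etc.). */
struct ib_nl_data_hdr {
	uint8_t		version;
	uint8_t		opcode;
	uint16_t	status;
	uint16_t	type;
	uint16_t	reserved;
	uint32_t	flags;
	uint32_t	length;		/* data header + data, in bytes */
};

/*
 * Hypothetical helper: write header + payload into buf.
 * Returns the total number of bytes written, or 0 if buf is too small.
 */
static size_t ib_nl_pack(void *buf, size_t buf_len, uint8_t opcode,
			 uint16_t type, uint32_t flags,
			 const void *data, size_t data_len)
{
	struct ib_nl_data_hdr hdr;
	size_t total = sizeof(hdr) + data_len;

	if (total > buf_len || total > UINT32_MAX)
		return 0;

	memset(&hdr, 0, sizeof(hdr));
	hdr.version = IB_NL_VERSION;
	hdr.opcode = opcode;
	hdr.type = type;
	hdr.flags = flags;
	hdr.length = (uint32_t)total;

	memcpy(buf, &hdr, sizeof(hdr));
	if (data_len)
		memcpy((uint8_t *)buf + sizeof(hdr), data, data_len);
	return total;
}
```

The length field here counts the header plus the data, as stated in the description of the data header.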


These defines and structures can be added to the file include/uapi/rdma/rdma_netlink.h (renamed with the
RDMA_NL prefix) or placed in a separate file (include/uapi/rdma/ib_netlink.h ???).

Please share your thoughts.

Kaike
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [RFC] IB/sa: Route SA pathrecord query through netlink
       [not found] ` <3F128C9216C9B84BB6ED23EF16290AFB0CAB2E96-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-05-21 17:21   ` Doug Ledford
       [not found]     ` <1432228874.28905.35.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-05-21 18:12   ` Jason Gunthorpe
  2015-05-21 19:44   ` ira.weiny
  2 siblings, 1 reply; 11+ messages in thread
From: Doug Ledford @ 2015-05-21 17:21 UTC (permalink / raw)
  To: Wan, Kaike
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean, Weiny, Ira,
	Jason Gunthorpe,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org),
	Or Gerlitz


On Thu, 2015-05-21 at 13:52 +0000, Wan, Kaike wrote:
> In our previous posting to the mailing list, we proposed to send a MAD request from kernel (more
> specifically, from ib_sa module) to a user space application (ibacm in this case) through netlink.
> The user space application will send back the response. This simple scheme can achieve the goal 
> of a local SA cache in user space.
> 
> The format of the request and response is diagrammed below:
> 
>   ------------------
>   | netlink header |
>   ------------------
>   |     MAD        |
>   ------------------
> 
> The kernel requests for a pathrecord, and the user application finds it in its local cache and sends
> it to the kernel. If the netlink request fails, the kernel will send the request to SA through the
> normal IB path (ib_mad -> hca driver -> wire).
> 
> Jason pointed out that this message format was limited to lower stack format (MAD) and its use
> could not be readily extended to upper layer modules like rdma_cm. After lengthy discussions, we 
> come up with a new and modified scheme, as described below.
> 
> The general format of the request and response will be the same:
> 
>   ------------------
>   | netlink header |
>   ------------------
>   |  Data header   |
>   ------------------
>   |      Data      |
>   ------------------
> 
> The data header contains information about the type of request/response, the status (for response),
> the type (format) of the data, the total length of the data header + data, and a flags field about
> the request/response or data.
> 
> Based on the type of the data, the data section may be in different format: a string about the host
> name to resolve, an IP4/IP6 address, a pathrecord, a user pathrecord (struct ib_user_path_rec),
> or simply a MAD (like our posted patches), etc. Essentially it can be of any format based on the 
> data type. The key is to document the format so that the kernel and user space can communicate 
> correctly.
> 
> The details are described below:
> 
> #define IB_NL_VERSION		0x01
> 
> #define IB_NL_OP_MASK		0x0F
> #define IB_NL_OP_RESOLVE	0x01
> #define IB_NL_OP_QUERY_PATH	0x02
> #define IB_NL_OP_SET_TIMEOUT	0x03
> #define IB_NL_OP_ACK		0x80

If OP_ACK is one bit, why isn't the OP_MASK 0x7f?

> #define IB_NL_STATUS_SUCCESS	0x0000
> #define IB_NL_STATUS_ENODATA	0x0001

Do we need 16 bits for a bool?  In fact, couldn't this actually be
switched so that the return of the message uses OP_SUCCESS instead of
OP_ACK?

In other words, instead of two items here, couldn't the ACK bit be
dropped entirely and replaced with SUCCESS so that when the user app
returns the netlink packet, if the op on return == to the op on send, it
failed, if it's op | SUCCESS, it succeeded?
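
One possible reading of this suggestion, sketched under assumptions: IB_NL_OP_SUCCESS is a
hypothetical name for the bit that would replace IB_NL_OP_ACK, the mask is widened to 0x7f, and
the two helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define IB_NL_OP_MASK		0x7F	/* low seven bits carry the operation */
#define IB_NL_OP_QUERY_PATH	0x02
#define IB_NL_OP_SUCCESS	0x80	/* hypothetical: replaces IB_NL_OP_ACK */

/* True if the reply carries the same operation that was sent. */
static bool ib_nl_reply_matches(uint8_t sent_op, uint8_t reply_op)
{
	return (reply_op & IB_NL_OP_MASK) == (sent_op & IB_NL_OP_MASK);
}

/* True if the reply is op | SUCCESS; a bare op coming back means failure. */
static bool ib_nl_reply_succeeded(uint8_t sent_op, uint8_t reply_op)
{
	return ib_nl_reply_matches(sent_op, reply_op) &&
	       (reply_op & IB_NL_OP_SUCCESS) != 0;
}
```

With this encoding no separate status field is needed, at the cost of reporting only success or failure.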

> #define IB_NL_DATA_TYPE_INVALID			0x0000
> #define IB_NL_DATA_TYPE_NAME			0x0001
> #define IB_NL_DATA_TYPE_ADDRESS_IP		0x0002
> #define IB_NL_DATA_TYPE_ADDRESS_IP6		0x0003
> #define IB_NL_DATA_TYPE_PATH_RECORD		0x0004
> #define IB_NL_DATA_TYPE_USER_PATH_REC		0x0005
> #define IB_NL_DATA_TYPE_MAD			0x0006
> 
> #define IB_NL_FLAGS_PATH_GMP			1
> #define IB_NL_FLAGS_PATH_PRIMARY		(1<<1)
> #define IB_NL_FLAGS_PATH_ALTERNATE		(1<<2)
> #define IB_NL_FLAGS_PATH_OUTBOUND		(1<<3)
> #define IB_NL_FLAGS_PATH_INBOUND		(1<<4)
> #define IB_NL_FLAGS_PATH_INBOUND_REVERSE 	(1<<5)
> #define IB_NL_FLAGS_PATH_BIDIRECTIONAL		(IB_PATH_OUTBOUND | IB_PATH_INBOUND_REVERSE)
> #define IB_NL_FLAGS_QUERY_SA			(1<<31)
> #define IB_NL_FLAGS_NODELAY			(1<<30)

Please keep these in numerical order, don't put <<31 and below it <<30

> struct ib_nl_data_hdr {
> 	__u8	version;
> 	__u8	opcode;
> 	__u16	status;
Drop status because we fold it into opcode
> 	__u16	type;
> 	__u16	reserved;
Drop reserved because we don't need alignment any more
> 	__u32	flags;
Flags is the only thing using bits fast, and we would want to make this
header an even 128bits in length, so add a __u32 reserved; here.  That's
more likely to be useful than the current layout since we are likely to
run out of flags before anything else.
> 	__u32	length;
> };
> 
> struct ib_nl_data {
> 	struct ib_nl_data_hdr		hdr;
> 	__u8				data[0];
> };
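
For reference, a sketch of how the header might look with the inline comments above applied
(status and the 16-bit reserved field dropped, a 32-bit reserved field added after flags). It uses
<stdint.h> types in place of __u8/__u16/__u32; the static assertions only check the 128-bit size
claim and are not part of the proposal.

```c
#include <stddef.h>
#include <stdint.h>

/* Header with the review comments applied (illustrative only). */
struct ib_nl_data_hdr {
	uint8_t		version;
	uint8_t		opcode;		/* operation plus the success bit */
	uint16_t	type;
	uint32_t	flags;
	uint32_t	reserved;	/* room for future flags */
	uint32_t	length;
};

/* 16 bytes (128 bits), with every field naturally aligned. */
_Static_assert(sizeof(struct ib_nl_data_hdr) == 16, "header must be 16 bytes");
_Static_assert(offsetof(struct ib_nl_data_hdr, flags) == 4, "flags at byte 4");
_Static_assert(offsetof(struct ib_nl_data_hdr, length) == 12, "length at byte 12");
```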
> 
> 
> These defines and structures can be added to file include/uapi/rdma/rdma_netlink.h (replace with
> RDMA_NL prefix) or contained in a separate file (include/uapi/rdma/ib_netlink.h ???).
> 
> Please share your thoughts.

I think an extensible netlink framework here is the right way to go,
certainly better than the one shot method you had first.

> Kaike


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD




* Re: [RFC] IB/sa: Route SA pathrecord query through netlink
       [not found]     ` <1432228874.28905.35.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-05-21 17:35       ` Doug Ledford
       [not found]         ` <1432229723.28905.40.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-05-21 17:43       ` Wan, Kaike
  1 sibling, 1 reply; 11+ messages in thread
From: Doug Ledford @ 2015-05-21 17:35 UTC (permalink / raw)
  To: Wan, Kaike
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean, Weiny, Ira,
	Jason Gunthorpe,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org),
	Or Gerlitz


On Thu, 2015-05-21 at 13:21 -0400, Doug Ledford wrote:
> I think an extensible netlink framework here is the right way to go,
> certainly better than the one shot method you had first.

The one thing I left out of the above that might be worth changing is
the fact that you bury your sequence number down in your mad header.  If
there is a generic mechanism that multiple modules can use to send
customized data via nl, then it might be worthwhile to have the sequence
moved to the generic level.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD




* RE: [RFC] IB/sa: Route SA pathrecord query through netlink
       [not found]     ` <1432228874.28905.35.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-05-21 17:35       ` Doug Ledford
@ 2015-05-21 17:43       ` Wan, Kaike
  1 sibling, 0 replies; 11+ messages in thread
From: Wan, Kaike @ 2015-05-21 17:43 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean, Weiny, Ira,
	Jason Gunthorpe,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org),
	Or Gerlitz


> On Thu, 2015-05-21 at 13:52 +0000, Wan, Kaike wrote:
> > In our previous posting to the mailing list, we proposed to send a MAD
> > request from kernel (more specifically, from ib_sa module) to a user space
> application (ibacm in this case) through netlink.
> > The user space application will send back the response. This simple
> > scheme can achieve the goal of a local SA cache in user space.
> >
> > The format of the request and response is diagrammed below:
> >
> >   ------------------
> >   | netlink header |
> >   ------------------
> >   |     MAD        |
> >   ------------------
> >
> > The kernel requests for a pathrecord, and the user application finds
> > it in its local cache and sends it to the kernel. If the netlink
> > request fails, the kernel will send the request to SA through the normal IB
> path (ib_mad -> hca driver -> wire).
> >
> > Jason pointed out that this message format was limited to lower stack
> > format (MAD) and its use could not be readily extended to upper layer
> > modules like rdma_cm. After lengthy discussions, we come up with a new
> and modified scheme, as described below.
> >
> > The general format of the request and response will be the same:
> >
> >   ------------------
> >   | netlink header |
> >   ------------------
> >   |  Data header   |
> >   ------------------
> >   |      Data      |
> >   ------------------
> >
> > The data header contains information about the type of
> > request/response, the status (for response), the type (format) of the
> > data, the total length of the data header + data, and a flags field about the
> request/response or data.
> >
> > Based on the type of the data, the data section may be in different
> > format: a string about the host name to resolve, an IP4/IP6 address, a
> > pathrecord, a user pathrecord (struct ib_user_path_rec), or simply a
> > MAD (like our posted patches), etc. Essentially it can be of any
> > format based on the data type. The key is to document the format so that
> the kernel and user space can communicate correctly.
> >
> > The details are described below:
> >
> > #define IB_NL_VERSION		0x01
> >
> > #define IB_NL_OP_MASK		0x0F
> > #define IB_NL_OP_RESOLVE	0x01
> > #define IB_NL_OP_QUERY_PATH	0x02
> > #define IB_NL_OP_SET_TIMEOUT	0x03
> > #define IB_NL_OP_ACK		0x80
> 
> If OP_ACK is one bit, why isn't the OP_MASK 0x7f?

You are right. The mask should be 0x7f

> 
> > #define IB_NL_STATUS_SUCCESS	0x0000
> > #define IB_NL_STATUS_ENODATA	0x0001
> 
> Do we need 16 bits for a bool?  In fact, couldn't this actually be switched so
> that the return of the message uses OP_SUCCESS instead of OP_ACK?

Potentially, you may want to return different statuses for diagnostic purposes. (OP_ACK | original OP) indicates that this is a response to OP, just as is done in a MAD response.
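
A minimal sketch of that alternative, assuming the mask is corrected to 0x7f: the ACK bit marks the
message as a response to the original op, and the separate status field carries the result. The
enum and the function name are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define IB_NL_OP_MASK		0x7F
#define IB_NL_OP_RESOLVE	0x01
#define IB_NL_OP_ACK		0x80
#define IB_NL_STATUS_SUCCESS	0x0000
#define IB_NL_STATUS_ENODATA	0x0001

/* Hypothetical result codes for this sketch only. */
enum ib_nl_reply {
	IB_NL_REPLY_UNRELATED,	/* not an ACK, or an ACK for another op */
	IB_NL_REPLY_OK,
	IB_NL_REPLY_ERROR,	/* ACK received, status says why it failed */
};

static enum ib_nl_reply ib_nl_check_reply(uint8_t sent_op, uint8_t reply_op,
					  uint16_t status)
{
	if (!(reply_op & IB_NL_OP_ACK) ||
	    (reply_op & IB_NL_OP_MASK) != (sent_op & IB_NL_OP_MASK))
		return IB_NL_REPLY_UNRELATED;

	return status == IB_NL_STATUS_SUCCESS ? IB_NL_REPLY_OK
					      : IB_NL_REPLY_ERROR;
}
```

Keeping status separate leaves room for more codes than IB_NL_STATUS_ENODATA later on.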

> 
> In other words, instead of two items here, couldn't the ACK bit be dropped
> entirely and replaced with SUCCESS so that when the user app returns the
> netlink packet, if the op on return == to the op on send, it failed, if it's op |
> SUCCESS, it succeeded?
> 
> > #define IB_NL_DATA_TYPE_INVALID			0x0000
> > #define IB_NL_DATA_TYPE_NAME			0x0001
> > #define IB_NL_DATA_TYPE_ADDRESS_IP		0x0002
> > #define IB_NL_DATA_TYPE_ADDRESS_IP6		0x0003
> > #define IB_NL_DATA_TYPE_PATH_RECORD		0x0004
> > #define IB_NL_DATA_TYPE_USER_PATH_REC		0x0005
> > #define IB_NL_DATA_TYPE_MAD			0x0006
> >
> > #define IB_NL_FLAGS_PATH_GMP			1
> > #define IB_NL_FLAGS_PATH_PRIMARY		(1<<1)
> > #define IB_NL_FLAGS_PATH_ALTERNATE		(1<<2)
> > #define IB_NL_FLAGS_PATH_OUTBOUND		(1<<3)
> > #define IB_NL_FLAGS_PATH_INBOUND		(1<<4)
> > #define IB_NL_FLAGS_PATH_INBOUND_REVERSE 	(1<<5)
> > #define IB_NL_FLAGS_PATH_BIDIRECTIONAL		(IB_PATH_OUTBOUND | IB_PATH_INBOUND_REVERSE)
> > #define IB_NL_FLAGS_QUERY_SA			(1<<31)
> > #define IB_NL_FLAGS_NODELAY			(1<<30)
> 
> Please keep these in numerical order, don't put <<31 and below it <<30

Indeed, the flags should be defined with care. I simply copied from include/uapi/rdma/ib_user_sa.h and acm.h from ibacm for demonstration. I will remove them from my patches later.

> 
> > struct ib_nl_data_hdr {
> > 	__u8	version;
> > 	__u8	opcode;
> > 	__u16	status;
> Drop status because we fold it into opcode

Only if we don't need status. But we may need more status.


> > 	__u16	type;
> > 	__u16	reserved;
> Drop reserved because we don't need alignment any more

Depends on above.

> > 	__u32	flags;
> Flags is the only thing using bits fast, and we would want to make this header
> an even 128bits in length, so add a __u32 reserved; here.  That's more likely
> to be useful than the current layout since we are likely to run out of flags
> before anything else.

That is a very reasonable assumption. I will keep an eye on it.

> > 	__u32	length;
> > };
> >
> > struct ib_nl_data {
> > 	struct ib_nl_data_hdr		hdr;
> > 	__u8				data[0];
> > };
> >
> >
> > These defines and structures can be added to file
> > include/uapi/rdma/rdma_netlink.h (replace with RDMA_NL prefix) or
> > contained in a separate file (include/uapi/rdma/ib_netlink.h ???).
> >
> > Please share your thoughts.
> 
> I think an extensible netlink framework here is the right way to go, certainly
> better than the one shot method you had first.

Thank you.

Kaike


* RE: [RFC] IB/sa: Route SA pathrecord query through netlink
       [not found]         ` <1432229723.28905.40.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-05-21 17:48           ` Wan, Kaike
  0 siblings, 0 replies; 11+ messages in thread
From: Wan, Kaike @ 2015-05-21 17:48 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean, Weiny, Ira,
	Jason Gunthorpe,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org),
	Or Gerlitz

> On Thu, 2015-05-21 at 13:21 -0400, Doug Ledford wrote:
> 
> The one thing I left out of the above that might be worth changing is the fact
> that you bury your sequence number down in your mad header.  If there is a
> generic mechanism that multiple modules can use to send customized data
> via nl, then it might be worthwhile to have the sequence moved to the
> generic level.

Absolutely. The netlink sequence number (for this multicast group) should be moved to drivers/infiniband/core/netlink.c and a function should be exported to get next sequence number so that multiple modules using the same multicast group can be multiplexed correctly.
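
A user-space sketch of such a shared counter, using C11 atomics as a stand-in (kernel code would
more likely use atomic_t with atomic_inc_return()). The names ib_nl_seq and ib_nl_next_seq() are
hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One counter shared by every module that sends on the multicast group. */
static _Atomic uint32_t ib_nl_seq;	/* static storage, so it starts at 0 */

/*
 * Hypothetical accessor: each caller gets a distinct value,
 * wrapping modulo 2^32, even when called concurrently.
 */
static uint32_t ib_nl_next_seq(void)
{
	return atomic_fetch_add(&ib_nl_seq, 1) + 1;
}
```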




* Re: [RFC] IB/sa: Route SA pathrecord query through netlink
       [not found] ` <3F128C9216C9B84BB6ED23EF16290AFB0CAB2E96-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-05-21 17:21   ` Doug Ledford
@ 2015-05-21 18:12   ` Jason Gunthorpe
       [not found]     ` <20150521181200.GC6771-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-05-21 19:44   ` ira.weiny
  2 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2015-05-21 18:12 UTC (permalink / raw)
  To: Wan, Kaike
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean, Weiny, Ira,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org),
	Or Gerlitz

On Thu, May 21, 2015 at 01:52:36PM +0000, Wan, Kaike wrote:
> 
> In our previous posting to the mailing list, we proposed to send a MAD request from kernel (more
> specifically, from ib_sa module) to a user space application (ibacm in this case) through netlink.
> The user space application will send back the response. This simple scheme can achieve the goal 
> of a local SA cache in user space.
> 
> The format of the request and response is diagrammed below:
> 
>   | netlink header |
>   |     MAD        |
> 
> The kernel requests for a pathrecord, and the user application finds it in its local cache and sends
> it to the kernel. If the netlink request fails, the kernel will send the request to SA through the
> normal IB path (ib_mad -> hca driver -> wire).
> 
> Jason pointed out that this message format was limited to lower stack format (MAD) and its use
> could not be readily extended to upper layer modules like rdma_cm. After lengthy discussions, we 
> come up with a new and modified scheme, as described below.
> 
> The general format of the request and response will be the same:
> 
>   | netlink header |
>   |  Data header   |
>   |      Data      |
> 
> The data header contains information about the type of request/response, the status (for response),
> the type (format) of the data, the total length of the data header + data, and a flags field about
> the request/response or data.

I assume we can stack multiple data records?

So a response can have the required number of path records?

There is growing interest in APM as well, please ensure that all 6 APM
records can be returned to any query:
 - Primary GMP Path
 - Primary Forward Path
 - Primary Return Path
 - Alternate GMP Path
 - Alternate Forward Path
 - Alternate Return Path
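
One way the six records could be told apart is by combining the path flags from the proposal. The
sketch below is only an assumption about how that mapping might look: "forward" is taken to mean
OUTBOUND and "return" to mean INBOUND, and the table name is hypothetical.

```c
#include <stdint.h>

#define IB_NL_FLAGS_PATH_GMP		1
#define IB_NL_FLAGS_PATH_PRIMARY	(1<<1)
#define IB_NL_FLAGS_PATH_ALTERNATE	(1<<2)
#define IB_NL_FLAGS_PATH_OUTBOUND	(1<<3)
#define IB_NL_FLAGS_PATH_INBOUND	(1<<4)

/* Hypothetical mapping of the six APM records onto flag combinations. */
static const uint32_t ib_nl_apm_flags[6] = {
	IB_NL_FLAGS_PATH_PRIMARY   | IB_NL_FLAGS_PATH_GMP,	/* Primary GMP */
	IB_NL_FLAGS_PATH_PRIMARY   | IB_NL_FLAGS_PATH_OUTBOUND,	/* Primary Forward */
	IB_NL_FLAGS_PATH_PRIMARY   | IB_NL_FLAGS_PATH_INBOUND,	/* Primary Return */
	IB_NL_FLAGS_PATH_ALTERNATE | IB_NL_FLAGS_PATH_GMP,	/* Alternate GMP */
	IB_NL_FLAGS_PATH_ALTERNATE | IB_NL_FLAGS_PATH_OUTBOUND,	/* Alternate Forward */
	IB_NL_FLAGS_PATH_ALTERNATE | IB_NL_FLAGS_PATH_INBOUND,	/* Alternate Return */
};
```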

[Somewhere I have an experimental patch that globally enables one-shot
 APM for RDMA-CM users, it isn't a big step]

Please at least consider how we could use the netlink interface to
maintain APM when alternate paths trigger and new path data needs to
be loaded.

Please consider how we could use this netlink interface to alter
existing alternate paths on established QPs.

(Consider, means just think through how the protocol would work, not implement)

Can you please provide some quick examples of exactly what the
exchange will look like:
 - IPoIB UD mode connecting to a peer based on a ND response
 - IPoIB RC mode connecting to a peer based on a ND response
 - RDMA CM connecting RC from a src IP to a dst IP

> #define IB_NL_DATA_TYPE_INVALID			0x0000
> #define IB_NL_DATA_TYPE_NAME			0x0001
> #define IB_NL_DATA_TYPE_ADDRESS_IP		0x0002
> #define IB_NL_DATA_TYPE_ADDRESS_IP6		0x0003
> #define IB_NL_DATA_TYPE_PATH_RECORD		0x0004
> #define IB_NL_DATA_TYPE_USER_PATH_REC		0x0005
> #define IB_NL_DATA_TYPE_MAD			0x0006

We definitely want to include policy information:
 - What IPoIB netdev is this associated with, if any
 - IP TOS bits, tclass, flowlabel
 - Requesting kernel agent
 - Src/Dst IP

I see this as a way to delegate path lookup to user space, so that
userspace can inject policy.

Jason


* RE: [RFC] IB/sa: Route SA pathrecord query through netlink
       [not found]     ` <20150521181200.GC6771-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-21 19:14       ` Wan, Kaike
  0 siblings, 0 replies; 11+ messages in thread
From: Wan, Kaike @ 2015-05-21 19:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean, Weiny, Ira,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org),
	Or Gerlitz

> > The general format of the request and response will be the same:
> >
> >   | netlink header |
> >   |  Data header   |
> >   |      Data      |
> >
> > The data header contains information about the type of
> > request/response, the status (for response), the type (format) of the
> > data, the total length of the data header + data, and a flags field about the
> request/response or data.
> 
> I assume we can stack multiple data records?
> 
> So a response can have the required number of path records?

Yes, you can. The type indicates the data format of each individual record. The length field, together with a possible flag definition (a multi-record indicator), can determine how many records are returned.
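
For fixed-size records of a single type, the count follows from the length field alone. A small
sketch, assuming length covers the data header plus all records; the helper name and the choice
to pass the sizes as parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of fixed-size records in a message.
 * Returns -1 if the length is shorter than the header or is not a
 * whole number of records.
 */
static long ib_nl_record_count(uint32_t total_len, size_t hdr_len,
			       size_t rec_len)
{
	size_t payload;

	if (rec_len == 0 || total_len < hdr_len)
		return -1;

	payload = total_len - hdr_len;
	if (payload % rec_len)
		return -1;

	return (long)(payload / rec_len);
}
```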

> 
> There is growing interest in APM as well, please ensure that all 6 APM
> records can be returned to any query:
>  - Primary GMP Path
>  - Primary Forward Path
>  - Primary Return Path
>  - Alternate GMP Path
>  - Alternate Forward Path
>  - Alternate Return Path
> 

Using struct ib_path_rec_data in each record should accomplish this. Again, a type should be defined for this format. Alternatively, we could define a mixed type where each data record has a subheader (subheader + data == data section):

#define IB_NL_DATA_TYPE_MIXED			0x0008

struct ib_nl_data_sub_hdr {
	__u16	type;
	__u16	flags;
	__u32	length;
};

--------------------
| netlink header   |
--------------------
| Data header      |
--------------------
| data subhdr 1    |
--------------------
| data rec 1       |
--------------------
| data subhdr 2    |
--------------------
| data rec 2       |
--------------------
|       ....       |
--------------------
| data subhdr N    |
--------------------
| data rec N       |
--------------------
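
A sketch of how a receiver might walk such a mixed data section. It assumes that the subheader
length counts only the record bytes that follow it (the message does not say whether the subheader
itself is included), and the walker and callback names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ib_nl_data_sub_hdr {
	uint16_t	type;
	uint16_t	flags;
	uint32_t	length;	/* assumed: bytes of record data that follow */
};

typedef void (*ib_nl_rec_fn)(const struct ib_nl_data_sub_hdr *sub,
			     const uint8_t *rec, void *ctx);

/*
 * Hypothetical walker: calls fn (if non-NULL) for each record.
 * Returns the number of records, or -1 if a subheader or record
 * runs past the end of the data section.
 */
static int ib_nl_walk_records(const uint8_t *data, size_t data_len,
			      ib_nl_rec_fn fn, void *ctx)
{
	size_t off = 0;
	int n = 0;

	while (off < data_len) {
		struct ib_nl_data_sub_hdr sub;

		if (data_len - off < sizeof(sub))
			return -1;
		memcpy(&sub, data + off, sizeof(sub));	/* avoids unaligned reads */
		off += sizeof(sub);

		if (sub.length > data_len - off)
			return -1;
		if (fn)
			fn(&sub, data + off, ctx);
		off += sub.length;
		n++;
	}
	return n;
}
```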


> [Somewhere I have an experimental patch that globally enables one-shot
> APM for RDMA-CM users, it isn't a big step]
> 
> Please at least consider how we could use the netlink interface to maintain
> APM when alternate paths trigger and new path data needs to be loaded.
> 
> Please consider how we could use this netlink interface to alter existing
> alternate paths on established QPs.
> 
> (Consider, means just think through how the protocol would work, not
> implement)
> 
> Can you please provide a some quick examples of exactly what the exchange
> will look like:
>  - IPoIB UD mode connecting to a peer based on a ND response
>  - IPoIB RC mode connecting to a peer based on a ND response

I am not familiar with IPoIB and am not sure what information exchange is needed here other than multicast group joining. An MCMemberRecord could be obtained from a user application (SA proxy), as with a pathrecord, by sending a query to the user application and getting the MCMemberRecord back. If the user application supports setting this attribute, it can be set through a similar request/response exchange.

Any help for details?

>  - RDMA CM connecting RC from a src IP to a dst IP

A request from rdma_cm to resolve a src/dst IP could be sent to a user application (e.g., ibacm), and the pathrecord is sent back as the response. rdma_cm could use the returned info to establish connections. Again, I am not familiar with the rdma_cm details.

Any expert out there? I know Sean is out today.

> 
> > #define IB_NL_DATA_TYPE_INVALID			0x0000
> > #define IB_NL_DATA_TYPE_NAME			0x0001
> > #define IB_NL_DATA_TYPE_ADDRESS_IP		0x0002
> > #define IB_NL_DATA_TYPE_ADDRESS_IP6		0x0003
> > #define IB_NL_DATA_TYPE_PATH_RECORD		0x0004
> > #define IB_NL_DATA_TYPE_USER_PATH_REC		0x0005
> > #define IB_NL_DATA_TYPE_MAD			0x0006
> 
> We definitely want to include policy information:
>  - What IPoIB netdev is this associated with, if any
>  - IP TOS bits, tclass, flowlabel
>  - Requesting kernel agent
>  - Src/Dst IP
> 
> I see this as a way to delegate path lookup to user space, so that userspace
> can inject policy.

As shown above, we can use subheader (or data sections) to aggregate data into one request/response.

Kaike

> 
> Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] IB/sa: Route SA pathrecord query through netlink
       [not found] ` <3F128C9216C9B84BB6ED23EF16290AFB0CAB2E96-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-05-21 17:21   ` Doug Ledford
  2015-05-21 18:12   ` Jason Gunthorpe
@ 2015-05-21 19:44   ` ira.weiny
       [not found]     ` <20150521194439.GA6389-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
  2 siblings, 1 reply; 11+ messages in thread
From: ira.weiny @ 2015-05-21 19:44 UTC (permalink / raw)
  To: Wan, Kaike
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean, Jason Gunthorpe,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org),
	Or Gerlitz

On Thu, May 21, 2015 at 07:52:36AM -0600, Wan, Kaike wrote:
> 
> The general format of the request and response will be the same:
> 
>   ------------------
>   | netlink header |
>   ------------------
>   |  Data header   |
>   ------------------
>   |      Data      |
>   ------------------
> 
> The data header contains information about the type of request/response, the status (for response),
> the type (format) of the data, the total length of the data header + data, and a flags field about
> the request/response or data.
> 
> Based on the type of the data, the data section may be in different format: a string about the host
> name to resolve, an IP4/IP6 address, a pathrecord, a user pathrecord (struct ib_user_path_rec),
> or simply a MAD (like our posted patches), etc.

I think given the new plans this is not really necessary.

>
> Essentially it can be of any format based on the 
> data type. The key is to document the format so that the kernel and user space can communicate 
> correctly.
> 
> The details are described below:
> 
> #define IB_NL_VERSION		0x01

Change all "IB" to "RDMA"

> 
> #define IB_NL_OP_MASK		0x0F
> #define IB_NL_OP_RESOLVE	0x01
> #define IB_NL_OP_QUERY_PATH	0x02
> #define IB_NL_OP_SET_TIMEOUT	0x03
> #define IB_NL_OP_ACK		0x80
> 
> #define IB_NL_STATUS_SUCCESS	0x0000
> #define IB_NL_STATUS_ENODATA	0x0001

If we do what Doug suggested, should we just make OP 16 bits, with the high bit
indicating ACK/NACK?

Then the other u8 could be used as a detailed status if needed.

> 
> #define IB_NL_DATA_TYPE_INVALID			0x0000
> #define IB_NL_DATA_TYPE_NAME			0x0001
> #define IB_NL_DATA_TYPE_ADDRESS_IP		0x0002
> #define IB_NL_DATA_TYPE_ADDRESS_IP6		0x0003
> #define IB_NL_DATA_TYPE_PATH_RECORD		0x0004
> #define IB_NL_DATA_TYPE_USER_PATH_REC		0x0005

Do we need both PATH_RECORD and USER_PATH_REC?

I'm having trouble determining when the OP == QUERY_PATH and the DATA_TYPE !=
PATH_RECORD.

Why don't we remove "QUERY_PATH" above and allow OP == RESOLVE and DATA_TYPE ==
PATH_RECORD be a "query for path record"?

> #define IB_NL_DATA_TYPE_MAD			0x0006

I would drop this for now.

> 
> #define IB_NL_FLAGS_PATH_GMP			1
> #define IB_NL_FLAGS_PATH_PRIMARY		(1<<1)
> #define IB_NL_FLAGS_PATH_ALTERNATE		(1<<2)
> #define IB_NL_FLAGS_PATH_OUTBOUND		(1<<3)
> #define IB_NL_FLAGS_PATH_INBOUND		(1<<4)
> #define IB_NL_FLAGS_PATH_INBOUND_REVERSE 	(1<<5)
> #define IB_NL_FLAGS_PATH_BIDIRECTIONAL		(IB_PATH_OUTBOUND | IB_PATH_INBOUND_REVERSE)
> #define IB_NL_FLAGS_QUERY_SA			(1<<31)
> #define IB_NL_FLAGS_NODELAY			(1<<30)
> 
> struct ib_nl_data_hdr {
> 	__u8	version;
> 	__u8	opcode;
> 	__u16	status;
> 	__u16	type;
> 	__u16	reserved;
> 	__u32	flags;
> 	__u32	length;

The overall message length is already in the netlink header.  So, keeping in mind
Jason's comments about returning multiple data records, I think this field should
be the length of an individual record, with a "num records" field also specified.
I would much prefer this over the yucky IBTA RMPP method of implicit record sizes
needing to be divided into the overall message size.
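
A minimal sketch of the fixed-size-record scheme suggested here, assuming the
header carries a per-record length and a record count.  The names
rdma_nl_rec_hdr, rec_len, num_recs and rdma_nl_record() are hypothetical and
appear nowhere in the RFC; the point is only that a receiver can index record i
directly and bounds-check it against the payload length taken from the netlink
header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header fields: explicit record size plus record count. */
struct rdma_nl_rec_hdr {
	uint16_t rec_len;   /* length of one record, in bytes */
	uint16_t num_recs;  /* number of records that follow */
};

/*
 * Return a pointer to record i inside the payload, or NULL if the index is
 * out of range or the record would run past the end of the payload.
 */
static const uint8_t *rdma_nl_record(const struct rdma_nl_rec_hdr *h,
				     const uint8_t *payload,
				     size_t payload_len, uint16_t i)
{
	size_t off;

	if (h->rec_len == 0 || i >= h->num_recs)
		return NULL;
	off = (size_t)i * h->rec_len;
	if (off + h->rec_len > payload_len)
		return NULL;
	return payload + off;
}
```

With rec_len = 4 and num_recs = 3, record 1 starts 4 bytes into the payload, and
a payload that is shorter than the header claims is rejected rather than read
past its end.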

> };

Should we have a "class" value in the header somewhere?  With multiple user
space listeners it could be easier to mux messages with such a value.  The
class/seq would then differentiate the message.

> 
> struct ib_nl_data {
> 	struct ib_nl_data_hdr		hdr;
> 	__u8				data[0];
> };
> 
> 
> These defines and structures can be added to file include/upai/rdma/rdma_netlink.h (replace with
> RDMA_NL prefix) or contained in a seperate file (include/upai/rdma/ib_netlink.h ???). 

This is not as important as getting the protocol down.

Ira


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] IB/sa: Route SA pathrecord query through netlink
       [not found]     ` <20150521194439.GA6389-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
@ 2015-05-21 19:49       ` Jason Gunthorpe
  2015-05-21 20:40       ` Hefty, Sean
  2015-05-21 23:33       ` Wan, Kaike
  2 siblings, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2015-05-21 19:49 UTC (permalink / raw)
  To: ira.weiny
  Cc: Wan, Kaike, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org),
	Or Gerlitz

On Thu, May 21, 2015 at 03:44:40PM -0400, ira.weiny wrote:

> The overall message length is in the netlink header.  So keeping in mind Jasons
> comments regarding returning multiple data records.

There is already an existing idiom and macro set for nesting netlink
records, use it.
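
For readers unfamiliar with the idiom: netlink attributes are length/type/value
records padded to 4 bytes, and a nested attribute is simply one whose payload is
itself a sequence of attributes.  In the kernel this is built with
nla_nest_start(), nla_put() and nla_nest_end().  The sketch below mimics that
layout in plain userspace C; struct msgbuf, attr_put(), attr_nest_start() and
attr_nest_end() are hypothetical stand-ins, not kernel API, and the attribute
type numbers in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_HDRLEN 4u  /* 16-bit length + 16-bit type, as in struct nlattr */

struct msgbuf {
	uint8_t data[256];
	size_t  len;
};

/* Round up to a 4-byte boundary, the same rule NLA_ALIGN applies. */
static size_t attr_align(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Append one attribute.  The stored length covers header + payload without
 * padding; the buffer itself advances by the padded size.  Returns the offset
 * of the attribute header, or -1 if it does not fit.
 */
static long attr_put(struct msgbuf *m, uint16_t type,
		     const void *payload, uint16_t plen)
{
	size_t total = ATTR_HDRLEN + (size_t)plen;
	size_t off = m->len;
	uint16_t len16;

	if (attr_align(total) > sizeof(m->data) - off)
		return -1;
	len16 = (uint16_t)total;
	memcpy(m->data + off, &len16, sizeof(len16));
	memcpy(m->data + off + 2, &type, sizeof(type));
	if (plen)
		memcpy(m->data + off + ATTR_HDRLEN, payload, plen);
	memset(m->data + off + total, 0, attr_align(total) - total);
	m->len = off + attr_align(total);
	return (long)off;
}

/* Open a container attribute: a header with no payload yet. */
static long attr_nest_start(struct msgbuf *m, uint16_t type)
{
	return attr_put(m, type, NULL, 0);
}

/* Close the container: its length now spans everything appended since. */
static void attr_nest_end(struct msgbuf *m, long start)
{
	uint16_t len16 = (uint16_t)(m->len - (size_t)start);

	memcpy(m->data + start, &len16, sizeof(len16));
}
```

A request could then open one container (say, type 1), put a 4-byte address
(type 2) and a 1-byte TOS value (type 3) inside it, and close it; the container
ends up 20 bytes long, because the 5-byte TOS attribute is padded to 8.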

Someone needs to describe exactly what NL request packet each of the
interesting query points in the kernel will produce to meaningfully
continue discussion.

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [RFC] IB/sa: Route SA pathrecord query through netlink
       [not found]     ` <20150521194439.GA6389-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
  2015-05-21 19:49       ` Jason Gunthorpe
@ 2015-05-21 20:40       ` Hefty, Sean
  2015-05-21 23:33       ` Wan, Kaike
  2 siblings, 0 replies; 11+ messages in thread
From: Hefty, Sean @ 2015-05-21 20:40 UTC (permalink / raw)
  To: Weiny, Ira, Wan, Kaike
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org),
	Or Gerlitz

> > #define IB_NL_DATA_TYPE_INVALID			0x0000
> > #define IB_NL_DATA_TYPE_NAME			0x0001
> > #define IB_NL_DATA_TYPE_ADDRESS_IP		0x0002
> > #define IB_NL_DATA_TYPE_ADDRESS_IP6		0x0003
> > #define IB_NL_DATA_TYPE_PATH_RECORD		0x0004
> > #define IB_NL_DATA_TYPE_USER_PATH_REC		0x0005
> 
> Do we need both PATH_RECORD and USER_PATH_REC?
> 
> I'm having trouble determining when the OP == QUERY_PATH and the DATA_TYPE !=
> PATH_RECORD.
> 
> Why don't we remove "QUERY_PATH" above and allow OP == RESOLVE and DATA_TYPE ==
> PATH_RECORD be a "query for path record"?

I agree with Ira.

Conceptually, there are at least 2 pieces of information that need to be provided.  The format/type of the input data and the desired output data.  It looks like Kaike is using the ibacm protocol as a base.  I *think* the QUERY_PATH operation for ibacm forced an SA query.  (It is used for testing purposes.  That, or I'm remembering something else and associating it with QUERY_PATH.)

The desired output data can either be encoded as part of the operation or separated into its own field.
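
A minimal sketch of the "separate field" option, assuming the request carries
both an input type and a desired output type.  The opcode and data-type values
are taken from the RFC (with the RDMA prefix suggested in this thread); struct
rdma_nl_resolve_req, in_type, out_type and rdma_nl_wants_path_record() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RDMA_NL_OP_RESOLVE             0x01
#define RDMA_NL_DATA_TYPE_NAME         0x0001
#define RDMA_NL_DATA_TYPE_ADDRESS_IP   0x0002
#define RDMA_NL_DATA_TYPE_PATH_RECORD  0x0004

/* Hypothetical request descriptor: what is supplied and what is wanted. */
struct rdma_nl_resolve_req {
	uint8_t  opcode;
	uint16_t in_type;   /* format of the data the kernel sends */
	uint16_t out_type;  /* format of the data the kernel wants back */
};

/*
 * With a single RESOLVE operation, "query for path record" is just a RESOLVE
 * whose desired output is a path record, whatever the input format is.
 */
static int rdma_nl_wants_path_record(const struct rdma_nl_resolve_req *req)
{
	return req->opcode == RDMA_NL_OP_RESOLVE &&
	       req->out_type == RDMA_NL_DATA_TYPE_PATH_RECORD;
}
```

Encoding the output in the operation instead would fold out_type into the
opcode value; the check above would then compare the opcode alone.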

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [RFC] IB/sa: Route SA pathrecord query through netlink
       [not found]     ` <20150521194439.GA6389-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
  2015-05-21 19:49       ` Jason Gunthorpe
  2015-05-21 20:40       ` Hefty, Sean
@ 2015-05-21 23:33       ` Wan, Kaike
  2 siblings, 0 replies; 11+ messages in thread
From: Wan, Kaike @ 2015-05-21 23:33 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean, Jason Gunthorpe,
	Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org),
	Or Gerlitz

> On Thu, May 21, 2015 at 07:52:36AM -0600, Wan, Kaike wrote:
> >
> > The general format of the request and response will be the same:
> >
> >   ------------------
> >   | netlink header |
> >   ------------------
> >   |  Data header   |
> >   ------------------
> >   |      Data      |
> >   ------------------
> >
> > The data header contains information about the type of
> > request/response, the status (for response), the type (format) of the
> > data, the total length of the data header + data, and a flags field about the
> > request/response or data.
> >
> > Based on the type of the data, the data section may be in different
> > format: a string about the host name to resolve, an IP4/IP6 address, a
> > pathrecord, a user pathrecord (struct ib_user_path_rec), or simply a MAD
> > (like our posted patches), etc.
> 
> I think given the new plans this is not really necessary.

Why not? Communication between the ib_mad/ib_sa layer and user space might find the MAD format more natural. It all depends on where the kernel component sits and on the format of the input data available to it.

> 
> >
> > Essentially it can be of any format based on the data type. The key is
> > to document the format so that the kernel and user space can
> > communicate correctly.
> >
> > The details are described below:
> >
> > #define IB_NL_VERSION		0x01
> 
> Change all "IB" to "RDMA"

Will do.

> 
> >
> > #define IB_NL_OP_MASK		0x0F
> > #define IB_NL_OP_RESOLVE	0x01
> > #define IB_NL_OP_QUERY_PATH	0x02
> > #define IB_NL_OP_SET_TIMEOUT	0x03
> > #define IB_NL_OP_ACK		0x80
> >
> > #define IB_NL_STATUS_SUCCESS	0x0000
> > #define IB_NL_STATUS_ENODATA	0x0001
> 
> If we do what Doug suggested should we just make OP 16bits with the high
> bit ACK/NACK?
> 
> Then use the other u8 as a detailed status if needed.

In that case, use:

   __u8  version;
   __u8  status;
   __u16 opcode;

We will have more opcode space. 
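
A minimal sketch of how the 16-bit opcode with a high ACK bit could be handled,
assuming the field order above.  The thread fixes only the field widths; the
RDMA_NL_OP_ACK and RDMA_NL_OP_MASK values and the helper names below are
assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RDMA_NL_OP_ACK      0x8000u  /* assumed: high bit of the 16-bit opcode */
#define RDMA_NL_OP_MASK     0x7FFFu  /* assumed: low 15 bits carry the operation */
#define RDMA_NL_OP_RESOLVE  0x0001u

/* Start of the data header with the widened opcode. */
struct rdma_nl_hdr_start {
	uint8_t  version;
	uint8_t  status;  /* detailed status, meaningful in a response */
	uint16_t opcode;  /* operation in the low bits, ACK flag in the high bit */
};

/* Build the opcode of a response from the opcode of its request. */
static inline uint16_t rdma_nl_make_ack(uint16_t request_opcode)
{
	return (uint16_t)(request_opcode | RDMA_NL_OP_ACK);
}

static inline int rdma_nl_is_ack(uint16_t opcode)
{
	return (opcode & RDMA_NL_OP_ACK) != 0;
}

/* Strip the ACK flag to recover the operation being answered. */
static inline uint16_t rdma_nl_op(uint16_t opcode)
{
	return (uint16_t)(opcode & RDMA_NL_OP_MASK);
}
```

The three fields still pack into 4 bytes, so the rest of the header keeps its
alignment.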

> 
> >
> > #define IB_NL_DATA_TYPE_INVALID			0x0000
> > #define IB_NL_DATA_TYPE_NAME			0x0001
> > #define IB_NL_DATA_TYPE_ADDRESS_IP		0x0002
> > #define IB_NL_DATA_TYPE_ADDRESS_IP6		0x0003
> > #define IB_NL_DATA_TYPE_PATH_RECORD		0x0004
> > #define IB_NL_DATA_TYPE_USER_PATH_REC		0x0005
> 
> Do we need both PATH_RECORD and USER_PATH_REC?

With the subheader definition (see my response to Jason's comments), we only need the PATH_RECORD type. I will update my RFC.

> 
> I'm having trouble determining when the OP == QUERY_PATH and the
> DATA_TYPE != PATH_RECORD.
> 
> Why don't we remove "QUERY_PATH" above and allow OP == RESOLVE and
> DATA_TYPE == PATH_RECORD be a "query for path record"?

TRUE.

> 
> > #define IB_NL_DATA_TYPE_MAD			0x0006
> 
> I would drop this for now.
> 
> >
> > #define IB_NL_FLAGS_PATH_GMP			1
> > #define IB_NL_FLAGS_PATH_PRIMARY		(1<<1)
> > #define IB_NL_FLAGS_PATH_ALTERNATE		(1<<2)
> > #define IB_NL_FLAGS_PATH_OUTBOUND		(1<<3)
> > #define IB_NL_FLAGS_PATH_INBOUND		(1<<4)
> > #define IB_NL_FLAGS_PATH_INBOUND_REVERSE 	(1<<5)
> > #define IB_NL_FLAGS_PATH_BIDIRECTIONAL		(IB_PATH_OUTBOUND | IB_PATH_INBOUND_REVERSE)
> > #define IB_NL_FLAGS_QUERY_SA			(1<<31)
> > #define IB_NL_FLAGS_NODELAY			(1<<30)
> >
> > struct ib_nl_data_hdr {
> > 	__u8	version;
> > 	__u8	opcode;
> > 	__u16	status;
> > 	__u16	type;
> > 	__u16	reserved;
> > 	__u32	flags;
> > 	__u32	length;
> 
> The overall message length is in the netlink header.  So keeping in mind
> Jasons comments regarding returning multiple data records.

With this length field, a netlink message could carry multiple requests/responses (opcodes), and each request/response could carry multiple data sections.
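
A minimal sketch of how a receiver could walk several requests/responses packed
back to back, assuming each one starts with the data header from the RFC, that
"length" covers the data header plus its data, and that entries follow each
other without padding in host byte order.  The struct is the RFC layout with the
RDMA prefix; rdma_nl_count_entries() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Data header as proposed in the RFC (16 bytes). */
struct rdma_nl_data_hdr {
	uint8_t  version;
	uint8_t  opcode;
	uint16_t status;
	uint16_t type;
	uint16_t reserved;
	uint32_t flags;
	uint32_t length;  /* data header + data, in bytes */
};

/*
 * Count the entries in a netlink payload of "total" bytes.  Returns -1 if an
 * entry is truncated or its length field is smaller than the header itself.
 */
static int rdma_nl_count_entries(const uint8_t *buf, size_t total)
{
	size_t off = 0;
	int count = 0;

	while (off < total) {
		struct rdma_nl_data_hdr hdr;

		if (total - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.length < sizeof(hdr) || hdr.length > total - off)
			return -1;
		off += hdr.length;
		count++;
	}
	return count;
}
```

The total comes from the netlink header, so the per-entry length is what lets
the receiver find where one request ends and the next begins.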

> 
> I think this should be the length of individual records with a "num records"
> also specified

With data sections, each section may have a different length. Within a data section, when all records have the same size, we can carry a number-of-records field.

> .  I would much prefer this over the yucky IBTA RMPP method
> of implicit record sizes needing to be divided into the overall message size.
> 
> > };
> 
> Should we have a "class" value in the header somewhere?  With multiple
> user space listeners it could be easier to mux messages with such a value.
> The class/seq would then differentiate the message.

Then use a different multicast group. It does not make sense to have multiple listeners on the same multicast group when any of them can serve the same request.

> 
> >
> > struct ib_nl_data {
> > 	struct ib_nl_data_hdr		hdr;
> > 	__u8				data[0];
> > };
> >
> >
> > These defines and structures can be added to file
> > include/upai/rdma/rdma_netlink.h (replace with RDMA_NL prefix) or
> > contained in a seperate file (include/upai/rdma/ib_netlink.h ???).
> 
> This is not as important as getting the protocol down.

Minor.

> 
> Ira


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-05-21 23:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-21 13:52 [RFC] IB/sa: Route SA pathrecord query through netlink Wan, Kaike
     [not found] ` <3F128C9216C9B84BB6ED23EF16290AFB0CAB2E96-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-05-21 17:21   ` Doug Ledford
     [not found]     ` <1432228874.28905.35.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-21 17:35       ` Doug Ledford
     [not found]         ` <1432229723.28905.40.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-21 17:48           ` Wan, Kaike
2015-05-21 17:43       ` Wan, Kaike
2015-05-21 18:12   ` Jason Gunthorpe
     [not found]     ` <20150521181200.GC6771-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-21 19:14       ` Wan, Kaike
2015-05-21 19:44   ` ira.weiny
     [not found]     ` <20150521194439.GA6389-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
2015-05-21 19:49       ` Jason Gunthorpe
2015-05-21 20:40       ` Hefty, Sean
2015-05-21 23:33       ` Wan, Kaike
