* [patch v2 00/37] add rxe (soft RoCE)
@ 2011-07-24 19:43 rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 01/37] add slice by 8 algorithm to crc32.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (37 more replies)
  0 siblings, 38 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

Changes in v2 include:

	- Updated to Roland's tree as of 7/24/2011

	- Moved the crc32 algorithm into a patch (slice-by-8-for_crc32.c.diff)
	  that goes into the mainline kernel. It has been submitted upstream
	  but is also included here since it is required to build the driver.

	- Renamed rxe_sb8.c to rxe_icrc.c, since computing the ICRC is all it
	  now does.

	- Cleaned up warnings from checkpatch, C=2 and __CHECK_ENDIAN__.

	- Moved small .h files into rxe_loc.h

	- Rewrote the Kconfig text to be a little friendlier

	- Changed the patch names to make them easier to handle.

	- The quilt patch series is online at:
	  http://support.systemfabricworks.com/downloads/rxe/patches-v2.tgz

	- librxe is online at:
	  http://support.systemfabricworks.com/downloads/rxe/librxe-1.0.0.tar.gz

	Thanks to Roland Dreier, Bart van Assche and David Dillow for helpful
	suggestions.

Introduction:

This patch set implements a software emulation of the RoCE or InfiniBand
transport. It consists of two kernel modules. The first, ib_rxe, implements
the RDMA transport and registers with the RDMA core as a kernel verbs
provider. The second, either ib_rxe_net or ib_sample, implements the packet
I/O layer. ib_rxe_net attaches to the Linux netdev stack as a network
protocol and can send and receive packets over any Ethernet device; it uses
the RoCE protocol to carry the RDMA transport. ib_sample is a pure loopback
device that uses the InfiniBand transport, i.e. it includes the LRH header,
while RoCE includes only the GRH header.

The modules are configured through entries in /sys. A configuration script
(rxe_cfg) simplifies the use of this interface; rxe_cfg is part of the rxe
user space package, librxe.

Using rxe verbs from user space requires librxe, a device-specific plug-in
for libibverbs. librxe is packaged separately.
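
Because librxe plugs into libibverbs as a provider, applications need no
rxe-specific code: the standard verbs API enumerates and opens an rxe device
exactly like a hardware HCA. The following is a minimal sketch using only
standard libibverbs calls; it is illustrative and not part of the rxe code
itself.

#include <stdio.h>
#include <infiniband/verbs.h>

/* List the RDMA devices visible through libibverbs.  With ib_rxe loaded
 * and librxe installed, the rxe device appears alongside any hardware
 * HCAs and is opened with the ordinary verbs calls. */
int main(void)
{
	int num, i;
	struct ibv_device **list = ibv_get_device_list(&num);

	if (!list) {
		perror("ibv_get_device_list");
		return 1;
	}

	for (i = 0; i < num; i++)
		printf("found RDMA device: %s\n",
		       ibv_get_device_name(list[i]));

	if (num > 0) {
		struct ibv_context *ctx = ibv_open_device(list[0]);

		if (ctx) {
			/* PDs, QPs, MRs etc. are then created through the
			 * same verbs calls as for hardware devices */
			ibv_close_device(ctx);
		}
	}

	ibv_free_device_list(list);
	return 0;
}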


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [patch v2 01/37] add slice by 8 algorithm to crc32.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201228.154338911-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 02/37] add opcodes to ib_pack.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (36 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: slice-by-8-for-crc32.diff --]
[-- Type: text/plain, Size: 17837 bytes --]

Added support for slice by 8 to the existing crc32 algorithm. Also modified
gen_crc32table.c to produce only the table entries that are actually used.
The parameters CRC_LE_BITS and CRC_BE_BITS determine the number of bits of
the input array that are processed during each step. Generally, the more
bits processed per step, the faster the algorithm, but the more table data
is required.

The following table was collected on an x86_64 Opteron machine running at
2100 MHz, with a pre-warmed cache, by computing the crc 1000 times on a
buffer of 4096 bytes.

	BITS	Size	LE Cycles/byte	BE Cycles/byte
	----------------------------------------------
	1	873	41.65		34.60
	2	1097	25.43		29.61
	4	1057	13.29		15.28
	8	2913	7.13		8.19
	32	9684	2.80		2.82
	64	18178	1.53		1.53

	BITS is the value of CRC_LE_BITS or CRC_BE_BITS. The old
	default was 8, which actually selected the 32 bit (slice by 4)
	algorithm. In this version the value 8 selects the standard
	8 bit algorithm, and two new values, 32 and 64, are introduced
	to select the slice by 4 and slice by 8 algorithms respectively.

	Size is the size of crc32.o's text segment, which includes both
	code and table data, when both the LE and BE versions are set
	to BITS.

The current version of crc32.c uses the slice by 4 algorithm by default,
which requires about 2.8 cycles per byte. The slice by 8 algorithm is
roughly 2X faster and enables packet processing at over 1 GB/sec on a
typical 2-3 GHz system.
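
As a rough cross-check of that figure: at 1.53 cycles/byte a 2 GHz core
works out to roughly 2000/1.53, i.e. about 1.3 GB/sec. For readers
unfamiliar with the technique, the following is a minimal, self-contained
sketch of the slice by 8 inner loop for the little-endian CRC (crc32_le).
The table names and the generator below are illustrative only; the actual
kernel code uses the t0_le..t7_le tables emitted by gen_crc32table and
handles alignment and byte order as shown in the patch.

#include <stdint.h>
#include <stddef.h>

#define CRCPOLY_LE 0xedb88320u		/* reflected CRC32 polynomial */

static uint32_t tab[8][256];		/* illustrative lookup tables */

/* tab[0] is the classic byte-at-a-time table; tab[k][b] is the CRC of
 * byte b followed by k zero bytes. */
static void crc32_slice8_init(void)
{
	int b, i, k;

	for (b = 0; b < 256; b++) {
		uint32_t crc = b;

		for (i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
		tab[0][b] = crc;
	}
	for (b = 0; b < 256; b++)
		for (k = 1; k < 8; k++)
			tab[k][b] = (tab[k - 1][b] >> 8) ^
				    tab[0][tab[k - 1][b] & 0xff];
}

/* Consume 8 input bytes per iteration with 8 independent table lookups.
 * Seeding with ~0 and inverting the result is the caller's job, as with
 * crc32_le().  The words are assembled byte by byte here, which keeps
 * the sketch independent of host byte order; the kernel patch instead
 * loads aligned 32-bit words directly and adjusts the indexing on
 * big-endian hosts. */
static uint32_t crc32_slice8(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len >= 8) {
		uint32_t lo = (p[0] | p[1] << 8 | p[2] << 16 |
			       (uint32_t)p[3] << 24) ^ crc;
		uint32_t hi = p[4] | p[5] << 8 | p[6] << 16 |
			      (uint32_t)p[7] << 24;

		crc = tab[7][lo & 0xff] ^ tab[6][(lo >> 8) & 0xff] ^
		      tab[5][(lo >> 16) & 0xff] ^ tab[4][lo >> 24] ^
		      tab[3][hi & 0xff] ^ tab[2][(hi >> 8) & 0xff] ^
		      tab[1][(hi >> 16) & 0xff] ^ tab[0][hi >> 24];
		p += 8;
		len -= 8;
	}
	while (len--)			/* leftover tail bytes */
		crc = tab[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
	return crc;
}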

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 lib/crc32.c          |  372 +++++++++++++++++++++++++++++++++++----------------
 lib/crc32defs.h      |   16 +-
 lib/gen_crc32table.c |   59 +++++---
 3 files changed, 310 insertions(+), 137 deletions(-)

Index: infiniband/lib/crc32.c
===================================================================
--- infiniband.orig/lib/crc32.c
+++ infiniband/lib/crc32.c
@@ -1,4 +1,8 @@
 /*
+ * July 20, 2011 Bob Pearson <rpearson at systemfabricworks.com>
+ * added slice by 8 algorithm to the existing conventional and
+ * slice by 4 algorithms.
+ *
  * Oct 15, 2000 Matt Domsch <Matt_Domsch-8PEkshWhKlo@public.gmane.org>
  * Nicer crc32 functions/docs submitted by linux-gpGsJRJZ3PDowKkBSvOlow@public.gmane.org  Thanks!
  * Code was from the public domain, copyright abandoned.  Code was
@@ -19,7 +23,6 @@
  * This source code is licensed under the GNU General Public License,
  * Version 2.  See the file COPYING for more details.
  */
-
 #include <linux/crc32.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
@@ -28,14 +31,15 @@
 #include <linux/init.h>
 #include <asm/atomic.h>
 #include "crc32defs.h"
-#if CRC_LE_BITS == 8
-# define tole(x) __constant_cpu_to_le32(x)
+
+#if CRC_LE_BITS > 8
+# define tole(x) (__force u32) __constant_cpu_to_le32(x)
 #else
 # define tole(x) (x)
 #endif
 
-#if CRC_BE_BITS == 8
-# define tobe(x) __constant_cpu_to_be32(x)
+#if CRC_BE_BITS > 8
+# define tobe(x) (__force u32) __constant_cpu_to_be32(x)
 #else
 # define tobe(x) (x)
 #endif
@@ -45,54 +49,228 @@ MODULE_AUTHOR("Matt Domsch <Matt_Domsch@
 MODULE_DESCRIPTION("Ethernet CRC32 calculations");
 MODULE_LICENSE("GPL");
 
-#if CRC_LE_BITS == 8 || CRC_BE_BITS == 8
+#if CRC_LE_BITS > 8
+static inline u32 crc32_le_body(u32 crc, u8 const *buf, size_t len)
+{
+	const u8 *p8;
+	const u32 *p32;
+	int init_bytes, end_bytes;
+	size_t words;
+	int i;
+	u32 q;
+	u8 i0, i1, i2, i3;
+
+	crc = (__force u32) __cpu_to_le32(crc);
+
+#if CRC_LE_BITS == 64
+	p8 = buf;
+	p32 = (u32 *)(((uintptr_t)p8 + 7) & ~7);
+
+	init_bytes = (uintptr_t)p32 - (uintptr_t)p8;
+	if (init_bytes > len)
+		init_bytes = len;
+	words = (len - init_bytes) >> 3;
+	end_bytes = (len - init_bytes) & 7;
+#else
+	p8 = buf;
+	p32 = (u32 *)(((uintptr_t)p8 + 3) & ~3);
+
+	init_bytes = (uintptr_t)p32 - (uintptr_t)p8;
+	if (init_bytes > len)
+		init_bytes = len;
+	words = (len - init_bytes) >> 2;
+	end_bytes = (len - init_bytes) & 3;
+#endif
+
+	for (i = 0; i < init_bytes; i++) {
+#ifdef __LITTLE_ENDIAN
+		i0 = *p8++ ^ crc;
+		crc = t0_le[i0] ^ (crc >> 8);
+#else
+		i0 = *p8++ ^ (crc >> 24);
+		crc = t0_le[i0] ^ (crc << 8);
+#endif
+	}
+
+	for (i = 0; i < words; i++) {
+#ifdef __LITTLE_ENDIAN
+#  if CRC_LE_BITS == 64
+		/* slice by 8 algorithm */
+		q = *p32++ ^ crc;
+		i3 = q;
+		i2 = q >> 8;
+		i1 = q >> 16;
+		i0 = q >> 24;
+		crc = t7_le[i3] ^ t6_le[i2] ^ t5_le[i1] ^ t4_le[i0];
+
+		q = *p32++;
+		i3 = q;
+		i2 = q >> 8;
+		i1 = q >> 16;
+		i0 = q >> 24;
+		crc ^= t3_le[i3] ^ t2_le[i2] ^ t1_le[i1] ^ t0_le[i0];
+#  else
+		/* slice by 4 algorithm */
+		q = *p32++ ^ crc;
+		i3 = q;
+		i2 = q >> 8;
+		i1 = q >> 16;
+		i0 = q >> 24;
+		crc = t3_le[i3] ^ t2_le[i2] ^ t1_le[i1] ^ t0_le[i0];
+#  endif
+#else
+#  if CRC_LE_BITS == 64
+		q = *p32++ ^ crc;
+		i3 = q >> 24;
+		i2 = q >> 16;
+		i1 = q >> 8;
+		i0 = q;
+		crc = t7_le[i3] ^ t6_le[i2] ^ t5_le[i1] ^ t4_le[i0];
+
+		q = *p32++;
+		i3 = q >> 24;
+		i2 = q >> 16;
+		i1 = q >> 8;
+		i0 = q;
+		crc ^= t3_le[i3] ^ t2_le[i2] ^ t1_le[i1] ^ t0_le[i0];
+#  else
+		q = *p32++ ^ crc;
+		i3 = q >> 24;
+		i2 = q >> 16;
+		i1 = q >> 8;
+		i0 = q;
+		crc = t3_le[i3] ^ t2_le[i2] ^ t1_le[i1] ^ t0_le[i0];
+#  endif
+#endif
+	}
+
+	p8 = (u8 *)p32;
+
+	for (i = 0; i < end_bytes; i++) {
+#ifdef __LITTLE_ENDIAN
+		i0 = *p8++ ^ crc;
+		crc = t0_le[i0] ^ (crc >> 8);
+#else
+		i0 = *p8++ ^ (crc >> 24);
+		crc = t0_le[i0] ^ (crc << 8);
+#endif
+	}
+
+	return __le32_to_cpu((__force __le32)crc);
+}
+#endif
 
-static inline u32
-crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
+#if CRC_BE_BITS > 8
+static inline u32 crc32_be_body(u32 crc, u8 const *buf, size_t len)
 {
-# ifdef __LITTLE_ENDIAN
-#  define DO_CRC(x) crc = tab[0][(crc ^ (x)) & 255] ^ (crc >> 8)
-#  define DO_CRC4 crc = tab[3][(crc) & 255] ^ \
-		tab[2][(crc >> 8) & 255] ^ \
-		tab[1][(crc >> 16) & 255] ^ \
-		tab[0][(crc >> 24) & 255]
-# else
-#  define DO_CRC(x) crc = tab[0][((crc >> 24) ^ (x)) & 255] ^ (crc << 8)
-#  define DO_CRC4 crc = tab[0][(crc) & 255] ^ \
-		tab[1][(crc >> 8) & 255] ^ \
-		tab[2][(crc >> 16) & 255] ^ \
-		tab[3][(crc >> 24) & 255]
-# endif
-	const u32 *b;
-	size_t    rem_len;
+	const u8 *p8;
+	const u32 *p32;
+	int init_bytes, end_bytes;
+	size_t words;
+	int i;
+	u32 q;
+	u8 i0, i1, i2, i3;
+
+	crc = (__force u32) __cpu_to_be32(crc);
+
+#if CRC_LE_BITS == 64
+	p8 = buf;
+	p32 = (u32 *)(((uintptr_t)p8 + 7) & ~7);
+
+	init_bytes = (uintptr_t)p32 - (uintptr_t)p8;
+	if (init_bytes > len)
+		init_bytes = len;
+	words = (len - init_bytes) >> 3;
+	end_bytes = (len - init_bytes) & 7;
+#else
+	p8 = buf;
+	p32 = (u32 *)(((uintptr_t)p8 + 3) & ~3);
+
+	init_bytes = (uintptr_t)p32 - (uintptr_t)p8;
+	if (init_bytes > len)
+		init_bytes = len;
+	words = (len - init_bytes) >> 2;
+	end_bytes = (len - init_bytes) & 3;
+#endif
+
+	for (i = 0; i < init_bytes; i++) {
+#ifdef __LITTLE_ENDIAN
+		i0 = *p8++ ^ crc;
+		crc = t0_be[i0] ^ (crc >> 8);
+#else
+		i0 = *p8++ ^ (crc >> 24);
+		crc = t0_be[i0] ^ (crc << 8);
+#endif
+	}
 
-	/* Align it */
-	if (unlikely((long)buf & 3 && len)) {
-		do {
-			DO_CRC(*buf++);
-		} while ((--len) && ((long)buf)&3);
-	}
-	rem_len = len & 3;
-	/* load data 32 bits wide, xor data 32 bits wide. */
-	len = len >> 2;
-	b = (const u32 *)buf;
-	for (--b; len; --len) {
-		crc ^= *++b; /* use pre increment for speed */
-		DO_CRC4;
-	}
-	len = rem_len;
-	/* And the last few bytes */
-	if (len) {
-		u8 *p = (u8 *)(b + 1) - 1;
-		do {
-			DO_CRC(*++p); /* use pre increment for speed */
-		} while (--len);
+	for (i = 0; i < words; i++) {
+#ifdef __LITTLE_ENDIAN
+#  if CRC_LE_BITS == 64
+		/* slice by 8 algorithm */
+		q = *p32++ ^ crc;
+		i3 = q;
+		i2 = q >> 8;
+		i1 = q >> 16;
+		i0 = q >> 24;
+		crc = t7_be[i3] ^ t6_be[i2] ^ t5_be[i1] ^ t4_be[i0];
+
+		q = *p32++;
+		i3 = q;
+		i2 = q >> 8;
+		i1 = q >> 16;
+		i0 = q >> 24;
+		crc ^= t3_be[i3] ^ t2_be[i2] ^ t1_be[i1] ^ t0_be[i0];
+#  else
+		/* slice by 4 algorithm */
+		q = *p32++ ^ crc;
+		i3 = q;
+		i2 = q >> 8;
+		i1 = q >> 16;
+		i0 = q >> 24;
+		crc = t3_be[i3] ^ t2_be[i2] ^ t1_be[i1] ^ t0_be[i0];
+#  endif
+#else
+#  if CRC_LE_BITS == 64
+		q = *p32++ ^ crc;
+		i3 = q >> 24;
+		i2 = q >> 16;
+		i1 = q >> 8;
+		i0 = q;
+		crc = t7_be[i3] ^ t6_be[i2] ^ t5_be[i1] ^ t4_be[i0];
+
+		q = *p32++;
+		i3 = q >> 24;
+		i2 = q >> 16;
+		i1 = q >> 8;
+		i0 = q;
+		crc ^= t3_be[i3] ^ t2_be[i2] ^ t1_be[i1] ^ t0_be[i0];
+#  else
+		q = *p32++ ^ crc;
+		i3 = q >> 24;
+		i2 = q >> 16;
+		i1 = q >> 8;
+		i0 = q;
+		crc = t3_be[i3] ^ t2_be[i2] ^ t1_be[i1] ^ t0_be[i0];
+#  endif
+#endif
 	}
-	return crc;
-#undef DO_CRC
-#undef DO_CRC4
+
+	p8 = (u8 *)p32;
+
+	for (i = 0; i < end_bytes; i++) {
+#ifdef __LITTLE_ENDIAN
+		i0 = *p8++ ^ crc;
+		crc = t0_be[i0] ^ (crc >> 8);
+#else
+		i0 = *p8++ ^ (crc >> 24);
+		crc = t0_be[i0] ^ (crc << 8);
+#endif
+	}
+
+	return __be32_to_cpu((__force __be32)crc);
 }
 #endif
+
 /**
  * crc32_le() - Calculate bitwise little-endian Ethernet AUTODIN II CRC32
  * @crc: seed value for computation.  ~0 for Ethernet, sometimes 0 for
@@ -100,53 +278,40 @@ crc32_body(u32 crc, unsigned char const 
  * @p: pointer to buffer over which CRC is run
  * @len: length of buffer @p
  */
-u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len);
-
-#if CRC_LE_BITS == 1
-/*
- * In fact, the table-based code will work in this case, but it can be
- * simplified by inlining the table in ?: form.
- */
-
 u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len)
 {
+#if CRC_LE_BITS == 1
 	int i;
 	while (len--) {
 		crc ^= *p++;
 		for (i = 0; i < 8; i++)
 			crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
 	}
-	return crc;
-}
-#else				/* Table-based approach */
-
-u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len)
-{
-# if CRC_LE_BITS == 8
-	const u32      (*tab)[] = crc32table_le;
-
-	crc = __cpu_to_le32(crc);
-	crc = crc32_body(crc, p, len, tab);
-	return __le32_to_cpu(crc);
+# elif CRC_LE_BITS == 2
+	while (len--) {
+		crc ^= *p++;
+		crc = (crc >> 2) ^ t0_le[crc & 0x03];
+		crc = (crc >> 2) ^ t0_le[crc & 0x03];
+		crc = (crc >> 2) ^ t0_le[crc & 0x03];
+		crc = (crc >> 2) ^ t0_le[crc & 0x03];
+	}
 # elif CRC_LE_BITS == 4
 	while (len--) {
 		crc ^= *p++;
-		crc = (crc >> 4) ^ crc32table_le[crc & 15];
-		crc = (crc >> 4) ^ crc32table_le[crc & 15];
+		crc = (crc >> 4) ^ t0_le[crc & 0x0f];
+		crc = (crc >> 4) ^ t0_le[crc & 0x0f];
 	}
-	return crc;
-# elif CRC_LE_BITS == 2
+# elif CRC_LE_BITS == 8
 	while (len--) {
 		crc ^= *p++;
-		crc = (crc >> 2) ^ crc32table_le[crc & 3];
-		crc = (crc >> 2) ^ crc32table_le[crc & 3];
-		crc = (crc >> 2) ^ crc32table_le[crc & 3];
-		crc = (crc >> 2) ^ crc32table_le[crc & 3];
+		crc = (crc >> 8) ^ t0_le[crc & 0xff];
 	}
-	return crc;
+# else
+	crc = crc32_le_body(crc, p, len);
 # endif
+	return crc;
 }
-#endif
+EXPORT_SYMBOL(crc32_le);
 
 /**
  * crc32_be() - Calculate bitwise big-endian Ethernet AUTODIN II CRC32
@@ -155,57 +320,40 @@ u32 __pure crc32_le(u32 crc, unsigned ch
  * @p: pointer to buffer over which CRC is run
  * @len: length of buffer @p
  */
-u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len);
-
-#if CRC_BE_BITS == 1
-/*
- * In fact, the table-based code will work in this case, but it can be
- * simplified by inlining the table in ?: form.
- */
-
 u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
 {
+#if CRC_BE_BITS == 1
 	int i;
 	while (len--) {
 		crc ^= *p++ << 24;
 		for (i = 0; i < 8; i++)
-			crc =
-			    (crc << 1) ^ ((crc & 0x80000000) ? CRCPOLY_BE :
-					  0);
+			crc = (crc << 1) ^
+			      ((crc & 0x80000000) ? CRCPOLY_BE : 0);
+	}
+# elif CRC_BE_BITS == 2
+	while (len--) {
+		crc ^= *p++ << 24;
+		crc = (crc << 2) ^ t0_be[crc >> 30];
+		crc = (crc << 2) ^ t0_be[crc >> 30];
+		crc = (crc << 2) ^ t0_be[crc >> 30];
+		crc = (crc << 2) ^ t0_be[crc >> 30];
 	}
-	return crc;
-}
-
-#else				/* Table-based approach */
-u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
-{
-# if CRC_BE_BITS == 8
-	const u32      (*tab)[] = crc32table_be;
-
-	crc = __cpu_to_be32(crc);
-	crc = crc32_body(crc, p, len, tab);
-	return __be32_to_cpu(crc);
 # elif CRC_BE_BITS == 4
 	while (len--) {
 		crc ^= *p++ << 24;
-		crc = (crc << 4) ^ crc32table_be[crc >> 28];
-		crc = (crc << 4) ^ crc32table_be[crc >> 28];
+		crc = (crc << 4) ^ t0_be[crc >> 28];
+		crc = (crc << 4) ^ t0_be[crc >> 28];
 	}
-	return crc;
-# elif CRC_BE_BITS == 2
+# elif CRC_BE_BITS == 8
 	while (len--) {
 		crc ^= *p++ << 24;
-		crc = (crc << 2) ^ crc32table_be[crc >> 30];
-		crc = (crc << 2) ^ crc32table_be[crc >> 30];
-		crc = (crc << 2) ^ crc32table_be[crc >> 30];
-		crc = (crc << 2) ^ crc32table_be[crc >> 30];
+		crc = (crc << 8) ^ t0_be[crc >> 24];
 	}
-	return crc;
+# else
+	crc = crc32_be_body(crc, p, len);
 # endif
+	return crc;
 }
-#endif
-
-EXPORT_SYMBOL(crc32_le);
 EXPORT_SYMBOL(crc32_be);
 
 /*
Index: infiniband/lib/crc32defs.h
===================================================================
--- infiniband.orig/lib/crc32defs.h
+++ infiniband/lib/crc32defs.h
@@ -6,27 +6,29 @@
 #define CRCPOLY_LE 0xedb88320
 #define CRCPOLY_BE 0x04c11db7
 
-/* How many bits at a time to use.  Requires a table of 4<<CRC_xx_BITS bytes. */
+/* How many bits at a time to use.  Valid values are 1, 2, 4, 8, 32 and 64. */
 /* For less performance-sensitive, use 4 */
 #ifndef CRC_LE_BITS 
-# define CRC_LE_BITS 8
+# define CRC_LE_BITS 64
 #endif
 #ifndef CRC_BE_BITS
-# define CRC_BE_BITS 8
+# define CRC_BE_BITS 64
 #endif
 
 /*
  * Little-endian CRC computation.  Used with serial bit streams sent
  * lsbit-first.  Be sure to use cpu_to_le32() to append the computed CRC.
  */
-#if CRC_LE_BITS > 8 || CRC_LE_BITS < 1 || CRC_LE_BITS & CRC_LE_BITS-1
-# error CRC_LE_BITS must be a power of 2 between 1 and 8
+#if CRC_LE_BITS > 64 || CRC_LE_BITS < 1 || CRC_LE_BITS == 16 || \
+	CRC_LE_BITS & CRC_LE_BITS-1
+# error "CRC_LE_BITS must be one of {1, 2, 4, 8, 32, 64}"
 #endif
 
 /*
  * Big-endian CRC computation.  Used with serial bit streams sent
  * msbit-first.  Be sure to use cpu_to_be32() to append the computed CRC.
  */
-#if CRC_BE_BITS > 8 || CRC_BE_BITS < 1 || CRC_BE_BITS & CRC_BE_BITS-1
-# error CRC_BE_BITS must be a power of 2 between 1 and 8
+#if CRC_BE_BITS > 64 || CRC_BE_BITS < 1 || CRC_BE_BITS == 16 || \
+	CRC_BE_BITS & CRC_BE_BITS-1
+# error "CRC_BE_BITS must be one of {1, 2, 4, 8, 32, 64}"
 #endif
Index: infiniband/lib/gen_crc32table.c
===================================================================
--- infiniband.orig/lib/gen_crc32table.c
+++ infiniband/lib/gen_crc32table.c
@@ -4,11 +4,20 @@
 
 #define ENTRIES_PER_LINE 4
 
+#if CRC_LE_BITS <= 8
 #define LE_TABLE_SIZE (1 << CRC_LE_BITS)
+#else
+#define LE_TABLE_SIZE 256
+#endif
+
+#if CRC_BE_BITS <= 8
 #define BE_TABLE_SIZE (1 << CRC_BE_BITS)
+#else
+#define BE_TABLE_SIZE 256
+#endif
 
-static uint32_t crc32table_le[4][LE_TABLE_SIZE];
-static uint32_t crc32table_be[4][BE_TABLE_SIZE];
+static uint32_t crc32table_le[8][256];
+static uint32_t crc32table_be[8][256];
 
 /**
  * crc32init_le() - allocate and initialize LE table data
@@ -24,14 +33,14 @@ static void crc32init_le(void)
 
 	crc32table_le[0][0] = 0;
 
-	for (i = 1 << (CRC_LE_BITS - 1); i; i >>= 1) {
+	for (i = LE_TABLE_SIZE >> 1; i; i >>= 1) {
 		crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
 		for (j = 0; j < LE_TABLE_SIZE; j += 2 * i)
 			crc32table_le[0][i + j] = crc ^ crc32table_le[0][j];
 	}
 	for (i = 0; i < LE_TABLE_SIZE; i++) {
 		crc = crc32table_le[0][i];
-		for (j = 1; j < 4; j++) {
+		for (j = 1; j < 8; j++) {
 			crc = crc32table_le[0][crc & 0xff] ^ (crc >> 8);
 			crc32table_le[j][i] = crc;
 		}
@@ -55,44 +64,58 @@ static void crc32init_be(void)
 	}
 	for (i = 0; i < BE_TABLE_SIZE; i++) {
 		crc = crc32table_be[0][i];
-		for (j = 1; j < 4; j++) {
+		for (j = 1; j < 8; j++) {
 			crc = crc32table_be[0][(crc >> 24) & 0xff] ^ (crc << 8);
 			crc32table_be[j][i] = crc;
 		}
 	}
 }
 
-static void output_table(uint32_t table[4][256], int len, char *trans)
+static void output_table(uint32_t table[8][256], int len, char trans)
 {
 	int i, j;
 
-	for (j = 0 ; j < 4; j++) {
-		printf("{");
+	for (j = 0 ; j < 8; j++) {
+		printf("static const u32 t%d_%ce[] = {", j, trans);
 		for (i = 0; i < len - 1; i++) {
-			if (i % ENTRIES_PER_LINE == 0)
+			if ((i % ENTRIES_PER_LINE) == 0)
 				printf("\n");
-			printf("%s(0x%8.8xL), ", trans, table[j][i]);
+			printf("to%ce(0x%8.8xL),", trans, table[j][i]);
+			if ((i % ENTRIES_PER_LINE) != (ENTRIES_PER_LINE - 1))
+				printf(" ");
+		}
+		printf("to%ce(0x%8.8xL)};\n\n", trans, table[j][len - 1]);
+
+		if (trans == 'l') {
+			if ((j+1)*8 >= CRC_LE_BITS)
+				break;
+		} else {
+			if ((j+1)*8 >= CRC_BE_BITS)
+				break;
 		}
-		printf("%s(0x%8.8xL)},\n", trans, table[j][len - 1]);
 	}
 }
 
 int main(int argc, char** argv)
 {
-	printf("/* this file is generated - do not edit */\n\n");
+	printf("/*\n");
+	printf(" * crc32table.h - CRC32 tables\n");
+	printf(" *    this file is generated - do not edit\n");
+	printf(" *	# gen_crc32table > crc32table.h\n");
+	printf(" *    with\n");
+	printf(" *	CRC_LE_BITS = %d\n", CRC_LE_BITS);
+	printf(" *	CRC_BE_BITS = %d\n", CRC_BE_BITS);
+	printf(" */\n");
+	printf("\n");
 
 	if (CRC_LE_BITS > 1) {
 		crc32init_le();
-		printf("static const u32 crc32table_le[4][256] = {");
-		output_table(crc32table_le, LE_TABLE_SIZE, "tole");
-		printf("};\n");
+		output_table(crc32table_le, LE_TABLE_SIZE, 'l');
 	}
 
 	if (CRC_BE_BITS > 1) {
 		crc32init_be();
-		printf("static const u32 crc32table_be[4][256] = {");
-		output_table(crc32table_be, BE_TABLE_SIZE, "tobe");
-		printf("};\n");
+		output_table(crc32table_be, BE_TABLE_SIZE, 'b');
 	}
 
 	return 0;




* [patch v2 02/37] add opcodes to ib_pack.h
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 01/37] add slice by 8 algorithm to crc32.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201228.204265835-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 03/37] add rxe_hdr.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (35 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-opcodes-to-ib_pack.diff --]
[-- Type: text/plain, Size: 3100 bytes --]

Bring ib_pack.h up to date with the current version of the IBTA spec.
	- add new opcodes for RC and RD
	- add new groups of opcodes for CN and XRC (a short example of the
	  opcode encoding follows below)
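
The new enum entries are all built with the existing IB_OPCODE(transport,
op) macro, which pastes the two names together and adds the transport base
value to the operation value, e.g. IB_OPCODE(XRC, SEND_FIRST) defines
IB_OPCODE_XRC_SEND_FIRST = 0xA0 + 0x00 = 0xA0. Since the transport bases
are multiples of 0x20, the top three bits of a BTH opcode identify the
transport group and the low five bits the operation. A small illustrative
decoder follows; the mask names here are not part of ib_pack.h.

#include <stdio.h>

#define OPCODE_TRANSPORT_MASK	0xe0	/* illustrative, not from ib_pack.h */
#define OPCODE_OP_MASK		0x1f

static const char *transport_name(unsigned int opcode)
{
	switch (opcode & OPCODE_TRANSPORT_MASK) {
	case 0x00: return "RC";
	case 0x20: return "UC";
	case 0x40: return "RD";
	case 0x60: return "UD";
	case 0x80: return "CN";		/* new in this patch */
	case 0xa0: return "XRC";	/* new in this patch */
	default:   return "reserved/other";
	}
}

int main(void)
{
	/* IB_OPCODE(XRC, SEND_FIRST) = 0xA0 + 0x00 = 0xA0 */
	unsigned int opcode = 0xa0;

	printf("opcode 0x%02x: transport %s, operation 0x%02x\n",
	       opcode, transport_name(opcode), opcode & OPCODE_OP_MASK);
	return 0;
}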

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 include/rdma/ib_pack.h |   39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

Index: infiniband/include/rdma/ib_pack.h
===================================================================
--- infiniband.orig/include/rdma/ib_pack.h
+++ infiniband/include/rdma/ib_pack.h
@@ -75,6 +75,8 @@ enum {
 	IB_OPCODE_UC                                = 0x20,
 	IB_OPCODE_RD                                = 0x40,
 	IB_OPCODE_UD                                = 0x60,
+	IB_OPCODE_CN                                = 0x80,
+	IB_OPCODE_XRC                               = 0xA0,
 
 	/* operations -- just used to define real constants */
 	IB_OPCODE_SEND_FIRST                        = 0x00,
@@ -98,6 +100,10 @@ enum {
 	IB_OPCODE_ATOMIC_ACKNOWLEDGE                = 0x12,
 	IB_OPCODE_COMPARE_SWAP                      = 0x13,
 	IB_OPCODE_FETCH_ADD                         = 0x14,
+	IB_OPCODE_CNP                               = 0x00,
+	IB_OPCODE_RESYNC                            = 0x15,
+	IB_OPCODE_SEND_LAST_INV                     = 0x16,
+	IB_OPCODE_SEND_ONLY_INV                     = 0x17,
 
 	/* real constants follow -- see comment about above IB_OPCODE()
 	   macro for more details */
@@ -124,6 +130,8 @@ enum {
 	IB_OPCODE(RC, ATOMIC_ACKNOWLEDGE),
 	IB_OPCODE(RC, COMPARE_SWAP),
 	IB_OPCODE(RC, FETCH_ADD),
+	IB_OPCODE(RC, SEND_LAST_INV),
+	IB_OPCODE(RC, SEND_ONLY_INV),
 
 	/* UC */
 	IB_OPCODE(UC, SEND_FIRST),
@@ -161,10 +169,39 @@ enum {
 	IB_OPCODE(RD, ATOMIC_ACKNOWLEDGE),
 	IB_OPCODE(RD, COMPARE_SWAP),
 	IB_OPCODE(RD, FETCH_ADD),
+	IB_OPCODE(RD, RESYNC),
 
 	/* UD */
 	IB_OPCODE(UD, SEND_ONLY),
-	IB_OPCODE(UD, SEND_ONLY_WITH_IMMEDIATE)
+	IB_OPCODE(UD, SEND_ONLY_WITH_IMMEDIATE),
+
+	/* CN */
+	IB_OPCODE(CN, CNP),
+
+	/* XRC */
+	IB_OPCODE(XRC, SEND_FIRST),
+	IB_OPCODE(XRC, SEND_MIDDLE),
+	IB_OPCODE(XRC, SEND_LAST),
+	IB_OPCODE(XRC, SEND_LAST_WITH_IMMEDIATE),
+	IB_OPCODE(XRC, SEND_ONLY),
+	IB_OPCODE(XRC, SEND_ONLY_WITH_IMMEDIATE),
+	IB_OPCODE(XRC, RDMA_WRITE_FIRST),
+	IB_OPCODE(XRC, RDMA_WRITE_MIDDLE),
+	IB_OPCODE(XRC, RDMA_WRITE_LAST),
+	IB_OPCODE(XRC, RDMA_WRITE_LAST_WITH_IMMEDIATE),
+	IB_OPCODE(XRC, RDMA_WRITE_ONLY),
+	IB_OPCODE(XRC, RDMA_WRITE_ONLY_WITH_IMMEDIATE),
+	IB_OPCODE(XRC, RDMA_READ_REQUEST),
+	IB_OPCODE(XRC, RDMA_READ_RESPONSE_FIRST),
+	IB_OPCODE(XRC, RDMA_READ_RESPONSE_MIDDLE),
+	IB_OPCODE(XRC, RDMA_READ_RESPONSE_LAST),
+	IB_OPCODE(XRC, RDMA_READ_RESPONSE_ONLY),
+	IB_OPCODE(XRC, ACKNOWLEDGE),
+	IB_OPCODE(XRC, ATOMIC_ACKNOWLEDGE),
+	IB_OPCODE(XRC, COMPARE_SWAP),
+	IB_OPCODE(XRC, FETCH_ADD),
+	IB_OPCODE(XRC, SEND_LAST_INV),
+	IB_OPCODE(XRC, SEND_ONLY_INV),
 };
 
 enum {




* [patch v2 03/37] add rxe_hdr.h
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 01/37] add slice by 8 algorithm to crc32.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 02/37] add opcodes to ib_pack.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 04/37] add rxe_opcode.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (34 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_hdr.h --]
[-- Type: text/plain, Size: 33394 bytes --]

Add declarations for the IBA packet headers (LRH, GRH, BTH and the extended
transport headers), inline routines used to insert and extract their fields,
and the rxe_pkt_info struct that carries per-packet state in the skb control
block.
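
The header follows a simple naming convention, also described in the block
comment inside the file: __xxx_field()/__xxx_set_field() operate on a raw
pointer to the xxx header, while xxx_field()/xxx_set_field() locate the
header from a struct rxe_pkt_info using pkt->hdr, pkt->offset and, for the
extension headers, the rxe_opcode[] table added later in the series. A
hedged usage sketch for the BTH accessors follows; the function is
illustrative and assumes the caller has already pointed pkt->hdr at
reserved header space with the BTH at offset zero.

/* Illustrative only: fill in and read back a BTH with the accessors
 * declared in rxe_hdr.h.  The real packet-building path lives in the
 * rxe request/response logic added later in this series. */
static void example_build_bth(struct rxe_pkt_info *pkt, u32 qpn, u32 psn)
{
	pkt->offset = 0;		/* BTH starts at pkt->hdr */

	bth_init(pkt,
		 IB_OPCODE_RC_SEND_ONLY,	/* opcode */
		 0,				/* solicited event */
		 0,				/* migration state */
		 0,				/* pad count */
		 BTH_DEF_PKEY,			/* default partition key */
		 qpn,				/* destination QP number */
		 1,				/* request an ack */
		 psn);				/* packet sequence number */

	/* the matching getters recover the masked values */
	WARN_ON(bth_qpn(pkt) != (qpn & BTH_QPN_MASK));
	WARN_ON(bth_psn(pkt) != (psn & BTH_PSN_MASK));
	WARN_ON(!bth_ack(pkt));
}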

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_hdr.h | 1294 ++++++++++++++++++++++++++++++++++++
 1 file changed, 1294 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_hdr.h
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_hdr.h
@@ -0,0 +1,1294 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_HDR_H
+#define RXE_HDR_H
+
+/* extracted information about a packet carried in an sk_buff struct
+   fits in the skbuff cb array. Must be at most 48 bytes. */
+struct rxe_pkt_info {
+	struct rxe_dev		*rxe;		/* device that owns packet */
+	struct rxe_qp		*qp;		/* qp that owns packet */
+	u8			*hdr;		/* points to grh or bth */
+	u32			mask;		/* useful info about pkt */
+	u32			psn;		/* bth psn of packet */
+	u16			pkey_index;	/* partition of pkt */
+	u16			paylen;		/* length of bth - icrc */
+	u8			port_num;	/* port pkt received on */
+	u8			opcode;		/* bth opcode of packet */
+	u8			offset;		/* bth offset from pkt->hdr */
+};
+
+#define SKB_TO_PKT(skb) ((struct rxe_pkt_info *)(skb)->cb)
+#define PKT_TO_SKB(pkt) ((struct sk_buff *)((char *)(pkt)	\
+			- offsetof(struct sk_buff, cb)))
+
+/*
+ * IBA header types and methods
+ *
+ * Some of these are for reference and completeness only since
+ * rxe does not currently support RD transport
+ * most of this could be moved into OFED core. ib_pack.h has
+ * part of this but is incomplete
+ *
+ * Header specific routines to insert/extract values to/from headers
+ * the routines that are named __hhh_(set_)fff() take a pointer to a
+ * hhh header and get(set) the fff field. The routines named
+ * hhh_(set_)fff take a packet info struct and find the
+ * header and field based on the opcode in the packet.
+ * Conversion to/from network byte order from cpu order is also done.
+ */
+
+#define RXE_ICRC_SIZE		(4)
+#define RXE_MAX_HDR_LENGTH	(80)
+
+/******************************************************************************
+ * Local Route Header
+ ******************************************************************************/
+struct rxe_lrh {
+	u8			vlver;		/* vl and lver */
+	u8			slnh;		/* sl and lnh */
+	__be16			dlid;
+	__be16			length;
+	__be16			slid;
+};
+
+#define LRH_LVER		(0)
+
+#define LRH_VL_MASK		(0xf0)
+#define LRH_LVER_MASK		(0x0f)
+#define LRH_SL_MASK		(0xf0)
+#define LRH_RESV2_MASK		(0x0c)
+#define LRH_LNH_MASK		(0x03)
+#define LRH_RESV5_MASK		(0xf800)
+#define LRH_LENGTH_MASK		(0x07ff)
+
+enum lrh_lnh {
+	LRH_LNH_RAW		= 0,
+	LRH_LNH_IP		= 1,
+	LRH_LNH_IBA_LOC		= 2,
+	LRH_LNH_IBA_GBL		= 3,
+};
+
+static inline u8 __lrh_vl(void *arg)
+{
+	struct rxe_lrh *lrh = arg;
+	return lrh->vlver >> 4;
+}
+
+static inline void __lrh_set_vl(void *arg, u8 vl)
+{
+	struct rxe_lrh *lrh = arg;
+	lrh->vlver = (vl << 4) | (~LRH_VL_MASK & lrh->vlver);
+}
+
+static inline u8 __lrh_lver(void *arg)
+{
+	struct rxe_lrh *lrh = arg;
+	return lrh->vlver & LRH_LVER_MASK;
+}
+
+static inline void __lrh_set_lver(void *arg, u8 lver)
+{
+	struct rxe_lrh *lrh = arg;
+	lrh->vlver = (lver & LRH_LVER_MASK) | (~LRH_LVER_MASK & lrh->vlver);
+}
+
+static inline u8 __lrh_sl(void *arg)
+{
+	struct rxe_lrh *lrh = arg;
+	return lrh->slnh >> 4;
+}
+
+static inline void __lrh_set_sl(void *arg, u8 sl)
+{
+	struct rxe_lrh *lrh = arg;
+	lrh->slnh = (sl << 4) | (~LRH_SL_MASK & lrh->slnh);
+}
+
+static inline u8 __lrh_lnh(void *arg)
+{
+	struct rxe_lrh *lrh = arg;
+	return lrh->slnh & LRH_LNH_MASK;
+}
+
+static inline void __lrh_set_lnh(void *arg, u8 lnh)
+{
+	struct rxe_lrh *lrh = arg;
+	lrh->slnh = (lnh & LRH_LNH_MASK) | (~LRH_LNH_MASK & lrh->slnh);
+}
+
+static inline u8 __lrh_dlid(void *arg)
+{
+	struct rxe_lrh *lrh = arg;
+	return be16_to_cpu(lrh->dlid);
+}
+
+static inline void __lrh_set_dlid(void *arg, u16 dlid)
+{
+	struct rxe_lrh *lrh = arg;
+	lrh->dlid = cpu_to_be16(dlid);
+}
+
+static inline u8 __lrh_length(void *arg)
+{
+	struct rxe_lrh *lrh = arg;
+	return be16_to_cpu(lrh->length) & LRH_LENGTH_MASK;
+}
+
+static inline void __lrh_set_length(void *arg, u16 length)
+{
+	struct rxe_lrh *lrh = arg;
+	u16 was = be16_to_cpu(lrh->length);
+	lrh->length = cpu_to_be16((length & LRH_LENGTH_MASK)
+			| (~LRH_LENGTH_MASK & was));
+}
+
+static inline u8 __lrh_slid(void *arg)
+{
+	struct rxe_lrh *lrh = arg;
+	return be16_to_cpu(lrh->slid);
+}
+
+static inline void __lrh_set_slid(void *arg, u16 slid)
+{
+	struct rxe_lrh *lrh = arg;
+	lrh->slid = cpu_to_be16(slid);
+}
+
+static inline void lrh_init(struct rxe_pkt_info *pkt, u8 vl, u8 sl,
+			    u8 lnh, u16 length, u16 dlid, u16 slid)
+{
+	struct rxe_lrh *lrh = (struct rxe_lrh *)pkt->hdr;
+
+	lrh->vlver = (vl << 4);
+	lrh->slnh = (sl << 4) | (lnh & LRH_LNH_MASK);
+	lrh->length = cpu_to_be16(length & LRH_LENGTH_MASK);
+	lrh->dlid = cpu_to_be16(dlid);
+	lrh->slid = cpu_to_be16(slid);
+}
+
+/******************************************************************************
+ * Global Route Header
+ ******************************************************************************/
+struct rxe_grh {
+	__be32			vflow;		/* ipver, tclass and flow */
+	__be16			paylen;
+	u8			next_hdr;
+	u8			hop_limit;
+	union ib_gid		sgid;
+	union ib_gid		dgid;
+};
+
+#define GRH_IPV6		(6)
+#define GRH_RXE_NEXT_HDR	(0x1b)
+
+#define GRH_IPVER_MASK		(0xf0000000)
+#define GRH_TCLASS_MASK		(0x0ff00000)
+#define GRH_FLOW_MASK		(0x000fffff)
+
+static inline u8 __grh_ipver(void *arg)
+{
+	struct rxe_grh *grh = arg;
+	return (GRH_IPVER_MASK & be32_to_cpu(grh->vflow)) >> 28;
+}
+
+static inline void __grh_set_ipver(void *arg, u8 ipver)
+{
+	struct rxe_grh *grh = arg;
+	u32 vflow = be32_to_cpu(grh->vflow);
+	grh->vflow = cpu_to_be32((GRH_IPVER_MASK & (ipver << 28))
+			| (~GRH_IPVER_MASK & vflow));
+}
+
+static inline u8 __grh_tclass(void *arg)
+{
+	struct rxe_grh *grh = arg;
+	return (GRH_TCLASS_MASK & be32_to_cpu(grh->vflow)) >> 20;
+}
+
+static inline void __grh_set_tclass(void *arg, u8 tclass)
+{
+	struct rxe_grh *grh = arg;
+	u32 vflow = be32_to_cpu(grh->vflow);
+	grh->vflow = cpu_to_be32((GRH_TCLASS_MASK & (tclass << 20))
+			| (~GRH_TCLASS_MASK & vflow));
+}
+
+static inline u32 __grh_flow(void *arg)
+{
+	struct rxe_grh *grh = arg;
+	return GRH_FLOW_MASK & be32_to_cpu(grh->vflow);
+}
+
+static inline void __grh_set_flow(void *arg, u32 flow)
+{
+	struct rxe_grh *grh = arg;
+	u32 vflow = be32_to_cpu(grh->vflow);
+	grh->vflow = cpu_to_be32((GRH_FLOW_MASK & flow)
+			| (~GRH_FLOW_MASK & vflow));
+}
+
+static inline void __grh_set_vflow(void *arg, u8 tclass, u32 flow)
+{
+	struct rxe_grh *grh = arg;
+	grh->vflow = cpu_to_be32((GRH_IPV6 << 28) | (tclass << 20)
+		| (flow & GRH_FLOW_MASK));
+}
+
+static inline u16 __grh_paylen(void *arg)
+{
+	struct rxe_grh *grh = arg;
+	return be16_to_cpu(grh->paylen);
+}
+
+static inline void __grh_set_paylen(void *arg, u16 paylen)
+{
+	struct rxe_grh *grh = arg;
+	grh->paylen = cpu_to_be16(paylen);
+}
+
+static inline u8 __grh_next_hdr(void *arg)
+{
+	struct rxe_grh *grh = arg;
+	return grh->next_hdr;
+}
+
+static inline void __grh_set_next_hdr(void *arg, u8 next_hdr)
+{
+	struct rxe_grh *grh = arg;
+	grh->next_hdr = next_hdr;
+}
+
+static inline u8 __grh_hop_limit(void *arg)
+{
+	struct rxe_grh *grh = arg;
+	return grh->hop_limit;
+}
+
+static inline void __grh_set_hop_limit(void *arg, u8 hop_limit)
+{
+	struct rxe_grh *grh = arg;
+	grh->hop_limit = hop_limit;
+}
+
+static inline union ib_gid *__grh_sgid(void *arg)
+{
+	struct rxe_grh *grh = arg;
+	return &grh->sgid;
+}
+
+static inline void __grh_set_sgid(void *arg, __be64 prefix, __be64 guid)
+{
+	struct rxe_grh *grh = arg;
+	grh->sgid.global.subnet_prefix = prefix;
+	grh->sgid.global.interface_id = guid;
+}
+
+static inline union ib_gid *__grh_dgid(void *arg)
+{
+	struct rxe_grh *grh = arg;
+	return &grh->dgid;
+}
+
+static inline void __grh_set_dgid(void *arg, __be64 prefix, __be64 guid)
+{
+	struct rxe_grh *grh = arg;
+	grh->dgid.global.subnet_prefix = prefix;
+	grh->dgid.global.interface_id = guid;
+}
+
+static inline u8 grh_ipver(struct rxe_pkt_info *pkt)
+{
+	return __grh_ipver(pkt->hdr);
+}
+
+static inline void grh_set_ipver(struct rxe_pkt_info *pkt, u8 ipver)
+{
+	__grh_set_ipver(pkt->hdr, ipver);
+	return;
+}
+
+static inline u8 grh_tclass(struct rxe_pkt_info *pkt)
+{
+	return __grh_tclass(pkt->hdr + pkt->offset
+		- sizeof(struct rxe_grh));
+}
+
+static inline void grh_set_tclass(struct rxe_pkt_info *pkt, u8 tclass)
+{
+	__grh_set_tclass(pkt->hdr + pkt->offset
+		- sizeof(struct rxe_grh), tclass);
+	return;
+}
+
+static inline u32 grh_flow(struct rxe_pkt_info *pkt)
+{
+	return __grh_flow(pkt->hdr + pkt->offset
+		- sizeof(struct rxe_grh));
+}
+
+static inline void grh_set_flow(struct rxe_pkt_info *pkt, u32 flow)
+{
+	__grh_set_flow(pkt->hdr + pkt->offset
+		- sizeof(struct rxe_grh), flow);
+	return;
+}
+
+static inline u16 grh_paylen(struct rxe_pkt_info *pkt)
+{
+	return __grh_paylen(pkt->hdr + pkt->offset
+		- sizeof(struct rxe_grh));
+}
+
+static inline void grh_set_paylen(struct rxe_pkt_info *pkt, u16 paylen)
+{
+	__grh_set_paylen(pkt->hdr + pkt->offset
+		- sizeof(struct rxe_grh), paylen);
+	return;
+}
+
+static inline u8 grh_next_hdr(struct rxe_pkt_info *pkt)
+{
+	return __grh_next_hdr(pkt->hdr + pkt->offset
+		- sizeof(struct rxe_grh));
+}
+
+static inline void grh_set_next_hdr(struct rxe_pkt_info *pkt, u8 next_hdr)
+{
+	__grh_set_next_hdr(pkt->hdr + pkt->offset
+		- sizeof(struct rxe_grh), next_hdr);
+	return;
+}
+
+static inline u8 grh_hop_limit(struct rxe_pkt_info *pkt)
+{
+	return __grh_hop_limit(pkt->hdr + pkt->offset
+		- sizeof(struct rxe_grh));
+}
+
+static inline void grh_set_hop_limit(struct rxe_pkt_info *pkt, u8 hop_limit)
+{
+	__grh_set_hop_limit(pkt->hdr + pkt->offset
+		- sizeof(struct rxe_grh), hop_limit);
+	return;
+}
+
+static inline union ib_gid *grh_sgid(struct rxe_pkt_info *pkt)
+{
+	return __grh_sgid(pkt->hdr + pkt->offset - sizeof(struct rxe_grh));
+}
+
+static inline void grh_set_sgid(struct rxe_pkt_info *pkt,
+	__be64 prefix, __be64 guid)
+{
+	__grh_set_sgid(pkt->hdr + pkt->offset
+		- sizeof(struct rxe_grh), prefix, guid);
+	return;
+}
+
+static inline union ib_gid *grh_dgid(struct rxe_pkt_info *pkt)
+{
+	return __grh_dgid(pkt->hdr + pkt->offset - sizeof(struct rxe_grh));
+}
+
+static inline void grh_set_dgid(struct rxe_pkt_info *pkt,
+	__be64 prefix, __be64 guid)
+{
+	__grh_set_dgid(pkt->hdr + pkt->offset
+		- sizeof(struct rxe_grh), prefix, guid);
+	return;
+}
+
+static inline void grh_init(struct rxe_pkt_info *pkt, u8 tclass, u32 flow,
+			    u16 paylen, u8 hop_limit)
+{
+	struct rxe_grh *grh = (struct rxe_grh *)pkt->hdr
+		+ pkt->offset - sizeof(struct rxe_grh);
+
+	grh->vflow = cpu_to_be32((GRH_IPV6 << 28) | (tclass << 20)
+		| (flow & GRH_FLOW_MASK));
+	grh->paylen = cpu_to_be16(paylen);
+	grh->next_hdr = GRH_RXE_NEXT_HDR;
+	grh->hop_limit = hop_limit;
+}
+
+/******************************************************************************
+ * Base Transport Header
+ ******************************************************************************/
+struct rxe_bth {
+	u8			opcode;
+	u8			flags;
+	__be16			pkey;
+	__be32			qpn;
+	__be32			apsn;
+};
+
+#define BTH_TVER		(0)
+#define BTH_DEF_PKEY		(0xffff)
+
+#define BTH_SE_MASK		(0x80)
+#define BTH_MIG_MASK		(0x40)
+#define BTH_PAD_MASK		(0x30)
+#define BTH_TVER_MASK		(0x0f)
+#define BTH_FECN_MASK		(0x80000000)
+#define BTH_BECN_MASK		(0x40000000)
+#define BTH_RESV6A_MASK		(0x3f000000)
+#define BTH_QPN_MASK		(0x00ffffff)
+#define BTH_ACK_MASK		(0x80000000)
+#define BTH_RESV7_MASK		(0x7f000000)
+#define BTH_PSN_MASK		(0x00ffffff)
+
+static inline u8 __bth_opcode(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	return bth->opcode;
+}
+
+static inline void __bth_set_opcode(void *arg, u8 opcode)
+{
+	struct rxe_bth *bth = arg;
+	bth->opcode = opcode;
+}
+
+static inline u8 __bth_se(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	return 0 != (BTH_SE_MASK & bth->flags);
+}
+
+static inline void __bth_set_se(void *arg, int se)
+{
+	struct rxe_bth *bth = arg;
+	if (se)
+		bth->flags |= BTH_SE_MASK;
+	else
+		bth->flags &= ~BTH_SE_MASK;
+}
+
+static inline u8 __bth_mig(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	return 0 != (BTH_MIG_MASK & bth->flags);
+}
+
+static inline void __bth_set_mig(void *arg, u8 mig)
+{
+	struct rxe_bth *bth = arg;
+	if (mig)
+		bth->flags |= BTH_MIG_MASK;
+	else
+		bth->flags &= ~BTH_MIG_MASK;
+}
+
+static inline u8 __bth_pad(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	return (BTH_PAD_MASK & bth->flags) >> 4;
+}
+
+static inline void __bth_set_pad(void *arg, u8 pad)
+{
+	struct rxe_bth *bth = arg;
+	bth->flags = (BTH_PAD_MASK & (pad << 4)) |
+			(~BTH_PAD_MASK & bth->flags);
+}
+
+static inline u8 __bth_tver(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	return BTH_TVER_MASK & bth->flags;
+}
+
+static inline void __bth_set_tver(void *arg, u8 tver)
+{
+	struct rxe_bth *bth = arg;
+	bth->flags = (BTH_TVER_MASK & tver) |
+			(~BTH_TVER_MASK & bth->flags);
+}
+
+static inline u16 __bth_pkey(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	return be16_to_cpu(bth->pkey);
+}
+
+static inline void __bth_set_pkey(void *arg, u16 pkey)
+{
+	struct rxe_bth *bth = arg;
+	bth->pkey = cpu_to_be16(pkey);
+}
+
+static inline u32 __bth_qpn(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	return BTH_QPN_MASK & be32_to_cpu(bth->qpn);
+}
+
+static inline void __bth_set_qpn(void *arg, u32 qpn)
+{
+	struct rxe_bth *bth = arg;
+	u32 resvqpn = be32_to_cpu(bth->qpn);
+	bth->qpn = cpu_to_be32((BTH_QPN_MASK & qpn) |
+			       (~BTH_QPN_MASK & resvqpn));
+}
+
+static inline int __bth_fecn(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	return 0 != (__constant_cpu_to_be32(BTH_FECN_MASK) & bth->qpn);
+}
+
+static inline void __bth_set_fecn(void *arg, int fecn)
+{
+	struct rxe_bth *bth = arg;
+	if (fecn)
+		bth->qpn |= __constant_cpu_to_be32(BTH_FECN_MASK);
+	else
+		bth->qpn &= ~__constant_cpu_to_be32(BTH_FECN_MASK);
+}
+
+static inline int __bth_becn(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	return 0 != (__constant_cpu_to_be32(BTH_BECN_MASK) & bth->qpn);
+}
+
+static inline void __bth_set_becn(void *arg, int becn)
+{
+	struct rxe_bth *bth = arg;
+	if (becn)
+		bth->qpn |= __constant_cpu_to_be32(BTH_BECN_MASK);
+	else
+		bth->qpn &= ~__constant_cpu_to_be32(BTH_BECN_MASK);
+}
+
+static inline u8 __bth_resv6a(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	return (BTH_RESV6A_MASK & be32_to_cpu(bth->qpn)) >> 24;
+}
+
+static inline void __bth_set_resv6a(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	bth->qpn = __constant_cpu_to_be32(~BTH_RESV6A_MASK);
+}
+
+static inline int __bth_ack(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	return 0 != (__constant_cpu_to_be32(BTH_ACK_MASK) & bth->apsn);
+}
+
+static inline void __bth_set_ack(void *arg, int ack)
+{
+	struct rxe_bth *bth = arg;
+	if (ack)
+		bth->apsn |= __constant_cpu_to_be32(BTH_ACK_MASK);
+	else
+		bth->apsn &= ~__constant_cpu_to_be32(BTH_ACK_MASK);
+}
+
+static inline void __bth_set_resv7(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	bth->apsn &= ~__constant_cpu_to_be32(BTH_RESV7_MASK);
+}
+
+static inline u32 __bth_psn(void *arg)
+{
+	struct rxe_bth *bth = arg;
+	return BTH_PSN_MASK & be32_to_cpu(bth->apsn);
+}
+
+static inline void __bth_set_psn(void *arg, u32 psn)
+{
+	struct rxe_bth *bth = arg;
+	u32 apsn = be32_to_cpu(bth->apsn);
+	bth->apsn = cpu_to_be32((BTH_PSN_MASK & psn) |
+			(~BTH_PSN_MASK & apsn));
+}
+
+static inline u8 bth_opcode(struct rxe_pkt_info *pkt)
+{
+	return __bth_opcode(pkt->hdr + pkt->offset);
+}
+
+static inline void bth_set_opcode(struct rxe_pkt_info *pkt, u8 opcode)
+{
+	__bth_set_opcode(pkt->hdr + pkt->offset, opcode);
+	return;
+}
+
+static inline u8 bth_se(struct rxe_pkt_info *pkt)
+{
+	return __bth_se(pkt->hdr + pkt->offset);
+}
+
+static inline void bth_set_se(struct rxe_pkt_info *pkt, int se)
+{
+	__bth_set_se(pkt->hdr + pkt->offset, se);
+	return;
+}
+
+static inline u8 bth_mig(struct rxe_pkt_info *pkt)
+{
+	return __bth_mig(pkt->hdr + pkt->offset);
+}
+
+static inline void bth_set_mig(struct rxe_pkt_info *pkt, u8 mig)
+{
+	__bth_set_mig(pkt->hdr + pkt->offset, mig);
+	return;
+}
+
+static inline u8 bth_pad(struct rxe_pkt_info *pkt)
+{
+	return __bth_pad(pkt->hdr + pkt->offset);
+}
+
+static inline void bth_set_pad(struct rxe_pkt_info *pkt, u8 pad)
+{
+	__bth_set_pad(pkt->hdr + pkt->offset, pad);
+	return;
+}
+
+static inline u8 bth_tver(struct rxe_pkt_info *pkt)
+{
+	return __bth_tver(pkt->hdr + pkt->offset);
+}
+
+static inline void bth_set_tver(struct rxe_pkt_info *pkt, u8 tver)
+{
+	__bth_set_tver(pkt->hdr + pkt->offset, tver);
+	return;
+}
+
+static inline u16 bth_pkey(struct rxe_pkt_info *pkt)
+{
+	return __bth_pkey(pkt->hdr + pkt->offset);
+}
+
+static inline void bth_set_pkey(struct rxe_pkt_info *pkt, u16 pkey)
+{
+	__bth_set_pkey(pkt->hdr + pkt->offset, pkey);
+	return;
+}
+
+static inline u32 bth_qpn(struct rxe_pkt_info *pkt)
+{
+	return __bth_qpn(pkt->hdr + pkt->offset);
+}
+
+static inline void bth_set_qpn(struct rxe_pkt_info *pkt, u32 qpn)
+{
+	__bth_set_qpn(pkt->hdr + pkt->offset, qpn);
+	return;
+}
+
+static inline int bth_fecn(struct rxe_pkt_info *pkt)
+{
+	return __bth_fecn(pkt->hdr + pkt->offset);
+}
+
+static inline void bth_set_fecn(struct rxe_pkt_info *pkt, int fecn)
+{
+	__bth_set_fecn(pkt->hdr + pkt->offset, fecn);
+	return;
+}
+
+static inline int bth_becn(struct rxe_pkt_info *pkt)
+{
+	return __bth_becn(pkt->hdr + pkt->offset);
+}
+
+static inline void bth_set_becn(struct rxe_pkt_info *pkt, int becn)
+{
+	__bth_set_becn(pkt->hdr + pkt->offset, becn);
+	return;
+}
+
+static inline u8 bth_resv6a(struct rxe_pkt_info *pkt)
+{
+	return __bth_resv6a(pkt->hdr + pkt->offset);
+}
+
+static inline void bth_set_resv6a(struct rxe_pkt_info *pkt)
+{
+	__bth_set_resv6a(pkt->hdr + pkt->offset);
+	return;
+}
+
+static inline int bth_ack(struct rxe_pkt_info *pkt)
+{
+	return __bth_ack(pkt->hdr + pkt->offset);
+}
+
+static inline void bth_set_ack(struct rxe_pkt_info *pkt, int ack)
+{
+	__bth_set_ack(pkt->hdr + pkt->offset, ack);
+	return;
+}
+
+static inline void bth_set_resv7(struct rxe_pkt_info *pkt)
+{
+	__bth_set_resv7(pkt->hdr + pkt->offset);
+	return;
+}
+
+static inline u32 bth_psn(struct rxe_pkt_info *pkt)
+{
+	return __bth_psn(pkt->hdr + pkt->offset);
+}
+
+static inline void bth_set_psn(struct rxe_pkt_info *pkt, u32 psn)
+{
+	__bth_set_psn(pkt->hdr + pkt->offset, psn);
+	return;
+}
+
+static inline void bth_init(struct rxe_pkt_info *pkt, u8 opcode, int se,
+	int mig, int pad, u16 pkey, u32 qpn, int ack_req, u32 psn)
+{
+	struct rxe_bth *bth = (struct rxe_bth *)(pkt->hdr + pkt->offset);
+
+	bth->opcode = opcode;
+	bth->flags = (pad << 4) & BTH_PAD_MASK;
+	if (se)
+		bth->flags |= BTH_SE_MASK;
+	if (mig)
+		bth->flags |= BTH_MIG_MASK;
+	bth->pkey = cpu_to_be16(pkey);
+	bth->qpn = cpu_to_be32(qpn & BTH_QPN_MASK);
+	psn &= BTH_PSN_MASK;
+	if (ack_req)
+		psn |= BTH_ACK_MASK;
+	bth->apsn = cpu_to_be32(psn);
+}
+
+/******************************************************************************
+ * Reliable Datagram Extended Transport Header
+ ******************************************************************************/
+struct rxe_rdeth {
+	__be32			een;
+};
+
+#define RDETH_EEN_MASK		(0x00ffffff)
+
+static inline u8 __rdeth_een(void *arg)
+{
+	struct rxe_rdeth *rdeth = arg;
+	return RDETH_EEN_MASK & be32_to_cpu(rdeth->een);
+}
+
+static inline void __rdeth_set_een(void *arg, u32 een)
+{
+	struct rxe_rdeth *rdeth = arg;
+	rdeth->een = cpu_to_be32(RDETH_EEN_MASK & een);
+}
+
+static inline u8 rdeth_een(struct rxe_pkt_info *pkt)
+{
+	return __rdeth_een(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_RDETH]);
+}
+
+static inline void rdeth_set_een(struct rxe_pkt_info *pkt, u32 een)
+{
+	__rdeth_set_een(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_RDETH], een);
+	return;
+}
+
+/******************************************************************************
+ * Datagram Extended Transport Header
+ ******************************************************************************/
+struct rxe_deth {
+	__be32			qkey;
+	__be32			sqp;
+};
+
+#define GSI_QKEY		(0x80010000)
+#define DETH_SQP_MASK		(0x00ffffff)
+
+static inline u32 __deth_qkey(void *arg)
+{
+	struct rxe_deth *deth = arg;
+	return be32_to_cpu(deth->qkey);
+}
+
+static inline void __deth_set_qkey(void *arg, u32 qkey)
+{
+	struct rxe_deth *deth = arg;
+	deth->qkey = cpu_to_be32(qkey);
+}
+
+static inline u32 __deth_sqp(void *arg)
+{
+	struct rxe_deth *deth = arg;
+	return DETH_SQP_MASK & be32_to_cpu(deth->sqp);
+}
+
+static inline void __deth_set_sqp(void *arg, u32 sqp)
+{
+	struct rxe_deth *deth = arg;
+	deth->sqp = cpu_to_be32(DETH_SQP_MASK & sqp);
+}
+
+static inline u32 deth_qkey(struct rxe_pkt_info *pkt)
+{
+	return __deth_qkey(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_DETH]);
+}
+
+static inline void deth_set_qkey(struct rxe_pkt_info *pkt, u32 qkey)
+{
+	__deth_set_qkey(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_DETH], qkey);
+	return;
+}
+
+static inline u32 deth_sqp(struct rxe_pkt_info *pkt)
+{
+	return __deth_sqp(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_DETH]);
+}
+
+static inline void deth_set_sqp(struct rxe_pkt_info *pkt, u32 sqp)
+{
+	__deth_set_sqp(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_DETH], sqp);
+	return;
+}
+
+/******************************************************************************
+ * RDMA Extended Transport Header
+ ******************************************************************************/
+struct rxe_reth {
+	__be64			va;
+	__be32			rkey;
+	__be32			len;
+};
+
+static inline u64 __reth_va(void *arg)
+{
+	struct rxe_reth *reth = arg;
+	return be64_to_cpu(reth->va);
+}
+
+static inline void __reth_set_va(void *arg, u64 va)
+{
+	struct rxe_reth *reth = arg;
+	reth->va = cpu_to_be64(va);
+}
+
+static inline u32 __reth_rkey(void *arg)
+{
+	struct rxe_reth *reth = arg;
+	return be32_to_cpu(reth->rkey);
+}
+
+static inline void __reth_set_rkey(void *arg, u32 rkey)
+{
+	struct rxe_reth *reth = arg;
+	reth->rkey = cpu_to_be32(rkey);
+}
+
+static inline u32 __reth_len(void *arg)
+{
+	struct rxe_reth *reth = arg;
+	return be32_to_cpu(reth->len);
+}
+
+static inline void __reth_set_len(void *arg, u32 len)
+{
+	struct rxe_reth *reth = arg;
+	reth->len = cpu_to_be32(len);
+}
+
+static inline u64 reth_va(struct rxe_pkt_info *pkt)
+{
+	return __reth_va(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_RETH]);
+}
+
+static inline void reth_set_va(struct rxe_pkt_info *pkt, u64 va)
+{
+	__reth_set_va(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_RETH], va);
+	return;
+}
+
+static inline u32 reth_rkey(struct rxe_pkt_info *pkt)
+{
+	return __reth_rkey(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_RETH]);
+}
+
+static inline void reth_set_rkey(struct rxe_pkt_info *pkt, u32 rkey)
+{
+	__reth_set_rkey(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_RETH], rkey);
+	return;
+}
+
+static inline u32 reth_len(struct rxe_pkt_info *pkt)
+{
+	return __reth_len(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_RETH]);
+}
+
+static inline void reth_set_len(struct rxe_pkt_info *pkt, u32 len)
+{
+	__reth_set_len(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_RETH], len);
+	return;
+}
+
+/******************************************************************************
+ * Atomic Extended Transport Header
+ ******************************************************************************/
+struct rxe_atmeth {
+	__be64			va;
+	__be32			rkey;
+	__be64			swap_add;
+	__be64			comp;
+} __attribute__((__packed__));
+
+static inline u64 __atmeth_va(void *arg)
+{
+	struct rxe_atmeth *atmeth = arg;
+	return be64_to_cpu(atmeth->va);
+}
+
+static inline void __atmeth_set_va(void *arg, u64 va)
+{
+	struct rxe_atmeth *atmeth = arg;
+	atmeth->va = cpu_to_be64(va);
+}
+
+static inline u32 __atmeth_rkey(void *arg)
+{
+	struct rxe_atmeth *atmeth = arg;
+	return be32_to_cpu(atmeth->rkey);
+}
+
+static inline void __atmeth_set_rkey(void *arg, u32 rkey)
+{
+	struct rxe_atmeth *atmeth = arg;
+	atmeth->rkey = cpu_to_be32(rkey);
+}
+
+static inline u64 __atmeth_swap_add(void *arg)
+{
+	struct rxe_atmeth *atmeth = arg;
+	return be64_to_cpu(atmeth->swap_add);
+}
+
+static inline void __atmeth_set_swap_add(void *arg, u64 swap_add)
+{
+	struct rxe_atmeth *atmeth = arg;
+	atmeth->swap_add = cpu_to_be64(swap_add);
+}
+
+static inline u64 __atmeth_comp(void *arg)
+{
+	struct rxe_atmeth *atmeth = arg;
+	return be64_to_cpu(atmeth->comp);
+}
+
+static inline void __atmeth_set_comp(void *arg, u64 comp)
+{
+	struct rxe_atmeth *atmeth = arg;
+	atmeth->comp = cpu_to_be64(comp);
+}
+
+static inline u64 atmeth_va(struct rxe_pkt_info *pkt)
+{
+	return __atmeth_va(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_ATMETH]);
+}
+
+static inline void atmeth_set_va(struct rxe_pkt_info *pkt, u64 va)
+{
+	__atmeth_set_va(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_ATMETH], va);
+	return;
+}
+
+static inline u32 atmeth_rkey(struct rxe_pkt_info *pkt)
+{
+	return __atmeth_rkey(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_ATMETH]);
+}
+
+static inline void atmeth_set_rkey(struct rxe_pkt_info *pkt, u32 rkey)
+{
+	__atmeth_set_rkey(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_ATMETH], rkey);
+	return;
+}
+
+static inline u64 atmeth_swap_add(struct rxe_pkt_info *pkt)
+{
+	return __atmeth_swap_add(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_ATMETH]);
+}
+
+static inline void atmeth_set_swap_add(struct rxe_pkt_info *pkt, u64 swap_add)
+{
+	__atmeth_set_swap_add(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_ATMETH], swap_add);
+	return;
+}
+
+static inline u64 atmeth_comp(struct rxe_pkt_info *pkt)
+{
+	return __atmeth_comp(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_ATMETH]);
+}
+
+static inline void atmeth_set_comp(struct rxe_pkt_info *pkt, u64 comp)
+{
+	__atmeth_set_comp(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_ATMETH], comp);
+	return;
+}
+
+/******************************************************************************
+ * Ack Extended Transport Header
+ ******************************************************************************/
+struct rxe_aeth {
+	__be32			smsn;
+};
+
+#define AETH_SYN_MASK		(0xff000000)
+#define AETH_MSN_MASK		(0x00ffffff)
+
+enum aeth_syndrome {
+	AETH_TYPE_MASK		= 0xe0,
+	AETH_ACK		= 0x00,
+	AETH_RNR_NAK		= 0x20,
+	AETH_RSVD		= 0x40,
+	AETH_NAK		= 0x60,
+	AETH_ACK_UNLIMITED	= 0x1f,
+	AETH_NAK_PSN_SEQ_ERROR	= 0x60,
+	AETH_NAK_INVALID_REQ	= 0x61,
+	AETH_NAK_REM_ACC_ERR	= 0x62,
+	AETH_NAK_REM_OP_ERR	= 0x63,
+	AETH_NAK_INV_RD_REQ	= 0x64,
+};
+
+static inline u8 __aeth_syn(void *arg)
+{
+	struct rxe_aeth *aeth = arg;
+	return (AETH_SYN_MASK & be32_to_cpu(aeth->smsn)) >> 24;
+}
+
+static inline void __aeth_set_syn(void *arg, u8 syn)
+{
+	struct rxe_aeth *aeth = arg;
+	u32 smsn = be32_to_cpu(aeth->smsn);
+	aeth->smsn = cpu_to_be32((AETH_SYN_MASK & (syn << 24)) |
+			 (~AETH_SYN_MASK & smsn));
+}
+
+static inline u32 __aeth_msn(void *arg)
+{
+	struct rxe_aeth *aeth = arg;
+	return AETH_MSN_MASK & be32_to_cpu(aeth->smsn);
+}
+
+static inline void __aeth_set_msn(void *arg, u32 msn)
+{
+	struct rxe_aeth *aeth = arg;
+	u32 smsn = be32_to_cpu(aeth->smsn);
+	aeth->smsn = cpu_to_be32((AETH_MSN_MASK & msn) |
+			 (~AETH_MSN_MASK & smsn));
+}
+
+static inline u8 aeth_syn(struct rxe_pkt_info *pkt)
+{
+	return __aeth_syn(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_AETH]);
+}
+
+static inline void aeth_set_syn(struct rxe_pkt_info *pkt, u8 syn)
+{
+	__aeth_set_syn(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_AETH], syn);
+	return;
+}
+
+static inline u32 aeth_msn(struct rxe_pkt_info *pkt)
+{
+	return __aeth_msn(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_AETH]);
+}
+
+static inline void aeth_set_msn(struct rxe_pkt_info *pkt, u32 msn)
+{
+	__aeth_set_msn(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_AETH], msn);
+	return;
+}
+
+/******************************************************************************
+ * Atomic Ack Extended Transport Header
+ ******************************************************************************/
+struct rxe_atmack {
+	__be64			orig;
+};
+
+static inline u64 __atmack_orig(void *arg)
+{
+	struct rxe_atmack *atmack = arg;
+	return be64_to_cpu(atmack->orig);
+}
+
+static inline void __atmack_set_orig(void *arg, u64 orig)
+{
+	struct rxe_atmack *atmack = arg;
+	atmack->orig = cpu_to_be64(orig);
+}
+
+static inline u64 atmack_orig(struct rxe_pkt_info *pkt)
+{
+	return __atmack_orig(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_ATMACK]);
+}
+
+static inline void atmack_set_orig(struct rxe_pkt_info *pkt, u64 orig)
+{
+	__atmack_set_orig(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_ATMACK], orig);
+	return;
+}
+
+/******************************************************************************
+ * Immediate Extended Transport Header
+ ******************************************************************************/
+struct rxe_immdt {
+	__be32			imm;
+};
+
+static inline __be32 __immdt_imm(void *arg)
+{
+	struct rxe_immdt *immdt = arg;
+	return immdt->imm;
+}
+
+static inline void __immdt_set_imm(void *arg, __be32 imm)
+{
+	struct rxe_immdt *immdt = arg;
+	immdt->imm = imm;
+}
+
+static inline __be32 immdt_imm(struct rxe_pkt_info *pkt)
+{
+	return __immdt_imm(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_IMMDT]);
+}
+
+static inline void immdt_set_imm(struct rxe_pkt_info *pkt, __be32 imm)
+{
+	__immdt_set_imm(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_IMMDT], imm);
+	return;
+}
+
+/******************************************************************************
+ * Invalid Extended Transport Header
+ ******************************************************************************/
+struct rxe_ieth {
+	__be32			rkey;
+};
+
+static inline u32 __ieth_rkey(void *arg)
+{
+	struct rxe_ieth *ieth = arg;
+	return be32_to_cpu(ieth->rkey);
+}
+
+static inline void __ieth_set_rkey(void *arg, u32 rkey)
+{
+	struct rxe_ieth *ieth = arg;
+	ieth->rkey = cpu_to_be32(rkey);
+}
+
+static inline u32 ieth_rkey(struct rxe_pkt_info *pkt)
+{
+	return __ieth_rkey(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_IETH]);
+}
+
+static inline void ieth_set_rkey(struct rxe_pkt_info *pkt, u32 rkey)
+{
+	__ieth_set_rkey(pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_IETH], rkey);
+	return;
+}
+
+enum rxe_hdr_length {
+	RXE_LRH_BYTES		= sizeof(struct rxe_lrh),
+	RXE_GRH_BYTES		= sizeof(struct rxe_grh),
+	RXE_BTH_BYTES		= sizeof(struct rxe_bth),
+	RXE_DETH_BYTES		= sizeof(struct rxe_deth),
+	RXE_IMMDT_BYTES		= sizeof(struct rxe_immdt),
+	RXE_RETH_BYTES		= sizeof(struct rxe_reth),
+	RXE_AETH_BYTES		= sizeof(struct rxe_aeth),
+	RXE_ATMACK_BYTES	= sizeof(struct rxe_atmack),
+	RXE_ATMETH_BYTES	= sizeof(struct rxe_atmeth),
+	RXE_IETH_BYTES		= sizeof(struct rxe_ieth),
+	RXE_RDETH_BYTES		= sizeof(struct rxe_rdeth),
+};
+
+static inline size_t header_size(struct rxe_pkt_info *pkt)
+{
+	return pkt->offset + rxe_opcode[pkt->opcode].length;
+}
+
+static inline void *payload_addr(struct rxe_pkt_info *pkt)
+{
+	return pkt->hdr + pkt->offset
+		+ rxe_opcode[pkt->opcode].offset[RXE_PAYLOAD];
+}
+
+static inline size_t payload_size(struct rxe_pkt_info *pkt)
+{
+	return pkt->paylen - rxe_opcode[pkt->opcode].offset[RXE_PAYLOAD]
+		- bth_pad(pkt) - RXE_ICRC_SIZE;
+}
+
+#endif /* RXE_HDR_H */
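As a rough illustration of how the per-packet accessors above compose (this sketch is not part of the patch): it assumes a struct rxe_pkt_info whose opcode, offset, hdr and paylen fields were already filled in by the receive path, and an opcode that carries an AETH plus payload such as IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY; print_read_resp() is a made-up name.

static void print_read_resp(struct rxe_pkt_info *pkt)
{
	u8 syn = aeth_syn(pkt);		/* AETH syndrome byte */
	u32 msn = aeth_msn(pkt);	/* 24-bit message sequence number */
	void *data = payload_addr(pkt);	/* first payload byte */
	size_t len = payload_size(pkt);	/* payload minus pad and ICRC */

	if ((syn & AETH_TYPE_MASK) == AETH_ACK)
		pr_debug("read resp: msn %u, %zu payload bytes at %p\n",
			 msn, len, data);
}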



^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 04/37] add rxe_opcode.h
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (2 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 03/37] add rxe_hdr.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 05/37] add rxe_opcode.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (33 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_opcode.h --]
[-- Type: text/plain, Size: 4722 bytes --]

Add declarations for the data structures that hold the per-opcode and
per-work-request-opcode tables.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_opcode.h |  130 +++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_opcode.h
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_opcode.h
@@ -0,0 +1,130 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_OPCODE_H
+#define RXE_OPCODE_H
+
+/*
+ * Header bit mask definitions, header lengths, and the declarations
+ * of the rxe_opcode_info and rxe_wr_opcode_info structs.
+ */
+
+enum rxe_wr_mask {
+	WR_INLINE_MASK			= (1 << 0),
+	WR_ATOMIC_MASK			= (1 << 1),
+	WR_SEND_MASK			= (1 << 2),
+	WR_READ_MASK			= (1 << 3),
+	WR_WRITE_MASK			= (1 << 4),
+	WR_LOCAL_MASK			= (1 << 5),
+
+	WR_READ_OR_WRITE_MASK		= WR_READ_MASK | WR_WRITE_MASK,
+	WR_READ_WRITE_OR_SEND_MASK	= WR_READ_OR_WRITE_MASK | WR_SEND_MASK,
+	WR_WRITE_OR_SEND_MASK		= WR_WRITE_MASK | WR_SEND_MASK,
+	WR_ATOMIC_OR_READ_MASK		= WR_ATOMIC_MASK | WR_READ_MASK,
+};
+
+#define WR_MAX_QPT		(8)
+
+struct rxe_wr_opcode_info {
+	char			*name;
+	enum rxe_wr_mask	mask[WR_MAX_QPT];
+};
+
+extern struct rxe_wr_opcode_info rxe_wr_opcode_info[];
+
+enum rxe_hdr_type {
+	RXE_LRH,
+	RXE_GRH,
+	RXE_BTH,
+	RXE_RETH,
+	RXE_AETH,
+	RXE_ATMETH,
+	RXE_ATMACK,
+	RXE_IETH,
+	RXE_RDETH,
+	RXE_DETH,
+	RXE_IMMDT,
+	RXE_PAYLOAD,
+	NUM_HDR_TYPES
+};
+
+enum rxe_hdr_mask {
+	RXE_LRH_MASK		= (1 << RXE_LRH),
+	RXE_GRH_MASK		= (1 << RXE_GRH),
+	RXE_BTH_MASK		= (1 << RXE_BTH),
+	RXE_IMMDT_MASK		= (1 << RXE_IMMDT),
+	RXE_RETH_MASK		= (1 << RXE_RETH),
+	RXE_AETH_MASK		= (1 << RXE_AETH),
+	RXE_ATMETH_MASK		= (1 << RXE_ATMETH),
+	RXE_ATMACK_MASK		= (1 << RXE_ATMACK),
+	RXE_IETH_MASK		= (1 << RXE_IETH),
+	RXE_RDETH_MASK		= (1 << RXE_RDETH),
+	RXE_DETH_MASK		= (1 << RXE_DETH),
+	RXE_PAYLOAD_MASK	= (1 << RXE_PAYLOAD),
+
+	RXE_REQ_MASK		= (1 << (NUM_HDR_TYPES+0)),
+	RXE_ACK_MASK		= (1 << (NUM_HDR_TYPES+1)),
+	RXE_SEND_MASK		= (1 << (NUM_HDR_TYPES+2)),
+	RXE_WRITE_MASK		= (1 << (NUM_HDR_TYPES+3)),
+	RXE_READ_MASK		= (1 << (NUM_HDR_TYPES+4)),
+	RXE_ATOMIC_MASK		= (1 << (NUM_HDR_TYPES+5)),
+
+	RXE_RWR_MASK		= (1 << (NUM_HDR_TYPES+6)),
+	RXE_COMP_MASK		= (1 << (NUM_HDR_TYPES+7)),
+
+	RXE_START_MASK		= (1 << (NUM_HDR_TYPES+8)),
+	RXE_MIDDLE_MASK		= (1 << (NUM_HDR_TYPES+9)),
+	RXE_END_MASK		= (1 << (NUM_HDR_TYPES+10)),
+
+	RXE_CNP_MASK		= (1 << (NUM_HDR_TYPES+11)),
+
+	RXE_LOOPBACK_MASK	= (1 << (NUM_HDR_TYPES+12)),
+
+	RXE_READ_OR_ATOMIC	= (RXE_READ_MASK | RXE_ATOMIC_MASK),
+	RXE_WRITE_OR_SEND	= (RXE_WRITE_MASK | RXE_SEND_MASK),
+};
+
+#define OPCODE_NONE		(-1)
+#define RXE_NUM_OPCODE		256
+
+struct rxe_opcode_info {
+	char			*name;
+	enum rxe_hdr_mask	mask;
+	int			length;
+	int			offset[NUM_HDR_TYPES];
+};
+
+extern struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE];
+
+#endif /* RXE_OPCODE_H */
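A note on how the mask scheme above is intended to be read (the helpers below are made-up names, not part of the patch): the low NUM_HDR_TYPES bits of rxe_opcode[opcode].mask say which headers a packet of that opcode carries, and the remaining bits classify the packet (request vs. ack, send/write/read/atomic, first/middle/last). The table itself is populated in the next patch, rxe_opcode.c.

static inline bool pkt_is_request(u32 opcode)
{
	return rxe_opcode[opcode].mask & RXE_REQ_MASK;
}

static inline bool pkt_has_immdt(u32 opcode)
{
	/* true for the *_WITH_IMMEDIATE opcodes */
	return rxe_opcode[opcode].mask & RXE_IMMDT_MASK;
}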



^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 05/37] add rxe_opcode.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (3 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 04/37] add rxe_opcode.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 06/37] add rxe_param.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (32 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_opcode.c --]
[-- Type: text/plain, Size: 30275 bytes --]

Add the data structures that hold the per-opcode and
per-work-request-opcode tables.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_opcode.c |  982 +++++++++++++++++++++++++++++++++
 1 file changed, 982 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_opcode.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_opcode.c
@@ -0,0 +1,982 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/ib_pack.h>
+#include "rxe_opcode.h"
+#include "rxe_hdr.h"
+
+/*
+ * Tables of useful information about work request opcodes and
+ * packet opcodes.
+ */
+
+struct rxe_wr_opcode_info rxe_wr_opcode_info[] = {
+	[IB_WR_RDMA_WRITE]				= {
+		.name	= "IB_WR_RDMA_WRITE",
+		.mask	= {
+			[IB_QPT_RC]	= WR_INLINE_MASK | WR_WRITE_MASK,
+			[IB_QPT_UC]	= WR_INLINE_MASK | WR_WRITE_MASK,
+		},
+	},
+	[IB_WR_RDMA_WRITE_WITH_IMM]			= {
+		.name	= "IB_WR_RDMA_WRITE_WITH_IMM",
+		.mask	= {
+			[IB_QPT_RC]	= WR_INLINE_MASK | WR_WRITE_MASK,
+			[IB_QPT_UC]	= WR_INLINE_MASK | WR_WRITE_MASK,
+		},
+	},
+	[IB_WR_SEND]					= {
+		.name	= "IB_WR_SEND",
+		.mask	= {
+			[IB_QPT_SMI]	= WR_INLINE_MASK | WR_SEND_MASK,
+			[IB_QPT_GSI]	= WR_INLINE_MASK | WR_SEND_MASK,
+			[IB_QPT_RC]	= WR_INLINE_MASK | WR_SEND_MASK,
+			[IB_QPT_UC]	= WR_INLINE_MASK | WR_SEND_MASK,
+			[IB_QPT_UD]	= WR_INLINE_MASK | WR_SEND_MASK,
+		},
+	},
+	[IB_WR_SEND_WITH_IMM]				= {
+		.name	= "IB_WR_SEND_WITH_IMM",
+		.mask	= {
+			[IB_QPT_SMI]	= WR_INLINE_MASK | WR_SEND_MASK,
+			[IB_QPT_GSI]	= WR_INLINE_MASK | WR_SEND_MASK,
+			[IB_QPT_RC]	= WR_INLINE_MASK | WR_SEND_MASK,
+			[IB_QPT_UC]	= WR_INLINE_MASK | WR_SEND_MASK,
+			[IB_QPT_UD]	= WR_INLINE_MASK | WR_SEND_MASK,
+		},
+	},
+	[IB_WR_RDMA_READ]				= {
+		.name	= "IB_WR_RDMA_READ",
+		.mask	= {
+			[IB_QPT_RC]	= WR_READ_MASK,
+		},
+	},
+	[IB_WR_ATOMIC_CMP_AND_SWP]			= {
+		.name	= "IB_WR_ATOMIC_CMP_AND_SWP",
+		.mask	= {
+			[IB_QPT_RC]	= WR_ATOMIC_MASK,
+		},
+	},
+	[IB_WR_ATOMIC_FETCH_AND_ADD]			= {
+		.name	= "IB_WR_ATOMIC_FETCH_AND_ADD",
+		.mask	= {
+			[IB_QPT_RC]	= WR_ATOMIC_MASK,
+		},
+	},
+	[IB_WR_LSO]					= {
+		.name	= "IB_WR_LSO",
+		.mask	= {
+			/* not supported */
+		},
+	},
+	[IB_WR_SEND_WITH_INV]				= {
+		.name	= "IB_WR_SEND_WITH_INV",
+		.mask	= {
+			[IB_QPT_RC]	= WR_INLINE_MASK | WR_SEND_MASK,
+			[IB_QPT_UC]	= WR_INLINE_MASK | WR_SEND_MASK,
+			[IB_QPT_UD]	= WR_INLINE_MASK | WR_SEND_MASK,
+		},
+	},
+	[IB_WR_RDMA_READ_WITH_INV]			= {
+		.name	= "IB_WR_RDMA_READ_WITH_INV",
+		.mask	= {
+			[IB_QPT_RC]	= WR_READ_MASK,
+		},
+	},
+	[IB_WR_LOCAL_INV]				= {
+		.name	= "IB_WR_LOCAL_INV",
+		.mask	= {
+			/* not supported */
+		},
+	},
+	[IB_WR_FAST_REG_MR]				= {
+		.name	= "IB_WR_FAST_REG_MR",
+		.mask	= {
+			/* not supported */
+		},
+	},
+};
+
+struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
+	[IB_OPCODE_RC_SEND_FIRST]			= {
+		.name	= "IB_OPCODE_RC_SEND_FIRST",
+		.mask	= RXE_PAYLOAD_MASK | RXE_REQ_MASK | RXE_RWR_MASK
+				| RXE_SEND_MASK | RXE_START_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_SEND_MIDDLE]		= {
+		.name	= "IB_OPCODE_RC_SEND_MIDDLE",
+		.mask	= RXE_PAYLOAD_MASK | RXE_REQ_MASK | RXE_SEND_MASK
+				| RXE_MIDDLE_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_SEND_LAST]			= {
+		.name	= "IB_OPCODE_RC_SEND_LAST",
+		.mask	= RXE_PAYLOAD_MASK | RXE_REQ_MASK | RXE_COMP_MASK
+				| RXE_SEND_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE",
+		.mask	= RXE_IMMDT_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_COMP_MASK | RXE_SEND_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_IMMDT]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_SEND_ONLY]			= {
+		.name	= "IB_OPCODE_RC_SEND_ONLY",
+		.mask	= RXE_PAYLOAD_MASK | RXE_REQ_MASK | RXE_COMP_MASK
+				| RXE_RWR_MASK | RXE_SEND_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE",
+		.mask	= RXE_IMMDT_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_COMP_MASK | RXE_RWR_MASK | RXE_SEND_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_IMMDT]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_RDMA_WRITE_FIRST]		= {
+		.name	= "IB_OPCODE_RC_RDMA_WRITE_FIRST",
+		.mask	= RXE_RETH_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_WRITE_MASK | RXE_START_MASK,
+		.length = RXE_BTH_BYTES + RXE_RETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_RDMA_WRITE_MIDDLE]		= {
+		.name	= "IB_OPCODE_RC_RDMA_WRITE_MIDDLE",
+		.mask	= RXE_PAYLOAD_MASK | RXE_REQ_MASK | RXE_WRITE_MASK
+				| RXE_MIDDLE_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_RDMA_WRITE_LAST]			= {
+		.name	= "IB_OPCODE_RC_RDMA_WRITE_LAST",
+		.mask	= RXE_PAYLOAD_MASK | RXE_REQ_MASK | RXE_WRITE_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE",
+		.mask	= RXE_IMMDT_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_WRITE_MASK | RXE_COMP_MASK | RXE_RWR_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_IMMDT]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_RDMA_WRITE_ONLY]			= {
+		.name	= "IB_OPCODE_RC_RDMA_WRITE_ONLY",
+		.mask	= RXE_RETH_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_WRITE_MASK | RXE_START_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_RETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE",
+		.mask	= RXE_RETH_MASK | RXE_IMMDT_MASK | RXE_PAYLOAD_MASK
+				| RXE_REQ_MASK | RXE_WRITE_MASK
+				| RXE_COMP_MASK | RXE_RWR_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES + RXE_RETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RETH]	= RXE_BTH_BYTES,
+			[RXE_IMMDT]	= RXE_BTH_BYTES
+						+ RXE_RETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RETH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_RDMA_READ_REQUEST]			= {
+		.name	= "IB_OPCODE_RC_RDMA_READ_REQUEST",
+		.mask	= RXE_RETH_MASK | RXE_REQ_MASK | RXE_READ_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_RETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST]		= {
+		.name	= "IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST",
+		.mask	= RXE_AETH_MASK | RXE_PAYLOAD_MASK | RXE_ACK_MASK
+				| RXE_START_MASK,
+		.length = RXE_BTH_BYTES + RXE_AETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_AETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_AETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE]		= {
+		.name	= "IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE",
+		.mask	= RXE_PAYLOAD_MASK | RXE_ACK_MASK | RXE_MIDDLE_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST]		= {
+		.name	= "IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST",
+		.mask	= RXE_AETH_MASK | RXE_PAYLOAD_MASK | RXE_ACK_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_AETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_AETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_AETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY]		= {
+		.name	= "IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY",
+		.mask	= RXE_AETH_MASK | RXE_PAYLOAD_MASK | RXE_ACK_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_AETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_AETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_AETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_ACKNOWLEDGE]			= {
+		.name	= "IB_OPCODE_RC_ACKNOWLEDGE",
+		.mask	= RXE_AETH_MASK | RXE_ACK_MASK | RXE_START_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_AETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_AETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_AETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE]			= {
+		.name	= "IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE",
+		.mask	= RXE_AETH_MASK | RXE_ATMACK_MASK | RXE_ACK_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_ATMACK_BYTES + RXE_AETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_AETH]	= RXE_BTH_BYTES,
+			[RXE_ATMACK]	= RXE_BTH_BYTES
+						+ RXE_AETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+					+ RXE_ATMACK_BYTES + RXE_AETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_COMPARE_SWAP]			= {
+		.name	= "IB_OPCODE_RC_COMPARE_SWAP",
+		.mask	= RXE_ATMETH_MASK | RXE_REQ_MASK | RXE_ATOMIC_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_ATMETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_ATMETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_ATMETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_FETCH_ADD]			= {
+		.name	= "IB_OPCODE_RC_FETCH_ADD",
+		.mask	= RXE_ATMETH_MASK | RXE_REQ_MASK | RXE_ATOMIC_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_ATMETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_ATMETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_ATMETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_SEND_LAST_INV]		= {
+		.name	= "IB_OPCODE_RC_SEND_LAST_INV",
+		.mask	= RXE_IETH_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_COMP_MASK | RXE_SEND_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_IETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_IETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RC_SEND_ONLY_INV]		= {
+		.name	= "IB_OPCODE_RC_SEND_ONLY_INV",
+		.mask	= RXE_IETH_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_COMP_MASK | RXE_RWR_MASK | RXE_SEND_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_IETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_IETH_BYTES,
+		}
+	},
+
+	/* UC */
+	[IB_OPCODE_UC_SEND_FIRST]			= {
+		.name	= "IB_OPCODE_UC_SEND_FIRST",
+		.mask	= RXE_PAYLOAD_MASK | RXE_REQ_MASK | RXE_RWR_MASK
+				| RXE_SEND_MASK | RXE_START_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_UC_SEND_MIDDLE]		= {
+		.name	= "IB_OPCODE_UC_SEND_MIDDLE",
+		.mask	= RXE_PAYLOAD_MASK | RXE_REQ_MASK | RXE_SEND_MASK
+				| RXE_MIDDLE_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_UC_SEND_LAST]			= {
+		.name	= "IB_OPCODE_UC_SEND_LAST",
+		.mask	= RXE_PAYLOAD_MASK | RXE_REQ_MASK | RXE_COMP_MASK
+				| RXE_SEND_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE",
+		.mask	= RXE_IMMDT_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_COMP_MASK | RXE_SEND_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_IMMDT]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+	[IB_OPCODE_UC_SEND_ONLY]			= {
+		.name	= "IB_OPCODE_UC_SEND_ONLY",
+		.mask	= RXE_PAYLOAD_MASK | RXE_REQ_MASK | RXE_COMP_MASK
+				| RXE_RWR_MASK | RXE_SEND_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE",
+		.mask	= RXE_IMMDT_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_COMP_MASK | RXE_RWR_MASK | RXE_SEND_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_IMMDT]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+	[IB_OPCODE_UC_RDMA_WRITE_FIRST]		= {
+		.name	= "IB_OPCODE_UC_RDMA_WRITE_FIRST",
+		.mask	= RXE_RETH_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_WRITE_MASK | RXE_START_MASK,
+		.length = RXE_BTH_BYTES + RXE_RETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RETH_BYTES,
+		}
+	},
+	[IB_OPCODE_UC_RDMA_WRITE_MIDDLE]		= {
+		.name	= "IB_OPCODE_UC_RDMA_WRITE_MIDDLE",
+		.mask	= RXE_PAYLOAD_MASK | RXE_REQ_MASK | RXE_WRITE_MASK
+				| RXE_MIDDLE_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_UC_RDMA_WRITE_LAST]			= {
+		.name	= "IB_OPCODE_UC_RDMA_WRITE_LAST",
+		.mask	= RXE_PAYLOAD_MASK | RXE_REQ_MASK | RXE_WRITE_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES,
+		}
+	},
+	[IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE",
+		.mask	= RXE_IMMDT_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_WRITE_MASK | RXE_COMP_MASK | RXE_RWR_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_IMMDT]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+	[IB_OPCODE_UC_RDMA_WRITE_ONLY]			= {
+		.name	= "IB_OPCODE_UC_RDMA_WRITE_ONLY",
+		.mask	= RXE_RETH_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_WRITE_MASK | RXE_START_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_RETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RETH_BYTES,
+		}
+	},
+	[IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE",
+		.mask	= RXE_RETH_MASK | RXE_IMMDT_MASK | RXE_PAYLOAD_MASK
+				| RXE_REQ_MASK | RXE_WRITE_MASK
+				| RXE_COMP_MASK | RXE_RWR_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES + RXE_RETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RETH]	= RXE_BTH_BYTES,
+			[RXE_IMMDT]	= RXE_BTH_BYTES
+						+ RXE_RETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RETH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+
+	/* RD */
+	[IB_OPCODE_RD_SEND_FIRST]			= {
+		.name	= "IB_OPCODE_RD_SEND_FIRST",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_PAYLOAD_MASK
+				| RXE_REQ_MASK | RXE_RWR_MASK | RXE_SEND_MASK
+				| RXE_START_MASK,
+		.length = RXE_BTH_BYTES + RXE_DETH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_SEND_MIDDLE]		= {
+		.name	= "IB_OPCODE_RD_SEND_MIDDLE",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_PAYLOAD_MASK
+				| RXE_REQ_MASK | RXE_SEND_MASK
+				| RXE_MIDDLE_MASK,
+		.length = RXE_BTH_BYTES + RXE_DETH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_SEND_LAST]			= {
+		.name	= "IB_OPCODE_RD_SEND_LAST",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_PAYLOAD_MASK
+				| RXE_REQ_MASK | RXE_COMP_MASK | RXE_SEND_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_DETH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_SEND_LAST_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_RD_SEND_LAST_WITH_IMMEDIATE",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_IMMDT_MASK
+				| RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_COMP_MASK | RXE_SEND_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES + RXE_DETH_BYTES
+				+ RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_IMMDT]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_SEND_ONLY]			= {
+		.name	= "IB_OPCODE_RD_SEND_ONLY",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_PAYLOAD_MASK
+				| RXE_REQ_MASK | RXE_COMP_MASK | RXE_RWR_MASK
+				| RXE_SEND_MASK | RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_DETH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_SEND_ONLY_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_RD_SEND_ONLY_WITH_IMMEDIATE",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_IMMDT_MASK
+				| RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_COMP_MASK | RXE_RWR_MASK | RXE_SEND_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES + RXE_DETH_BYTES
+				+ RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_IMMDT]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_RDMA_WRITE_FIRST]		= {
+		.name	= "IB_OPCODE_RD_RDMA_WRITE_FIRST",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_RETH_MASK
+				| RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_WRITE_MASK | RXE_START_MASK,
+		.length = RXE_BTH_BYTES + RXE_RETH_BYTES + RXE_DETH_BYTES
+				+ RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_RETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES
+						+ RXE_RETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_RDMA_WRITE_MIDDLE]		= {
+		.name	= "IB_OPCODE_RD_RDMA_WRITE_MIDDLE",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_PAYLOAD_MASK
+				| RXE_REQ_MASK | RXE_WRITE_MASK
+				| RXE_MIDDLE_MASK,
+		.length = RXE_BTH_BYTES + RXE_DETH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_RDMA_WRITE_LAST]			= {
+		.name	= "IB_OPCODE_RD_RDMA_WRITE_LAST",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_PAYLOAD_MASK
+				| RXE_REQ_MASK | RXE_WRITE_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_DETH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_RDMA_WRITE_LAST_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_RD_RDMA_WRITE_LAST_WITH_IMMEDIATE",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_IMMDT_MASK
+				| RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_WRITE_MASK | RXE_COMP_MASK | RXE_RWR_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES + RXE_DETH_BYTES
+				+ RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_IMMDT]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_RDMA_WRITE_ONLY]			= {
+		.name	= "IB_OPCODE_RD_RDMA_WRITE_ONLY",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_RETH_MASK
+				| RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_WRITE_MASK | RXE_START_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_RETH_BYTES + RXE_DETH_BYTES
+				+ RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_RETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES
+						+ RXE_RETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_RDMA_WRITE_ONLY_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_RD_RDMA_WRITE_ONLY_WITH_IMMEDIATE",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_RETH_MASK
+				| RXE_IMMDT_MASK | RXE_PAYLOAD_MASK
+				| RXE_REQ_MASK | RXE_WRITE_MASK
+				| RXE_COMP_MASK | RXE_RWR_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES + RXE_RETH_BYTES
+				+ RXE_DETH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_RETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+			[RXE_IMMDT]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES
+						+ RXE_RETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES
+						+ RXE_RETH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_RDMA_READ_REQUEST]			= {
+		.name	= "IB_OPCODE_RD_RDMA_READ_REQUEST",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_RETH_MASK
+				| RXE_REQ_MASK | RXE_READ_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_RETH_BYTES + RXE_DETH_BYTES
+				+ RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_RETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RETH_BYTES
+						+ RXE_DETH_BYTES
+						+ RXE_RDETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_RDMA_READ_RESPONSE_FIRST]		= {
+		.name	= "IB_OPCODE_RD_RDMA_READ_RESPONSE_FIRST",
+		.mask	= RXE_RDETH_MASK | RXE_AETH_MASK
+				| RXE_PAYLOAD_MASK | RXE_ACK_MASK
+				| RXE_START_MASK,
+		.length = RXE_BTH_BYTES + RXE_AETH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_AETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_AETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_RDMA_READ_RESPONSE_MIDDLE]		= {
+		.name	= "IB_OPCODE_RD_RDMA_READ_RESPONSE_MIDDLE",
+		.mask	= RXE_RDETH_MASK | RXE_PAYLOAD_MASK | RXE_ACK_MASK
+				| RXE_MIDDLE_MASK,
+		.length = RXE_BTH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_RDMA_READ_RESPONSE_LAST]		= {
+		.name	= "IB_OPCODE_RD_RDMA_READ_RESPONSE_LAST",
+		.mask	= RXE_RDETH_MASK | RXE_AETH_MASK | RXE_PAYLOAD_MASK
+				| RXE_ACK_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_AETH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_AETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_AETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_RDMA_READ_RESPONSE_ONLY]		= {
+		.name	= "IB_OPCODE_RD_RDMA_READ_RESPONSE_ONLY",
+		.mask	= RXE_RDETH_MASK | RXE_AETH_MASK | RXE_PAYLOAD_MASK
+				| RXE_ACK_MASK | RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_AETH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_AETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_AETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_ACKNOWLEDGE]			= {
+		.name	= "IB_OPCODE_RD_ACKNOWLEDGE",
+		.mask	= RXE_RDETH_MASK | RXE_AETH_MASK | RXE_ACK_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_AETH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_AETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_ATOMIC_ACKNOWLEDGE]			= {
+		.name	= "IB_OPCODE_RD_ATOMIC_ACKNOWLEDGE",
+		.mask	= RXE_RDETH_MASK | RXE_AETH_MASK | RXE_ATMACK_MASK
+				| RXE_ACK_MASK | RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_ATMACK_BYTES + RXE_AETH_BYTES
+				+ RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_AETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_ATMACK]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_AETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_COMPARE_SWAP]			= {
+		.name	= "IB_OPCODE_RD_COMPARE_SWAP",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_ATMETH_MASK
+				| RXE_REQ_MASK | RXE_ATOMIC_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_ATMETH_BYTES + RXE_DETH_BYTES
+				+ RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_ATMETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_ATMETH_BYTES
+						+ RXE_DETH_BYTES
+						+ RXE_RDETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_FETCH_ADD]			= {
+		.name	= "IB_OPCODE_RD_FETCH_ADD",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_ATMETH_MASK
+				| RXE_REQ_MASK | RXE_ATOMIC_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_ATMETH_BYTES + RXE_DETH_BYTES
+				+ RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+			[RXE_ATMETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES
+						+ RXE_DETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_ATMETH_BYTES
+						+ RXE_DETH_BYTES
+						+ RXE_RDETH_BYTES,
+		}
+	},
+	[IB_OPCODE_RD_RESYNC]			= {
+		.name	= "IB_OPCODE_RD_RESYNC",
+		.mask	= RXE_RDETH_MASK | RXE_DETH_MASK | RXE_REQ_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_DETH_BYTES + RXE_RDETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_DETH]	= RXE_BTH_BYTES
+						+ RXE_RDETH_BYTES,
+		}
+	},
+
+	/* UD */
+	[IB_OPCODE_UD_SEND_ONLY]			= {
+		.name	= "IB_OPCODE_UD_SEND_ONLY",
+		.mask	= RXE_DETH_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_COMP_MASK | RXE_RWR_MASK | RXE_SEND_MASK
+				| RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_DETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_DETH]	= RXE_BTH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_DETH_BYTES,
+		}
+	},
+	[IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE]		= {
+		.name	= "IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE",
+		.mask	= RXE_DETH_MASK | RXE_IMMDT_MASK | RXE_PAYLOAD_MASK
+				| RXE_REQ_MASK | RXE_COMP_MASK | RXE_RWR_MASK
+				| RXE_SEND_MASK | RXE_START_MASK | RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_IMMDT_BYTES + RXE_DETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_DETH]	= RXE_BTH_BYTES,
+			[RXE_IMMDT]	= RXE_BTH_BYTES
+						+ RXE_DETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES
+						+ RXE_DETH_BYTES
+						+ RXE_IMMDT_BYTES,
+		}
+	},
+
+	/* CNP */
+	[IB_OPCODE_CN_CNP]			= {
+		.name	= "IB_OPCODE_CN_CNP",
+		.mask	= RXE_CNP_MASK,
+		.length = RXE_BTH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+		}
+	},
+};
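For illustration only (dump_hdr_layout() is a made-up helper, not part of the patch): the table above is what makes header handling in the rest of the driver table-driven; for any opcode it yields the total header length and the byte offset of each header, plus the payload offset.

static void dump_hdr_layout(u32 opcode)
{
	struct rxe_opcode_info *info = &rxe_opcode[opcode];
	int i;

	pr_info("%s: %d header bytes\n", info->name, info->length);

	/* the low bits of mask mirror the rxe_hdr_type enum */
	for (i = 0; i < NUM_HDR_TYPES; i++)
		if (info->mask & (1 << i))
			pr_info("  hdr type %d at offset %d\n",
				i, info->offset[i]);
}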



^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 06/37] add rxe_param.h
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (4 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 05/37] add rxe_opcode.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201228.415281967-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 07/37] add rxe.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (31 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_param.h --]
[-- Type: text/plain, Size: 6603 bytes --]

default rxe device and port parameters

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_param.h |  212 ++++++++++++++++++++++++++++++++++
 1 file changed, 212 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_param.h
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_param.h
@@ -0,0 +1,212 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_PARAM_H
+#define RXE_PARAM_H
+
+enum rxe_mtu {
+	RXE_MTU_INVALID = 0,
+	RXE_MTU_256	= 1,
+	RXE_MTU_512	= 2,
+	RXE_MTU_1024	= 3,
+	RXE_MTU_2048	= 4,
+	RXE_MTU_4096	= 5,
+	RXE_MTU_8192	= 6,
+};
+
+static inline int rxe_mtu_enum_to_int(enum rxe_mtu mtu)
+{
+	switch (mtu) {
+	case RXE_MTU_256:	return	256;
+	case RXE_MTU_512:	return	512;
+	case RXE_MTU_1024:	return	1024;
+	case RXE_MTU_2048:	return	2048;
+	case RXE_MTU_4096:	return	4096;
+	case RXE_MTU_8192:	return	8192;
+	default:		return	-1;
+	}
+}
+
+static inline enum rxe_mtu rxe_mtu_int_to_enum(int mtu)
+{
+	if (mtu < 256)
+		return RXE_MTU_INVALID;
+	else if (mtu < 512)
+		return RXE_MTU_256;
+	else if (mtu < 1024)
+		return RXE_MTU_512;
+	else if (mtu < 2048)
+		return RXE_MTU_1024;
+	else if (mtu < 4096)
+		return RXE_MTU_2048;
+	else if (mtu < 8192)
+		return RXE_MTU_4096;
+	else
+		return RXE_MTU_8192;
+}
+
+/* Find the IB MTU for a given network MTU. */
+static inline enum rxe_mtu eth_mtu_int_to_enum(int mtu)
+{
+	mtu -= RXE_MAX_HDR_LENGTH;
+
+	return rxe_mtu_int_to_enum(mtu);
+}
+
+/*
+ * default/initial rxe device parameter settings
+ */
+enum rxe_device_param {
+	RXE_FW_VER			= 0,
+	RXE_MAX_MR_SIZE			= -1ull,
+	RXE_PAGE_SIZE_CAP		= 0xfffff000,
+	RXE_VENDOR_ID			= 0,
+	RXE_VENDOR_PART_ID		= 0,
+	RXE_HW_VER			= 0,
+	RXE_MAX_QP			= 0x10000,
+	RXE_MAX_QP_WR			= 0x4000,
+	RXE_MAX_INLINE_DATA		= 400,
+	RXE_DEVICE_CAP_FLAGS		= IB_DEVICE_BAD_PKEY_CNTR
+					| IB_DEVICE_BAD_QKEY_CNTR
+					| IB_DEVICE_AUTO_PATH_MIG
+					| IB_DEVICE_CHANGE_PHY_PORT
+					| IB_DEVICE_UD_AV_PORT_ENFORCE
+					| IB_DEVICE_PORT_ACTIVE_EVENT
+					| IB_DEVICE_SYS_IMAGE_GUID
+					| IB_DEVICE_RC_RNR_NAK_GEN
+					| IB_DEVICE_SRQ_RESIZE,
+	RXE_MAX_SGE			= 27,
+	RXE_MAX_SGE_RD			= 0,
+	RXE_MAX_CQ			= 16384,
+	RXE_MAX_LOG_CQE			= 13,
+	RXE_MAX_MR			= 2*1024,
+	RXE_MAX_PD			= 0x7ffc,
+	RXE_MAX_QP_RD_ATOM		= 128,
+	RXE_MAX_EE_RD_ATOM		= 0,
+	RXE_MAX_RES_RD_ATOM		= 0x3f000,
+	RXE_MAX_QP_INIT_RD_ATOM		= 128,
+	RXE_MAX_EE_INIT_RD_ATOM		= 0,
+	RXE_ATOMIC_CAP			= 1,
+	RXE_MAX_EE			= 0,
+	RXE_MAX_RDD			= 0,
+	RXE_MAX_MW			= 0,
+	RXE_MAX_RAW_IPV6_QP		= 0,
+	RXE_MAX_RAW_ETHY_QP		= 0,
+	RXE_MAX_MCAST_GRP		= 8192,
+	RXE_MAX_MCAST_QP_ATTACH		= 56,
+	RXE_MAX_TOT_MCAST_QP_ATTACH	= 0x70000,
+	RXE_MAX_AH			= 100,
+	RXE_MAX_FMR			= 2*1024,
+	RXE_MAX_MAP_PER_FMR		= 100,
+	RXE_MAX_SRQ			= 960,
+	RXE_MAX_SRQ_WR			= 0x4000,
+	RXE_MIN_SRQ_WR			= 1,
+	RXE_MAX_SRQ_SGE			= 27,
+	RXE_MIN_SRQ_SGE			= 1,
+	RXE_MAX_FMR_PAGE_LIST_LEN	= 0,
+	RXE_MAX_PKEYS			= 64,
+	RXE_LOCAL_CA_ACK_DELAY		= 15,
+
+	RXE_MAX_UCONTEXT		= 128,
+
+	RXE_NUM_PORT			= 1,
+	RXE_NUM_COMP_VECTORS		= 1,
+
+	RXE_MIN_QP_INDEX		= 16,
+	RXE_MAX_QP_INDEX		= 0x00020000,
+
+	RXE_MIN_SRQ_INDEX		= 0x00020001,
+	RXE_MAX_SRQ_INDEX		= 0x00040000,
+
+	RXE_MIN_MR_INDEX		= 0x00000001,
+	RXE_MAX_MR_INDEX		= 0x00020000,
+	RXE_MIN_FMR_INDEX		= 0x00020001,
+	RXE_MAX_FMR_INDEX		= 0x00040000,
+	RXE_MIN_MW_INDEX		= 0x00040001,
+	RXE_MAX_MW_INDEX		= 0x00060000,
+};
+
+/*
+ * default/initial rxe port parameters
+ */
+enum rxe_port_param {
+	RXE_PORT_STATE			= IB_PORT_DOWN,
+	RXE_PORT_MAX_MTU		= RXE_MTU_4096,
+	RXE_PORT_ACTIVE_MTU		= RXE_MTU_256,
+	RXE_PORT_GID_TBL_LEN		= 32,
+	RXE_PORT_PORT_CAP_FLAGS		= 0,
+	RXE_PORT_MAX_MSG_SZ		= 0x800000,
+	RXE_PORT_BAD_PKEY_CNTR		= 0,
+	RXE_PORT_QKEY_VIOL_CNTR		= 0,
+	RXE_PORT_LID			= 0,
+	RXE_PORT_SM_LID			= 0,
+	RXE_PORT_SM_SL			= 0,
+	RXE_PORT_LMC			= 0,
+	RXE_PORT_MAX_VL_NUM		= 1,
+	RXE_PORT_SUBNET_TIMEOUT		= 0,
+	RXE_PORT_INIT_TYPE_REPLY	= 0,
+	RXE_PORT_ACTIVE_WIDTH		= IB_WIDTH_1X,
+	RXE_PORT_ACTIVE_SPEED		= 1,
+	RXE_PORT_PKEY_TBL_LEN		= 64,
+	RXE_PORT_PHYS_STATE		= 2,
+	RXE_PORT_SUBNET_PREFIX		= 0xfe80000000000000ULL,
+	RXE_PORT_CC_TBL_LEN		= 128,
+	RXE_PORT_CC_TIMER		= 1024,
+	RXE_PORT_CC_INCREASE		= 1,
+	RXE_PORT_CC_THRESHOLD		= 64,
+	RXE_PORT_CC_CCTI_MIN		= 0,
+};
+
+/*
+ * default/initial port info parameters
+ */
+enum rxe_port_info_param {
+	RXE_PORT_INFO_VL_CAP		= 4,	/* 1-8 */
+	RXE_PORT_INFO_MTU_CAP		= 5,	/* 4096 */
+	RXE_PORT_INFO_OPER_VL		= 1,	/* 1 */
+};
+
+extern int rxe_debug_flags;
+extern int rxe_crc_disable;
+extern int rxe_nsec_per_packet;
+extern int rxe_nsec_per_kbyte;
+extern int rxe_max_skb_per_qp;
+extern int rxe_max_req_comp_gap;
+extern int rxe_max_pkt_per_ack;
+extern int rxe_default_mtu;
+extern int rxe_fast_comp;
+extern int rxe_fast_resp;
+extern int rxe_fast_req;
+extern int rxe_fast_arb;
+
+#endif /* RXE_PARAM_H */
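A quick illustration of the MTU helpers above (not part of the patch; choose_ib_mtu() is a made-up name, and RXE_MAX_HDR_LENGTH is defined elsewhere in the driver):

static enum rxe_mtu choose_ib_mtu(int netdev_mtu)
{
	/* e.g. a standard 1500-byte Ethernet MTU, minus RXE_MAX_HDR_LENGTH
	 * (well under 476 bytes), maps to RXE_MTU_1024: the largest IB MTU
	 * whose payload still fits in a single frame */
	return eth_mtu_int_to_enum(netdev_mtu);
}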



^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 07/37] add rxe.h
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (5 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 06/37] add rxe_param.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 08/37] add rxe_loc.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (30 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe.h --]
[-- Type: text/plain, Size: 2836 bytes --]

ib_rxe external interface to lower-level modules.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe.h |   64 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe.h
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe.h
@@ -0,0 +1,64 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_H
+#define RXE_H
+
+#include <linux/skbuff.h>
+#include <linux/crc32.h>
+
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_pack.h>
+#include <rdma/ib_smi.h>
+#include <rdma/ib_umem.h>
+
+#include "rxe_opcode.h"
+#include "rxe_hdr.h"
+#include "rxe_param.h"
+#include "rxe_verbs.h"
+
+#define RXE_UVERBS_ABI_VERSION		(1)
+
+#define IB_PHYS_STATE_LINK_UP		(5)
+
+int rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu,
+		unsigned int port_num);
+
+int rxe_add(struct rxe_dev *rxe, unsigned int mtu);
+
+void rxe_remove(struct rxe_dev *rxe);
+
+int rxe_rcv(struct sk_buff *skb);
+
+#endif /* RXE_H */
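Roughly how a packet I/O module such as ib_rxe_net is expected to drive this interface (a sketch, not part of the patch; example_attach() is a made-up name):

static int example_attach(struct rxe_dev *rxe, unsigned int ndev_mtu)
{
	int err;

	err = rxe_add(rxe, ndev_mtu);	/* make the device known to ib_rxe */
	if (err)
		return err;

	/* keep the IB MTU in sync if the underlying netdev MTU changes */
	err = rxe_set_mtu(rxe, ndev_mtu, 1);

	/* received RoCE packets are handed back with rxe_rcv(skb);
	 * teardown uses rxe_remove(rxe) */
	return err;
}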



^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 08/37] add rxe_loc.h
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (6 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 07/37] add rxe.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 09/37] add rxe_mmap.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (29 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_loc.h --]
[-- Type: text/plain, Size: 8726 bytes --]

Miscellaneous local interfaces shared between the ib_rxe source files.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_loc.h |  261 ++++++++++++++++++++++++++++++++++++
 1 file changed, 261 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_loc.h
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_loc.h
@@ -0,0 +1,261 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_LOC_H
+#define RXE_LOC_H
+
+/* local declarations shared between rxe files */
+
+/* rxe_v.c */
+int rxe_av_chk_attr(struct rxe_dev *rxe, struct ib_ah_attr *attr);
+
+int rxe_av_from_attr(struct rxe_dev *rxe, u8 port_num,
+		     struct rxe_av *av, struct ib_ah_attr *attr);
+
+int rxe_av_to_attr(struct rxe_dev *rxe, struct rxe_av *av,
+		   struct ib_ah_attr *attr);
+
+/* rxe_cq.c */
+int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
+		    int cqe, int comp_vector, struct ib_udata *udata);
+
+int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
+		     int comp_vector, struct ib_ucontext *context,
+		     struct ib_udata *udata);
+
+int rxe_cq_resize_queue(struct rxe_cq *cq, int new_cqe, struct ib_udata *udata);
+
+int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited);
+
+void rxe_cq_cleanup(void *arg);
+
+/* rxe_mcast.c */
+int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid, u16 mlid,
+		      struct rxe_mc_grp **grp_p);
+
+int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+			   struct rxe_mc_grp *grp);
+
+int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+			    union ib_gid *mgid, u16 mlid);
+
+void rxe_drop_all_mcast_groups(struct rxe_qp *qp);
+
+void rxe_mc_cleanup(void *arg);
+
+/* rxe_mmap.c */
+
+/* must match struct in librxe */
+struct mminfo {
+	__u64			offset;
+	__u32			size;
+	__u32			pad;
+};
+
+struct rxe_mmap_info {
+	struct list_head	pending_mmaps;
+	struct ib_ucontext	*context;
+	struct kref		ref;
+	void			*obj;
+
+	struct mminfo info;
+};
+
+void rxe_mmap_release(struct kref *ref);
+
+struct rxe_mmap_info *rxe_create_mmap_info(struct rxe_dev *dev,
+					   u32 size,
+					   struct ib_ucontext *context,
+					   void *obj);
+
+int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma);
+
+/* rxe_mr.c */
+enum copy_direction {
+	direction_in,
+	direction_out,
+};
+
+int rxe_mem_init_dma(struct rxe_dev *rxe, struct rxe_pd *pd,
+		     int access, struct rxe_mem *mem);
+
+int rxe_mem_init_phys(struct rxe_dev *rxe, struct rxe_pd *pd,
+		      int access, u64 iova, struct ib_phys_buf *buf,
+		      int num_buf, struct rxe_mem *mem);
+
+int rxe_mem_init_user(struct rxe_dev *rxe, struct rxe_pd *pd, u64 start,
+		      u64 length, u64 iova, int access, struct ib_udata *udata,
+		      struct rxe_mem *mr);
+
+int rxe_mem_init_fast(struct rxe_dev *rxe, struct rxe_pd *pd,
+		      int max_pages, struct rxe_mem *mem);
+
+int rxe_mem_init_mw(struct rxe_dev *rxe, struct rxe_pd *pd,
+		    struct rxe_mem *mw);
+
+int rxe_mem_init_fmr(struct rxe_dev *rxe, struct rxe_pd *pd, int access,
+		     struct ib_fmr_attr *attr, struct rxe_mem *fmr);
+
+int rxe_mem_copy(struct rxe_mem *mem, u64 iova, void *addr,
+		 int length, enum copy_direction dir, u32 *crcp);
+
+int copy_data(struct rxe_dev *rxe, struct rxe_pd *pd, int access,
+	      struct rxe_dma_info *dma, void *addr, int length,
+	      enum copy_direction dir, u32 *crcp);
+
+void *iova_to_vaddr(struct rxe_mem *mem, u64 iova, int length);
+
+enum lookup_type {
+	lookup_local,
+	lookup_remote,
+};
+
+struct rxe_mem *lookup_mem(struct rxe_pd *pd, int access, u32 key,
+			   enum lookup_type type);
+
+int mem_check_range(struct rxe_mem *mem, u64 iova, size_t length);
+
+int rxe_mem_map_pages(struct rxe_dev *rxe, struct rxe_mem *mem,
+		      u64 *page, int num_pages, u64 iova);
+
+void rxe_mem_cleanup(void *arg);
+
+int advance_dma_data(struct rxe_dma_info *dma, unsigned int length);
+
+/* rxe_qp.c */
+int rxe_qp_chk_init(struct rxe_dev *rxe, struct ib_qp_init_attr *init);
+
+int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
+		     struct ib_qp_init_attr *init, struct ib_udata *udata,
+		     struct ib_pd *ibpd);
+
+int rxe_qp_to_init(struct rxe_qp *qp, struct ib_qp_init_attr *init);
+
+int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp *qp,
+		    struct ib_qp_attr *attr, int mask);
+
+int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr,
+					 int mask, struct ib_udata *udata);
+
+int rxe_qp_to_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask);
+
+void rxe_qp_error(struct rxe_qp *qp);
+
+void rxe_qp_destroy(struct rxe_qp *qp);
+
+void rxe_qp_cleanup(void *arg);
+
+static inline int qp_num(struct rxe_qp *qp)
+{
+	return qp->ibqp.qp_num;
+}
+
+static inline enum ib_qp_type qp_type(struct rxe_qp *qp)
+{
+	return qp->ibqp.qp_type;
+}
+
+static inline enum ib_qp_state qp_state(struct rxe_qp *qp)
+{
+	return qp->attr.qp_state;
+}
+
+static inline int qp_mtu(struct rxe_qp *qp)
+{
+	if (qp->ibqp.qp_type == IB_QPT_RC || qp->ibqp.qp_type == IB_QPT_UC)
+		return qp->attr.path_mtu;
+	else
+		return RXE_PORT_MAX_MTU;
+}
+
+#define RCV_WQE_SIZE(max_sge) (sizeof(struct rxe_recv_wqe) + \
+			       (max_sge)*sizeof(struct ib_sge))
+
+void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res);
+
+static inline void rxe_advance_resp_resource(struct rxe_qp *qp)
+{
+	qp->resp.res_head++;
+	if (unlikely(qp->resp.res_head == qp->attr.max_rd_atomic))
+		qp->resp.res_head = 0;
+}
+
+void retransmit_timer(unsigned long data);
+void rnr_nak_timer(unsigned long data);
+
+void dump_qp(struct rxe_qp *qp);
+
+/* rxe_srq.c */
+#define IB_SRQ_INIT_MASK (~IB_SRQ_LIMIT)
+
+int rxe_srq_chk_attr(struct rxe_dev *rxe, struct rxe_srq *srq,
+		     struct ib_srq_attr *attr, enum ib_srq_attr_mask mask);
+
+int rxe_srq_from_init(struct rxe_dev *rxe, struct rxe_srq *srq,
+		      struct ib_srq_init_attr *init,
+		      struct ib_ucontext *context, struct ib_udata *udata);
+
+int rxe_srq_from_attr(struct rxe_dev *rxe, struct rxe_srq *srq,
+		      struct ib_srq_attr *attr, enum ib_srq_attr_mask mask,
+		      struct ib_udata *udata);
+
+void rxe_srq_cleanup(void *arg);
+
+
+extern struct ib_dma_mapping_ops rxe_dma_mapping_ops;
+
+void rxe_release(struct kref *kref);
+
+void arbiter_skb_queue(struct rxe_dev *rxe,
+		       struct rxe_qp *qp, struct sk_buff *skb);
+
+int rxe_arbiter(void *arg);
+int rxe_completer(void *arg);
+int rxe_requester(void *arg);
+int rxe_responder(void *arg);
+
+u32 rxe_icrc_hdr(struct rxe_pkt_info *pkt);
+u32 rxe_icrc_pkt(struct rxe_pkt_info *pkt);
+
+void rxe_resp_queue_pkt(struct rxe_dev *rxe,
+			struct rxe_qp *qp, struct sk_buff *skb);
+
+void rxe_comp_queue_pkt(struct rxe_dev *rxe,
+			struct rxe_qp *qp, struct sk_buff *skb);
+
+static inline unsigned wr_opcode_mask(int opcode, struct rxe_qp *qp)
+{
+	return rxe_wr_opcode_info[opcode].mask[qp->ibqp.qp_type];
+}
+
+#endif /* RXE_LOC_H */


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 09/37] add rxe_mmap.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (7 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 08/37] add rxe_loc.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 10/37] add rxe_queue.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (28 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_mmap.c --]
[-- Type: text/plain, Size: 5460 bytes --]

mmap routines.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_mmap.c |  171 +++++++++++++++++++++++++++++++++++
 1 file changed, 171 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_mmap.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_mmap.c
@@ -0,0 +1,171 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/errno.h>
+#include <asm/pgtable.h>
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+void rxe_mmap_release(struct kref *ref)
+{
+	struct rxe_mmap_info *ip = container_of(ref,
+					struct rxe_mmap_info, ref);
+	struct rxe_dev *rxe = to_rdev(ip->context->device);
+
+	spin_lock_bh(&rxe->pending_lock);
+
+	if (!list_empty(&ip->pending_mmaps))
+		list_del(&ip->pending_mmaps);
+
+	spin_unlock_bh(&rxe->pending_lock);
+
+	vfree(ip->obj);		/* buf */
+	kfree(ip);
+}
+
+/*
+ * open and close keep track of how many times the queue buffer is
+ * mapped, to avoid releasing it while it is still in use.
+ */
+static void rxe_vma_open(struct vm_area_struct *vma)
+{
+	struct rxe_mmap_info *ip = vma->vm_private_data;
+
+	kref_get(&ip->ref);
+}
+
+static void rxe_vma_close(struct vm_area_struct *vma)
+{
+	struct rxe_mmap_info *ip = vma->vm_private_data;
+
+	kref_put(&ip->ref, rxe_mmap_release);
+}
+
+static struct vm_operations_struct rxe_vm_ops = {
+	.open = rxe_vma_open,
+	.close = rxe_vma_close,
+};
+
+/**
+ * rxe_mmap - create a new mmap region
+ * @context: the IB user context of the process making the mmap() call
+ * @vma: the VMA to be initialized
+ * Return zero if the mmap is OK. Otherwise, return an errno.
+ */
+int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
+{
+	struct rxe_dev *rxe = to_rdev(context->device);
+	unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
+	unsigned long size = vma->vm_end - vma->vm_start;
+	struct rxe_mmap_info *ip, *pp;
+	int ret;
+
+	/*
+	 * Search the device's list of objects waiting for a mmap call.
+	 * Normally, this list is very short since a call to create a
+	 * CQ, QP, or SRQ is soon followed by a call to mmap().
+	 */
+	spin_lock_bh(&rxe->pending_lock);
+	list_for_each_entry_safe(ip, pp, &rxe->pending_mmaps, pending_mmaps) {
+		if (context != ip->context || (__u64)offset != ip->info.offset)
+			continue;
+
+		/* Don't allow a mmap larger than the object. */
+		if (size > ip->info.size)
+			break;
+
+		goto found_it;
+	}
+	pr_warning("unable to find pending mmap info\n");
+	spin_unlock_bh(&rxe->pending_lock);
+	ret = -EINVAL;
+	goto done;
+
+found_it:
+	list_del_init(&ip->pending_mmaps);
+	spin_unlock_bh(&rxe->pending_lock);
+
+	ret = remap_vmalloc_range(vma, ip->obj, 0);
+	if (ret) {
+		pr_info("rxe: err %d from remap_vmalloc_range\n", ret);
+		goto done;
+	}
+
+	vma->vm_ops = &rxe_vm_ops;
+	vma->vm_private_data = ip;
+	rxe_vma_open(vma);
+done:
+	return ret;
+}
+
+/*
+ * Allocate information for rxe_mmap
+ */
+struct rxe_mmap_info *rxe_create_mmap_info(struct rxe_dev *rxe,
+					   u32 size,
+					   struct ib_ucontext *context,
+					   void *obj)
+{
+	struct rxe_mmap_info *ip;
+
+	ip = kmalloc(sizeof *ip, GFP_KERNEL);
+	if (!ip)
+		return NULL;
+
+	size = PAGE_ALIGN(size);
+
+	spin_lock_bh(&rxe->mmap_offset_lock);
+
+	if (rxe->mmap_offset == 0)
+		rxe->mmap_offset = PAGE_SIZE;
+
+	ip->info.offset = rxe->mmap_offset;
+	rxe->mmap_offset += size;
+
+	spin_unlock_bh(&rxe->mmap_offset_lock);
+
+	INIT_LIST_HEAD(&ip->pending_mmaps);
+	ip->info.size = size;
+	ip->context = context;
+	ip->obj = obj;
+	kref_init(&ip->ref);
+
+	return ip;
+}
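
A userspace-side sketch of the handshake that rxe_mmap() implements above:
the kernel hands struct mminfo (offset and size) back through the
create-CQ/QP/SRQ response data, and the library then calls mmap() on the
uverbs command fd at that offset to reach the vmalloc'ed queue buffer.
Plain C for illustration only; it is not part of the patch and not the
actual librxe code.

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

struct mminfo {		/* must match the kernel definition in rxe_loc.h */
	uint64_t	offset;
	uint32_t	size;
	uint32_t	pad;
};

static void *map_queue(int cmd_fd, const struct mminfo *mi)
{
	/* the offset selects which pending object the kernel remaps */
	void *buf = mmap(NULL, mi->size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, cmd_fd, mi->offset);

	return buf == MAP_FAILED ? NULL : buf;
}

int main(void)
{
	/* mi would really arrive in the create-CQ/QP/SRQ response */
	struct mminfo mi = { .offset = 4096, .size = 8192, .pad = 0 };

	/* no real device fd here, so this fails; it only shows the call */
	void *buf = map_queue(-1, &mi);

	printf("mapped: %p\n", buf);
	return 0;
}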


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 10/37] add rxe_queue.h
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (8 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 09/37] add rxe_mmap.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 11/37] add rxe_queue.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (27 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_queue.h --]
[-- Type: text/plain, Size: 6174 bytes --]

declarations for common work and completion queue.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_queue.h |  174 ++++++++++++++++++++++++++++++++++
 1 file changed, 174 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_queue.h
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_queue.h
@@ -0,0 +1,174 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_QUEUE_H
+#define RXE_QUEUE_H
+
+/* implements a simple circular buffer that can optionally be
+   shared between user space and the kernel and can be resized
+
+   the requested element size is rounded up to a power of 2
+   and the number of elements in the buffer is also rounded
+   up to a power of 2. Since the queue is empty when the
+   producer and consumer indices match, the maximum capacity
+   of the queue is one less than the number of element slots */
+
+/* this data structure is shared between user space and kernel
+   space for those cases where the queue is shared. It contains
+   the producer and consumer indices. It also contains a copy
+   of the queue size parameters for user space to use, but the
+   kernel must use the parameters in the rxe_queue struct.
+   this MUST MATCH the corresponding librxe struct.
+   for performance reasons the producer and consumer indices are
+   kept in separate cache lines.
+   the kernel should always mask the indices to avoid accessing
+   memory outside of the data area */
+struct rxe_queue_buf {
+	__u32			log2_elem_size;
+	__u32			index_mask;
+	__u32			pad_1[30];
+	__u32			producer_index;
+	__u32			pad_2[31];
+	__u32			consumer_index;
+	__u32			pad_3[31];
+	__u8			data[0];
+};
+
+struct rxe_queue {
+	struct rxe_dev		*rxe;
+	struct rxe_queue_buf	*buf;
+	struct rxe_mmap_info	*ip;
+	size_t			buf_size;
+	size_t			elem_size;
+	unsigned int		log2_elem_size;
+	unsigned int		index_mask;
+};
+
+int do_mmap_info(struct rxe_dev *rxe,
+		 struct ib_udata *udata,
+		 int offset,
+		 struct ib_ucontext *context,
+		 struct rxe_queue_buf *buf,
+		 size_t buf_size,
+		 struct rxe_mmap_info **ip_p);
+
+struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe,
+				 unsigned int *num_elem,
+				 unsigned int elem_size);
+
+int rxe_queue_resize(struct rxe_queue *q,
+		     unsigned int *num_elem_p,
+		     unsigned int elem_size,
+		     struct ib_ucontext *context,
+		     struct ib_udata *udata,
+		     spinlock_t *producer_lock,
+		     spinlock_t *consumer_lock);
+
+void rxe_queue_cleanup(struct rxe_queue *queue);
+
+static inline int next_index(struct rxe_queue *q, int index)
+{
+	return (index + 1) & q->buf->index_mask;
+}
+
+static inline int queue_empty(struct rxe_queue *q)
+{
+	return ((q->buf->producer_index - q->buf->consumer_index)
+			& q->index_mask) == 0;
+}
+
+static inline int queue_full(struct rxe_queue *q)
+{
+	return ((q->buf->producer_index + 1 - q->buf->consumer_index)
+			& q->index_mask) == 0;
+}
+
+static inline void advance_producer(struct rxe_queue *q)
+{
+	q->buf->producer_index = (q->buf->producer_index + 1)
+			& q->index_mask;
+}
+
+static inline void advance_consumer(struct rxe_queue *q)
+{
+	q->buf->consumer_index = (q->buf->consumer_index + 1)
+			& q->index_mask;
+}
+
+static inline void *producer_addr(struct rxe_queue *q)
+{
+	return q->buf->data + ((q->buf->producer_index & q->index_mask)
+				<< q->log2_elem_size);
+}
+
+static inline void *consumer_addr(struct rxe_queue *q)
+{
+	return q->buf->data + ((q->buf->consumer_index & q->index_mask)
+				<< q->log2_elem_size);
+}
+
+static inline unsigned int producer_index(struct rxe_queue *q)
+{
+	return q->buf->producer_index;
+}
+
+static inline unsigned int consumer_index(struct rxe_queue *q)
+{
+	return q->buf->consumer_index;
+}
+
+static inline void *addr_from_index(struct rxe_queue *q, unsigned int index)
+{
+	return q->buf->data + ((index & q->index_mask)
+				<< q->buf->log2_elem_size);
+}
+
+static inline unsigned int index_from_addr(const struct rxe_queue *q,
+					   const void *addr)
+{
+	return (((u8 *)addr - q->buf->data) >> q->log2_elem_size)
+		& q->index_mask;
+}
+
+static inline unsigned int queue_count(const struct rxe_queue *q)
+{
+	return (q->buf->producer_index - q->buf->consumer_index)
+		& q->index_mask;
+}
+
+static inline void *queue_head(struct rxe_queue *q)
+{
+	return queue_empty(q) ? NULL : consumer_addr(q);
+}
+
+#endif /* RXE_QUEUE_H */
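
A standalone sketch of the sizing rule the comment at the top of this
header describes: the slot count is rounded up to a power of two and one
slot is left unused, so producer == consumer always means "empty" and the
usable capacity is num_slots - 1. Plain C, not part of the patch;
roundup_pow_of_two() below is a stand-in for the kernel helper of the
same name.

#include <stdio.h>

static unsigned int roundup_pow_of_two(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

int main(void)
{
	unsigned int num_elem = 100;	/* requested by the caller */
	unsigned int num_slots = roundup_pow_of_two(num_elem + 1);	/* 128 */
	unsigned int index_mask = num_slots - 1;			/* 0x7f */
	unsigned int consumer = 5;
	unsigned int producer = (consumer + num_slots - 1) & index_mask;

	/* the caller gets the rounded-up capacity back in *num_elem */
	printf("capacity = %u\n", num_slots - 1);

	/* full: advancing the producer once more would meet the consumer */
	printf("full = %d\n", ((producer + 1 - consumer) & index_mask) == 0);
	return 0;
}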


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 11/37] add rxe_queue.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (9 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 10/37] add rxe_queue.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 12/37] add rxe_verbs.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (26 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_queue.c --]
[-- Type: text/plain, Size: 5863 bytes --]

common work and completion queue implementation.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_queue.c |  209 ++++++++++++++++++++++++++++++++++
 1 file changed, 209 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_queue.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_queue.c
@@ -0,0 +1,209 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/vmalloc.h>
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+int do_mmap_info(struct rxe_dev *rxe,
+		 struct ib_udata *udata,
+		 int offset,
+		 struct ib_ucontext *context,
+		 struct rxe_queue_buf *buf,
+		 size_t buf_size,
+		 struct rxe_mmap_info **ip_p)
+{
+	int err;
+	struct rxe_mmap_info *ip = NULL;
+
+	if (udata) {
+		if ((udata->outlen - offset) < sizeof(struct mminfo))
+			goto err1;
+
+		ip = rxe_create_mmap_info(rxe, buf_size, context, buf);
+		if (!ip)
+			goto err1;
+
+		err = copy_to_user(udata->outbuf + offset, &ip->info,
+				       sizeof(ip->info));
+		if (err)
+			goto err2;
+
+		spin_lock_bh(&rxe->pending_lock);
+		list_add(&ip->pending_mmaps, &rxe->pending_mmaps);
+		spin_unlock_bh(&rxe->pending_lock);
+	}
+
+	*ip_p = ip;
+
+	return 0;
+
+err2:
+	kfree(ip);
+err1:
+	return -EINVAL;
+}
+
+struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe,
+				 unsigned int *num_elem,
+				 unsigned int elem_size)
+{
+	struct rxe_queue *q;
+	size_t buf_size;
+	unsigned int num_slots;
+
+	/* num_elem == 0 is allowed, but uninteresting */
+	if ((int)*num_elem < 0)
+		goto err1;
+
+	q = kmalloc(sizeof(*q), GFP_KERNEL);
+	if (!q)
+		goto err1;
+
+	q->rxe = rxe;
+
+	q->elem_size = elem_size;
+	elem_size = roundup_pow_of_two(elem_size);
+	q->log2_elem_size = order_base_2(elem_size);
+
+	num_slots = *num_elem + 1;
+	num_slots = roundup_pow_of_two(num_slots);
+	q->index_mask = num_slots - 1;
+
+	buf_size = sizeof(struct rxe_queue_buf) + num_slots*elem_size;
+
+	q->buf = vmalloc_user(buf_size);
+	if (!q->buf)
+		goto err2;
+
+	q->buf->log2_elem_size = q->log2_elem_size;
+	q->buf->index_mask = q->index_mask;
+
+	q->buf->producer_index = 0;
+	q->buf->consumer_index = 0;
+
+	q->buf_size = buf_size;
+
+	*num_elem = num_slots - 1;
+	return q;
+
+err2:
+	kfree(q);
+err1:
+	return NULL;
+}
+
+/* copies elements from original q to new q and then
+   swaps the contents of the two q headers. This is
+   so that if anyone is holding a pointer to q it will
+   still work */
+static int resize_finish(struct rxe_queue *q, struct rxe_queue *new_q,
+			 unsigned int num_elem)
+{
+	struct rxe_queue temp;
+
+	if (!queue_empty(q) && (num_elem < queue_count(q)))
+		return -EINVAL;
+
+	while (!queue_empty(q)) {
+		memcpy(producer_addr(new_q), consumer_addr(q),
+		       new_q->elem_size);
+		advance_producer(new_q);
+		advance_consumer(q);
+	}
+
+	temp = *q;
+	*q = *new_q;
+	*new_q = temp;
+
+	return 0;
+}
+
+int rxe_queue_resize(struct rxe_queue *q,
+		     unsigned int *num_elem_p,
+		     unsigned int elem_size,
+		     struct ib_ucontext *context,
+		     struct ib_udata *udata,
+		     spinlock_t *producer_lock,
+		     spinlock_t *consumer_lock)
+{
+	struct rxe_queue *new_q;
+	unsigned int num_elem = *num_elem_p;
+	int err;
+	unsigned long flags = 0;
+
+	new_q = rxe_queue_init(q->rxe, &num_elem, elem_size);
+	if (!new_q)
+		return -ENOMEM;
+
+	err = do_mmap_info(new_q->rxe, udata, 0, context, new_q->buf,
+			   new_q->buf_size, &new_q->ip);
+	if (err) {
+		vfree(new_q->buf);
+		kfree(new_q);
+		goto err1;
+	}
+
+	spin_lock_bh(consumer_lock);
+
+	if (producer_lock) {
+		spin_lock_irqsave(producer_lock, flags);
+		err = resize_finish(q, new_q, num_elem);
+		spin_unlock_irqrestore(producer_lock, flags);
+	} else
+		err = resize_finish(q, new_q, num_elem);
+
+	spin_unlock_bh(consumer_lock);
+
+	rxe_queue_cleanup(new_q);	/* new/old dep on err */
+	if (err)
+		goto err1;
+
+	*num_elem_p = num_elem;
+	return 0;
+
+err1:
+	return err;
+}
+
+void rxe_queue_cleanup(struct rxe_queue *q)
+{
+	if (q->ip)
+		kref_put(&q->ip->ref, rxe_mmap_release);
+	else
+		vfree(q->buf);
+
+	kfree(q);
+}
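
A standalone sketch of the swap trick that resize_finish() above relies
on: callers keep their pointer to the same struct, so the old and new
queue contents are exchanged in place and the temporary struct, which now
owns the old buffer, is what gets freed. Plain C, not part of the patch;
the struct below is a toy stand-in, and the real code first checks that
the new queue is large enough.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_queue {
	int		*buf;
	unsigned int	nelem;
};

static void toy_resize(struct toy_queue *q, unsigned int nelem)
{
	struct toy_queue new_q = { calloc(nelem, sizeof(int)), nelem };
	struct toy_queue temp;

	memcpy(new_q.buf, q->buf, q->nelem * sizeof(int));	/* copy elements */

	temp = *q;		/* swap contents, not pointers */
	*q = new_q;
	new_q = temp;

	free(new_q.buf);	/* "new_q" now holds the old buffer */
}

int main(void)
{
	struct toy_queue q = { calloc(4, sizeof(int)), 4 };
	struct toy_queue *held = &q;	/* someone else still holds this */

	q.buf[0] = 42;
	toy_resize(&q, 8);
	printf("%d %u\n", held->buf[0], held->nelem);	/* prints "42 8" */
	free(q.buf);
	return 0;
}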


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 12/37] add rxe_verbs.h
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (10 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 11/37] add rxe_queue.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201228.710989476-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 13/37] add rxe_verbs.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (25 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_verbs.h --]
[-- Type: text/plain, Size: 13030 bytes --]

declarations for rxe interface to rdma/core

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_verbs.h |  571 ++++++++++++++++++++++++++++++++++
 1 file changed, 571 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_verbs.h
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_verbs.h
@@ -0,0 +1,571 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *	   Redistribution and use in source and binary forms, with or
+ *	   without modification, are permitted provided that the following
+ *	   conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_VERBS_H
+#define RXE_VERBS_H
+
+#include <linux/interrupt.h>
+#include "rxe_pool.h"
+#include "rxe_task.h"
+
+static inline int pkey_match(u16 key1, u16 key2)
+{
+	return (((key1 & 0x7fff) != 0) &&
+		((key1 & 0x7fff) == (key2 & 0x7fff)) &&
+		((key1 & 0x8000) || (key2 & 0x8000))) ? 1 : 0;
+}
+
+/* Return >0 if psn_a > psn_b
+ *	   0 if psn_a == psn_b
+ *	  <0 if psn_a < psn_b
+ */
+static inline int psn_compare(u32 psn_a, u32 psn_b)
+{
+	s32 diff;
+	diff = (psn_a - psn_b) << 8;
+	return diff;
+}
+
+struct rxe_ucontext {
+	struct rxe_pool_entry	pelem;
+	struct ib_ucontext	ibuc;
+};
+
+struct rxe_pd {
+	struct rxe_pool_entry	pelem;
+	struct ib_pd		ibpd;
+};
+
+#define RXE_LL_ADDR_LEN		(16)
+
+struct rxe_av {
+	struct ib_ah_attr	attr;
+	u8			ll_addr[RXE_LL_ADDR_LEN];
+};
+
+struct rxe_ah {
+	struct rxe_pool_entry	pelem;
+	struct ib_ah		ibah;
+	struct rxe_pd		*pd;
+	struct rxe_av		av;
+};
+
+struct rxe_cqe {
+	union {
+		struct ib_wc		ibwc;
+		struct ib_uverbs_wc	uibwc;
+	};
+};
+
+struct rxe_cq {
+	struct rxe_pool_entry	pelem;
+	struct ib_cq		ibcq;
+	struct rxe_queue	*queue;
+	spinlock_t		cq_lock;
+	u8			notify;
+	u8			special;
+	int			is_user;
+	struct tasklet_struct	comp_task;
+};
+
+struct rxe_dma_info {
+	__u32			length;
+	__u32			resid;
+	__u32			cur_sge;
+	__u32			num_sge;
+	__u32			sge_offset;
+	union {
+		u8		inline_data[0];
+		struct ib_sge	sge[0];
+	};
+};
+
+enum wqe_state {
+	wqe_state_posted,
+	wqe_state_processing,
+	wqe_state_pending,
+	wqe_state_done,
+	wqe_state_error,
+};
+
+/* must match corresponding data structure in librxe */
+struct rxe_send_wqe {
+	struct ib_send_wr	ibwr;
+	struct rxe_av		av;	/* UD only */
+	u32			status;
+	u32			state;
+	u64			iova;
+	u32			mask;
+	u32			first_psn;
+	u32			last_psn;
+	u32			ack_length;
+	u32			ssn;
+	u32			has_rd_atomic;
+	struct rxe_dma_info	dma;	/* must go last */
+};
+
+struct rxe_recv_wqe {
+	__u64			wr_id;
+	__u32			num_sge;
+	__u32			padding;
+	struct rxe_dma_info	dma;
+};
+
+struct rxe_sq {
+	int			max_wr;
+	int			max_sge;
+	int			max_inline;
+	struct ib_wc		next_wc;
+	spinlock_t		sq_lock;
+	struct rxe_queue	*queue;
+};
+
+struct rxe_rq {
+	int			max_wr;
+	int			max_sge;
+	struct ib_wc		next_wc;
+	spinlock_t		producer_lock;
+	spinlock_t		consumer_lock;
+	struct rxe_queue	*queue;
+};
+
+struct rxe_srq {
+	struct rxe_pool_entry	pelem;
+	struct ib_srq		ibsrq;
+	struct rxe_pd		*pd;
+	struct rxe_cq		*cq;
+	struct rxe_rq		rq;
+	u32			srq_num;
+
+	void			(*event_handler)(
+					struct ib_event *, void *);
+	void			*context;
+
+	int			limit;
+	int			error;
+};
+
+enum rxe_qp_state {
+	QP_STATE_RESET,
+	QP_STATE_INIT,
+	QP_STATE_READY,
+	QP_STATE_DRAIN,		/* req only */
+	QP_STATE_DRAINED,	/* req only */
+	QP_STATE_ERROR
+};
+
+extern char *rxe_qp_state_name[];
+
+struct rxe_req_info {
+	enum rxe_qp_state	state;
+	int			wqe_index;
+	u32			psn;
+	int			opcode;
+	atomic_t		rd_atomic;
+	int			wait_fence;
+	int			need_rd_atomic;
+	int			wait_psn;
+	int			need_retry;
+	int			noack_pkts;
+	struct rxe_task		task;
+};
+
+struct rxe_comp_info {
+	u32			psn;
+	int			opcode;
+	int			timeout;
+	int			timeout_retry;
+	u32			retry_cnt;
+	u32			rnr_retry;
+	struct rxe_task		task;
+};
+
+enum rdatm_res_state {
+	rdatm_res_state_next,
+	rdatm_res_state_new,
+	rdatm_res_state_replay,
+};
+
+struct resp_res {
+	int			type;
+	u32			first_psn;
+	u32			last_psn;
+	u32			cur_psn;
+	enum rdatm_res_state	state;
+
+	union {
+		struct {
+			struct sk_buff	*skb;
+		} atomic;
+		struct {
+			struct rxe_mem	*mr;
+			u64		va_org;
+			u32		rkey;
+			u32		length;
+			u64		va;
+			u32		resid;
+		} read;
+	};
+};
+
+struct rxe_resp_info {
+	enum rxe_qp_state	state;
+	u32			msn;
+	u32			psn;
+	int			opcode;
+	int			drop_msg;
+	int			goto_error;
+	int			sent_psn_nak;
+	enum ib_wc_status	status;
+	u8			aeth_syndrome;
+
+	/* Receive only */
+	struct rxe_recv_wqe	*wqe;
+
+	/* RDMA read / atomic only */
+	u64			va;
+	struct rxe_mem		*mr;
+	u32			resid;
+	u32			rkey;
+	u64			atomic_orig;
+
+	/* SRQ only */
+	struct {
+		struct rxe_recv_wqe	wqe;
+		struct ib_sge		sge[RXE_MAX_SGE];
+	} srq_wqe;
+
+	/* Responder resources. It's a circular list where the oldest
+	 * resource is dropped first. */
+	struct resp_res		*resources;
+	unsigned int		res_head;
+	unsigned int		res_tail;
+	struct resp_res		*res;
+	struct rxe_task		task;
+};
+
+struct rxe_qp {
+	struct rxe_pool_entry	pelem;
+	struct ib_qp		ibqp;
+	struct ib_qp_attr	attr;
+	unsigned int		valid;
+	unsigned int		mtu;
+	int			is_user;
+
+	struct rxe_pd		*pd;
+	struct rxe_srq		*srq;
+	struct rxe_cq		*scq;
+	struct rxe_cq		*rcq;
+
+	enum ib_sig_type	sq_sig_type;
+
+	struct rxe_sq		sq;
+	struct rxe_rq		rq;
+
+	struct rxe_av		pri_av;
+	struct rxe_av		alt_av;
+
+	/* list of mcast groups qp has joined
+	   (for cleanup) */
+	struct list_head	grp_list;
+	spinlock_t		grp_lock;
+
+	struct sk_buff_head	req_pkts;
+	struct sk_buff_head	resp_pkts;
+	struct sk_buff_head	send_pkts;
+
+	struct rxe_req_info	req;
+	struct rxe_comp_info	comp;
+	struct rxe_resp_info	resp;
+
+	struct ib_udata		*udata;
+
+	struct list_head	arbiter_list;
+	atomic_t		ssn;
+	atomic_t		req_skb_in;
+	atomic_t		resp_skb_in;
+	atomic_t		req_skb_out;
+	atomic_t		resp_skb_out;
+	int			need_req_skb;
+
+	/* Timer for retransmitting packets when ACKs have been lost. RC
+	 * only. The requester sets it when it is not already
+	 * started. The responder resets it whenever an ack is
+	 * received. */
+	struct timer_list retrans_timer;
+	u64 qp_timeout_jiffies;
+
+	/* Timer for handling RNR NAKS. */
+	struct timer_list rnr_nak_timer;
+
+	spinlock_t		state_lock;
+};
+
+enum rxe_mem_state {
+	RXE_MEM_STATE_ZOMBIE,
+	RXE_MEM_STATE_INVALID,
+	RXE_MEM_STATE_FREE,
+	RXE_MEM_STATE_VALID,
+};
+
+enum rxe_mem_type {
+	RXE_MEM_TYPE_NONE,
+	RXE_MEM_TYPE_DMA,
+	RXE_MEM_TYPE_MR,
+	RXE_MEM_TYPE_FMR,
+	RXE_MEM_TYPE_MW,
+};
+
+#define RXE_BUF_PER_MAP		(PAGE_SIZE/sizeof(struct ib_phys_buf))
+
+struct rxe_map {
+	struct ib_phys_buf	buf[RXE_BUF_PER_MAP];
+};
+
+struct rxe_mem {
+	struct rxe_pool_entry	pelem;
+	union {
+		struct ib_mr		ibmr;
+		struct ib_fmr		ibfmr;
+		struct ib_mw		ibmw;
+	};
+
+	struct rxe_pd		*pd;
+	struct ib_umem		*umem;
+
+	u32			lkey;
+	u32			rkey;
+
+	enum rxe_mem_state	state;
+	enum rxe_mem_type	type;
+	u64			va;
+	u64			iova;
+	size_t			length;
+	u32			offset;
+	int			access;
+
+	int			page_shift;
+	int			page_mask;
+	int			map_shift;
+	int			map_mask;
+
+	u32			num_buf;
+
+	u32			max_buf;
+	u32			num_map;
+
+	struct rxe_map		**map;
+};
+
+struct rxe_fast_reg_page_list {
+	struct ib_fast_reg_page_list	ibfrpl;
+};
+
+struct rxe_mc_grp {
+	struct rxe_pool_entry	pelem;
+	spinlock_t		mcg_lock;
+	struct rxe_dev		*rxe;
+	struct list_head	qp_list;
+	union ib_gid		mgid;
+	int			num_qp;
+	u32			qkey;
+	u16			mlid;
+	u16			pkey;
+};
+
+struct rxe_mc_elem {
+	struct rxe_pool_entry	pelem;
+	struct list_head	qp_list;
+	struct list_head	grp_list;
+	struct rxe_qp		*qp;
+	struct rxe_mc_grp	*grp;
+};
+
+struct rxe_port {
+	struct ib_port_attr	attr;
+	u16			*pkey_tbl;
+	__be64			*guid_tbl;
+	__be64			subnet_prefix;
+
+	/* rate control */
+	/* TODO */
+
+	spinlock_t		port_lock;
+
+	unsigned int		mtu_cap;
+
+	/* special QPs */
+	u32			qp_smi_index;
+	u32			qp_gsi_index;
+};
+
+struct rxe_arbiter {
+	struct rxe_task		task;
+	struct list_head	qp_list;
+	spinlock_t		list_lock;
+	struct timespec		time;
+	int			delay;
+	int			queue_stalled;
+};
+
+/* callbacks from ib_rxe to network interface layer */
+struct rxe_ifc_ops {
+	void (*release)(struct rxe_dev *rxe);
+	__be64 (*node_guid)(struct rxe_dev *rxe);
+	__be64 (*port_guid)(struct rxe_dev *rxe, unsigned int port_num);
+	struct device *(*dma_device)(struct rxe_dev *rxe);
+	int (*mcast_add)(struct rxe_dev *rxe, union ib_gid *mgid);
+	int (*mcast_delete)(struct rxe_dev *rxe, union ib_gid *mgid);
+	int (*send)(struct rxe_dev *rxe, struct sk_buff *skb);
+	int (*loopback)(struct rxe_dev *rxe, struct sk_buff *skb);
+	struct sk_buff *(*init_packet)(struct rxe_dev *rxe, struct rxe_av *av,
+				       int paylen);
+	int (*init_av)(struct rxe_dev *rxe, struct ib_ah_attr *attr,
+		       struct rxe_av *av);
+	char *(*parent_name)(struct rxe_dev *rxe, unsigned int port_num);
+	enum rdma_link_layer (*link_layer)(struct rxe_dev *rxe,
+			      unsigned int port_num);
+};
+
+#define RXE_QUEUE_STOPPED		(999)
+
+struct rxe_dev {
+	struct ib_device	ib_dev;
+	struct ib_device_attr	attr;
+	int			max_ucontext;
+	int			max_inline_data;
+	struct kref		ref_cnt;
+
+	struct rxe_ifc_ops	*ifc_ops;
+
+	struct net_device	*ndev;
+
+	struct rxe_arbiter	arbiter;
+
+	atomic_t		ind;
+
+	atomic_t		req_skb_in;
+	atomic_t		resp_skb_in;
+	atomic_t		req_skb_out;
+	atomic_t		resp_skb_out;
+
+	int			xmit_errors;
+
+	struct rxe_pool		uc_pool;
+	struct rxe_pool		pd_pool;
+	struct rxe_pool		ah_pool;
+	struct rxe_pool		srq_pool;
+	struct rxe_pool		qp_pool;
+	struct rxe_pool		cq_pool;
+	struct rxe_pool		mr_pool;
+	struct rxe_pool		mw_pool;
+	struct rxe_pool		fmr_pool;
+	struct rxe_pool		mc_grp_pool;
+	struct rxe_pool		mc_elem_pool;
+
+	spinlock_t		pending_lock;
+	struct list_head	pending_mmaps;
+
+	spinlock_t		mmap_offset_lock;
+	int			mmap_offset;
+
+	u8			num_ports;
+	struct rxe_port		*port;
+
+	enum rxe_mtu		pref_mtu;
+};
+
+static inline struct rxe_dev *to_rdev(struct ib_device *dev)
+{
+	return dev ? container_of(dev, struct rxe_dev, ib_dev) : NULL;
+}
+
+static inline struct rxe_ucontext *to_ruc(struct ib_ucontext *uc)
+{
+	return uc ? container_of(uc, struct rxe_ucontext, ibuc) : NULL;
+}
+
+static inline struct rxe_pd *to_rpd(struct ib_pd *pd)
+{
+	return pd ? container_of(pd, struct rxe_pd, ibpd) : NULL;
+}
+
+static inline struct rxe_ah *to_rah(struct ib_ah *ah)
+{
+	return ah ? container_of(ah, struct rxe_ah, ibah) : NULL;
+}
+
+static inline struct rxe_srq *to_rsrq(struct ib_srq *srq)
+{
+	return srq ? container_of(srq, struct rxe_srq, ibsrq) : NULL;
+}
+
+static inline struct rxe_qp *to_rqp(struct ib_qp *qp)
+{
+	return qp ? container_of(qp, struct rxe_qp, ibqp) : NULL;
+}
+
+static inline struct rxe_cq *to_rcq(struct ib_cq *cq)
+{
+	return cq ? container_of(cq, struct rxe_cq, ibcq) : NULL;
+}
+
+static inline struct rxe_mem *to_rmr(struct ib_mr *mr)
+{
+	return mr ? container_of(mr, struct rxe_mem, ibmr) : NULL;
+}
+
+static inline struct rxe_mem *to_rfmr(struct ib_fmr *fmr)
+{
+	return fmr ? container_of(fmr, struct rxe_mem, ibfmr) : NULL;
+}
+
+static inline struct rxe_mem *to_rmw(struct ib_mw *mw)
+{
+	return mw ? container_of(mw, struct rxe_mem, ibmw) : NULL;
+}
+
+static inline struct rxe_fast_reg_page_list *to_rfrpl(
+	struct ib_fast_reg_page_list *frpl)
+{
+	return frpl ? container_of(frpl,
+		struct rxe_fast_reg_page_list, ibfrpl) : NULL;
+}
+
+int rxe_register_device(struct rxe_dev *rxe);
+int rxe_unregister_device(struct rxe_dev *rxe);
+
+void rxe_mc_cleanup(void *arg);
+
+#endif /* RXE_VERBS_H */
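
A standalone sketch of the wraparound behaviour behind psn_compare()
above: PSNs are 24-bit values, so the difference is shifted into the top
24 bits of a signed 32-bit word and its sign says which PSN is "ahead"
modulo 2^24, as long as the two PSNs are within half the PSN space of
each other. Plain C, not part of the patch; the function mirrors the
header's definition.

#include <stdint.h>
#include <stdio.h>

static int psn_compare(uint32_t psn_a, uint32_t psn_b)
{
	int32_t diff = (psn_a - psn_b) << 8;

	return diff;
}

int main(void)
{
	/* 0x000001 is three PSNs past 0xfffffe once the 24-bit PSN wraps */
	printf("%d\n", psn_compare(0x000001, 0xfffffe) > 0);	/* 1: a > b */
	printf("%d\n", psn_compare(0xfffffe, 0x000001) < 0);	/* 1: a < b */
	printf("%d\n", psn_compare(0x123456, 0x123456));	/* 0: equal */
	return 0;
}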


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 13/37] add rxe_verbs.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (11 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 12/37] add rxe_verbs.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201228.761358826-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 14/37] add rxe_pool.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (24 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_verbs.c --]
[-- Type: text/plain, Size: 30744 bytes --]

rxe interface to rdma/core

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_verbs.c | 1344 ++++++++++++++++++++++++++++++++++
 1 file changed, 1344 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_verbs.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_verbs.c
@@ -0,0 +1,1344 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+static int rxe_query_device(struct ib_device *dev, struct ib_device_attr *attr)
+{
+	struct rxe_dev *rxe = to_rdev(dev);
+
+	*attr = rxe->attr;
+	return 0;
+}
+
+static int rxe_query_port(struct ib_device *dev,
+			  u8 port_num, struct ib_port_attr *attr)
+{
+	struct rxe_dev *rxe = to_rdev(dev);
+	struct rxe_port *port;
+
+	if (unlikely(port_num < 1 || port_num > rxe->num_ports)) {
+		pr_warn("invalid port_number %d\n", port_num);
+		goto err1;
+	}
+
+	port = &rxe->port[port_num - 1];
+
+	*attr = port->attr;
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+static int rxe_query_gid(struct ib_device *device,
+			 u8 port_num, int index, union ib_gid *gid)
+{
+	struct rxe_dev *rxe = to_rdev(device);
+	struct rxe_port *port;
+
+	if (unlikely(port_num < 1 || port_num > rxe->num_ports)) {
+		pr_warn("invalid port_num = %d\n", port_num);
+		goto err1;
+	}
+
+	port = &rxe->port[port_num - 1];
+
+	if (unlikely(index < 0 || index >= port->attr.gid_tbl_len)) {
+		pr_warn("invalid index = %d\n", index);
+		goto err1;
+	}
+
+	gid->global.subnet_prefix = port->subnet_prefix;
+	gid->global.interface_id = port->guid_tbl[index];
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+static int rxe_query_pkey(struct ib_device *device,
+			  u8 port_num, u16 index, u16 *pkey)
+{
+	struct rxe_dev *rxe = to_rdev(device);
+	struct rxe_port *port;
+
+	if (unlikely(port_num < 1 || port_num > rxe->num_ports)) {
+		pr_warn("invalid port_num = %d\n", port_num);
+		goto err1;
+	}
+
+	port = &rxe->port[port_num - 1];
+
+	if (unlikely(index >= port->attr.pkey_tbl_len)) {
+		pr_warn("invalid index = %d\n", index);
+		goto err1;
+	}
+
+	*pkey = port->pkey_tbl[index];
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+static int rxe_modify_device(struct ib_device *dev,
+			     int mask, struct ib_device_modify *attr)
+{
+	struct rxe_dev *rxe = to_rdev(dev);
+
+	if (mask & IB_DEVICE_MODIFY_SYS_IMAGE_GUID)
+		rxe->attr.sys_image_guid = cpu_to_be64(attr->sys_image_guid);
+
+	if (mask & IB_DEVICE_MODIFY_NODE_DESC) {
+		memcpy(rxe->ib_dev.node_desc,
+		       attr->node_desc, sizeof(rxe->ib_dev.node_desc));
+	}
+
+	return 0;
+}
+
+static int rxe_modify_port(struct ib_device *dev,
+			   u8 port_num, int mask, struct ib_port_modify *attr)
+{
+	struct rxe_dev *rxe = to_rdev(dev);
+	struct rxe_port *port;
+
+	if (unlikely(port_num < 1 || port_num > rxe->num_ports)) {
+		pr_warn("invalid port_num = %d\n", port_num);
+		goto err1;
+	}
+
+	port = &rxe->port[port_num - 1];
+
+	port->attr.port_cap_flags |= attr->set_port_cap_mask;
+	port->attr.port_cap_flags &= ~attr->clr_port_cap_mask;
+
+	if (mask & IB_PORT_RESET_QKEY_CNTR)
+		port->attr.qkey_viol_cntr = 0;
+
+	if (mask & IB_PORT_INIT_TYPE)
+		/* TODO init type */
+		;
+
+	if (mask & IB_PORT_SHUTDOWN)
+		/* TODO shutdown port */
+		;
+
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+static enum rdma_link_layer rxe_get_link_layer(struct ib_device *dev,
+					       u8 port_num)
+{
+	struct rxe_dev *rxe = to_rdev(dev);
+
+	return rxe->ifc_ops->link_layer(rxe, port_num);
+}
+
+static struct ib_ucontext *rxe_alloc_ucontext(struct ib_device *dev,
+					      struct ib_udata *udata)
+{
+	struct rxe_dev *rxe = to_rdev(dev);
+	struct rxe_ucontext *uc;
+
+	uc = rxe_alloc(&rxe->uc_pool);
+	return uc ? &uc->ibuc : ERR_PTR(-ENOMEM);
+}
+
+static int rxe_dealloc_ucontext(struct ib_ucontext *ibuc)
+{
+	struct rxe_ucontext *uc = to_ruc(ibuc);
+
+	rxe_drop_ref(uc);
+	return 0;
+}
+
+static struct ib_pd *rxe_alloc_pd(struct ib_device *dev,
+				  struct ib_ucontext *context,
+				  struct ib_udata *udata)
+{
+	struct rxe_dev *rxe = to_rdev(dev);
+	struct rxe_pd *pd;
+
+	pd = rxe_alloc(&rxe->pd_pool);
+	return pd ? &pd->ibpd : ERR_PTR(-ENOMEM);
+}
+
+static int rxe_dealloc_pd(struct ib_pd *ibpd)
+{
+	struct rxe_pd *pd = to_rpd(ibpd);
+
+	rxe_drop_ref(pd);
+	return 0;
+}
+
+static struct ib_ah *rxe_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(ibpd->device);
+	struct rxe_pd *pd = to_rpd(ibpd);
+	struct rxe_ah *ah;
+
+	err = rxe_av_chk_attr(rxe, attr);
+	if (err)
+		goto err1;
+
+	ah = rxe_alloc(&rxe->ah_pool);
+	if (!ah) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	rxe_add_ref(pd);
+	ah->pd = pd;
+
+	err = rxe_av_from_attr(rxe, attr->port_num, &ah->av, attr);
+	if (err)
+		goto err2;
+
+	return &ah->ibah;
+
+err2:
+	rxe_drop_ref(pd);
+	rxe_drop_ref(ah);
+err1:
+	return ERR_PTR(err);
+}
+
+static int rxe_modify_ah(struct ib_ah *ibah, struct ib_ah_attr *attr)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(ibah->device);
+	struct rxe_ah *ah = to_rah(ibah);
+
+	err = rxe_av_chk_attr(rxe, attr);
+	if (err)
+		goto err1;
+
+	err = rxe_av_from_attr(rxe, attr->port_num, &ah->av, attr);
+err1:
+	return err;
+}
+
+static int rxe_query_ah(struct ib_ah *ibah, struct ib_ah_attr *attr)
+{
+	struct rxe_dev *rxe = to_rdev(ibah->device);
+	struct rxe_ah *ah = to_rah(ibah);
+
+	rxe_av_to_attr(rxe, &ah->av, attr);
+	return 0;
+}
+
+static int rxe_destroy_ah(struct ib_ah *ibah)
+{
+	struct rxe_ah *ah = to_rah(ibah);
+
+	rxe_drop_ref(ah->pd);
+	rxe_drop_ref(ah);
+	return 0;
+}
+
+static int post_one_recv(struct rxe_rq *rq, struct ib_recv_wr *ibwr)
+{
+	int err;
+	int i;
+	u32 length;
+	struct rxe_recv_wqe *recv_wqe;
+	int num_sge = ibwr->num_sge;
+
+	if (unlikely(queue_full(rq->queue))) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	if (unlikely(num_sge > rq->max_sge)) {
+		err = -EINVAL;
+		goto err1;
+	}
+
+	length = 0;
+	for (i = 0; i < num_sge; i++)
+		length += ibwr->sg_list[i].length;
+
+	recv_wqe = producer_addr(rq->queue);
+	recv_wqe->wr_id = ibwr->wr_id;
+	recv_wqe->num_sge = num_sge;
+
+	memcpy(recv_wqe->dma.sge, ibwr->sg_list,
+	       num_sge*sizeof(struct ib_sge));
+
+	recv_wqe->dma.length		= length;
+	recv_wqe->dma.resid		= length;
+	recv_wqe->dma.num_sge		= num_sge;
+	recv_wqe->dma.cur_sge		= 0;
+	recv_wqe->dma.sge_offset	= 0;
+
+	/* make sure all changes to the work queue are
+	   written before we update the producer pointer */
+	wmb();
+
+	advance_producer(rq->queue);
+	return 0;
+
+err1:
+	return err;
+}
+
+static struct ib_srq *rxe_create_srq(struct ib_pd *ibpd,
+				     struct ib_srq_init_attr *init,
+				     struct ib_udata *udata)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(ibpd->device);
+	struct rxe_pd *pd = to_rpd(ibpd);
+	struct rxe_srq *srq;
+	struct ib_ucontext *context = udata ? ibpd->uobject->context : NULL;
+
+	err = rxe_srq_chk_attr(rxe, NULL, &init->attr, IB_SRQ_INIT_MASK);
+	if (err)
+		goto err1;
+
+	srq = rxe_alloc(&rxe->srq_pool);
+	if (!srq) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	rxe_add_index(srq);
+	rxe_add_ref(pd);
+	srq->pd = pd;
+
+	err = rxe_srq_from_init(rxe, srq, init, context, udata);
+	if (err)
+		goto err2;
+
+	return &srq->ibsrq;
+
+err2:
+	rxe_drop_ref(pd);
+	rxe_drop_index(srq);
+	rxe_drop_ref(srq);
+err1:
+	return ERR_PTR(err);
+}
+
+static int rxe_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr,
+			  enum ib_srq_attr_mask mask,
+			  struct ib_udata *udata)
+{
+	int err;
+	struct rxe_srq *srq = to_rsrq(ibsrq);
+	struct rxe_dev *rxe = to_rdev(ibsrq->device);
+
+	err = rxe_srq_chk_attr(rxe, srq, attr, mask);
+	if (err)
+		goto err1;
+
+	err = rxe_srq_from_attr(rxe, srq, attr, mask, udata);
+	if (err)
+		goto err1;
+
+	return 0;
+
+err1:
+	return err;
+}
+
+static int rxe_query_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr)
+{
+	struct rxe_srq *srq = to_rsrq(ibsrq);
+
+	if (srq->error)
+		return -EINVAL;
+
+	attr->max_wr = srq->rq.queue->buf->index_mask;
+	attr->max_sge = srq->rq.max_sge;
+	attr->srq_limit = srq->limit;
+	return 0;
+}
+
+static int rxe_destroy_srq(struct ib_srq *ibsrq)
+{
+	struct rxe_srq *srq = to_rsrq(ibsrq);
+
+	if (srq->cq)
+		rxe_drop_ref(srq->cq);
+
+	rxe_drop_ref(srq->pd);
+	rxe_drop_index(srq);
+	rxe_drop_ref(srq);
+	return 0;
+}
+
+static int rxe_post_srq_recv(struct ib_srq *ibsrq, struct ib_recv_wr *wr,
+			     struct ib_recv_wr **bad_wr)
+{
+	int err = 0;
+	unsigned long flags;
+	struct rxe_srq *srq = to_rsrq(ibsrq);
+
+	spin_lock_irqsave(&srq->rq.producer_lock, flags);
+
+	while (wr) {
+		err = post_one_recv(&srq->rq, wr);
+		if (unlikely(err))
+			break;
+		wr = wr->next;
+	}
+
+	spin_unlock_irqrestore(&srq->rq.producer_lock, flags);
+
+	if (err)
+		*bad_wr = wr;
+
+	return err;
+}
+
+static struct ib_qp *rxe_create_qp(struct ib_pd *ibpd,
+				   struct ib_qp_init_attr *init,
+				   struct ib_udata *udata)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(ibpd->device);
+	struct rxe_pd *pd = to_rpd(ibpd);
+	struct rxe_qp *qp;
+
+	err = rxe_qp_chk_init(rxe, init);
+	if (err)
+		goto err1;
+
+	qp = rxe_alloc(&rxe->qp_pool);
+	if (!qp) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	rxe_add_index(qp);
+
+	if (udata)
+		qp->is_user = 1;
+
+	err = rxe_qp_from_init(rxe, qp, pd, init, udata, ibpd);
+	if (err)
+		goto err2;
+
+	return &qp->ibqp;
+
+err2:
+	rxe_drop_index(qp);
+	rxe_drop_ref(qp);
+err1:
+	return ERR_PTR(err);
+}
+
+static int rxe_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+			 int mask, struct ib_udata *udata)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(ibqp->device);
+	struct rxe_qp *qp = to_rqp(ibqp);
+
+	err = rxe_qp_chk_attr(rxe, qp, attr, mask);
+	if (err)
+		goto err1;
+
+	err = rxe_qp_from_attr(qp, attr, mask, udata);
+	if (err)
+		goto err1;
+
+	return 0;
+
+err1:
+	return err;
+}
+
+static int rxe_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+			int mask, struct ib_qp_init_attr *init)
+{
+	struct rxe_qp *qp = to_rqp(ibqp);
+
+	rxe_qp_to_init(qp, init);
+	rxe_qp_to_attr(qp, attr, mask);
+
+	return 0;
+}
+
+static int rxe_destroy_qp(struct ib_qp *ibqp)
+{
+	struct rxe_qp *qp = to_rqp(ibqp);
+
+	rxe_qp_destroy(qp);
+	rxe_drop_index(qp);
+	rxe_drop_ref(qp);
+	return 0;
+}
+
+static int validate_send_wr(struct rxe_qp *qp, struct ib_send_wr *ibwr,
+			    unsigned int mask, unsigned int length)
+{
+	int num_sge = ibwr->num_sge;
+	struct rxe_sq *sq = &qp->sq;
+
+	if (unlikely(num_sge > sq->max_sge))
+		goto err1;
+
+	if (unlikely(mask & WR_ATOMIC_MASK)) {
+		if (length < 8)
+			goto err1;
+
+		if (ibwr->wr.atomic.remote_addr & 0x7)
+			goto err1;
+	}
+
+	if (unlikely((ibwr->send_flags & IB_SEND_INLINE)
+		     && (length > sq->max_inline)))
+		goto err1;
+
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+static int init_send_wqe(struct rxe_qp *qp, struct ib_send_wr *ibwr,
+			 unsigned int mask, unsigned int length,
+			 struct rxe_send_wqe *wqe)
+{
+	int num_sge = ibwr->num_sge;
+	struct ib_sge *sge;
+	int i;
+	u8 *p;
+
+	memcpy(&wqe->ibwr, ibwr, sizeof(wqe->ibwr));
+
+	if (qp_type(qp) == IB_QPT_UD
+	    || qp_type(qp) == IB_QPT_SMI
+	    || qp_type(qp) == IB_QPT_GSI)
+		memcpy(&wqe->av, &to_rah(ibwr->wr.ud.ah)->av, sizeof(wqe->av));
+
+	if (unlikely(ibwr->send_flags & IB_SEND_INLINE)) {
+		p = wqe->dma.inline_data;
+
+		sge = ibwr->sg_list;
+		for (i = 0; i < num_sge; i++, sge++) {
+			if (qp->is_user && copy_from_user(p, (void __user *)
+					    (uintptr_t)sge->addr, sge->length))
+				return -EFAULT;
+			else
+				memcpy(p, (void *)(uintptr_t)sge->addr,
+				       sge->length);
+
+			p += sge->length;
+		}
+	} else
+		memcpy(wqe->dma.sge, ibwr->sg_list,
+		       num_sge*sizeof(struct ib_sge));
+
+	wqe->iova		= (mask & WR_ATOMIC_MASK) ?
+					ibwr->wr.atomic.remote_addr :
+					ibwr->wr.rdma.remote_addr;
+	wqe->mask		= mask;
+	wqe->dma.length		= length;
+	wqe->dma.resid		= length;
+	wqe->dma.num_sge	= num_sge;
+	wqe->dma.cur_sge	= 0;
+	wqe->dma.sge_offset	= 0;
+	wqe->state		= wqe_state_posted;
+	wqe->ssn		= atomic_add_return(1, &qp->ssn);
+
+	return 0;
+}
+
+static int post_one_send(struct rxe_qp *qp, struct ib_send_wr *ibwr,
+			 unsigned mask, u32 length)
+{
+	int err;
+	struct rxe_sq *sq = &qp->sq;
+	struct rxe_send_wqe *send_wqe;
+	unsigned long flags;
+
+	err = validate_send_wr(qp, ibwr, mask, length);
+	if (err)
+		return err;
+
+	spin_lock_irqsave(&qp->sq.sq_lock, flags);
+
+	if (unlikely(queue_full(sq->queue))) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	send_wqe = producer_addr(sq->queue);
+
+	err = init_send_wqe(qp, ibwr, mask, length, send_wqe);
+	if (unlikely(err))
+		goto err1;
+
+	/* make sure all changes to the work queue are
+	   written before we update the producer pointer */
+	wmb();
+
+	advance_producer(sq->queue);
+	spin_unlock_irqrestore(&qp->sq.sq_lock, flags);
+
+	return 0;
+
+err1:
+	spin_unlock_irqrestore(&qp->sq.sq_lock, flags);
+	return err;
+}
+
+static int rxe_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
+			 struct ib_send_wr **bad_wr)
+{
+	int err = 0;
+	struct rxe_qp *qp = to_rqp(ibqp);
+	unsigned int mask;
+	unsigned int length = 0;
+	int i;
+	int must_sched;
+
+	if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR)) {
+		*bad_wr = wr;
+		return -EINVAL;
+	}
+
+	if (unlikely(qp->req.state < QP_STATE_READY)) {
+		*bad_wr = wr;
+		return -EINVAL;
+	}
+
+	if (qp->is_user) {
+		unsigned int wqe_index = producer_index(qp->sq.queue);
+		struct rxe_send_wqe *wqe;
+
+		WARN_ON(wr);
+
+		/* compute length of last wr for use in
+		   fastpath decision below */
+		wqe_index = (wqe_index - 1) & qp->sq.queue->index_mask;
+		wqe = addr_from_index(qp->sq.queue, wqe_index);
+		length = wqe->dma.resid;
+	}
+
+	while (wr) {
+		mask = wr_opcode_mask(wr->opcode, qp);
+		if (unlikely(!mask)) {
+			err = -EINVAL;
+			*bad_wr = wr;
+			break;
+		}
+
+		if (unlikely((wr->send_flags & IB_SEND_INLINE) &&
+		    !(mask & WR_INLINE_MASK))) {
+			err = -EINVAL;
+			*bad_wr = wr;
+			break;
+		}
+
+		length = 0;
+		for (i = 0; i < wr->num_sge; i++)
+			length += wr->sg_list[i].length;
+
+		err = post_one_send(qp, wr, mask, length);
+
+		if (err) {
+			*bad_wr = wr;
+			break;
+		}
+		wr = wr->next;
+	}
+
+	must_sched = queue_count(qp->sq.queue) > 1;
+	rxe_run_task(&qp->req.task, must_sched);
+
+	return err;
+}
+
+static int rxe_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
+			 struct ib_recv_wr **bad_wr)
+{
+	int err = 0;
+	struct rxe_qp *qp = to_rqp(ibqp);
+	struct rxe_rq *rq = &qp->rq;
+	unsigned long flags;
+
+	if (unlikely((qp_state(qp) < IB_QPS_INIT) || !qp->valid)) {
+		*bad_wr = wr;
+		err = -EINVAL;
+		goto err1;
+	}
+
+	if (unlikely(qp->srq)) {
+		*bad_wr = wr;
+		err = -EINVAL;
+		goto err1;
+	}
+
+	if (unlikely(qp->is_user)) {
+		*bad_wr = wr;
+		err = -EINVAL;
+		goto err1;
+	}
+
+	spin_lock_irqsave(&rq->producer_lock, flags);
+
+	while (wr) {
+		err = post_one_recv(rq, wr);
+		if (unlikely(err)) {
+			*bad_wr = wr;
+			break;
+		}
+		wr = wr->next;
+	}
+
+	spin_unlock_irqrestore(&rq->producer_lock, flags);
+
+err1:
+	return err;
+}
+
+static struct ib_cq *rxe_create_cq(struct ib_device *dev, int cqe,
+				   int comp_vector,
+				   struct ib_ucontext *context,
+				   struct ib_udata *udata)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(dev);
+	struct rxe_cq *cq;
+
+	err = rxe_cq_chk_attr(rxe, NULL, cqe, comp_vector, udata);
+	if (err)
+		goto err1;
+
+	cq = rxe_alloc(&rxe->cq_pool);
+	if (!cq) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	err = rxe_cq_from_init(rxe, cq, cqe, comp_vector, context, udata);
+	if (err)
+		goto err2;
+
+	return &cq->ibcq;
+
+err2:
+	rxe_drop_ref(cq);
+err1:
+	return ERR_PTR(err);
+}
+
+static int rxe_destroy_cq(struct ib_cq *ibcq)
+{
+	struct rxe_cq *cq = to_rcq(ibcq);
+
+	rxe_drop_ref(cq);
+	return 0;
+}
+
+static int rxe_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
+{
+	int err;
+	struct rxe_cq *cq = to_rcq(ibcq);
+	struct rxe_dev *rxe = to_rdev(ibcq->device);
+
+	err = rxe_cq_chk_attr(rxe, cq, cqe, 0, udata);
+	if (err)
+		goto err1;
+
+	err = rxe_cq_resize_queue(cq, cqe, udata);
+	if (err)
+		goto err1;
+
+	return 0;
+
+err1:
+	return err;
+}
+
+static int rxe_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc)
+{
+	int i;
+	struct rxe_cq *cq = to_rcq(ibcq);
+	struct rxe_cqe *cqe;
+
+	for (i = 0; i < num_entries; i++) {
+		cqe = queue_head(cq->queue);
+		if (!cqe)
+			break;
+
+		memcpy(wc++, &cqe->ibwc, sizeof(*wc));
+		advance_consumer(cq->queue);
+	}
+
+	return i;
+}
+
+static int rxe_peek_cq(struct ib_cq *ibcq, int wc_cnt)
+{
+	struct rxe_cq *cq = to_rcq(ibcq);
+	int count = queue_count(cq->queue);
+
+	return (count > wc_cnt) ? wc_cnt : count;
+}
+
+static int rxe_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
+{
+	struct rxe_cq *cq = to_rcq(ibcq);
+
+	if (cq->notify != IB_CQ_NEXT_COMP)
+		cq->notify = flags & IB_CQ_SOLICITED_MASK;
+
+	return 0;
+}
+
+static int rxe_req_ncomp_notif(struct ib_cq *ibcq, int wc_cnt)
+{
+	return -EINVAL;
+}
+
+static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
+{
+	struct rxe_dev *rxe = to_rdev(ibpd->device);
+	struct rxe_pd *pd = to_rpd(ibpd);
+	struct rxe_mem *mr;
+	int err;
+
+	mr = rxe_alloc(&rxe->mr_pool);
+	if (!mr) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	rxe_add_index(mr);
+
+	rxe_add_ref(pd);
+
+	err = rxe_mem_init_dma(rxe, pd, access, mr);
+	if (err)
+		goto err2;
+
+	return &mr->ibmr;
+
+err2:
+	rxe_drop_ref(pd);
+	rxe_drop_index(mr);
+	rxe_drop_ref(mr);
+err1:
+	return ERR_PTR(err);
+}
+
+static struct ib_mr *rxe_reg_phys_mr(struct ib_pd *ibpd,
+				     struct ib_phys_buf *phys_buf_array,
+				     int num_phys_buf,
+				     int access, u64 *iova_start)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(ibpd->device);
+	struct rxe_pd *pd = to_rpd(ibpd);
+	struct rxe_mem *mr;
+	u64 iova = *iova_start;
+
+	mr = rxe_alloc(&rxe->mr_pool);
+	if (!mr) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	rxe_add_index(mr);
+
+	rxe_add_ref(pd);
+
+	err = rxe_mem_init_phys(rxe, pd, access, iova,
+				phys_buf_array, num_phys_buf, mr);
+	if (err)
+		goto err2;
+
+	return &mr->ibmr;
+
+err2:
+	rxe_drop_ref(pd);
+	rxe_drop_index(mr);
+	rxe_drop_ref(mr);
+err1:
+	return ERR_PTR(err);
+}
+
+static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
+				     u64 start,
+				     u64 length,
+				     u64 iova,
+				     int access, struct ib_udata *udata)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(ibpd->device);
+	struct rxe_pd *pd = to_rpd(ibpd);
+	struct rxe_mem *mr;
+
+	mr = rxe_alloc(&rxe->mr_pool);
+	if (!mr) {
+		err = -ENOMEM;
+		goto err2;
+	}
+
+	rxe_add_index(mr);
+
+	rxe_add_ref(pd);
+
+	err = rxe_mem_init_user(rxe, pd, start, length, iova,
+				access, udata, mr);
+	if (err)
+		goto err3;
+
+	return &mr->ibmr;
+
+err3:
+	rxe_drop_ref(pd);
+	rxe_drop_index(mr);
+	rxe_drop_ref(mr);
+err2:
+	return ERR_PTR(err);
+}
+
+static int rxe_dereg_mr(struct ib_mr *ibmr)
+{
+	struct rxe_mem *mr = to_rmr(ibmr);
+
+	mr->state = RXE_MEM_STATE_ZOMBIE;
+	rxe_drop_ref(mr->pd);
+	rxe_drop_index(mr);
+	rxe_drop_ref(mr);
+	return 0;
+}
+
+static struct ib_mr *rxe_alloc_fast_reg_mr(struct ib_pd *ibpd, int max_pages)
+{
+	struct rxe_dev *rxe = to_rdev(ibpd->device);
+	struct rxe_pd *pd = to_rpd(ibpd);
+	struct rxe_mem *mr;
+	int err;
+
+	mr = rxe_alloc(&rxe->mr_pool);
+	if (!mr) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	rxe_add_index(mr);
+
+	rxe_add_ref(pd);
+
+	err = rxe_mem_init_fast(rxe, pd, max_pages, mr);
+	if (err)
+		goto err2;
+
+	return &mr->ibmr;
+
+err2:
+	rxe_drop_ref(pd);
+	rxe_drop_index(mr);
+	rxe_drop_ref(mr);
+err1:
+	return ERR_PTR(err);
+}
+
+static struct ib_fast_reg_page_list *
+	rxe_alloc_fast_reg_page_list(struct ib_device *device,
+				     int page_list_len)
+{
+	struct rxe_fast_reg_page_list *frpl;
+	int err;
+
+	frpl = kmalloc(sizeof(*frpl), GFP_KERNEL);
+	if (!frpl) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	frpl->ibfrpl.page_list = kmalloc(page_list_len*sizeof(u64), GFP_KERNEL);
+	if (!frpl->ibfrpl.page_list) {
+		err = -ENOMEM;
+		goto err2;
+	}
+
+	return &frpl->ibfrpl;
+
+err2:
+	kfree(frpl);
+err1:
+	return ERR_PTR(err);
+}
+
+static void rxe_free_fast_reg_page_list(struct ib_fast_reg_page_list *ibfrpl)
+{
+	struct rxe_fast_reg_page_list *frpl = to_rfrpl(ibfrpl);
+
+	kfree(frpl->ibfrpl.page_list);
+	kfree(frpl);
+	return;
+}
+
+static int rxe_rereg_phys_mr(struct ib_mr *ibmr, int mr_rereg_mask,
+			     struct ib_pd *ibpd,
+			     struct ib_phys_buf *phys_buf_array,
+			     int num_phys_buf, int mr_access_flags,
+			     u64 *iova_start)
+{
+	return -EINVAL;
+}
+
+static struct ib_mw *rxe_alloc_mw(struct ib_pd *ibpd)
+{
+	struct rxe_dev *rxe = to_rdev(ibpd->device);
+	struct rxe_pd *pd = to_rpd(ibpd);
+	struct rxe_mem *mw;
+	int err;
+
+	mw = rxe_alloc(&rxe->mw_pool);
+	if (!mw) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	rxe_add_index(mw);
+
+	rxe_add_ref(pd);
+
+	err = rxe_mem_init_mw(rxe, pd, mw);
+	if (err)
+		goto err2;
+
+	return &mw->ibmw;
+
+err2:
+	rxe_drop_ref(pd);
+	rxe_drop_index(mw);
+	rxe_drop_ref(mw);
+err1:
+	return ERR_PTR(err);
+}
+
+static int rxe_bind_mw(struct ib_qp *ibqp,
+		       struct ib_mw *ibmw, struct ib_mw_bind *mw_bind)
+{
+	return -EINVAL;
+}
+
+static int rxe_dealloc_mw(struct ib_mw *ibmw)
+{
+	struct rxe_mem *mw = to_rmw(ibmw);
+
+	mw->state = RXE_MEM_STATE_ZOMBIE;
+	rxe_drop_ref(mw->pd);
+	rxe_drop_index(mw);
+	rxe_drop_ref(mw);
+	return 0;
+}
+
+static struct ib_fmr *rxe_alloc_fmr(struct ib_pd *ibpd,
+				    int access, struct ib_fmr_attr *attr)
+{
+	struct rxe_dev *rxe = to_rdev(ibpd->device);
+	struct rxe_pd *pd = to_rpd(ibpd);
+	struct rxe_mem *fmr;
+	int err;
+
+	fmr = rxe_alloc(&rxe->fmr_pool);
+	if (!fmr) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	rxe_add_index(fmr);
+
+	rxe_add_ref(pd);
+
+	err = rxe_mem_init_fmr(rxe, pd, access, attr, fmr);
+	if (err)
+		goto err2;
+
+	return &fmr->ibfmr;
+
+err2:
+	rxe_drop_ref(pd);
+	rxe_drop_index(fmr);
+	rxe_drop_ref(fmr);
+err1:
+	return ERR_PTR(err);
+}
+
+static int rxe_map_phys_fmr(struct ib_fmr *ibfmr,
+			    u64 *page_list, int list_length, u64 iova)
+{
+	struct rxe_mem *fmr = to_rfmr(ibfmr);
+	struct rxe_dev *rxe = to_rdev(ibfmr->device);
+
+	return rxe_mem_map_pages(rxe, fmr, page_list, list_length, iova);
+}
+
+static int rxe_unmap_fmr(struct list_head *fmr_list)
+{
+	struct rxe_mem *fmr;
+
+	list_for_each_entry(fmr, fmr_list, ibfmr.list) {
+		if (fmr->state != RXE_MEM_STATE_VALID)
+			continue;
+
+		fmr->va = 0;
+		fmr->iova = 0;
+		fmr->length = 0;
+		fmr->num_buf = 0;
+		fmr->state = RXE_MEM_STATE_FREE;
+	}
+
+	return 0;
+}
+
+static int rxe_dealloc_fmr(struct ib_fmr *ibfmr)
+{
+	struct rxe_mem *fmr = to_rfmr(ibfmr);
+
+	fmr->state = RXE_MEM_STATE_ZOMBIE;
+	rxe_drop_ref(fmr->pd);
+	rxe_drop_index(fmr);
+	rxe_drop_ref(fmr);
+	return 0;
+}
+
+static int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(ibqp->device);
+	struct rxe_qp *qp = to_rqp(ibqp);
+	struct rxe_mc_grp *grp;
+
+	/* takes a ref on grp if successful */
+	err = rxe_mcast_get_grp(rxe, mgid, mlid, &grp);
+	if (err)
+		return err;
+
+	err = rxe_mcast_add_grp_elem(rxe, qp, grp);
+
+	rxe_drop_ref(grp);
+	return err;
+}
+
+static int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
+{
+	struct rxe_dev *rxe = to_rdev(ibqp->device);
+	struct rxe_qp *qp = to_rqp(ibqp);
+
+	return rxe_mcast_drop_grp_elem(rxe, qp, mgid, mlid);
+}
+
+static ssize_t rxe_show_skb_num(struct device *device,
+				struct device_attribute *attr, char *buf)
+{
+	struct rxe_dev *rxe = container_of(device, struct rxe_dev,
+					   ib_dev.dev);
+
+	return sprintf(buf, "req_in:%d resp_in:%d req_out:%d resp_out:%d\n",
+		atomic_read(&rxe->req_skb_in),
+		atomic_read(&rxe->resp_skb_in),
+		atomic_read(&rxe->req_skb_out),
+		atomic_read(&rxe->resp_skb_out));
+}
+
+static DEVICE_ATTR(skb_num, S_IRUGO, rxe_show_skb_num, NULL);
+
+static ssize_t rxe_show_parent(struct device *device,
+			       struct device_attribute *attr, char *buf)
+{
+	struct rxe_dev *rxe = container_of(device, struct rxe_dev,
+					   ib_dev.dev);
+	char *name;
+
+	name = rxe->ifc_ops->parent_name(rxe, 1);
+	return snprintf(buf, 16, "%s\n", name);
+}
+
+static DEVICE_ATTR(parent, S_IRUGO, rxe_show_parent, NULL);
+
+static struct device_attribute *rxe_dev_attributes[] = {
+	&dev_attr_skb_num,
+	&dev_attr_parent,
+};
+
+int rxe_register_device(struct rxe_dev *rxe)
+{
+	int err;
+	int i;
+	struct ib_device *dev = &rxe->ib_dev;
+
+	strlcpy(dev->name, "rxe%d", IB_DEVICE_NAME_MAX);
+	strlcpy(dev->node_desc, "rxe", sizeof(dev->node_desc));
+
+	dev->owner = THIS_MODULE;
+	dev->node_type = RDMA_NODE_IB_CA;
+	dev->phys_port_cnt = rxe->num_ports;
+	dev->num_comp_vectors = RXE_NUM_COMP_VECTORS;
+	dev->dma_device = rxe->ifc_ops->dma_device(rxe);
+	dev->local_dma_lkey = 0;	/* TODO */
+	dev->node_guid = rxe->ifc_ops->node_guid(rxe);
+	dev->dma_ops = &rxe_dma_mapping_ops;
+
+	dev->uverbs_abi_ver = RXE_UVERBS_ABI_VERSION;
+	dev->uverbs_cmd_mask = (1ull << IB_USER_VERBS_CMD_GET_CONTEXT)
+	    | (1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL)
+	    | (1ull << IB_USER_VERBS_CMD_QUERY_DEVICE)
+	    | (1ull << IB_USER_VERBS_CMD_QUERY_PORT)
+	    | (1ull << IB_USER_VERBS_CMD_ALLOC_PD)
+	    | (1ull << IB_USER_VERBS_CMD_DEALLOC_PD)
+	    | (1ull << IB_USER_VERBS_CMD_CREATE_SRQ)
+	    | (1ull << IB_USER_VERBS_CMD_MODIFY_SRQ)
+	    | (1ull << IB_USER_VERBS_CMD_QUERY_SRQ)
+	    | (1ull << IB_USER_VERBS_CMD_DESTROY_SRQ)
+	    | (1ull << IB_USER_VERBS_CMD_POST_SRQ_RECV)
+	    | (1ull << IB_USER_VERBS_CMD_CREATE_QP)
+	    | (1ull << IB_USER_VERBS_CMD_MODIFY_QP)
+	    | (1ull << IB_USER_VERBS_CMD_QUERY_QP)
+	    | (1ull << IB_USER_VERBS_CMD_DESTROY_QP)
+	    | (1ull << IB_USER_VERBS_CMD_POST_SEND)
+	    | (1ull << IB_USER_VERBS_CMD_POST_RECV)
+	    | (1ull << IB_USER_VERBS_CMD_CREATE_CQ)
+	    | (1ull << IB_USER_VERBS_CMD_RESIZE_CQ)
+	    | (1ull << IB_USER_VERBS_CMD_DESTROY_CQ)
+	    | (1ull << IB_USER_VERBS_CMD_POLL_CQ)
+	    | (1ull << IB_USER_VERBS_CMD_PEEK_CQ)
+	    | (1ull << IB_USER_VERBS_CMD_REQ_NOTIFY_CQ)
+	    | (1ull << IB_USER_VERBS_CMD_REG_MR)
+	    | (1ull << IB_USER_VERBS_CMD_DEREG_MR)
+	    | (1ull << IB_USER_VERBS_CMD_CREATE_AH)
+	    | (1ull << IB_USER_VERBS_CMD_MODIFY_AH)
+	    | (1ull << IB_USER_VERBS_CMD_QUERY_AH)
+	    | (1ull << IB_USER_VERBS_CMD_DESTROY_AH)
+	    | (1ull << IB_USER_VERBS_CMD_ATTACH_MCAST)
+	    | (1ull << IB_USER_VERBS_CMD_DETACH_MCAST)
+	    ;
+
+	dev->query_device = rxe_query_device;
+	dev->modify_device = rxe_modify_device;
+	dev->query_port = rxe_query_port;
+	dev->modify_port = rxe_modify_port;
+	dev->get_link_layer = rxe_get_link_layer;
+	dev->query_gid = rxe_query_gid;
+	dev->query_pkey = rxe_query_pkey;
+	dev->alloc_ucontext = rxe_alloc_ucontext;
+	dev->dealloc_ucontext = rxe_dealloc_ucontext;
+	dev->mmap = rxe_mmap;
+	dev->alloc_pd = rxe_alloc_pd;
+	dev->dealloc_pd = rxe_dealloc_pd;
+	dev->create_ah = rxe_create_ah;
+	dev->modify_ah = rxe_modify_ah;
+	dev->query_ah = rxe_query_ah;
+	dev->destroy_ah = rxe_destroy_ah;
+	dev->create_srq = rxe_create_srq;
+	dev->modify_srq = rxe_modify_srq;
+	dev->query_srq = rxe_query_srq;
+	dev->destroy_srq = rxe_destroy_srq;
+	dev->post_srq_recv = rxe_post_srq_recv;
+	dev->create_qp = rxe_create_qp;
+	dev->modify_qp = rxe_modify_qp;
+	dev->query_qp = rxe_query_qp;
+	dev->destroy_qp = rxe_destroy_qp;
+	dev->post_send = rxe_post_send;
+	dev->post_recv = rxe_post_recv;
+	dev->create_cq = rxe_create_cq;
+	dev->modify_cq = NULL;
+	dev->destroy_cq = rxe_destroy_cq;
+	dev->resize_cq = rxe_resize_cq;
+	dev->poll_cq = rxe_poll_cq;
+	dev->peek_cq = rxe_peek_cq;
+	dev->req_notify_cq = rxe_req_notify_cq;
+	dev->req_ncomp_notif = rxe_req_ncomp_notif;
+	dev->get_dma_mr = rxe_get_dma_mr;
+	dev->reg_phys_mr = rxe_reg_phys_mr;
+	dev->reg_user_mr = rxe_reg_user_mr;
+	dev->rereg_phys_mr = rxe_rereg_phys_mr;
+	dev->dereg_mr = rxe_dereg_mr;
+	dev->alloc_fast_reg_mr = rxe_alloc_fast_reg_mr;
+	dev->alloc_fast_reg_page_list = rxe_alloc_fast_reg_page_list;
+	dev->free_fast_reg_page_list = rxe_free_fast_reg_page_list;
+	dev->alloc_mw = rxe_alloc_mw;
+	dev->bind_mw = rxe_bind_mw;
+	dev->dealloc_mw = rxe_dealloc_mw;
+	dev->alloc_fmr = rxe_alloc_fmr;
+	dev->map_phys_fmr = rxe_map_phys_fmr;
+	dev->unmap_fmr = rxe_unmap_fmr;
+	dev->dealloc_fmr = rxe_dealloc_fmr;
+	dev->attach_mcast = rxe_attach_mcast;
+	dev->detach_mcast = rxe_detach_mcast;
+	dev->process_mad = NULL;
+
+	err = ib_register_device(dev, NULL);
+	if (err) {
+		pr_warn("rxe_register_device failed, err = %d\n", err);
+		goto err1;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(rxe_dev_attributes); ++i) {
+		err = device_create_file(&dev->dev, rxe_dev_attributes[i]);
+		if (err) {
+			pr_warn("device_create_file failed, "
+				"i = %d, err = %d\n", i, err);
+			goto err2;
+		}
+	}
+
+	return 0;
+
+err2:
+	ib_unregister_device(dev);
+err1:
+	return err;
+}
+
+int rxe_unregister_device(struct rxe_dev *rxe)
+{
+	int i;
+	struct ib_device *dev = &rxe->ib_dev;
+
+	for (i = 0; i < ARRAY_SIZE(rxe_dev_attributes); ++i)
+		device_remove_file(&dev->dev, rxe_dev_attributes[i]);
+
+	ib_unregister_device(dev);
+
+	return 0;
+}
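
For reference, a minimal sketch of how a kernel caller exercises the post_send
path above; qp, dma_addr, len and mr are placeholders and the error handling is
trimmed, so this is illustrative rather than part of the patch:

	struct ib_sge sge = {
		.addr	= dma_addr,
		.length	= len,
		.lkey	= mr->lkey,
	};
	struct ib_send_wr wr = {
		.opcode		= IB_WR_SEND,
		.sg_list	= &sge,
		.num_sge	= 1,
		.send_flags	= IB_SEND_SIGNALED,
	};
	struct ib_send_wr *bad_wr;
	int err;

	err = ib_post_send(qp, &wr, &bad_wr);
	if (err)
		pr_warn("post_send failed at wr %p, err %d\n", bad_wr, err);

On failure bad_wr points at the first work request that was not posted, which
is the convention rxe_post_send implements above.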


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 14/37] add rxe_pool.h
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (12 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 13/37] add rxe_verbs.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 15/37] add rxe_pool.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (23 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_pool.h --]
[-- Type: text/plain, Size: 5375 bytes --]

Declarations for pools of RDMA objects.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_pool.h |  163 +++++++++++++++++++++++++++++++++++
 1 file changed, 163 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_pool.h
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_pool.h
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *	   Redistribution and use in source and binary forms, with or
+ *	   without modification, are permitted provided that the following
+ *	   conditions are met:
+ *
+ *		- Redistributions of source code must retain the above
+ *		  copyright notice, this list of conditions and the following
+ *		  disclaimer.
+ *
+ *		- Redistributions in binary form must reproduce the above
+ *		  copyright notice, this list of conditions and the following
+ *		  disclaimer in the documentation and/or other materials
+ *		  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_POOL_H
+#define RXE_POOL_H
+
+/* declarations for pools of managed objects */
+
+#define RXE_POOL_ALIGN		(16)
+#define RXE_POOL_CACHE_FLAGS	(0)
+
+enum rxe_pool_flags {
+	RXE_POOL_ATOMIC		= (1 << 0),
+	RXE_POOL_INDEX		= (1 << 1),
+	RXE_POOL_KEY		= (1 << 2),
+};
+
+enum rxe_elem_type {
+	RXE_TYPE_UC,
+	RXE_TYPE_PD,
+	RXE_TYPE_AH,
+	RXE_TYPE_SRQ,
+	RXE_TYPE_QP,
+	RXE_TYPE_CQ,
+	RXE_TYPE_MR,
+	RXE_TYPE_MW,
+	RXE_TYPE_FMR,
+	RXE_TYPE_MC_GRP,
+	RXE_TYPE_MC_ELEM,
+	RXE_NUM_TYPES,		/* keep me last */
+};
+
+struct rxe_type_info {
+	char			*name;
+	size_t			size;
+	void			(*cleanup)(void *obj);
+	enum rxe_pool_flags	flags;
+	u32			max_index;
+	u32			min_index;
+	size_t			key_offset;
+	size_t			key_size;
+	struct kmem_cache	*cache;
+};
+
+extern struct rxe_type_info rxe_type_info[];
+
+enum rxe_pool_state {
+	rxe_pool_invalid,
+	rxe_pool_valid,
+};
+
+struct rxe_pool_entry {
+	struct rxe_pool		*pool;
+	struct kref		ref_cnt;
+	struct list_head	list;
+
+	/* only used if index'ed or key'ed */
+	struct rb_node		node;
+	u32			index;
+};
+
+struct rxe_pool {
+	struct rxe_dev		*rxe;
+	spinlock_t              pool_lock;
+	size_t			elem_size;
+	struct kref		ref_cnt;
+	void			(*cleanup)(void *obj);
+	enum rxe_pool_state	state;
+	enum rxe_pool_flags	flags;
+	enum rxe_elem_type	type;
+
+	unsigned int		max_elem;
+	atomic_t		num_elem;
+
+	/* only used if index'ed or key'ed */
+	struct rb_root		tree;
+	unsigned long		*table;
+	size_t			table_size;
+	u32			max_index;
+	u32			min_index;
+	u32			last;
+	size_t			key_offset;
+	size_t			key_size;
+};
+
+/* initialize slab caches for managed objects */
+int __init rxe_cache_init(void);
+
+/* cleanup slab caches for managed objects */
+void __exit rxe_cache_exit(void);
+
+/* initialize a pool of objects with a given limit on
+   the number of elements. gets parameters from rxe_type_info.
+   pool elements will be allocated out of a slab cache */
+int rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
+		  enum rxe_elem_type type, u32 max_elem);
+
+/* free resources from object pool */
+int rxe_pool_cleanup(struct rxe_pool *pool);
+
+/* allocate an object from pool */
+void *rxe_alloc(struct rxe_pool *pool);
+
+/* assign an index to an indexed object and insert object into
+   pool's rb tree */
+void rxe_add_index(void *elem);
+
+/* drop an index and remove object from rb tree */
+void rxe_drop_index(void *elem);
+
+/* assign a key to a keyed object and insert object into
+   pool's rb tree */
+void rxe_add_key(void *elem, void *key);
+
+/* remove elem from rb tree */
+void rxe_drop_key(void *elem);
+
+/* lookup an indexed object from index. takes a reference on object */
+void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
+
+/* lookup keyed object from key. takes a reference on the object */
+void *rxe_pool_get_key(struct rxe_pool *pool, void *key);
+
+/* cleanup an object when all references are dropped */
+void rxe_elem_release(struct kref *kref);
+
+/* take a reference on an object */
+#define rxe_add_ref(elem) kref_get(&(elem)->pelem.ref_cnt)
+
+/* drop a reference on an object */
+#define rxe_drop_ref(elem) kref_put(&(elem)->pelem.ref_cnt, rxe_elem_release)
+
+#endif /* RXE_POOL_H */
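
For orientation, a usage sketch of the pool API declared above, assuming the
device keeps its QP pool in rxe->qp_pool the same way rxe_verbs.c uses cq_pool
and mr_pool; the function itself is illustrative:

	/* allocate and index a QP object from its pool */
	static struct rxe_qp *example_alloc_qp(struct rxe_dev *rxe)
	{
		struct rxe_qp *qp = rxe_alloc(&rxe->qp_pool);

		if (!qp)
			return NULL;

		rxe_add_index(qp);	/* visible to rxe_pool_get_index() */
		return qp;
	}

Teardown mirrors this: rxe_drop_index() then rxe_drop_ref(); the final put
runs the type's cleanup callback. The rxe_add_ref/rxe_drop_ref macros assume
the rxe_pool_entry is embedded in the object as a field named pelem.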


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 15/37] add rxe_pool.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (13 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 14/37] add rxe_pool.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 16/37] add rxe_task.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (22 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_pool.c --]
[-- Type: text/plain, Size: 13275 bytes --]

Implementation of pools of RDMA objects.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_pool.c |  520 +++++++++++++++++++++++++++++++++++
 1 file changed, 520 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_pool.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_pool.c
@@ -0,0 +1,520 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *	   Redistribution and use in source and binary forms, with or
+ *	   without modification, are permitted provided that the following
+ *	   conditions are met:
+ *
+ *		- Redistributions of source code must retain the above
+ *		  copyright notice, this list of conditions and the following
+ *		  disclaimer.
+ *
+ *		- Redistributions in binary form must reproduce the above
+ *		  copyright notice, this list of conditions and the following
+ *		  disclaimer in the documentation and/or other materials
+ *		  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+/* info about object pools
+   note that mr, fmr and mw share a single index space
+   so that one can map an lkey to the correct type of object */
+struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = {
+	[RXE_TYPE_UC] = {
+		.name		= "rxe_uc",
+		.size		= sizeof(struct rxe_ucontext),
+	},
+	[RXE_TYPE_PD] = {
+		.name		= "rxe_pd",
+		.size		= sizeof(struct rxe_pd),
+	},
+	[RXE_TYPE_AH] = {
+		.name		= "rxe_ah",
+		.size		= sizeof(struct rxe_ah),
+		.flags		= RXE_POOL_ATOMIC,
+	},
+	[RXE_TYPE_SRQ] = {
+		.name		= "rxe_srq",
+		.size		= sizeof(struct rxe_srq),
+		.flags		= RXE_POOL_INDEX,
+		.min_index	= RXE_MIN_SRQ_INDEX,
+		.max_index	= RXE_MAX_SRQ_INDEX,
+		.cleanup	= rxe_srq_cleanup,
+	},
+	[RXE_TYPE_QP] = {
+		.name		= "rxe_qp",
+		.size		= sizeof(struct rxe_qp),
+		.cleanup	= rxe_qp_cleanup,
+		.flags		= RXE_POOL_INDEX,
+		.min_index	= RXE_MIN_QP_INDEX,
+		.max_index	= RXE_MAX_QP_INDEX,
+	},
+	[RXE_TYPE_CQ] = {
+		.name		= "rxe_cq",
+		.size		= sizeof(struct rxe_cq),
+		.cleanup	= rxe_cq_cleanup,
+	},
+	[RXE_TYPE_MR] = {
+		.name		= "rxe_mr",
+		.size		= sizeof(struct rxe_mem),
+		.cleanup	= rxe_mem_cleanup,
+		.flags		= RXE_POOL_INDEX,
+		.max_index	= RXE_MAX_MR_INDEX,
+		.min_index	= RXE_MIN_MR_INDEX,
+	},
+	[RXE_TYPE_FMR] = {
+		.name		= "rxe_fmr",
+		.size		= sizeof(struct rxe_mem),
+		.cleanup	= rxe_mem_cleanup,
+		.flags		= RXE_POOL_INDEX,
+		.max_index	= RXE_MAX_FMR_INDEX,
+		.min_index	= RXE_MIN_FMR_INDEX,
+	},
+	[RXE_TYPE_MW] = {
+		.name		= "rxe_mw",
+		.size		= sizeof(struct rxe_mem),
+		.flags		= RXE_POOL_INDEX,
+		.max_index	= RXE_MAX_MW_INDEX,
+		.min_index	= RXE_MIN_MW_INDEX,
+	},
+	[RXE_TYPE_MC_GRP] = {
+		.name		= "rxe_mc_grp",
+		.size		= sizeof(struct rxe_mc_grp),
+		.cleanup	= rxe_mc_cleanup,
+		.flags		= RXE_POOL_KEY,
+		.key_offset	= offsetof(struct rxe_mc_grp, mgid),
+		.key_size	= sizeof(union ib_gid),
+	},
+	[RXE_TYPE_MC_ELEM] = {
+		.name		= "rxe_mc_elem",
+		.size		= sizeof(struct rxe_mc_elem),
+		.flags		= RXE_POOL_ATOMIC,
+	},
+};
+
+static inline char *pool_name(struct rxe_pool *pool)
+{
+	return rxe_type_info[pool->type].name + 4;
+}
+
+static inline struct kmem_cache *pool_cache(struct rxe_pool *pool)
+{
+	return rxe_type_info[pool->type].cache;
+}
+
+static inline enum rxe_elem_type rxe_type(void *arg)
+{
+	struct rxe_pool_entry *elem = arg;
+	return elem->pool->type;
+}
+
+int __init rxe_cache_init(void)
+{
+	int err;
+	int i;
+	size_t size;
+	struct rxe_type_info *type;
+
+	for (i = 0; i < RXE_NUM_TYPES; i++) {
+		type = &rxe_type_info[i];
+		size = (type->size + RXE_POOL_ALIGN - 1) &
+				~(RXE_POOL_ALIGN - 1);
+		type->cache = kmem_cache_create(type->name, size,
+				RXE_POOL_ALIGN,
+				RXE_POOL_CACHE_FLAGS, NULL);
+		if (!type->cache) {
+			pr_info("Unable to init kmem cache for %s\n",
+				type->name);
+			err = -ENOMEM;
+			goto err1;
+		}
+	}
+
+	return 0;
+
+err1:
+	while (--i >= 0) {
+		/* destroy only the caches created before the failure */
+		type = &rxe_type_info[i];
+		kmem_cache_destroy(type->cache);
+		type->cache = NULL;
+	}
+
+	return err;
+}
+
+void __exit rxe_cache_exit(void)
+{
+	int i;
+	struct rxe_type_info *type;
+
+	for (i = 0; i < RXE_NUM_TYPES; i++) {
+		type = &rxe_type_info[i];
+		kmem_cache_destroy(type->cache);
+		type->cache = NULL;
+	}
+}
+
+static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
+{
+	int err = 0;
+	size_t size;
+
+	if ((max - min + 1) < pool->max_elem) {
+		pr_warn("not enough indices for max_elem\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	pool->max_index = max;
+	pool->min_index = min;
+
+	size = BITS_TO_LONGS(max - min + 1) * sizeof(long);
+	pool->table = kmalloc(size, GFP_KERNEL);
+	if (!pool->table) {
+		pr_warn("no memory for bit table\n");
+		err = -ENOMEM;
+		goto out;
+	}
+
+	pool->table_size = size;
+	bitmap_zero(pool->table, max - min + 1);
+
+out:
+	return err;
+}
+
+int rxe_pool_init(
+	struct rxe_dev		*rxe,
+	struct rxe_pool		*pool,
+	enum rxe_elem_type	type,
+	unsigned		max_elem)
+{
+	int			err = 0;
+	size_t			size = rxe_type_info[type].size;
+
+	memset(pool, 0, sizeof(*pool));
+
+	pool->rxe		= rxe;
+	pool->type		= type;
+	pool->max_elem		= max_elem;
+	pool->elem_size		= (size + RXE_POOL_ALIGN - 1) &
+					~(RXE_POOL_ALIGN - 1);
+	pool->flags		= rxe_type_info[type].flags;
+	pool->tree		= RB_ROOT;
+	pool->cleanup		= rxe_type_info[type].cleanup;
+
+	atomic_set(&pool->num_elem, 0);
+
+	kref_init(&pool->ref_cnt);
+
+	spin_lock_init(&pool->pool_lock);
+
+	if (rxe_type_info[type].flags & RXE_POOL_INDEX) {
+		err = rxe_pool_init_index(pool,
+			rxe_type_info[type].max_index,
+			rxe_type_info[type].min_index);
+		if (err)
+			goto out;
+	}
+
+	if (rxe_type_info[type].flags & RXE_POOL_KEY) {
+		pool->key_offset = rxe_type_info[type].key_offset;
+		pool->key_size = rxe_type_info[type].key_size;
+	}
+
+	pool->state = rxe_pool_valid;
+
+out:
+	return err;
+}
+
+static void rxe_pool_release(struct kref *kref)
+{
+	struct rxe_pool *pool = container_of(kref, struct rxe_pool, ref_cnt);
+	unsigned long flags;
+
+	spin_lock_irqsave(&pool->pool_lock, flags);
+	pool->state = rxe_pool_invalid;
+	spin_unlock_irqrestore(&pool->pool_lock, flags);
+
+	kfree(pool->table);
+
+	return;
+}
+
+int rxe_pool_cleanup(struct rxe_pool *pool)
+{
+	int num_elem;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pool->pool_lock, flags);
+	pool->state = rxe_pool_invalid;
+	spin_unlock_irqrestore(&pool->pool_lock, flags);
+
+	num_elem = atomic_read(&pool->num_elem);
+	if (num_elem > 0)
+		pr_warn("%s pool destroyed with %d unfree'd elem\n",
+			 pool_name(pool), num_elem);
+
+	kref_put(&pool->ref_cnt, rxe_pool_release);
+
+	return 0;
+}
+
+static u32 alloc_index(struct rxe_pool *pool)
+{
+	u32 index;
+	u32 range = pool->max_index - pool->min_index + 1;
+
+	index = find_next_zero_bit(pool->table, range, pool->last);
+	if (index >= range)
+		index = find_first_zero_bit(pool->table, range);
+
+	set_bit(index, pool->table);
+	pool->last = index;
+	return index + pool->min_index;
+}
+
+static void insert_index(struct rxe_pool *pool, struct rxe_pool_entry *new)
+{
+	struct rb_node **link = &pool->tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct rxe_pool_entry *elem;
+
+	while (*link) {
+		parent = *link;
+		elem = rb_entry(parent, struct rxe_pool_entry, node);
+
+		if (elem->index == new->index)
+			goto out;
+
+		if (elem->index > new->index)
+			link = &(*link)->rb_left;
+		else
+			link = &(*link)->rb_right;
+	}
+
+	rb_link_node(&new->node, parent, link);
+	rb_insert_color(&new->node, &pool->tree);
+out:
+	return;
+}
+
+static void insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new)
+{
+	struct rb_node **link = &pool->tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct rxe_pool_entry *elem;
+	int cmp;
+
+	while (*link) {
+		parent = *link;
+		elem = rb_entry(parent, struct rxe_pool_entry, node);
+
+		cmp = memcmp((u8 *)elem + pool->key_offset,
+			(u8 *)new + pool->key_offset, pool->key_size);
+
+		if (cmp == 0)
+			goto out;
+
+		if (cmp > 0)
+			link = &(*link)->rb_left;
+		else
+			link = &(*link)->rb_right;
+	}
+
+	rb_link_node(&new->node, parent, link);
+	rb_insert_color(&new->node, &pool->tree);
+out:
+	return;
+}
+
+void rxe_add_key(void *arg, void *key)
+{
+	struct rxe_pool_entry *elem = arg;
+	struct rxe_pool *pool = elem->pool;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pool->pool_lock, flags);
+	memcpy((u8 *)elem + pool->key_offset, key, pool->key_size);
+	insert_key(pool, elem);
+	spin_unlock_irqrestore(&pool->pool_lock, flags);
+	return;
+}
+
+void rxe_drop_key(void *arg)
+{
+	struct rxe_pool_entry *elem = arg;
+	struct rxe_pool *pool = elem->pool;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pool->pool_lock, flags);
+	rb_erase(&elem->node, &pool->tree);
+	spin_unlock_irqrestore(&pool->pool_lock, flags);
+	return;
+}
+
+void rxe_add_index(void *arg)
+{
+	struct rxe_pool_entry *elem = arg;
+	struct rxe_pool *pool = elem->pool;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pool->pool_lock, flags);
+	elem->index = alloc_index(pool);
+	insert_index(pool, elem);
+	spin_unlock_irqrestore(&pool->pool_lock, flags);
+	return;
+}
+
+void rxe_drop_index(void *arg)
+{
+	struct rxe_pool_entry *elem = arg;
+	struct rxe_pool *pool = elem->pool;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pool->pool_lock, flags);
+	clear_bit(elem->index - pool->min_index, pool->table);
+	rb_erase(&elem->node, &pool->tree);
+	spin_unlock_irqrestore(&pool->pool_lock, flags);
+	return;
+}
+
+void *rxe_alloc(struct rxe_pool *pool)
+{
+	struct rxe_pool_entry *elem;
+	unsigned long flags;
+
+	if (!(pool->flags & RXE_POOL_ATOMIC)
+	    && (in_irq() || irqs_disabled())) {
+		pr_warn("pool alloc %s in context %d/%d\n",
+		       pool_name(pool), (int)in_irq(),
+		       (int)irqs_disabled());
+	}
+
+	spin_lock_irqsave(&pool->pool_lock, flags);
+	if (pool->state != rxe_pool_valid) {
+		spin_unlock_irqrestore(&pool->pool_lock, flags);
+		return NULL;
+	}
+	kref_get(&pool->ref_cnt);
+	spin_unlock_irqrestore(&pool->pool_lock, flags);
+
+	kref_get(&pool->rxe->ref_cnt);
+
+	if (atomic_inc_return(&pool->num_elem) > pool->max_elem) {
+		atomic_dec(&pool->num_elem);
+		kref_put(&pool->rxe->ref_cnt, rxe_release);
+		kref_put(&pool->ref_cnt, rxe_pool_release);
+		return NULL;
+	}
+
+	elem = kmem_cache_zalloc(pool_cache(pool),
+				 (pool->flags & RXE_POOL_ATOMIC) ?
+				 GFP_ATOMIC : GFP_KERNEL);
+	if (!elem) {
+		/* undo the counts and references taken above */
+		atomic_dec(&pool->num_elem);
+		kref_put(&pool->rxe->ref_cnt, rxe_release);
+		kref_put(&pool->ref_cnt, rxe_pool_release);
+		return NULL;
+	}
+
+	elem->pool = pool;
+	kref_init(&elem->ref_cnt);
+
+	return elem;
+}
+
+void rxe_elem_release(struct kref *kref)
+{
+	struct rxe_pool_entry *elem =
+		container_of(kref, struct rxe_pool_entry, ref_cnt);
+	struct rxe_pool *pool = elem->pool;
+
+	if (pool->cleanup)
+		pool->cleanup(elem);
+
+	kmem_cache_free(pool_cache(pool), elem);
+	atomic_dec(&pool->num_elem);
+	kref_put(&pool->rxe->ref_cnt, rxe_release);
+	kref_put(&pool->ref_cnt, rxe_pool_release);
+}
+
+void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
+{
+	struct rb_node *node = NULL;
+	struct rxe_pool_entry *elem = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pool->pool_lock, flags);
+
+	if (pool->state != rxe_pool_valid)
+		goto out;
+
+	node = pool->tree.rb_node;
+
+	while (node) {
+		elem = rb_entry(node, struct rxe_pool_entry, node);
+
+		if (elem->index > index)
+			node = node->rb_left;
+		else if (elem->index < index)
+			node = node->rb_right;
+		else
+			break;
+	}
+
+	if (node)
+		kref_get(&elem->ref_cnt);
+
+out:
+	spin_unlock_irqrestore(&pool->pool_lock, flags);
+	return node ? (void *)elem : NULL;
+}
+
+void *rxe_pool_get_key(struct rxe_pool *pool, void *key)
+{
+	struct rb_node *node = NULL;
+	struct rxe_pool_entry *elem = NULL;
+	int cmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pool->pool_lock, flags);
+
+	if (pool->state != rxe_pool_valid)
+		goto out;
+
+	node = pool->tree.rb_node;
+
+	while (node) {
+		elem = rb_entry(node, struct rxe_pool_entry, node);
+
+		cmp = memcmp((u8 *)elem + pool->key_offset,
+			     key, pool->key_size);
+
+		if (cmp > 0)
+			node = node->rb_left;
+		else if (cmp < 0)
+			node = node->rb_right;
+		else
+			break;
+	}
+
+	if (node)
+		kref_get(&elem->ref_cnt);
+
+out:
+	spin_unlock_irqrestore(&pool->pool_lock, flags);
+	return node ? ((void *)elem) : NULL;
+}
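
A lookup takes a reference that the caller must drop when done; a hedged
sketch using the mr_pool seen in rxe_verbs.c (the surrounding code is
illustrative):

	struct rxe_mem *mem;

	mem = rxe_pool_get_index(&rxe->mr_pool, index);
	if (!mem)
		return -EINVAL;

	/* ... use mem while the reference is held ... */

	rxe_drop_ref(mem);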


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 16/37] add rxe_task.h
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (14 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 15/37] add rxe_pool.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 17/37] add rxe_task.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (21 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_task.h --]
[-- Type: text/plain, Size: 3898 bytes --]

Declarations for tasklet handling.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_task.h |  107 +++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_task.h
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_task.h
@@ -0,0 +1,107 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *	   Redistribution and use in source and binary forms, with or
+ *	   without modification, are permitted provided that the following
+ *	   conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_TASK_H
+#define RXE_TASK_H
+
+enum {
+	TASK_STATE_START	= 0,
+	TASK_STATE_BUSY		= 1,
+	TASK_STATE_ARMED	= 2,
+};
+
+/*
+ * data structure to describe a 'task' which is a short
+ * function that returns 0 as long as it needs to be
+ * called again.
+ */
+struct rxe_task {
+	void			*obj;
+	int			*fast;
+	struct tasklet_struct	tasklet;
+	int			state;
+	spinlock_t		state_lock;
+	void			*arg;
+	int			(*func)(void *arg);
+	int			ret;
+	char			name[16];
+};
+
+/*
+ * init rxe_task structure
+ *	fast => address of control flag, likely a module parameter
+ *	arg  => parameter to pass to func
+ *	func => function to call until it returns != 0
+ */
+int rxe_init_task(void *obj, struct rxe_task *task, int *fast,
+		  void *arg, int (*func)(void *), char *name);
+
+/*
+ * cleanup task
+ */
+void rxe_cleanup_task(struct rxe_task *task);
+
+/*
+ * raw call to func in loop without any checking
+ * can call when tasklets are disabled
+ */
+int __rxe_do_task(struct rxe_task *task);
+
+/*
+ * common function called by any of the main tasklets.
+ * if someone is calling the function on its own
+ * thread (fast path) then they must mark the task busy.
+ * if there is any chance that there is additional
+ * work to do, someone must reschedule the task before
+ * leaving
+ */
+void rxe_do_task(unsigned long data);
+
+/*
+ * run a task, use fast path if no one else
+ * is currently running it and fast path is ok
+ * else schedule it to run as a tasklet
+ */
+void rxe_run_task(struct rxe_task *task, int sched);
+
+/*
+ * keep a task from scheduling
+ */
+void rxe_disable_task(struct rxe_task *task);
+
+/*
+ * allow task to run
+ */
+void rxe_enable_task(struct rxe_task *task);
+
+#endif /* RXE_TASK_H */
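
A minimal sketch of wiring a work function into a task; rxe_requester and
rxe_fast_req (a module parameter gating the fast path) are assumed names used
only for illustration, while qp->req.task matches the use in rxe_verbs.c:

	static int rxe_fast_req = 1;	/* 0 never, 1 if not in irq, >1 always */

	rxe_init_task(qp, &qp->req.task, &rxe_fast_req, qp, rxe_requester, "req");

	/* later, from a verbs call or on packet arrival */
	rxe_run_task(&qp->req.task, 0);	/* may run inline if the fast path is ok */

	/* at teardown */
	rxe_cleanup_task(&qp->req.task);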


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 17/37] add rxe_task.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (15 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 16/37] add rxe_task.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 18/37] add rxe_av.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (20 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_task.c --]
[-- Type: text/plain, Size: 5072 bytes --]

Tasklet handling details.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_task.c |  169 +++++++++++++++++++++++++++++++++++
 1 file changed, 169 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_task.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_task.c
@@ -0,0 +1,169 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *	   Redistribution and use in source and binary forms, with or
+ *	   without modification, are permitted provided that the following
+ *	   conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/interrupt.h>
+#include <linux/hardirq.h>
+
+#include "rxe_task.h"
+
+int __rxe_do_task(struct rxe_task *task)
+
+{
+	int ret;
+
+	while ((ret = task->func(task->arg)) == 0)
+		;
+
+	task->ret = ret;
+
+	return ret;
+}
+
+/*
+ * this locking is due to a potential race where
+ * a second caller finds the task already running
+ * but checks the state just after the last call to func
+ */
+void rxe_do_task(unsigned long data)
+{
+	int cont;
+	int ret;
+	unsigned long flags;
+	struct rxe_task *task = (struct rxe_task *)data;
+
+	spin_lock_irqsave(&task->state_lock, flags);
+	switch (task->state) {
+	case TASK_STATE_START:
+		task->state = TASK_STATE_BUSY;
+		spin_unlock_irqrestore(&task->state_lock, flags);
+		break;
+
+	case TASK_STATE_BUSY:
+		task->state = TASK_STATE_ARMED;
+		/* fall through to the ARMED case */
+	case TASK_STATE_ARMED:
+		spin_unlock_irqrestore(&task->state_lock, flags);
+		return;
+
+	default:
+		spin_unlock_irqrestore(&task->state_lock, flags);
+		pr_warn("bad state = %d in rxe_do_task\n", task->state);
+		return;
+	}
+
+	do {
+		cont = 0;
+		ret = task->func(task->arg);
+
+		spin_lock_irqsave(&task->state_lock, flags);
+		switch (task->state) {
+		case TASK_STATE_BUSY:
+			if (ret)
+				task->state = TASK_STATE_START;
+			else
+				cont = 1;
+			break;
+
+		/* someone tried to run the task since the
+		   last time we called func, so we will call it
+		   one more time regardless of the return value */
+		case TASK_STATE_ARMED:
+			task->state = TASK_STATE_BUSY;
+			cont = 1;
+			break;
+
+		default:
+			pr_warn("bad state = %d in rxe_do_task\n",
+				task->state);
+		}
+		spin_unlock_irqrestore(&task->state_lock, flags);
+	} while (cont);
+
+	task->ret = ret;
+}
+
+int rxe_init_task(void *obj, struct rxe_task *task, int *fast,
+		  void *arg, int (*func)(void *), char *name)
+{
+	task->obj	= obj;
+	task->fast	= fast;
+	task->arg	= arg;
+	task->func	= func;
+	snprintf(task->name, sizeof(task->name), "%s", name);
+
+	tasklet_init(&task->tasklet, rxe_do_task, (unsigned long)task);
+
+	task->state = TASK_STATE_START;
+	spin_lock_init(&task->state_lock);
+
+	return 0;
+}
+
+void rxe_cleanup_task(struct rxe_task *task)
+{
+	tasklet_kill(&task->tasklet);
+}
+
+/*
+ * depending on value of fast allow bypassing
+ * tasklet call or not
+ *	 0 => never
+ *	 1 => only if not in interrupt level
+ *	>1 => always
+ */
+static inline int rxe_fast_path_ok(int fast)
+{
+	if (fast == 0 || (fast == 1 && (in_irq() || irqs_disabled())))
+		return 0;
+	else
+		return 1;
+}
+
+void rxe_run_task(struct rxe_task *task, int sched)
+{
+	if (sched || !rxe_fast_path_ok(*task->fast))
+		tasklet_schedule(&task->tasklet);
+	else
+		rxe_do_task((unsigned long)task);
+}
+
+void rxe_disable_task(struct rxe_task *task)
+{
+	tasklet_disable(&task->tasklet);
+}
+
+void rxe_enable_task(struct rxe_task *task)
+{
+	tasklet_enable(&task->tasklet);
+}
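
__rxe_do_task() is the raw loop intended for paths that have already quiesced
the tasklet; a sketch of that pattern using the req task named earlier (purely
illustrative):

	rxe_disable_task(&qp->req.task);
	__rxe_do_task(&qp->req.task);	/* run func until it returns nonzero */
	rxe_enable_task(&qp->req.task);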


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 18/37] add rxe_av.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (16 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 17/37] add rxe_task.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 19/37] add rxe_srq.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (19 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_av.c --]
[-- Type: text/plain, Size: 3174 bytes --]

Address vector implementation details.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_av.c |   75 +++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_av.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_av.c
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *	   Redistribution and use in source and binary forms, with or
+ *	   without modification, are permitted provided that the following
+ *	   conditions are met:
+ *
+ *		- Redistributions of source code must retain the above
+ *		  copyright notice, this list of conditions and the following
+ *		  disclaimer.
+ *
+ *		- Redistributions in binary form must reproduce the above
+ *		  copyright notice, this list of conditions and the following
+ *		  disclaimer in the documentation and/or other materials
+ *		  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* address handle implementation shared by ah and qp verbs */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+int rxe_av_chk_attr(struct rxe_dev *rxe, struct ib_ah_attr *attr)
+{
+	struct rxe_port *port;
+
+	if (attr->port_num < 1 || attr->port_num > rxe->num_ports) {
+		pr_info("rxe: invalid port_num = %d\n", attr->port_num);
+		return -EINVAL;
+	}
+
+	port = &rxe->port[attr->port_num - 1];
+
+	if (attr->ah_flags & IB_AH_GRH) {
+		if (attr->grh.sgid_index >= port->attr.gid_tbl_len) {
+			pr_info("rxe: invalid sgid index = %d\n",
+				 attr->grh.sgid_index);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+int rxe_av_from_attr(struct rxe_dev *rxe, u8 port_num,
+		     struct rxe_av *av, struct ib_ah_attr *attr)
+{
+	memset(av, 0, sizeof(*av));
+	av->attr = *attr;
+	av->attr.port_num = port_num;
+	return rxe->ifc_ops->init_av(rxe, attr, av);
+}
+
+int rxe_av_to_attr(struct rxe_dev *rxe, struct rxe_av *av,
+		   struct ib_ah_attr *attr)
+{
+	*attr = av->attr;
+	return 0;
+}
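
These helpers back the AH and QP verbs; a sketch of the create-AH flow that is
expected to use them (ah and its av/ibah members are assumptions used only for
illustration):

	err = rxe_av_chk_attr(rxe, attr);
	if (err)
		return ERR_PTR(err);

	err = rxe_av_from_attr(rxe, attr->port_num, &ah->av, attr);
	if (err)
		return ERR_PTR(err);

	return &ah->ibah;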


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 19/37] add rxe_srq.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (17 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 18/37] add rxe_av.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 20/37] add rxe_cq.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (18 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_srq.c --]
[-- Type: text/plain, Size: 6454 bytes --]

Shared receive queue implementation details.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_srq.c |  213 ++++++++++++++++++++++++++++++++++++
 1 file changed, 213 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_srq.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_srq.c
@@ -0,0 +1,213 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* srq implementation details */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+int rxe_srq_chk_attr(struct rxe_dev *rxe, struct rxe_srq *srq,
+		     struct ib_srq_attr *attr, enum ib_srq_attr_mask mask)
+{
+	if (srq && srq->error) {
+		pr_warn("srq in error state\n");
+		goto err1;
+	}
+
+	if (mask & IB_SRQ_MAX_WR) {
+		if (attr->max_wr > rxe->attr.max_srq_wr) {
+			pr_warn("max_wr(%d) > max_srq_wr(%d)\n",
+				 attr->max_wr, rxe->attr.max_srq_wr);
+			goto err1;
+		}
+
+		if (attr->max_wr <= 0) {
+			pr_warn("max_wr(%d) <= 0\n", attr->max_wr);
+			goto err1;
+		}
+
+		if (srq && !(rxe->attr.device_cap_flags &
+			IB_DEVICE_SRQ_RESIZE)) {
+			pr_warn("srq resize not supported\n");
+			goto err1;
+		}
+
+		if (srq && srq->limit && (attr->max_wr < srq->limit)) {
+			pr_warn("max_wr (%d) < srq->limit (%d)\n",
+				  attr->max_wr, srq->limit);
+			goto err1;
+		}
+
+		if (attr->max_wr < RXE_MIN_SRQ_WR)
+			attr->max_wr = RXE_MIN_SRQ_WR;
+	}
+
+	if (mask & IB_SRQ_LIMIT) {
+		if (attr->srq_limit > rxe->attr.max_srq_wr) {
+			pr_warn("srq_limit(%d) > max_srq_wr(%d)\n",
+				 attr->srq_limit, rxe->attr.max_srq_wr);
+			goto err1;
+		}
+
+		if (attr->srq_limit > srq->rq.queue->buf->index_mask) {
+			pr_warn("srq_limit (%d) > cur limit(%d)\n",
+				 attr->srq_limit,
+				 srq->rq.queue->buf->index_mask);
+			goto err1;
+		}
+	}
+
+	if (mask == IB_SRQ_INIT_MASK) {
+		if (attr->max_sge > rxe->attr.max_srq_sge) {
+			pr_warn("max_sge(%d) > max_srq_sge(%d)\n",
+				 attr->max_sge, rxe->attr.max_srq_sge);
+			goto err1;
+		}
+
+		if (attr->max_sge < RXE_MIN_SRQ_SGE)
+			attr->max_sge = RXE_MIN_SRQ_SGE;
+	}
+
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+int rxe_srq_from_init(struct rxe_dev *rxe, struct rxe_srq *srq,
+		      struct ib_srq_init_attr *init,
+		      struct ib_ucontext *context, struct ib_udata *udata)
+{
+	int err;
+	int srq_wqe_size;
+	struct rxe_queue *q;
+
+	srq->event_handler	= init->event_handler;
+	srq->context		= init->srq_context;
+	srq->limit		= init->attr.srq_limit;
+	srq->srq_num		= srq->pelem.index;
+	srq->rq.max_wr		= init->attr.max_wr;
+	srq->rq.max_sge		= init->attr.max_sge;
+
+	srq_wqe_size		= sizeof(struct rxe_recv_wqe) +
+				  srq->rq.max_sge*sizeof(struct ib_sge);
+
+	spin_lock_init(&srq->rq.producer_lock);
+	spin_lock_init(&srq->rq.consumer_lock);
+
+	q = rxe_queue_init(rxe, (unsigned int *)&srq->rq.max_wr,
+			   srq_wqe_size);
+	if (!q) {
+		pr_warn("unable to allocate queue for srq\n");
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	err = do_mmap_info(rxe, udata, 0, context, q->buf,
+			   q->buf_size, &q->ip);
+	if (err)
+		goto err2;
+
+	srq->rq.queue = q;
+
+	if (udata && udata->outlen >= sizeof(struct mminfo) + sizeof(u32)) {
+		if (copy_to_user(udata->outbuf + sizeof(struct mminfo),
+				 &srq->srq_num, sizeof(u32)))
+			return -EFAULT;
+	}
+
+	return 0;
+
+err2:
+	vfree(q->buf);
+	kfree(q);
+err1:
+	return err;
+}
+
+int rxe_srq_from_attr(struct rxe_dev *rxe, struct rxe_srq *srq,
+		      struct ib_srq_attr *attr, enum ib_srq_attr_mask mask,
+		      struct ib_udata *udata)
+{
+	int err;
+	struct rxe_queue *q = srq->rq.queue;
+	struct mminfo mi = { .offset = 1, .size = 0};
+
+	if (mask & IB_SRQ_MAX_WR) {
+		/* Check that we can write the mminfo struct to user space */
+		if (udata && udata->inlen >= sizeof(__u64)) {
+			__u64 mi_addr;
+
+			/* Get address of user space mminfo struct */
+			err = ib_copy_from_udata(&mi_addr, udata,
+						sizeof(mi_addr));
+			if (err)
+				goto err1;
+
+			udata->outbuf = (void __user *)(unsigned long)mi_addr;
+			udata->outlen = sizeof(mi);
+
+			err = ib_copy_to_udata(udata, &mi, sizeof(mi));
+			if (err)
+				goto err1;
+		}
+
+		err = rxe_queue_resize(q, (unsigned int *)&attr->max_wr,
+				       RCV_WQE_SIZE(srq->rq.max_sge),
+				       srq->rq.queue->ip ?
+						srq->rq.queue->ip->context :
+						NULL,
+				       udata, &srq->rq.producer_lock,
+				       &srq->rq.consumer_lock);
+		if (err)
+			goto err2;
+	}
+
+	if (mask & IB_SRQ_LIMIT)
+		srq->limit = attr->srq_limit;
+
+	return 0;
+
+err2:
+	rxe_queue_cleanup(q);
+	srq->rq.queue = NULL;
+err1:
+	return err;
+}
+
+void rxe_srq_cleanup(void *arg)
+{
+	struct rxe_srq *srq = (struct rxe_srq *)arg;
+
+	if (srq->rq.queue)
+		rxe_queue_cleanup(srq->rq.queue);
+}
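
The IB_SRQ_LIMIT handling above arms the SRQ limit event; a sketch of how a
kernel consumer sets it through the standard verbs entry point (the value 16
is arbitrary and the snippet is illustrative):

	struct ib_srq_attr attr = {
		.srq_limit = 16,	/* event fires when posted WRs drop below this */
	};

	err = ib_modify_srq(srq, &attr, IB_SRQ_LIMIT);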


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 20/37] add rxe_cq.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (18 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 19/37] add rxe_srq.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.112267527-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 21/37] add rxe_qp.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (17 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_cq.c --]
[-- Type: text/plain, Size: 5353 bytes --]

Completion queue implementation details.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_cq.c |  177 +++++++++++++++++++++++++++++++++++++
 1 file changed, 177 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_cq.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_cq.c
@@ -0,0 +1,177 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *	   Redistribution and use in source and binary forms, with or
+ *	   without modification, are permitted provided that the following
+ *	   conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* cq implementation details */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
+		    int cqe, int comp_vector, struct ib_udata *udata)
+{
+	int count;
+
+	if (cqe <= 0) {
+		pr_warn("cqe(%d) <= 0\n", cqe);
+		goto err1;
+	}
+
+	if (cqe > rxe->attr.max_cqe) {
+		pr_warn("cqe(%d) > max_cqe(%d)\n",
+			 cqe, rxe->attr.max_cqe);
+		goto err1;
+	}
+
+	if (cq) {
+		count = queue_count(cq->queue);
+		if (cqe < count) {
+			pr_warn("cqe(%d) < current # elements in queue (%d)",
+				cqe, count);
+			goto err1;
+		}
+	}
+
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+static void rxe_send_complete(unsigned long data)
+{
+	struct rxe_cq *cq = (struct rxe_cq *)data;
+
+	while (1) {
+		u8 notify = cq->notify;
+
+		cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
+
+		/* See if anything was added to the CQ during the
+		 * comp_handler call.  If so, go around again because we
+		 * won't be rescheduled.
+		 * XXX: is there a race on enqueue right after this test
+		 * but before we're out? */
+		if (notify == cq->notify)
+			return;
+	}
+}
+
+int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
+		     int comp_vector, struct ib_ucontext *context,
+		     struct ib_udata *udata)
+{
+	int err;
+
+	cq->queue = rxe_queue_init(rxe, (unsigned int *)&cqe,
+				   sizeof(struct rxe_cqe));
+	if (!cq->queue) {
+		pr_warn("unable to create cq\n");
+		return -ENOMEM;
+	}
+
+	err = do_mmap_info(rxe, udata, 0, context, cq->queue->buf,
+			   cq->queue->buf_size, &cq->queue->ip);
+	if (err) {
+		vfree(cq->queue->buf);
+		kfree(cq->queue);
+		cq->queue = NULL;
+		return err;
+	}
+
+	if (udata)
+		cq->is_user = 1;
+
+	tasklet_init(&cq->comp_task, rxe_send_complete, (unsigned long)cq);
+
+	spin_lock_init(&cq->cq_lock);
+	cq->ibcq.cqe = cqe;
+	return 0;
+}
+
+int rxe_cq_resize_queue(struct rxe_cq *cq, int cqe, struct ib_udata *udata)
+{
+	int err;
+
+	err = rxe_queue_resize(cq->queue, (unsigned int *)&cqe,
+			       sizeof(struct rxe_cqe),
+			       cq->queue->ip ? cq->queue->ip->context : NULL,
+			       udata, NULL, &cq->cq_lock);
+	if (!err)
+		cq->ibcq.cqe = cqe;
+
+	return err;
+}
+
+int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
+{
+	struct ib_event ev;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cq->cq_lock, flags);
+
+	if (unlikely(queue_full(cq->queue))) {
+		spin_unlock_irqrestore(&cq->cq_lock, flags);
+		if (cq->ibcq.event_handler) {
+			ev.device = cq->ibcq.device;
+			ev.element.cq = &cq->ibcq;
+			ev.event = IB_EVENT_CQ_ERR;
+			cq->ibcq.event_handler(&ev, cq->ibcq.cq_context);
+		}
+
+		return -EBUSY;
+	}
+
+	memcpy(producer_addr(cq->queue), cqe, sizeof(*cqe));
+
+	/* make sure all changes to the CQ are written before
+	   we update the producer pointer */
+	wmb();
+
+	advance_producer(cq->queue);
+	spin_unlock_irqrestore(&cq->cq_lock, flags);
+
+	if ((cq->notify == IB_CQ_NEXT_COMP) ||
+	    (cq->notify == IB_CQ_SOLICITED && solicited)) {
+		cq->notify++;
+		tasklet_schedule(&cq->comp_task);
+	}
+
+	return 0;
+}
+
+void rxe_cq_cleanup(void *arg)
+{
+	struct rxe_cq *cq = arg;
+
+	if (cq->queue)
+		rxe_queue_cleanup(cq->queue);
+}


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 21/37] add rxe_qp.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (19 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 20/37] add rxe_cq.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.163423781-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 22/37] add rxe_mr.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (16 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_qp.c --]
[-- Type: text/plain, Size: 21064 bytes --]

Queue pair implementation details.
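
One detail in rxe_qp_from_attr() below deserves a note: the retransmit
timer converts the 5-bit IB_QP_TIMEOUT value to jiffies via
usecs_to_jiffies(4ULL << timeout), approximating the IBTA local ACK
timeout of 4.096 us * 2^timeout. A stand-alone sketch of that
conversion (the helper name and the printed comparison are
illustrative only, not driver code):

#include <stdio.h>

static unsigned long long qp_timeout_usecs(unsigned int timeout)
{
	return timeout ? 4ULL << timeout : 0;	/* 0 disables the timer */
}

int main(void)
{
	unsigned int t;

	for (t = 12; t <= 16; t++)
		printf("timeout=%u -> %llu us (IBTA: %.1f us)\n",
		       t, qp_timeout_usecs(t), 4.096 * (double)(1ULL << t));
	return 0;
}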

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_qp.c |  821 +++++++++++++++++++++++++++++++++++++
 1 file changed, 821 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_qp.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_qp.c
@@ -0,0 +1,821 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *	   Redistribution and use in source and binary forms, with or
+ *	   without modification, are permitted provided that the following
+ *	   conditions are met:
+ *
+ *		- Redistributions of source code must retain the above
+ *		  copyright notice, this list of conditions and the following
+ *		  disclaimer.
+ *
+ *		- Redistributions in binary form must reproduce the above
+ *		  copyright notice, this list of conditions and the following
+ *		  disclaimer in the documentation and/or other materials
+ *		  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* qp implementation details */
+
+#include <linux/skbuff.h>
+#include <linux/delay.h>
+#include <linux/sched.h>
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+#include "rxe_task.h"
+
+char *rxe_qp_state_name[] = {
+	[QP_STATE_RESET]	= "RESET",
+	[QP_STATE_INIT]		= "INIT",
+	[QP_STATE_READY]	= "READY",
+	[QP_STATE_DRAIN]	= "DRAIN",
+	[QP_STATE_DRAINED]	= "DRAINED",
+	[QP_STATE_ERROR]	= "ERROR",
+};
+
+static int rxe_qp_chk_cap(struct rxe_dev *rxe, struct ib_qp_cap *cap,
+			  int has_srq)
+{
+	if (cap->max_send_wr > rxe->attr.max_qp_wr) {
+		pr_warn("invalid send wr = %d > %d\n",
+			cap->max_send_wr, rxe->attr.max_qp_wr);
+		goto err1;
+	}
+
+	if (cap->max_send_sge > rxe->attr.max_sge) {
+		pr_warn("invalid send sge = %d > %d\n",
+			cap->max_send_sge, rxe->attr.max_sge);
+		goto err1;
+	}
+
+	if (!has_srq) {
+		if (cap->max_recv_wr > rxe->attr.max_qp_wr) {
+			pr_warn("invalid recv wr = %d > %d\n",
+				cap->max_recv_wr, rxe->attr.max_qp_wr);
+			goto err1;
+		}
+
+		if (cap->max_recv_sge > rxe->attr.max_sge) {
+			pr_warn("invalid recv sge = %d > %d\n",
+				  cap->max_recv_sge, rxe->attr.max_sge);
+			goto err1;
+		}
+	}
+
+	if (cap->max_inline_data > rxe->max_inline_data) {
+		pr_warn("invalid max inline data = %d > %d\n",
+			cap->max_inline_data, rxe->max_inline_data);
+		goto err1;
+	}
+
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+int rxe_qp_chk_init(struct rxe_dev *rxe, struct ib_qp_init_attr *init)
+{
+	struct ib_qp_cap *cap = &init->cap;
+	struct rxe_port *port;
+	int port_num = init->port_num;
+
+	if (!init->recv_cq || !init->send_cq) {
+		pr_warn("missing cq\n");
+		goto err1;
+	}
+
+	if (rxe_qp_chk_cap(rxe, cap, init->srq != NULL))
+		goto err1;
+
+	if (init->qp_type == IB_QPT_SMI || init->qp_type == IB_QPT_GSI) {
+		if (port_num < 1 || port_num > rxe->num_ports) {
+			pr_warn("invalid port = %d\n", port_num);
+			goto err1;
+		}
+
+		port = &rxe->port[port_num - 1];
+
+		if (init->qp_type == IB_QPT_SMI && port->qp_smi_index) {
+			pr_warn("SMI QP exists for port %d\n", port_num);
+			goto err1;
+		}
+
+		if (init->qp_type == IB_QPT_GSI && port->qp_gsi_index) {
+			pr_warn("GSI QP exists for port %d\n", port_num);
+			goto err1;
+		}
+	}
+
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+static int alloc_rd_atomic_resources(struct rxe_qp *qp, unsigned int n)
+{
+	qp->resp.res_head = 0;
+	qp->resp.res_tail = 0;
+	qp->resp.resources = kcalloc(n, sizeof(struct resp_res), GFP_KERNEL);
+
+	if (!qp->resp.resources)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void free_rd_atomic_resources(struct rxe_qp *qp)
+{
+	if (qp->resp.resources) {
+		int i;
+
+		for (i = 0; i < qp->attr.max_rd_atomic; i++) {
+			struct resp_res *res = &qp->resp.resources[i];
+			free_rd_atomic_resource(qp, res);
+		}
+		kfree(qp->resp.resources);
+		qp->resp.resources = NULL;
+	}
+}
+
+void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+
+	if (res->type == RXE_ATOMIC_MASK) {
+		rxe_drop_ref(qp);
+		kfree_skb(res->atomic.skb);
+		atomic_dec(&rxe->resp_skb_out);
+	} else if (res->type == RXE_READ_MASK) {
+		if (res->read.mr)
+			rxe_drop_ref(res->read.mr);
+	}
+	res->type = 0;
+}
+
+static void cleanup_rd_atomic_resources(struct rxe_qp *qp)
+{
+	int i;
+	struct resp_res *res;
+
+	if (qp->resp.resources) {
+		for (i = 0; i < qp->attr.max_rd_atomic; i++) {
+			res = &qp->resp.resources[i];
+			free_rd_atomic_resource(qp, res);
+		}
+	}
+}
+
+static void rxe_qp_init_misc(struct rxe_dev *rxe, struct rxe_qp *qp,
+			     struct ib_qp_init_attr *init)
+{
+	struct rxe_port *port;
+	u32 qpn;
+
+	qp->sq_sig_type		= init->sq_sig_type;
+	qp->attr.path_mtu	= 1;
+	qp->mtu			= 256;
+
+	qpn			= qp->pelem.index;
+	port			= &rxe->port[init->port_num - 1];
+
+	switch (init->qp_type) {
+	case IB_QPT_SMI:
+		qp->ibqp.qp_num		= 0;
+		port->qp_smi_index	= qpn;
+		qp->attr.port_num	= init->port_num;
+		break;
+
+	case IB_QPT_GSI:
+		qp->ibqp.qp_num		= 1;
+		port->qp_gsi_index	= qpn;
+		qp->attr.port_num	= init->port_num;
+		break;
+
+	default:
+		qp->ibqp.qp_num		= qpn;
+		break;
+	}
+
+	INIT_LIST_HEAD(&qp->arbiter_list);
+	INIT_LIST_HEAD(&qp->grp_list);
+
+	skb_queue_head_init(&qp->send_pkts);
+
+	spin_lock_init(&qp->grp_lock);
+	spin_lock_init(&qp->state_lock);
+
+	atomic_set(&qp->ssn, 0);
+	atomic_set(&qp->req_skb_in, 0);
+	atomic_set(&qp->resp_skb_in, 0);
+	atomic_set(&qp->req_skb_out, 0);
+	atomic_set(&qp->resp_skb_out, 0);
+}
+
+static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
+			   struct ib_qp_init_attr *init,
+			   struct ib_ucontext *context, struct ib_udata *udata)
+{
+	int err;
+	int wqe_size;
+
+	qp->sq.max_wr		= init->cap.max_send_wr;
+	qp->sq.max_sge		= init->cap.max_send_sge;
+	qp->sq.max_inline	= init->cap.max_inline_data;
+
+	wqe_size = max_t(int, sizeof(struct rxe_send_wqe) +
+				qp->sq.max_sge*sizeof(struct ib_sge),
+				sizeof(struct rxe_send_wqe) +
+					qp->sq.max_inline);
+
+	qp->sq.queue		= rxe_queue_init(rxe,
+						 (unsigned int *)&qp->sq.max_wr,
+						 wqe_size);
+	if (!qp->sq.queue)
+		return -ENOMEM;
+
+	err = do_mmap_info(rxe, udata, sizeof(struct mminfo),
+			   context, qp->sq.queue->buf,
+			   qp->sq.queue->buf_size, &qp->sq.queue->ip);
+
+	if (err) {
+		vfree(qp->sq.queue->buf);
+		kfree(qp->sq.queue);
+		qp->sq.queue = NULL;
+		return err;
+	}
+
+	qp->req.wqe_index	= producer_index(qp->sq.queue);
+	qp->req.state		= QP_STATE_RESET;
+	qp->req.opcode		= -1;
+	qp->comp.opcode		= -1;
+
+	spin_lock_init(&qp->sq.sq_lock);
+	skb_queue_head_init(&qp->req_pkts);
+
+	rxe_init_task(rxe, &qp->req.task, &rxe_fast_req, qp,
+		      rxe_requester, "req");
+	rxe_init_task(rxe, &qp->comp.task, &rxe_fast_comp, qp,
+		      rxe_completer, "comp");
+
+	init_timer(&qp->rnr_nak_timer);
+	qp->rnr_nak_timer.function = rnr_nak_timer;
+	qp->rnr_nak_timer.data = (unsigned long)qp;
+
+	init_timer(&qp->retrans_timer);
+	qp->retrans_timer.function = retransmit_timer;
+	qp->retrans_timer.data = (unsigned long)qp;
+	qp->qp_timeout_jiffies = 0; /* Can't be set for UD/UC in modify_qp */
+
+	return 0;
+}
+
+static int rxe_qp_init_resp(struct rxe_dev *rxe, struct rxe_qp *qp,
+			    struct ib_qp_init_attr *init,
+			    struct ib_ucontext *context, struct ib_udata *udata)
+{
+	int err;
+	int wqe_size;
+
+	if (!qp->srq) {
+		qp->rq.max_wr		= init->cap.max_recv_wr;
+		qp->rq.max_sge		= init->cap.max_recv_sge;
+
+		wqe_size = sizeof(struct rxe_recv_wqe) +
+			       qp->rq.max_sge*sizeof(struct ib_sge);
+
+		pr_debug("max_wr = %d, max_sge = %d, wqe_size = %d\n",
+			 qp->rq.max_wr, qp->rq.max_sge, wqe_size);
+
+		qp->rq.queue		= rxe_queue_init(rxe,
+						(unsigned int *)&qp->rq.max_wr,
+						wqe_size);
+		if (!qp->rq.queue)
+			return -ENOMEM;
+
+		err = do_mmap_info(rxe, udata, 0, context, qp->rq.queue->buf,
+				   qp->rq.queue->buf_size, &qp->rq.queue->ip);
+		if (err) {
+			vfree(qp->rq.queue->buf);
+			kfree(qp->rq.queue);
+			qp->rq.queue = NULL;
+			return err;
+		}
+
+	}
+
+	spin_lock_init(&qp->rq.producer_lock);
+	spin_lock_init(&qp->rq.consumer_lock);
+
+	skb_queue_head_init(&qp->resp_pkts);
+
+	rxe_init_task(rxe, &qp->resp.task, &rxe_fast_resp, qp,
+		      rxe_responder, "resp");
+
+	qp->resp.opcode		= OPCODE_NONE;
+	qp->resp.msn		= 0;
+	qp->resp.state		= QP_STATE_RESET;
+
+	return 0;
+}
+
+/* called by the create qp verb */
+int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
+		     struct ib_qp_init_attr *init, struct ib_udata *udata,
+		     struct ib_pd *ibpd)
+{
+	int err;
+	struct rxe_cq *rcq = to_rcq(init->recv_cq);
+	struct rxe_cq *scq = to_rcq(init->send_cq);
+	struct rxe_srq *srq = init->srq ? to_rsrq(init->srq) : NULL;
+	struct ib_ucontext *context = udata ? ibpd->uobject->context : NULL;
+
+	rxe_add_ref(pd);
+	rxe_add_ref(rcq);
+	rxe_add_ref(scq);
+	if (srq)
+		rxe_add_ref(srq);
+
+	qp->pd			= pd;
+	qp->rcq			= rcq;
+	qp->scq			= scq;
+	qp->srq			= srq;
+	qp->udata		= udata;
+
+	rxe_qp_init_misc(rxe, qp, init);
+
+	err = rxe_qp_init_req(rxe, qp, init, context, udata);
+	if (err)
+		goto err1;
+
+	err = rxe_qp_init_resp(rxe, qp, init, context, udata);
+	if (err)
+		goto err2;
+
+	qp->attr.qp_state = IB_QPS_RESET;
+	qp->valid = 1;
+
+	return 0;
+
+err2:
+	rxe_queue_cleanup(qp->sq.queue);
+err1:
+	if (srq)
+		rxe_drop_ref(srq);
+	rxe_drop_ref(scq);
+	rxe_drop_ref(rcq);
+	rxe_drop_ref(pd);
+
+	return err;
+}
+
+/* called by the query qp verb */
+int rxe_qp_to_init(struct rxe_qp *qp, struct ib_qp_init_attr *init)
+{
+	init->event_handler		= qp->ibqp.event_handler;
+	init->qp_context		= qp->ibqp.qp_context;
+	init->send_cq			= qp->ibqp.send_cq;
+	init->recv_cq			= qp->ibqp.recv_cq;
+	init->srq			= qp->ibqp.srq;
+
+	init->cap.max_send_wr		= qp->sq.max_wr;
+	init->cap.max_send_sge		= qp->sq.max_sge;
+	init->cap.max_inline_data	= qp->sq.max_inline;
+
+	if (!qp->srq) {
+		init->cap.max_recv_wr		= qp->rq.max_wr;
+		init->cap.max_recv_sge		= qp->rq.max_sge;
+	}
+
+	init->sq_sig_type		= qp->sq_sig_type;
+
+	init->qp_type			= qp->ibqp.qp_type;
+	init->port_num			= 1;
+
+	return 0;
+}
+
+/* called by the modify qp verb, this routine
+   checks all the parameters before making any changes */
+int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp *qp,
+		    struct ib_qp_attr *attr, int mask)
+{
+	enum ib_qp_state cur_state = (mask & IB_QP_CUR_STATE) ?
+					attr->cur_qp_state : qp->attr.qp_state;
+	enum ib_qp_state new_state = (mask & IB_QP_STATE) ?
+					attr->qp_state : cur_state;
+
+	if (!ib_modify_qp_is_ok(cur_state, new_state, qp_type(qp), mask)) {
+		pr_warn("invalid mask or state for qp\n");
+		goto err1;
+	}
+
+	if (mask & IB_QP_STATE) {
+		if (cur_state == IB_QPS_SQD) {
+			if (qp->req.state == QP_STATE_DRAIN &&
+					new_state != IB_QPS_ERR)
+				goto err1;
+		}
+	}
+
+	if (mask & IB_QP_PORT) {
+		if (attr->port_num < 1 || attr->port_num > rxe->num_ports) {
+			pr_warn("invalid port %d\n", attr->port_num);
+			goto err1;
+		}
+	}
+
+	if (mask & IB_QP_CAP && rxe_qp_chk_cap(rxe, &attr->cap,
+					       qp->srq != NULL))
+		goto err1;
+
+	if (mask & IB_QP_AV && rxe_av_chk_attr(rxe, &attr->ah_attr))
+		goto err1;
+
+	if (mask & IB_QP_ALT_PATH && rxe_av_chk_attr(rxe, &attr->alt_ah_attr))
+		goto err1;
+
+	if (mask & IB_QP_PATH_MTU) {
+		struct rxe_port *port = &rxe->port[qp->attr.port_num - 1];
+		enum rxe_mtu max_mtu = (enum rxe_mtu __force)port->attr.max_mtu;
+		enum rxe_mtu mtu = (enum rxe_mtu __force)attr->path_mtu;
+
+		if (mtu > max_mtu) {
+			pr_debug("invalid mtu (%d) > (%d)\n",
+				  rxe_mtu_enum_to_int(mtu),
+				  rxe_mtu_enum_to_int(max_mtu));
+			goto err1;
+		}
+	}
+
+	if (mask & IB_QP_MAX_QP_RD_ATOMIC) {
+		if (attr->max_rd_atomic > rxe->attr.max_qp_rd_atom) {
+			pr_warn("invalid max_rd_atomic %d > %d\n",
+				 attr->max_rd_atomic,
+				 rxe->attr.max_qp_rd_atom);
+			goto err1;
+		}
+	}
+
+	if (mask & IB_QP_TIMEOUT) {
+		if (attr->timeout > 31) {
+			pr_warn("invalid QP timeout %d > 31\n",
+				 attr->timeout);
+			goto err1;
+		}
+	}
+
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+/* move the qp to the reset state */
+static void rxe_qp_reset(struct rxe_qp *qp)
+{
+	/* stop tasks from running */
+	rxe_disable_task(&qp->resp.task);
+
+	/* stop request/comp */
+	if (qp_type(qp) == IB_QPT_RC)
+		rxe_disable_task(&qp->comp.task);
+	rxe_disable_task(&qp->req.task);
+
+	/* move qp to the reset state */
+	qp->req.state = QP_STATE_RESET;
+	qp->resp.state = QP_STATE_RESET;
+
+	/* let state machines reset themselves
+	   drain work and packet queues etc. */
+	__rxe_do_task(&qp->resp.task);
+
+	if (qp->sq.queue) {
+		__rxe_do_task(&qp->comp.task);
+		__rxe_do_task(&qp->req.task);
+	}
+
+	/* cleanup attributes */
+	atomic_set(&qp->ssn, 0);
+	qp->req.opcode = -1;
+	qp->req.need_retry = 0;
+	qp->req.noack_pkts = 0;
+	qp->resp.msn = 0;
+	qp->resp.opcode = -1;
+	qp->resp.drop_msg = 0;
+	qp->resp.goto_error = 0;
+	qp->resp.sent_psn_nak = 0;
+
+	if (qp->resp.mr) {
+		rxe_drop_ref(qp->resp.mr);
+		qp->resp.mr = NULL;
+	}
+
+	cleanup_rd_atomic_resources(qp);
+
+	/* reenable tasks */
+	rxe_enable_task(&qp->resp.task);
+
+	if (qp->sq.queue) {
+		if (qp_type(qp) == IB_QPT_RC)
+			rxe_enable_task(&qp->comp.task);
+
+		rxe_enable_task(&qp->req.task);
+	}
+}
+
+/* drain the send queue */
+static void rxe_qp_drain(struct rxe_qp *qp)
+{
+	if (qp->sq.queue) {
+		if (qp->req.state != QP_STATE_DRAINED) {
+			qp->req.state = QP_STATE_DRAIN;
+			if (qp_type(qp) == IB_QPT_RC)
+				rxe_run_task(&qp->comp.task, 1);
+			else
+				__rxe_do_task(&qp->comp.task);
+			rxe_run_task(&qp->req.task, 1);
+		}
+	}
+}
+
+/* move the qp to the error state */
+void rxe_qp_error(struct rxe_qp *qp)
+{
+	qp->req.state = QP_STATE_ERROR;
+	qp->resp.state = QP_STATE_ERROR;
+
+	/* drain work and packet queues */
+	rxe_run_task(&qp->resp.task, 1);
+
+	if (qp_type(qp) == IB_QPT_RC)
+		rxe_run_task(&qp->comp.task, 1);
+	else
+		__rxe_do_task(&qp->comp.task);
+	rxe_run_task(&qp->req.task, 1);
+}
+
+/* called by the modify qp verb */
+int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
+		     struct ib_udata *udata)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+
+	/* TODO should handle error by leaving old resources intact */
+	if (mask & IB_QP_MAX_QP_RD_ATOMIC) {
+		int max_rd_atomic = __roundup_pow_of_two(attr->max_rd_atomic);
+
+		free_rd_atomic_resources(qp);
+
+		err = alloc_rd_atomic_resources(qp, max_rd_atomic);
+		if (err)
+			return err;
+
+		qp->attr.max_rd_atomic = max_rd_atomic;
+		atomic_set(&qp->req.rd_atomic, max_rd_atomic);
+	}
+
+	if (mask & IB_QP_CUR_STATE)
+		qp->attr.cur_qp_state = attr->qp_state;
+
+	if (mask & IB_QP_EN_SQD_ASYNC_NOTIFY)
+		qp->attr.en_sqd_async_notify = attr->en_sqd_async_notify;
+
+	if (mask & IB_QP_ACCESS_FLAGS)
+		qp->attr.qp_access_flags = attr->qp_access_flags;
+
+	if (mask & IB_QP_PKEY_INDEX)
+		qp->attr.pkey_index = attr->pkey_index;
+
+	if (mask & IB_QP_PORT)
+		qp->attr.port_num = attr->port_num;
+
+	if (mask & IB_QP_QKEY)
+		qp->attr.qkey = attr->qkey;
+
+	if (mask & IB_QP_AV) {
+		rxe_av_from_attr(rxe, attr->port_num, &qp->pri_av,
+				 &attr->ah_attr);
+	}
+
+	if (mask & IB_QP_ALT_PATH) {
+		rxe_av_from_attr(rxe, attr->alt_port_num, &qp->alt_av,
+				 &attr->alt_ah_attr);
+		qp->attr.alt_port_num = attr->alt_port_num;
+		qp->attr.alt_pkey_index = attr->alt_pkey_index;
+		qp->attr.alt_timeout = attr->alt_timeout;
+	}
+
+	if (mask & IB_QP_PATH_MTU) {
+		qp->attr.path_mtu = attr->path_mtu;
+		qp->mtu = rxe_mtu_enum_to_int((enum rxe_mtu)attr->path_mtu);
+	}
+
+	if (mask & IB_QP_TIMEOUT) {
+		qp->attr.timeout = attr->timeout;
+		if (attr->timeout == 0) {
+			qp->qp_timeout_jiffies = 0;
+		} else {
+			int j = usecs_to_jiffies(4ULL << attr->timeout);
+			qp->qp_timeout_jiffies = j ? j : 1;
+		}
+	}
+
+	if (mask & IB_QP_RETRY_CNT) {
+		qp->attr.retry_cnt = attr->retry_cnt;
+		qp->comp.retry_cnt = attr->retry_cnt;
+		pr_debug("set retry count = %d\n", attr->retry_cnt);
+	}
+
+	if (mask & IB_QP_RNR_RETRY) {
+		qp->attr.rnr_retry = attr->rnr_retry;
+		qp->comp.rnr_retry = attr->rnr_retry;
+		pr_debug("set rnr retry count = %d\n", attr->rnr_retry);
+	}
+
+	if (mask & IB_QP_RQ_PSN) {
+		qp->attr.rq_psn = (attr->rq_psn & BTH_PSN_MASK);
+		qp->resp.psn = qp->attr.rq_psn;
+		pr_debug("set resp psn = 0x%x\n", qp->resp.psn);
+	}
+
+	if (mask & IB_QP_MIN_RNR_TIMER) {
+		qp->attr.min_rnr_timer = attr->min_rnr_timer;
+		pr_debug("set min rnr timer = 0x%x\n",
+			  attr->min_rnr_timer);
+	}
+
+	if (mask & IB_QP_SQ_PSN) {
+		qp->attr.sq_psn = (attr->sq_psn & BTH_PSN_MASK);
+		qp->req.psn = qp->attr.sq_psn;
+		qp->comp.psn = qp->attr.sq_psn;
+		pr_debug("set req psn = 0x%x\n", qp->req.psn);
+	}
+
+	if (mask & IB_QP_MAX_DEST_RD_ATOMIC) {
+		qp->attr.max_dest_rd_atomic =
+			__roundup_pow_of_two(attr->max_dest_rd_atomic);
+	}
+
+	if (mask & IB_QP_PATH_MIG_STATE)
+		qp->attr.path_mig_state = attr->path_mig_state;
+
+	if (mask & IB_QP_DEST_QPN)
+		qp->attr.dest_qp_num = attr->dest_qp_num;
+
+	if (mask & IB_QP_STATE) {
+		qp->attr.qp_state = attr->qp_state;
+
+		switch (attr->qp_state) {
+		case IB_QPS_RESET:
+			pr_debug("qp state -> RESET\n");
+			rxe_qp_reset(qp);
+			break;
+
+		case IB_QPS_INIT:
+			pr_debug("qp state -> INIT\n");
+			qp->req.state = QP_STATE_INIT;
+			qp->resp.state = QP_STATE_INIT;
+			break;
+
+		case IB_QPS_RTR:
+			pr_debug("qp state -> RTR\n");
+			qp->resp.state = QP_STATE_READY;
+			break;
+
+		case IB_QPS_RTS:
+			pr_debug("qp state -> RTS\n");
+			qp->req.state = QP_STATE_READY;
+			break;
+
+		case IB_QPS_SQD:
+			pr_debug("qp state -> SQD\n");
+			rxe_qp_drain(qp);
+			break;
+
+		case IB_QPS_SQE:
+			pr_warn("qp state -> SQE !!?\n");
+			/* Not possible from modify_qp. */
+			break;
+
+		case IB_QPS_ERR:
+			pr_debug("qp state -> ERR\n");
+			rxe_qp_error(qp);
+			break;
+		}
+	}
+
+	return 0;
+}
+
+/* called by the query qp verb */
+int rxe_qp_to_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+
+	*attr = qp->attr;
+
+	attr->rq_psn				= qp->resp.psn;
+	attr->sq_psn				= qp->req.psn;
+
+	attr->cap.max_send_wr			= qp->sq.max_wr;
+	attr->cap.max_send_sge			= qp->sq.max_sge;
+	attr->cap.max_inline_data		= qp->sq.max_inline;
+
+	if (!qp->srq) {
+		attr->cap.max_recv_wr		= qp->rq.max_wr;
+		attr->cap.max_recv_sge		= qp->rq.max_sge;
+	}
+
+	rxe_av_to_attr(rxe, &qp->pri_av, &attr->ah_attr);
+	rxe_av_to_attr(rxe, &qp->alt_av, &attr->alt_ah_attr);
+
+	if (qp->req.state == QP_STATE_DRAIN) {
+		attr->sq_draining = 1;
+		/* applications that get this state
+		 * typically spin on it. yield the
+		 * processor */
+		yield();
+	} else {
+		attr->sq_draining = 0;
+	}
+
+	pr_debug("attr->sq_draining = %d\n", attr->sq_draining);
+
+	return 0;
+}
+
+/* called by the destroy qp verb */
+void rxe_qp_destroy(struct rxe_qp *qp)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+
+	qp->valid = 0;
+	qp->qp_timeout_jiffies = 0;
+	rxe_cleanup_task(&qp->resp.task);
+
+	del_timer_sync(&qp->retrans_timer);
+	del_timer_sync(&qp->rnr_nak_timer);
+
+	rxe_cleanup_task(&qp->req.task);
+	if (qp_type(qp) == IB_QPT_RC)
+		rxe_cleanup_task(&qp->comp.task);
+
+	/* flush out any receive wr's or pending requests */
+	__rxe_do_task(&qp->req.task);
+	if (qp->sq.queue) {
+		__rxe_do_task(&qp->comp.task);
+		__rxe_do_task(&qp->req.task);
+	}
+
+	/* drain the output queue */
+	while (!list_empty(&qp->arbiter_list))
+		__rxe_do_task(&rxe->arbiter.task);
+}
+
+/* called when the last reference to the qp is dropped */
+void rxe_qp_cleanup(void *arg)
+{
+	struct rxe_qp *qp = arg;
+
+	rxe_drop_all_mcast_groups(qp);
+
+	if (qp->sq.queue)
+		rxe_queue_cleanup(qp->sq.queue);
+
+	if (qp->srq)
+		rxe_drop_ref(qp->srq);
+
+	if (qp->rq.queue)
+		rxe_queue_cleanup(qp->rq.queue);
+
+	if (qp->scq)
+		rxe_drop_ref(qp->scq);
+	if (qp->rcq)
+		rxe_drop_ref(qp->rcq);
+	if (qp->pd)
+		rxe_drop_ref(qp->pd);
+
+	if (qp->resp.mr) {
+		rxe_drop_ref(qp->resp.mr);
+		qp->resp.mr = NULL;
+	}
+
+	free_rd_atomic_resources(qp);
+}


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 22/37] add rxe_mr.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (20 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 21/37] add rxe_qp.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 23/37] add rxe_mcast.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (15 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_mr.c --]
[-- Type: text/plain, Size: 16701 bytes --]

Memory region implementation details.
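
The fast path of lookup_iova() below applies when every buffer in the
region has the same power-of-two size: the offset into the region
splits into a page offset (low page_shift bits), a buffer index within
a map (the next ilog2(RXE_BUF_PER_MAP) bits) and a map index (the
remaining bits). A stand-alone sketch of that split; the 4 KiB buffer
size, the 256 buffers per map and the function name are illustrative
assumptions rather than the driver's constants:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT	12			/* 4 KiB buffers */
#define BUF_PER_MAP	256			/* power of two */
#define MAP_SHIFT	8			/* ilog2(BUF_PER_MAP) */

static void split_iova(uint64_t iova, uint64_t region_iova,
		       int *map, int *buf, size_t *offset)
{
	uint64_t off = iova - region_iova;

	*offset = off & ((1ULL << PAGE_SHIFT) - 1);	/* within buffer */
	off >>= PAGE_SHIFT;
	*buf = off & (BUF_PER_MAP - 1);			/* buffer within map */
	*map = off >> MAP_SHIFT;			/* which map */
}

int main(void)
{
	int m, b;
	size_t o;

	split_iova(0x10000000ULL + 0x123456, 0x10000000ULL, &m, &b, &o);
	printf("map=%d buf=%d offset=%zu\n", m, b, o);
	return 0;
}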

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_mr.c |  768 +++++++++++++++++++++++++++++++++++++
 1 file changed, 768 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_mr.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_mr.c
@@ -0,0 +1,768 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+/*
+ * lfsr (linear feedback shift register) with period 255
+ */
+static u8 rxe_get_key(void)
+{
+	static unsigned key = 1;
+
+	key = key << 1;
+
+	key |= (0 != (key & 0x100)) ^ (0 != (key & 0x10))
+		^ (0 != (key & 0x80)) ^ (0 != (key & 0x40));
+
+	key &= 0xff;
+
+	return key;
+}
+
+int mem_check_range(struct rxe_mem *mem, u64 iova, size_t length)
+{
+	switch (mem->type) {
+	case RXE_MEM_TYPE_DMA:
+		return 0;
+
+	case RXE_MEM_TYPE_MR:
+	case RXE_MEM_TYPE_FMR:
+		return ((iova < mem->iova) ||
+			((iova + length) > (mem->iova + mem->length))) ?
+			-EFAULT : 0;
+
+	default:
+		return -EFAULT;
+	}
+}
+
+#define IB_ACCESS_REMOTE	(IB_ACCESS_REMOTE_READ		\
+				| IB_ACCESS_REMOTE_WRITE	\
+				| IB_ACCESS_REMOTE_ATOMIC)
+
+static void rxe_mem_init(int access, struct rxe_mem *mem)
+{
+	u32 lkey = mem->pelem.index << 8 | rxe_get_key();
+	u32 rkey = (access & IB_ACCESS_REMOTE) ? lkey : 0;
+
+	if (mem->pelem.pool->type == RXE_TYPE_MR) {
+		mem->ibmr.lkey		= lkey;
+		mem->ibmr.rkey		= rkey;
+	} else {
+		mem->ibfmr.lkey		= lkey;
+		mem->ibfmr.rkey		= rkey;
+	}
+
+	mem->pd			= NULL;
+	mem->umem		= NULL;
+	mem->lkey		= lkey;
+	mem->rkey		= rkey;
+	mem->state		= RXE_MEM_STATE_INVALID;
+	mem->type		= RXE_MEM_TYPE_NONE;
+	mem->va			= 0;
+	mem->iova		= 0;
+	mem->length		= 0;
+	mem->offset		= 0;
+	mem->access		= 0;
+	mem->page_shift		= 0;
+	mem->page_mask		= 0;
+	mem->map_shift		= ilog2(RXE_BUF_PER_MAP);
+	mem->map_mask		= 0;
+	mem->num_buf		= 0;
+	mem->max_buf		= 0;
+	mem->num_map		= 0;
+	mem->map		= NULL;
+}
+
+void rxe_mem_cleanup(void *arg)
+{
+	struct rxe_mem *mem = arg;
+	int i;
+
+	if (mem->umem)
+		ib_umem_release(mem->umem);
+
+	if (mem->map) {
+		for (i = 0; i < mem->num_map; i++)
+			kfree(mem->map[i]);
+
+		kfree(mem->map);
+	}
+}
+
+static int rxe_mem_alloc(struct rxe_dev *rxe, struct rxe_mem *mem, int num_buf)
+{
+	int i;
+	int num_map;
+	struct rxe_map **map = mem->map;
+
+	num_map = (num_buf + RXE_BUF_PER_MAP - 1) / RXE_BUF_PER_MAP;
+
+	mem->map = kmalloc(num_map*sizeof(*map), GFP_KERNEL);
+	if (!mem->map)
+		goto err1;
+
+	for (i = 0; i < num_map; i++) {
+		mem->map[i] = kmalloc(sizeof(**map), GFP_KERNEL);
+		if (!mem->map[i])
+			goto err2;
+	}
+
+	BUG_ON(!is_power_of_2(RXE_BUF_PER_MAP));
+
+	mem->map_shift	= ilog2(RXE_BUF_PER_MAP);
+	mem->map_mask	= RXE_BUF_PER_MAP - 1;
+
+	mem->num_buf = num_buf;
+	mem->num_map = num_map;
+	mem->max_buf = num_map*RXE_BUF_PER_MAP;
+
+	return 0;
+
+err2:
+	for (i--; i >= 0; i--)
+		kfree(mem->map[i]);
+
+	kfree(mem->map);
+err1:
+	return -ENOMEM;
+}
+
+int rxe_mem_init_dma(struct rxe_dev *rxe, struct rxe_pd *pd,
+	int access, struct rxe_mem *mem)
+{
+	rxe_mem_init(access, mem);
+
+	mem->pd			= pd;
+	mem->access		= access;
+	mem->state		= RXE_MEM_STATE_VALID;
+	mem->type		= RXE_MEM_TYPE_DMA;
+
+	return 0;
+}
+
+int rxe_mem_init_phys(struct rxe_dev *rxe, struct rxe_pd *pd, int access,
+		      u64 iova, struct ib_phys_buf *phys_buf, int num_buf,
+		      struct rxe_mem *mem)
+{
+	int i;
+	struct rxe_map **map;
+	struct ib_phys_buf *buf;
+	size_t length;
+	int err;
+	size_t min_size = (size_t)(-1L);
+	size_t max_size = 0;
+	int n;
+
+	rxe_mem_init(access, mem);
+
+	err = rxe_mem_alloc(rxe, mem, num_buf);
+	if (err)
+		goto err1;
+
+	length			= 0;
+	map			= mem->map;
+	buf			= map[0]->buf;
+	n			= 0;
+
+	for (i = 0; i < num_buf; i++) {
+		length	+= phys_buf->size;
+		max_size = max_t(int, max_size, phys_buf->size);
+		min_size = min_t(int, min_size, phys_buf->size);
+		*buf++	= *phys_buf++;
+		n++;
+
+		if (n == RXE_BUF_PER_MAP) {
+			map++;
+			buf = map[0]->buf;
+			n = 0;
+		}
+	}
+
+	if (max_size == min_size && is_power_of_2(max_size)) {
+		mem->page_shift		= ilog2(max_size);
+		mem->page_mask		= max_size - 1;
+	}
+
+	mem->pd			= pd;
+	mem->access		= access;
+	mem->iova		= iova;
+	mem->va			= iova;
+	mem->length		= length;
+	mem->state		= RXE_MEM_STATE_VALID;
+	mem->type		= RXE_MEM_TYPE_MR;
+
+	return 0;
+
+err1:
+	return err;
+}
+
+int rxe_mem_init_user(struct rxe_dev *rxe, struct rxe_pd *pd, u64 start,
+	u64 length, u64 iova, int access, struct ib_udata *udata,
+	struct rxe_mem *mem)
+{
+	int			i;
+	struct rxe_map		**map;
+	struct ib_phys_buf	*buf = NULL;
+	struct ib_umem		*umem;
+	struct ib_umem_chunk	*chunk;
+	int			num_buf;
+	void			*vaddr;
+	int err;
+
+	umem = ib_umem_get(pd->ibpd.uobject->context, start, length, access, 0);
+	if (IS_ERR(umem)) {
+		pr_warn("err %d from ib_umem_get\n",
+			 (int)PTR_ERR(umem));
+		err = -EINVAL;
+		goto err1;
+	}
+
+	mem->umem = umem;
+
+	num_buf = 0;
+	list_for_each_entry(chunk, &umem->chunk_list, list)
+		num_buf += chunk->nents;
+
+	rxe_mem_init(access, mem);
+
+	err = rxe_mem_alloc(rxe, mem, num_buf);
+	if (err) {
+		pr_warn("err %d from rxe_mem_alloc\n", err);
+		goto err1;
+	}
+
+	BUG_ON(!is_power_of_2(umem->page_size));
+
+	mem->page_shift		= ilog2(umem->page_size);
+	mem->page_mask		= umem->page_size - 1;
+
+	num_buf			= 0;
+	map			= mem->map;
+	if (length > 0) {
+		buf = map[0]->buf;
+
+		list_for_each_entry(chunk, &umem->chunk_list, list) {
+			for (i = 0; i < chunk->nents; i++) {
+				vaddr = page_address(sg_page(&chunk->
+							     page_list[i]));
+				if (!vaddr) {
+					pr_warn("null vaddr\n");
+					err = -ENOMEM;
+					goto err1;
+				}
+
+				buf->addr = (uintptr_t)vaddr;
+				buf->size = umem->page_size;
+				num_buf++;
+				buf++;
+
+				if (num_buf >= RXE_BUF_PER_MAP) {
+					map++;
+					buf = map[0]->buf;
+					num_buf = 0;
+				}
+			}
+		}
+	}
+
+	mem->pd			= pd;
+	mem->umem		= umem;
+	mem->access		= access;
+	mem->length		= length;
+	mem->iova		= iova;
+	mem->va			= start;
+	mem->offset		= umem->offset;
+	mem->state		= RXE_MEM_STATE_VALID;
+	mem->type		= RXE_MEM_TYPE_MR;
+
+	return 0;
+
+err1:
+	return err;
+}
+
+int rxe_mem_init_fast(struct rxe_dev *rxe, struct rxe_pd *pd,
+	int max_pages, struct rxe_mem *mem)
+{
+	int err;
+
+	rxe_mem_init(0, mem);	/* TODO what access does this have */
+
+	err = rxe_mem_alloc(rxe, mem, max_pages);
+	if (err)
+		goto err1;
+
+	/* TODO what page size do we assume */
+
+	mem->pd			= pd;
+	mem->max_buf		= max_pages;
+	mem->state		= RXE_MEM_STATE_FREE;
+	mem->type		= RXE_MEM_TYPE_MR;
+
+	return 0;
+
+err1:
+	return err;
+}
+
+int rxe_mem_init_mw(struct rxe_dev *rxe, struct rxe_pd *pd,
+		    struct rxe_mem *mem)
+{
+	rxe_mem_init(0, mem);
+
+	mem->pd			= pd;
+	mem->state		= RXE_MEM_STATE_FREE;
+	mem->type		= RXE_MEM_TYPE_MW;
+
+	return 0;
+}
+
+int rxe_mem_init_fmr(struct rxe_dev *rxe, struct rxe_pd *pd, int access,
+	struct ib_fmr_attr *attr, struct rxe_mem *mem)
+{
+	int err;
+
+	if (attr->max_maps > rxe->attr.max_map_per_fmr) {
+		pr_warn("max_maps = %d too big, max_map_per_fmr = %d\n",
+			attr->max_maps, rxe->attr.max_map_per_fmr);
+		err = -EINVAL;
+		goto err1;
+	}
+
+	rxe_mem_init(access, mem);
+
+	err = rxe_mem_alloc(rxe, mem, attr->max_pages);
+	if (err)
+		goto err1;
+
+	mem->pd			= pd;
+	mem->access		= access;
+	mem->page_shift		 = attr->page_shift;
+	mem->page_mask		= (1 << attr->page_shift) - 1;
+	mem->max_buf		= attr->max_pages;
+	mem->state		= RXE_MEM_STATE_FREE;
+	mem->type		= RXE_MEM_TYPE_FMR;
+
+	return 0;
+
+err1:
+	return err;
+}
+
+static void lookup_iova(
+	struct rxe_mem	*mem,
+	u64			iova,
+	int			*m_out,
+	int			*n_out,
+	size_t			*offset_out)
+{
+	size_t			offset = iova - mem->iova + mem->offset;
+	int			map_index;
+	int			buf_index;
+	u64			length;
+
+	if (likely(mem->page_shift)) {
+		*offset_out = offset & mem->page_mask;
+		offset >>= mem->page_shift;
+		*n_out = offset & mem->map_mask;
+		*m_out = offset >> mem->map_shift;
+	} else {
+		map_index = 0;
+		buf_index = 0;
+
+		length = mem->map[map_index]->buf[buf_index].size;
+
+		while (offset >= length) {
+			offset -= length;
+			buf_index++;
+
+			if (buf_index == RXE_BUF_PER_MAP) {
+				map_index++;
+				buf_index = 0;
+			}
+			length = mem->map[map_index]->buf[buf_index].size;
+		}
+
+		*m_out = map_index;
+		*n_out = buf_index;
+		*offset_out = offset;
+	}
+}
+
+void *iova_to_vaddr(struct rxe_mem *mem, u64 iova, int length)
+{
+	size_t offset;
+	int m, n;
+	void *addr;
+
+	if (mem->state != RXE_MEM_STATE_VALID) {
+		pr_warn("mem not in valid state\n");
+		addr = NULL;
+		goto out;
+	}
+
+	if (!mem->map) {
+		addr = (void *)(uintptr_t)iova;
+		goto out;
+	}
+
+	if (mem_check_range(mem, iova, length)) {
+		pr_warn("range violation\n");
+		addr = NULL;
+		goto out;
+	}
+
+	lookup_iova(mem, iova, &m, &n, &offset);
+
+	if (offset + length > mem->map[m]->buf[n].size) {
+		pr_warn("crosses page boundary\n");
+		addr = NULL;
+		goto out;
+	}
+
+	addr = (void *)(uintptr_t)mem->map[m]->buf[n].addr + offset;
+
+out:
+	return addr;
+}
+
+/* copy data from a range (vaddr, vaddr+length-1) to or from
+   a mem object starting at iova. Compute incremental value of
+   crc32 if crcp is not zero. caller must hold a reference to mem */
+int rxe_mem_copy(struct rxe_mem *mem, u64 iova, void *addr, int length,
+		 enum copy_direction dir, u32 *crcp)
+{
+	int			err;
+	int			bytes;
+	u8			*va;
+	struct rxe_map		**map;
+	struct ib_phys_buf	*buf;
+	int			m;
+	int			i;
+	size_t			offset;
+	u32			crc = crcp ? (*crcp) : 0;
+
+	if (mem->type == RXE_MEM_TYPE_DMA) {
+		uint8_t *src, *dest;
+
+		src  = (dir == direction_in) ?
+			addr : ((void *)(uintptr_t)iova);
+
+		dest = (dir == direction_in) ?
+			((void *)(uintptr_t)iova) : addr;
+
+		if (crcp)
+			*crcp = crc32_le(*crcp, src, length);
+
+		memcpy(dest, src, length);
+
+		return 0;
+	}
+
+	BUG_ON(!mem->map);
+
+	err = mem_check_range(mem, iova, length);
+	if (err) {
+		err = -EFAULT;
+		goto err1;
+	}
+
+	lookup_iova(mem, iova, &m, &i, &offset);
+
+	map	= mem->map + m;
+	buf	= map[0]->buf + i;
+
+	while (length > 0) {
+		uint8_t *src, *dest;
+
+		va	= (u8 *)(uintptr_t)buf->addr + offset;
+		src  = (dir == direction_in) ? addr : va;
+		dest = (dir == direction_in) ? va : addr;
+
+		bytes	= buf->size - offset;
+
+		if (bytes > length)
+			bytes = length;
+
+		if (crcp)
+			crc = crc32_le(crc, src, bytes);
+
+		memcpy(dest, src, bytes);
+
+		length	-= bytes;
+		addr	+= bytes;
+
+		offset	= 0;
+		buf++;
+		i++;
+
+		if (i == RXE_BUF_PER_MAP) {
+			i = 0;
+			map++;
+			buf = map[0]->buf;
+		}
+	}
+
+	if (crcp)
+		*crcp = crc;
+
+	return 0;
+
+err1:
+	return err;
+}
+
+/* copy data in or out of a wqe, i.e. sg list
+   under the control of a dma descriptor */
+int copy_data(
+	struct rxe_dev		*rxe,
+	struct rxe_pd		*pd,
+	int			access,
+	struct rxe_dma_info	*dma,
+	void			*addr,
+	int			length,
+	enum copy_direction	dir,
+	u32			*crcp)
+{
+	int			bytes;
+	struct ib_sge		*sge	= &dma->sge[dma->cur_sge];
+	int			offset	= dma->sge_offset;
+	int			resid	= dma->resid;
+	struct rxe_mem		*mem	= NULL;
+	u64			iova;
+	int			err;
+
+	if (length == 0)
+		return 0;
+
+	if (length > resid) {
+		err = -EINVAL;
+		goto err2;
+	}
+
+	if (sge->length && (offset < sge->length)) {
+		mem = lookup_mem(pd, access, sge->lkey, lookup_local);
+		if (!mem) {
+			err = -EINVAL;
+			goto err1;
+		}
+	}
+
+	while (length > 0) {
+		bytes = length;
+
+		if (offset >= sge->length) {
+			if (mem) {
+				rxe_drop_ref(mem);
+				mem = NULL;
+			}
+			sge++;
+			dma->cur_sge++;
+			offset = 0;
+
+			if (dma->cur_sge >= dma->num_sge) {
+				err = -ENOSPC;
+				goto err2;
+			}
+
+			if (sge->length) {
+				mem = lookup_mem(pd, access, sge->lkey,
+						 lookup_local);
+				if (!mem) {
+					err = -EINVAL;
+					goto err1;
+				}
+			} else
+				continue;
+		}
+
+		if (bytes > sge->length - offset)
+			bytes = sge->length - offset;
+
+		if (bytes > 0) {
+			iova = sge->addr + offset;
+
+			err = rxe_mem_copy(mem, iova, addr, bytes, dir, crcp);
+			if (err)
+				goto err2;
+
+			offset	+= bytes;
+			resid	-= bytes;
+			length	-= bytes;
+			addr	+= bytes;
+		}
+	}
+
+	dma->sge_offset = offset;
+	dma->resid	= resid;
+
+	if (mem)
+		rxe_drop_ref(mem);
+
+	return 0;
+
+err2:
+	if (mem)
+		rxe_drop_ref(mem);
+err1:
+	return err;
+}
+
+int advance_dma_data(struct rxe_dma_info *dma, unsigned int length)
+{
+	struct ib_sge		*sge	= &dma->sge[dma->cur_sge];
+	int			offset	= dma->sge_offset;
+	int			resid	= dma->resid;
+
+	while (length) {
+		unsigned int bytes;
+
+		if (offset >= sge->length) {
+			sge++;
+			dma->cur_sge++;
+			offset = 0;
+			if (dma->cur_sge >= dma->num_sge)
+				return -ENOSPC;
+		}
+
+		bytes = length;
+
+		if (bytes > sge->length - offset)
+			bytes = sge->length - offset;
+
+		offset	+= bytes;
+		resid	-= bytes;
+		length	-= bytes;
+	}
+
+	dma->sge_offset = offset;
+	dma->resid	= resid;
+
+	return 0;
+}
+
+/* (1) find the mem (mr, fmr or mw) corresponding to lkey/rkey
+       depending on lookup_type
+   (2) verify that the (qp) pd matches the mem pd
+   (3) verify that the mem can support the requested access
+   (4) verify that mem state is valid */
+struct rxe_mem *lookup_mem(struct rxe_pd *pd, int access, u32 key,
+			   enum lookup_type type)
+{
+	struct rxe_mem *mem;
+	struct rxe_dev *rxe = to_rdev(pd->ibpd.device);
+	int index = key >> 8;
+
+	if (index >= RXE_MIN_MR_INDEX && index <= RXE_MAX_MR_INDEX) {
+		mem = rxe_pool_get_index(&rxe->mr_pool, index);
+		if (!mem)
+			goto err1;
+	} else if (index >= RXE_MIN_FMR_INDEX && index <= RXE_MAX_FMR_INDEX) {
+		mem = rxe_pool_get_index(&rxe->fmr_pool, index);
+		if (!mem)
+			goto err1;
+	} else if (index >= RXE_MIN_MW_INDEX && index <= RXE_MAX_MW_INDEX) {
+		mem = rxe_pool_get_index(&rxe->mw_pool, index);
+		if (!mem)
+			goto err1;
+	} else
+		goto err1;
+
+	if ((type == lookup_local && mem->lkey != key)
+		|| (type == lookup_remote && mem->rkey != key))
+		goto err2;
+
+	if (mem->pd != pd)
+		goto err2;
+
+	if (access && !(access & mem->access))
+		goto err2;
+
+	if (mem->state != RXE_MEM_STATE_VALID)
+		goto err2;
+
+	return mem;
+
+err2:
+	rxe_drop_ref(mem);
+err1:
+	return NULL;
+}
+
+int rxe_mem_map_pages(struct rxe_dev *rxe, struct rxe_mem *mem,
+	u64 *page, int num_pages, u64 iova)
+{
+	int i;
+	int num_buf;
+	int err;
+	struct rxe_map **map;
+	struct ib_phys_buf *buf;
+	int page_size;
+
+	if (num_pages > mem->max_buf) {
+		err = -EINVAL;
+		goto err1;
+	}
+
+	num_buf		= 0;
+	page_size	= 1 << mem->page_shift;
+	map		= mem->map;
+	buf		= map[0]->buf;
+
+	for (i = 0; i < num_pages; i++) {
+		buf->addr = *page++;
+		buf->size = page_size;
+		buf++;
+		num_buf++;
+
+		if (num_buf == RXE_BUF_PER_MAP) {
+			map++;
+			buf = map[0]->buf;
+			num_buf = 0;
+		}
+	}
+
+	mem->iova	= iova;
+	mem->va		= iova;
+	mem->length	= num_pages << mem->page_shift;
+	mem->state	= RXE_MEM_STATE_VALID;
+
+	return 0;
+
+err1:
+	return err;
+}


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 23/37] add rxe_mcast.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (21 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 22/37] add rxe_mr.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.264963653-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 24/37] add rxe_recv.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (14 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_mcast.c --]
[-- Type: text/plain, Size: 5439 bytes --]

Multicast implementation details.
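
The attach path below is idempotent per QP and bounded by the device's
max_mcast_qp_attach attribute. A stand-alone, single-threaded sketch of
that bookkeeping (array-based, with illustrative names and limits; the
driver uses pools, linked lists and locks instead):

#include <stdio.h>

#define MAX_QP_ATTACH 4

struct mc_grp {
	int num_qp;
	int qpn[MAX_QP_ATTACH];
};

static int mcast_attach(struct mc_grp *grp, int qpn)
{
	int i;

	for (i = 0; i < grp->num_qp; i++)
		if (grp->qpn[i] == qpn)
			return 0;		/* already a member */

	if (grp->num_qp >= MAX_QP_ATTACH)
		return -1;			/* -ENOMEM in the driver */

	grp->qpn[grp->num_qp++] = qpn;
	return 0;
}

int main(void)
{
	struct mc_grp grp = { 0 };

	printf("first attach:  %d\n", mcast_attach(&grp, 17));
	printf("second attach: %d\n", mcast_attach(&grp, 17));
	printf("members:       %d\n", grp.num_qp);
	return 0;
}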

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_mcast.c |  192 ++++++++++++++++++++++++++++++++++
 1 file changed, 192 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_mcast.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_mcast.c
@@ -0,0 +1,192 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *	   Redistribution and use in source and binary forms, with or
+ *	   without modification, are permitted provided that the following
+ *	   conditions are met:
+ *
+ *		- Redistributions of source code must retain the above
+ *		  copyright notice, this list of conditions and the following
+ *		  disclaimer.
+ *
+ *		- Redistributions in binary form must reproduce the above
+ *		  copyright notice, this list of conditions and the following
+ *		  disclaimer in the documentation and/or other materials
+ *		  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* multicast implementation details */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid, u16 mlid,
+		      struct rxe_mc_grp **grp_p)
+{
+	int err;
+	struct rxe_mc_grp *grp;
+
+	if (rxe->attr.max_mcast_qp_attach == 0) {
+		err = -EINVAL;
+		goto err1;
+	}
+
+	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
+	if (grp)
+		goto done;
+
+	grp = rxe_alloc(&rxe->mc_grp_pool);
+	if (!grp) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	INIT_LIST_HEAD(&grp->qp_list);
+	spin_lock_init(&grp->mcg_lock);
+	grp->mlid = mlid;
+	grp->rxe = rxe;
+
+	err = rxe->ifc_ops->mcast_add(rxe, mgid);
+	if (err)
+		goto err2;
+
+	rxe_add_key(grp, mgid);
+done:
+	*grp_p = grp;
+	return 0;
+
+err2:
+	rxe_drop_ref(grp);
+err1:
+	return err;
+}
+
+int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+			   struct rxe_mc_grp *grp)
+{
+	int err;
+	struct rxe_mc_elem *elem;
+
+	/* check to see if the qp is already a member of the group */
+	spin_lock_bh(&qp->grp_lock);
+	spin_lock_bh(&grp->mcg_lock);
+	list_for_each_entry(elem, &grp->qp_list, qp_list) {
+		if (elem->qp == qp) {
+			err = 0;
+			goto out;
+		}
+	}
+
+	if (grp->num_qp >= rxe->attr.max_mcast_qp_attach) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	elem = rxe_alloc(&rxe->mc_elem_pool);
+	if (!elem) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	/* each qp holds a ref on the grp */
+	rxe_add_ref(grp);
+
+	grp->num_qp++;
+	elem->qp = qp;
+	elem->grp = grp;
+
+	list_add(&elem->qp_list, &grp->qp_list);
+	list_add(&elem->grp_list, &qp->grp_list);
+
+	err = 0;
+out:
+	spin_unlock_bh(&grp->mcg_lock);
+	spin_unlock_bh(&qp->grp_lock);
+	return err;
+}
+
+int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+			    union ib_gid *mgid, u16 mlid)
+{
+	struct rxe_mc_grp *grp;
+	struct rxe_mc_elem *elem, *tmp;
+
+	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
+	if (!grp)
+		goto err1;
+
+	spin_lock_bh(&qp->grp_lock);
+	spin_lock_bh(&grp->mcg_lock);
+
+	list_for_each_entry_safe(elem, tmp, &grp->qp_list, qp_list) {
+		if (elem->qp == qp) {
+			list_del(&elem->qp_list);
+			list_del(&elem->grp_list);
+			grp->num_qp--;
+
+			spin_unlock_bh(&grp->mcg_lock);
+			spin_unlock_bh(&qp->grp_lock);
+			rxe_drop_ref(elem);
+			rxe_drop_ref(grp);	/* ref held by QP */
+			rxe_drop_ref(grp);	/* ref from get_key */
+			return 0;
+		}
+	}
+
+	spin_unlock_bh(&grp->mcg_lock);
+	spin_unlock_bh(&qp->grp_lock);
+	rxe_drop_ref(grp);			/* ref from get_key */
+err1:
+	return -EINVAL;
+}
+
+void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
+{
+	struct rxe_mc_grp *grp;
+	struct rxe_mc_elem *elem;
+
+	while (1) {
+		spin_lock_bh(&qp->grp_lock);
+		if (list_empty(&qp->grp_list)) {
+			spin_unlock_bh(&qp->grp_lock);
+			break;
+		}
+		elem = list_first_entry(&qp->grp_list, struct rxe_mc_elem,
+					grp_list);
+		list_del(&elem->grp_list);
+		spin_unlock_bh(&qp->grp_lock);
+
+		grp = elem->grp;
+		spin_lock_bh(&grp->mcg_lock);
+		list_del(&elem->qp_list);
+		grp->num_qp--;
+		spin_unlock_bh(&grp->mcg_lock);
+		rxe_drop_ref(grp);
+		rxe_drop_ref(elem);
+	}
+}
+
+void rxe_mc_cleanup(void *arg)
+{
+	struct rxe_mc_grp *grp = arg;
+	struct rxe_dev *rxe = grp->rxe;
+
+	rxe_drop_key(grp);
+	rxe->ifc_ops->mcast_delete(rxe, &grp->mgid);
+}


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 24/37] add rxe_recv.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (22 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 23/37] add rxe_mcast.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 25/37] add rxe_comp.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (13 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_recv.c --]
[-- Type: text/plain, Size: 11017 bytes --]

Handles receiving new packets, which are dispatched to either
request or response processing.
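
The opcode/QP-type check in check_type_state() below relies on the BTH
opcode layout: bits 7:5 select the transport class (0 = RC, 1 = UC,
2 = RD, 3 = UD) and bits 4:0 the operation. A stand-alone sketch of
that decode with two example opcodes:

#include <stdio.h>

static const char *transport_class(unsigned int opcode)
{
	switch (opcode >> 5) {
	case 0: return "RC";
	case 1: return "UC";
	case 2: return "RD";
	case 3: return "UD";
	default: return "reserved/manufacturer";
	}
}

int main(void)
{
	/* 0x04 = RC SEND Only, 0x64 = UD SEND Only in the IBTA tables */
	printf("0x04 -> %s\n", transport_class(0x04));
	printf("0x64 -> %s\n", transport_class(0x64));
	return 0;
}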

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_recv.c |  425 +++++++++++++++++++++++++++++++++++
 1 file changed, 425 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_recv.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_recv.c
@@ -0,0 +1,425 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/skbuff.h>
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+static int check_type_state(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
+			    struct rxe_qp *qp)
+{
+	if (unlikely(!qp->valid))
+		goto err1;
+
+	switch (qp_type(qp)) {
+	case IB_QPT_RC:
+		if (unlikely((pkt->opcode >> 5) != 0)) {
+			pr_warn("bad qp type\n");
+			goto err1;
+		}
+		break;
+	case IB_QPT_UC:
+		if (unlikely((pkt->opcode >> 5) != 1)) {
+			pr_warn("bad qp type\n");
+			goto err1;
+		}
+		break;
+	case IB_QPT_UD:
+	case IB_QPT_SMI:
+	case IB_QPT_GSI:
+		if (unlikely((pkt->opcode >> 5) != 3)) {
+			pr_warn("bad qp type\n");
+			goto err1;
+		}
+		break;
+	default:
+		pr_warn("unsupported qp type\n");
+		goto err1;
+	}
+
+	if (pkt->mask & RXE_REQ_MASK) {
+		if (unlikely(qp->resp.state != QP_STATE_READY))
+			goto err1;
+	} else if (unlikely(qp->req.state < QP_STATE_READY ||
+			qp->req.state > QP_STATE_DRAINED))
+			goto err1;
+
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+static int check_keys(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
+		      u32 qpn, struct rxe_qp *qp)
+{
+	int i;
+	int found_pkey = 0;
+	struct rxe_port *port = &rxe->port[pkt->port_num - 1];
+	u16 pkey = bth_pkey(pkt);
+
+	pkt->pkey_index = 0;
+
+	if (qpn == 1) {
+		for (i = 0; i < port->attr.pkey_tbl_len; i++) {
+			if (pkey_match(pkey, port->pkey_tbl[i])) {
+				pkt->pkey_index = i;
+				found_pkey = 1;
+				break;
+			}
+		}
+
+		if (!found_pkey) {
+			pr_warn("bad pkey = 0x%x\n", pkey);
+			spin_lock_bh(&port->port_lock);
+			port->attr.bad_pkey_cntr
+				= (port->attr.bad_pkey_cntr >= 0xffff) ?
+				   0xffff :
+				   port->attr.bad_pkey_cntr + 1;
+			spin_unlock_bh(&port->port_lock);
+			goto err1;
+		}
+	} else if (qpn != 0) {
+		if (unlikely(!pkey_match(pkey,
+		    port->pkey_tbl[qp->attr.pkey_index]))) {
+			pr_warn("bad pkey = 0x%0x\n", pkey);
+			spin_lock_bh(&port->port_lock);
+			port->attr.bad_pkey_cntr
+				= (port->attr.bad_pkey_cntr >= 0xffff) ?
+				   0xffff :
+				   port->attr.bad_pkey_cntr + 1;
+			spin_unlock_bh(&port->port_lock);
+			goto err1;
+		}
+		pkt->pkey_index = qp->attr.pkey_index;
+	}
+
+	if (qp_type(qp) == IB_QPT_UD && qpn != 0 && pkt->mask) {
+		u32 qkey = (qpn == 1) ? GSI_QKEY : qp->attr.qkey;
+		if (unlikely(deth_qkey(pkt) != qkey)) {
+			pr_warn("bad qkey, got 0x%x expected 0x%x\n",
+				deth_qkey(pkt), qkey);
+			spin_lock_bh(&port->port_lock);
+			port->attr.qkey_viol_cntr
+				= (port->attr.qkey_viol_cntr >= 0xffff) ?
+				   0xffff :
+				   port->attr.qkey_viol_cntr + 1;
+			spin_unlock_bh(&port->port_lock);
+			goto err1;
+		}
+	}
+
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+static int check_addr(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
+		      struct rxe_qp *qp)
+{
+	struct rxe_port *port = &rxe->port[pkt->port_num - 1];
+	union ib_gid *sgid;
+	union ib_gid *dgid;
+
+	if (qp_type(qp) != IB_QPT_RC && qp_type(qp) != IB_QPT_UC)
+		goto done;
+
+	if (unlikely(pkt->port_num != qp->attr.port_num)) {
+		pr_warn("port %d != qp port %d\n",
+			 pkt->port_num, qp->attr.port_num);
+		goto err1;
+	}
+
+	if ((pkt->mask & RXE_GRH_MASK) == 0) {
+		if (unlikely(qp->pri_av.attr.ah_flags & IB_AH_GRH)) {
+			pr_warn("no grh for global qp\n");
+			goto err1;
+		} else
+			goto done;
+	}
+
+	sgid = grh_sgid(pkt);
+	dgid = grh_dgid(pkt);
+
+	if (unlikely((qp->pri_av.attr.ah_flags & IB_AH_GRH) == 0)) {
+		pr_warn("grh for local qp\n");
+		goto err1;
+	}
+
+	if (unlikely(dgid->global.subnet_prefix == 0 &&
+	    be64_to_cpu(dgid->global.interface_id) <= 1)) {
+		pr_warn("bad dgid, subnet_prefix = 0\n");
+		goto err1;
+	}
+
+	if (unlikely(sgid->raw[0] == 0xff)) {
+		pr_warn("bad sgid, multicast gid\n");
+		goto err1;
+	}
+
+	if (unlikely(sgid->global.subnet_prefix == 0 &&
+	    be64_to_cpu(sgid->global.interface_id) <= 1)) {
+		pr_warn("bad sgid, subnet prefix = 0 or 1\n");
+		goto err1;
+	}
+
+	if (unlikely(dgid->global.interface_id !=
+	    port->guid_tbl[qp->pri_av.attr.grh.sgid_index])) {
+		pr_warn("bad dgid, doesn't match qp\n");
+		goto err1;
+	}
+
+	if (unlikely(sgid->global.interface_id !=
+	    qp->pri_av.attr.grh.dgid.global.interface_id)) {
+		pr_warn("bad sgid, doesn't match qp\n");
+		goto err1;
+	}
+done:
+	return 0;
+
+err1:
+	return -EINVAL;
+}
+
+static int hdr_check(struct rxe_pkt_info *pkt)
+{
+	struct rxe_dev *rxe = pkt->rxe;
+	struct rxe_port *port = &rxe->port[pkt->port_num - 1];
+	struct rxe_qp *qp = NULL;
+	union ib_gid *dgid = NULL;
+	u32 qpn = bth_qpn(pkt);
+	int index;
+	int err;
+
+	if (unlikely(bth_tver(pkt) != BTH_TVER)) {
+		pr_warn("bad tver\n");
+		goto err1;
+	}
+
+	if (qpn != IB_MULTICAST_QPN) {
+		index = (qpn == 0) ? port->qp_smi_index :
+			((qpn == 1) ? port->qp_gsi_index : qpn);
+		qp = rxe_pool_get_index(&rxe->qp_pool, index);
+		if (unlikely(!qp)) {
+			pr_warn("no qp matches qpn 0x%x\n", qpn);
+			goto err1;
+		}
+
+		err = check_type_state(rxe, pkt, qp);
+		if (unlikely(err))
+			goto err2;
+	}
+
+	if (pkt->mask & RXE_GRH_MASK) {
+		dgid = grh_dgid(pkt);
+
+		if (unlikely(grh_next_hdr(pkt) != GRH_RXE_NEXT_HDR)) {
+			pr_warn("bad next hdr\n");
+			goto err2;
+		}
+
+		if (unlikely(grh_ipver(pkt) != GRH_IPV6)) {
+			pr_warn("bad ipver\n");
+			goto err2;
+		}
+	}
+
+	if (qpn != IB_MULTICAST_QPN) {
+		err = check_addr(rxe, pkt, qp);
+		if (unlikely(err))
+			goto err2;
+
+		err = check_keys(rxe, pkt, qpn, qp);
+		if (unlikely(err))
+			goto err2;
+	} else {
+		if (unlikely((pkt->mask & RXE_GRH_MASK) == 0)) {
+			pr_warn("no grh for mcast qpn\n");
+			goto err1;
+		}
+		if (unlikely(dgid->raw[0] != 0xff)) {
+			pr_warn("bad dgid for mcast qpn\n");
+			goto err1;
+		}
+	}
+
+	pkt->qp = qp;
+	return 0;
+
+err2:
+	if (qp)
+		rxe_drop_ref(qp);
+err1:
+	return -EINVAL;
+}
+
+static inline void rxe_rcv_pkt(struct rxe_dev *rxe,
+			       struct rxe_pkt_info *pkt,
+			       struct sk_buff *skb)
+{
+	if (pkt->mask & RXE_REQ_MASK)
+		rxe_resp_queue_pkt(rxe, pkt->qp, skb);
+	else
+		rxe_comp_queue_pkt(rxe, pkt->qp, skb);
+}
+
+static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
+{
+	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	struct rxe_mc_grp *mcg;
+	struct list_head *l;
+	struct sk_buff *skb_copy;
+	struct rxe_mc_elem *mce;
+	struct rxe_qp *qp;
+	int err;
+
+	/* lookup mcast group corresponding to mgid, takes a ref */
+	mcg = rxe_pool_get_key(&rxe->mc_grp_pool, grh_dgid(pkt));
+	if (!mcg)
+		goto err1;	/* mcast group not registered */
+
+	spin_lock_bh(&mcg->mcg_lock);
+
+	list_for_each(l, &mcg->qp_list) {
+		mce = container_of(l, struct rxe_mc_elem, qp_list);
+		qp = mce->qp;
+		pkt = SKB_TO_PKT(skb);
+
+		/* validate qp for incoming packet */
+		err = check_type_state(rxe, pkt, qp);
+		if (err)
+			continue;
+
+		err = check_keys(rxe, pkt, bth_qpn(pkt), qp);
+		if (err)
+			continue;
+
+		/* if *not* the last qp in the list
+		   make a copy of the skb to post to the next qp */
+		skb_copy = (l->next != &mcg->qp_list) ?
+				skb_clone(skb, GFP_KERNEL) : NULL;
+
+		pkt->qp = qp;
+		rxe_add_ref(qp);
+		rxe_rcv_pkt(rxe, pkt, skb);
+
+		skb = skb_copy;
+		if (!skb)
+			break;
+	}
+
+	spin_unlock_bh(&mcg->mcg_lock);
+
+	rxe_drop_ref(mcg);	/* drop ref from rxe_pool_get_key. */
+
+err1:
+	if (skb)
+		kfree_skb(skb);
+}
+
+/* rxe_rcv is called from the interface driver
+ * on entry
+ *	pkt->rxe	= rdma device
+ *	pkt->port_num	= rdma device port
+ * For rxe_net:
+ *	pkt->mask	= RXE_GRH_MASK
+ *	pkt->hdr	= &grh	with no lrh
+ * For IB transport (e.g. rxe_sample)
+ *	pkt->mask	= RXE_LRH_MASK
+ *	pkt->hdr	= &lrh with optional grh
+ */
+int rxe_rcv(struct sk_buff *skb)
+{
+	int err;
+	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	struct rxe_dev *rxe = pkt->rxe;
+
+	pkt->offset = 0;
+
+	if (pkt->mask & RXE_LRH_MASK) {
+		unsigned int length = __lrh_length(pkt->hdr);
+		unsigned int lnh = __lrh_lnh(pkt->hdr);
+
+		if (skb->len < RXE_LRH_BYTES)
+			goto drop;
+
+		if (lnh < LRH_LNH_IBA_LOC)
+			goto drop;
+
+		pkt->paylen = 4*length - RXE_LRH_BYTES;
+		pkt->offset += RXE_LRH_BYTES;
+
+		if (lnh == LRH_LNH_IBA_GBL)
+			pkt->mask |= RXE_GRH_MASK;
+	}
+
+	if (pkt->mask & RXE_GRH_MASK) {
+		if (skb->len < pkt->offset + RXE_GRH_BYTES)
+			goto drop;
+
+		pkt->paylen = __grh_paylen(pkt->hdr);
+		pkt->offset += RXE_GRH_BYTES;
+	}
+
+	if (unlikely(skb->len < pkt->offset + RXE_BTH_BYTES))
+		goto drop;
+
+	pkt->opcode = bth_opcode(pkt);
+	pkt->psn = bth_psn(pkt);
+	pkt->qp = NULL;
+	pkt->mask |= rxe_opcode[pkt->opcode].mask;
+
+	if (unlikely(skb->len < header_size(pkt)))
+		goto drop;
+
+	err = hdr_check(pkt);
+	if (unlikely(err))
+		goto drop;
+
+	if (unlikely(bth_qpn(pkt) == IB_MULTICAST_QPN))
+		rxe_rcv_mcast_pkt(rxe, skb);
+	else
+		rxe_rcv_pkt(rxe, pkt, skb);
+
+	return 0;
+
+drop:
+	if (pkt->qp)
+		rxe_drop_ref(pkt->qp);
+
+	kfree_skb(skb);
+	return 0;
+}
+EXPORT_SYMBOL(rxe_rcv);
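
For reference, the entry contract spelled out in the comment above rxe_rcv()
amounts to the interface driver filling in the per-skb rxe_pkt_info before
handing the packet over. A minimal sketch of a RoCE-style caller, assuming
the Ethernet/IP framing has already been stripped so that skb->data points
at the GRH (the function name and the way the rxe device and port are
resolved are illustrative only, not part of this patch):

	/* illustrative sketch, not part of this patch */
	static int example_net_to_rxe(struct sk_buff *skb, struct rxe_dev *rxe,
				      u8 port_num)
	{
		struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);

		pkt->rxe	= rxe;		/* rdma device */
		pkt->port_num	= port_num;	/* rdma device port */
		pkt->hdr	= skb->data;	/* first byte of the grh */
		pkt->mask	= RXE_GRH_MASK;	/* RoCE: grh, no lrh */

		return rxe_rcv(skb);
	}

Note that rxe_rcv() always takes ownership of the skb: it is either queued
to a qp or freed, so the caller must not touch it afterwards.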


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 25/37] add rxe_comp.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (23 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 24/37] add rxe_recv.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 26/37] add rxe_req.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (12 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_comp.c --]
[-- Type: text/plain, Size: 19494 bytes --]

completion processing: the requester-side completer that consumes ACK/NAK
response packets, retires send WQEs and generates completion queue entries.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_comp.c |  731 +++++++++++++++++++++++++++++++++++
 1 file changed, 731 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_comp.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_comp.c
@@ -0,0 +1,731 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/skbuff.h>
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+#include "rxe_task.h"
+
+enum comp_state {
+	COMPST_GET_ACK,
+	COMPST_GET_WQE,
+	COMPST_COMP_WQE,
+	COMPST_COMP_ACK,
+	COMPST_CHECK_PSN,
+	COMPST_CHECK_ACK,
+	COMPST_READ,
+	COMPST_ATOMIC,
+	COMPST_WRITE_SEND,
+	COMPST_UPDATE_COMP,
+	COMPST_ERROR_RETRY,
+	COMPST_RNR_RETRY,
+	COMPST_ERROR,
+	COMPST_EXIT,
+	COMPST_DONE,
+};
+
+static char *comp_state_name[] =  {
+	[COMPST_GET_ACK]		= "GET ACK",
+	[COMPST_GET_WQE]		= "GET WQE",
+	[COMPST_COMP_WQE]		= "COMP WQE",
+	[COMPST_COMP_ACK]		= "COMP ACK",
+	[COMPST_CHECK_PSN]		= "CHECK PSN",
+	[COMPST_CHECK_ACK]		= "CHECK ACK",
+	[COMPST_READ]			= "READ",
+	[COMPST_ATOMIC]			= "ATOMIC",
+	[COMPST_WRITE_SEND]		= "WRITE/SEND",
+	[COMPST_UPDATE_COMP]		= "UPDATE COMP",
+	[COMPST_ERROR_RETRY]		= "ERROR RETRY",
+	[COMPST_RNR_RETRY]		= "RNR RETRY",
+	[COMPST_ERROR]			= "ERROR",
+	[COMPST_EXIT]			= "EXIT",
+	[COMPST_DONE]			= "DONE",
+};
+
+static unsigned long rnrnak_usec[32] = {
+	[IB_RNR_TIMER_655_36] = 655360,
+	[IB_RNR_TIMER_000_01] = 10,
+	[IB_RNR_TIMER_000_02] = 20,
+	[IB_RNR_TIMER_000_03] = 30,
+	[IB_RNR_TIMER_000_04] = 40,
+	[IB_RNR_TIMER_000_06] = 60,
+	[IB_RNR_TIMER_000_08] = 80,
+	[IB_RNR_TIMER_000_12] = 120,
+	[IB_RNR_TIMER_000_16] = 160,
+	[IB_RNR_TIMER_000_24] = 240,
+	[IB_RNR_TIMER_000_32] = 320,
+	[IB_RNR_TIMER_000_48] = 480,
+	[IB_RNR_TIMER_000_64] = 640,
+	[IB_RNR_TIMER_000_96] = 960,
+	[IB_RNR_TIMER_001_28] = 1280,
+	[IB_RNR_TIMER_001_92] = 1920,
+	[IB_RNR_TIMER_002_56] = 2560,
+	[IB_RNR_TIMER_003_84] = 3840,
+	[IB_RNR_TIMER_005_12] = 5120,
+	[IB_RNR_TIMER_007_68] = 7680,
+	[IB_RNR_TIMER_010_24] = 10240,
+	[IB_RNR_TIMER_015_36] = 15360,
+	[IB_RNR_TIMER_020_48] = 20480,
+	[IB_RNR_TIMER_030_72] = 30720,
+	[IB_RNR_TIMER_040_96] = 40960,
+	[IB_RNR_TIMER_061_44] = 61440,
+	[IB_RNR_TIMER_081_92] = 81920,
+	[IB_RNR_TIMER_122_88] = 122880,
+	[IB_RNR_TIMER_163_84] = 163840,
+	[IB_RNR_TIMER_245_76] = 245760,
+	[IB_RNR_TIMER_327_68] = 327680,
+	[IB_RNR_TIMER_491_52] = 491520,
+};
+
+static inline unsigned long rnrnak_jiffies(u8 timeout)
+{
+	return max_t(unsigned long,
+		usecs_to_jiffies(rnrnak_usec[timeout]), 1);
+}
+
+static enum ib_wc_opcode wr_to_wc_opcode(enum ib_wr_opcode opcode)
+{
+	switch (opcode) {
+	case IB_WR_RDMA_WRITE:			return IB_WC_RDMA_WRITE;
+	case IB_WR_RDMA_WRITE_WITH_IMM:		return IB_WC_RDMA_WRITE;
+	case IB_WR_SEND:			return IB_WC_SEND;
+	case IB_WR_SEND_WITH_IMM:		return IB_WC_SEND;
+	case IB_WR_RDMA_READ:			return IB_WC_RDMA_READ;
+	case IB_WR_ATOMIC_CMP_AND_SWP:		return IB_WC_COMP_SWAP;
+	case IB_WR_ATOMIC_FETCH_AND_ADD:	return IB_WC_FETCH_ADD;
+	case IB_WR_LSO:				return IB_WC_LSO;
+	case IB_WR_SEND_WITH_INV:		return IB_WC_SEND;
+	case IB_WR_RDMA_READ_WITH_INV:		return IB_WC_RDMA_READ;
+	case IB_WR_LOCAL_INV:			return IB_WC_LOCAL_INV;
+	case IB_WR_FAST_REG_MR:			return IB_WC_FAST_REG_MR;
+
+	default:
+		return 0xff;
+	}
+}
+
+void retransmit_timer(unsigned long data)
+{
+	struct rxe_qp *qp = (struct rxe_qp *)data;
+
+	if (qp->valid) {
+		qp->comp.timeout = 1;
+		rxe_run_task(&qp->comp.task, 1);
+	}
+}
+
+void rxe_comp_queue_pkt(struct rxe_dev *rxe, struct rxe_qp *qp,
+			struct sk_buff *skb)
+{
+	int must_sched;
+
+	atomic_inc(&rxe->resp_skb_in);
+	skb_queue_tail(&qp->resp_pkts, skb);
+
+	must_sched = skb_queue_len(&qp->resp_pkts) > 1;
+	rxe_run_task(&qp->comp.task, must_sched);
+}
+
+static inline enum comp_state get_wqe(struct rxe_qp *qp,
+				      struct rxe_pkt_info *pkt,
+				      struct rxe_send_wqe **wqe_p)
+{
+	struct rxe_send_wqe *wqe;
+
+	/* we come here whether or not we found a response packet
+	   to see if there are any posted WQEs */
+	wqe = queue_head(qp->sq.queue);
+	*wqe_p = wqe;
+
+	/* no WQE or requester has not started it yet */
+	if (!wqe || wqe->state == wqe_state_posted)
+		return pkt ? COMPST_DONE : COMPST_EXIT;
+
+	/* WQE does not require an ack */
+	if (wqe->state == wqe_state_done)
+		return COMPST_COMP_WQE;
+
+	/* WQE caused an error */
+	if (wqe->state == wqe_state_error)
+		return COMPST_ERROR;
+
+	/* we have a WQE, if we also have an ack check its PSN */
+	return pkt ? COMPST_CHECK_PSN : COMPST_EXIT;
+}
+
+static inline void reset_retry_counters(struct rxe_qp *qp)
+{
+	qp->comp.retry_cnt = qp->attr.retry_cnt;
+	qp->comp.rnr_retry = qp->attr.rnr_retry;
+}
+
+static inline enum comp_state check_psn(struct rxe_qp *qp,
+					struct rxe_pkt_info *pkt,
+					struct rxe_send_wqe *wqe)
+{
+	s32 diff;
+
+	/* check to see if response is past the oldest WQE
+	   if it is, complete send/write or error read/atomic */
+	diff = psn_compare(pkt->psn, wqe->last_psn);
+	if (diff > 0) {
+		if (wqe->state == wqe_state_pending) {
+			if (wqe->mask & WR_ATOMIC_OR_READ_MASK)
+				return COMPST_ERROR_RETRY;
+			else {
+				reset_retry_counters(qp);
+				return COMPST_COMP_WQE;
+			}
+		} else
+			return COMPST_DONE;
+	}
+
+	/* compare response packet to expected response */
+	diff = psn_compare(pkt->psn, qp->comp.psn);
+	if (diff < 0) {
+		/* response is most likely a retried packet
+		   if it matches an uncompleted WQE go complete it
+		   else ignore it */
+		if (pkt->psn == wqe->last_psn)
+			return COMPST_COMP_ACK;
+		else
+			return COMPST_DONE;
+	} else if ((diff > 0) && (wqe->mask & WR_ATOMIC_OR_READ_MASK))
+		return COMPST_ERROR_RETRY;
+	else
+		return COMPST_CHECK_ACK;
+}
+
+static inline enum comp_state check_ack(struct rxe_qp *qp,
+					struct rxe_pkt_info *pkt,
+					struct rxe_send_wqe *wqe)
+{
+	unsigned int mask = pkt->mask;
+	u8 syn;
+
+	/* Check the sequence only */
+	switch (qp->comp.opcode) {
+	case -1:
+		/* Will catch all *_ONLY cases. */
+		if (!(mask & RXE_START_MASK))
+			/* TODO check spec. retry/discard ? */
+			return COMPST_ERROR;
+
+		break;
+
+	case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST:
+	case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
+		if (pkt->opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE &&
+			pkt->opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST) {
+			/* TODO check spec. retry/discard ? */
+			return COMPST_ERROR;
+		}
+		break;
+	default:
+		WARN_ON(1);
+	}
+
+	/* Check operation validity. */
+	switch (pkt->opcode) {
+	case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST:
+	case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST:
+	case IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY:
+		syn = aeth_syn(pkt);
+
+		if ((syn & AETH_TYPE_MASK) != AETH_ACK)
+			return COMPST_ERROR;
+
+		/* Fall through (IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE
+		 * doesn't have an AETH) */
+	case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
+		if (wqe->ibwr.opcode != IB_WR_RDMA_READ &&
+			wqe->ibwr.opcode != IB_WR_RDMA_READ_WITH_INV) {
+			/* TODO check spec. retry/discard ? */
+			return COMPST_ERROR;
+		}
+		reset_retry_counters(qp);
+		return COMPST_READ;
+		break;
+
+	case IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE:
+		syn = aeth_syn(pkt);
+
+		if ((syn & AETH_TYPE_MASK) != AETH_ACK)
+			return COMPST_ERROR;
+
+		if (wqe->ibwr.opcode != IB_WR_ATOMIC_CMP_AND_SWP &&
+			wqe->ibwr.opcode != IB_WR_ATOMIC_FETCH_AND_ADD)
+			/* TODO check spec. retry/discard ? */
+			return COMPST_ERROR;
+		reset_retry_counters(qp);
+		return COMPST_ATOMIC;
+		break;
+
+	case IB_OPCODE_RC_ACKNOWLEDGE:
+		syn = aeth_syn(pkt);
+		switch (syn & AETH_TYPE_MASK) {
+		case AETH_ACK:
+			reset_retry_counters(qp);
+			return COMPST_WRITE_SEND;
+
+		case AETH_RNR_NAK:
+			return COMPST_RNR_RETRY;
+
+		case AETH_NAK:
+			switch (syn) {
+			case AETH_NAK_PSN_SEQ_ERROR:
+				/* a nak implicitly acks all
+				   packets with psns before */
+				if (psn_compare(pkt->psn, qp->comp.psn) > 0) {
+					qp->comp.psn = pkt->psn;
+					if (qp->req.wait_psn) {
+						qp->req.wait_psn = 0;
+						rxe_run_task(&qp->req.task, 1);
+					}
+				}
+				return COMPST_ERROR_RETRY;
+
+			case AETH_NAK_INVALID_REQ:
+				wqe->status = IB_WC_REM_INV_REQ_ERR;
+				return COMPST_ERROR;
+
+			case AETH_NAK_REM_ACC_ERR:
+				wqe->status = IB_WC_REM_ACCESS_ERR;
+				return COMPST_ERROR;
+
+			case AETH_NAK_REM_OP_ERR:
+				wqe->status = IB_WC_REM_OP_ERR;
+				return COMPST_ERROR;
+
+			default:
+				pr_warn("unexpected nak %x\n", syn);
+				wqe->status = IB_WC_REM_OP_ERR;
+				return COMPST_ERROR;
+			}
+			return COMPST_ERROR;
+
+		default:
+			return COMPST_ERROR;
+
+		}
+		break;
+
+	default:
+		WARN_ON(1);
+	}
+
+	WARN_ON(1);
+	return COMPST_ERROR;
+}
+
+static inline enum comp_state do_read(struct rxe_qp *qp,
+				      struct rxe_pkt_info *pkt,
+				      struct rxe_send_wqe *wqe)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	int ret;
+
+	ret = copy_data(rxe, qp->pd, IB_ACCESS_LOCAL_WRITE,
+			&wqe->dma, payload_addr(pkt),
+			payload_size(pkt), direction_in, NULL);
+	if (ret)
+		return COMPST_ERROR;
+
+	if (wqe->dma.resid == 0 && (pkt->mask & RXE_END_MASK))
+		return COMPST_COMP_ACK;
+	else
+		return COMPST_UPDATE_COMP;
+}
+
+static inline enum comp_state do_atomic(struct rxe_qp *qp,
+					struct rxe_pkt_info *pkt,
+					struct rxe_send_wqe *wqe)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	int ret;
+
+	u64 atomic_orig = atmack_orig(pkt);
+
+	ret = copy_data(rxe, qp->pd, IB_ACCESS_LOCAL_WRITE,
+			&wqe->dma, &atomic_orig,
+			sizeof(u64), direction_in, NULL);
+	if (ret)
+		return COMPST_ERROR;
+	else
+		return COMPST_COMP_ACK;
+}
+
+static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
+			  struct rxe_cqe *cqe)
+{
+	memset(cqe, 0, sizeof(*cqe));
+
+	if (!qp->is_user) {
+		struct ib_wc		*wc	= &cqe->ibwc;
+
+		wc->wr_id		= wqe->ibwr.wr_id;
+		wc->status		= wqe->status;
+		wc->opcode		= wr_to_wc_opcode(wqe->ibwr.opcode);
+		wc->byte_len		= wqe->dma.length;
+		wc->qp			= &qp->ibqp;
+	} else {
+		struct ib_uverbs_wc	*uwc	= &cqe->uibwc;
+
+		uwc->wr_id		= wqe->ibwr.wr_id;
+		uwc->status		= wqe->status;
+		uwc->opcode		= wr_to_wc_opcode(wqe->ibwr.opcode);
+		uwc->byte_len		= wqe->dma.length;
+		uwc->qp_num		= qp->ibqp.qp_num;
+	}
+}
+
+static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
+{
+	struct rxe_cqe cqe;
+
+	if ((qp->sq_sig_type == IB_SIGNAL_ALL_WR) ||
+	    (wqe->ibwr.send_flags & IB_SEND_SIGNALED)) {
+		make_send_cqe(qp, wqe, &cqe);
+		rxe_cq_post(qp->scq, &cqe, 0);
+	}
+
+	advance_consumer(qp->sq.queue);
+
+	/*
+	 * we completed something so let req run again
+	 * if it is trying to fence
+	 */
+	if (qp->req.wait_fence) {
+		qp->req.wait_fence = 0;
+		rxe_run_task(&qp->req.task, 1);
+	}
+}
+
+static inline enum comp_state complete_ack(struct rxe_qp *qp,
+					   struct rxe_pkt_info *pkt,
+					   struct rxe_send_wqe *wqe)
+{
+	unsigned long flags;
+
+	if (wqe->has_rd_atomic) {
+		wqe->has_rd_atomic = 0;
+		atomic_inc(&qp->req.rd_atomic);
+		if (qp->req.need_rd_atomic) {
+			qp->comp.timeout_retry = 0;
+			qp->req.need_rd_atomic = 0;
+			rxe_run_task(&qp->req.task, 1);
+		}
+	}
+
+	if (unlikely(qp->req.state == QP_STATE_DRAIN)) {
+		/* state_lock used by requester & completer */
+		spin_lock_irqsave(&qp->state_lock, flags);
+		if ((qp->req.state == QP_STATE_DRAIN) &&
+		    (qp->comp.psn == qp->req.psn)) {
+			qp->req.state = QP_STATE_DRAINED;
+			spin_unlock_irqrestore(&qp->state_lock, flags);
+
+			if (qp->ibqp.event_handler) {
+				struct ib_event ev;
+
+				ev.device = qp->ibqp.device;
+				ev.element.qp = &qp->ibqp;
+				ev.event = IB_EVENT_SQ_DRAINED;
+				qp->ibqp.event_handler(&ev,
+					qp->ibqp.qp_context);
+			}
+		} else
+			spin_unlock_irqrestore(&qp->state_lock, flags);
+	}
+
+	do_complete(qp, wqe);
+
+	if (psn_compare(pkt->psn, qp->comp.psn) >= 0)
+		return COMPST_UPDATE_COMP;
+	else
+		return COMPST_DONE;
+}
+
+static inline enum comp_state complete_wqe(struct rxe_qp *qp,
+					   struct rxe_pkt_info *pkt,
+					   struct rxe_send_wqe *wqe)
+{
+	qp->comp.opcode = -1;
+
+	if (pkt) {
+		if (psn_compare(pkt->psn, qp->comp.psn) >= 0)
+			qp->comp.psn = (pkt->psn + 1) & BTH_PSN_MASK;
+
+		if (qp->req.wait_psn) {
+			qp->req.wait_psn = 0;
+			rxe_run_task(&qp->req.task, 1);
+		}
+	}
+
+	do_complete(qp, wqe);
+
+	return COMPST_GET_WQE;
+}
+
+int rxe_completer(void *arg)
+{
+	struct rxe_qp *qp = (struct rxe_qp *)arg;
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	struct rxe_send_wqe *wqe = NULL;
+	struct sk_buff *skb = NULL;
+	struct rxe_pkt_info *pkt = NULL;
+	enum comp_state state;
+
+	if (!qp->valid) {
+		while ((skb = skb_dequeue(&qp->resp_pkts))) {
+			rxe_drop_ref(qp);
+			kfree_skb(skb);
+			atomic_dec(&rxe->resp_skb_in);
+		}
+		skb = NULL;
+		pkt = NULL;
+
+		while (queue_head(qp->sq.queue))
+			advance_consumer(qp->sq.queue);
+
+		goto exit;
+	}
+
+	if (qp->req.state == QP_STATE_ERROR) {
+		while ((skb = skb_dequeue(&qp->resp_pkts))) {
+			rxe_drop_ref(qp);
+			kfree_skb(skb);
+			atomic_dec(&rxe->resp_skb_in);
+		}
+		skb = NULL;
+		pkt = NULL;
+
+		while ((wqe = queue_head(qp->sq.queue))) {
+			wqe->status = IB_WC_WR_FLUSH_ERR;
+			do_complete(qp, wqe);
+		}
+
+		goto exit;
+	}
+
+	if (qp->req.state == QP_STATE_RESET) {
+		while ((skb = skb_dequeue(&qp->resp_pkts))) {
+			rxe_drop_ref(qp);
+			kfree_skb(skb);
+			atomic_dec(&rxe->resp_skb_in);
+		}
+		skb = NULL;
+		pkt = NULL;
+
+		while (queue_head(qp->sq.queue))
+			advance_consumer(qp->sq.queue);
+
+		goto exit;
+	}
+
+	if (qp->comp.timeout) {
+		qp->comp.timeout_retry = 1;
+		qp->comp.timeout = 0;
+	} else
+		qp->comp.timeout_retry = 0;
+
+	if (qp->req.need_retry)
+		goto exit;
+
+	state = COMPST_GET_ACK;
+
+	while (1) {
+		pr_debug("state = %s\n", comp_state_name[state]);
+		switch (state) {
+		case COMPST_GET_ACK:
+			skb = skb_dequeue(&qp->resp_pkts);
+			if (skb) {
+				pkt = SKB_TO_PKT(skb);
+				qp->comp.timeout_retry = 0;
+			}
+			state = COMPST_GET_WQE;
+			break;
+
+		case COMPST_GET_WQE:
+			state = get_wqe(qp, pkt, &wqe);
+			break;
+
+		case COMPST_CHECK_PSN:
+			state = check_psn(qp, pkt, wqe);
+			break;
+
+		case COMPST_CHECK_ACK:
+			state = check_ack(qp, pkt, wqe);
+			break;
+
+		case COMPST_READ:
+			state = do_read(qp, pkt, wqe);
+			break;
+
+		case COMPST_ATOMIC:
+			state = do_atomic(qp, pkt, wqe);
+			break;
+
+		case COMPST_WRITE_SEND:
+			if (wqe->state == wqe_state_pending &&
+			    wqe->last_psn == pkt->psn)
+				state = COMPST_COMP_ACK;
+			else
+				state = COMPST_UPDATE_COMP;
+			break;
+
+		case COMPST_COMP_ACK:
+			state = complete_ack(qp, pkt, wqe);
+			break;
+
+		case COMPST_COMP_WQE:
+			state = complete_wqe(qp, pkt, wqe);
+			break;
+
+		case COMPST_UPDATE_COMP:
+			if (pkt->mask & RXE_END_MASK)
+				qp->comp.opcode = -1;
+			else
+				qp->comp.opcode = pkt->opcode;
+
+			if (psn_compare(pkt->psn, qp->comp.psn) >= 0)
+				qp->comp.psn = (pkt->psn + 1) & BTH_PSN_MASK;
+
+			if (qp->req.wait_psn) {
+				qp->req.wait_psn = 0;
+				rxe_run_task(&qp->req.task, 1);
+			}
+
+			state = COMPST_DONE;
+			break;
+
+		case COMPST_DONE:
+			if (pkt) {
+				rxe_drop_ref(pkt->qp);
+				kfree_skb(skb);
+				atomic_dec(&rxe->resp_skb_in);
+			}
+			goto done;
+
+		case COMPST_EXIT:
+			if (qp->comp.timeout_retry && wqe) {
+				state = COMPST_ERROR_RETRY;
+				break;
+			}
+
+			/* re-arm the retransmit timer if
+			   (1) QP is type RC
+			   (2) the QP is alive
+			   (3) there is a packet sent by the requester that
+			       might be acked (we still might get spurious
+			       timeouts but try to keep them as few as possible)
+			   (4) the timeout parameter is set */
+			if ((qp_type(qp) == IB_QPT_RC)
+			    && (qp->req.state == QP_STATE_READY)
+			    && (psn_compare(qp->req.psn, qp->comp.psn) > 0)
+			    && qp->qp_timeout_jiffies)
+				mod_timer(&qp->retrans_timer,
+					  jiffies + qp->qp_timeout_jiffies);
+			goto exit;
+
+		case COMPST_ERROR_RETRY:
+			/* we come here if the retry timer fired
+			   and we did not receive a response packet
+			   try to retry the send queue if that makes
+			   sense and the limits have not been exceeded
+			   remember that some timeouts are spurious
+			   since we do not reset the timer but kick it
+			   down the road or let it expire */
+
+			/* there is nothing to retry in this case */
+			if (!wqe || (wqe->state == wqe_state_posted))
+				goto exit;
+
+			if (qp->comp.retry_cnt > 0) {
+				if (qp->comp.retry_cnt != 7)
+					qp->comp.retry_cnt--;
+
+				/* no point in retrying if we have already
+				   seen the last ack that the requester
+				   could have caused */
+				if (psn_compare(qp->req.psn,
+						qp->comp.psn) > 0) {
+					/* tell the requester to retry the
+					   send queue next time around */
+					qp->req.need_retry = 1;
+					rxe_run_task(&qp->req.task, 1);
+				}
+				goto exit;
+			} else {
+				wqe->status = IB_WC_RETRY_EXC_ERR;
+				state = COMPST_ERROR;
+			}
+			break;
+
+		case COMPST_RNR_RETRY:
+			if (qp->comp.rnr_retry > 0) {
+				if (qp->comp.rnr_retry != 7)
+					qp->comp.rnr_retry--;
+
+				qp->req.need_retry = 1;
+				pr_debug("set rnr nak timer\n");
+				mod_timer(&qp->rnr_nak_timer,
+					  jiffies + rnrnak_jiffies(aeth_syn(pkt)
+						& ~AETH_TYPE_MASK));
+				goto exit;
+			} else {
+				wqe->status = IB_WC_RNR_RETRY_EXC_ERR;
+				state = COMPST_ERROR;
+			}
+			break;
+
+		case COMPST_ERROR:
+			do_complete(qp, wqe);
+			rxe_qp_error(qp);
+			goto exit;
+		}
+	}
+
+exit:
+	/* we come here if we are done with processing and want the
+	   task to exit from the loop calling us */
+	return -EAGAIN;
+
+done:
+	/* we come here if we have processed a packet
+	   we want the task to call us again to see
+	   if there is anything else to do */
+	return 0;
+}
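
The exit/done labels above are the completer's calling convention: returning
0 means a packet was consumed and the state machine should be run again,
while -EAGAIN means there is nothing more to do for now. A minimal sketch of
a caller driving it to completion, as the requester does for non-RC QPs at
the end of rxe_requester() (the task machinery in rxe_task.c presumably
layers scheduling on top of the same convention):

	/* illustrative sketch, not part of this patch */
	static void example_drain_completer(struct rxe_qp *qp)
	{
		/* keep running while packets are being completed;
		   stop on -EAGAIN once the queues are empty */
		while (rxe_completer(qp) == 0)
			;
	}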


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 26/37] add rxe_req.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (24 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 25/37] add rxe_comp.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.415319539-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 27/37] add rxe_resp.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (11 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_req.c --]
[-- Type: text/plain, Size: 19140 bytes --]

QP requester logic: turns posted send WQEs into request packets, fragmenting
to the path MTU and handling retries and flow control.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_req.c |  711 ++++++++++++++++++++++++++++++++++++
 1 file changed, 711 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_req.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_req.c
@@ -0,0 +1,711 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/skbuff.h>
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+static int next_opcode(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
+		       unsigned opcode);
+
+static inline void retry_first_write_send(struct rxe_qp *qp,
+					  struct rxe_send_wqe *wqe,
+					  unsigned mask, int npsn)
+{
+	int i;
+
+	for (i = 0; i < npsn; i++) {
+		int to_send = (wqe->dma.resid > qp->mtu) ?
+				qp->mtu : wqe->dma.resid;
+
+		qp->req.opcode = next_opcode(qp, wqe,
+					     wqe->ibwr.opcode);
+
+		if (wqe->ibwr.send_flags & IB_SEND_INLINE) {
+			wqe->dma.resid -= to_send;
+			wqe->dma.sge_offset += to_send;
+		} else
+			advance_dma_data(&wqe->dma, to_send);
+
+		if (mask & WR_WRITE_MASK)
+			wqe->iova += qp->mtu;
+	}
+}
+
+static void req_retry(struct rxe_qp *qp)
+{
+	struct rxe_send_wqe *wqe;
+	unsigned int wqe_index;
+	unsigned int mask;
+	int npsn;
+	int first = 1;
+
+	wqe = queue_head(qp->sq.queue);
+	npsn = (qp->comp.psn - wqe->first_psn) & BTH_PSN_MASK;
+
+	qp->req.wqe_index	= consumer_index(qp->sq.queue);
+	qp->req.psn		= qp->comp.psn;
+	qp->req.opcode		= -1;
+
+	for (wqe_index = consumer_index(qp->sq.queue);
+		wqe_index != producer_index(qp->sq.queue);
+		wqe_index = next_index(qp->sq.queue, wqe_index)) {
+
+		wqe = addr_from_index(qp->sq.queue, wqe_index);
+		mask = wr_opcode_mask(wqe->ibwr.opcode, qp);
+
+		if (wqe->state == wqe_state_posted)
+			break;
+
+		if (wqe->state == wqe_state_done)
+			continue;
+
+		wqe->iova = (mask & WR_ATOMIC_MASK) ?
+			wqe->ibwr.wr.atomic.remote_addr :
+			wqe->ibwr.wr.rdma.remote_addr;
+
+		if (!first || (mask & WR_READ_MASK) == 0) {
+			wqe->dma.resid = wqe->dma.length;
+			wqe->dma.cur_sge = 0;
+			wqe->dma.sge_offset = 0;
+		}
+
+		if (first) {
+			first = 0;
+
+			if (mask & WR_WRITE_OR_SEND_MASK)
+				retry_first_write_send(qp, wqe, mask, npsn);
+
+			if (mask & WR_READ_MASK)
+				wqe->iova += npsn*qp->mtu;
+		}
+
+		wqe->state = wqe_state_posted;
+	}
+}
+
+void rnr_nak_timer(unsigned long data)
+{
+	struct rxe_qp *qp = (struct rxe_qp *)data;
+	pr_debug("rnr nak timer fired\n");
+	rxe_run_task(&qp->req.task, 1);
+}
+
+static struct rxe_send_wqe *req_next_wqe(struct rxe_qp *qp)
+{
+	struct rxe_send_wqe *wqe = queue_head(qp->sq.queue);
+	unsigned long flags;
+
+	if (unlikely(qp->req.state == QP_STATE_DRAIN)) {
+		/* check to see if we are drained;
+		 * state_lock used by requester and completer */
+		spin_lock_irqsave(&qp->state_lock, flags);
+		do {
+			if (qp->req.state != QP_STATE_DRAIN) {
+				/* comp just finished */
+				spin_unlock_irqrestore(&qp->state_lock,
+						       flags);
+				break;
+			}
+
+			if (wqe && ((qp->req.wqe_index !=
+				consumer_index(qp->sq.queue)) ||
+				(wqe->state != wqe_state_posted))) {
+				/* comp not done yet */
+				spin_unlock_irqrestore(&qp->state_lock,
+							   flags);
+				break;
+			}
+
+			qp->req.state = QP_STATE_DRAINED;
+			spin_unlock_irqrestore(&qp->state_lock, flags);
+
+			if (qp->ibqp.event_handler) {
+				struct ib_event ev;
+
+				ev.device = qp->ibqp.device;
+				ev.element.qp = &qp->ibqp;
+				ev.event = IB_EVENT_SQ_DRAINED;
+				qp->ibqp.event_handler(&ev,
+					qp->ibqp.qp_context);
+			}
+		} while (0);
+	}
+
+	if (qp->req.wqe_index == producer_index(qp->sq.queue))
+		return NULL;
+
+	wqe = addr_from_index(qp->sq.queue, qp->req.wqe_index);
+
+	if (qp->req.state == QP_STATE_DRAIN ||
+	    qp->req.state == QP_STATE_DRAINED)
+		if (wqe->state != wqe_state_processing)
+			return NULL;
+
+	if ((wqe->ibwr.send_flags & IB_SEND_FENCE) &&
+	    (qp->req.wqe_index != consumer_index(qp->sq.queue))) {
+		qp->req.wait_fence = 1;
+		return NULL;
+	}
+
+	wqe->mask = wr_opcode_mask(wqe->ibwr.opcode, qp);
+	return wqe;
+}
+
+static int next_opcode_rc(struct rxe_qp *qp, unsigned opcode, int fits)
+{
+	switch (opcode) {
+	case IB_WR_RDMA_WRITE:
+		if (qp->req.opcode == IB_OPCODE_RC_RDMA_WRITE_FIRST ||
+		    qp->req.opcode == IB_OPCODE_RC_RDMA_WRITE_MIDDLE)
+			return fits ?
+				IB_OPCODE_RC_RDMA_WRITE_LAST :
+				IB_OPCODE_RC_RDMA_WRITE_MIDDLE;
+		else
+			return fits ?
+				IB_OPCODE_RC_RDMA_WRITE_ONLY :
+				IB_OPCODE_RC_RDMA_WRITE_FIRST;
+
+	case IB_WR_RDMA_WRITE_WITH_IMM:
+		if (qp->req.opcode == IB_OPCODE_RC_RDMA_WRITE_FIRST ||
+		    qp->req.opcode == IB_OPCODE_RC_RDMA_WRITE_MIDDLE)
+			return fits ?
+				IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE :
+				IB_OPCODE_RC_RDMA_WRITE_MIDDLE;
+		else
+			return fits ?
+				IB_OPCODE_RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE :
+				IB_OPCODE_RC_RDMA_WRITE_FIRST;
+
+	case IB_WR_SEND:
+		if (qp->req.opcode == IB_OPCODE_RC_SEND_FIRST ||
+		    qp->req.opcode == IB_OPCODE_RC_SEND_MIDDLE)
+			return fits ?
+				IB_OPCODE_RC_SEND_LAST :
+				IB_OPCODE_RC_SEND_MIDDLE;
+		else
+			return fits ?
+				IB_OPCODE_RC_SEND_ONLY :
+				IB_OPCODE_RC_SEND_FIRST;
+
+	case IB_WR_SEND_WITH_IMM:
+		if (qp->req.opcode == IB_OPCODE_RC_SEND_FIRST ||
+		    qp->req.opcode == IB_OPCODE_RC_SEND_MIDDLE)
+			return fits ?
+				IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE :
+				IB_OPCODE_RC_SEND_MIDDLE;
+		else
+			return fits ?
+				IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE :
+				IB_OPCODE_RC_SEND_FIRST;
+
+	case IB_WR_RDMA_READ:
+		return IB_OPCODE_RC_RDMA_READ_REQUEST;
+
+	case IB_WR_ATOMIC_CMP_AND_SWP:
+		return IB_OPCODE_RC_COMPARE_SWAP;
+
+	case IB_WR_ATOMIC_FETCH_AND_ADD:
+		return IB_OPCODE_RC_FETCH_ADD;
+
+	case IB_WR_SEND_WITH_INV:
+		if (qp->req.opcode == IB_OPCODE_RC_SEND_FIRST ||
+		    qp->req.opcode == IB_OPCODE_RC_SEND_MIDDLE)
+			return fits ? IB_OPCODE_RC_SEND_LAST_INV :
+				IB_OPCODE_RC_SEND_MIDDLE;
+		else
+			return fits ? IB_OPCODE_RC_SEND_ONLY_INV :
+				IB_OPCODE_RC_SEND_FIRST;
+	}
+
+	return -EINVAL;
+}
+
+static int next_opcode_uc(struct rxe_qp *qp, unsigned opcode, int fits)
+{
+	switch (opcode) {
+	case IB_WR_RDMA_WRITE:
+		if (qp->req.opcode == IB_OPCODE_UC_RDMA_WRITE_FIRST ||
+		    qp->req.opcode == IB_OPCODE_UC_RDMA_WRITE_MIDDLE)
+			return fits ?
+				IB_OPCODE_UC_RDMA_WRITE_LAST :
+				IB_OPCODE_UC_RDMA_WRITE_MIDDLE;
+		else
+			return fits ?
+				IB_OPCODE_UC_RDMA_WRITE_ONLY :
+				IB_OPCODE_UC_RDMA_WRITE_FIRST;
+
+	case IB_WR_RDMA_WRITE_WITH_IMM:
+		if (qp->req.opcode == IB_OPCODE_UC_RDMA_WRITE_FIRST ||
+		    qp->req.opcode == IB_OPCODE_UC_RDMA_WRITE_MIDDLE)
+			return fits ?
+				IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE :
+				IB_OPCODE_UC_RDMA_WRITE_MIDDLE;
+		else
+			return fits ?
+				IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE :
+				IB_OPCODE_UC_RDMA_WRITE_FIRST;
+
+	case IB_WR_SEND:
+		if (qp->req.opcode == IB_OPCODE_UC_SEND_FIRST ||
+		    qp->req.opcode == IB_OPCODE_UC_SEND_MIDDLE)
+			return fits ?
+				IB_OPCODE_UC_SEND_LAST :
+				IB_OPCODE_UC_SEND_MIDDLE;
+		else
+			return fits ?
+				IB_OPCODE_UC_SEND_ONLY :
+				IB_OPCODE_UC_SEND_FIRST;
+
+	case IB_WR_SEND_WITH_IMM:
+		if (qp->req.opcode == IB_OPCODE_UC_SEND_FIRST ||
+		    qp->req.opcode == IB_OPCODE_UC_SEND_MIDDLE)
+			return fits ?
+				IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE :
+				IB_OPCODE_UC_SEND_MIDDLE;
+		else
+			return fits ?
+				IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE :
+				IB_OPCODE_UC_SEND_FIRST;
+	}
+
+	return -EINVAL;
+}
+
+static int next_opcode(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
+		       unsigned opcode)
+{
+	int fits = (wqe->dma.resid <= qp->mtu);
+
+	switch (qp_type(qp)) {
+	case IB_QPT_RC:
+		return next_opcode_rc(qp, opcode, fits);
+
+	case IB_QPT_UC:
+		return next_opcode_uc(qp, opcode, fits);
+
+	case IB_QPT_SMI:
+	case IB_QPT_UD:
+	case IB_QPT_GSI:
+		switch (opcode) {
+		case IB_WR_SEND:
+			return IB_OPCODE_UD_SEND_ONLY;
+
+		case IB_WR_SEND_WITH_IMM:
+			return IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE;
+		}
+		break;
+
+	default:
+		break;
+	}
+
+	return -EINVAL;
+}
+
+static inline int check_init_depth(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
+{
+	int depth;
+
+	if (wqe->has_rd_atomic)
+		return 0;
+
+	qp->req.need_rd_atomic = 1;
+	depth = atomic_dec_return(&qp->req.rd_atomic);
+
+	if (depth >= 0) {
+		qp->req.need_rd_atomic = 0;
+		wqe->has_rd_atomic = 1;
+		return 0;
+	} else {
+		atomic_inc(&qp->req.rd_atomic);
+		return -EAGAIN;
+	}
+}
+
+static inline int get_mtu(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+
+	if ((qp_type(qp) == IB_QPT_RC) || (qp_type(qp) == IB_QPT_UC)) {
+		return qp->mtu;
+	} else {
+		struct rxe_av *av = &wqe->av;
+		struct rxe_port *port = &rxe->port[av->attr.port_num - 1];
+		return port->mtu_cap;
+	}
+}
+
+static struct rxe_pkt_info *init_req_packet(struct rxe_qp *qp,
+					    struct rxe_send_wqe *wqe,
+					    int opcode, int payload)
+{
+	struct rxe_dev		*rxe = to_rdev(qp->ibqp.device);
+	struct rxe_port		*port = &rxe->port[qp->attr.port_num - 1];
+	struct rxe_pkt_info	*pkt;
+	struct sk_buff		*skb;
+	struct ib_send_wr	*ibwr = &wqe->ibwr;
+	struct rxe_av		*av;
+	int			pad = (-payload) & 0x3;
+	int			paylen;
+	int			solicited;
+	u16			pkey;
+	u32			qp_num;
+	int			ack_req;
+	unsigned int		lnh;
+	unsigned int		length;
+
+	/* length from start of bth to end of icrc */
+	paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE;
+
+	/* TODO support APM someday */
+	if (qp_type(qp) == IB_QPT_RC || qp_type(qp) == IB_QPT_UC)
+		av = &qp->pri_av;
+	else
+		av = &wqe->av;
+
+	/* init skb
+	 * ifc layer must set lrh/grh flags */
+	skb = rxe->ifc_ops->init_packet(rxe, av, paylen);
+	if (!skb)
+		return NULL;
+
+	atomic_inc(&qp->req_skb_out);
+	atomic_inc(&rxe->req_skb_out);
+
+	/* pkt->hdr, rxe, port_num, paylen and
+	   mask are initialized in ifc layer */
+	pkt		= SKB_TO_PKT(skb);
+	pkt->opcode	= opcode;
+	pkt->qp		= qp;
+	pkt->psn	= qp->req.psn;
+	pkt->mask	|= rxe_opcode[opcode].mask;
+	pkt->paylen	= paylen;
+	pkt->offset	= 0;
+
+	/* init lrh */
+	if (pkt->mask & RXE_LRH_MASK) {
+		pkt->offset += RXE_LRH_BYTES;
+		lnh = (pkt->mask & RXE_GRH_MASK) ? LRH_LNH_IBA_GBL
+						 : LRH_LNH_IBA_LOC;
+		length = paylen + RXE_LRH_BYTES;
+		if (pkt->mask & RXE_GRH_MASK)
+			length += RXE_GRH_BYTES;
+
+		/* TODO currently we do not implement SL->VL
+		   table so just force VL=SL */
+		lrh_init(pkt, av->attr.sl, av->attr.sl,
+			 lnh, length/4,
+			 av->attr.dlid,
+			 port->attr.lid);
+	}
+
+	/* init grh */
+	if (pkt->mask & RXE_GRH_MASK) {
+		pkt->offset += RXE_GRH_BYTES;
+
+		/* TODO set class flow hopcount */
+		grh_init(pkt, 0, 0, paylen, 0);
+
+		grh_set_sgid(pkt, port->subnet_prefix,
+			     port->guid_tbl[av->attr.grh.sgid_index]);
+		grh_set_dgid(pkt, av->attr.grh.dgid.global.subnet_prefix,
+			     av->attr.grh.dgid.global.interface_id);
+	}
+
+	/* init bth */
+	solicited = (ibwr->send_flags & IB_SEND_SOLICITED) &&
+			(pkt->mask & RXE_END_MASK) &&
+			((pkt->mask & (RXE_SEND_MASK)) ||
+			(pkt->mask & (RXE_WRITE_MASK | RXE_IMMDT_MASK)) ==
+			(RXE_WRITE_MASK | RXE_IMMDT_MASK));
+
+	pkey = (qp_type(qp) == IB_QPT_GSI) ?
+		 port->pkey_tbl[ibwr->wr.ud.pkey_index] :
+		 port->pkey_tbl[qp->attr.pkey_index];
+
+	qp_num = (pkt->mask & RXE_DETH_MASK) ? ibwr->wr.ud.remote_qpn :
+					 qp->attr.dest_qp_num;
+
+	ack_req = ((pkt->mask & RXE_END_MASK) ||
+		(qp->req.noack_pkts++ > rxe_max_pkt_per_ack));
+	if (ack_req)
+		qp->req.noack_pkts = 0;
+
+	bth_init(pkt, pkt->opcode, solicited, 0, pad, pkey, qp_num,
+		 ack_req, pkt->psn);
+
+	/* init optional headers */
+	if (pkt->mask & RXE_RETH_MASK) {
+		reth_set_rkey(pkt, ibwr->wr.rdma.rkey);
+		reth_set_va(pkt, wqe->iova);
+		reth_set_len(pkt, wqe->dma.length);
+	}
+
+	if (pkt->mask & RXE_IMMDT_MASK)
+		immdt_set_imm(pkt, ibwr->ex.imm_data);
+
+	if (pkt->mask & RXE_IETH_MASK)
+		ieth_set_rkey(pkt, ibwr->ex.invalidate_rkey);
+
+	if (pkt->mask & RXE_ATMETH_MASK) {
+		atmeth_set_va(pkt, wqe->iova);
+		if (opcode == IB_OPCODE_RC_COMPARE_SWAP ||
+		    opcode == IB_OPCODE_RD_COMPARE_SWAP) {
+			atmeth_set_swap_add(pkt, ibwr->wr.atomic.swap);
+			atmeth_set_comp(pkt, ibwr->wr.atomic.compare_add);
+		} else {
+			atmeth_set_swap_add(pkt, ibwr->wr.atomic.compare_add);
+		}
+		atmeth_set_rkey(pkt, ibwr->wr.atomic.rkey);
+	}
+
+	if (pkt->mask & RXE_DETH_MASK) {
+		if (qp->ibqp.qp_num == 1)
+			deth_set_qkey(pkt, GSI_QKEY);
+		else
+			deth_set_qkey(pkt, ibwr->wr.ud.remote_qkey);
+		deth_set_sqp(pkt, qp->ibqp.qp_num);
+	}
+
+	return pkt;
+}
+
+static int fill_packet(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
+		       struct rxe_pkt_info *pkt, int payload)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	static const uint8_t zerobuf[4] = {0,};
+	int pad;
+	u32 crc = 0;
+	u8 *p;
+
+	if (!rxe_crc_disable)
+		crc = rxe_icrc_hdr(pkt);
+
+	if (pkt->mask & RXE_WRITE_OR_SEND) {
+		if (wqe->ibwr.send_flags & IB_SEND_INLINE) {
+			p = &wqe->dma.inline_data[wqe->dma.sge_offset];
+
+			if (!rxe_crc_disable)
+				crc = crc32_le(crc, p, payload);
+
+			memcpy(payload_addr(pkt), p, payload);
+
+			wqe->dma.resid -= payload;
+			wqe->dma.sge_offset += payload;
+		} else {
+			if (copy_data(rxe, qp->pd, 0, &wqe->dma,
+				      payload_addr(pkt), payload,
+				      direction_out, !rxe_crc_disable ?
+				      &crc : NULL)) {
+				return -1;
+			}
+		}
+	}
+
+	p = payload_addr(pkt) + payload;
+	pad = (-payload) & 0x3;
+	if (pad) {
+		if (!rxe_crc_disable)
+			crc = crc32_le(crc, zerobuf, pad);
+
+		memcpy(p, zerobuf, pad);
+
+		p += pad;
+	}
+
+	*(u32 *)p = ~crc;	/* icrc is 32 bits wide */
+
+	return 0;
+}
+
+static void update_state(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
+			 struct rxe_pkt_info *pkt, int payload)
+{
+	/* number of packets left to send including current one */
+	int num_pkt = (wqe->dma.resid + payload + qp->mtu - 1) / qp->mtu;
+
+	/* handle zero length packet case */
+	if (num_pkt == 0)
+		num_pkt = 1;
+
+	if (pkt->mask & RXE_START_MASK) {
+		wqe->first_psn = qp->req.psn;
+		wqe->last_psn = (qp->req.psn + num_pkt - 1) & BTH_PSN_MASK;
+	}
+
+	if (pkt->mask & RXE_READ_MASK)
+		qp->req.psn = (wqe->first_psn + num_pkt) & BTH_PSN_MASK;
+	else
+		qp->req.psn = (qp->req.psn + 1) & BTH_PSN_MASK;
+
+	qp->req.opcode = pkt->opcode;
+
+	if (pkt->mask & RXE_END_MASK) {
+		if (qp_type(qp) == IB_QPT_RC)
+			wqe->state = wqe_state_pending;
+		else
+			wqe->state = wqe_state_done;
+
+		qp->req.wqe_index = next_index(qp->sq.queue,
+						qp->req.wqe_index);
+	} else
+		wqe->state = wqe_state_processing;
+
+	if (qp->qp_timeout_jiffies && !timer_pending(&qp->retrans_timer))
+		mod_timer(&qp->retrans_timer,
+			  jiffies + qp->qp_timeout_jiffies);
+}
+
+int rxe_requester(void *arg)
+{
+	struct rxe_qp *qp = (struct rxe_qp *)arg;
+	struct rxe_pkt_info *pkt;
+	struct rxe_send_wqe *wqe;
+	unsigned mask;
+	int payload;
+	int mtu;
+	int opcode;
+
+	if (!qp->valid || qp->req.state == QP_STATE_ERROR)
+		goto exit;
+
+	if (qp->req.state == QP_STATE_RESET) {
+		qp->req.wqe_index = consumer_index(qp->sq.queue);
+		qp->req.opcode = -1;
+		qp->req.need_rd_atomic = 0;
+		qp->req.wait_psn = 0;
+		qp->req.need_retry = 0;
+		goto exit;
+	}
+
+	if (qp->req.need_retry) {
+		req_retry(qp);
+		qp->req.need_retry = 0;
+	}
+
+	wqe = req_next_wqe(qp);
+	if (!wqe)
+		goto exit;
+
+	/* heuristic sliding window algorithm to keep
+	   sender from overrunning receiver queues */
+	if (qp_type(qp) == IB_QPT_RC)  {
+		if (psn_compare(qp->req.psn, qp->comp.psn +
+				rxe_max_req_comp_gap) > 0) {
+			qp->req.wait_psn = 1;
+			goto exit;
+		}
+	}
+
+	/* heuristic algorithm to prevent sender from
+	   overrunning the output queue. to not overrun
+	   the receiver's input queue (UC/UD) requires
+	   rate control or application level flow control */
+	if (atomic_read(&qp->req_skb_out) > rxe_max_skb_per_qp) {
+		qp->need_req_skb = 1;
+		goto exit;
+	}
+
+	opcode = next_opcode(qp, wqe, wqe->ibwr.opcode);
+	if (opcode < 0) {
+		wqe->status = IB_WC_LOC_QP_OP_ERR;
+		/* TODO there must be more to do here ?? */
+		goto exit;
+	}
+
+	mask = rxe_opcode[opcode].mask;
+	if (unlikely(mask & RXE_READ_OR_ATOMIC)) {
+		if (check_init_depth(qp, wqe))
+			goto exit;
+	}
+
+	mtu = get_mtu(qp, wqe);
+	payload = (mask & RXE_WRITE_OR_SEND) ? wqe->dma.resid : 0;
+	if (payload > mtu) {
+		if (qp_type(qp) == IB_QPT_UD) {
+			/* believe it or not this is
+			   what the spec says to do */
+			/* TODO handle > 1 ports */
+
+			/*
+			 * fake a successful UD send
+			 */
+			wqe->first_psn = qp->req.psn;
+			wqe->last_psn = qp->req.psn;
+			qp->req.psn = (qp->req.psn + 1) & BTH_PSN_MASK;
+			qp->req.opcode = IB_OPCODE_UD_SEND_ONLY;
+			qp->req.wqe_index = next_index(qp->sq.queue,
+						       qp->req.wqe_index);
+			wqe->state = wqe_state_done;
+			wqe->status = IB_WC_SUCCESS;
+			goto complete;
+		}
+		payload = mtu;
+	}
+
+	pkt = init_req_packet(qp, wqe, opcode, payload);
+	if (!pkt) {
+		qp->need_req_skb = 1;
+		goto exit;
+	}
+
+	if (fill_packet(qp, wqe, pkt, payload)) {
+		wqe->status = IB_WC_LOC_PROT_ERR;
+		wqe->state = wqe_state_error;
+		goto complete;
+	}
+
+	update_state(qp, wqe, pkt, payload);
+
+	arbiter_skb_queue(to_rdev(qp->ibqp.device), qp, PKT_TO_SKB(pkt));
+
+	if (mask & RXE_END_MASK)
+		goto complete;
+	else
+		goto done;
+
+complete:
+	if (qp_type(qp) != IB_QPT_RC) {
+		while (rxe_completer(qp) == 0)
+			;
+	}
+done:
+	return 0;
+
+exit:
+	return -EAGAIN;
+}
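
The padding and packet-count arithmetic used by init_req_packet() and
update_state() above is easy to check in isolation: the BTH pad field rounds
each payload up to a 4-byte boundary, and the packet count includes the
packet currently being built. A standalone worked example in plain
user-space C (the MTU and length are made-up values for illustration):

	#include <stdio.h>

	int main(void)
	{
		unsigned int mtu = 1024;	/* path mtu in bytes */
		unsigned int resid = 2501;	/* bytes left in the wqe */
		unsigned int n = 0;

		while (resid) {
			unsigned int payload = (resid > mtu) ? mtu : resid;
			unsigned int pad = (-payload) & 0x3;
			/* resid after this packet, as update_state() sees it */
			unsigned int left = resid - payload;
			unsigned int num_pkt = (left + payload + mtu - 1) / mtu;

			printf("pkt %u: payload %u, pad %u, %u pkt(s) to go\n",
			       ++n, payload, pad, num_pkt);
			resid = left;
		}
		return 0;
	}

For a 2501-byte WQE at a 1024-byte MTU this prints three packets, the last
carrying 453 bytes plus 3 pad bytes so that the 4-byte ICRC stays aligned.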


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 27/37] add rxe_resp.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (25 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 26/37] add rxe_req.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.464927217-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 28/37] add rxe_arbiter.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (10 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_resp.c --]
[-- Type: text/plain, Size: 34383 bytes --]

QP responder logic: validates incoming request packets and executes them
against receive WQEs and registered memory, generating acknowledgements for
RC connections.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_resp.c | 1366 +++++++++++++++++++++++++++++++++++
 1 file changed, 1366 insertions(+)
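
A note on the PSN comparison used by check_psn() in the diff below: BTH PSNs
are 24 bits wide and wrap, so "ahead" and "behind" are only meaningful within
half the PSN space. One common way to build a wrap-safe compare is sketched
here; the real psn_compare() is defined elsewhere in this series, so this is
an assumption about its shape rather than a quote of it:

	/* illustrative sketch; assumes 24-bit PSNs as carried in the BTH */
	static inline int example_psn_compare(unsigned int psn_a,
					      unsigned int psn_b)
	{
		/* shift the 24-bit difference to the top of a 32-bit
		   value so its sign gives the wrapped ordering */
		return (int)((psn_a - psn_b) << 8);
	}

With that convention the three branches in check_psn() fall out directly: a
positive result means the packet is ahead of the expected PSN (out of
sequence), a negative result means it is behind (a duplicate or retried
request), and zero means it is the packet the responder was waiting for.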

Index: infiniband/drivers/infiniband/hw/rxe/rxe_resp.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_resp.c
@@ -0,0 +1,1366 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * implements the qp responder
+ */
+
+#include <linux/skbuff.h>
+
+#include "rxe.h"
+#include "rxe_loc.h"
+#include "rxe_queue.h"
+
+enum resp_states {
+	RESPST_NONE,
+	RESPST_GET_REQ,
+	RESPST_CHK_PSN,
+	RESPST_CHK_OP_SEQ,
+	RESPST_CHK_OP_VALID,
+	RESPST_CHK_RESOURCE,
+	RESPST_CHK_LENGTH,
+	RESPST_CHK_RKEY,
+	RESPST_EXECUTE,
+	RESPST_READ_REPLY,
+	RESPST_COMPLETE,
+	RESPST_ACKNOWLEDGE,
+	RESPST_CLEANUP,
+	RESPST_DUPLICATE_REQUEST,
+	RESPST_ERR_MALFORMED_WQE,
+	RESPST_ERR_UNSUPPORTED_OPCODE,
+	RESPST_ERR_MISALIGNED_ATOMIC,
+	RESPST_ERR_PSN_OUT_OF_SEQ,
+	RESPST_ERR_MISSING_OPCODE_FIRST,
+	RESPST_ERR_MISSING_OPCODE_LAST_C,
+	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
+	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
+	RESPST_ERR_RNR,
+	RESPST_ERR_RKEY_VIOLATION,
+	RESPST_ERR_LENGTH,
+	RESPST_ERR_CQ_OVERFLOW,
+	RESPST_ERROR,
+	RESPST_RESET,
+	RESPST_DONE,
+	RESPST_EXIT,
+};
+
+static char *resp_state_name[] = {
+	[RESPST_NONE]				= "NONE",
+	[RESPST_GET_REQ]			= "GET_REQ",
+	[RESPST_CHK_PSN]			= "CHK_PSN",
+	[RESPST_CHK_OP_SEQ]			= "CHK_OP_SEQ",
+	[RESPST_CHK_OP_VALID]			= "CHK_OP_VALID",
+	[RESPST_CHK_RESOURCE]			= "CHK_RESOURCE",
+	[RESPST_CHK_LENGTH]			= "CHK_LENGTH",
+	[RESPST_CHK_RKEY]			= "CHK_RKEY",
+	[RESPST_EXECUTE]			= "EXECUTE",
+	[RESPST_READ_REPLY]			= "READ_REPLY",
+	[RESPST_COMPLETE]			= "COMPLETE",
+	[RESPST_ACKNOWLEDGE]			= "ACKNOWLEDGE",
+	[RESPST_CLEANUP]			= "CLEANUP",
+	[RESPST_DUPLICATE_REQUEST]		= "DUPLICATE_REQUEST",
+	[RESPST_ERR_MALFORMED_WQE]		= "ERR_MALFORMED_WQE",
+	[RESPST_ERR_UNSUPPORTED_OPCODE]		= "ERR_UNSUPPORTED_OPCODE",
+	[RESPST_ERR_MISALIGNED_ATOMIC]		= "ERR_MISALIGNED_ATOMIC",
+	[RESPST_ERR_PSN_OUT_OF_SEQ]		= "ERR_PSN_OUT_OF_SEQ",
+	[RESPST_ERR_MISSING_OPCODE_FIRST]	= "ERR_MISSING_OPCODE_FIRST",
+	[RESPST_ERR_MISSING_OPCODE_LAST_C]	= "ERR_MISSING_OPCODE_LAST_C",
+	[RESPST_ERR_MISSING_OPCODE_LAST_D1E]	= "ERR_MISSING_OPCODE_LAST_D1E",
+	[RESPST_ERR_TOO_MANY_RDMA_ATM_REQ]	= "ERR_TOO_MANY_RDMA_ATM_REQ",
+	[RESPST_ERR_RNR]			= "ERR_RNR",
+	[RESPST_ERR_RKEY_VIOLATION]		= "ERR_RKEY_VIOLATION",
+	[RESPST_ERR_LENGTH]			= "ERR_LENGTH",
+	[RESPST_ERR_CQ_OVERFLOW]		= "ERR_CQ_OVERFLOW",
+	[RESPST_ERROR]				= "ERROR",
+	[RESPST_RESET]				= "RESET",
+	[RESPST_DONE]				= "DONE",
+	[RESPST_EXIT]				= "EXIT",
+};
+
+/* rxe_recv calls here to add a request packet to the input queue */
+void rxe_resp_queue_pkt(struct rxe_dev *rxe, struct rxe_qp *qp,
+			struct sk_buff *skb)
+{
+	int must_sched;
+	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+
+	atomic_inc(&qp->req_skb_in);
+	atomic_inc(&rxe->req_skb_in);
+
+	skb_queue_tail(&qp->req_pkts, skb);
+
+	must_sched = (pkt->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST)
+		    || (skb_queue_len(&qp->req_pkts) > 1);
+
+	rxe_run_task(&qp->resp.task, must_sched);
+}
+
+static inline enum resp_states get_req(struct rxe_qp *qp,
+				       struct rxe_pkt_info **pkt_p)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	struct sk_buff *skb;
+
+	if (qp->resp.state == QP_STATE_ERROR) {
+		skb = skb_dequeue(&qp->req_pkts);
+		if (skb) {
+			/* drain request packet queue */
+			rxe_drop_ref(qp);
+			kfree_skb(skb);
+			atomic_dec(&qp->req_skb_in);
+			atomic_dec(&rxe->req_skb_in);
+			return RESPST_GET_REQ;
+		} else {
+			/* go drain recv wr queue */
+			return RESPST_CHK_RESOURCE;
+		}
+	}
+
+	skb = skb_peek(&qp->req_pkts);
+	if (!skb)
+		return RESPST_EXIT;
+
+	*pkt_p = SKB_TO_PKT(skb);
+
+	return (qp->resp.res) ? RESPST_READ_REPLY : RESPST_CHK_PSN;
+}
+
+static enum resp_states check_psn(struct rxe_qp *qp,
+				  struct rxe_pkt_info *pkt)
+{
+	int diff = psn_compare(pkt->psn, qp->resp.psn);
+
+	switch (qp_type(qp)) {
+	case IB_QPT_RC:
+		if (diff > 0) {
+			if (qp->resp.sent_psn_nak) {
+				return RESPST_CLEANUP;
+			} else {
+				qp->resp.sent_psn_nak = 1;
+				return RESPST_ERR_PSN_OUT_OF_SEQ;
+			}
+		} else if (diff < 0) {
+			return RESPST_DUPLICATE_REQUEST;
+		} else {
+			if (qp->resp.sent_psn_nak)
+				qp->resp.sent_psn_nak = 0;
+		}
+		break;
+
+	case IB_QPT_UC:
+		if (qp->resp.drop_msg || diff != 0) {
+			if (pkt->mask & RXE_START_MASK) {
+				qp->resp.drop_msg = 0;
+				return RESPST_CHK_OP_SEQ;
+			} else {
+				qp->resp.drop_msg = 1;
+				return RESPST_CLEANUP;
+			}
+		}
+		break;
+	default:
+		break;
+	}
+
+	return RESPST_CHK_OP_SEQ;
+}
+
+static enum resp_states check_op_seq(struct rxe_qp *qp,
+				     struct rxe_pkt_info *pkt)
+{
+	switch (qp_type(qp)) {
+	case IB_QPT_RC:
+		switch (qp->resp.opcode) {
+		case IB_OPCODE_RC_SEND_FIRST:
+		case IB_OPCODE_RC_SEND_MIDDLE:
+			switch (pkt->opcode) {
+			case IB_OPCODE_RC_SEND_MIDDLE:
+			case IB_OPCODE_RC_SEND_LAST:
+			case IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE:
+			case IB_OPCODE_RC_SEND_LAST_INV:
+				return RESPST_CHK_OP_VALID;
+			default:
+				return RESPST_ERR_MISSING_OPCODE_LAST_C;
+			}
+
+		case IB_OPCODE_RC_RDMA_WRITE_FIRST:
+		case IB_OPCODE_RC_RDMA_WRITE_MIDDLE:
+			switch (pkt->opcode) {
+			case IB_OPCODE_RC_RDMA_WRITE_MIDDLE:
+			case IB_OPCODE_RC_RDMA_WRITE_LAST:
+			case IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE:
+				return RESPST_CHK_OP_VALID;
+			default:
+				return RESPST_ERR_MISSING_OPCODE_LAST_C;
+			}
+
+		default:
+			switch (pkt->opcode) {
+			case IB_OPCODE_RC_SEND_MIDDLE:
+			case IB_OPCODE_RC_SEND_LAST:
+			case IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE:
+			case IB_OPCODE_RC_SEND_LAST_INV:
+			case IB_OPCODE_RC_RDMA_WRITE_MIDDLE:
+			case IB_OPCODE_RC_RDMA_WRITE_LAST:
+			case IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE:
+				return RESPST_ERR_MISSING_OPCODE_FIRST;
+			default:
+				return RESPST_CHK_OP_VALID;
+			}
+		}
+		break;
+
+	case IB_QPT_UC:
+		switch (qp->resp.opcode) {
+		case IB_OPCODE_UC_SEND_FIRST:
+		case IB_OPCODE_UC_SEND_MIDDLE:
+			switch (pkt->opcode) {
+			case IB_OPCODE_UC_SEND_MIDDLE:
+			case IB_OPCODE_UC_SEND_LAST:
+			case IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE:
+				return RESPST_CHK_OP_VALID;
+			default:
+				return RESPST_ERR_MISSING_OPCODE_LAST_D1E;
+			}
+
+		case IB_OPCODE_UC_RDMA_WRITE_FIRST:
+		case IB_OPCODE_UC_RDMA_WRITE_MIDDLE:
+			switch (pkt->opcode) {
+			case IB_OPCODE_UC_RDMA_WRITE_MIDDLE:
+			case IB_OPCODE_UC_RDMA_WRITE_LAST:
+			case IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE:
+				return RESPST_CHK_OP_VALID;
+			default:
+				return RESPST_ERR_MISSING_OPCODE_LAST_D1E;
+			}
+
+		default:
+			switch (pkt->opcode) {
+			case IB_OPCODE_UC_SEND_MIDDLE:
+			case IB_OPCODE_UC_SEND_LAST:
+			case IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE:
+			case IB_OPCODE_UC_RDMA_WRITE_MIDDLE:
+			case IB_OPCODE_UC_RDMA_WRITE_LAST:
+			case IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE:
+				qp->resp.drop_msg = 1;
+				return RESPST_CLEANUP;
+			default:
+				return RESPST_CHK_OP_VALID;
+			}
+		}
+		break;
+
+	default:
+		return RESPST_CHK_OP_VALID;
+	}
+}
+
+static enum resp_states check_op_valid(struct rxe_qp *qp,
+				       struct rxe_pkt_info *pkt)
+{
+	switch (qp_type(qp)) {
+	case IB_QPT_RC:
+		if (((pkt->mask & RXE_READ_MASK) &&
+		     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_READ)) ||
+		    ((pkt->mask & RXE_WRITE_MASK) &&
+		     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) ||
+		    ((pkt->mask & RXE_ATOMIC_MASK) &&
+		     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_ATOMIC))) {
+			return RESPST_ERR_UNSUPPORTED_OPCODE;
+		}
+
+		if (!pkt->mask)
+			return RESPST_ERR_UNSUPPORTED_OPCODE;
+		break;
+
+	case IB_QPT_UC:
+		if ((pkt->mask & RXE_WRITE_MASK) &&
+		    !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) {
+			qp->resp.drop_msg = 1;
+			return RESPST_CLEANUP;
+		}
+
+		if (!pkt->mask) {
+			qp->resp.drop_msg = 1;
+			return RESPST_CLEANUP;
+		}
+		break;
+
+	case IB_QPT_UD:
+	case IB_QPT_SMI:
+	case IB_QPT_GSI:
+		if (!pkt->mask)
+			return RESPST_CLEANUP;
+		break;
+
+	default:
+		WARN_ON(1);
+		break;
+	}
+
+	return RESPST_CHK_RESOURCE;
+}
+
+static enum resp_states get_srq_wqe(struct rxe_qp *qp)
+{
+	struct rxe_srq *srq = qp->srq;
+	struct rxe_queue *q = srq->rq.queue;
+	struct rxe_recv_wqe *wqe;
+	struct ib_event ev;
+
+	if (srq->error)
+		return RESPST_ERR_RNR;
+
+	spin_lock_bh(&srq->rq.consumer_lock);
+
+	wqe = queue_head(q);
+	if (!wqe) {
+		spin_unlock_bh(&srq->rq.consumer_lock);
+		return RESPST_ERR_RNR;
+	}
+
+	/* note kernel and user space recv wqes have same size */
+	memcpy(&qp->resp.srq_wqe, wqe, sizeof(qp->resp.srq_wqe));
+
+	qp->resp.wqe = &qp->resp.srq_wqe.wqe;
+	advance_consumer(q);
+
+	if (srq->limit && srq->ibsrq.event_handler &&
+	    (queue_count(q) < srq->limit)) {
+		srq->limit = 0;
+		goto event;
+	}
+
+	spin_unlock_bh(&srq->rq.consumer_lock);
+	return RESPST_CHK_LENGTH;
+
+event:
+	spin_unlock_bh(&srq->rq.consumer_lock);
+	ev.device = qp->ibqp.device;
+	ev.element.srq = qp->ibqp.srq;
+	ev.event = IB_EVENT_SRQ_LIMIT_REACHED;
+	srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context);
+	return RESPST_CHK_LENGTH;
+}
+
+static enum resp_states check_resource(struct rxe_qp *qp,
+				       struct rxe_pkt_info *pkt)
+{
+	struct rxe_srq *srq = qp->srq;
+
+	if (qp->resp.state == QP_STATE_ERROR) {
+		if (qp->resp.wqe) {
+			qp->resp.status = IB_WC_WR_FLUSH_ERR;
+			return RESPST_COMPLETE;
+		} else if (!srq) {
+			qp->resp.wqe = queue_head(qp->rq.queue);
+			if (qp->resp.wqe) {
+				qp->resp.status = IB_WC_WR_FLUSH_ERR;
+				return RESPST_COMPLETE;
+			} else
+				return RESPST_EXIT;
+		} else
+			return RESPST_EXIT;
+	}
+
+	if (pkt->mask & RXE_READ_OR_ATOMIC) {
+		/* it is the requester's job not to send
+		   too many read/atomic ops; we just
+		   recycle the responder resource queue */
+		if (likely(qp->attr.max_rd_atomic > 0))
+			return RESPST_CHK_LENGTH;
+		else
+			return RESPST_ERR_TOO_MANY_RDMA_ATM_REQ;
+	}
+
+	if (pkt->mask & RXE_RWR_MASK) {
+		if (srq) {
+			return get_srq_wqe(qp);
+		} else {
+			qp->resp.wqe = queue_head(qp->rq.queue);
+			return (qp->resp.wqe) ? RESPST_CHK_LENGTH
+					      : RESPST_ERR_RNR;
+		}
+	}
+
+	return RESPST_CHK_LENGTH;
+}
+
+static enum resp_states check_length(struct rxe_qp *qp,
+				     struct rxe_pkt_info *pkt)
+{
+	switch (qp_type(qp)) {
+	case IB_QPT_RC:
+		return RESPST_CHK_RKEY;
+
+	case IB_QPT_UC:
+		return RESPST_CHK_RKEY;
+
+	default:
+		return RESPST_CHK_RKEY;
+	}
+}
+
+static enum resp_states check_rkey(struct rxe_qp *qp,
+				   struct rxe_pkt_info *pkt)
+{
+	struct rxe_mem *mem;
+	u64 va;
+	u32 rkey;
+	u32 resid;
+	u32 pktlen;
+	int mtu = qp->mtu;
+	enum resp_states state;
+	int access;
+
+	if (pkt->mask & (RXE_READ_MASK | RXE_WRITE_MASK)) {
+		if (pkt->mask & RXE_RETH_MASK) {
+			qp->resp.va = reth_va(pkt);
+			qp->resp.rkey = reth_rkey(pkt);
+			qp->resp.resid = reth_len(pkt);
+		}
+		access = (pkt->mask & RXE_READ_MASK) ? IB_ACCESS_REMOTE_READ
+						     : IB_ACCESS_REMOTE_WRITE;
+	} else if (pkt->mask & RXE_ATOMIC_MASK) {
+		qp->resp.va = atmeth_va(pkt);
+		qp->resp.rkey = atmeth_rkey(pkt);
+		qp->resp.resid = sizeof(u64);
+		access = IB_ACCESS_REMOTE_ATOMIC;
+	} else {
+		return RESPST_EXECUTE;
+	}
+
+	va	= qp->resp.va;
+	rkey	= qp->resp.rkey;
+	resid	= qp->resp.resid;
+	pktlen	= payload_size(pkt);
+
+	mem = lookup_mem(qp->pd, access, rkey, lookup_remote);
+	if (!mem) {
+		state = RESPST_ERR_RKEY_VIOLATION;
+		goto err1;
+	}
+
+	if (mem_check_range(mem, va, resid)) {
+		state = RESPST_ERR_RKEY_VIOLATION;
+		goto err2;
+	}
+
+	if (pkt->mask & RXE_WRITE_MASK)	 {
+		if (resid > mtu) {
+			if (pktlen != mtu || bth_pad(pkt)) {
+				state = RESPST_ERR_LENGTH;
+				goto err2;
+			}
+
+			resid = mtu;
+		} else {
+			if ((pktlen != resid)) {
+				state = RESPST_ERR_LENGTH;
+				goto err2;
+			}
+			if ((bth_pad(pkt) != (0x3 & (-resid)))) {
+				/* This case may not be exactly that
+				 * but nothing else fits. */
+				state = RESPST_ERR_LENGTH;
+				goto err2;
+			}
+		}
+	}
+
+	WARN_ON(qp->resp.mr != NULL);
+
+	qp->resp.mr = mem;
+	return RESPST_EXECUTE;
+
+err2:
+	rxe_drop_ref(mem);
+err1:
+	return state;
+}
+
+static enum resp_states send_data_in(struct rxe_qp *qp, void *data_addr,
+				     int data_len)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+
+	err = copy_data(rxe, qp->pd, IB_ACCESS_LOCAL_WRITE, &qp->resp.wqe->dma,
+			data_addr, data_len, direction_in, NULL);
+	if (unlikely(err))
+		return (err == -ENOSPC) ? RESPST_ERR_LENGTH
+					: RESPST_ERR_MALFORMED_WQE;
+
+	return RESPST_NONE;
+}
+
+static enum resp_states write_data_in(struct rxe_qp *qp,
+				      struct rxe_pkt_info *pkt)
+{
+	enum resp_states rc = RESPST_NONE;
+	int	err;
+	int data_len = payload_size(pkt);
+
+	err = rxe_mem_copy(qp->resp.mr, qp->resp.va, payload_addr(pkt),
+			   data_len, direction_in, NULL);
+	if (err) {
+		rc = RESPST_ERR_RKEY_VIOLATION;
+		goto out;
+	}
+
+	qp->resp.va += data_len;
+	qp->resp.resid -= data_len;
+
+out:
+	return rc;
+}
+
+/* Guarantee atomicity of atomic operations at the machine level. */
+static DEFINE_SPINLOCK(atomic_ops_lock);
+
+static enum resp_states process_atomic(struct rxe_qp *qp,
+				       struct rxe_pkt_info *pkt)
+{
+	u64 iova = atmeth_va(pkt);
+	u64 *vaddr;
+	enum resp_states ret;
+	struct rxe_mem *mr = qp->resp.mr;
+
+	if (mr->state != RXE_MEM_STATE_VALID) {
+		ret = RESPST_ERR_RKEY_VIOLATION;
+		goto out;
+	}
+
+	vaddr = iova_to_vaddr(mr, iova, sizeof(u64));
+
+	/* check that vaddr is 8-byte aligned. */
+	if (!vaddr || (uintptr_t)vaddr & 7) {
+		ret = RESPST_ERR_MISALIGNED_ATOMIC;
+		goto out;
+	}
+
+	spin_lock_bh(&atomic_ops_lock);
+
+	qp->resp.atomic_orig = *vaddr;
+
+	if (pkt->opcode == IB_OPCODE_RC_COMPARE_SWAP ||
+	    pkt->opcode == IB_OPCODE_RD_COMPARE_SWAP) {
+		if (*vaddr == atmeth_comp(pkt))
+			*vaddr = atmeth_swap_add(pkt);
+	} else {
+		*vaddr += atmeth_swap_add(pkt);
+	}
+
+	spin_unlock_bh(&atomic_ops_lock);
+
+	ret = RESPST_NONE;
+out:
+	return ret;
+}
+
+static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
+					  struct rxe_pkt_info *pkt,
+					  int opcode,
+					  int payload,
+					  u32 psn,
+					  u8 syndrome)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	struct sk_buff *skb;
+	struct rxe_pkt_info *ack;
+	int paylen;
+	int pad;
+
+	/*
+	 * allocate packet
+	 */
+	pad = (-payload) & 0x3;
+	paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE;
+
+	skb = rxe->ifc_ops->init_packet(rxe, &qp->pri_av, paylen);
+	if (!skb)
+		return NULL;
+
+	atomic_inc(&qp->resp_skb_out);
+	atomic_inc(&rxe->resp_skb_out);
+
+	ack = SKB_TO_PKT(skb);
+	ack->qp = qp;
+	ack->opcode = opcode;
+	ack->mask |= rxe_opcode[opcode].mask;
+	ack->offset = pkt->offset;
+	ack->paylen = paylen;
+
+	/* fill in lrh/grh/bth using the request packet headers */
+	memcpy(ack->hdr, pkt->hdr, pkt->offset + RXE_BTH_BYTES);
+
+	/* swap addresses in lrh */
+	if (ack->mask & RXE_LRH_MASK) {
+		__lrh_set_dlid(ack->hdr, __lrh_slid(pkt->hdr));
+		__lrh_set_slid(ack->hdr, __lrh_dlid(pkt->hdr));
+	}
+
+	/* swap addresses in grh */
+	if (ack->mask & RXE_GRH_MASK) {
+		grh_set_paylen(ack, paylen);
+		grh_set_dgid(ack, grh_sgid(pkt)->global.subnet_prefix,
+			grh_sgid(pkt)->global.interface_id);
+		grh_set_sgid(ack, grh_dgid(pkt)->global.subnet_prefix,
+			grh_dgid(pkt)->global.interface_id);
+	}
+
+	bth_set_opcode(ack, opcode);
+	bth_set_qpn(ack, qp->attr.dest_qp_num);
+	bth_set_pad(ack, pad);
+	bth_set_se(ack, 0);
+	bth_set_psn(ack, psn);
+	ack->psn = psn;
+
+	if (ack->mask & RXE_AETH_MASK) {
+		aeth_set_syn(ack, syndrome);
+		aeth_set_msn(ack, qp->resp.msn);
+	}
+
+	if (ack->mask & RXE_ATMACK_MASK)
+		atmack_set_orig(ack, qp->resp.atomic_orig);
+
+	return skb;
+}
+
+/* RDMA read response. If res is not NULL, then we have a current RDMA request
+ * being processed or replayed. */
+static enum resp_states read_reply(struct rxe_qp *qp,
+				   struct rxe_pkt_info *req_pkt)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	struct rxe_pkt_info *pkt;
+	int mtu = qp->mtu;
+	enum resp_states state;
+	int payload;
+	struct sk_buff *skb;
+	int opcode;
+	int err;
+	struct resp_res *res = qp->resp.res;
+	unsigned char *buf;
+
+	if (!res) {
+		/* This is the first time we process this request. Get a
+		 * resource */
+		res = &qp->resp.resources[qp->resp.res_head];
+
+		free_rd_atomic_resource(qp, res);
+		rxe_advance_resp_resource(qp);
+
+		res->type		= RXE_READ_MASK;
+
+		res->read.va		= qp->resp.va;
+		res->read.va_org	= qp->resp.va;
+
+		res->first_psn		= req_pkt->psn;
+		res->last_psn		= req_pkt->psn +
+					  (reth_len(req_pkt) + mtu - 1)/mtu - 1;
+		res->cur_psn		= req_pkt->psn;
+
+		res->read.resid		= qp->resp.resid;
+		res->read.length	= qp->resp.resid;
+		res->read.rkey		= qp->resp.rkey;
+
+		/* note res inherits the reference to mr from qp */
+		res->read.mr		= qp->resp.mr;
+		qp->resp.mr		= NULL;
+
+		qp->resp.res		= res;
+		res->state		= rdatm_res_state_new;
+	}
+
+	if (res->state == rdatm_res_state_new) {
+		if (res->read.resid <= mtu)
+			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY;
+		else
+			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST;
+	} else {
+		if (res->read.resid > mtu)
+			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE;
+		else
+			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST;
+	}
+
+	res->state = rdatm_res_state_next;
+
+	payload = min_t(int, res->read.resid, mtu);
+
+	skb = prepare_ack_packet(qp, req_pkt, opcode, payload,
+				 res->cur_psn, AETH_ACK_UNLIMITED);
+	if (!skb)
+		return RESPST_ERR_RNR;
+
+	pkt = SKB_TO_PKT(skb);
+
+	err = rxe_mem_copy(res->read.mr, res->read.va, payload_addr(pkt),
+			   payload, direction_out, NULL);
+	if (err)
+		pr_warn("rxe_mem_copy failed TODO ???\n");
+
+	res->read.va += payload;
+	res->read.resid -= payload;
+	res->cur_psn = (res->cur_psn + 1) & BTH_PSN_MASK;
+
+	buf = payload_addr(pkt) + payload;
+	while (payload & 0x3) {
+		*buf++ = 0;
+		payload++;
+	}
+
+	if (!rxe_crc_disable)
+		*buf = rxe_icrc_pkt(pkt);
+
+	arbiter_skb_queue(rxe, qp, skb);
+
+	if (res->read.resid > 0) {
+		state = RESPST_DONE;
+	} else {
+		qp->resp.res = NULL;
+		qp->resp.opcode = -1;
+		qp->resp.psn = res->cur_psn;
+		state = RESPST_CLEANUP;
+	}
+
+	return state;
+}
+
+/* Execute a new request. A retried request never reaches this function
+ * (sends and writes are discarded, and reads and atomics are retried
+ * elsewhere). */
+static enum resp_states execute(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
+{
+	enum resp_states err;
+
+	if (pkt->mask & RXE_SEND_MASK) {
+		if (qp_type(qp) == IB_QPT_UD ||
+		    qp_type(qp) == IB_QPT_SMI ||
+		    qp_type(qp) == IB_QPT_GSI) {
+			/* Copy the GRH */
+			err = send_data_in(qp, pkt->hdr, RXE_GRH_BYTES);
+			if (err)
+				return err;
+		}
+
+		err = send_data_in(qp, payload_addr(pkt), payload_size(pkt));
+		if (err)
+			return err;
+	} else if (pkt->mask & RXE_WRITE_MASK) {
+		err = write_data_in(qp, pkt);
+		if (err)
+			return err;
+	} else if (pkt->mask & RXE_READ_MASK) {
+		/* For RDMA Read we can increment the msn now. See C9-148. */
+		qp->resp.msn++;
+		return RESPST_READ_REPLY;
+	} else if (pkt->mask & RXE_ATOMIC_MASK) {
+		err = process_atomic(qp, pkt);
+		if (err)
+			return err;
+	} else
+		/* Unreachable */
+		WARN_ON(1);
+
+	/* We successfully processed this new request. */
+	qp->resp.msn++;
+
+	/* next expected psn, read handles this separately */
+	qp->resp.psn = (pkt->psn + 1) & BTH_PSN_MASK;
+
+	qp->resp.opcode = pkt->opcode;
+	qp->resp.status = IB_WC_SUCCESS;
+
+	if (pkt->mask & RXE_COMP_MASK)
+		return RESPST_COMPLETE;
+	else if (qp_type(qp) == IB_QPT_RC)
+		return RESPST_ACKNOWLEDGE;
+	else
+		return RESPST_CLEANUP;
+}
+
+static enum resp_states do_complete(struct rxe_qp *qp,
+				    struct rxe_pkt_info *pkt)
+{
+	struct rxe_cqe cqe;
+	struct ib_wc *wc = &cqe.ibwc;
+	struct ib_uverbs_wc *uwc = &cqe.uibwc;
+	struct rxe_recv_wqe *wqe = qp->resp.wqe;
+
+	if (unlikely(!wqe))
+		return RESPST_CLEANUP;
+
+	memset(&cqe, 0, sizeof(cqe));
+
+	wc->wr_id		= wqe->wr_id;
+	wc->status		= qp->resp.status;
+
+	/* fields after status are not required for errors */
+	if (wc->status == IB_WC_SUCCESS) {
+		wc->opcode = (pkt->mask & RXE_IMMDT_MASK &&
+				pkt->mask & RXE_READ_MASK) ?
+					IB_WC_RECV_RDMA_WITH_IMM : IB_WC_RECV;
+		wc->vendor_err = 0;
+		wc->byte_len = wqe->dma.length - wqe->dma.resid;
+
+		/* fields after byte_len are at different
+		   offsets in kernel and user space */
+		if (qp->rcq->is_user) {
+			uwc->wc_flags		= IB_WC_GRH;
+
+			if (pkt->mask & RXE_IMMDT_MASK) {
+				uwc->wc_flags |= IB_WC_WITH_IMM;
+				uwc->ex.imm_data =
+					(__u32 __force)immdt_imm(pkt);
+			}
+
+			if (pkt->mask & RXE_IETH_MASK) {
+				uwc->wc_flags |= IB_WC_WITH_INVALIDATE;
+				uwc->ex.invalidate_rkey = ieth_rkey(pkt);
+			}
+
+			uwc->qp_num		= qp->ibqp.qp_num;
+
+			if (pkt->mask & RXE_DETH_MASK)
+				uwc->src_qp = deth_sqp(pkt);
+
+			uwc->port_num		= qp->attr.port_num;
+		} else {
+			wc->wc_flags		= IB_WC_GRH;
+
+			if (pkt->mask & RXE_IMMDT_MASK) {
+				wc->wc_flags |= IB_WC_WITH_IMM;
+				wc->ex.imm_data = immdt_imm(pkt);
+			}
+
+			if (pkt->mask & RXE_IETH_MASK) {
+				wc->wc_flags |= IB_WC_WITH_INVALIDATE;
+				wc->ex.invalidate_rkey = ieth_rkey(pkt);
+			}
+
+			wc->qp			= &qp->ibqp;
+
+			if (pkt->mask & RXE_DETH_MASK)
+				wc->src_qp = deth_sqp(pkt);
+
+			wc->port_num		= qp->attr.port_num;
+		}
+	}
+
+	/* have copy for srq and reference for !srq */
+	if (!qp->srq)
+		advance_consumer(qp->rq.queue);
+
+	qp->resp.wqe = NULL;
+
+	if (rxe_cq_post(qp->rcq, &cqe, pkt ? bth_se(pkt) : 1))
+		return RESPST_ERR_CQ_OVERFLOW;
+
+	if (qp->resp.state == QP_STATE_ERROR)
+		return RESPST_CHK_RESOURCE;
+
+	if (!pkt)
+		return RESPST_DONE;
+	else if (qp_type(qp) == IB_QPT_RC)
+		return RESPST_ACKNOWLEDGE;
+	else
+		return RESPST_CLEANUP;
+}
+
+static int send_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
+		    u8 syndrome, u32 psn)
+{
+	int err = 0;
+	struct sk_buff *skb;
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	struct rxe_pkt_info *ack;
+	u32 *buf;
+
+	skb = prepare_ack_packet(qp, pkt, IB_OPCODE_RC_ACKNOWLEDGE,
+				 0, psn, syndrome);
+	if (skb == NULL) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	/* CRC */
+	ack = SKB_TO_PKT(skb);
+	buf = payload_addr(ack);
+
+	if (!rxe_crc_disable)
+		*buf = rxe_icrc_pkt(ack);
+
+	arbiter_skb_queue(rxe, qp, skb);
+
+err1:
+	return err;
+}
+
+static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
+			   u8 syndrome)
+{
+	int rc = 0;
+	struct sk_buff *skb;
+	struct sk_buff *skb_copy;
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	struct rxe_pkt_info *ack;
+	struct resp_res *res;
+	u32 *buf;
+
+	skb = prepare_ack_packet(qp, pkt, IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE,
+				 0, pkt->psn, syndrome);
+	if (skb == NULL) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	ack = SKB_TO_PKT(skb);
+	buf = payload_addr(ack);
+
+	if (!rxe_crc_disable)
+		*buf = rxe_icrc_pkt(ack);
+
+	res = &qp->resp.resources[qp->resp.res_head];
+	free_rd_atomic_resource(qp, res);
+	rxe_advance_resp_resource(qp);
+
+	res->type = RXE_ATOMIC_MASK;
+	res->atomic.skb = skb;
+	res->first_psn = qp->resp.psn;
+	res->last_psn = qp->resp.psn;
+	res->cur_psn = qp->resp.psn;
+
+	skb_copy = skb_clone(skb, GFP_ATOMIC);
+	if (skb_copy) {
+		rxe_add_ref(qp); /* for the new SKB */
+		atomic_inc(&qp->resp_skb_out);
+		atomic_inc(&rxe->resp_skb_out);
+	} else {
+		pr_warn("Could not clone atomic response\n");
+	}
+
+	arbiter_skb_queue(rxe, qp, skb_copy);
+
+out:
+	return rc;
+}
+
+static enum resp_states acknowledge(struct rxe_qp *qp,
+				    struct rxe_pkt_info *pkt)
+{
+	if (qp_type(qp) != IB_QPT_RC)
+		return RESPST_CLEANUP;
+
+	if (qp->resp.aeth_syndrome != AETH_ACK_UNLIMITED)
+		send_ack(qp, pkt, qp->resp.aeth_syndrome, pkt->psn);
+	else if (pkt->mask & RXE_ATOMIC_MASK)
+		send_atomic_ack(qp, pkt, AETH_ACK_UNLIMITED);
+	else if (bth_ack(pkt))
+		send_ack(qp, pkt, AETH_ACK_UNLIMITED, pkt->psn);
+
+	return RESPST_CLEANUP;
+}
+
+static enum resp_states cleanup(struct rxe_qp *qp,
+				struct rxe_pkt_info *pkt)
+{
+	struct sk_buff *skb;
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+
+	if (pkt) {
+		skb = skb_dequeue(&qp->req_pkts);
+		rxe_drop_ref(qp);
+		kfree_skb(skb);
+		atomic_dec(&qp->req_skb_in);
+		atomic_dec(&rxe->req_skb_in);
+	}
+
+	if (qp->resp.mr) {
+		rxe_drop_ref(qp->resp.mr);
+		qp->resp.mr = NULL;
+	}
+
+	return RESPST_DONE;
+}
+
+static struct resp_res *find_resource(struct rxe_qp *qp, u32 psn)
+{
+	int i;
+
+	for (i = 0; i < qp->attr.max_rd_atomic; i++) {
+		struct resp_res *res = &qp->resp.resources[i];
+		if (res->type == 0)
+			continue;
+
+		if (psn_compare(psn, res->first_psn) >= 0 &&
+		    psn_compare(psn, res->last_psn) <= 0) {
+			return res;
+		}
+	}
+
+	return NULL;
+}
+
+static enum resp_states duplicate_request(struct rxe_qp *qp,
+					  struct rxe_pkt_info *pkt)
+{
+	enum resp_states rc;
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+
+	if (pkt->mask & RXE_SEND_MASK ||
+	    pkt->mask & RXE_WRITE_MASK) {
+		/* SEND. Ack again and cleanup. C9-105. */
+		if (bth_ack(pkt))
+			send_ack(qp, pkt, AETH_ACK_UNLIMITED, qp->resp.psn - 1);
+		rc = RESPST_CLEANUP;
+		goto out;
+	} else if (pkt->mask & RXE_READ_MASK) {
+		struct resp_res *res;
+
+		res = find_resource(qp, pkt->psn);
+		if (!res) {
+			/* Resource not found. Class D error.
+			   Drop the request. */
+			rc = RESPST_CLEANUP;
+			goto out;
+		} else {
+			/* Ensure this new request is the same as the previous
+			 * one or a subset of it. */
+			u64 iova = reth_va(pkt);
+			u32 resid = reth_len(pkt);
+
+			if (iova < res->read.va_org ||
+			    resid > res->read.length ||
+			    (iova + resid) > (res->read.va_org +
+					      res->read.length)) {
+				rc = RESPST_CLEANUP;
+				goto out;
+			}
+
+			if (reth_rkey(pkt) != res->read.rkey) {
+				rc = RESPST_CLEANUP;
+				goto out;
+			}
+
+			res->cur_psn = pkt->psn;
+			res->state = (pkt->psn == res->first_psn) ?
+					rdatm_res_state_new :
+					rdatm_res_state_replay;
+
+			/* Reset the resource, except length. */
+			res->read.va_org = iova;
+			res->read.va = iova;
+			res->read.resid = resid;
+
+			/* Replay the RDMA read reply. */
+			qp->resp.res = res;
+			rc = RESPST_READ_REPLY;
+			goto out;
+		}
+	} else {
+		struct resp_res *res;
+
+		WARN_ON((pkt->mask & RXE_ATOMIC_MASK) == 0);
+
+		/* Find the operation in our list of responder resources. */
+		res = find_resource(qp, pkt->psn);
+		if (res) {
+			struct sk_buff *skb_copy;
+
+			skb_copy = skb_clone(res->atomic.skb, GFP_ATOMIC);
+			if (skb_copy) {
+				rxe_add_ref(qp); /* for the new SKB */
+				atomic_inc(&qp->resp_skb_out);
+				atomic_inc(&rxe->resp_skb_out);
+			} else {
+				pr_warn("Couldn't clone atomic resp\n");
+				rc = RESPST_CLEANUP;
+				goto out;
+			}
+			bth_set_psn(SKB_TO_PKT(skb_copy),
+				    qp->resp.psn - 1);
+			/* Resend the result. */
+			arbiter_skb_queue(to_rdev(qp->ibqp.device), qp,
+					  skb_copy);
+		}
+
+		/* Resource not found. Class D error. Drop the request. */
+		rc = RESPST_CLEANUP;
+		goto out;
+	}
+out:
+	return rc;
+}
+
+/* Process a Class A or C error. Both are treated the same in this implementation. */
+static void do_class_ac_error(struct rxe_qp *qp, u8 syndrome,
+			      enum ib_wc_status status)
+{
+	qp->resp.aeth_syndrome	= syndrome;
+	qp->resp.status		= status;
+
+	/* indicate that we should go through the ERROR state */
+	qp->resp.goto_error	= 1;
+}
+
+static enum resp_states do_class_d1e_error(struct rxe_qp *qp)
+{
+	/* UC */
+	if (qp->srq) {
+		/* Class E */
+		qp->resp.drop_msg = 1;
+		if (qp->resp.wqe) {
+			qp->resp.status = IB_WC_REM_INV_REQ_ERR;
+			return RESPST_COMPLETE;
+		} else {
+			return RESPST_CLEANUP;
+		}
+	} else {
+		/* Class D1. This packet may be the start of a
+		   new message and could be valid. The previous
+		   message is invalid and ignored. Reset the
+		   recv wr to its original state. */
+		if (qp->resp.wqe) {
+			qp->resp.wqe->dma.resid = qp->resp.wqe->dma.length;
+			qp->resp.wqe->dma.cur_sge = 0;
+			qp->resp.wqe->dma.sge_offset = 0;
+			qp->resp.opcode = -1;
+		}
+
+		if (qp->resp.mr) {
+			rxe_drop_ref(qp->resp.mr);
+			qp->resp.mr = NULL;
+		}
+
+		return RESPST_CHK_OP_SEQ;
+	}
+}
+
+int rxe_responder(void *arg)
+{
+	struct rxe_qp *qp = (struct rxe_qp *)arg;
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	enum resp_states state;
+	struct rxe_pkt_info *pkt = NULL;
+	int ret = 0;
+
+	qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
+
+	if (!qp->valid) {
+		ret = -EINVAL;
+		goto done;
+	}
+
+	switch (qp->resp.state) {
+	case QP_STATE_RESET:
+		state = RESPST_RESET;
+		break;
+
+	default:
+		state = RESPST_GET_REQ;
+		break;
+	}
+
+	while (1) {
+		pr_debug("state = %s\n", resp_state_name[state]);
+		switch (state) {
+		case RESPST_GET_REQ:
+			state = get_req(qp, &pkt);
+			break;
+		case RESPST_CHK_PSN:
+			state = check_psn(qp, pkt);
+			break;
+		case RESPST_CHK_OP_SEQ:
+			state = check_op_seq(qp, pkt);
+			break;
+		case RESPST_CHK_OP_VALID:
+			state = check_op_valid(qp, pkt);
+			break;
+		case RESPST_CHK_RESOURCE:
+			state = check_resource(qp, pkt);
+			break;
+		case RESPST_CHK_LENGTH:
+			state = check_length(qp, pkt);
+			break;
+		case RESPST_CHK_RKEY:
+			state = check_rkey(qp, pkt);
+			break;
+		case RESPST_EXECUTE:
+			state = execute(qp, pkt);
+			break;
+		case RESPST_COMPLETE:
+			state = do_complete(qp, pkt);
+			break;
+		case RESPST_READ_REPLY:
+			state = read_reply(qp, pkt);
+			break;
+		case RESPST_ACKNOWLEDGE:
+			state = acknowledge(qp, pkt);
+			break;
+		case RESPST_CLEANUP:
+			state = cleanup(qp, pkt);
+			break;
+		case RESPST_DUPLICATE_REQUEST:
+			state = duplicate_request(qp, pkt);
+			break;
+		case RESPST_ERR_PSN_OUT_OF_SEQ:
+			/* RC only - Class B. Drop packet. */
+			send_ack(qp, pkt, AETH_NAK_PSN_SEQ_ERROR, qp->resp.psn);
+			state = RESPST_CLEANUP;
+			break;
+
+		case RESPST_ERR_TOO_MANY_RDMA_ATM_REQ:
+		case RESPST_ERR_MISSING_OPCODE_FIRST:
+		case RESPST_ERR_MISSING_OPCODE_LAST_C:
+		case RESPST_ERR_UNSUPPORTED_OPCODE:
+		case RESPST_ERR_MISALIGNED_ATOMIC:
+			/* RC Only - Class C. */
+			do_class_ac_error(qp, AETH_NAK_INVALID_REQ,
+					  IB_WC_REM_INV_REQ_ERR);
+			state = RESPST_COMPLETE;
+			break;
+
+		case RESPST_ERR_MISSING_OPCODE_LAST_D1E:
+			state = do_class_d1e_error(qp);
+			break;
+		case RESPST_ERR_RNR:
+			if (qp_type(qp) == IB_QPT_RC) {
+				/* RC - class B */
+				send_ack(qp, pkt, AETH_RNR_NAK |
+					 (~AETH_TYPE_MASK &
+					 qp->attr.min_rnr_timer),
+					 pkt->psn);
+			} else {
+				/* UD/UC - class D */
+				qp->resp.drop_msg = 1;
+			}
+			state = RESPST_CLEANUP;
+			break;
+
+		case RESPST_ERR_RKEY_VIOLATION:
+			if (qp_type(qp) == IB_QPT_RC) {
+				/* Class C */
+				do_class_ac_error(qp, AETH_NAK_REM_ACC_ERR,
+						  IB_WC_REM_ACCESS_ERR);
+				state = RESPST_COMPLETE;
+			} else {
+				qp->resp.drop_msg = 1;
+				if (qp->srq) {
+					/* UC/SRQ Class D */
+					qp->resp.status = IB_WC_REM_ACCESS_ERR;
+					state = RESPST_COMPLETE;
+				} else {
+					/* UC/non-SRQ Class E. */
+					state = RESPST_CLEANUP;
+				}
+			}
+			break;
+
+		case RESPST_ERR_LENGTH:
+			if (qp_type(qp) == IB_QPT_RC) {
+				/* Class C */
+				do_class_ac_error(qp, AETH_NAK_INVALID_REQ,
+						  IB_WC_REM_INV_REQ_ERR);
+				state = RESPST_COMPLETE;
+			} else if (qp->srq) {
+				/* UC/UD - class E */
+				qp->resp.status = IB_WC_REM_INV_REQ_ERR;
+				state = RESPST_COMPLETE;
+			} else {
+				/* UC/UD - class D */
+				qp->resp.drop_msg = 1;
+				state = RESPST_CLEANUP;
+			}
+			break;
+
+		case RESPST_ERR_MALFORMED_WQE:
+			/* All, Class A. */
+			do_class_ac_error(qp, AETH_NAK_REM_OP_ERR,
+					  IB_WC_LOC_QP_OP_ERR);
+			state = RESPST_COMPLETE;
+			break;
+
+		case RESPST_ERR_CQ_OVERFLOW:
+			/* All - Class G */
+			state = RESPST_ERROR;
+			break;
+
+		case RESPST_DONE:
+			if (qp->resp.goto_error) {
+				state = RESPST_ERROR;
+				break;
+			} else
+				goto done;
+
+		case RESPST_EXIT:
+			if (qp->resp.goto_error) {
+				state = RESPST_ERROR;
+				break;
+			} else
+				goto exit;
+
+		case RESPST_RESET: {
+			struct sk_buff *skb;
+
+			while ((skb = skb_dequeue(&qp->req_pkts))) {
+				rxe_drop_ref(qp);
+				kfree_skb(skb);
+				atomic_dec(&qp->req_skb_in);
+				atomic_dec(&rxe->req_skb_in);
+			}
+
+			while (!qp->srq && qp->rq.queue &&
+			       queue_head(qp->rq.queue))
+				advance_consumer(qp->rq.queue);
+
+			qp->resp.wqe = NULL;
+			goto exit;
+		}
+
+		case RESPST_ERROR:
+			qp->resp.goto_error = 0;
+			pr_warn("qp#%d moved to error state\n", qp_num(qp));
+			rxe_qp_error(qp);
+			goto exit;
+
+		default:
+			WARN_ON(1);
+		}
+	}
+
+exit:
+	ret = -EAGAIN;
+done:
+	return ret;
+}
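
For reference, the ack and read-response buffers built by prepare_ack_packet()
follow the BTH rule that the payload is padded out to a 4-byte boundary and
followed by the 4-byte ICRC; read_reply() zeroes the pad bytes before the ICRC
is computed so both ends see the same trailer. A standalone sketch of that
sizing (not driver code; the literal 4 stands in for RXE_ICRC_SIZE):

/* Sketch of the buffer sizing used by prepare_ack_packet() above:
 * opcode-specific headers, payload, 0-3 pad bytes, then the ICRC.
 */
static int ack_paylen(int hdr_len, int payload)
{
	int pad = (-payload) & 0x3;		/* BTH pad count, 0..3 */

	return hdr_len + payload + pad + 4;	/* 4 == RXE_ICRC_SIZE */
}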


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 28/37] add rxe_arbiter.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (26 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 27/37] add rxe_resp.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.515453123-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 29/37] add rxe_dma.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (9 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_arbiter.c --]
[-- Type: text/plain, Size: 6210 bytes --]

Packet output arbitration.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_arbiter.c |  190 ++++++++++++++++++++++++++++++++
 1 file changed, 190 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_arbiter.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_arbiter.c
@@ -0,0 +1,190 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/skbuff.h>
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+static inline void account_skb(struct rxe_dev *rxe, struct rxe_qp *qp,
+			       int is_request)
+{
+	if (is_request & RXE_REQ_MASK) {
+		atomic_dec(&rxe->req_skb_out);
+		atomic_dec(&qp->req_skb_out);
+		if (qp->need_req_skb) {
+			if (atomic_read(&qp->req_skb_out) < rxe_max_skb_per_qp)
+				rxe_run_task(&qp->req.task, 1);
+		}
+	} else {
+		atomic_dec(&rxe->resp_skb_out);
+		atomic_dec(&qp->resp_skb_out);
+	}
+}
+
+static int xmit_one_packet(struct rxe_dev *rxe, struct rxe_qp *qp,
+			   struct sk_buff *skb)
+{
+	int err;
+	struct timespec time;
+	long new_delay;
+	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	int is_request = pkt->mask & RXE_REQ_MASK;
+
+	/* drop pkt if qp is in wrong state to send */
+	if (!qp->valid)
+		goto drop;
+
+	if (is_request) {
+		if (qp->req.state != QP_STATE_READY)
+			goto drop;
+	} else {
+		if (qp->resp.state != QP_STATE_READY)
+			goto drop;
+	}
+
+	/* busy wait for static rate control;
+	   we could refine this by yielding the tasklet
+	   for larger delays and waiting out the small ones */
+	if (rxe->arbiter.delay)
+		do {
+			getnstimeofday(&time);
+		} while (timespec_compare(&time, &rxe->arbiter.time) < 0);
+
+	new_delay = (skb->len*rxe_nsec_per_kbyte) >> 10;
+	if (new_delay < rxe_nsec_per_packet)
+		new_delay = rxe_nsec_per_packet;
+
+	if (pkt->mask & RXE_LOOPBACK_MASK)
+		err = rxe->ifc_ops->loopback(rxe, skb);
+	else
+		err = rxe->ifc_ops->send(rxe, skb);
+
+	/* we can recover from RXE_QUEUE_STOPPED errors
+	   by retrying the packet. In other cases
+	   the packet is consumed so move on */
+	if (err == RXE_QUEUE_STOPPED)
+		return err;
+	else if (err)
+		rxe->xmit_errors++;
+
+	rxe->arbiter.delay = new_delay > 0;
+	if (rxe->arbiter.delay) {
+		getnstimeofday(&time);
+		time.tv_nsec += new_delay;
+		while (time.tv_nsec > NSEC_PER_SEC) {
+			time.tv_sec += 1;
+			time.tv_nsec -= NSEC_PER_SEC;
+		}
+		rxe->arbiter.time = time;
+	}
+
+	goto done;
+
+drop:
+	kfree_skb(skb);
+	err = 0;
+done:
+	account_skb(rxe, qp, is_request);
+	return err;
+}
+
+/*
+ * choose one packet for sending
+ */
+int rxe_arbiter(void *arg)
+{
+	int err;
+	unsigned long flags;
+	struct rxe_dev *rxe = (struct rxe_dev *)arg;
+	struct sk_buff *skb;
+	struct list_head *qpl;
+	struct rxe_qp *qp;
+
+	/* get the next qp's send queue */
+	spin_lock_irqsave(&rxe->arbiter.list_lock, flags);
+	if (list_empty(&rxe->arbiter.qp_list)) {
+		spin_unlock_irqrestore(&rxe->arbiter.list_lock, flags);
+		return 1;
+	}
+
+	qpl = rxe->arbiter.qp_list.next;
+	list_del_init(qpl);
+	qp = list_entry(qpl, struct rxe_qp, arbiter_list);
+	spin_unlock_irqrestore(&rxe->arbiter.list_lock, flags);
+
+	/* get next packet from queue and try to send it
+	   note skb could have already been removed */
+	skb = skb_dequeue(&qp->send_pkts);
+	if (skb) {
+		err = xmit_one_packet(rxe, qp, skb);
+		if (err) {
+			if (err == RXE_QUEUE_STOPPED)
+				skb_queue_head(&qp->send_pkts, skb);
+			rxe_run_task(&rxe->arbiter.task, 1);
+			return 1;
+		}
+	}
+
+	/* if more work in queue put qp back on the list */
+	spin_lock_irqsave(&rxe->arbiter.list_lock, flags);
+
+	if (list_empty(qpl) && !skb_queue_empty(&qp->send_pkts))
+		list_add_tail(qpl, &rxe->arbiter.qp_list);
+
+	spin_unlock_irqrestore(&rxe->arbiter.list_lock, flags);
+	return 0;
+}
+
+/*
+ * queue a packet for sending from a qp
+ */
+void arbiter_skb_queue(struct rxe_dev *rxe, struct rxe_qp *qp,
+		       struct sk_buff *skb)
+{
+	int must_sched;
+	unsigned long flags;
+
+	/* add packet to send queue */
+	skb_queue_tail(&qp->send_pkts, skb);
+
+	/* if not already there add qp to arbiter list */
+	spin_lock_irqsave(&rxe->arbiter.list_lock, flags);
+	if (list_empty(&qp->arbiter_list))
+		list_add_tail(&qp->arbiter_list, &rxe->arbiter.qp_list);
+	spin_unlock_irqrestore(&rxe->arbiter.list_lock, flags);
+
+	/* run the arbiter, use tasklet unless only one packet */
+	must_sched = skb_queue_len(&qp->resp_pkts) > 1;
+	rxe_run_task(&rxe->arbiter.task, must_sched);
+}
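
The static rate control in xmit_one_packet() paces output by a per-packet
minimum plus a per-kbyte cost. A minimal sketch of that computation (the
200 ns/packet and 700 ns/kbyte defaults are the module parameters defined in
rxe.c later in this series):

/* Sketch of the pacing delay used by xmit_one_packet() above. With the
 * defaults (200 ns/packet, 700 ns/kbyte) a 1024-byte skb is paced at
 * roughly 700 ns, i.e. on the order of 1.4 GB/s per device.
 */
static long pacing_delay_ns(unsigned int skb_len,
			    long nsec_per_packet, long nsec_per_kbyte)
{
	long delay = ((long)skb_len * nsec_per_kbyte) >> 10;

	return (delay < nsec_per_packet) ? nsec_per_packet : delay;
}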


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 29/37] add rxe_dma.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (27 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 28/37] add rxe_arbiter.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.565188525-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 30/37] add rxe_icrc.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (8 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_dma.c --]
[-- Type: text/plain, Size: 5397 bytes --]

Dummy DMA processing for the rxe device.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_dma.c |  178 ++++++++++++++++++++++++++++++++++++
 1 file changed, 178 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_dma.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_dma.c
@@ -0,0 +1,178 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ * Copyright (c) 2006 QLogic, Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+static int rxe_mapping_error(struct ib_device *dev, u64 dma_addr)
+{
+	return dma_addr == 0;
+}
+
+static u64 rxe_dma_map_single(struct ib_device *dev,
+			      void *cpu_addr, size_t size,
+			      enum dma_data_direction direction)
+{
+	BUG_ON(!valid_dma_direction(direction));
+	return (u64) (uintptr_t) cpu_addr;
+}
+
+static void rxe_dma_unmap_single(struct ib_device *dev,
+				 u64 addr, size_t size,
+				 enum dma_data_direction direction)
+{
+	BUG_ON(!valid_dma_direction(direction));
+}
+
+static u64 rxe_dma_map_page(struct ib_device *dev,
+			    struct page *page,
+			    unsigned long offset,
+			    size_t size, enum dma_data_direction direction)
+{
+	u64 addr = 0;
+
+	BUG_ON(!valid_dma_direction(direction));
+
+	if (offset + size > PAGE_SIZE)
+		goto out;
+
+	addr = (uintptr_t) page_address(page);
+	if (addr)
+		addr += offset;
+
+out:
+	return addr;
+}
+
+static void rxe_dma_unmap_page(struct ib_device *dev,
+			       u64 addr, size_t size,
+			       enum dma_data_direction direction)
+{
+	BUG_ON(!valid_dma_direction(direction));
+}
+
+static int rxe_map_sg(struct ib_device *dev, struct scatterlist *sgl,
+		      int nents, enum dma_data_direction direction)
+{
+	struct scatterlist *sg;
+	u64 addr;
+	int i;
+	int ret = nents;
+
+	BUG_ON(!valid_dma_direction(direction));
+
+	for_each_sg(sgl, sg, nents, i) {
+		addr = (uintptr_t) page_address(sg_page(sg));
+		/* TODO: handle highmem pages */
+		if (!addr) {
+			ret = 0;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static void rxe_unmap_sg(struct ib_device *dev,
+			 struct scatterlist *sg, int nents,
+			 enum dma_data_direction direction)
+{
+	BUG_ON(!valid_dma_direction(direction));
+}
+
+static u64 rxe_sg_dma_address(struct ib_device *dev, struct scatterlist *sg)
+{
+	u64 addr = (uintptr_t) page_address(sg_page(sg));
+
+	if (addr)
+		addr += sg->offset;
+
+	return addr;
+}
+
+static unsigned int rxe_sg_dma_len(struct ib_device *dev,
+				   struct scatterlist *sg)
+{
+	return sg->length;
+}
+
+static void rxe_sync_single_for_cpu(struct ib_device *dev,
+				    u64 addr,
+				    size_t size, enum dma_data_direction dir)
+{
+}
+
+static void rxe_sync_single_for_device(struct ib_device *dev,
+				       u64 addr,
+				       size_t size, enum dma_data_direction dir)
+{
+}
+
+static void *rxe_dma_alloc_coherent(struct ib_device *dev, size_t size,
+				    u64 *dma_handle, gfp_t flag)
+{
+	struct page *p;
+	void *addr = NULL;
+
+	p = alloc_pages(flag, get_order(size));
+	if (p)
+		addr = page_address(p);
+
+	if (dma_handle)
+		*dma_handle = (uintptr_t) addr;
+
+	return addr;
+}
+
+static void rxe_dma_free_coherent(struct ib_device *dev, size_t size,
+				  void *cpu_addr, u64 dma_handle)
+{
+	free_pages((unsigned long)cpu_addr, get_order(size));
+}
+
+struct ib_dma_mapping_ops rxe_dma_mapping_ops = {
+	rxe_mapping_error,
+	rxe_dma_map_single,
+	rxe_dma_unmap_single,
+	rxe_dma_map_page,
+	rxe_dma_unmap_page,
+	rxe_map_sg,
+	rxe_unmap_sg,
+	rxe_sg_dma_address,
+	rxe_sg_dma_len,
+	rxe_sync_single_for_cpu,
+	rxe_sync_single_for_device,
+	rxe_dma_alloc_coherent,
+	rxe_dma_free_coherent
+};
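
These ops back the ib_dma_*() verbs helpers with plain kernel virtual
addresses, since the soft RoCE data path moves data with the CPU rather than
a DMA engine. An illustration of the effect on a consumer (a sketch only, not
part of the patch; it assumes an ib_device whose dma_ops point at
rxe_dma_mapping_ops):

#include <linux/slab.h>
#include <rdma/ib_verbs.h>

/* With rxe_dma_mapping_ops installed, "mapping" a kernel buffer simply
 * returns its virtual address, so the rxe code can memcpy() to and from
 * it directly; there is no IOMMU or bounce buffering involved.
 */
static void rxe_dma_identity_demo(struct ib_device *ibdev)
{
	void *buf = kmalloc(4096, GFP_KERNEL);
	u64 dma;

	if (!buf)
		return;

	dma = ib_dma_map_single(ibdev, buf, 4096, DMA_TO_DEVICE);
	/* dma now equals (u64)(uintptr_t)buf */
	ib_dma_unmap_single(ibdev, dma, 4096, DMA_TO_DEVICE);
	kfree(buf);
}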


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 30/37] add rxe_icrc.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (28 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 29/37] add rxe_dma.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 31/37] add rxe.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (7 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_icrc.c --]
[-- Type: text/plain, Size: 3834 bytes --]

Compute ICRC

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_icrc.c |   99 +++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_icrc.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_icrc.c
@@ -0,0 +1,99 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+/* Compute a partial ICRC for all the IB transport headers. */
+u32 rxe_icrc_hdr(struct rxe_pkt_info *pkt)
+{
+	u32 crc;
+	unsigned int length;
+	unsigned int grh_offset;
+	unsigned int bth_offset;
+	u8 tmp[RXE_LRH_BYTES + RXE_GRH_BYTES + RXE_BTH_BYTES];
+
+	/* This seed is the result of computing a CRC with a seed of
+	 * 0xffffffff and 8 bytes of 0xff representing a masked LRH. */
+	crc = 0xdebb20e3;
+
+	length = RXE_BTH_BYTES;
+	grh_offset = 0;
+	bth_offset = 0;
+
+	if (pkt->mask & RXE_LRH_MASK) {
+		length += RXE_LRH_BYTES;
+		grh_offset += RXE_LRH_BYTES;
+		bth_offset += RXE_LRH_BYTES;
+	}
+	if (pkt->mask & RXE_GRH_MASK) {
+		length += RXE_GRH_BYTES;
+		bth_offset += RXE_GRH_BYTES;
+	}
+
+	memcpy(tmp, pkt->hdr, length);
+
+	if (pkt->mask & RXE_GRH_MASK) {
+		tmp[grh_offset + 0] |= 0x0f;	/* GRH: tclass */
+		tmp[grh_offset + 1] = 0xff;
+		tmp[grh_offset + 2] = 0xff;
+		tmp[grh_offset + 3] = 0xff;
+		tmp[grh_offset + 7] = 0xff;
+	}
+
+	tmp[bth_offset + 4] = 0xff;		/* BTH: resv8a */
+
+	crc = crc32_le(crc, tmp + grh_offset, length - grh_offset);
+
+	/* And finish to compute the CRC on the remainder of the headers. */
+	crc = crc32_le(crc, pkt->hdr + length,
+		       rxe_opcode[pkt->opcode].length - RXE_BTH_BYTES);
+
+	return crc;
+}
+
+/* Compute the ICRC for a packet (incoming or outgoing). */
+u32 rxe_icrc_pkt(struct rxe_pkt_info *pkt)
+{
+	u32 crc;
+	int size;
+
+	crc = rxe_icrc_hdr(pkt);
+
+	/* And finish to compute the CRC on the remainder. */
+	size = pkt->paylen - rxe_opcode[pkt->opcode].length - RXE_ICRC_SIZE;
+	crc = crc32_le(crc, payload_addr(pkt), size);
+	crc = ~crc;
+
+	return crc;
+}
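
The precomputed seed stands in for eight bytes of 0xff representing a masked
LRH, and the 0xff overwrites above mask the variant GRH and BTH fields before
the CRC is taken; when a real LRH is present it is skipped via grh_offset so
the masked stand-in applies either way. A small sketch of how the seed could
be reproduced (assuming crc32_le() from lib/crc32, the same helper used above):

#include <linux/crc32.h>
#include <linux/string.h>

/* Reproduce the precomputed seed used in rxe_icrc_hdr(). Per the comment
 * above, this is expected to yield 0xdebb20e3.
 */
static u32 rxe_icrc_seed(void)
{
	u8 masked_lrh[8];

	memset(masked_lrh, 0xff, sizeof(masked_lrh));
	return crc32_le(0xffffffff, masked_lrh, sizeof(masked_lrh));
}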


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 31/37] add rxe.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (29 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 30/37] add rxe_icrc.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.666252623-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 32/37] add rxe_net.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (6 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe.c --]
[-- Type: text/plain, Size: 16261 bytes --]

Module main for ib_rxe.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe.c |  549 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 549 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe.c
@@ -0,0 +1,549 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_loc.h"
+
+MODULE_AUTHOR("Bob Pearson, Frank Zago, John Groves");
+MODULE_DESCRIPTION("Soft RDMA transport");
+MODULE_LICENSE("Dual BSD/GPL");
+
+static int rxe_max_ucontext = RXE_MAX_UCONTEXT;
+module_param_named(max_ucontext, rxe_max_ucontext, int, 0644);
+MODULE_PARM_DESC(max_ucontext, "max user contexts per device");
+
+static int rxe_max_qp = RXE_MAX_QP;
+module_param_named(max_qp, rxe_max_qp, int, 0444);
+MODULE_PARM_DESC(max_qp, "max QPs per device");
+
+static int rxe_max_qp_wr = RXE_MAX_QP_WR;
+module_param_named(max_qp_wr, rxe_max_qp_wr, int, 0644);
+MODULE_PARM_DESC(max_qp_wr, "max send or recv WR's per QP");
+
+static int rxe_max_inline_data = RXE_MAX_INLINE_DATA;
+module_param_named(max_inline_data, rxe_max_inline_data, int, 0644);
+MODULE_PARM_DESC(max_inline_data, "max inline data per WR");
+
+static int rxe_max_cq = RXE_MAX_CQ;
+module_param_named(max_cq, rxe_max_cq, int, 0644);
+MODULE_PARM_DESC(max_cq, "max CQs per device");
+
+static int rxe_max_mr = RXE_MAX_MR;
+module_param_named(max_mr, rxe_max_mr, int, 0644);
+MODULE_PARM_DESC(max_mr, "max MRs per device");
+
+static int rxe_max_fmr = RXE_MAX_FMR;
+module_param_named(max_fmr, rxe_max_fmr, int, 0644);
+MODULE_PARM_DESC(max_fmr, "max FMRs per device");
+
+static int rxe_max_mw = RXE_MAX_MW;
+module_param_named(max_mw, rxe_max_mw, int, 0644);
+MODULE_PARM_DESC(max_mw, "max MWs per device");
+
+static int rxe_max_log_cqe = RXE_MAX_LOG_CQE;
+module_param_named(max_log_cqe, rxe_max_log_cqe, int, 0644);
+MODULE_PARM_DESC(max_log_cqe, "Log2 of max CQ entries per CQ");
+
+int rxe_fast_comp = 2;
+module_param_named(fast_comp, rxe_fast_comp, int, 0644);
+MODULE_PARM_DESC(fast_comp, "enable fast path call to completer "
+		 "(0=no, 1=no int context, 2=any context)");
+
+int rxe_fast_resp = 2;
+module_param_named(fast_resp, rxe_fast_resp, int, 0644);
+MODULE_PARM_DESC(fast_resp, "enable fast path call to responder "
+		 "(0=no, 1=no int context, 2=any context)");
+
+int rxe_fast_req = 2;
+module_param_named(fast_req, rxe_fast_req, int, 0644);
+MODULE_PARM_DESC(fast_req, "enable fast path call to requester "
+		 "(0=no, 1=no int context, 2=any context)");
+
+int rxe_fast_arb = 2;
+module_param_named(fast_arb, rxe_fast_arb, int, 0644);
+MODULE_PARM_DESC(fast_arb, "enable fast path call to arbiter "
+		 "(0=no, 1=no int context, 2=any context)");
+
+int rxe_crc_disable;
+EXPORT_SYMBOL(rxe_crc_disable);
+module_param_named(crc_disable, rxe_crc_disable, int, 0644);
+MODULE_PARM_DESC(crc_disable,
+		 "Disable crc32 computation on outbound packets");
+
+int rxe_nsec_per_packet = 200;
+module_param_named(nsec_per_packet, rxe_nsec_per_packet, int, 0644);
+MODULE_PARM_DESC(nsec_per_packet,
+		 "minimum output packet delay nsec");
+
+int rxe_nsec_per_kbyte = 700;
+module_param_named(nsec_per_kbyte, rxe_nsec_per_kbyte, int, 0644);
+MODULE_PARM_DESC(nsec_per_kbyte,
+		 "minimum output packet delay per kbyte nsec");
+
+int rxe_max_skb_per_qp = 800;
+module_param_named(max_skb_per_qp, rxe_max_skb_per_qp, int, 0644);
+MODULE_PARM_DESC(max_skb_per_qp,
+		 "maximum skb's posted for output");
+
+int rxe_max_req_comp_gap = 128;
+module_param_named(max_req_comp_gap, rxe_max_req_comp_gap, int, 0644);
+MODULE_PARM_DESC(max_req_comp_gap,
+		 "max difference between req.psn and comp.psn");
+
+int rxe_max_pkt_per_ack = 64;
+module_param_named(max_pkt_per_ack, rxe_max_pkt_per_ack, int, 0644);
+MODULE_PARM_DESC(max_pkt_per_ack,
+		 "max packets before an ack will be generated");
+
+int rxe_default_mtu = 1024;
+module_param_named(default_mtu, rxe_default_mtu, int, 0644);
+MODULE_PARM_DESC(default_mtu,
+		 "default rxe port mtu");
+
+/* free resources for all ports on a device */
+static void rxe_cleanup_ports(struct rxe_dev *rxe)
+{
+	unsigned int port_num;
+	struct rxe_port *port;
+
+	for (port_num = 1; port_num <= rxe->num_ports; port_num++) {
+		port = &rxe->port[port_num - 1];
+
+		kfree(port->guid_tbl);
+		port->guid_tbl = NULL;
+
+		kfree(port->pkey_tbl);
+		port->pkey_tbl = NULL;
+	}
+
+	kfree(rxe->port);
+	rxe->port = NULL;
+}
+
+/* free resources for a rxe device
+   all objects created for this device
+   must have been destroyed */
+static void rxe_cleanup(struct rxe_dev *rxe)
+{
+	rxe_cleanup_task(&rxe->arbiter.task);
+
+	rxe_pool_cleanup(&rxe->uc_pool);
+	rxe_pool_cleanup(&rxe->pd_pool);
+	rxe_pool_cleanup(&rxe->ah_pool);
+	rxe_pool_cleanup(&rxe->srq_pool);
+	rxe_pool_cleanup(&rxe->qp_pool);
+	rxe_pool_cleanup(&rxe->cq_pool);
+	rxe_pool_cleanup(&rxe->mr_pool);
+	rxe_pool_cleanup(&rxe->fmr_pool);
+	rxe_pool_cleanup(&rxe->mw_pool);
+	rxe_pool_cleanup(&rxe->mc_grp_pool);
+	rxe_pool_cleanup(&rxe->mc_elem_pool);
+
+	rxe_cleanup_ports(rxe);
+}
+
+/* called when all references have been dropped */
+void rxe_release(struct kref *kref)
+{
+	struct rxe_dev *rxe = container_of(kref, struct rxe_dev, ref_cnt);
+
+	rxe_cleanup(rxe);
+	ib_dealloc_device(&rxe->ib_dev);
+	module_put(THIS_MODULE);
+	rxe->ifc_ops->release(rxe);
+}
+
+/* initialize rxe device parameters */
+static int rxe_init_device_param(struct rxe_dev *rxe)
+{
+	rxe->num_ports				= RXE_NUM_PORT;
+	rxe->max_inline_data			= rxe_max_inline_data;
+
+	rxe->attr.fw_ver			= RXE_FW_VER;
+	rxe->attr.max_mr_size			= RXE_MAX_MR_SIZE;
+	rxe->attr.page_size_cap			= RXE_PAGE_SIZE_CAP;
+	rxe->attr.vendor_id			= RXE_VENDOR_ID;
+	rxe->attr.vendor_part_id		= RXE_VENDOR_PART_ID;
+	rxe->attr.hw_ver			= RXE_HW_VER;
+	rxe->attr.max_qp			= rxe_max_qp;
+	rxe->attr.max_qp_wr			= rxe_max_qp_wr;
+	rxe->attr.device_cap_flags		= RXE_DEVICE_CAP_FLAGS;
+	rxe->attr.max_sge			= RXE_MAX_SGE;
+	rxe->attr.max_sge_rd			= RXE_MAX_SGE_RD;
+	rxe->attr.max_cq			= rxe_max_cq;
+	if ((rxe_max_log_cqe < 0) || (rxe_max_log_cqe > 20))
+		rxe_max_log_cqe = RXE_MAX_LOG_CQE;
+	rxe->attr.max_cqe			= (1 << rxe_max_log_cqe) - 1;
+	rxe->attr.max_mr			= rxe_max_mr;
+	rxe->attr.max_pd			= RXE_MAX_PD;
+	rxe->attr.max_qp_rd_atom		= RXE_MAX_QP_RD_ATOM;
+	rxe->attr.max_ee_rd_atom		= RXE_MAX_EE_RD_ATOM;
+	rxe->attr.max_res_rd_atom		= RXE_MAX_RES_RD_ATOM;
+	rxe->attr.max_qp_init_rd_atom		= RXE_MAX_QP_INIT_RD_ATOM;
+	rxe->attr.max_ee_init_rd_atom		= RXE_MAX_EE_INIT_RD_ATOM;
+	rxe->attr.atomic_cap			= RXE_ATOMIC_CAP;
+	rxe->attr.max_ee			= RXE_MAX_EE;
+	rxe->attr.max_rdd			= RXE_MAX_RDD;
+	rxe->attr.max_mw			= rxe_max_mw;
+	rxe->attr.max_raw_ipv6_qp		= RXE_MAX_RAW_IPV6_QP;
+	rxe->attr.max_raw_ethy_qp		= RXE_MAX_RAW_ETHY_QP;
+	rxe->attr.max_mcast_grp			= RXE_MAX_MCAST_GRP;
+	rxe->attr.max_mcast_qp_attach		= RXE_MAX_MCAST_QP_ATTACH;
+	rxe->attr.max_total_mcast_qp_attach	= RXE_MAX_TOT_MCAST_QP_ATTACH;
+	rxe->attr.max_ah			= RXE_MAX_AH;
+	rxe->attr.max_fmr			= rxe_max_fmr;
+	rxe->attr.max_map_per_fmr		= RXE_MAX_MAP_PER_FMR;
+	rxe->attr.max_srq			= RXE_MAX_SRQ;
+	rxe->attr.max_srq_wr			= RXE_MAX_SRQ_WR;
+	rxe->attr.max_srq_sge			= RXE_MAX_SRQ_SGE;
+	rxe->attr.max_fast_reg_page_list_len	= RXE_MAX_FMR_PAGE_LIST_LEN;
+	rxe->attr.max_pkeys			= RXE_MAX_PKEYS;
+	rxe->attr.local_ca_ack_delay		= RXE_LOCAL_CA_ACK_DELAY;
+
+	rxe->max_ucontext			= rxe_max_ucontext;
+	rxe->pref_mtu = rxe_mtu_int_to_enum(rxe_default_mtu);
+
+	return 0;
+}
+
+/* initialize port attributes */
+static int rxe_init_port_param(struct rxe_dev *rxe, unsigned int port_num)
+{
+	struct rxe_port *port = &rxe->port[port_num - 1];
+
+	port->attr.state		= RXE_PORT_STATE;
+	port->attr.max_mtu		= RXE_PORT_MAX_MTU;
+	port->attr.active_mtu		= RXE_PORT_ACTIVE_MTU;
+	port->attr.gid_tbl_len		= RXE_PORT_GID_TBL_LEN;
+	port->attr.port_cap_flags	= RXE_PORT_PORT_CAP_FLAGS;
+	port->attr.max_msg_sz		= RXE_PORT_MAX_MSG_SZ;
+	port->attr.bad_pkey_cntr	= RXE_PORT_BAD_PKEY_CNTR;
+	port->attr.qkey_viol_cntr	= RXE_PORT_QKEY_VIOL_CNTR;
+	port->attr.pkey_tbl_len		= RXE_PORT_PKEY_TBL_LEN;
+	port->attr.lid			= RXE_PORT_LID;
+	port->attr.sm_lid		= RXE_PORT_SM_LID;
+	port->attr.lmc			= RXE_PORT_LMC;
+	port->attr.max_vl_num		= RXE_PORT_MAX_VL_NUM;
+	port->attr.sm_sl		= RXE_PORT_SM_SL;
+	port->attr.subnet_timeout	= RXE_PORT_SUBNET_TIMEOUT;
+	port->attr.init_type_reply	= RXE_PORT_INIT_TYPE_REPLY;
+	port->attr.active_width		= RXE_PORT_ACTIVE_WIDTH;
+	port->attr.active_speed		= RXE_PORT_ACTIVE_SPEED;
+	port->attr.phys_state		= RXE_PORT_PHYS_STATE;
+	port->mtu_cap			=
+				rxe_mtu_enum_to_int(RXE_PORT_ACTIVE_MTU);
+	port->subnet_prefix		= cpu_to_be64(RXE_PORT_SUBNET_PREFIX);
+
+	return 0;
+}
+
+/* initialize port state, note IB convention
+   that HCA ports are always numbered from 1 */
+static int rxe_init_ports(struct rxe_dev *rxe)
+{
+	int err;
+	unsigned int port_num;
+	struct rxe_port *port;
+
+	rxe->port = kcalloc(rxe->num_ports, sizeof(struct rxe_port),
+			    GFP_KERNEL);
+	if (!rxe->port)
+		return -ENOMEM;
+
+	for (port_num = 1; port_num <= rxe->num_ports; port_num++) {
+		port = &rxe->port[port_num - 1];
+
+		rxe_init_port_param(rxe, port_num);
+
+		if (!port->attr.pkey_tbl_len) {
+			err = -EINVAL;
+			goto err1;
+		}
+
+		port->pkey_tbl = kcalloc(port->attr.pkey_tbl_len,
+					 sizeof(*port->pkey_tbl), GFP_KERNEL);
+		if (!port->pkey_tbl) {
+			err = -ENOMEM;
+			goto err1;
+		}
+
+		port->pkey_tbl[0] = 0xffff;
+
+		if (!port->attr.gid_tbl_len) {
+			kfree(port->pkey_tbl);
+			err = -EINVAL;
+			goto err1;
+		}
+
+		port->guid_tbl = kcalloc(port->attr.gid_tbl_len,
+					 sizeof(*port->guid_tbl), GFP_KERNEL);
+		if (!port->guid_tbl) {
+			kfree(port->pkey_tbl);
+			err = -ENOMEM;
+			goto err1;
+		}
+
+		port->guid_tbl[0] = rxe->ifc_ops->port_guid(rxe, port_num);
+
+		spin_lock_init(&port->port_lock);
+	}
+
+	return 0;
+
+err1:
+	while (--port_num >= 1) {
+		port = &rxe->port[port_num - 1];
+		kfree(port->pkey_tbl);
+		kfree(port->guid_tbl);
+	}
+
+	kfree(rxe->port);
+	return err;
+}
+
+/* init pools of managed objects */
+static int rxe_init_pools(struct rxe_dev *rxe)
+{
+	int err;
+
+	err = rxe_pool_init(rxe, &rxe->uc_pool, RXE_TYPE_UC,
+			     rxe->max_ucontext);
+	if (err)
+		goto err1;
+
+	err = rxe_pool_init(rxe, &rxe->pd_pool, RXE_TYPE_PD,
+			     rxe->attr.max_pd);
+	if (err)
+		goto err2;
+
+	err = rxe_pool_init(rxe, &rxe->ah_pool, RXE_TYPE_AH,
+			     rxe->attr.max_ah);
+	if (err)
+		goto err3;
+
+	err = rxe_pool_init(rxe, &rxe->srq_pool, RXE_TYPE_SRQ,
+			     rxe->attr.max_srq);
+	if (err)
+		goto err4;
+
+	err = rxe_pool_init(rxe, &rxe->qp_pool, RXE_TYPE_QP,
+			     rxe->attr.max_qp);
+	if (err)
+		goto err5;
+
+	err = rxe_pool_init(rxe, &rxe->cq_pool, RXE_TYPE_CQ,
+			     rxe->attr.max_cq);
+	if (err)
+		goto err6;
+
+	err = rxe_pool_init(rxe, &rxe->mr_pool, RXE_TYPE_MR,
+			     rxe->attr.max_mr);
+	if (err)
+		goto err7;
+
+	err = rxe_pool_init(rxe, &rxe->fmr_pool, RXE_TYPE_FMR,
+			     rxe->attr.max_fmr);
+	if (err)
+		goto err8;
+
+	err = rxe_pool_init(rxe, &rxe->mw_pool, RXE_TYPE_MW,
+			     rxe->attr.max_mw);
+	if (err)
+		goto err9;
+
+	err = rxe_pool_init(rxe, &rxe->mc_grp_pool, RXE_TYPE_MC_GRP,
+			     rxe->attr.max_mcast_grp);
+	if (err)
+		goto err10;
+
+	err = rxe_pool_init(rxe, &rxe->mc_elem_pool, RXE_TYPE_MC_ELEM,
+			     rxe->attr.max_total_mcast_qp_attach);
+	if (err)
+		goto err11;
+
+	return 0;
+
+err11:
+	rxe_pool_cleanup(&rxe->mc_grp_pool);
+err10:
+	rxe_pool_cleanup(&rxe->mw_pool);
+err9:
+	rxe_pool_cleanup(&rxe->fmr_pool);
+err8:
+	rxe_pool_cleanup(&rxe->mr_pool);
+err7:
+	rxe_pool_cleanup(&rxe->cq_pool);
+err6:
+	rxe_pool_cleanup(&rxe->qp_pool);
+err5:
+	rxe_pool_cleanup(&rxe->srq_pool);
+err4:
+	rxe_pool_cleanup(&rxe->ah_pool);
+err3:
+	rxe_pool_cleanup(&rxe->pd_pool);
+err2:
+	rxe_pool_cleanup(&rxe->uc_pool);
+err1:
+	return err;
+}
+
+/* initialize rxe device state */
+static int rxe_init(struct rxe_dev *rxe)
+{
+	int err;
+
+	/* init default device parameters */
+	rxe_init_device_param(rxe);
+
+	err = rxe_init_ports(rxe);
+	if (err)
+		goto err1;
+
+	err = rxe_init_pools(rxe);
+	if (err)
+		goto err2;
+
+	/* init packet counters */
+	atomic_set(&rxe->req_skb_in, 0);
+	atomic_set(&rxe->resp_skb_in, 0);
+	atomic_set(&rxe->req_skb_out, 0);
+	atomic_set(&rxe->resp_skb_out, 0);
+
+	/* init pending mmap list */
+	spin_lock_init(&rxe->mmap_offset_lock);
+	spin_lock_init(&rxe->pending_lock);
+	INIT_LIST_HEAD(&rxe->pending_mmaps);
+
+	/* init arbiter */
+	spin_lock_init(&rxe->arbiter.list_lock);
+	INIT_LIST_HEAD(&rxe->arbiter.qp_list);
+	rxe_init_task(rxe, &rxe->arbiter.task, &rxe_fast_arb,
+		      rxe, rxe_arbiter, "arb");
+
+	return 0;
+
+err2:
+	rxe_cleanup_ports(rxe);
+err1:
+	return err;
+}
+
+int rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu,
+		unsigned int port_num)
+{
+	struct rxe_port *port = &rxe->port[port_num - 1];
+	enum rxe_mtu mtu;
+
+	mtu = eth_mtu_int_to_enum(ndev_mtu);
+	if (!mtu)
+		return -EINVAL;
+
+	/* Set the port mtu to min(feasible, preferred) */
+	mtu = min_t(enum rxe_mtu, mtu, rxe->pref_mtu);
+
+	port->attr.active_mtu = (enum ib_mtu __force)mtu;
+	port->mtu_cap = rxe_mtu_enum_to_int(mtu);
+
+	return 0;
+}
+EXPORT_SYMBOL(rxe_set_mtu);
+
+/* called by ifc layer to create new rxe device
+   caller should allocate memory for rxe by calling
+   ib_alloc_device */
+int rxe_add(struct rxe_dev *rxe, unsigned int mtu)
+{
+	int err;
+	unsigned port_num = 1;
+
+	__module_get(THIS_MODULE);
+
+	kref_init(&rxe->ref_cnt);
+
+	err = rxe_init(rxe);
+	if (err)
+		goto err1;
+
+	err = rxe_set_mtu(rxe, mtu, port_num);
+	if (err)
+		goto err2;
+
+	err = rxe_register_device(rxe);
+	if (err)
+		goto err2;
+
+	return 0;
+
+err2:
+	rxe_cleanup(rxe);
+err1:
+	kref_put(&rxe->ref_cnt, rxe_release);
+	module_put(THIS_MODULE);
+	return err;
+}
+EXPORT_SYMBOL(rxe_add);
+
+/* called by the ifc layer to remove a device */
+void rxe_remove(struct rxe_dev *rxe)
+{
+	rxe_unregister_device(rxe);
+
+	kref_put(&rxe->ref_cnt, rxe_release);
+}
+EXPORT_SYMBOL(rxe_remove);
+
+static int __init rxe_module_init(void)
+{
+	int err;
+
+	/* initialize slab caches for managed objects */
+	err = rxe_cache_init();
+	if (err) {
+		pr_err("rxe: unable to init object pools\n");
+		return err;
+	}
+
+	pr_info("rxe: loaded\n");
+
+	return 0;
+}
+
+static void __exit rxe_module_exit(void)
+{
+	rxe_cache_exit();
+
+	pr_info("rxe: unloaded\n");
+}
+
+module_init(rxe_module_init);
+module_exit(rxe_module_exit);


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 32/37] add rxe_net.h
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (30 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 31/37] add rxe.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.715886208-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 33/37] add rxe_net.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (5 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_net.h --]
[-- Type: text/plain, Size: 3406 bytes --]

Common declarations for ib_rxe_net module.
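
As a rough sketch (names here are illustrative, not from the patch), the
receive path is expected to resolve an incoming net_device to its rxe
device and port through the net_info[] table declared below, while
holding net_info_lock:

	/* sketch: map ndev -> rxe device/port using the helpers below */
	static void example_lookup(struct net_device *ndev)
	{
		struct rxe_dev *rxe;

		spin_lock_bh(&net_info_lock);
		rxe = net_to_rxe(ndev);
		if (rxe)
			pr_debug("%s maps to rxe port %d\n",
				 ndev->name, net_to_port(ndev));
		spin_unlock_bh(&net_info_lock);
	}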

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_net.h |   86 ++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_net.h
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_net.h
@@ -0,0 +1,86 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RXE_NET_H
+#define RXE_NET_H
+
+#include <net/sock.h>
+#include <net/if_inet6.h>
+
+/*
+ * this should be defined in .../include/linux/if_ether.h
+ */
+#define ETH_P_RXE			(0x8915)
+
+/*
+ * this should be defined in .../include/linux/netfilter.h
+ * to a specific value
+ */
+#define NFPROTO_RXE			(0)
+
+/*
+ * these should be defined in .../include/linux/netfilter_rxe.h
+ */
+#define NF_RXE_IN			(0)
+#define NF_RXE_OUT			(1)
+
+/* Should probably move to something other than an array...these can be big */
+#define RXE_MAX_IF_INDEX	(384)
+
+struct rxe_net_info {
+	struct rxe_dev		*rxe;
+	u8			port;
+	struct net_device	*ndev;
+	int			status;
+};
+
+extern struct rxe_net_info net_info[RXE_MAX_IF_INDEX];
+extern spinlock_t net_info_lock;
+
+/* caller must hold net_info_lock */
+static inline struct rxe_dev *net_to_rxe(struct net_device *ndev)
+{
+	return (ndev->ifindex >= RXE_MAX_IF_INDEX) ?
+		NULL : net_info[ndev->ifindex].rxe;
+}
+
+static inline u8 net_to_port(struct net_device *ndev)
+{
+	return net_info[ndev->ifindex].port;
+}
+
+void rxe_net_add(struct net_device *ndev);
+void rxe_net_up(struct net_device *ndev);
+void rxe_net_down(struct net_device *ndev);
+
+#endif /* RXE_NET_H */


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 33/37] add rxe_net.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (31 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 32/37] add rxe_net.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.765038738-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 34/37] add rxe_net_sysfs.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (4 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_net.c --]
[-- Type: text/plain, Size: 14791 bytes --]

implements the kernel module that provides an interface between
ib_rxe and the netdev stack.
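
At its core the module registers a netdev protocol handler for the RoCE
ethertype plus a netdevice notifier. A minimal sketch of the protocol
handler half is below (names are placeholders; the real code that
follows also pushes packets through NF_HOOK and honors the eth_proto_id
module parameter):

	#include <linux/netdevice.h>
	#include <linux/skbuff.h>

	/* sketch: claim RoCE frames (ethertype 0x8915) from any netdev */
	static int example_rcv(struct sk_buff *skb, struct net_device *ndev,
			       struct packet_type *pt,
			       struct net_device *orig_dev)
	{
		/* hand skb to the transport here; this sketch just drops it */
		kfree_skb(skb);
		return 0;
	}

	static struct packet_type example_pt = {
		.type	= cpu_to_be16(0x8915),
		.func	= example_rcv,
	};

	/* dev_add_pack(&example_pt) at module init,
	 * dev_remove_pack(&example_pt) at module exit
	 */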

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_net.c |  580 ++++++++++++++++++++++++++++++++++++
 1 file changed, 580 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_net.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_net.c
@@ -0,0 +1,580 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/if.h>
+#include <linux/if_vlan.h>
+#include <net/sch_generic.h>
+#include <linux/netfilter.h>
+#include <rdma/ib_addr.h>
+
+#include "rxe.h"
+#include "rxe_net.h"
+
+MODULE_AUTHOR("Bob Pearson, Frank Zago, John Groves");
+MODULE_DESCRIPTION("RDMA transport over Converged Enhanced Ethernet");
+MODULE_LICENSE("Dual BSD/GPL");
+
+static int rxe_eth_proto_id = ETH_P_RXE;
+module_param_named(eth_proto_id, rxe_eth_proto_id, int, 0644);
+MODULE_PARM_DESC(eth_proto_id, "Ethernet protocol ID (default/correct=0x8915)");
+
+static int rxe_xmit_shortcut;
+module_param_named(xmit_shortcut, rxe_xmit_shortcut, int, 0644);
+MODULE_PARM_DESC(xmit_shortcut,
+		 "Shortcut transmit (EXPERIMENTAL)");
+
+static int rxe_loopback_mad_grh_fix = 1;
+module_param_named(loopback_mad_grh_fix, rxe_loopback_mad_grh_fix, int, 0644);
+MODULE_PARM_DESC(loopback_mad_grh_fix, "Allow MADs to self without GRH");
+
+/*
+ * note: this table is a replacement for a protocol specific pointer
+ * in struct net_device which exists for other ethertypes
+ * this allows us to not have to patch that data structure
+ * eventually we want to get our own when we're famous
+ */
+struct rxe_net_info net_info[RXE_MAX_IF_INDEX];
+spinlock_t net_info_lock;
+
+static int rxe_net_rcv(struct sk_buff *skb,
+		       struct net_device *ndev,
+		       struct packet_type *ptype,
+		       struct net_device *orig_dev);
+
+static __be64 rxe_mac_to_eui64(struct net_device *ndev)
+{
+	unsigned char *mac_addr = ndev->dev_addr;
+	__be64 eui64;
+	unsigned char *dst = (unsigned char *)&eui64;
+
+	dst[0] = mac_addr[0] ^ 2;
+	dst[1] = mac_addr[1];
+	dst[2] = mac_addr[2];
+	dst[3] = 0xff;
+	dst[4] = 0xfe;
+	dst[5] = mac_addr[3];
+	dst[6] = mac_addr[4];
+	dst[7] = mac_addr[5];
+
+	return eui64;
+}
+
+/* callback when rxe gets released */
+static void release(struct rxe_dev *rxe)
+{
+	module_put(THIS_MODULE);
+}
+
+static __be64 node_guid(struct rxe_dev *rxe)
+{
+	return rxe_mac_to_eui64(rxe->ndev);
+}
+
+static __be64 port_guid(struct rxe_dev *rxe, unsigned int port_num)
+{
+	return rxe_mac_to_eui64(rxe->ndev);
+}
+
+static struct device *dma_device(struct rxe_dev *rxe)
+{
+	struct net_device *ndev;
+
+	ndev = rxe->ndev;
+
+	if (ndev->priv_flags & IFF_802_1Q_VLAN)
+		ndev = vlan_dev_real_dev(ndev);
+
+	return ndev->dev.parent;
+}
+
+static int mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+	int err;
+	unsigned char ll_addr[ETH_ALEN];
+
+	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+	err = dev_mc_add(rxe->ndev, ll_addr);
+
+	return err;
+}
+
+static int mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+	int err;
+	unsigned char ll_addr[ETH_ALEN];
+
+	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+	err = dev_mc_del(rxe->ndev, ll_addr);
+
+	return err;
+}
+
+static inline int queue_deactivated(struct sk_buff *skb)
+{
+	const struct net_device_ops *ops = skb->dev->netdev_ops;
+	u16 queue_index = 0;
+	struct netdev_queue *txq;
+
+	if (ops->ndo_select_queue)
+		queue_index = ops->ndo_select_queue(skb->dev, skb);
+	else if (skb->dev->real_num_tx_queues > 1)
+		queue_index = skb_tx_hash(skb->dev, skb);
+
+	txq = netdev_get_tx_queue(skb->dev, queue_index);
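+	/* bit 1 of qdisc->state is __QDISC_STATE_DEACTIVATED */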
+	return txq->qdisc->state & 2;
+}
+
+static int send_finish(struct sk_buff *skb)
+{
+	if (!rxe_xmit_shortcut)
+		return dev_queue_xmit(skb);
+	else
+		return skb->dev->netdev_ops->ndo_start_xmit(skb, skb->dev);
+}
+
+static int send(struct rxe_dev *rxe, struct sk_buff *skb)
+{
+	if (queue_deactivated(skb))
+		return RXE_QUEUE_STOPPED;
+
+	if (netif_queue_stopped(skb->dev))
+		return RXE_QUEUE_STOPPED;
+
+	return NF_HOOK(NFPROTO_RXE, NF_RXE_OUT, skb, rxe->ndev, NULL,
+		send_finish);
+}
+
+static int loopback_finish(struct sk_buff *skb)
+{
+	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	struct packet_type *ptype = NULL;
+	struct net_device *orig_dev = pkt->rxe->ndev;
+
+	return rxe_net_rcv(skb, pkt->rxe->ndev, ptype, orig_dev);
+}
+
+static int loopback(struct rxe_dev *rxe, struct sk_buff *skb)
+{
+	return NF_HOOK(NFPROTO_RXE, NF_RXE_OUT, skb, rxe->ndev, NULL,
+		loopback_finish);
+}
+
+static inline int addr_same(struct rxe_dev *rxe, struct rxe_av *av)
+{
+	int port_num = 1;
+	return rxe->port[port_num - 1].guid_tbl[0]
+			== av->attr.grh.dgid.global.interface_id;
+}
+
+static struct sk_buff *init_packet(struct rxe_dev *rxe, struct rxe_av *av,
+				   int paylen)
+{
+	struct sk_buff *skb;
+	struct rxe_pkt_info *pkt;
+
+
+	skb = alloc_skb(paylen + RXE_GRH_BYTES + LL_RESERVED_SPACE(rxe->ndev),
+			GFP_ATOMIC);
+	if (!skb)
+		return NULL;
+
+	skb_reserve(skb, LL_RESERVED_SPACE(rxe->ndev));
+	skb_reset_network_header(skb);
+
+	skb->dev	= rxe->ndev;
+	skb->protocol	= htons(rxe_eth_proto_id);
+
+	pkt		= SKB_TO_PKT(skb);
+	pkt->rxe	= rxe;
+	pkt->port_num	= 1;
+	pkt->hdr	= skb_put(skb, RXE_GRH_BYTES + paylen);
+	pkt->mask	= RXE_GRH_MASK;
+
+	dev_hard_header(skb, rxe->ndev, rxe_eth_proto_id,
+			av->ll_addr, rxe->ndev->dev_addr, skb->len);
+
+	if (addr_same(rxe, av))
+		pkt->mask |= RXE_LOOPBACK_MASK;
+
+	return skb;
+}
+
+static int init_av(struct rxe_dev *rxe, struct ib_ah_attr *attr,
+		   struct rxe_av *av)
+{
+	struct in6_addr *in6 = (struct in6_addr *)attr->grh.dgid.raw;
+
+	/* grh required for rxe_net */
+	if ((attr->ah_flags & IB_AH_GRH) == 0) {
+		if (rxe_loopback_mad_grh_fix) {
+			/* temporary fix so that we can handle MADs to self
+			   without a GRH included; add a GRH pointing to self */
+			attr->ah_flags |= IB_AH_GRH;
+			attr->grh.dgid.global.subnet_prefix
+				= rxe->port[0].subnet_prefix;
+			attr->grh.dgid.global.interface_id
+				= rxe->port[0].guid_tbl[0];
+			av->attr = *attr;
+		} else {
+			pr_info("rxe_net: attempting to init av without grh\n");
+			return -EINVAL;
+		}
+	}
+
+	if (rdma_link_local_addr(in6))
+		rdma_get_ll_mac(in6, av->ll_addr);
+	else if (rdma_is_multicast_addr(in6))
+		rdma_get_mcast_mac(in6, av->ll_addr);
+	else {
+		int i;
+		char addr[64];
+
+		for (i = 0; i < 16; i++)
+			sprintf(addr+2*i, "%02x", attr->grh.dgid.raw[i]);
+
+		pr_info("rxe_net: non local subnet address not supported %s\n",
+			addr);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * this is required by rxe_cfg to match rxe devices in
+ * /sys/class/infiniband up with their underlying ethernet devices
+ */
+static char *parent_name(struct rxe_dev *rxe, unsigned int port_num)
+{
+	return rxe->ndev->name;
+}
+
+static enum rdma_link_layer link_layer(struct rxe_dev *rxe,
+				       unsigned int port_num)
+{
+	return IB_LINK_LAYER_ETHERNET;
+}
+
+static struct rxe_ifc_ops ifc_ops = {
+	.release	= release,
+	.node_guid	= node_guid,
+	.port_guid	= port_guid,
+	.dma_device	= dma_device,
+	.mcast_add	= mcast_add,
+	.mcast_delete	= mcast_delete,
+	.send		= send,
+	.loopback	= loopback,
+	.init_packet	= init_packet,
+	.init_av	= init_av,
+	.parent_name	= parent_name,
+	.link_layer	= link_layer,
+};
+
+/* Caller must hold net_info_lock */
+void rxe_net_add(struct net_device *ndev)
+{
+	int err;
+	struct rxe_dev *rxe;
+	unsigned port_num;
+
+	__module_get(THIS_MODULE);
+
+	rxe = (struct rxe_dev *)ib_alloc_device(sizeof(*rxe));
+	if (!rxe) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	/* for now we always assign port = 1 */
+	port_num = 1;
+
+	rxe->ifc_ops = &ifc_ops;
+
+	rxe->ndev = ndev;
+
+	err = rxe_add(rxe, ndev->mtu);
+	if (err)
+		goto err2;
+
+	pr_info("rxe_net: added %s to %s\n",
+		 rxe->ib_dev.name, ndev->name);
+
+	net_info[ndev->ifindex].rxe = rxe;
+	net_info[ndev->ifindex].port = port_num;
+	net_info[ndev->ifindex].ndev = ndev;
+	return;
+
+err2:
+	ib_dealloc_device(&rxe->ib_dev);
+err1:
+	module_put(THIS_MODULE);
+
+	return;
+}
+
+/* Caller must hold net_info_lock */
+void rxe_net_up(struct net_device *ndev)
+{
+	struct rxe_dev *rxe;
+	struct rxe_port *port;
+	u8 port_num;
+
+	if (ndev->ifindex >= RXE_MAX_IF_INDEX)
+		goto out;
+
+	net_info[ndev->ifindex].status = IB_PORT_ACTIVE;
+
+	rxe = net_to_rxe(ndev);
+	if (!rxe)
+		goto out;
+
+	port_num = net_to_port(ndev);
+	port = &rxe->port[port_num-1];
+	port->attr.state = IB_PORT_ACTIVE;
+	port->attr.phys_state = IB_PHYS_STATE_LINK_UP;
+
+	pr_info("rxe_net: set %s active for %s\n",
+		rxe->ib_dev.name, ndev->name);
+out:
+	return;
+}
+
+/* Caller must hold net_info_lock */
+void rxe_net_down(struct net_device *ndev)
+{
+	struct rxe_dev *rxe;
+	struct rxe_port *port;
+	u8 port_num;
+
+	if (ndev->ifindex >= RXE_MAX_IF_INDEX)
+		goto out;
+
+	net_info[ndev->ifindex].status = IB_PORT_DOWN;
+
+	rxe = net_to_rxe(ndev);
+	if (!rxe)
+		goto out;
+
+	port_num = net_to_port(ndev);
+	port = &rxe->port[port_num-1];
+	port->attr.state = IB_PORT_DOWN;
+	port->attr.phys_state = 3;	/* IB physical port state: Disabled */
+
+	pr_info("rxe_net: set %s down for %s\n",
+		rxe->ib_dev.name, ndev->name);
+out:
+	return;
+}
+
+static int can_support_rxe(struct net_device *ndev)
+{
+	int rc = 0;
+
+	if (ndev->ifindex >= RXE_MAX_IF_INDEX) {
+		pr_debug("%s index %d: too large for rxe ndev table\n",
+			 ndev->name, ndev->ifindex);
+		goto out;
+	}
+
+	/* Let's say we support all ethX devices */
+	if (strncmp(ndev->name, "eth", 3) == 0)
+		rc = 1;
+
+out:
+	return rc;
+}
+
+static int rxe_notify(struct notifier_block *not_blk,
+		      unsigned long event,
+		      void *arg)
+{
+	struct net_device *ndev = arg;
+	struct rxe_dev *rxe;
+
+	if (!can_support_rxe(ndev))
+		goto out;
+
+	spin_lock_bh(&net_info_lock);
+	switch (event) {
+	case NETDEV_REGISTER:
+		/* Keep a record of this NIC. */
+		net_info[ndev->ifindex].status = IB_PORT_DOWN;
+		net_info[ndev->ifindex].rxe = NULL;
+		net_info[ndev->ifindex].port = 1;
+		net_info[ndev->ifindex].ndev = ndev;
+		break;
+
+	case NETDEV_UNREGISTER:
+		if (net_info[ndev->ifindex].rxe) {
+			rxe = net_info[ndev->ifindex].rxe;
+			net_info[ndev->ifindex].rxe = NULL;
+			spin_unlock_bh(&net_info_lock);
+			rxe_remove(rxe);
+			spin_lock_bh(&net_info_lock);
+		}
+		net_info[ndev->ifindex].status = 0;
+		net_info[ndev->ifindex].port = 0;
+		net_info[ndev->ifindex].ndev = NULL;
+		break;
+
+	case NETDEV_UP:
+		rxe_net_up(ndev);
+		break;
+
+	case NETDEV_DOWN:
+		rxe_net_down(ndev);
+		break;
+
+	case NETDEV_CHANGEMTU:
+		rxe = net_to_rxe(ndev);
+		if (rxe) {
+			pr_info("rxe_net: %s changed mtu to %d\n",
+				ndev->name, ndev->mtu);
+			rxe_set_mtu(rxe, ndev->mtu, net_to_port(ndev));
+		}
+		break;
+
+	case NETDEV_REBOOT:
+	case NETDEV_CHANGE:
+	case NETDEV_GOING_DOWN:
+	case NETDEV_CHANGEADDR:
+	case NETDEV_CHANGENAME:
+	case NETDEV_FEAT_CHANGE:
+	default:
+		pr_info("rxe_net: ignoring netdev event = %ld for %s\n",
+			event, ndev->name);
+		break;
+	}
+	spin_unlock_bh(&net_info_lock);
+
+out:
+	return NOTIFY_OK;
+}
+
+static int rxe_net_rcv(struct sk_buff *skb,
+		       struct net_device *ndev,
+		       struct packet_type *ptype,
+		       struct net_device *orig_dev)
+{
+	struct rxe_dev *rxe = net_to_rxe(ndev);
+	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	int rc = 0;
+
+	if (!rxe)
+		goto drop;
+
+	/* TODO: We can receive packets in fragments. For now we
+	 * linearize and it's costly because we may copy a lot of
+	 * data. We should handle that case better. */
+	if (skb_linearize(skb))
+		goto drop;
+
+#if 0
+	/* Error injector */
+	{
+		static int x = 8;
+		static int counter;
+		counter++;
+
+		if (counter == x) {
+			x = 13-x; /* 8 or 5 */
+			counter = 0;
+			pr_debug("dropping one packet\n");
+			goto drop;
+		}
+	}
+#endif
+
+	skb = skb_share_check(skb, GFP_ATOMIC);
+	if (!skb) {
+		/* still return null */
+		goto out;
+	}
+
+	/* set required fields in pkt */
+	pkt->rxe = rxe;
+	pkt->port_num = net_to_port(ndev);
+	pkt->hdr = skb_network_header(skb);
+	pkt->mask = RXE_GRH_MASK;
+
+	rc = NF_HOOK(NFPROTO_RXE, NF_RXE_IN, skb, ndev, NULL, rxe_rcv);
+out:
+	return rc;
+
+drop:
+	kfree_skb(skb);
+	return 0;
+}
+
+static struct packet_type rxe_packet_type = {
+	.func = rxe_net_rcv,
+};
+
+static struct notifier_block rxe_net_notifier = {
+	.notifier_call = rxe_notify,
+};
+
+static int __init rxe_net_init(void)
+{
+	int err;
+
+	spin_lock_init(&net_info_lock);
+
+	if (rxe_eth_proto_id != ETH_P_RXE)
+		pr_info("rxe_net: protoid set to 0x%x\n",
+			rxe_eth_proto_id);
+
+	rxe_packet_type.type = cpu_to_be16(rxe_eth_proto_id);
+	dev_add_pack(&rxe_packet_type);
+
+	err = register_netdevice_notifier(&rxe_net_notifier);
+
+	pr_info("rxe_net: loaded\n");
+
+	return err;
+}
+
+static void __exit rxe_net_exit(void)
+{
+	unregister_netdevice_notifier(&rxe_net_notifier);
+	dev_remove_pack(&rxe_packet_type);
+
+	pr_info("rxe_net: unloaded\n");
+
+	return;
+}
+
+module_init(rxe_net_init);
+module_exit(rxe_net_exit);


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 34/37] add rxe_net_sysfs.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (32 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 33/37] add rxe_net.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.815123934-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 35/37] add rxe_sample.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (3 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_net_sysfs.c --]
[-- Type: text/plain, Size: 6455 bytes --]

sysfs interface for ib_rxe_net.
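
The module_param_call() hooks below show up as write-only files under
/sys/module/ib_rxe_net/parameters/ (add, remove, mtu); this is the
interface that rxe_cfg drives. A minimal user-space sketch of attaching
rxe to eth0 by hand (the path is inferred from the module name, not
spelled out in this patch; needs root):

	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/sys/module/ib_rxe_net/parameters/add", "w");

		if (!f) {
			perror("add parameter");
			return 1;
		}
		/* interface name plus newline, as sanitize_arg() accepts */
		fprintf(f, "eth0\n");
		fclose(f);
		return 0;
	}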

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_net_sysfs.c |  229 ++++++++++++++++++++++++++++++
 1 file changed, 229 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_net_sysfs.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_net_sysfs.c
@@ -0,0 +1,229 @@
+/*
+ * Copyright (c) 2009-2011 Mellanox Technologies Ltd. All rights reserved.
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "rxe.h"
+#include "rxe_net.h"
+
+/* Copy argument and remove trailing newline. Return the new length. */
+static int sanitize_arg(const char *val, char *intf, int intf_len)
+{
+	int len;
+
+	if (!val)
+		return 0;
+
+	/* Remove newline. */
+	for (len = 0; len < intf_len - 1 && val[len] && val[len] != '\n'; len++)
+		intf[len] = val[len];
+	intf[len] = 0;
+
+	if (len == 0 || (val[len] != 0 && val[len] != '\n'))
+		return 0;
+
+	return len;
+}
+
+/* Caller must hold net_info_lock */
+static void rxe_set_port_state(struct net_device *ndev)
+{
+	struct rxe_dev *rxe;
+
+	if (ndev->ifindex >= RXE_MAX_IF_INDEX)
+		goto out;
+
+	rxe = net_to_rxe(ndev);
+	if (!rxe)
+		goto out;
+
+	if (net_info[ndev->ifindex].status == IB_PORT_ACTIVE)
+		rxe_net_up(ndev);
+	else
+		rxe_net_down(ndev); /* down for unknown state */
+out:
+	return;
+}
+
+static int rxe_param_set_add(const char *val, struct kernel_param *kp)
+{
+	int i, len;
+	char intf[32];
+
+	len = sanitize_arg(val, intf, sizeof(intf));
+	if (!len) {
+		pr_err("rxe_net: add: invalid interface name\n");
+		return -EINVAL;
+	}
+
+	spin_lock_bh(&net_info_lock);
+	for (i = 0; i < RXE_MAX_IF_INDEX; i++) {
+		struct net_device *ndev = net_info[i].ndev;
+
+		if (ndev && (0 == strncmp(intf, ndev->name, len))) {
+			spin_unlock_bh(&net_info_lock);
+			if (net_info[i].rxe)
+				pr_info("rxe_net: already configured on %s\n",
+					intf);
+			else {
+				rxe_net_add(ndev);
+				if (net_info[i].rxe) {
+					rxe_set_port_state(ndev);
+				} else
+					pr_err("rxe_net: add appears to have "
+					       "failed for %s (index %d)\n",
+					       intf, i);
+			}
+			return 0;
+		}
+	}
+	spin_unlock_bh(&net_info_lock);
+
+	pr_warning("interface %s not found\n", intf);
+
+	return 0;
+}
+
+static void rxe_remove_all(void)
+{
+	int i;
+	struct rxe_dev *rxe;
+
+	for (i = 0; i < RXE_MAX_IF_INDEX; i++) {
+		if (net_info[i].rxe) {
+			spin_lock_bh(&net_info_lock);
+			rxe = net_info[i].rxe;
+			net_info[i].rxe = NULL;
+			spin_unlock_bh(&net_info_lock);
+
+			rxe_remove(rxe);
+		}
+	}
+}
+
+static int rxe_param_set_remove(const char *val, struct kernel_param *kp)
+{
+	int i, len;
+	char intf[32];
+	struct rxe_dev *rxe;
+
+	len = sanitize_arg(val, intf, sizeof(intf));
+	if (!len) {
+		pr_err("rxe_net: remove: invalid interface name\n");
+		return -EINVAL;
+	}
+
+	if (strncmp("all", intf, len) == 0) {
+		pr_info("rxe_sys: remove all\n");
+		rxe_remove_all();
+		return 0;
+	}
+
+	spin_lock_bh(&net_info_lock);
+	for (i = 0; i < RXE_MAX_IF_INDEX; i++) {
+		if (!net_info[i].rxe || !net_info[i].ndev)
+			continue;
+
+		if (0 == strncmp(intf, net_info[i].rxe->ib_dev.name, len)) {
+			rxe = net_info[i].rxe;
+			net_info[i].rxe = NULL;
+			spin_unlock_bh(&net_info_lock);
+
+			rxe_remove(rxe);
+			return 0;
+		}
+	}
+	spin_unlock_bh(&net_info_lock);
+	pr_warning("rxe_sys: instance %s not found\n", intf);
+
+	return 0;
+}
+
+module_param_call(add, rxe_param_set_add, NULL, NULL, 0200);
+module_param_call(remove, rxe_param_set_remove, NULL, NULL, 0200);
+
+static int rxe_param_set_mtu(const char *val, struct kernel_param *kp)
+{
+	int i, rc, len;
+	int do_all = 0;
+	char intf[32];
+	char cmd[32];
+	int tmp_mtu;
+
+	len = sanitize_arg(val, intf, sizeof(intf));
+	if (!len)
+		return -EINVAL;
+
+	rc = sscanf(intf, "%s %d", cmd, &tmp_mtu);
+	if (rc != 2) {
+		pr_warning("rxe_net: mtu bogus input (%s)\n", intf);
+		goto out;
+	}
+
+	pr_info("set_mtu: %s %d\n", cmd, tmp_mtu);
+
+	if (!(is_power_of_2(tmp_mtu)
+	      && (tmp_mtu >= 256)
+	      && (tmp_mtu <= 4096))) {
+		pr_warning("rxe_net: bogus mtu (%s - %d pow2  %d)\n",
+			   intf, tmp_mtu, is_power_of_2(tmp_mtu));
+		goto out;
+	}
+
+	tmp_mtu = rxe_mtu_int_to_enum(tmp_mtu);
+
+	if (strcmp("all", cmd) == 0)
+		do_all = 1;
+
+	spin_lock_bh(&net_info_lock);
+	for (i = 0; i < RXE_MAX_IF_INDEX; i++) {
+		if (net_info[i].rxe && net_info[i].ndev) {
+			if (do_all
+			    || (0 == strncmp(cmd,
+					     net_info[i].rxe->ib_dev.name,
+					     len))) {
+				net_info[i].rxe->pref_mtu = tmp_mtu;
+
+				rxe_set_mtu(net_info[i].rxe,
+					    net_info[i].ndev->mtu, 1);
+
+				if (!do_all)
+					break;
+			}
+		}
+	}
+	spin_unlock_bh(&net_info_lock);
+
+out:
+	return 0;
+}
+
+module_param_call(mtu, rxe_param_set_mtu, NULL, NULL, 0200);


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 35/37] add rxe_sample.c
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (33 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 34/37] add rxe_net_sysfs.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.865806162-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-07-24 19:43 ` [patch v2 36/37] add Makefile rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (2 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-rxe_net_sample.c --]
[-- Type: text/plain, Size: 5979 bytes --]

module that implements a soft IB device using ib_rxe in loopback.

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/rxe_sample.c |  226 +++++++++++++++++++++++++++++++++
 1 file changed, 226 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/rxe_sample.c
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/rxe_sample.c
@@ -0,0 +1,226 @@
+/*
+ * Copyright (c) 2009-2011 System Fabric Works, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * sample driver for IB transport over rxe
+ * implements a simple loopback device on module load
+ */
+
+#include <linux/skbuff.h>
+
+#include <linux/device.h>
+
+#include "rxe.h"
+
+MODULE_AUTHOR("Bob Pearson");
+MODULE_DESCRIPTION("RDMA transport over Converged Enhanced Ethernet");
+MODULE_LICENSE("Dual BSD/GPL");
+
+static __be64 node_guid(struct rxe_dev *rxe)
+{
+	return cpu_to_be64(0x3333333333333333ULL);
+}
+
+static __be64 port_guid(struct rxe_dev *rxe, unsigned int port_num)
+{
+	return cpu_to_be64(0x4444444444444444ULL);
+}
+
+/*
+ * the ofed core requires that we provide a valid device
+ * object for registration
+ */
+static struct class *my_class;
+static struct device *my_dev;
+
+static struct device *dma_device(struct rxe_dev *rxe)
+{
+	return my_dev;
+}
+
+static int mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+	return 0;
+}
+
+static int mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+	return 0;
+}
+
+/* just loopback packet */
+static int send(struct rxe_dev *rxe, struct sk_buff *skb)
+{
+	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+
+	pkt->rxe = rxe;
+	pkt->mask = RXE_LRH_MASK;
+
+	return rxe_rcv(skb);
+}
+
+static struct sk_buff *init_packet(struct rxe_dev *rxe, struct rxe_av *av,
+				   int paylen)
+{
+	struct sk_buff *skb;
+	struct rxe_pkt_info *pkt;
+
+	paylen += RXE_LRH_BYTES;
+
+	if (av->attr.ah_flags & IB_AH_GRH)
+		paylen += RXE_GRH_BYTES;
+
+	skb = alloc_skb(paylen, GFP_ATOMIC);
+	if (!skb)
+		return NULL;
+
+	skb->dev	= NULL;
+	skb->protocol	= 0;
+
+	pkt		= SKB_TO_PKT(skb);
+	pkt->rxe	= rxe;
+	pkt->port_num	= 1;
+	pkt->hdr	= skb_put(skb, paylen);
+	pkt->mask	= RXE_LRH_MASK;
+	if (av->attr.ah_flags & IB_AH_GRH)
+		pkt->mask	|= RXE_GRH_MASK;
+
+	return skb;
+}
+
+static int init_av(struct rxe_dev *rxe, struct ib_ah_attr *attr,
+		   struct rxe_av *av)
+{
+	if (!av->attr.dlid)
+		av->attr.dlid = 1;
+	return 0;
+}
+
+static char *parent_name(struct rxe_dev *rxe, unsigned int port_num)
+{
+	return "sample";
+}
+
+static enum rdma_link_layer link_layer(struct rxe_dev *rxe,
+				       unsigned int port_num)
+{
+	return IB_LINK_LAYER_INFINIBAND;
+}
+
+static struct rxe_ifc_ops ifc_ops = {
+	.node_guid	= node_guid,
+	.port_guid	= port_guid,
+	.dma_device	= dma_device,
+	.mcast_add	= mcast_add,
+	.mcast_delete	= mcast_delete,
+	.send		= send,
+	.init_packet	= init_packet,
+	.init_av	= init_av,
+	.parent_name	= parent_name,
+	.link_layer	= link_layer,
+};
+
+static struct rxe_dev *rxe_sample;
+
+static int rxe_sample_add(void)
+{
+	int err;
+	struct rxe_port *port;
+
+	rxe_sample = (struct rxe_dev *)ib_alloc_device(sizeof(*rxe_sample));
+	if (!rxe_sample) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	rxe_sample->ifc_ops = &ifc_ops;
+
+	err = rxe_add(rxe_sample, 4500);
+	if (err)
+		goto err2;
+
+	/* bit of a hack */
+	port = &rxe_sample->port[0];
+	port->attr.state = IB_PORT_ACTIVE;
+	port->attr.phys_state = 5;
+	port->attr.max_mtu = IB_MTU_4096;
+	port->attr.active_mtu = IB_MTU_4096;
+	port->mtu_cap = IB_MTU_4096;
+	port->attr.lid = 1;
+
+	pr_info("rxe_sample: added %s\n",
+		 rxe_sample->ib_dev.name);
+	return 0;
+
+err2:
+	ib_dealloc_device(&rxe_sample->ib_dev);
+err1:
+	return err;
+}
+
+static void rxe_sample_remove(void)
+{
+	if (!rxe_sample)
+		goto done;
+
+	rxe_remove(rxe_sample);
+
+	pr_info("rxe_sample: removed %s\n",
+		rxe_sample->ib_dev.name);
+done:
+	return;
+}
+
+static int __init rxe_sample_init(void)
+{
+	int err;
+
+	rxe_crc_disable = 1;
+
+	my_class = class_create(THIS_MODULE, "foo");
+	my_dev = device_create(my_class, NULL, 0, NULL, "bar");
+
+	err = rxe_sample_add();
+	return err;
+}
+
+static void __exit rxe_sample_exit(void)
+{
+	rxe_sample_remove();
+
+	device_destroy(my_class, my_dev->devt);
+	class_destroy(my_class);
+	return;
+}
+
+module_init(rxe_sample_init);
+module_exit(rxe_sample_exit);


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 36/37] add Makefile
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (34 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 35/37] add rxe_sample.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  2011-07-24 19:43 ` [patch v2 37/37] add Kconfig rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found] ` <20110724194300.421253331-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  37 siblings, 0 replies; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-Makefile.c --]
[-- Type: text/plain, Size: 1104 bytes --]

Makefile

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/hw/rxe/Makefile |   30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

Index: infiniband/drivers/infiniband/hw/rxe/Makefile
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/Makefile
@@ -0,0 +1,30 @@
+obj-$(CONFIG_INFINIBAND_RXE) += ib_rxe.o ib_rxe_net.o ib_rxe_sample.o
+
+ib_rxe-y := \
+	rxe.o \
+	rxe_comp.o \
+	rxe_req.o \
+	rxe_resp.o \
+	rxe_recv.o \
+	rxe_pool.o \
+	rxe_queue.o \
+	rxe_verbs.o \
+	rxe_av.o \
+	rxe_srq.o \
+	rxe_qp.o \
+	rxe_cq.o \
+	rxe_mr.o \
+	rxe_dma.o \
+	rxe_opcode.o \
+	rxe_mmap.o \
+	rxe_arbiter.o \
+	rxe_icrc.o \
+	rxe_mcast.o \
+	rxe_task.o
+
+ib_rxe_net-y := \
+	rxe_net.o \
+	rxe_net_sysfs.o
+
+ib_rxe_sample-y := \
+	rxe_sample.o


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [patch v2 37/37] add Kconfig
  2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
                   ` (35 preceding siblings ...)
  2011-07-24 19:43 ` [patch v2 36/37] add Makefile rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
@ 2011-07-24 19:43 ` rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
       [not found]   ` <20110724201229.966262587-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
       [not found] ` <20110724194300.421253331-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  37 siblings, 1 reply; 85+ messages in thread
From: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5 @ 2011-07-24 19:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5

[-- Attachment #1: add-Kconfig.c --]
[-- Type: text/plain, Size: 3032 bytes --]

Kconfig file

Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>

---
 drivers/infiniband/Kconfig        |    1 +
 drivers/infiniband/Makefile       |    1 +
 drivers/infiniband/hw/rxe/Kconfig |   28 ++++++++++++++++++++++++++++
 3 files changed, 30 insertions(+)

Index: infiniband/drivers/infiniband/Kconfig
===================================================================
--- infiniband.orig/drivers/infiniband/Kconfig
+++ infiniband/drivers/infiniband/Kconfig
@@ -51,6 +51,7 @@ source "drivers/infiniband/hw/cxgb3/Kcon
 source "drivers/infiniband/hw/cxgb4/Kconfig"
 source "drivers/infiniband/hw/mlx4/Kconfig"
 source "drivers/infiniband/hw/nes/Kconfig"
+source "drivers/infiniband/hw/rxe/Kconfig"
 
 source "drivers/infiniband/ulp/ipoib/Kconfig"
 
Index: infiniband/drivers/infiniband/Makefile
===================================================================
--- infiniband.orig/drivers/infiniband/Makefile
+++ infiniband/drivers/infiniband/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_INFINIBAND_CXGB3)		+= hw/cx
 obj-$(CONFIG_INFINIBAND_CXGB4)		+= hw/cxgb4/
 obj-$(CONFIG_MLX4_INFINIBAND)		+= hw/mlx4/
 obj-$(CONFIG_INFINIBAND_NES)		+= hw/nes/
+obj-$(CONFIG_INFINIBAND_RXE)		+= hw/rxe/
 obj-$(CONFIG_INFINIBAND_IPOIB)		+= ulp/ipoib/
 obj-$(CONFIG_INFINIBAND_SRP)		+= ulp/srp/
 obj-$(CONFIG_INFINIBAND_ISER)		+= ulp/iser/
Index: infiniband/drivers/infiniband/hw/rxe/Kconfig
===================================================================
--- /dev/null
+++ infiniband/drivers/infiniband/hw/rxe/Kconfig
@@ -0,0 +1,28 @@
+config INFINIBAND_RXE
+	tristate "Software RDMA over Ethernet (RoCE) driver"
+	depends on INET && PCI && INFINIBAND
+	---help---
+	This driver implements the InfiniBand RDMA transport over
+	the Linux network stack. It enables a system with a
+	standard Ethernet adapter to interoperate with a RoCE
+	adapter or with another system running the RXE driver.
+	Documentation on InfiniBand and RoCE can be downloaded at
+	www.infinibandta.org and www.openfabrics.org. (See also
+	siw which is a similar software driver for iWARP.)
+
+	The driver is split into two layers, one interfaces with the
+	Linux RDMA stack and implements a kernel or user space
+	verbs API. The user space verbs API requires a support
+	library named librxe which is loaded by the generic user
+	space verbs API, libibverbs. The other layer interfaces
+	with the Linux network stack at layer 3.
+
+	There are three kernel modules provided with RXE.
+
+	ib_rxe		- attaches to the Linux RDMA stack
+	ib_rxe_net	- attaches to the Linux network stack
+	ib_rxe_sample	- provides a simple sample network layer
+			  driver that implements a loopback interface.
+
+	Rxe_cfg provided with librxe is a configuration tool that
+	Rxe_cfg, provided with librxe, is a configuration tool that
+	loads and unloads the ib_rxe and ib_rxe_net kernel modules.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 01/37] add slice by 8 algorithm to crc32.c
       [not found]   ` <20110724201228.154338911-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-05 18:39     ` Bart Van Assche
       [not found]       ` <CAO+b5-rFs79znwVcG42Px2i351vhiz9hFyCJV5qg77of8Y6P5Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-05 18:39 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM, <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> -#if CRC_BE_BITS > 8 || CRC_BE_BITS < 1 || CRC_BE_BITS & CRC_BE_BITS-1
> -# error CRC_BE_BITS must be a power of 2 between 1 and 8
> +#if CRC_BE_BITS > 64 || CRC_BE_BITS < 1 || CRC_BE_BITS == 16 || \
> +       CRC_BE_BITS & CRC_BE_BITS-1
> +# error "CRC_BE_BITS must be one of {1, 2, 4, 8, 32, 64}"
>  #endif

It's a little unfortunate that Roland's macro
BUILD_BUG_ON_NOT_POWER_OF_2() can't be used in the above - that would
make the "CRC_BE_BITS & CRC_BE_BITS-1" part more readable.
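
Something like the sketch below - though BUILD_BUG_ON() style checks
have to live inside a function body, not in the preprocessor
conditional that selects the table size, which is presumably why the
#if stays:

	/* e.g. tucked into crc32's init or self-test path */
	BUILD_BUG_ON_NOT_POWER_OF_2(CRC_BE_BITS);
	BUILD_BUG_ON(CRC_BE_BITS > 64 || CRC_BE_BITS == 16);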

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 37/37] add Kconfig
       [not found]   ` <20110724201229.966262587-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-05 18:44     ` Bart Van Assche
       [not found]       ` <CAO+b5-oFKiAhTkCntAo5_mMCtdg1vb1eM_mTi-8E-T_atHa_3w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-05 18:44 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
+config INFINIBAND_RXE
+	tristate "Software RDMA over Ethernet (RoCE) driver"
+	depends on INET && PCI && INFINIBAND
+	---help---
+	This driver implements the InfiniBand RDMA transport over
+	the Linux network stack. It enables a system with a

Shouldn't that read "an Ethernet network" instead of "the Linux network stack" ?

+	standard Ethernet adapter to interoperate with a RoCE
+	adapter or with another system running the RXE driver.
+	Documentation on InfiniBand and RoCE can be downloaded at
+	www.infinibandta.org and www.openfabrics.org. (See also
+	siw which is a similar software driver for iWARP.)

Isn't RoCE an official IBTA standard ? If so, seems like a good idea
to me to mention that here.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 33/37] add rxe_net.c
       [not found]   ` <20110724201229.765038738-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-05 18:47     ` Bart Van Assche
       [not found]       ` <CAO+b5-pk-X7aGgg12ND17OLcqcLU6qyiV+vf22anh9bYsXUUzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-05 18:47 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM, <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:

(resending as plain text)

+static __be64 rxe_mac_to_eui64(struct net_device *ndev)
+{
+    unsigned char *mac_addr = ndev->dev_addr;
+    __be64 eui64;
+    unsigned char *dst = (unsigned char *)&eui64;
+
+    dst[0] = mac_addr[0] ^ 2;
+    dst[1] = mac_addr[1];
+    dst[2] = mac_addr[2];
+    dst[3] = 0xff;
+    dst[4] = 0xfe;
+    dst[5] = mac_addr[3];
+    dst[6] = mac_addr[4];
+    dst[7] = mac_addr[5];
+
+    return eui64;
+}

The above is a standard EUI-48 into EUI-64 translation I assume ?

Also, I haven't found a function that does the opposite translation -
does that mean that there is no GID-to-rxe-device translation in the
ib_rxe driver yet ? I'd like to give ib_srp over ib_rxe a try, but as
far as I can see that won't work yet ?
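
(For reference, the inverse of the translation quoted above - roughly
what a GID-to-device lookup would need for the interface-id part -
would be a sketch like this, assuming an Ethernet-style EUI-64:)

	static void example_eui64_to_mac(__be64 eui64, unsigned char *mac)
	{
		unsigned char *src = (unsigned char *)&eui64;

		mac[0] = src[0] ^ 2;	/* undo the universal/local bit flip */
		mac[1] = src[1];
		mac[2] = src[2];
		/* src[3] == 0xff and src[4] == 0xfe are dropped */
		mac[3] = src[5];
		mac[4] = src[6];
		mac[5] = src[7];
	}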

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [patch v2 33/37] add rxe_net.c
       [not found]       ` <CAO+b5-pk-X7aGgg12ND17OLcqcLU6qyiV+vf22anh9bYsXUUzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-08-05 19:17         ` Bob Pearson
  2011-08-06  8:02         ` Bart Van Assche
  1 sibling, 0 replies; 85+ messages in thread
From: Bob Pearson @ 2011-08-05 19:17 UTC (permalink / raw)
  To: 'Bart Van Assche'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, David McMillen, 'frank zago'



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Bart Van Assche
> Sent: Friday, August 05, 2011 1:48 PM
> To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v2 33/37] add rxe_net.c
> 
> On Sun, Jul 24, 2011 at 9:43 PM, <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> 
> (resending as plain text)
> 
> +static __be64 rxe_mac_to_eui64(struct net_device *ndev)
> +{
> +    unsigned char *mac_addr = ndev->dev_addr;
> +    __be64 eui64;
> +    unsigned char *dst = (unsigned char *)&eui64;
> +
> +    dst[0] = mac_addr[0] ^ 2;
> +    dst[1] = mac_addr[1];
> +    dst[2] = mac_addr[2];
> +    dst[3] = 0xff;
> +    dst[4] = 0xfe;
> +    dst[5] = mac_addr[3];
> +    dst[6] = mac_addr[4];
> +    dst[7] = mac_addr[5];
> +
> +    return eui64;
> +}
> 
> The above is a standard EUI-48 into EUI-64 translation I assume ?
Yes.
> 
> Also, I haven't found a function that does the opposite translation -
> does that mean that there is no GID-to-rxe-device translation in the
> ib_rxe driver yet ? I'd like to give ib_srp over ib_rxe a try, but as
> far as I can see that won't work yet ?

There are a couple of issues with SRP/RoCE. One of them is that the current
scheme for configuring connections is based on DM MADs. The other is the one
you mention. We (SFW) looked at both problems a couple of years ago and came
up with a version of SRP that runs over RDMACM. SRP was standardized before
RDMACM was a gleam in Sean's eye and assumes that it has the entire private
data area for CM MADs to pass parameters. But if you squeeze a little you
can get it all in after the area that RDMACM takes. We have versions of SRPI
and SRPT that can use RDMACM for connection setup. Since RDMACM supports
connection setup automatically for RoCE you're all fixed. In order to make
this commercially practical you would have to go back to NCITS and update
the SRP standard but I'm not too sure that there is enough interest to make
that happen. The other problem requires making changes to the special file
that sets up the connection. We also have this patch.

If you are interested in this I would be glad to work with you to try to get
these patches into OFED.

> 
> Bart.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [patch v2 37/37] add Kconfig
       [not found]       ` <CAO+b5-oFKiAhTkCntAo5_mMCtdg1vb1eM_mTi-8E-T_atHa_3w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-08-05 19:19         ` Bob Pearson
  2011-08-06 18:44           ` Bart Van Assche
  0 siblings, 1 reply; 85+ messages in thread
From: Bob Pearson @ 2011-08-05 19:19 UTC (permalink / raw)
  To: 'Bart Van Assche'; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Bart Van Assche
> Sent: Friday, August 05, 2011 1:44 PM
> To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v2 37/37] add Kconfig
> 
> On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +config INFINIBAND_RXE
> +	tristate "Software RDMA over Ethernet (RoCE) driver"
> +	depends on INET && PCI && INFINIBAND
> +	---help---
> +	This driver implements the InfiniBand RDMA transport over
> +	the Linux network stack. It enables a system with a
> 
> Shouldn't that read "an Ethernet network" instead of "the Linux network
> stack" ?

Technically it will run over more than just Ethernet. RXE plugs in at the
same place as IP on top of the netdev core.
There are some assumptions about the size of the L2 address that make it
work better on Ethernet but I used to run it over the loopback device and it
also runs over VLANs.

> 
> +	standard Ethernet adapter to interoperate with a RoCE
> +	adapter or with another system running the RXE driver.
> +	Documentation on InfiniBand and RoCE can be downloaded at
> +	www.infinibandta.org and www.openfabrics.org. (See also
> +	siw which is a similar software driver for iWARP.)
> 
> Isn't RoCE an official IBTA standard ? If so, seems like a good idea
> to me to mention that here.

I'll do that for v3.

> 
> Bart.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [patch v2 01/37] add slice by 8 algorithm to crc32.c
       [not found]       ` <CAO+b5-rFs79znwVcG42Px2i351vhiz9hFyCJV5qg77of8Y6P5Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-08-05 19:24         ` Bob Pearson
  0 siblings, 0 replies; 85+ messages in thread
From: Bob Pearson @ 2011-08-05 19:24 UTC (permalink / raw)
  To: 'Bart Van Assche'; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

This patch has been submitted to linux-kernel and is being reviewed and
changing. When that is done I'll replace this one with whatever comes out
there.

> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Bart Van Assche
> Sent: Friday, August 05, 2011 1:40 PM
> To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v2 01/37] add slice by 8 algorithm to crc32.c
> 
> On Sun, Jul 24, 2011 at 9:43 PM, <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> > -#if CRC_BE_BITS > 8 || CRC_BE_BITS < 1 || CRC_BE_BITS & CRC_BE_BITS-1
> > -# error CRC_BE_BITS must be a power of 2 between 1 and 8
> > +#if CRC_BE_BITS > 64 || CRC_BE_BITS < 1 || CRC_BE_BITS == 16 || \
> > +       CRC_BE_BITS & CRC_BE_BITS-1
> > +# error "CRC_BE_BITS must be one of {1, 2, 4, 8, 32, 64}"
> >  #endif
> 
> It's a little unfortunate that Roland's macro
> BUILD_BUG_ON_NOT_POWER_OF_2() can't be used in the above - that
> would
> make the "CRC_BE_BITS & CRC_BE_BITS-1" part more readable.
> 
> Bart.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 33/37] add rxe_net.c
       [not found]       ` <CAO+b5-pk-X7aGgg12ND17OLcqcLU6qyiV+vf22anh9bYsXUUzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2011-08-05 19:17         ` Bob Pearson
@ 2011-08-06  8:02         ` Bart Van Assche
  1 sibling, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-08-06  8:02 UTC (permalink / raw)
  To: Bob Pearson, Linux-RDMA

On Sun, Jul 24, 2011 at 9:43 PM, <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
+static __be64 rxe_mac_to_eui64(struct net_device *ndev)
+{
+    unsigned char *mac_addr = ndev->dev_addr;
+    __be64 eui64;
+    unsigned char *dst = (unsigned char *)&eui64;
+
+    dst[0] = mac_addr[0] ^ 2;
+    dst[1] = mac_addr[1];
+    dst[2] = mac_addr[2];
+    dst[3] = 0xff;
+    dst[4] = 0xfe;
+    dst[5] = mac_addr[3];
+    dst[6] = mac_addr[4];
+    dst[7] = mac_addr[5];
+
+    return eui64;
+}

As far as I can see rxe_param_set_add() does not limit ndev to
Ethernet devices ? In that case my comments about the above code are
as follows:
* The name mac_addr is misleading - dev_addr is probably more appropriate.
* A device address length check is missing. For network devices with
addresses less than six bytes long the above code will trigger an
out-of-bounds read.
* If someone binds ib_rxe to a network device with an address length
above six bytes, some useful information will be discarded.
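
A defensive variant along those lines might look like the sketch below
(the name and the choice to return 0 on a mismatch are illustrative
only):

	static __be64 rxe_dev_addr_to_eui64(struct net_device *ndev)
	{
		unsigned char *dev_addr = ndev->dev_addr;
		__be64 eui64 = 0;
		unsigned char *dst = (unsigned char *)&eui64;

		/* only 48-bit, Ethernet-style addresses map cleanly */
		if (ndev->addr_len != ETH_ALEN)
			return 0;

		dst[0] = dev_addr[0] ^ 2;
		dst[1] = dev_addr[1];
		dst[2] = dev_addr[2];
		dst[3] = 0xff;
		dst[4] = 0xfe;
		dst[5] = dev_addr[3];
		dst[6] = dev_addr[4];
		dst[7] = dev_addr[5];

		return eui64;
	}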

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 37/37] add Kconfig
  2011-08-05 19:19         ` Bob Pearson
@ 2011-08-06 18:44           ` Bart Van Assche
  0 siblings, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-08-06 18:44 UTC (permalink / raw)
  To: Bob Pearson; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, Aug 5, 2011 at 9:19 PM, Bob Pearson
<rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
>> -----Original Message-----
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Bart Van Assche
>> Sent: Friday, August 05, 2011 1:44 PM
>> To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org
>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: Re: [patch v2 37/37] add Kconfig
>>
>> On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
>> +config INFINIBAND_RXE
>> +     tristate "Software RDMA over Ethernet (RoCE) driver"
>> +     depends on INET && PCI && INFINIBAND
>> +     ---help---
>> +     This driver implements the InfiniBand RDMA transport over
>> +     the Linux network stack. It enables a system with a
>>
>> Shouldn't that read "an Ethernet network" instead of "the Linux network
>> stack" ?
>
> Technically it will run over more than just Ethernet. RXE plugs in at the
> same place as IP on top of the netdev core.
> There are some assumptions about the size of the L2 address that make it
> work better on Ethernet but I used to run it over the loopback device and it
> also runs over VLANs.

In that case I propose to change "Linux network stack" into "Linux
network driver" - that's all what the rxe driver uses from the network
stack, and there is more in the Linux network stack than only the
network drivers.

>> +     standard Ethernet adapter to interoperate with a RoCE
>> +     adapter or with another system running the RXE driver.
>> +     Documentation on InfiniBand and RoCE can be downloaded at
>> +     www.infinibandta.org and www.openfabrics.org. (See also
>> +     siw which is a similar software driver for iWARP.)
>>
>> Isn't RoCE an official IBTA standard ? If so, seems like a good idea
>> to me to mention that here.
>
> I'll do that for v3.

Thanks. Maybe it's also a good idea to mention that while RoCE
benefits from a lossless converged Ethernet network, the protocol
also works over a regular (not lossless) Ethernet network.
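
A sketch of how the help text might read with the suggestions above folded in
(wording illustrative only, not the text actually proposed for v3):

config INFINIBAND_RXE
	tristate "Software RDMA over Ethernet (RoCE) driver"
	depends on INET && PCI && INFINIBAND
	---help---
	This driver implements the InfiniBand RDMA transport, as
	specified by the IBTA RoCE annex, on top of a Linux network
	driver. It enables a system with a standard Ethernet adapter
	to interoperate with a RoCE adapter or with another system
	running the RXE driver. RoCE benefits from a lossless
	converged Ethernet fabric but also works over regular
	(lossy) Ethernet.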

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 35/37] add rxe_sample.c
       [not found]   ` <20110724201229.865806162-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-07  8:08     ` Bart Van Assche
  0 siblings, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-08-07  8:08 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> module that implements a soft IB device using ib_rxe in loopback.
> +MODULE_DESCRIPTION("RDMA transport over Converged Enhanced Ethernet");

That's the same description as for the ib_rxe_net kernel module, so
one of the two probably should be updated.

> +MODULE_LICENSE("Dual BSD/GPL");
> +
> +static __be64 node_guid(struct rxe_dev *rxe)
> +{
> +	return cpu_to_be64(0x3333333333333333ULL);
> +}

In the OUI part of the above EUI-64 identifier the "M" (multicast) and
L ("locally administered") bits are set. I'm not sure that's fine.
Also, has it been considered to use __constant_cpu_to_be64() instead
of cpu_to_be64() ?

> +static __be64 port_guid(struct rxe_dev *rxe, unsigned int port_num)
> +{
> +	return cpu_to_be64(0x4444444444444444ULL);
> +}

Same remark here about __constant_cpu_to_be64().

> +/*
> + * the ofed core requires that we provide a valid device
> + * object for registration
> + */
> +static struct class *my_class;
> +static struct device *my_dev;

Probably the above comment should be updated since this driver is
being submitted for inclusion in the mainline Linux kernel ?

> +	if (av->attr.ah_flags & IB_AH_GRH)
> +		pkt->mask	|= RXE_GRH_MASK;

Please use a space instead of a tab before "|=".

> +static struct rxe_dev *rxe_sample;
> +
> +static int rxe_sample_add(void)
> +{
> +	int err;
> +	struct rxe_port *port;
> +
> +	rxe_sample = (struct rxe_dev *)ib_alloc_device(sizeof(*rxe_sample));
> +	if (!rxe_sample) {
> +		err = -ENOMEM;
> +		goto err1;
> +	}

Since calling rxe_sample_add() twice in a row is an error, a check for
that condition might be appropriate, e.g. by inserting a
WARN_ON(rxe_sample) before the allocation.

> +static void rxe_sample_remove(void)
> +{
> +	if (!rxe_sample)
> +		goto done;
> +
> +	rxe_remove(rxe_sample);
> +
> +	pr_info("rxe_sample: removed %s\n",
> +		rxe_sample->ib_dev.name);
> +done:
> +	return;
> +}

Setting rxe_sample to NULL after the rxe_remove() call can be a help
to catch potential dangling pointer dereferences.

> +
> +static int __init rxe_sample_init(void)
> +{
> +	int err;
> +
> +	rxe_crc_disable = 1;
> +
> +	my_class = class_create(THIS_MODULE, "foo");
> +	my_dev = device_create(my_class, NULL, 0, NULL, "bar");
> +
> +	err = rxe_sample_add();
> +	return err;
> +}

Sorry, but it's not clear to me why a new class has to be created for this
sample driver. Also, one should be aware that class and device names
are globally visible and hence should be meaningful - names like "foo"
and "bar" are not acceptable.

# find /sys -name foo -o -name bar
/sys/devices/virtual/foo
/sys/devices/virtual/foo/bar
/sys/class/foo
/sys/class/foo/bar

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 32/37] add rxe_net.h
       [not found]   ` <20110724201229.715886208-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-07  8:51     ` Bart Van Assche
  0 siblings, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-08-07  8:51 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +/*
> + * this should be defined in .../include/linux/if_ether.h
> + */
> +#define ETH_P_RXE			(0x8915)
> +
> +/*
> + * this should be defined in .../include/linux/netfilter.h
> + * to a specific value
> + */
> +#define NFPROTO_RXE			(0)
> +
> +/*
> + * these should be defined in .../include/linux/netfilter_rxe.h
> + */
> +#define NF_RXE_IN			(0)
> +#define NF_RXE_OUT			(1)

Please update the cited header files instead of adding these symbols here.

> +/* Should probably move to something other than an array...these can be big */
> +#define RXE_MAX_IF_INDEX	(384)

Has it been considered to change net_info[] from an array into a
linked list ? As far as I can see none of the net_info[] accesses are
in the fast path and the number of entries used in the (sparse)
net_info[] array is small. Such a change would allow to eliminate the
RXE_MAX_IF_INDEX artificial limit.

> +struct rxe_net_info {
> +	struct rxe_dev		*rxe;
> +	u8			port;
> +	struct net_device	*ndev;
> +	int			status;
> +};

Is there a reason why status and port are in struct rxe_net_info
instead of struct rxe_dev ? Moving these two fields would allow
several code simplifications IMHO.
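
A sketch combining the two suggestions above, i.e. a list instead of the
net_info[] array and port/status kept in struct rxe_dev (names and locking
are assumptions, not taken from the posted patch):

struct rxe_net_info {
	struct list_head	list;
	struct rxe_dev		*rxe;
	struct net_device	*ndev;
};

static LIST_HEAD(rxe_net_info_list);

static struct rxe_net_info *net_info_find(struct net_device *ndev)
{
	struct rxe_net_info *info;

	list_for_each_entry(info, &rxe_net_info_list, list)
		if (info->ndev == ndev)
			return info;
	return NULL;
}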

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 34/37] add rxe_net_sysfs.c
       [not found]   ` <20110724201229.815123934-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-07  9:27     ` Bart Van Assche
  0 siblings, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-08-07  9:27 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +/* Copy argument and remove trailing CR. Return the new length. */

As far as I can see the function below removes a trailing newline
instead of a trailing carriage return ?

> +static int sanitize_arg(const char *val, char *intf, int intf_len)
> +{
> +	int len;
> +
> +	if (!val)
> +		return 0;
> +
> +	/* Remove newline. */
> +	for (len = 0; len < intf_len - 1 && val[len] && val[len] != '\n'; len++)
> +		intf[len] = val[len];
> +	intf[len] = 0;
> +
> +	if (len == 0 || (val[len] != 0 && val[len] != '\n'))
> +		return 0;
> +
> +	return len;
> +}

Has it been considered to use strchr() to find the first occurrence of
a newline character in "val" ?
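
A possible strchr()-based variant with (roughly) the same semantics as the
function quoted above, sketched for illustration only:

static int sanitize_arg(const char *val, char *intf, int intf_len)
{
	const char *nl;
	int len;

	if (!val)
		return 0;

	/* length up to the first newline, or the whole string */
	nl = strchr(val, '\n');
	len = nl ? nl - val : strlen(val);
	if (len == 0 || len >= intf_len)
		return 0;

	memcpy(intf, val, len);
	intf[len] = 0;

	return len;
}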

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 26/37] add rxe_req.c
       [not found]   ` <20110724201229.415319539-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-10 18:44     ` Bart Van Assche
  0 siblings, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-08-10 18:44 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +	/* heuristic sliding window algorithm to keep
> +	   sender from overrunning receiver queues */
> +	if (qp_type(qp) == IB_QPT_RC)  {
> +		if (psn_compare(qp->req.psn, qp->comp.psn +
> +				rxe_max_req_comp_gap) > 0) {
> +			qp->req.wait_psn = 1;
> +			goto exit;
> +		}
> +	}

Shouldn't the rxe_max_req_comp_gap parameter be negotiated via the
Priority-based Flow Control protocol (IEEE 802.1 Qbb) instead of using
a single parameter for all queue pairs ?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 28/37] add rxe_arbiter.c
       [not found]   ` <20110724201229.515453123-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-10 18:46     ` Bart Van Assche
       [not found]       ` <CAO+b5-p=++ofCDj4ZB_PS9LiT5bcHsJiYDW8EL_2OrrzqjZmhg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-10 18:46 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +int rxe_arbiter(void *arg)
> +{
> +	int err;
> +	unsigned long flags;
> +	struct rxe_dev *rxe = (struct rxe_dev *)arg;
> +	struct sk_buff *skb;
> +	struct list_head *qpl;
> +	struct rxe_qp *qp;
> +
> +	/* get the next qp's send queue */
> +	spin_lock_irqsave(&rxe->arbiter.list_lock, flags);
> +	if (list_empty(&rxe->arbiter.qp_list)) {
> +		spin_unlock_irqrestore(&rxe->arbiter.list_lock, flags);
> +		return 1;
> +	}
> +
> +	qpl = rxe->arbiter.qp_list.next;
> +	list_del_init(qpl);
> +	qp = list_entry(qpl, struct rxe_qp, arbiter_list);
> +	spin_unlock_irqrestore(&rxe->arbiter.list_lock, flags);

Is there a reason why the list_first_entry() macro hasn't been used in
the above code ?
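
For reference, the locked section rewritten with list_first_entry() (a sketch,
assuming the field names from the posted patch):

	spin_lock_irqsave(&rxe->arbiter.list_lock, flags);
	if (list_empty(&rxe->arbiter.qp_list)) {
		spin_unlock_irqrestore(&rxe->arbiter.list_lock, flags);
		return 1;
	}

	qp = list_first_entry(&rxe->arbiter.qp_list, struct rxe_qp,
			      arbiter_list);
	list_del_init(&qp->arbiter_list);
	spin_unlock_irqrestore(&rxe->arbiter.list_lock, flags);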

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [patch v2 28/37] add rxe_arbiter.c
       [not found]       ` <CAO+b5-p=++ofCDj4ZB_PS9LiT5bcHsJiYDW8EL_2OrrzqjZmhg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-08-10 19:12         ` Bob Pearson
  0 siblings, 0 replies; 85+ messages in thread
From: Bob Pearson @ 2011-08-10 19:12 UTC (permalink / raw)
  To: 'Bart Van Assche'; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Bart,

Keep them coming. I have been using my available coding time to get the
crc32 patch upstream, but that is about done. I've started to look at your
comments and really appreciate the time you are taking.

Bob 

> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Bart Van Assche
> Sent: Wednesday, August 10, 2011 1:47 PM
> To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v2 28/37] add rxe_arbiter.c
> 
> On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> > +int rxe_arbiter(void *arg)
> > +{
> > +	int err;
> > +	unsigned long flags;
> > +	struct rxe_dev *rxe = (struct rxe_dev *)arg;
> > +	struct sk_buff *skb;
> > +	struct list_head *qpl;
> > +	struct rxe_qp *qp;
> > +
> > +	/* get the next qp's send queue */
> > +	spin_lock_irqsave(&rxe->arbiter.list_lock, flags);
> > +	if (list_empty(&rxe->arbiter.qp_list)) {
> > +		spin_unlock_irqrestore(&rxe->arbiter.list_lock, flags);
> > +		return 1;
> > +	}
> > +
> > +	qpl = rxe->arbiter.qp_list.next;
> > +	list_del_init(qpl);
> > +	qp = list_entry(qpl, struct rxe_qp, arbiter_list);
> > +	spin_unlock_irqrestore(&rxe->arbiter.list_lock, flags);
> 
> Is there a reason why the list_first_entry() macro hasn't been used in
> the above code ?
> 
> Bart.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 02/37] add opcodes to ib_pack.h
       [not found]   ` <20110724201228.204265835-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-15  9:13     ` Bart Van Assche
       [not found]       ` <CAO+b5-pb3p0n5MZYNSJicAdUfZ4PtMNG5vdzDOiZNQwLnB3Wcw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-15  9:13 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM, <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
>
> Bring up to date with the current version of the IBTA spec.
>        - add new opcodes for RC and RD
>        - add new groups of opcodes for CN and XRC
>
> Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
>
> ---
>  include/rdma/ib_pack.h |   39 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
>
> Index: infiniband/include/rdma/ib_pack.h
> ===================================================================
> --- infiniband.orig/include/rdma/ib_pack.h
> +++ infiniband/include/rdma/ib_pack.h
> @@ -75,6 +75,8 @@ enum {
>        IB_OPCODE_UC                                = 0x20,
>        IB_OPCODE_RD                                = 0x40,
>        IB_OPCODE_UD                                = 0x60,
> +       IB_OPCODE_CN                                = 0x80,
> +       IB_OPCODE_XRC                               = 0xA0,

A convention when submitting Linux kernel patches is to add only those
constants that are actually used in the code submitted in the same
patch series. So I'm not sure it's fine to add the XRC constants in
ib_pack.h through the rxe patch series.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 12/37] add rxe_verbs.h
       [not found]   ` <20110724201228.710989476-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-15 12:32     ` Bart Van Assche
       [not found]       ` <CAO+b5-okTDAJ6iOTv_JKXeGttt0MFM+na-FRaNwJMFqHktKHJQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2011-08-15 12:59     ` Bart Van Assche
  1 sibling, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-15 12:32 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +static inline int pkey_match(u16 key1, u16 key2)
> +{
> +	return (((key1 & 0x7fff) != 0) &&
> +		((key1 & 0x7fff) == (key2 & 0x7fff)) &&
> +		((key1 & 0x8000) || (key2 & 0x8000))) ? 1 : 0;
> +}

Shouldn't the return type of the above function be "bool" instead of
"int" ? Also, the "? 1 : 0" part at the end of the above expression
looks superfluous to me.
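
With both changes applied the helper could look like this (sketch only):

static inline bool pkey_match(u16 key1, u16 key2)
{
	return (key1 & 0x7fff) != 0 &&
	       (key1 & 0x7fff) == (key2 & 0x7fff) &&
	       ((key1 & 0x8000) || (key2 & 0x8000));
}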

> +
> +/* Return >0 if psn_a > psn_b
> + *	   0 if psn_a == psn_b
> + *	  <0 if psn_a < psn_b
> + */
> +static inline int psn_compare(u32 psn_a, u32 psn_b)
> +{
> +	s32 diff;
> +	diff = (psn_a - psn_b) << 8;
> +	return diff;
> +}

Mentioning in the comment block that (according to IBTA) PSNs are
24-bit quantities would help IMHO.

> +#define RXE_LL_ADDR_LEN		(16)
> +
> +struct rxe_av {
> +	struct ib_ah_attr	attr;
> +	u8			ll_addr[RXE_LL_ADDR_LEN];
> +};

Since the ib_rxe kernel module can be used in combination with network
drivers other than Ethernet drivers, the number of bytes stored
in ll_addr[] by init_av() depends on the link layer type. Shouldn't
the address length be stored in this structure too ?

> +/* must match corresponding data structure in librxe */
> +struct rxe_send_wqe {
> +	struct ib_send_wr	ibwr;
> +	struct rxe_av		av;	/* UD only */
> +	u32			status;
> +	u32			state;
> +	u64			iova;
> +	u32			mask;
> +	u32			first_psn;
> +	u32			last_psn;
> +	u32			ack_length;
> +	u32			ssn;
> +	u32			has_rd_atomic;
> +	struct rxe_dma_info	dma;	/* must go last */
> +};

Can't librxe use this header file instead of duplicating the above definition ?

> +struct rxe_port {
> +	struct ib_port_attr	attr;
> +	u16			*pkey_tbl;
> +	__be64			*guid_tbl;
> +	__be64			subnet_prefix;
> +
> +	/* rate control */
> +	/* TODO */
> +
> +	spinlock_t		port_lock;
> +
> +	unsigned int		mtu_cap;
> +
> +	/* special QPs */
> +	u32			qp_smi_index;
> +	u32			qp_gsi_index;
> +};

Regarding the "to do" item above: is rate control something planned
for the first submission of this driver ?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 12/37] add rxe_verbs.h
       [not found]   ` <20110724201228.710989476-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-08-15 12:32     ` Bart Van Assche
@ 2011-08-15 12:59     ` Bart Van Assche
       [not found]       ` <CAO+b5-qQBWeyFacCU3tx92jG2zw1=YZJRRX5SbQ57Ztx+zxuew-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-15 12:59 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM, <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:

+ rxe_qp.c:char *rxe_qp_state_name[] = {

One more comment about rxe_verbs.h: why is the above declaration
present in this header file ? As far as I can see it's only used in
rxe_qp.c and not in any other source file.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 13/37] add rxe_verbs.c
       [not found]   ` <20110724201228.761358826-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-15 13:24     ` Bart Van Assche
       [not found]       ` <CAO+b5-r3UtgnyfhyaodHQTj_xB5d2MQkZ8kRbrpZ1hF_0W1ONw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2011-08-15 14:33     ` Bart Van Assche
  2011-08-15 16:31     ` Bart Van Assche
  2 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-15 13:24 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +static int rxe_query_port(struct ib_device *dev,
> +			  u8 port_num, struct ib_port_attr *attr)
> +{
> +	struct rxe_dev *rxe = to_rdev(dev);
> +	struct rxe_port *port;
> +
> +	if (unlikely(port_num < 1 || port_num > rxe->num_ports)) {
> +		pr_warn("invalid port_number %d\n", port_num);
> +		goto err1;
> +	}
> +
> +	port = &rxe->port[port_num - 1];
> +
> +	*attr = port->attr;
> +	return 0;
> +
> +err1:
> +	return -EINVAL;
> +}

The pr_warn() statement in the above function looks to me like an
example of how pr_warn() should not be used: the message it generates
won't allow a user to find out which subsystem generated that message.
I suggest either to leave out that pr_warn() statement entirely or to
convert it into a pr_debug() statement and to use an appropriate
prefix, e.g. via pr_fmt().

> +static struct ib_ah *rxe_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
> +{
> +	int err;
> +	struct rxe_dev *rxe = to_rdev(ibpd->device);
> +	struct rxe_pd *pd = to_rpd(ibpd);
> +	struct rxe_ah *ah;
> +
> +	err = rxe_av_chk_attr(rxe, attr);
> +	if (err)
> +		goto err1;
> +
> +	ah = rxe_alloc(&rxe->ah_pool);
> +	if (!ah) {
> +		err = -ENOMEM;
> +		goto err1;
> +	}
> +
> +	rxe_add_ref(pd);
> +	ah->pd = pd;
> +
> +	err = rxe_av_from_attr(rxe, attr->port_num, &ah->av, attr);
> +	if (err)
> +		goto err2;
> +
> +	return &ah->ibah;
> +
> +err2:
> +	rxe_drop_ref(pd);
> +	rxe_drop_ref(ah);
> +err1:
> +	return ERR_PTR(err);
> +}

I'm not sure that using a variable with the name "err" and labels with
names "err1" and "err2" is a good idea with regard to readability.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 13/37] add rxe_verbs.c
       [not found]   ` <20110724201228.761358826-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-08-15 13:24     ` Bart Van Assche
@ 2011-08-15 14:33     ` Bart Van Assche
       [not found]       ` <CAO+b5-pRQXn08aVOPxeQBzrYXAV+3zD+HZdCCmPTTwBCCzY=ag-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2011-08-15 16:31     ` Bart Van Assche
  2 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-15 14:33 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Greg KH

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +	for (i = 0; i < ARRAY_SIZE(rxe_dev_attributes); ++i) {
> +		err = device_create_file(&dev->dev, rxe_dev_attributes[i]);
> +		if (err) {
> +			pr_warn("device_create_file failed, "
> +				"i = %d, err = %d\n", i, err);
> +			goto err2;
> +		}
> +	}

(added Greg in CC)

It's not your fault but loops similar to the above for creating device
attributes occur in many drivers in the Linux kernel. How about adding
functions called device_create_files() and device_remove_files()
in drivers/base/core.c ? If you want you can start from the
implementations of sysfs_create_files() and sysfs_remove_files() in
fs/sysfs/file.c.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 13/37] add rxe_verbs.c
       [not found]       ` <CAO+b5-pRQXn08aVOPxeQBzrYXAV+3zD+HZdCCmPTTwBCCzY=ag-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-08-15 14:45         ` Greg KH
       [not found]           ` <20110815144503.GA12014-l3A5Bk7waGM@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Greg KH @ 2011-08-15 14:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Aug 15, 2011 at 04:33:14PM +0200, Bart Van Assche wrote:
> On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> > +	for (i = 0; i < ARRAY_SIZE(rxe_dev_attributes); ++i) {
> > +		err = device_create_file(&dev->dev, rxe_dev_attributes[i]);
> > +		if (err) {
> > +			pr_warn("device_create_file failed, "
> > +				"i = %d, err = %d\n", i, err);
> > +			goto err2;
> > +		}
> > +	}
> 
> (added Greg in CC)
> 
> It's not your fault but loops similar to the above for creating device
> attributes occur in many drivers in the Linux kernel. How about adding
> functions called device_create_files() and device_remove_files()
> in drivers/base/core.c ? If you want you can start from the
> implementations of sysfs_create_files() and sysfs_remove_files() in
> fs/sysfs/file.c.

How about using the api functions that are already present in the kernel
to do the exact thing you are asking for here?

And no one should EVER be doing a loop like the above mentioned one, so
yes, it is their fault :)

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 13/37] add rxe_verbs.c
       [not found]           ` <20110815144503.GA12014-l3A5Bk7waGM@public.gmane.org>
@ 2011-08-15 14:58             ` Bart Van Assche
       [not found]               ` <CAO+b5-rsofKNdcjCjBdj1fH7i3DpbHhFajhk0QfNK4DE7nj_SQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-15 14:58 UTC (permalink / raw)
  To: Greg KH
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Aug 15, 2011 at 4:45 PM, Greg KH <gregkh-l3A5Bk7waGM@public.gmane.org> wrote:
> On Mon, Aug 15, 2011 at 04:33:14PM +0200, Bart Van Assche wrote:
>> On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
>> > +   for (i = 0; i < ARRAY_SIZE(rxe_dev_attributes); ++i) {
>> > +           err = device_create_file(&dev->dev, rxe_dev_attributes[i]);
>> > +           if (err) {
>> > +                   pr_warn("device_create_file failed, "
>> > +                           "i = %d, err = %d\n", i, err);
>> > +                   goto err2;
>> > +           }
>> > +   }
>
> How about using the api functions that are already present in the kernel
> to do the exact thing you are asking for here?
>
> And no one should EVER be doing a loop like the above mentioned one, so
> yes, it is their fault :)

Hello Greg,

Thanks for replying, but it seems like I have missed something. I have
found a function called device_add_attributes() in drivers/base/core.c
but it's declared static. Does your reply mean that there exists an
exported function that allows adding multiple device attributes at
once without defining a new device class ?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 13/37] add rxe_verbs.c
       [not found]               ` <CAO+b5-rsofKNdcjCjBdj1fH7i3DpbHhFajhk0QfNK4DE7nj_SQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-08-15 15:07                 ` Greg KH
       [not found]                   ` <20110815150726.GA12490-l3A5Bk7waGM@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Greg KH @ 2011-08-15 15:07 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Aug 15, 2011 at 04:58:44PM +0200, Bart Van Assche wrote:
> On Mon, Aug 15, 2011 at 4:45 PM, Greg KH <gregkh-l3A5Bk7waGM@public.gmane.org> wrote:
> > On Mon, Aug 15, 2011 at 04:33:14PM +0200, Bart Van Assche wrote:
> >> On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pid6O66tcy7@public.gmane.orgm> wrote:
> >> > +   for (i = 0; i < ARRAY_SIZE(rxe_dev_attributes); ++i) {
> >> > +           err = device_create_file(&dev->dev, rxe_dev_attributes[i]);
> >> > +           if (err) {
> >> > +                   pr_warn("device_create_file failed, "
> >> > +                           "i = %d, err = %d\n", i, err);
> >> > +                   goto err2;
> >> > +           }
> >> > +   }
> >
> > How about using the api functions that are already present in the kernel
> > to do the exact thing you are asking for here?
> >
> > And no one should EVER be doing a loop like the above mentioned one, so
> > yes, it is their fault :)
> 
> Hello Greg,
> 
> Thanks for replying, but it seems like I have missed something. I have
> found a function called device_add_attributes() in drivers/base/core.c
> but it's declared static. Does your reply mean that there exists an
> exported function that allows adding multiple device attributes at
> once without defining a new device class ?

Yes.

For a device, you HAVE to be creating these attributes before the
hotplug event is sent out, and to do that, you need to set the correct
pointer before registering the device.  See the documentation for the
driver model for how to do this properly.
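
A sketch of that approach for the rxe case (attribute names are placeholders;
the point is only that dev->dev.groups is filled in before the device is
registered):

static struct attribute *rxe_dev_attrs[] = {
	&dev_attr_skb_num.attr,
	/* ... the other rxe_dev attributes ... */
	NULL,
};

static const struct attribute_group rxe_dev_attr_group = {
	.attrs = rxe_dev_attrs,
};

static const struct attribute_group *rxe_dev_attr_groups[] = {
	&rxe_dev_attr_group,
	NULL,
};

	/* before registering the ib_device / its struct device: */
	dev->dev.groups = rxe_dev_attr_groups;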

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 13/37] add rxe_verbs.c
       [not found]                   ` <20110815150726.GA12490-l3A5Bk7waGM@public.gmane.org>
@ 2011-08-15 16:02                     ` Bart Van Assche
       [not found]                       ` <CAO+b5-ocwYv2_A8PfVUh1yp0CFNYasjTJNQ0F6bj-+wT8npavA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-15 16:02 UTC (permalink / raw)
  To: Greg KH
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Aug 15, 2011 at 5:07 PM, Greg KH <gregkh-l3A5Bk7waGM@public.gmane.org> wrote:
> For a device, you HAVE to be creating these attributes before the
> hotplug event is sent out, and to do that, you need to set the correct
> pointer before registering the device.  See the documentation for the
> driver model for how to do this properly.

Are you referring to the "groups" member in "struct device" ? The
source files I have found so far that fill in this member in a struct
device are: drivers/s390/cio/css.c and drivers/usb/core/endpoint.c.
But I haven't found anything about that member in the documents under
Documentation/driver-model. Is it just me who's again missing
something ?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 13/37] add rxe_verbs.c
       [not found]                       ` <CAO+b5-ocwYv2_A8PfVUh1yp0CFNYasjTJNQ0F6bj-+wT8npavA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-08-15 16:13                         ` Greg KH
  0 siblings, 0 replies; 85+ messages in thread
From: Greg KH @ 2011-08-15 16:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Aug 15, 2011 at 06:02:23PM +0200, Bart Van Assche wrote:
> On Mon, Aug 15, 2011 at 5:07 PM, Greg KH <gregkh-l3A5Bk7waGM@public.gmane.org> wrote:
> > For a device, you HAVE to be creating these attributes before the
> > hotplug event is sent out, and to do that, you need to set the correct
> > pointer before registering the device.  See the documentation for the
> > driver model for how to do this properly.
> 
> Are you referring to the "groups" member in "struct device" ?

Yes, or the one in the class or bus structure, depending on what you are
wanting to create.

> The
> source files I have found so far that fill in this member in a struct
> device are: drivers/s390/cio/css.c and drivers/usb/core/endpoint.c.
> But I haven't found anything about that member in the documents under
> Documentation/driver-model. Is it just me who's again missing
> something ?

The docbook comments in include/linux/device.h?

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 21/37] add rxe_qp.c
       [not found]   ` <20110724201229.163423781-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-15 16:13     ` Bart Van Assche
       [not found]       ` <CAO+b5-opA-W4Lp69NzS=9MLEv1V7jWhHPo=bayKhaR-kPgTQTQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-15 16:13 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +char *rxe_qp_state_name[] = {
> +	[QP_STATE_RESET]	= "RESET",
> +	[QP_STATE_INIT]		= "INIT",
> +	[QP_STATE_READY]	= "READY",
> +	[QP_STATE_DRAIN]	= "DRAIN",
> +	[QP_STATE_DRAINED]	= "DRAINED",
> +	[QP_STATE_ERROR]	= "ERROR",
> +};

Doesn't the compiler complain about assigning const char* to char* for
the above array definition ? And since this array is only used in this
source file, I think it can be declared static.

> +static int rxe_qp_chk_cap(struct rxe_dev *rxe, struct ib_qp_cap *cap,
> +			  int has_srq)
> +{
> +	if (cap->max_send_wr > rxe->attr.max_qp_wr) {
> +		pr_warn("invalid send wr = %d > %d\n",
> +			cap->max_send_wr, rxe->attr.max_qp_wr);
> +		goto err1;
> +	}

Context information is missing in the message generated by the above
pr_warn() statement. Shouldn't that be dev_warn() instead ?

> +/* move the qp to the error state */
> +void rxe_qp_error(struct rxe_qp *qp)
> +{
> +	qp->req.state = QP_STATE_ERROR;
> +	qp->resp.state = QP_STATE_ERROR;
> +
> +	/* drain work and packet queues */
> +	rxe_run_task(&qp->resp.task, 1);
> +
> +	if (qp_type(qp) == IB_QPT_RC)
> +		rxe_run_task(&qp->comp.task, 1);
> +	else
> +		__rxe_do_task(&qp->comp.task);
> +	rxe_run_task(&qp->req.task, 1);
> +}

A quote from the InfiniBand Architecture Specification:

o11-5.2.5: If the HCA supports SRQ, for RC and UD service, the CI
shall generate a Last WQE Reached Affiliated Asynchronous Event on a
QP that is in the Error State and is associated with an SRQ when
either:
• a CQE is generated for the last WQE, or
• the QP gets in the Error State and there are no more WQEs on the RQ.

I haven't found the code yet for generating the last WQE event in the
ib_rxe implementation ?
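
For illustration, generating that event would look roughly like the sketch
below (field names taken from the standard ib_verbs interfaces; where to hook
this into the rxe state machine is a separate question):

static void rxe_qp_last_wqe_reached(struct rxe_qp *qp)
{
	struct ib_event ev = {
		.device		= qp->ibqp.device,
		.element.qp	= &qp->ibqp,
		.event		= IB_EVENT_QP_LAST_WQE_REACHED,
	};

	if (qp->ibqp.event_handler)
		qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
}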

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [patch v2 02/37] add opcodes to ib_pack.h
       [not found]       ` <CAO+b5-pb3p0n5MZYNSJicAdUfZ4PtMNG5vdzDOiZNQwLnB3Wcw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-08-15 16:15         ` Bob Pearson
  2011-08-20 14:47           ` Bart Van Assche
  2011-09-01  3:38         ` Bob Pearson
  1 sibling, 1 reply; 85+ messages in thread
From: Bob Pearson @ 2011-08-15 16:15 UTC (permalink / raw)
  To: 'Bart Van Assche'; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Originally I had XRC support in rxe, but there was nothing in Roland's tree
to support it, so I stripped it out.
I was experimenting with congestion notification to see if there was a way
to slow down spewing by fast Ethernet devices, but that wasn't working, and
all the MAD support was stripped out as well. In the end the only thing left
was the constants and the entries in the opcode table in rxe_opcode.c, which,
as you say, are not used elsewhere. I think it is reasonable to have a
complete set of opcodes somewhere.

I want to eventually complete the IB transport parts of the driver although
they are not used by RoCE. They would be useful for a soft IB implementation
with say an FPGA based MAC/PHY. The long term goal is to have a complete
reference implementation of the IB Architecture.

> -----Original Message-----
> From: bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org [mailto:bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org] On
> Behalf Of Bart Van Assche
> Sent: Monday, August 15, 2011 4:13 AM
> To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v2 02/37] add opcodes to ib_pack.h
> 
> On Sun, Jul 24, 2011 at 9:43 PM, <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> >
> > Bring up to date with the current version of the IBTA spec.
> >        - add new opcodes for RC and RD
> >        - add new groups of opcodes for CN and XRC
> >
> > Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
> >
> > ---
> >  include/rdma/ib_pack.h |   39
> ++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 38 insertions(+), 1 deletion(-)
> >
> > Index: infiniband/include/rdma/ib_pack.h
> >
> ==========================================================
> =========
> > --- infiniband.orig/include/rdma/ib_pack.h
> > +++ infiniband/include/rdma/ib_pack.h
> > @@ -75,6 +75,8 @@ enum {
> >        IB_OPCODE_UC                                = 0x20,
> >        IB_OPCODE_RD                                = 0x40,
> >        IB_OPCODE_UD                                = 0x60,
> > +       IB_OPCODE_CN                                = 0x80,
> > +       IB_OPCODE_XRC                               = 0xA0,
> 
> A convention when submitting Linux kernel patches is to add only those
> constants that are actually used in the code submitted in the same
> patch series. So I'm not sure it's fine to add the XRC constants in
> ib_pack.h through the rxe patch series.
> 
> Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 13/37] add rxe_verbs.c
       [not found]   ` <20110724201228.761358826-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
  2011-08-15 13:24     ` Bart Van Assche
  2011-08-15 14:33     ` Bart Van Assche
@ 2011-08-15 16:31     ` Bart Van Assche
  2 siblings, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-08-15 16:31 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +static ssize_t rxe_show_skb_num(struct device *device,
> +				struct device_attribute *attr, char *buf)
> +{
> +	struct rxe_dev *rxe = container_of(device, struct rxe_dev,
> +					   ib_dev.dev);
> +
> +	return sprintf(buf, "req_in:%d resp_in:%d req_out:%d resp_out:%d\n",
> +		atomic_read(&rxe->req_skb_in),
> +		atomic_read(&rxe->resp_skb_in),
> +		atomic_read(&rxe->req_skb_out),
> +		atomic_read(&rxe->resp_skb_out));
> +}

The rule for sysfs attributes is one value per file. See also
Documentation/sysfs-rules.txt.
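
For illustration, a one-value-per-file version would look something like this
sketch (one show function, or a small macro generating them, per counter;
attribute names hypothetical):

static ssize_t rxe_show_req_skb_in(struct device *device,
				   struct device_attribute *attr, char *buf)
{
	struct rxe_dev *rxe = container_of(device, struct rxe_dev,
					   ib_dev.dev);

	return sprintf(buf, "%d\n", atomic_read(&rxe->req_skb_in));
}

/* ...and similarly for resp_skb_in, req_skb_out and resp_skb_out. */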

Also, all sysfs changes have to be documented [1]. See also
Documentation/ABI/README for more information.

[1] Greg Kroah-Hartman, http://lkml.org/lkml/2010/10/9/103.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 21/37] add rxe_qp.c
       [not found]       ` <CAO+b5-opA-W4Lp69NzS=9MLEv1V7jWhHPo=bayKhaR-kPgTQTQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-08-15 16:58         ` Jason Gunthorpe
  0 siblings, 0 replies; 85+ messages in thread
From: Jason Gunthorpe @ 2011-08-15 16:58 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Aug 15, 2011 at 06:13:57PM +0200, Bart Van Assche wrote:
> On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> > +char *rxe_qp_state_name[] = {
> > +	[QP_STATE_RESET]	= "RESET",
> > +	[QP_STATE_INIT]		= "INIT",
> > +	[QP_STATE_READY]	= "READY",
> > +	[QP_STATE_DRAIN]	= "DRAIN",
> > +	[QP_STATE_DRAINED]	= "DRAINED",
> > +	[QP_STATE_ERROR]	= "ERROR",
> > +};
> 
> Doesn't the compiler complain about assigning const char* to char* for
> the above array definition ? And since this array is only used in this
> source file, I think it can be declared static.

Best would be:

 static const char * const rxe_qp_state_name[]

To put as much as possible in .rodata.

Be sure to run size and nm on your .o files and check that the stuff
in .data actually needs to be writeable..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 02/37] add opcodes to ib_pack.h
  2011-08-15 16:15         ` Bob Pearson
@ 2011-08-20 14:47           ` Bart Van Assche
       [not found]             ` <CAO+b5-qV9uxs=V_uHbVZ_3sm1LqkQUB0em=K=nrRxUvFQYWo5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-20 14:47 UTC (permalink / raw)
  To: Bob Pearson; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Aug 15, 2011 at 6:15 PM, Bob Pearson
<rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> I want to eventually complete the IB transport parts of the driver although
> they are not used by RoCE. They would be useful for a soft IB implementation
> with say an FPGA based MAC/PHY. The long term goal is to have a complete
> reference implementation of the IB Architecture.

Having a complete reference implementation of the IB architecture
available would be great. IMHO at least the following should be added
to ib_rxe (not necessarily in the first version) before it can be called
a reference implementation:
- Support for secondary GIDs. These make it easy to implement
fail-over in a H.A. cluster.
- Support for multicast.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 31/37] add rxe.c
       [not found]   ` <20110724201229.666252623-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-20 15:12     ` Bart Van Assche
  0 siblings, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-08-20 15:12 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +int rxe_fast_comp = 2;
> +module_param_named(fast_comp, rxe_fast_comp, int, 0644);
> +MODULE_PARM_DESC(fast_comp, "fast path call to completer "
> +		 "(0=no, 1=no int context, 2=any context)");
> +
> +int rxe_fast_resp = 2;
> +module_param_named(fast_resp, rxe_fast_resp, int, 0644);
> +MODULE_PARM_DESC(fast_resp, "enable fast path call to responder "
> +		 "(0=no, 1=no int context, 2=any context)");
> +
> +int rxe_fast_req = 2;
> +module_param_named(fast_req, rxe_fast_req, int, 0644);
> +MODULE_PARM_DESC(fast_req, "enable fast path call to requester "
> +		 "(0=no, 1=no int context, 2=any context)");
> +
> +int rxe_fast_arb = 2;
> +module_param_named(fast_arb, rxe_fast_arb, int, 0644);
> +MODULE_PARM_DESC(fast_arb, "enable fast path call to arbiter "
> +		 "(0=no, 1=no int context, 2=any context)");

The ib_rxe kernel module has an impressive number of module
parameters. I have been wondering whether it's possible to reduce that
number somewhat. As an example, is it necessary that all four kernel
module parameters mentioned above can be configured individually ? As
far as I can see the only purpose of these parameters is to make it
possible to move processing of certain events from interrupt to
tasklet or process context. How about using one kernel module
parameter for that purpose instead of four ?
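
For example, the four parameters above could collapse into a single one along
these lines (parameter name hypothetical):

int rxe_fast_path = 2;
module_param_named(fast_path, rxe_fast_path, int, 0644);
MODULE_PARM_DESC(fast_path,
		 "fast path calls to completer/responder/requester/arbiter "
		 "(0=no, 1=no int context, 2=any context)");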

> +/* initialize port state, note IB convention
> +   that HCA ports are always numbered from 1 */
> +static int rxe_init_ports(struct rxe_dev *rxe)
> +{
> +	int err;
> +	unsigned int port_num;
> +	struct rxe_port *port;
> +
> +	rxe->port = kcalloc(rxe->num_ports, sizeof(struct rxe_port),
> +			    GFP_KERNEL);
> +	if (!rxe->port)
> +		return -ENOMEM;
> +
> +	for (port_num = 1; port_num <= rxe->num_ports; port_num++) {
> +		port = &rxe->port[port_num - 1];
> +
> +		rxe_init_port_param(rxe, port_num);
> +
> +		if (!port->attr.pkey_tbl_len) {
> +			err = -EINVAL;
> +			goto err1;
> +		}
> +
> +		port->pkey_tbl = kcalloc(port->attr.pkey_tbl_len,
> +					 sizeof(*port->pkey_tbl), GFP_KERNEL);
> +		if (!port->pkey_tbl) {
> +			err = -ENOMEM;
> +			goto err1;
> +		}
> +
> +		port->pkey_tbl[0] = 0xffff;
> +
> +		if (!port->attr.gid_tbl_len) {
> +			kfree(port->pkey_tbl);
> +			err = -EINVAL;
> +			goto err1;
> +		}
> +
> +		port->guid_tbl = kcalloc(port->attr.gid_tbl_len,
> +					 sizeof(*port->guid_tbl), GFP_KERNEL);
> +		if (!port->guid_tbl) {
> +			kfree(port->pkey_tbl);
> +			err = -ENOMEM;
> +			goto err1;
> +		}
> +
> +		port->guid_tbl[0] = rxe->ifc_ops->port_guid(rxe, port_num);
> +
> +		spin_lock_init(&port->port_lock);
> +	}
> +
> +	return 0;
> +
> +err1:
> +	while (--port_num >= 1) {
> +		port = &rxe->port[port_num - 1];
> +		kfree(port->pkey_tbl);
> +		kfree(port->guid_tbl);
> +	}
> +
> +	kfree(rxe->port);
> +	return err;
> +}

Same remark here about the "err" variable and the "err1" label - that
seems confusing to me.

> +int rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu,
> +		unsigned int port_num)
> +{
> +	struct rxe_port *port = &rxe->port[port_num - 1];
> +	enum rxe_mtu mtu;
> +
> +	mtu = eth_mtu_int_to_enum(ndev_mtu);
> +	if (!mtu)
> +		return -EINVAL;
> +
> +	/* Set the port mtu to min(feasible, preferred) */
> +	mtu = min_t(enum rxe_mtu, mtu, rxe->pref_mtu);
> +
> +	port->attr.active_mtu = (enum ib_mtu __force)mtu;
> +	port->mtu_cap = rxe_mtu_enum_to_int(mtu);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(rxe_set_mtu);

Sorry, but using the datatype "enum rxe_mtu" for an MTU doesn't make
sense to me. Shouldn't that be "int" instead ?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 06/37] add rxe_param.h
       [not found]   ` <20110724201228.415281967-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-20 15:27     ` Bart Van Assche
       [not found]       ` <CAO+b5-q5ohgttDAc45MSKrSUdUVOfR3xHMSY99JtfX_MetSw8Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-20 15:27 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +enum rxe_mtu {
> +	RXE_MTU_INVALID = 0,
> +	RXE_MTU_256	= 1,
> +	RXE_MTU_512	= 2,
> +	RXE_MTU_1024	= 3,
> +	RXE_MTU_2048	= 4,
> +	RXE_MTU_4096	= 5,
> +	RXE_MTU_8192	= 6,
> +};

Do the numerical values in the above enum have a meaning ? If not, how
about using log2(mtu) for the numerical values and leaving out all
names except RXE_MTU_INVALID ? That would make it possible to simplify
the implementation of the two functions below.

Also, in the function rxe_param_set_mtu() it is verified whether the
user-provided mtu is in the range (256..4096). Shouldn't there be
symbolic names for these two extreme values ?

> +static inline int rxe_mtu_enum_to_int(enum rxe_mtu mtu)
> +{
> +	switch (mtu) {
> +	case RXE_MTU_256:	return	256;
> +	case RXE_MTU_512:	return	512;
> +	case RXE_MTU_1024:	return	1024;
> +	case RXE_MTU_2048:	return	2048;
> +	case RXE_MTU_4096:	return	4096;
> +	case RXE_MTU_8192:	return	8192;
> +	default:		return	-1;
> +	}
> +}

For this function I'd prefer a function name like "log2_mtu_to_mtu"
instead of "rxe_mtu_enum_to_int". It seems to me that all cases except
"default" produce the value (1 << (8 + log2_mtu)), so the above
function can be made shorter.

> +static inline enum rxe_mtu rxe_mtu_int_to_enum(int mtu)
> +{
> +	if (mtu < 256)
> +		return 0;
> +	else if (mtu < 512)
> +		return RXE_MTU_256;
> +	else if (mtu < 1024)
> +		return RXE_MTU_512;
> +	else if (mtu < 2048)
> +		return RXE_MTU_1024;
> +	else if (mtu < 4096)
> +		return RXE_MTU_2048;
> +	else if (mtu < 8192)
> +		return RXE_MTU_4096;
> +	else
> +		return RXE_MTU_8192;
> +}

The above function can be made shorter by using the function ilog2()
defined in <linux/log2.h>.
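
Sketches of the shorter forms (keeping the current enum values, where
RXE_MTU_256 == 1; only meant to illustrate the suggestions above):

static inline int rxe_mtu_enum_to_int(enum rxe_mtu mtu)
{
	if (mtu < RXE_MTU_256 || mtu > RXE_MTU_8192)
		return -1;
	return 256 << (mtu - 1);
}

static inline enum rxe_mtu rxe_mtu_int_to_enum(int mtu)
{
	if (mtu < 256)
		return RXE_MTU_INVALID;
	if (mtu >= 8192)
		return RXE_MTU_8192;
	return (enum rxe_mtu)(ilog2(mtu) - 7);
}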

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 29/37] add rxe_dma.c
       [not found]   ` <20110724201229.565188525-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-20 15:32     ` Bart Van Assche
  0 siblings, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-08-20 15:32 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +static u64 rxe_dma_map_single(struct ib_device *dev,
> +			      void *cpu_addr, size_t size,
> +			      enum dma_data_direction direction)
> +{
> +	BUG_ON(!valid_dma_direction(direction));
> +	return (u64) (uintptr_t) cpu_addr;
> +}

In the above function execution can continue without crashing even if
the DMA direction argument is invalid. So please use WARN_ON() instead
of BUG_ON() in the functions in source file rxe_dma.c.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 20/37] add rxe_cq.c
       [not found]   ` <20110724201229.112267527-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-20 15:53     ` Bart Van Assche
  0 siblings, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-08-20 15:53 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
> +		    int cqe, int comp_vector, struct ib_udata *udata)
> +{
> +	int count;
> +
> +	if (cqe <= 0) {
> +		pr_warn("cqe(%d) <= 0\n", cqe);
> +		goto err1;
> +	}

Shouldn't that be dev_warn() instead of pr_warn() ?

> +int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
> +		     int comp_vector, struct ib_ucontext *context,
> +		     struct ib_udata *udata)
> +{
> +	int err;
> +
> +	cq->queue = rxe_queue_init(rxe, (unsigned int *)&cqe,
> +				   sizeof(struct rxe_cqe));
> +	if (!cq->queue) {
> +		pr_warn("unable to create cq\n");
> +		return -ENOMEM;
> +	}
> +
> +	err = do_mmap_info(rxe, udata, 0, context, cq->queue->buf,
> +			   cq->queue->buf_size, &cq->queue->ip);
> +	if (err) {
> +		vfree(cq->queue->buf);
> +		kfree(cq->queue);
> +	}

In the above I see that cq->queue->buf and cq->queue are both
allocated via a single function call but that they are freed
individually. That doesn't look very symmetric to me.

> +	memcpy(producer_addr(cq->queue), cqe, sizeof(*cqe));
> +
> +	/* make sure all changes to the CQ are written before
> +	   we update the producer pointer */
> +	wmb();
> +
> +	advance_producer(cq->queue);

Almost all advance_producer() calls are preceded by a wmb() call. Has
it been considered to move the wmb() call into that function and to
add a numeric argument to advance_producer() specifying by how much
the producer index should be advanced ?
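
A sketch of the suggested form (the queue field names here are assumptions,
not taken from the posted patch):

static inline void advance_producer(struct rxe_queue *q, unsigned int n)
{
	/* order the queue element stores before the producer index update */
	wmb();
	q->buf->producer_index = (q->buf->producer_index + n) & q->index_mask;
}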

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 23/37] add rxe_mcast.c
       [not found]   ` <20110724201229.264963653-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-20 16:13     ` Bart Van Assche
  0 siblings, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-08-20 16:13 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
> +			    union ib_gid *mgid, u16 mlid)
> +{
> +	struct rxe_mc_grp *grp;
> +	struct rxe_mc_elem *elem, *tmp;
> +
> +	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
> +	if (!grp)
> +		goto err1;
> +
> +	spin_lock_bh(&qp->grp_lock);
> +	spin_lock_bh(&grp->mcg_lock);
> +
> +	list_for_each_entry_safe(elem, tmp, &grp->qp_list, qp_list) {
> +		if (elem->qp == qp) {
> +			list_del(&elem->qp_list);
> +			list_del(&elem->grp_list);
> +			grp->num_qp--;
> +
> +			spin_unlock_bh(&grp->mcg_lock);
> +			spin_unlock_bh(&qp->grp_lock);
> +			rxe_drop_ref(elem);
> +			rxe_drop_ref(grp);	/* ref held by QP */
> +			rxe_drop_ref(grp);	/* ref from get_key */
> +			return 0;
> +		}
> +	}

The above loop is left after a match has been found. Do we really need
list_for_each_entry_safe() here ?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 27/37] add rxe_resp.c
       [not found]   ` <20110724201229.464927217-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2011-08-20 16:26     ` Bart Van Assche
  0 siblings, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-08-20 16:26 UTC (permalink / raw)
  To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> +static char *resp_state_name[] = {
> +	[RESPST_NONE]				= "NONE",

Shouldn't that be "const char * const" instead of "char *" ?

> +/* rxe_recv calls here to add a request packet to the input queue */
> +void rxe_resp_queue_pkt(struct rxe_dev *rxe, struct rxe_qp *qp,
> +			struct sk_buff *skb)
> +{
> +	int must_sched;

Shouldn't that be "bool" instead of "int" ?

> +	default:
> +		WARN_ON(1);
> +		break;

Shouldn't that be "__WARN()" instead of "WARN_ON(1)" ?

> +static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
> +					  struct rxe_pkt_info *pkt,
> +					  int opcode,
> +					  int payload,
> +					  u32 psn,
> +					  u8 syndrome)
> +{
> +	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
> +	struct sk_buff *skb;
> +	struct rxe_pkt_info *ack;
> +	int paylen;
> +	int pad;
> +
> +	/*
> +	 * allocate packet
> +	 */
> +	pad = (-payload) & 0x3;

Something like "pad = ALIGN(payload, 4) - payload" is probably easier to read.
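
Both expressions compute the same value for 4-byte alignment; a small
illustration:

	/* pad bytes needed to reach the next 4-byte boundary:
	 * payload = 5 -> 3, payload = 8 -> 0, payload = 10 -> 2
	 */
	pad = (-payload) & 0x3;			/* form used in the patch */
	pad = ALIGN(payload, 4) - payload;	/* suggested, arguably clearer */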

Bart.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 02/37] add opcodes to ib_pack.h
       [not found]             ` <CAO+b5-qV9uxs=V_uHbVZ_3sm1LqkQUB0em=K=nrRxUvFQYWo5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-08-22 18:08               ` Bart Van Assche
       [not found]                 ` <CAO+b5-pKPuhsrQV6igqo78y_1+gZ=09GxHoABOVnLxX3myFYZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Bart Van Assche @ 2011-08-22 18:08 UTC (permalink / raw)
  To: Bob Pearson; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sat, Aug 20, 2011 at 4:47 PM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
> On Mon, Aug 15, 2011 at 6:15 PM, Bob Pearson
> <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> > I want to eventually complete the IB transport parts of the driver although
> > they are not used by RoCE. They would be useful for a soft IB implementation
> > with say an FPGA based MAC/PHY. The long term goal is to have a complete
> > reference implementation of the IB Architecture.
>
> Having a complete reference implementation of the IB architecture
> available would be great. IMHO at least the following should be added
> to ib_rxe (not necessarily in the first version) before it can be called
> a reference implementation:
> - Support for secondary GIDs. These make it easy to implement
> fail-over in a H.A. cluster.
> - Support for multicast.

More in detail, my comments with regard to multicast support in ib_rxe are:
1. Using ipv6_eth_mc_map() for mapping multicast GIDs seems an
unfortunate choice to me. That choice will cause multicast GIDs to be
mapped to the 33-33-xx-xx-xx-xx Ethernet address range that has been
reserved by RFC 2464 for IPv6 multicast addresses. If a collision with
an IPv6 multicast address occurs and IPv6 MLD snooping has been enabled on
the switches in the involved network, then RoCE multicast won't work
properly. IMHO we need a separate Ethernet address range for RoCE
multicast purposes, next to the existing ranges for IPv4 and IPv6 (see
the sketch of the current mapping right after this list).
2. We need a way to limit multicast traffic to those switch ports on
which receivers are listening. The InfiniBand architecture has such a
protocol but it relies on the subnet manager, a component that does
not exist in RoCE networks. And since IGMP and MLD are IPv4/IPv6
specific we need a new (IBTA-approved) protocol.
3. RFC 4541, in which IGMP and MLD snooping has been defined, will
have to be extended with a paragraph about snooping the protocol
defined in (2).
4. Switch vendors will have to implement the protocol item (3) refers to.
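
For reference, the mapping discussed in point 1 boils down to the
following (a simplified sketch of what ipv6_eth_mc_map() does per RFC
2464; the helper name here is illustrative, not from the patch):

	static void mcast_gid_to_mac(const union ib_gid *mgid, u8 *mac)
	{
		mac[0] = 0x33;
		mac[1] = 0x33;
		memcpy(mac + 2, &mgid->raw[12], 4);	/* low 32 bits of the GID */
	}

Any multicast GID therefore lands in the 33-33-xx-xx-xx-xx range that
RFC 2464 reserves for IPv6 multicast.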

Shouldn't a RoCE multicast implementation be postponed until the IEEE
has assigned an Ethernet address range for mapping RoCE multicast GIDs
?

Bart.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [patch v2 02/37] add opcodes to ib_pack.h
       [not found]                 ` <CAO+b5-pKPuhsrQV6igqo78y_1+gZ=09GxHoABOVnLxX3myFYZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-09-01  3:15                   ` Bob Pearson
  2011-09-01 17:29                     ` Jason Gunthorpe
  2011-09-01 17:51                     ` Bart Van Assche
  0 siblings, 2 replies; 85+ messages in thread
From: Bob Pearson @ 2011-09-01  3:15 UTC (permalink / raw)
  To: 'Bart Van Assche'; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Bart Van Assche
> Sent: Monday, August 22, 2011 1:08 PM
> To: Bob Pearson
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v2 02/37] add opcodes to ib_pack.h
> 
> On Sat, Aug 20, 2011 at 4:47 PM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> wrote:
> > On Mon, Aug 15, 2011 at 6:15 PM, Bob Pearson
> > <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> > > I want to eventually complete the IB transport parts of the driver
> although
> > > they are not used by RoCE. They would be useful for a soft IB
> implementation
> > > with say an FPGA based MAC/PHY. The long term goal is to have a
> complete
> > > reference implementation of the IB Architecture.
> >
> > Having a complete reference implementation of the IB architecture
> > available would be great. IMHO at least the following should be added
> > to ib_rxe (not necessarily in the first version) before it can be called
> > a reference implementation:
> > - Support for secondary GIDs. These make it easy to implement
> > fail-over in a H.A. cluster.
> > - Support for multicast.
> 
> More in detail, my comments with regard to multicast support in ib_rxe
are:
> 1. Using ipv6_eth_mc_map() for mapping multicast GIDs seems an
> unfortunate choice to me. That choice will cause multicast GIDs to be
> mapped to the 33-33-xx-xx-xx-xx Ethernet address range that has been
> reserved by RFC 2464 for IPv6 multicast addresses. If a collision with
> an IPv6 multicast address occurs and IPv6 MLD snooping has been enabled on
> the switches in the involved network, then RoCE multicast won't work
> properly. IMHO we need a separate Ethernet address range for RoCE
> multicast purposes, next to the existing ranges for IPv4 and IPv6.

I thought this was intentional! I.e. in order to get multicast over RoCE to
work we were trying to piggyback on the behavior of intelligent switches.
Otherwise the only way to get packets forwarded without adding a new group
of multicast addresses would be to broadcast. The implementation of IGMP and
MLD at the end node is oblivious to the packet's ethertype. And switches (at
least the ones we tried) also happily multicast RoCE packets.

> 2. We need a way to limit multicast traffic to those switch ports on
> which receivers are listening. The InfiniBand architecture has such a
> protocol but it relies on the subnet manager, a component that does
> not exist in RoCE networks. And since IGMP and MLD are IPv4/IPv6
> specific we need a new (IBTA-approved) protocol.
> 3. RFC 4541, in which IGMP and MLD snooping has been defined, will
> have to be extended with a paragraph about snooping the protocol
> defined in (2).
> 4. Switch vendors will have to implement the protocol item (3) refers to.
> 
> Shouldn't a RoCE multicast implementation be postponed until the IEEE
> has assigned an Ethernet address range for mapping RoCE multicast GIDs
> ?
> 
> Bart.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [patch v2 02/37] add opcodes to ib_pack.h
       [not found]       ` <CAO+b5-pb3p0n5MZYNSJicAdUfZ4PtMNG5vdzDOiZNQwLnB3Wcw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2011-08-15 16:15         ` Bob Pearson
@ 2011-09-01  3:38         ` Bob Pearson
  1 sibling, 0 replies; 85+ messages in thread
From: Bob Pearson @ 2011-09-01  3:38 UTC (permalink / raw)
  To: 'Bart Van Assche'; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

OK. Since I don't need these opcodes for now, I'll remove this patch, but it
might be worth considering adding them later as a standalone patch.

> -----Original Message-----
> From: bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org [mailto:bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org] On
> Behalf Of Bart Van Assche
> Sent: Monday, August 15, 2011 4:13 AM
> To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v2 02/37] add opcodes to ib_pack.h
> 
> On Sun, Jul 24, 2011 at 9:43 PM, <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> >
> > Bring up to date with the current version of the IBTA spec.
> >        - add new opcodes for RC and RD
> >        - add new groups of opcodes for CN and XRC
> >
> > Signed-off-by: Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
> >
> > ---
>  include/rdma/ib_pack.h |   39 ++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 38 insertions(+), 1 deletion(-)
> >
> > Index: infiniband/include/rdma/ib_pack.h
> >
> ===================================================================
> > --- infiniband.orig/include/rdma/ib_pack.h
> > +++ infiniband/include/rdma/ib_pack.h
> > @@ -75,6 +75,8 @@ enum {
> >        IB_OPCODE_UC                                = 0x20,
> >        IB_OPCODE_RD                                = 0x40,
> >        IB_OPCODE_UD                                = 0x60,
> > +       IB_OPCODE_CN                                = 0x80,
> > +       IB_OPCODE_XRC                               = 0xA0,
> 
> A convention when submitting Linux kernel patches is to add only those
> constants that are actually used in the code submitted in the same
> patch series. So I'm not sure it's fine to add the XRC constants in
> ib_pack.h through the rxe patch series.
> 
> Bart.


^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [patch v2 06/37] add rxe_param.h
       [not found]       ` <CAO+b5-q5ohgttDAc45MSKrSUdUVOfR3xHMSY99JtfX_MetSw8Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-09-01  3:50         ` Bob Pearson
  0 siblings, 0 replies; 85+ messages in thread
From: Bob Pearson @ 2011-09-01  3:50 UTC (permalink / raw)
  To: 'Bart Van Assche'; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



> -----Original Message-----
> From: bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org [mailto:bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org] On
> Behalf Of Bart Van Assche
> Sent: Saturday, August 20, 2011 10:27 AM
> To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v2 06/37] add rxe_param.h
> 
> On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> > +enum rxe_mtu {
> > +	RXE_MTU_INVALID = 0,
> > +	RXE_MTU_256	= 1,
> > +	RXE_MTU_512	= 2,
> > +	RXE_MTU_1024	= 3,
> > +	RXE_MTU_2048	= 4,
> > +	RXE_MTU_4096	= 5,
> > +	RXE_MTU_8192	= 6,
> > +};
> 
> Do the numerical values in the above enum have a meaning ? If not, how
> about using log2(mtu) for the numerical values and leaving out all
> names except RXE_MTU_INVALID ? That would allow to simplify the
> implementation of the two functions below.
> 
> Also, in the function rxe_param_set_mtu() it is verified whether the
> user-provided mtu is in the range (256..4096). Shouldn't there be
> symbolic names for these two extreme values ?

These enums were added when we were trying to extend the IB MTU to take
advantage of Ethernet jumbo frames, which would allow payloads of 8K. Since
then we have dropped this because it broke all the user space admin tools,
and, as you noticed, we now check that the mtu does not exceed 4096. There
are generic macros in the standard IB files that are the same as these
except that they stop at 4096. Your comments apply to those as well, but we
can have that battle another day. For now I can drop all the MTU macros here
and use the standard ones, and worry about jumbo frames another time as well.

> 
> > +static inline int rxe_mtu_enum_to_int(enum rxe_mtu mtu)
> > +{
> > +	switch (mtu) {
> > +	case RXE_MTU_256:	return	256;
> > +	case RXE_MTU_512:	return	512;
> > +	case RXE_MTU_1024:	return	1024;
> > +	case RXE_MTU_2048:	return	2048;
> > +	case RXE_MTU_4096:	return	4096;
> > +	case RXE_MTU_8192:	return	8192;
> > +	default:		return	-1;
> > +	}
> > +}
> 
> For this function I'd prefer a function name like "log2_mtu_to_mtu"
> instead of "rxe_mtu_enum_to_int". It seems to me that all cases except
> "default" produce the value (1 << (8 + log2_mtu)), so the above
> function can be made shorter.
> 
> > +static inline enum rxe_mtu rxe_mtu_int_to_enum(int mtu)
> > +{
> > +	if (mtu < 256)
> > +		return 0;
> > +	else if (mtu < 512)
> > +		return RXE_MTU_256;
> > +	else if (mtu < 1024)
> > +		return RXE_MTU_512;
> > +	else if (mtu < 2048)
> > +		return RXE_MTU_1024;
> > +	else if (mtu < 4096)
> > +		return RXE_MTU_2048;
> > +	else if (mtu < 8192)
> > +		return RXE_MTU_4096;
> > +	else
> > +		return RXE_MTU_8192;
> > +}
> 
> The above function can be made shorter by using the function ilog2()
> defined in <linux/log2.h>.
> 
> Bart.
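
A rough sketch of the ilog2()-based simplification suggested above (the
helper bodies and the clamping at RXE_MTU_8192 are illustrative, not
taken from the patch):

	#include <linux/log2.h>

	static inline int rxe_mtu_enum_to_int(enum rxe_mtu mtu)
	{
		return (mtu == RXE_MTU_INVALID) ? -1 : 256 << (mtu - 1);
	}

	static inline enum rxe_mtu rxe_mtu_int_to_enum(int mtu)
	{
		int l;

		if (mtu < 256)
			return RXE_MTU_INVALID;
		l = ilog2(mtu) - 7;	/* 256 -> 1, 512 -> 2, ... */
		return l > RXE_MTU_8192 ? RXE_MTU_8192 : l;
	}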


^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [patch v2 12/37] add rxe_verbs.h
       [not found]       ` <CAO+b5-okTDAJ6iOTv_JKXeGttt0MFM+na-FRaNwJMFqHktKHJQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-09-01  4:32         ` Bob Pearson
  0 siblings, 0 replies; 85+ messages in thread
From: Bob Pearson @ 2011-09-01  4:32 UTC (permalink / raw)
  To: 'Bart Van Assche'; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



> -----Original Message-----
> From: bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org [mailto:bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org] On
> Behalf Of Bart Van Assche
> Sent: Monday, August 15, 2011 7:32 AM
> To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v2 12/37] add rxe_verbs.h
> 
> On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> > +static inline int pkey_match(u16 key1, u16 key2)
> > +{
> > +	return (((key1 & 0x7fff) != 0) &&
> > +		((key1 & 0x7fff) == (key2 & 0x7fff)) &&
> > +		((key1 & 0x8000) || (key2 & 0x8000))) ? 1 : 0;
> > +}
> 
> Shouldn't the return type of the above function be "bool" instead of
> "int" ? Also, the "? 1 : 0" part at the end of the above expression
> looks superfluous to me.

Sounds fine to me.
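
The revised helper could then read roughly as follows (sketch):

	static inline bool pkey_match(u16 key1, u16 key2)
	{
		return (key1 & 0x7fff) != 0 &&
		       (key1 & 0x7fff) == (key2 & 0x7fff) &&
		       ((key1 & 0x8000) || (key2 & 0x8000));
	}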

> 
> > +
> > +/* Return >0 if psn_a > psn_b
> > + *	   0 if psn_a == psn_b
> > + *	  <0 if psn_a < psn_b
> > + */
> > +static inline int psn_compare(u32 psn_a, u32 psn_b)
> > +{
> > +	s32 diff;
> > +	diff = (psn_a - psn_b) << 8;
> > +	return diff;
> > +}

Sure
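
For reference, a version with the 24-bit note (suggested in the quoted
comment just below) folded into the comment block might look like this
sketch:

	/* Compare two 24-bit PSNs (per the IBTA spec); shifting the
	 * difference into the top 24 bits of an s32 discards the unused
	 * high byte so wrap-around is handled correctly.
	 * Return >0 if psn_a > psn_b, 0 if equal, <0 if psn_a < psn_b.
	 */
	static inline int psn_compare(u32 psn_a, u32 psn_b)
	{
		s32 diff = (psn_a - psn_b) << 8;

		return diff;
	}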

> 
> Mentioning in the comment block that (according to IBTA) PSNs are
> 24-bit quantities would help IMHO.
> 
> > +#define RXE_LL_ADDR_LEN		(16)
> > +
> > +struct rxe_av {
> > +	struct ib_ah_attr	attr;
> > +	u8			ll_addr[RXE_LL_ADDR_LEN];
> > +};

Good idea. Can use MAX_ADDR_LEN as well.

> 
> Since the ib_rxe kernel module can be used in combination with other
> network drivers than only Ethernet drivers, the number of bytes stored
> in ll_addr[] by init_av() depends on the link layer type. Shouldn't
> the address length be stored in this structure too ?
> 
> > +/* must match corresponding data structure in librxe */
> > +struct rxe_send_wqe {
> > +	struct ib_send_wr	ibwr;
> > +	struct rxe_av		av;	/* UD only */
> > +	u32			status;
> > +	u32			state;
> > +	u64			iova;
> > +	u32			mask;
> > +	u32			first_psn;
> > +	u32			last_psn;
> > +	u32			ack_length;
> > +	u32			ssn;
> > +	u32			has_rd_atomic;
> > +	struct rxe_dma_info	dma;	/* must go last */
> > +};
> 
> Can't librxe use this header file instead of duplicating the above
definition ?

We followed the local convention which is to keep the header files separate.
(See e.g. nes/nes_user.h, mthca/mthca_user.h, etc...) I can't justify it
except that everyone else does it too.

> 
> > +struct rxe_port {
> > +	struct ib_port_attr	attr;
> > +	u16			*pkey_tbl;
> > +	__be64			*guid_tbl;
> > +	__be64			subnet_prefix;
> > +
> > +	/* rate control */
> > +	/* TODO */
> > +
> > +	spinlock_t		port_lock;
> > +
> > +	unsigned int		mtu_cap;
> > +
> > +	/* special QPs */
> > +	u32			qp_smi_index;
> > +	u32			qp_gsi_index;
> > +};
> 
> Regarding the "to do" item above: is rate control something planned
> for the first submission of this driver ?

Currently rxe only supports one port per device and the injection rate
control is tied to the device.  It is a fairly fussy change to completely
implement multiport devices and doesn't add much value as far as I can tell.
So for now I suppose we can drop this TODO.

> 
> Bart.


^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [patch v2 12/37] add rxe_verbs.h
       [not found]       ` <CAO+b5-qQBWeyFacCU3tx92jG2zw1=YZJRRX5SbQ57Ztx+zxuew-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-09-01  4:35         ` Bob Pearson
  0 siblings, 0 replies; 85+ messages in thread
From: Bob Pearson @ 2011-09-01  4:35 UTC (permalink / raw)
  To: 'Bart Van Assche'; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



> -----Original Message-----
> From: bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org [mailto:bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org] On
> Behalf Of Bart Van Assche
> Sent: Monday, August 15, 2011 8:00 AM
> To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v2 12/37] add rxe_verbs.h
> 
> On Sun, Jul 24, 2011 at 9:43 PM, <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> 
> + rxe_qp.c:char *rxe_qp_state_name[] = {
> 
> One more comment about rxe_verbs.h: why is the above declaration
> present in this header file ? As far as I can see it's only used in
> rxe_qp.c and not in any other source file.

Agreed. Can make static in rxe_qp.c and drop here.

> 
> Bart.


^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [patch v2 13/37] add rxe_verbs.c
       [not found]       ` <CAO+b5-r3UtgnyfhyaodHQTj_xB5d2MQkZ8kRbrpZ1hF_0W1ONw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-09-01  5:20         ` Bob Pearson
  2011-09-01 10:33           ` Bart Van Assche
  2011-09-01 11:02           ` Bart Van Assche
  0 siblings, 2 replies; 85+ messages in thread
From: Bob Pearson @ 2011-09-01  5:20 UTC (permalink / raw)
  To: 'Bart Van Assche'; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



> -----Original Message-----
> From: bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org [mailto:bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org] On
> Behalf Of Bart Van Assche
> Sent: Monday, August 15, 2011 8:25 AM
> To: rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v2 13/37] add rxe_verbs.c
> 
> On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> > +static int rxe_query_port(struct ib_device *dev,
> > +			  u8 port_num, struct ib_port_attr *attr)
> > +{
> > +	struct rxe_dev *rxe = to_rdev(dev);
> > +	struct rxe_port *port;
> > +
> > +	if (unlikely(port_num < 1 || port_num > rxe->num_ports)) {
> > +		pr_warn("invalid port_number %d\n", port_num);
> > +		goto err1;
> > +	}
> > +
> > +	port = &rxe->port[port_num - 1];
> > +
> > +	*attr = port->attr;
> > +	return 0;
> > +
> > +err1:
> > +	return -EINVAL;
> > +}
> 
> The pr_warn() statement in the above function looks to me like an
> example of how pr_warn() should not be used: the message it generates
> won't allow a user to find out which subsystem generated that message.
> I suggest either to leave out that pr_warn() statement entirely or to
> convert it into a pr_debug() statement and to use an appropriate
> prefix, e.g. via pr_fmt().

I am not familiar with pr_fmt but looked at some examples. It looks like the
conventional usage is to define pr_fmt(fmt) as KBUILD_MODNAME ": " fmt which
I suppose will output ib_rxe: or some variant. Ideally I would want rxe0: or
rxe1: which is the ib device name.
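
For reference, the conventional idiom looks roughly like this sketch
(it has to appear before the first #include so that the pr_*() helpers
pick it up, and it yields a module-name prefix such as "ib_rxe: ", not
the per-device "rxe0: " name):

	/* at the very top of the .c file, before any #include */
	#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

	#include <linux/kernel.h>

	/* pr_warn("invalid port_number %d\n", port_num) then prints
	 * something like "ib_rxe: invalid port_number 5".
	 */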

Do you have an opinion about best use of pr_debug, pr_warn and pr_err and
also about use of something like CONFIG_RXE_DEBUG vs just plain DEBUG vs
always on?

> 
> > +static struct ib_ah *rxe_create_ah(struct ib_pd *ibpd, struct
ib_ah_attr
> *attr)
> > +{
> > +	int err;
> > +	struct rxe_dev *rxe = to_rdev(ibpd->device);
> > +	struct rxe_pd *pd = to_rpd(ibpd);
> > +	struct rxe_ah *ah;
> > +
> > +	err = rxe_av_chk_attr(rxe, attr);
> > +	if (err)
> > +		goto err1;
> > +
> > +	ah = rxe_alloc(&rxe->ah_pool);
> > +	if (!ah) {
> > +		err = -ENOMEM;
> > +		goto err1;
> > +	}
> > +
> > +	rxe_add_ref(pd);
> > +	ah->pd = pd;
> > +
> > +	err = rxe_av_from_attr(rxe, attr->port_num, &ah->av, attr);
> > +	if (err)
> > +		goto err2;
> > +
> > +	return &ah->ibah;
> > +
> > +err2:
> > +	rxe_drop_ref(pd);
> > +	rxe_drop_ref(ah);
> > +err1:
> > +	return ERR_PTR(err);
> > +}
> 
> I'm not sure that using a variable with the name "err" and labels with
> names "err1" and "err2" is a good idea with regard to readability.

There are a lot of these. I used err instead of ret in cases where the only
purpose of the return was to indicate an error. I used a lot of stacked
error returns to avoid repetitive cleanup stanzas. Again the name errN:
implies an error occurred. How about errorN:? Is that different enough to
avoid the confusion you sense?

> 
> Bart.


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 13/37] add rxe_verbs.c
  2011-09-01  5:20         ` Bob Pearson
@ 2011-09-01 10:33           ` Bart Van Assche
  2011-09-01 11:02           ` Bart Van Assche
  1 sibling, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-09-01 10:33 UTC (permalink / raw)
  To: Bob Pearson; +Cc: linux-rdma

On Thu, Sep 1, 2011 at 7:20 AM, Bob Pearson
<rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> From: bvanassche-HInyCGIudOg@public.gmane.org
>> On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
>> > +static int rxe_query_port(struct ib_device *dev,
>> > +                     u8 port_num, struct ib_port_attr *attr)
>> > +{
>> > +   struct rxe_dev *rxe = to_rdev(dev);
>> > +   struct rxe_port *port;
>> > +
>> > +   if (unlikely(port_num < 1 || port_num > rxe->num_ports)) {
>> > +           pr_warn("invalid port_number %d\n", port_num);
>> > +           goto err1;
>> > +   }
>> > +
>> > +   port = &rxe->port[port_num - 1];
>> > +
>> > +   *attr = port->attr;
>> > +   return 0;
>> > +
>> > +err1:
>> > +   return -EINVAL;
>> > +}
>>
>> The pr_warn() statement in the above function looks to me like an
>> example of how pr_warn() should not be used: the message it generates
>> won't allow a user to find out which subsystem generated that message.
>> I suggest either to leave out that pr_warn() statement entirely or to
>> convert it into a pr_debug() statement and to use an appropriate
>> prefix, e.g. via pr_fmt().
>
> I am not familiar with pr_fmt but looked at some examples. It looks like the
> conventional usage is to define pr_fmt(fmt) as KBUILD_MODNAME ": " fmt which
> I suppose will output ib_rxe: or some variant. Ideally I would want rxe0: or
> rxe1: which is the ib device name.
>
> Do you have an opinion about best use of pr_debug, pr_warn and pr_err and
> also about use of something like CONFIG_RXE_DEBUG vs just plain DEBUG vs
> always on?

I should have recommended dev_warn()/dev_dbg() instead of pr_fmt()
here. These are defined in <linux/device.h>.

And if you would like to make enabling debug statements configurable,
please use the dynamic_debug infrastructure instead of introducing a
configure option like CONFIG_RXE_DEBUG. See also
http://lwn.net/Articles/434833/ for more information.
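
A sketch of what the dev_warn() form could look like in rxe_query_port()
(whether &dev->dev is the right embedded struct device to use here is an
assumption about the ib_device layout, not something taken from the patch):

	if (unlikely(port_num < 1 || port_num > rxe->num_ports)) {
		dev_warn(&dev->dev, "invalid port_number %d\n", port_num);
		return -EINVAL;
	}

This prefixes the message with the device name, which also addresses the
wish for an "rxe0:"-style prefix.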

Bart.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 13/37] add rxe_verbs.c
  2011-09-01  5:20         ` Bob Pearson
  2011-09-01 10:33           ` Bart Van Assche
@ 2011-09-01 11:02           ` Bart Van Assche
  1 sibling, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-09-01 11:02 UTC (permalink / raw)
  To: Bob Pearson; +Cc: linux-rdma

On Thu, Sep 1, 2011 at 7:20 AM, Bob Pearson
<rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> From: bvanassche-HInyCGIudOg@public.gmane.org
> > On Sun, Jul 24, 2011 at 9:43 PM,  <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> > > +static struct ib_ah *rxe_create_ah(struct ib_pd *ibpd, struct
> ib_ah_attr
> > *attr)
> > > +{
> > > +   int err;
> > > +   struct rxe_dev *rxe = to_rdev(ibpd->device);
> > > +   struct rxe_pd *pd = to_rpd(ibpd);
> > > +   struct rxe_ah *ah;
> > > +
> > > +   err = rxe_av_chk_attr(rxe, attr);
> > > +   if (err)
> > > +           goto err1;
> > > +
> > > +   ah = rxe_alloc(&rxe->ah_pool);
> > > +   if (!ah) {
> > > +           err = -ENOMEM;
> > > +           goto err1;
> > > +   }
> > > +
> > > +   rxe_add_ref(pd);
> > > +   ah->pd = pd;
> > > +
> > > +   err = rxe_av_from_attr(rxe, attr->port_num, &ah->av, attr);
> > > +   if (err)
> > > +           goto err2;
> > > +
> > > +   return &ah->ibah;
> > > +
> > > +err2:
> > > +   rxe_drop_ref(pd);
> > > +   rxe_drop_ref(ah);
> > > +err1:
> > > +   return ERR_PTR(err);
> > > +}
> >
> > I'm not sure that using a variable with the name "err" and labels with
> > names "err1" and "err2" is a good idea with regard to readability.
>
> There are a lot of these. I used err instead of ret in cases where the only
> purpose of the return was to indicate an error. I used a lot of stacked
> error returns to avoid repetitive cleanup stanzas. Again the name errN:
> implies an error occurred. How about errorN:? Is that different enough to
> avoid the confusion you sense?

If you are going to rename the "err<n>" labels, maybe it's a better
idea to rename these into "fail<n>" instead of "error<n>". Maybe it's
even possible to do this renaming with sed.

Bart.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 02/37] add opcodes to ib_pack.h
  2011-09-01  3:15                   ` Bob Pearson
@ 2011-09-01 17:29                     ` Jason Gunthorpe
  2011-09-01 17:51                     ` Bart Van Assche
  1 sibling, 0 replies; 85+ messages in thread
From: Jason Gunthorpe @ 2011-09-01 17:29 UTC (permalink / raw)
  To: Bob Pearson; +Cc: 'Bart Van Assche', linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Aug 31, 2011 at 10:15:02PM -0500, Bob Pearson wrote:

> > 1. Using ipv6_eth_mc_map() for mapping multicast GIDs seems an
> > unfortunate choice to me. That choice will cause multicast GIDs to be
> > mapped to the 33-33-xx-xx-xx-xx Ethernet address range that has been
> > reserved by RFC 2464 for IPv6 multicast addresses. If a collision with
> > an IPv6 multicast address occurs and IPv6 MLD snooping has been enabled on
> > the switches in the involved network, then RoCE multicast won't work
> > properly. IMHO we need a separate Ethernet address range for RoCE
> > multicast purposes, next to the existing ranges for IPv4 and IPv6.
> 
> I thought this was intentional! I.e. in order to get multicast over RoCE to
> work we were trying to piggyback on the behavior of intelligent switches.
> Otherwise the only way to get packets forwarded without adding a new group
> of multicast addresses would be to broadcast. The implementation of IGMP and
> MLD at the end node is oblivious to the packet's ethertype. And switches (at
> least the ones we tried) also happily multicast RoCE packets.

Nothing about RoCE multicast is standardized or even proposed to be
standardized.

Hijacking the IPv6 MAC space - and also abusing the net stack to
generate MLD stuff is what Mellanox did in their implementation. I
don't know if that was included in the main line submission or not?

Frankly I'd be inclined to map to the broadcast MAC address to pressure
the relevant standards bodies to finish the work on RoCE.

Jason

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 02/37] add opcodes to ib_pack.h
  2011-09-01  3:15                   ` Bob Pearson
  2011-09-01 17:29                     ` Jason Gunthorpe
@ 2011-09-01 17:51                     ` Bart Van Assche
  1 sibling, 0 replies; 85+ messages in thread
From: Bart Van Assche @ 2011-09-01 17:51 UTC (permalink / raw)
  To: Bob Pearson; +Cc: linux-rdma

On Thu, Sep 1, 2011 at 5:15 AM, Bob Pearson
<rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> > On Sat, Aug 20, 2011 at 4:47 PM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> > More in detail, my comments with regard to multicast support in ib_rxe
> are:
> > 1. Using ipv6_eth_mc_map() for mapping multicast GIDs seems an
> > unfortunate choice to me. That choice will cause multicast GIDs to be
> > mapped to the 33-33-xx-xx-xx-xx Ethernet address range that has been
> > reserved by RFC 2464 for IPv6 multicast addresses. If a collision with
> > an IPv6 multicast address occurs and IPv6 MLD snooping has been enabled on
> > the switches in the involved network, then RoCE multicast won't work
> > properly. IMHO we need a separate Ethernet address range for RoCE
> > multicast purposes, next to the existing ranges for IPv4 and IPv6.
>
> I thought this was intentional! I.e. in order to get multicast over RoCE to
> work we were trying to piggyback on the behavior of intelligent switches.
> Otherwise the only way to get packets forwarded without adding a new group
> of multicast addresses would be to broadcast. The implementation of IGMP and
> MLD at the end node is oblivious to the packet's ethertype. And switches (at
> least the ones we tried) also happily multicast RoCE packets.

Shouldn't both the RoCE specification and RFC 4541 be updated
according to the above ?

Bart.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [patch v2 00/37] add rxe (soft RoCE)
       [not found] ` <20110724194300.421253331-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
@ 2012-09-05 10:36   ` Stefan (metze) Metzmacher
  0 siblings, 0 replies; 85+ messages in thread
From: Stefan (metze) Metzmacher @ 2012-09-05 10:36 UTC (permalink / raw)
  To: Bob Pearson; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1544 bytes --]

Hi Bob,

On 24.07.2011 21:43,
rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5-XMD5yJDbdMReXY1tMh2IBg@public.gmane.org wrote:
> Changes in v2 include:
> 
> 	- Updated to Roland's tree as of 7/24/2011
> 
> 	- Moved the crc32 algorithm into a patch (slice-by-8-for_crc32.c.diff)
> 	  that goes into the mainline kernel. It has been submitted upstream
> 	  but is also included in here since it is required to build the driver.
> 
> 	- renamed rxe_sb8.c rxe_icrc.c since that is all it now does.
> 
> 	- Cleaned up warnings from checkpatch, C=2 and __CHECK_ENDIAN__.
> 
> 	- moved small .h files into rxe_loc.h
> 
> 	- rewrote the Kconfig text to be a little friendlier
> 
> 	- Changed the patch names to make them easier to handle.
> 
> 	- the quilt patch series is online at:
> 	  http://support.systemfabricworks.com/downloads/rxe/patches-v2.tgz
> 
> 	- librxe is online at:
> 	  http://support.systemfabricworks.com/downloads/rxe/librxe-1.0.0.tar.gz
> 
> 	Thanks to Roland Dreier, Bart van Assche and David Dillow for helpful
> 	suggestions.

I'm wondering what the status of this patchset is?

It would be very good if the rxe modules could be
included in the mainline kernel.

Hopefully Samba will get SMB-Direct (SMB2 over RDMA) support in the
future. And having RDMA support without hardware support would be very
useful for testing, and also for use on the client side against a server
which has hardware RDMA support.

Are there git repositories yet (for the kernel and userspace)?

metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

end of thread, other threads:[~2012-09-05 10:36 UTC | newest]

Thread overview: 85+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-24 19:43 [patch v2 00/37] add rxe (soft RoCE) rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 01/37] add slice by 8 algorithm to crc32.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201228.154338911-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-05 18:39     ` Bart Van Assche
     [not found]       ` <CAO+b5-rFs79znwVcG42Px2i351vhiz9hFyCJV5qg77of8Y6P5Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-08-05 19:24         ` Bob Pearson
2011-07-24 19:43 ` [patch v2 02/37] add opcodes to ib_pack.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201228.204265835-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-15  9:13     ` Bart Van Assche
     [not found]       ` <CAO+b5-pb3p0n5MZYNSJicAdUfZ4PtMNG5vdzDOiZNQwLnB3Wcw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-08-15 16:15         ` Bob Pearson
2011-08-20 14:47           ` Bart Van Assche
     [not found]             ` <CAO+b5-qV9uxs=V_uHbVZ_3sm1LqkQUB0em=K=nrRxUvFQYWo5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-08-22 18:08               ` Bart Van Assche
     [not found]                 ` <CAO+b5-pKPuhsrQV6igqo78y_1+gZ=09GxHoABOVnLxX3myFYZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-09-01  3:15                   ` Bob Pearson
2011-09-01 17:29                     ` Jason Gunthorpe
2011-09-01 17:51                     ` Bart Van Assche
2011-09-01  3:38         ` Bob Pearson
2011-07-24 19:43 ` [patch v2 03/37] add rxe_hdr.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 04/37] add rxe_opcode.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 05/37] add rxe_opcode.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 06/37] add rxe_param.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201228.415281967-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-20 15:27     ` Bart Van Assche
     [not found]       ` <CAO+b5-q5ohgttDAc45MSKrSUdUVOfR3xHMSY99JtfX_MetSw8Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-09-01  3:50         ` Bob Pearson
2011-07-24 19:43 ` [patch v2 07/37] add rxe.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 08/37] add rxe_loc.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 09/37] add rxe_mmap.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 10/37] add rxe_queue.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 11/37] add rxe_queue.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 12/37] add rxe_verbs.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201228.710989476-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-15 12:32     ` Bart Van Assche
     [not found]       ` <CAO+b5-okTDAJ6iOTv_JKXeGttt0MFM+na-FRaNwJMFqHktKHJQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-09-01  4:32         ` Bob Pearson
2011-08-15 12:59     ` Bart Van Assche
     [not found]       ` <CAO+b5-qQBWeyFacCU3tx92jG2zw1=YZJRRX5SbQ57Ztx+zxuew-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-09-01  4:35         ` Bob Pearson
2011-07-24 19:43 ` [patch v2 13/37] add rxe_verbs.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201228.761358826-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-15 13:24     ` Bart Van Assche
     [not found]       ` <CAO+b5-r3UtgnyfhyaodHQTj_xB5d2MQkZ8kRbrpZ1hF_0W1ONw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-09-01  5:20         ` Bob Pearson
2011-09-01 10:33           ` Bart Van Assche
2011-09-01 11:02           ` Bart Van Assche
2011-08-15 14:33     ` Bart Van Assche
     [not found]       ` <CAO+b5-pRQXn08aVOPxeQBzrYXAV+3zD+HZdCCmPTTwBCCzY=ag-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-08-15 14:45         ` Greg KH
     [not found]           ` <20110815144503.GA12014-l3A5Bk7waGM@public.gmane.org>
2011-08-15 14:58             ` Bart Van Assche
     [not found]               ` <CAO+b5-rsofKNdcjCjBdj1fH7i3DpbHhFajhk0QfNK4DE7nj_SQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-08-15 15:07                 ` Greg KH
     [not found]                   ` <20110815150726.GA12490-l3A5Bk7waGM@public.gmane.org>
2011-08-15 16:02                     ` Bart Van Assche
     [not found]                       ` <CAO+b5-ocwYv2_A8PfVUh1yp0CFNYasjTJNQ0F6bj-+wT8npavA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-08-15 16:13                         ` Greg KH
2011-08-15 16:31     ` Bart Van Assche
2011-07-24 19:43 ` [patch v2 14/37] add rxe_pool.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 15/37] add rxe_pool.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 16/37] add rxe_task.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 17/37] add rxe_task.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 18/37] add rxe_av.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 19/37] add rxe_srq.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 20/37] add rxe_cq.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.112267527-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-20 15:53     ` Bart Van Assche
2011-07-24 19:43 ` [patch v2 21/37] add rxe_qp.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.163423781-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-15 16:13     ` Bart Van Assche
     [not found]       ` <CAO+b5-opA-W4Lp69NzS=9MLEv1V7jWhHPo=bayKhaR-kPgTQTQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-08-15 16:58         ` Jason Gunthorpe
2011-07-24 19:43 ` [patch v2 22/37] add rxe_mr.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 23/37] add rxe_mcast.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.264963653-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-20 16:13     ` Bart Van Assche
2011-07-24 19:43 ` [patch v2 24/37] add rxe_recv.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 25/37] add rxe_comp.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 26/37] add rxe_req.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.415319539-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-10 18:44     ` Bart Van Assche
2011-07-24 19:43 ` [patch v2 27/37] add rxe_resp.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.464927217-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-20 16:26     ` Bart Van Assche
2011-07-24 19:43 ` [patch v2 28/37] add rxe_arbiter.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.515453123-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-10 18:46     ` Bart Van Assche
     [not found]       ` <CAO+b5-p=++ofCDj4ZB_PS9LiT5bcHsJiYDW8EL_2OrrzqjZmhg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-08-10 19:12         ` Bob Pearson
2011-07-24 19:43 ` [patch v2 29/37] add rxe_dma.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.565188525-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-20 15:32     ` Bart Van Assche
2011-07-24 19:43 ` [patch v2 30/37] add rxe_icrc.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 31/37] add rxe.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.666252623-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-20 15:12     ` Bart Van Assche
2011-07-24 19:43 ` [patch v2 32/37] add rxe_net.h rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.715886208-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-07  8:51     ` Bart Van Assche
2011-07-24 19:43 ` [patch v2 33/37] add rxe_net.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.765038738-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-05 18:47     ` Bart Van Assche
     [not found]       ` <CAO+b5-pk-X7aGgg12ND17OLcqcLU6qyiV+vf22anh9bYsXUUzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-08-05 19:17         ` Bob Pearson
2011-08-06  8:02         ` Bart Van Assche
2011-07-24 19:43 ` [patch v2 34/37] add rxe_net_sysfs.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.815123934-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-07  9:27     ` Bart Van Assche
2011-07-24 19:43 ` [patch v2 35/37] add rxe_sample.c rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.865806162-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-07  8:08     ` Bart Van Assche
2011-07-24 19:43 ` [patch v2 36/37] add Makefile rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
2011-07-24 19:43 ` [patch v2 37/37] add Kconfig rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5
     [not found]   ` <20110724201229.966262587-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2011-08-05 18:44     ` Bart Van Assche
     [not found]       ` <CAO+b5-oFKiAhTkCntAo5_mMCtdg1vb1eM_mTi-8E-T_atHa_3w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-08-05 19:19         ` Bob Pearson
2011-08-06 18:44           ` Bart Van Assche
     [not found] ` <20110724194300.421253331-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>
2012-09-05 10:36   ` [patch v2 00/37] add rxe (soft RoCE) Stefan (metze) Metzmacher
