All of lore.kernel.org
 help / color / mirror / Atom feed
* [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6
@ 2016-02-28  2:19 Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 01/13] RDS: Drop stale iWARP RDMA transport Santosh Shilimkar
                   ` (13 more replies)
  0 siblings, 14 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel

v2:
Dropped module parameter from [PATCH 11/13] as suggested by David Miller

Series is generated against net-next but also applies against Linus's tip
cleanly. Entire patchset is available at below git tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git for_4.6/net-next/rds_v2

The diff-stat looks bit scary since almost ~4K lines of code is
getting removed. Brief summary of the series:

- Drop the stale iWARP support:
	RDS iWarp support code has become stale and non testable for
	sometime.  As discussed and agreed earlier on list, am dropping
	its support for good. If new iWarp user(s) shows up in future,
	the plan is to adapt existing IB RDMA with special sink case.
- RDS gets SO_TIMESTAMP support
- Long due RDS maintainer entry gets updated
- Some RDS IB code refactoring towards new FastReg Memory registration (FRMR)
- Lastly the initial support for FRMR
	
RDS IB RDMA performance with FRMR is not yet as good as FMR and I do have
some patches in progress to address that. But they are not ready for 4.6
so I left them out of this series. 

Also am keeping eye on new CQ API adaptations like other ULPs doing and
will try to adapt RDS for the same most likely in 4.7+ timeframe. 

Santosh Shilimkar (12):
  RDS: Drop stale iWARP RDMA transport
  RDS: Add support for SO_TIMESTAMP for incoming messages
  MAINTAINERS: update RDS entry
  RDS: IB: Remove the RDS_IB_SEND_OP dependency
  RDS: IB: Re-organise ibmr code
  RDS: IB: create struct rds_ib_fmr
  RDS: IB: move FMR code to its own file
  RDS: IB: add connection info to ibmr
  RDS: IB: handle the RDMA CM time wait event
  RDS: IB: add mr reused stats
  RDS: IB: add Fastreg MR (FRMR) detection support
  RDS: IB: allocate extra space on queues for FRMR support

Avinash Repaka (1):
  RDS: IB: Support Fastreg MR (FRMR) memory registration mode

 Documentation/networking/rds.txt |   4 +-
 MAINTAINERS                      |   6 +-
 net/rds/Kconfig                  |   7 +-
 net/rds/Makefile                 |   4 +-
 net/rds/af_rds.c                 |  26 ++
 net/rds/ib.c                     |  47 +-
 net/rds/ib.h                     |  37 +-
 net/rds/ib_cm.c                  |  59 ++-
 net/rds/ib_fmr.c                 | 248 ++++++++++
 net/rds/ib_frmr.c                | 376 +++++++++++++++
 net/rds/ib_mr.h                  | 148 ++++++
 net/rds/ib_rdma.c                | 495 ++++++--------------
 net/rds/ib_send.c                |   6 +-
 net/rds/ib_stats.c               |   2 +
 net/rds/iw.c                     | 312 -------------
 net/rds/iw.h                     | 398 ----------------
 net/rds/iw_cm.c                  | 769 ------------------------------
 net/rds/iw_rdma.c                | 837 ---------------------------------
 net/rds/iw_recv.c                | 904 ------------------------------------
 net/rds/iw_ring.c                | 169 -------
 net/rds/iw_send.c                | 981 ---------------------------------------
 net/rds/iw_stats.c               |  95 ----
 net/rds/iw_sysctl.c              | 123 -----
 net/rds/rdma_transport.c         |  21 +-
 net/rds/rdma_transport.h         |   5 -
 net/rds/rds.h                    |   1 +
 net/rds/recv.c                   |  20 +-
 27 files changed, 1065 insertions(+), 5035 deletions(-)
 create mode 100644 net/rds/ib_fmr.c
 create mode 100644 net/rds/ib_frmr.c
 create mode 100644 net/rds/ib_mr.h
 delete mode 100644 net/rds/iw.c
 delete mode 100644 net/rds/iw.h
 delete mode 100644 net/rds/iw_cm.c
 delete mode 100644 net/rds/iw_rdma.c
 delete mode 100644 net/rds/iw_recv.c
 delete mode 100644 net/rds/iw_ring.c
 delete mode 100644 net/rds/iw_send.c
 delete mode 100644 net/rds/iw_stats.c
 delete mode 100644 net/rds/iw_sysctl.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 01/13] RDS: Drop stale iWARP RDMA transport
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-02-28  9:05   ` Christoph Hellwig
  2016-02-28 19:51   ` Or Gerlitz
  2016-02-28  2:19 ` [net-next][PATCH v2 02/13] RDS: Add support for SO_TIMESTAMP for incoming messages Santosh Shilimkar
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

RDS iWarp support code has become stale and non testable. As
indicated earlier, am dropping the support for it.

If new iWarp user(s) shows up in future, we can adapat the RDS IB
transprt for the special RDMA READ sink case. iWarp needs an MR
for the RDMA READ sink.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 Documentation/networking/rds.txt |   4 +-
 net/rds/Kconfig                  |   7 +-
 net/rds/Makefile                 |   4 +-
 net/rds/iw.c                     | 312 -------------
 net/rds/iw.h                     | 398 ----------------
 net/rds/iw_cm.c                  | 769 ------------------------------
 net/rds/iw_rdma.c                | 837 ---------------------------------
 net/rds/iw_recv.c                | 904 ------------------------------------
 net/rds/iw_ring.c                | 169 -------
 net/rds/iw_send.c                | 981 ---------------------------------------
 net/rds/iw_stats.c               |  95 ----
 net/rds/iw_sysctl.c              | 123 -----
 net/rds/rdma_transport.c         |  13 +-
 net/rds/rdma_transport.h         |   5 -
 14 files changed, 7 insertions(+), 4614 deletions(-)
 delete mode 100644 net/rds/iw.c
 delete mode 100644 net/rds/iw.h
 delete mode 100644 net/rds/iw_cm.c
 delete mode 100644 net/rds/iw_rdma.c
 delete mode 100644 net/rds/iw_recv.c
 delete mode 100644 net/rds/iw_ring.c
 delete mode 100644 net/rds/iw_send.c
 delete mode 100644 net/rds/iw_stats.c
 delete mode 100644 net/rds/iw_sysctl.c

diff --git a/Documentation/networking/rds.txt b/Documentation/networking/rds.txt
index e1a3d59..9d219d8 100644
--- a/Documentation/networking/rds.txt
+++ b/Documentation/networking/rds.txt
@@ -19,9 +19,7 @@ to N*N if you use a connection-oriented socket transport like TCP.
 
 RDS is not Infiniband-specific; it was designed to support different
 transports.  The current implementation used to support RDS over TCP as well
-as IB. Work is in progress to support RDS over iWARP, and using DCE to
-guarantee no dropped packets on Ethernet, it may be possible to use RDS over
-UDP in the future.
+as IB.
 
 The high-level semantics of RDS from the application's point of view are
 
diff --git a/net/rds/Kconfig b/net/rds/Kconfig
index f2c670b..bffde4b 100644
--- a/net/rds/Kconfig
+++ b/net/rds/Kconfig
@@ -4,14 +4,13 @@ config RDS
 	depends on INET
 	---help---
 	  The RDS (Reliable Datagram Sockets) protocol provides reliable,
-	  sequenced delivery of datagrams over Infiniband, iWARP,
-	  or TCP.
+	  sequenced delivery of datagrams over Infiniband or TCP.
 
 config RDS_RDMA
-	tristate "RDS over Infiniband and iWARP"
+	tristate "RDS over Infiniband"
 	depends on RDS && INFINIBAND && INFINIBAND_ADDR_TRANS
 	---help---
-	  Allow RDS to use Infiniband and iWARP as a transport.
+	  Allow RDS to use Infiniband as a transport.
 	  This transport supports RDMA operations.
 
 config RDS_TCP
diff --git a/net/rds/Makefile b/net/rds/Makefile
index 56d3f60..19e5485 100644
--- a/net/rds/Makefile
+++ b/net/rds/Makefile
@@ -6,9 +6,7 @@ rds-y :=	af_rds.o bind.o cong.o connection.o info.o message.o   \
 obj-$(CONFIG_RDS_RDMA) += rds_rdma.o
 rds_rdma-y :=	rdma_transport.o \
 			ib.o ib_cm.o ib_recv.o ib_ring.o ib_send.o ib_stats.o \
-			ib_sysctl.o ib_rdma.o \
-			iw.o iw_cm.o iw_recv.o iw_ring.o iw_send.o iw_stats.o \
-			iw_sysctl.o iw_rdma.o
+			ib_sysctl.o ib_rdma.o
 
 
 obj-$(CONFIG_RDS_TCP) += rds_tcp.o
diff --git a/net/rds/iw.c b/net/rds/iw.c
deleted file mode 100644
index f4a9fff..0000000
diff --git a/net/rds/iw.h b/net/rds/iw.h
deleted file mode 100644
index 5af01d1..0000000
diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
deleted file mode 100644
index aea4c91..0000000
diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
deleted file mode 100644
index b09a40c..0000000
diff --git a/net/rds/iw_recv.c b/net/rds/iw_recv.c
deleted file mode 100644
index a66d179..0000000
diff --git a/net/rds/iw_ring.c b/net/rds/iw_ring.c
deleted file mode 100644
index da8e3b6..0000000
diff --git a/net/rds/iw_send.c b/net/rds/iw_send.c
deleted file mode 100644
index e20bd50..0000000
diff --git a/net/rds/iw_stats.c b/net/rds/iw_stats.c
deleted file mode 100644
index 5fe67f6..0000000
diff --git a/net/rds/iw_sysctl.c b/net/rds/iw_sysctl.c
deleted file mode 100644
index 139239d..0000000
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index 9c1fed8..4f4b3d8 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -49,9 +49,7 @@ int rds_rdma_cm_event_handler(struct rdma_cm_id *cm_id,
 	rdsdebug("conn %p id %p handling event %u (%s)\n", conn, cm_id,
 		 event->event, rdma_event_msg(event->event));
 
-	if (cm_id->device->node_type == RDMA_NODE_RNIC)
-		trans = &rds_iw_transport;
-	else
+	if (cm_id->device->node_type == RDMA_NODE_IB_CA)
 		trans = &rds_ib_transport;
 
 	/* Prevent shutdown from tearing down the connection
@@ -200,10 +198,6 @@ static int rds_rdma_init(void)
 	if (ret)
 		goto out;
 
-	ret = rds_iw_init();
-	if (ret)
-		goto err_iw_init;
-
 	ret = rds_ib_init();
 	if (ret)
 		goto err_ib_init;
@@ -211,8 +205,6 @@ static int rds_rdma_init(void)
 	goto out;
 
 err_ib_init:
-	rds_iw_exit();
-err_iw_init:
 	rds_rdma_listen_stop();
 out:
 	return ret;
@@ -224,11 +216,10 @@ static void rds_rdma_exit(void)
 	/* stop listening first to ensure no new connections are attempted */
 	rds_rdma_listen_stop();
 	rds_ib_exit();
-	rds_iw_exit();
 }
 module_exit(rds_rdma_exit);
 
 MODULE_AUTHOR("Oracle Corporation <rds-devel@oss.oracle.com>");
-MODULE_DESCRIPTION("RDS: IB/iWARP transport");
+MODULE_DESCRIPTION("RDS: IB transport");
 MODULE_LICENSE("Dual BSD/GPL");
 
diff --git a/net/rds/rdma_transport.h b/net/rds/rdma_transport.h
index faba4e3..ff2010e 100644
--- a/net/rds/rdma_transport.h
+++ b/net/rds/rdma_transport.h
@@ -16,9 +16,4 @@ extern struct rds_transport rds_ib_transport;
 int rds_ib_init(void);
 void rds_ib_exit(void);
 
-/* from iw.c */
-extern struct rds_transport rds_iw_transport;
-int rds_iw_init(void);
-void rds_iw_exit(void);
-
 #endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 02/13] RDS: Add support for SO_TIMESTAMP for incoming messages
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 01/13] RDS: Drop stale iWARP RDMA transport Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 03/13] MAINTAINERS: update RDS entry Santosh Shilimkar
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

The SO_TIMESTAMP generates time stamp for each incoming RDS messages
User app can enable it by using SO_TIMESTAMP setsocketopt() at
SOL_SOCKET level. CMSG data of cmsg type SO_TIMESTAMP contains the
time stamp in struct timeval format.

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/af_rds.c | 26 ++++++++++++++++++++++++++
 net/rds/rds.h    |  1 +
 net/rds/recv.c   | 20 ++++++++++++++++++--
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index b5476aeb..6beaeb1 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -277,6 +277,27 @@ static int rds_set_transport(struct rds_sock *rs, char __user *optval,
 	return rs->rs_transport ? 0 : -ENOPROTOOPT;
 }
 
+static int rds_enable_recvtstamp(struct sock *sk, char __user *optval,
+				 int optlen)
+{
+	int val, valbool;
+
+	if (optlen != sizeof(int))
+		return -EFAULT;
+
+	if (get_user(val, (int __user *)optval))
+		return -EFAULT;
+
+	valbool = val ? 1 : 0;
+
+	if (valbool)
+		sock_set_flag(sk, SOCK_RCVTSTAMP);
+	else
+		sock_reset_flag(sk, SOCK_RCVTSTAMP);
+
+	return 0;
+}
+
 static int rds_setsockopt(struct socket *sock, int level, int optname,
 			  char __user *optval, unsigned int optlen)
 {
@@ -312,6 +333,11 @@ static int rds_setsockopt(struct socket *sock, int level, int optname,
 		ret = rds_set_transport(rs, optval, optlen);
 		release_sock(sock->sk);
 		break;
+	case SO_TIMESTAMP:
+		lock_sock(sock->sk);
+		ret = rds_enable_recvtstamp(sock->sk, optval, optlen);
+		release_sock(sock->sk);
+		break;
 	default:
 		ret = -ENOPROTOOPT;
 	}
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 0e2797b..80256b0 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -222,6 +222,7 @@ struct rds_incoming {
 	__be32			i_saddr;
 
 	rds_rdma_cookie_t	i_rdma_cookie;
+	struct timeval		i_rx_tstamp;
 };
 
 struct rds_mr {
diff --git a/net/rds/recv.c b/net/rds/recv.c
index a00462b..c0be1ec 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -35,6 +35,8 @@
 #include <net/sock.h>
 #include <linux/in.h>
 #include <linux/export.h>
+#include <linux/time.h>
+#include <linux/rds.h>
 
 #include "rds.h"
 
@@ -46,6 +48,8 @@ void rds_inc_init(struct rds_incoming *inc, struct rds_connection *conn,
 	inc->i_conn = conn;
 	inc->i_saddr = saddr;
 	inc->i_rdma_cookie = 0;
+	inc->i_rx_tstamp.tv_sec = 0;
+	inc->i_rx_tstamp.tv_usec = 0;
 }
 EXPORT_SYMBOL_GPL(rds_inc_init);
 
@@ -228,6 +232,8 @@ void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr,
 		rds_recv_rcvbuf_delta(rs, sk, inc->i_conn->c_lcong,
 				      be32_to_cpu(inc->i_hdr.h_len),
 				      inc->i_hdr.h_dport);
+		if (sock_flag(sk, SOCK_RCVTSTAMP))
+			do_gettimeofday(&inc->i_rx_tstamp);
 		rds_inc_addref(inc);
 		list_add_tail(&inc->i_item, &rs->rs_recv_queue);
 		__rds_wake_sk_sleep(sk);
@@ -381,7 +387,8 @@ static int rds_notify_cong(struct rds_sock *rs, struct msghdr *msghdr)
 /*
  * Receive any control messages.
  */
-static int rds_cmsg_recv(struct rds_incoming *inc, struct msghdr *msg)
+static int rds_cmsg_recv(struct rds_incoming *inc, struct msghdr *msg,
+			 struct rds_sock *rs)
 {
 	int ret = 0;
 
@@ -392,6 +399,15 @@ static int rds_cmsg_recv(struct rds_incoming *inc, struct msghdr *msg)
 			return ret;
 	}
 
+	if ((inc->i_rx_tstamp.tv_sec != 0) &&
+	    sock_flag(rds_rs_to_sk(rs), SOCK_RCVTSTAMP)) {
+		ret = put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
+			       sizeof(struct timeval),
+			       &inc->i_rx_tstamp);
+		if (ret)
+			return ret;
+	}
+
 	return 0;
 }
 
@@ -474,7 +490,7 @@ int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 			msg->msg_flags |= MSG_TRUNC;
 		}
 
-		if (rds_cmsg_recv(inc, msg)) {
+		if (rds_cmsg_recv(inc, msg, rs)) {
 			ret = -EFAULT;
 			goto out;
 		}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 03/13] MAINTAINERS: update RDS entry
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 01/13] RDS: Drop stale iWARP RDMA transport Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 02/13] RDS: Add support for SO_TIMESTAMP for incoming messages Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 04/13] RDS: IB: Remove the RDS_IB_SEND_OP dependency Santosh Shilimkar
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

Acked-by: Chien Yen <chien.yen@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 MAINTAINERS | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 27393cf..08b084a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9067,10 +9067,14 @@ S:	Maintained
 F:	drivers/net/ethernet/rdc/r6040.c
 
 RDS - RELIABLE DATAGRAM SOCKETS
-M:	Chien Yen <chien.yen@oracle.com>
+M:	Santosh Shilimkar <santosh.shilimkar@oracle.com>
+L:	netdev@vger.kernel.org
+L:	linux-rdma@vger.kernel.org
 L:	rds-devel@oss.oracle.com (moderated for non-subscribers)
+W:	https://oss.oracle.com/projects/rds/
 S:	Supported
 F:	net/rds/
+F:	Documentation/networking/rds.txt
 
 READ-COPY UPDATE (RCU)
 M:	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 04/13] RDS: IB: Remove the RDS_IB_SEND_OP dependency
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
                   ` (2 preceding siblings ...)
  2016-02-28  2:19 ` [net-next][PATCH v2 03/13] MAINTAINERS: update RDS entry Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 05/13] RDS: IB: Re-organise ibmr code Santosh Shilimkar
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

This helps to combine asynchronous fastreg MR completion handler
with send completion handler.

No functional change.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/ib.h      |  1 -
 net/rds/ib_cm.c   | 42 +++++++++++++++++++++++++++---------------
 net/rds/ib_send.c |  6 ++----
 3 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/net/rds/ib.h b/net/rds/ib.h
index b3fdebb..09cd8e3 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -28,7 +28,6 @@
 #define RDS_IB_RECYCLE_BATCH_COUNT	32
 
 #define RDS_IB_WC_MAX			32
-#define RDS_IB_SEND_OP			BIT_ULL(63)
 
 extern struct rw_semaphore rds_ib_devices_lock;
 extern struct list_head rds_ib_devices;
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index da5a7fb..7f68abc 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -236,12 +236,10 @@ static void rds_ib_cq_comp_handler_recv(struct ib_cq *cq, void *context)
 	tasklet_schedule(&ic->i_recv_tasklet);
 }
 
-static void poll_cq(struct rds_ib_connection *ic, struct ib_cq *cq,
-		    struct ib_wc *wcs,
-		    struct rds_ib_ack_state *ack_state)
+static void poll_scq(struct rds_ib_connection *ic, struct ib_cq *cq,
+		     struct ib_wc *wcs)
 {
-	int nr;
-	int i;
+	int nr, i;
 	struct ib_wc *wc;
 
 	while ((nr = ib_poll_cq(cq, RDS_IB_WC_MAX, wcs)) > 0) {
@@ -251,10 +249,7 @@ static void poll_cq(struct rds_ib_connection *ic, struct ib_cq *cq,
 				 (unsigned long long)wc->wr_id, wc->status,
 				 wc->byte_len, be32_to_cpu(wc->ex.imm_data));
 
-			if (wc->wr_id & RDS_IB_SEND_OP)
-				rds_ib_send_cqe_handler(ic, wc);
-			else
-				rds_ib_recv_cqe_handler(ic, wc, ack_state);
+			rds_ib_send_cqe_handler(ic, wc);
 		}
 	}
 }
@@ -263,14 +258,12 @@ static void rds_ib_tasklet_fn_send(unsigned long data)
 {
 	struct rds_ib_connection *ic = (struct rds_ib_connection *)data;
 	struct rds_connection *conn = ic->conn;
-	struct rds_ib_ack_state state;
 
 	rds_ib_stats_inc(s_ib_tasklet_call);
 
-	memset(&state, 0, sizeof(state));
-	poll_cq(ic, ic->i_send_cq, ic->i_send_wc, &state);
+	poll_scq(ic, ic->i_send_cq, ic->i_send_wc);
 	ib_req_notify_cq(ic->i_send_cq, IB_CQ_NEXT_COMP);
-	poll_cq(ic, ic->i_send_cq, ic->i_send_wc, &state);
+	poll_scq(ic, ic->i_send_cq, ic->i_send_wc);
 
 	if (rds_conn_up(conn) &&
 	    (!test_bit(RDS_LL_SEND_FULL, &conn->c_flags) ||
@@ -278,6 +271,25 @@ static void rds_ib_tasklet_fn_send(unsigned long data)
 		rds_send_xmit(ic->conn);
 }
 
+static void poll_rcq(struct rds_ib_connection *ic, struct ib_cq *cq,
+		     struct ib_wc *wcs,
+		     struct rds_ib_ack_state *ack_state)
+{
+	int nr, i;
+	struct ib_wc *wc;
+
+	while ((nr = ib_poll_cq(cq, RDS_IB_WC_MAX, wcs)) > 0) {
+		for (i = 0; i < nr; i++) {
+			wc = wcs + i;
+			rdsdebug("wc wr_id 0x%llx status %u byte_len %u imm_data %u\n",
+				 (unsigned long long)wc->wr_id, wc->status,
+				 wc->byte_len, be32_to_cpu(wc->ex.imm_data));
+
+			rds_ib_recv_cqe_handler(ic, wc, ack_state);
+		}
+	}
+}
+
 static void rds_ib_tasklet_fn_recv(unsigned long data)
 {
 	struct rds_ib_connection *ic = (struct rds_ib_connection *)data;
@@ -291,9 +303,9 @@ static void rds_ib_tasklet_fn_recv(unsigned long data)
 	rds_ib_stats_inc(s_ib_tasklet_call);
 
 	memset(&state, 0, sizeof(state));
-	poll_cq(ic, ic->i_recv_cq, ic->i_recv_wc, &state);
+	poll_rcq(ic, ic->i_recv_cq, ic->i_recv_wc, &state);
 	ib_req_notify_cq(ic->i_recv_cq, IB_CQ_SOLICITED);
-	poll_cq(ic, ic->i_recv_cq, ic->i_recv_wc, &state);
+	poll_rcq(ic, ic->i_recv_cq, ic->i_recv_wc, &state);
 
 	if (state.ack_next_valid)
 		rds_ib_set_ack(ic, state.ack_next, state.ack_required);
diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index eac30bf..f27d2c8 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -195,7 +195,7 @@ void rds_ib_send_init_ring(struct rds_ib_connection *ic)
 
 		send->s_op = NULL;
 
-		send->s_wr.wr_id = i | RDS_IB_SEND_OP;
+		send->s_wr.wr_id = i;
 		send->s_wr.sg_list = send->s_sge;
 		send->s_wr.ex.imm_data = 0;
 
@@ -263,9 +263,7 @@ void rds_ib_send_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc)
 
 	oldest = rds_ib_ring_oldest(&ic->i_send_ring);
 
-	completed = rds_ib_ring_completed(&ic->i_send_ring,
-					  (wc->wr_id & ~RDS_IB_SEND_OP),
-					  oldest);
+	completed = rds_ib_ring_completed(&ic->i_send_ring, wc->wr_id, oldest);
 
 	for (i = 0; i < completed; i++) {
 		send = &ic->i_sends[oldest];
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 05/13] RDS: IB: Re-organise ibmr code
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
                   ` (3 preceding siblings ...)
  2016-02-28  2:19 ` [net-next][PATCH v2 04/13] RDS: IB: Remove the RDS_IB_SEND_OP dependency Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 06/13] RDS: IB: create struct rds_ib_fmr Santosh Shilimkar
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

No functional changes. This is in preperation towards adding
fastreg memory resgitration support.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/Makefile  |   2 +-
 net/rds/ib.c      |  37 +++---
 net/rds/ib.h      |  25 +---
 net/rds/ib_fmr.c  | 217 +++++++++++++++++++++++++++++++
 net/rds/ib_mr.h   | 109 ++++++++++++++++
 net/rds/ib_rdma.c | 379 +++++++++++-------------------------------------------
 6 files changed, 422 insertions(+), 347 deletions(-)
 create mode 100644 net/rds/ib_fmr.c
 create mode 100644 net/rds/ib_mr.h

diff --git a/net/rds/Makefile b/net/rds/Makefile
index 19e5485..bcf5591 100644
--- a/net/rds/Makefile
+++ b/net/rds/Makefile
@@ -6,7 +6,7 @@ rds-y :=	af_rds.o bind.o cong.o connection.o info.o message.o   \
 obj-$(CONFIG_RDS_RDMA) += rds_rdma.o
 rds_rdma-y :=	rdma_transport.o \
 			ib.o ib_cm.o ib_recv.o ib_ring.o ib_send.o ib_stats.o \
-			ib_sysctl.o ib_rdma.o
+			ib_sysctl.o ib_rdma.o ib_fmr.o
 
 
 obj-$(CONFIG_RDS_TCP) += rds_tcp.o
diff --git a/net/rds/ib.c b/net/rds/ib.c
index 9481d55..bb32cb9 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -42,15 +42,16 @@
 
 #include "rds.h"
 #include "ib.h"
+#include "ib_mr.h"
 
-unsigned int rds_ib_fmr_1m_pool_size = RDS_FMR_1M_POOL_SIZE;
-unsigned int rds_ib_fmr_8k_pool_size = RDS_FMR_8K_POOL_SIZE;
+unsigned int rds_ib_mr_1m_pool_size = RDS_MR_1M_POOL_SIZE;
+unsigned int rds_ib_mr_8k_pool_size = RDS_MR_8K_POOL_SIZE;
 unsigned int rds_ib_retry_count = RDS_IB_DEFAULT_RETRY_COUNT;
 
-module_param(rds_ib_fmr_1m_pool_size, int, 0444);
-MODULE_PARM_DESC(rds_ib_fmr_1m_pool_size, " Max number of 1M fmr per HCA");
-module_param(rds_ib_fmr_8k_pool_size, int, 0444);
-MODULE_PARM_DESC(rds_ib_fmr_8k_pool_size, " Max number of 8K fmr per HCA");
+module_param(rds_ib_mr_1m_pool_size, int, 0444);
+MODULE_PARM_DESC(rds_ib_mr_1m_pool_size, " Max number of 1M mr per HCA");
+module_param(rds_ib_mr_8k_pool_size, int, 0444);
+MODULE_PARM_DESC(rds_ib_mr_8k_pool_size, " Max number of 8K mr per HCA");
 module_param(rds_ib_retry_count, int, 0444);
 MODULE_PARM_DESC(rds_ib_retry_count, " Number of hw retries before reporting an error");
 
@@ -140,13 +141,13 @@ static void rds_ib_add_one(struct ib_device *device)
 	rds_ibdev->max_sge = min(device->attrs.max_sge, RDS_IB_MAX_SGE);
 
 	rds_ibdev->fmr_max_remaps = device->attrs.max_map_per_fmr?: 32;
-	rds_ibdev->max_1m_fmrs = device->attrs.max_mr ?
+	rds_ibdev->max_1m_mrs = device->attrs.max_mr ?
 		min_t(unsigned int, (device->attrs.max_mr / 2),
-		      rds_ib_fmr_1m_pool_size) : rds_ib_fmr_1m_pool_size;
+		      rds_ib_mr_1m_pool_size) : rds_ib_mr_1m_pool_size;
 
-	rds_ibdev->max_8k_fmrs = device->attrs.max_mr ?
+	rds_ibdev->max_8k_mrs = device->attrs.max_mr ?
 		min_t(unsigned int, ((device->attrs.max_mr / 2) * RDS_MR_8K_SCALE),
-		      rds_ib_fmr_8k_pool_size) : rds_ib_fmr_8k_pool_size;
+		      rds_ib_mr_8k_pool_size) : rds_ib_mr_8k_pool_size;
 
 	rds_ibdev->max_initiator_depth = device->attrs.max_qp_init_rd_atom;
 	rds_ibdev->max_responder_resources = device->attrs.max_qp_rd_atom;
@@ -172,10 +173,10 @@ static void rds_ib_add_one(struct ib_device *device)
 		goto put_dev;
 	}
 
-	rdsdebug("RDS/IB: max_mr = %d, max_wrs = %d, max_sge = %d, fmr_max_remaps = %d, max_1m_fmrs = %d, max_8k_fmrs = %d\n",
+	rdsdebug("RDS/IB: max_mr = %d, max_wrs = %d, max_sge = %d, fmr_max_remaps = %d, max_1m_mrs = %d, max_8k_mrs = %d\n",
 		 device->attrs.max_fmr, rds_ibdev->max_wrs, rds_ibdev->max_sge,
-		 rds_ibdev->fmr_max_remaps, rds_ibdev->max_1m_fmrs,
-		 rds_ibdev->max_8k_fmrs);
+		 rds_ibdev->fmr_max_remaps, rds_ibdev->max_1m_mrs,
+		 rds_ibdev->max_8k_mrs);
 
 	INIT_LIST_HEAD(&rds_ibdev->ipaddr_list);
 	INIT_LIST_HEAD(&rds_ibdev->conn_list);
@@ -364,7 +365,7 @@ void rds_ib_exit(void)
 	rds_ib_sysctl_exit();
 	rds_ib_recv_exit();
 	rds_trans_unregister(&rds_ib_transport);
-	rds_ib_fmr_exit();
+	rds_ib_mr_exit();
 }
 
 struct rds_transport rds_ib_transport = {
@@ -400,13 +401,13 @@ int rds_ib_init(void)
 
 	INIT_LIST_HEAD(&rds_ib_devices);
 
-	ret = rds_ib_fmr_init();
+	ret = rds_ib_mr_init();
 	if (ret)
 		goto out;
 
 	ret = ib_register_client(&rds_ib_client);
 	if (ret)
-		goto out_fmr_exit;
+		goto out_mr_exit;
 
 	ret = rds_ib_sysctl_init();
 	if (ret)
@@ -430,8 +431,8 @@ out_sysctl:
 	rds_ib_sysctl_exit();
 out_ibreg:
 	rds_ib_unregister_client();
-out_fmr_exit:
-	rds_ib_fmr_exit();
+out_mr_exit:
+	rds_ib_mr_exit();
 out:
 	return ret;
 }
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 09cd8e3..c88cb22 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -9,12 +9,6 @@
 #include "rds.h"
 #include "rdma_transport.h"
 
-#define RDS_FMR_1M_POOL_SIZE		(8192 / 2)
-#define RDS_FMR_1M_MSG_SIZE		256
-#define RDS_FMR_8K_MSG_SIZE		2
-#define RDS_MR_8K_SCALE			(256 / (RDS_FMR_8K_MSG_SIZE + 1))
-#define RDS_FMR_8K_POOL_SIZE		(RDS_MR_8K_SCALE * (8192 / 2))
-
 #define RDS_IB_MAX_SGE			8
 #define RDS_IB_RECV_SGE 		2
 
@@ -206,12 +200,12 @@ struct rds_ib_device {
 	struct list_head	conn_list;
 	struct ib_device	*dev;
 	struct ib_pd		*pd;
-	unsigned int		max_fmrs;
+	unsigned int		max_mrs;
 	struct rds_ib_mr_pool	*mr_1m_pool;
 	struct rds_ib_mr_pool   *mr_8k_pool;
 	unsigned int		fmr_max_remaps;
-	unsigned int		max_8k_fmrs;
-	unsigned int		max_1m_fmrs;
+	unsigned int		max_8k_mrs;
+	unsigned int		max_1m_mrs;
 	int			max_sge;
 	unsigned int		max_wrs;
 	unsigned int		max_initiator_depth;
@@ -316,8 +310,6 @@ struct rds_ib_device *rds_ib_get_client_data(struct ib_device *device);
 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev);
 extern struct ib_client rds_ib_client;
 
-extern unsigned int rds_ib_fmr_1m_pool_size;
-extern unsigned int rds_ib_fmr_8k_pool_size;
 extern unsigned int rds_ib_retry_count;
 
 extern spinlock_t ib_nodev_conns_lock;
@@ -347,17 +339,6 @@ int rds_ib_update_ipaddr(struct rds_ib_device *rds_ibdev, __be32 ipaddr);
 void rds_ib_add_conn(struct rds_ib_device *rds_ibdev, struct rds_connection *conn);
 void rds_ib_remove_conn(struct rds_ib_device *rds_ibdev, struct rds_connection *conn);
 void rds_ib_destroy_nodev_conns(void);
-struct rds_ib_mr_pool *rds_ib_create_mr_pool(struct rds_ib_device *rds_dev,
-					     int npages);
-void rds_ib_get_mr_info(struct rds_ib_device *rds_ibdev, struct rds_info_rdma_connection *iinfo);
-void rds_ib_destroy_mr_pool(struct rds_ib_mr_pool *);
-void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
-		    struct rds_sock *rs, u32 *key_ret);
-void rds_ib_sync_mr(void *trans_private, int dir);
-void rds_ib_free_mr(void *trans_private, int invalidate);
-void rds_ib_flush_mrs(void);
-int rds_ib_fmr_init(void);
-void rds_ib_fmr_exit(void);
 
 /* ib_recv.c */
 int rds_ib_recv_init(void);
diff --git a/net/rds/ib_fmr.c b/net/rds/ib_fmr.c
new file mode 100644
index 0000000..d4f200d
--- /dev/null
+++ b/net/rds/ib_fmr.c
@@ -0,0 +1,217 @@
+/*
+ * Copyright (c) 2016 Oracle.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "ib_mr.h"
+
+struct rds_ib_mr *rds_ib_alloc_fmr(struct rds_ib_device *rds_ibdev, int npages)
+{
+	struct rds_ib_mr_pool *pool;
+	struct rds_ib_mr *ibmr = NULL;
+	int err = 0, iter = 0;
+
+	if (npages <= RDS_MR_8K_MSG_SIZE)
+		pool = rds_ibdev->mr_8k_pool;
+	else
+		pool = rds_ibdev->mr_1m_pool;
+
+	if (atomic_read(&pool->dirty_count) >= pool->max_items / 10)
+		queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10);
+
+	/* Switch pools if one of the pool is reaching upper limit */
+	if (atomic_read(&pool->dirty_count) >=  pool->max_items * 9 / 10) {
+		if (pool->pool_type == RDS_IB_MR_8K_POOL)
+			pool = rds_ibdev->mr_1m_pool;
+		else
+			pool = rds_ibdev->mr_8k_pool;
+	}
+
+	while (1) {
+		ibmr = rds_ib_reuse_mr(pool);
+		if (ibmr)
+			return ibmr;
+
+		/* No clean MRs - now we have the choice of either
+		 * allocating a fresh MR up to the limit imposed by the
+		 * driver, or flush any dirty unused MRs.
+		 * We try to avoid stalling in the send path if possible,
+		 * so we allocate as long as we're allowed to.
+		 *
+		 * We're fussy with enforcing the FMR limit, though. If the
+		 * driver tells us we can't use more than N fmrs, we shouldn't
+		 * start arguing with it
+		 */
+		if (atomic_inc_return(&pool->item_count) <= pool->max_items)
+			break;
+
+		atomic_dec(&pool->item_count);
+
+		if (++iter > 2) {
+			if (pool->pool_type == RDS_IB_MR_8K_POOL)
+				rds_ib_stats_inc(s_ib_rdma_mr_8k_pool_depleted);
+			else
+				rds_ib_stats_inc(s_ib_rdma_mr_1m_pool_depleted);
+			return ERR_PTR(-EAGAIN);
+		}
+
+		/* We do have some empty MRs. Flush them out. */
+		if (pool->pool_type == RDS_IB_MR_8K_POOL)
+			rds_ib_stats_inc(s_ib_rdma_mr_8k_pool_wait);
+		else
+			rds_ib_stats_inc(s_ib_rdma_mr_1m_pool_wait);
+		rds_ib_flush_mr_pool(pool, 0, &ibmr);
+		if (ibmr)
+			return ibmr;
+	}
+
+	ibmr = kzalloc_node(sizeof(*ibmr), GFP_KERNEL,
+			    rdsibdev_to_node(rds_ibdev));
+	if (!ibmr) {
+		err = -ENOMEM;
+		goto out_no_cigar;
+	}
+
+	ibmr->fmr = ib_alloc_fmr(rds_ibdev->pd,
+			(IB_ACCESS_LOCAL_WRITE |
+			 IB_ACCESS_REMOTE_READ |
+			 IB_ACCESS_REMOTE_WRITE |
+			 IB_ACCESS_REMOTE_ATOMIC),
+			&pool->fmr_attr);
+	if (IS_ERR(ibmr->fmr)) {
+		err = PTR_ERR(ibmr->fmr);
+		ibmr->fmr = NULL;
+		pr_warn("RDS/IB: %s failed (err=%d)\n", __func__, err);
+		goto out_no_cigar;
+	}
+
+	ibmr->pool = pool;
+	if (pool->pool_type == RDS_IB_MR_8K_POOL)
+		rds_ib_stats_inc(s_ib_rdma_mr_8k_alloc);
+	else
+		rds_ib_stats_inc(s_ib_rdma_mr_1m_alloc);
+
+	return ibmr;
+
+out_no_cigar:
+	if (ibmr) {
+		if (ibmr->fmr)
+			ib_dealloc_fmr(ibmr->fmr);
+		kfree(ibmr);
+	}
+	atomic_dec(&pool->item_count);
+	return ERR_PTR(err);
+}
+
+int rds_ib_map_fmr(struct rds_ib_device *rds_ibdev, struct rds_ib_mr *ibmr,
+		   struct scatterlist *sg, unsigned int nents)
+{
+	struct ib_device *dev = rds_ibdev->dev;
+	struct scatterlist *scat = sg;
+	u64 io_addr = 0;
+	u64 *dma_pages;
+	u32 len;
+	int page_cnt, sg_dma_len;
+	int i, j;
+	int ret;
+
+	sg_dma_len = ib_dma_map_sg(dev, sg, nents, DMA_BIDIRECTIONAL);
+	if (unlikely(!sg_dma_len)) {
+		pr_warn("RDS/IB: %s failed!\n", __func__);
+		return -EBUSY;
+	}
+
+	len = 0;
+	page_cnt = 0;
+
+	for (i = 0; i < sg_dma_len; ++i) {
+		unsigned int dma_len = ib_sg_dma_len(dev, &scat[i]);
+		u64 dma_addr = ib_sg_dma_address(dev, &scat[i]);
+
+		if (dma_addr & ~PAGE_MASK) {
+			if (i > 0)
+				return -EINVAL;
+			else
+				++page_cnt;
+		}
+		if ((dma_addr + dma_len) & ~PAGE_MASK) {
+			if (i < sg_dma_len - 1)
+				return -EINVAL;
+			else
+				++page_cnt;
+		}
+
+		len += dma_len;
+	}
+
+	page_cnt += len >> PAGE_SHIFT;
+	if (page_cnt > ibmr->pool->fmr_attr.max_pages)
+		return -EINVAL;
+
+	dma_pages = kmalloc_node(sizeof(u64) * page_cnt, GFP_ATOMIC,
+				 rdsibdev_to_node(rds_ibdev));
+	if (!dma_pages)
+		return -ENOMEM;
+
+	page_cnt = 0;
+	for (i = 0; i < sg_dma_len; ++i) {
+		unsigned int dma_len = ib_sg_dma_len(dev, &scat[i]);
+		u64 dma_addr = ib_sg_dma_address(dev, &scat[i]);
+
+		for (j = 0; j < dma_len; j += PAGE_SIZE)
+			dma_pages[page_cnt++] =
+				(dma_addr & PAGE_MASK) + j;
+	}
+
+	ret = ib_map_phys_fmr(ibmr->fmr, dma_pages, page_cnt, io_addr);
+	if (ret)
+		goto out;
+
+	/* Success - we successfully remapped the MR, so we can
+	 * safely tear down the old mapping.
+	 */
+	rds_ib_teardown_mr(ibmr);
+
+	ibmr->sg = scat;
+	ibmr->sg_len = nents;
+	ibmr->sg_dma_len = sg_dma_len;
+	ibmr->remap_count++;
+
+	if (ibmr->pool->pool_type == RDS_IB_MR_8K_POOL)
+		rds_ib_stats_inc(s_ib_rdma_mr_8k_used);
+	else
+		rds_ib_stats_inc(s_ib_rdma_mr_1m_used);
+	ret = 0;
+
+out:
+	kfree(dma_pages);
+
+	return ret;
+}
diff --git a/net/rds/ib_mr.h b/net/rds/ib_mr.h
new file mode 100644
index 0000000..d88724f
--- /dev/null
+++ b/net/rds/ib_mr.h
@@ -0,0 +1,109 @@
+/*
+ * Copyright (c) 2016 Oracle.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef _RDS_IB_MR_H
+#define _RDS_IB_MR_H
+
+#include <linux/kernel.h>
+
+#include "rds.h"
+#include "ib.h"
+
+#define RDS_MR_1M_POOL_SIZE		(8192 / 2)
+#define RDS_MR_1M_MSG_SIZE		256
+#define RDS_MR_8K_MSG_SIZE		2
+#define RDS_MR_8K_SCALE			(256 / (RDS_MR_8K_MSG_SIZE + 1))
+#define RDS_MR_8K_POOL_SIZE		(RDS_MR_8K_SCALE * (8192 / 2))
+
+/* This is stored as mr->r_trans_private. */
+struct rds_ib_mr {
+	struct rds_ib_device	*device;
+	struct rds_ib_mr_pool	*pool;
+	struct ib_fmr		*fmr;
+
+	struct llist_node	llnode;
+
+	/* unmap_list is for freeing */
+	struct list_head	unmap_list;
+	unsigned int		remap_count;
+
+	struct scatterlist	*sg;
+	unsigned int		sg_len;
+	u64			*dma;
+	int			sg_dma_len;
+};
+
+/* Our own little MR pool */
+struct rds_ib_mr_pool {
+	unsigned int            pool_type;
+	struct mutex		flush_lock;	/* serialize fmr invalidate */
+	struct delayed_work	flush_worker;	/* flush worker */
+
+	atomic_t		item_count;	/* total # of MRs */
+	atomic_t		dirty_count;	/* # dirty of MRs */
+
+	struct llist_head	drop_list;	/* MRs not reached max_maps */
+	struct llist_head	free_list;	/* unused MRs */
+	struct llist_head	clean_list;	/* unused & unmapped MRs */
+	wait_queue_head_t	flush_wait;
+
+	atomic_t		free_pinned;	/* memory pinned by free MRs */
+	unsigned long		max_items;
+	unsigned long		max_items_soft;
+	unsigned long		max_free_pinned;
+	struct ib_fmr_attr	fmr_attr;
+};
+
+extern struct workqueue_struct *rds_ib_mr_wq;
+extern unsigned int rds_ib_mr_1m_pool_size;
+extern unsigned int rds_ib_mr_8k_pool_size;
+
+struct rds_ib_mr_pool *rds_ib_create_mr_pool(struct rds_ib_device *rds_dev,
+					     int npages);
+void rds_ib_get_mr_info(struct rds_ib_device *rds_ibdev,
+			struct rds_info_rdma_connection *iinfo);
+void rds_ib_destroy_mr_pool(struct rds_ib_mr_pool *);
+void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
+		    struct rds_sock *rs, u32 *key_ret);
+void rds_ib_sync_mr(void *trans_private, int dir);
+void rds_ib_free_mr(void *trans_private, int invalidate);
+void rds_ib_flush_mrs(void);
+int rds_ib_mr_init(void);
+void rds_ib_mr_exit(void);
+
+void __rds_ib_teardown_mr(struct rds_ib_mr *);
+void rds_ib_teardown_mr(struct rds_ib_mr *);
+struct rds_ib_mr *rds_ib_alloc_fmr(struct rds_ib_device *, int);
+int rds_ib_map_fmr(struct rds_ib_device *, struct rds_ib_mr *,
+		   struct scatterlist *, unsigned int);
+struct rds_ib_mr *rds_ib_reuse_mr(struct rds_ib_mr_pool *);
+int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *, int, struct rds_ib_mr **);
+#endif
diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index a234074..c594519 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -35,78 +35,13 @@
 #include <linux/rculist.h>
 #include <linux/llist.h>
 
-#include "rds.h"
-#include "ib.h"
+#include "ib_mr.h"
+
+struct workqueue_struct *rds_ib_mr_wq;
 
 static DEFINE_PER_CPU(unsigned long, clean_list_grace);
 #define CLEAN_LIST_BUSY_BIT 0
 
-/*
- * This is stored as mr->r_trans_private.
- */
-struct rds_ib_mr {
-	struct rds_ib_device	*device;
-	struct rds_ib_mr_pool	*pool;
-	struct ib_fmr		*fmr;
-
-	struct llist_node	llnode;
-
-	/* unmap_list is for freeing */
-	struct list_head	unmap_list;
-	unsigned int		remap_count;
-
-	struct scatterlist	*sg;
-	unsigned int		sg_len;
-	u64			*dma;
-	int			sg_dma_len;
-};
-
-/*
- * Our own little FMR pool
- */
-struct rds_ib_mr_pool {
-	unsigned int            pool_type;
-	struct mutex		flush_lock;		/* serialize fmr invalidate */
-	struct delayed_work	flush_worker;		/* flush worker */
-
-	atomic_t		item_count;		/* total # of MRs */
-	atomic_t		dirty_count;		/* # dirty of MRs */
-
-	struct llist_head	drop_list;		/* MRs that have reached their max_maps limit */
-	struct llist_head	free_list;		/* unused MRs */
-	struct llist_head	clean_list;		/* global unused & unamapped MRs */
-	wait_queue_head_t	flush_wait;
-
-	atomic_t		free_pinned;		/* memory pinned by free MRs */
-	unsigned long		max_items;
-	unsigned long		max_items_soft;
-	unsigned long		max_free_pinned;
-	struct ib_fmr_attr	fmr_attr;
-};
-
-static struct workqueue_struct *rds_ib_fmr_wq;
-
-int rds_ib_fmr_init(void)
-{
-	rds_ib_fmr_wq = create_workqueue("rds_fmr_flushd");
-	if (!rds_ib_fmr_wq)
-		return -ENOMEM;
-	return 0;
-}
-
-/* By the time this is called all the IB devices should have been torn down and
- * had their pools freed.  As each pool is freed its work struct is waited on,
- * so the pool flushing work queue should be idle by the time we get here.
- */
-void rds_ib_fmr_exit(void)
-{
-	destroy_workqueue(rds_ib_fmr_wq);
-}
-
-static int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool, int free_all, struct rds_ib_mr **);
-static void rds_ib_teardown_mr(struct rds_ib_mr *ibmr);
-static void rds_ib_mr_pool_flush_worker(struct work_struct *work);
-
 static struct rds_ib_device *rds_ib_get_device(__be32 ipaddr)
 {
 	struct rds_ib_device *rds_ibdev;
@@ -235,41 +170,6 @@ void rds_ib_destroy_nodev_conns(void)
 		rds_conn_destroy(ic->conn);
 }
 
-struct rds_ib_mr_pool *rds_ib_create_mr_pool(struct rds_ib_device *rds_ibdev,
-					     int pool_type)
-{
-	struct rds_ib_mr_pool *pool;
-
-	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
-	if (!pool)
-		return ERR_PTR(-ENOMEM);
-
-	pool->pool_type = pool_type;
-	init_llist_head(&pool->free_list);
-	init_llist_head(&pool->drop_list);
-	init_llist_head(&pool->clean_list);
-	mutex_init(&pool->flush_lock);
-	init_waitqueue_head(&pool->flush_wait);
-	INIT_DELAYED_WORK(&pool->flush_worker, rds_ib_mr_pool_flush_worker);
-
-	if (pool_type == RDS_IB_MR_1M_POOL) {
-		/* +1 allows for unaligned MRs */
-		pool->fmr_attr.max_pages = RDS_FMR_1M_MSG_SIZE + 1;
-		pool->max_items = RDS_FMR_1M_POOL_SIZE;
-	} else {
-		/* pool_type == RDS_IB_MR_8K_POOL */
-		pool->fmr_attr.max_pages = RDS_FMR_8K_MSG_SIZE + 1;
-		pool->max_items = RDS_FMR_8K_POOL_SIZE;
-	}
-
-	pool->max_free_pinned = pool->max_items * pool->fmr_attr.max_pages / 4;
-	pool->fmr_attr.max_maps = rds_ibdev->fmr_max_remaps;
-	pool->fmr_attr.page_shift = PAGE_SHIFT;
-	pool->max_items_soft = rds_ibdev->max_fmrs * 3 / 4;
-
-	return pool;
-}
-
 void rds_ib_get_mr_info(struct rds_ib_device *rds_ibdev, struct rds_info_rdma_connection *iinfo)
 {
 	struct rds_ib_mr_pool *pool_1m = rds_ibdev->mr_1m_pool;
@@ -278,16 +178,7 @@ void rds_ib_get_mr_info(struct rds_ib_device *rds_ibdev, struct rds_info_rdma_co
 	iinfo->rdma_mr_size = pool_1m->fmr_attr.max_pages;
 }
 
-void rds_ib_destroy_mr_pool(struct rds_ib_mr_pool *pool)
-{
-	cancel_delayed_work_sync(&pool->flush_worker);
-	rds_ib_flush_mr_pool(pool, 1, NULL);
-	WARN_ON(atomic_read(&pool->item_count));
-	WARN_ON(atomic_read(&pool->free_pinned));
-	kfree(pool);
-}
-
-static inline struct rds_ib_mr *rds_ib_reuse_fmr(struct rds_ib_mr_pool *pool)
+struct rds_ib_mr *rds_ib_reuse_mr(struct rds_ib_mr_pool *pool)
 {
 	struct rds_ib_mr *ibmr = NULL;
 	struct llist_node *ret;
@@ -317,190 +208,6 @@ static inline void wait_clean_list_grace(void)
 	}
 }
 
-static struct rds_ib_mr *rds_ib_alloc_fmr(struct rds_ib_device *rds_ibdev,
-					  int npages)
-{
-	struct rds_ib_mr_pool *pool;
-	struct rds_ib_mr *ibmr = NULL;
-	int err = 0, iter = 0;
-
-	if (npages <= RDS_FMR_8K_MSG_SIZE)
-		pool = rds_ibdev->mr_8k_pool;
-	else
-		pool = rds_ibdev->mr_1m_pool;
-
-	if (atomic_read(&pool->dirty_count) >= pool->max_items / 10)
-		queue_delayed_work(rds_ib_fmr_wq, &pool->flush_worker, 10);
-
-	/* Switch pools if one of the pool is reaching upper limit */
-	if (atomic_read(&pool->dirty_count) >=  pool->max_items * 9 / 10) {
-		if (pool->pool_type == RDS_IB_MR_8K_POOL)
-			pool = rds_ibdev->mr_1m_pool;
-		else
-			pool = rds_ibdev->mr_8k_pool;
-	}
-
-	while (1) {
-		ibmr = rds_ib_reuse_fmr(pool);
-		if (ibmr)
-			return ibmr;
-
-		/* No clean MRs - now we have the choice of either
-		 * allocating a fresh MR up to the limit imposed by the
-		 * driver, or flush any dirty unused MRs.
-		 * We try to avoid stalling in the send path if possible,
-		 * so we allocate as long as we're allowed to.
-		 *
-		 * We're fussy with enforcing the FMR limit, though. If the driver
-		 * tells us we can't use more than N fmrs, we shouldn't start
-		 * arguing with it */
-		if (atomic_inc_return(&pool->item_count) <= pool->max_items)
-			break;
-
-		atomic_dec(&pool->item_count);
-
-		if (++iter > 2) {
-			if (pool->pool_type == RDS_IB_MR_8K_POOL)
-				rds_ib_stats_inc(s_ib_rdma_mr_8k_pool_depleted);
-			else
-				rds_ib_stats_inc(s_ib_rdma_mr_1m_pool_depleted);
-			return ERR_PTR(-EAGAIN);
-		}
-
-		/* We do have some empty MRs. Flush them out. */
-		if (pool->pool_type == RDS_IB_MR_8K_POOL)
-			rds_ib_stats_inc(s_ib_rdma_mr_8k_pool_wait);
-		else
-			rds_ib_stats_inc(s_ib_rdma_mr_1m_pool_wait);
-		rds_ib_flush_mr_pool(pool, 0, &ibmr);
-		if (ibmr)
-			return ibmr;
-	}
-
-	ibmr = kzalloc_node(sizeof(*ibmr), GFP_KERNEL, rdsibdev_to_node(rds_ibdev));
-	if (!ibmr) {
-		err = -ENOMEM;
-		goto out_no_cigar;
-	}
-
-	ibmr->fmr = ib_alloc_fmr(rds_ibdev->pd,
-			(IB_ACCESS_LOCAL_WRITE |
-			 IB_ACCESS_REMOTE_READ |
-			 IB_ACCESS_REMOTE_WRITE|
-			 IB_ACCESS_REMOTE_ATOMIC),
-			&pool->fmr_attr);
-	if (IS_ERR(ibmr->fmr)) {
-		err = PTR_ERR(ibmr->fmr);
-		ibmr->fmr = NULL;
-		printk(KERN_WARNING "RDS/IB: ib_alloc_fmr failed (err=%d)\n", err);
-		goto out_no_cigar;
-	}
-
-	ibmr->pool = pool;
-	if (pool->pool_type == RDS_IB_MR_8K_POOL)
-		rds_ib_stats_inc(s_ib_rdma_mr_8k_alloc);
-	else
-		rds_ib_stats_inc(s_ib_rdma_mr_1m_alloc);
-
-	return ibmr;
-
-out_no_cigar:
-	if (ibmr) {
-		if (ibmr->fmr)
-			ib_dealloc_fmr(ibmr->fmr);
-		kfree(ibmr);
-	}
-	atomic_dec(&pool->item_count);
-	return ERR_PTR(err);
-}
-
-static int rds_ib_map_fmr(struct rds_ib_device *rds_ibdev, struct rds_ib_mr *ibmr,
-	       struct scatterlist *sg, unsigned int nents)
-{
-	struct ib_device *dev = rds_ibdev->dev;
-	struct scatterlist *scat = sg;
-	u64 io_addr = 0;
-	u64 *dma_pages;
-	u32 len;
-	int page_cnt, sg_dma_len;
-	int i, j;
-	int ret;
-
-	sg_dma_len = ib_dma_map_sg(dev, sg, nents,
-				 DMA_BIDIRECTIONAL);
-	if (unlikely(!sg_dma_len)) {
-		printk(KERN_WARNING "RDS/IB: dma_map_sg failed!\n");
-		return -EBUSY;
-	}
-
-	len = 0;
-	page_cnt = 0;
-
-	for (i = 0; i < sg_dma_len; ++i) {
-		unsigned int dma_len = ib_sg_dma_len(dev, &scat[i]);
-		u64 dma_addr = ib_sg_dma_address(dev, &scat[i]);
-
-		if (dma_addr & ~PAGE_MASK) {
-			if (i > 0)
-				return -EINVAL;
-			else
-				++page_cnt;
-		}
-		if ((dma_addr + dma_len) & ~PAGE_MASK) {
-			if (i < sg_dma_len - 1)
-				return -EINVAL;
-			else
-				++page_cnt;
-		}
-
-		len += dma_len;
-	}
-
-	page_cnt += len >> PAGE_SHIFT;
-	if (page_cnt > ibmr->pool->fmr_attr.max_pages)
-		return -EINVAL;
-
-	dma_pages = kmalloc_node(sizeof(u64) * page_cnt, GFP_ATOMIC,
-				 rdsibdev_to_node(rds_ibdev));
-	if (!dma_pages)
-		return -ENOMEM;
-
-	page_cnt = 0;
-	for (i = 0; i < sg_dma_len; ++i) {
-		unsigned int dma_len = ib_sg_dma_len(dev, &scat[i]);
-		u64 dma_addr = ib_sg_dma_address(dev, &scat[i]);
-
-		for (j = 0; j < dma_len; j += PAGE_SIZE)
-			dma_pages[page_cnt++] =
-				(dma_addr & PAGE_MASK) + j;
-	}
-
-	ret = ib_map_phys_fmr(ibmr->fmr,
-				   dma_pages, page_cnt, io_addr);
-	if (ret)
-		goto out;
-
-	/* Success - we successfully remapped the MR, so we can
-	 * safely tear down the old mapping. */
-	rds_ib_teardown_mr(ibmr);
-
-	ibmr->sg = scat;
-	ibmr->sg_len = nents;
-	ibmr->sg_dma_len = sg_dma_len;
-	ibmr->remap_count++;
-
-	if (ibmr->pool->pool_type == RDS_IB_MR_8K_POOL)
-		rds_ib_stats_inc(s_ib_rdma_mr_8k_used);
-	else
-		rds_ib_stats_inc(s_ib_rdma_mr_1m_used);
-	ret = 0;
-
-out:
-	kfree(dma_pages);
-
-	return ret;
-}
-
 void rds_ib_sync_mr(void *trans_private, int direction)
 {
 	struct rds_ib_mr *ibmr = trans_private;
@@ -518,7 +225,7 @@ void rds_ib_sync_mr(void *trans_private, int direction)
 	}
 }
 
-static void __rds_ib_teardown_mr(struct rds_ib_mr *ibmr)
+void __rds_ib_teardown_mr(struct rds_ib_mr *ibmr)
 {
 	struct rds_ib_device *rds_ibdev = ibmr->device;
 
@@ -549,7 +256,7 @@ static void __rds_ib_teardown_mr(struct rds_ib_mr *ibmr)
 	}
 }
 
-static void rds_ib_teardown_mr(struct rds_ib_mr *ibmr)
+void rds_ib_teardown_mr(struct rds_ib_mr *ibmr)
 {
 	unsigned int pinned = ibmr->sg_len;
 
@@ -623,8 +330,8 @@ static void list_to_llist_nodes(struct rds_ib_mr_pool *pool,
  * If the number of MRs allocated exceeds the limit, we also try
  * to free as many MRs as needed to get back to this limit.
  */
-static int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
-				int free_all, struct rds_ib_mr **ibmr_ret)
+int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
+			 int free_all, struct rds_ib_mr **ibmr_ret)
 {
 	struct rds_ib_mr *ibmr, *next;
 	struct llist_node *clean_nodes;
@@ -643,7 +350,7 @@ static int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
 	if (ibmr_ret) {
 		DEFINE_WAIT(wait);
 		while (!mutex_trylock(&pool->flush_lock)) {
-			ibmr = rds_ib_reuse_fmr(pool);
+			ibmr = rds_ib_reuse_mr(pool);
 			if (ibmr) {
 				*ibmr_ret = ibmr;
 				finish_wait(&pool->flush_wait, &wait);
@@ -655,7 +362,7 @@ static int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
 			if (llist_empty(&pool->clean_list))
 				schedule();
 
-			ibmr = rds_ib_reuse_fmr(pool);
+			ibmr = rds_ib_reuse_mr(pool);
 			if (ibmr) {
 				*ibmr_ret = ibmr;
 				finish_wait(&pool->flush_wait, &wait);
@@ -667,7 +374,7 @@ static int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
 		mutex_lock(&pool->flush_lock);
 
 	if (ibmr_ret) {
-		ibmr = rds_ib_reuse_fmr(pool);
+		ibmr = rds_ib_reuse_mr(pool);
 		if (ibmr) {
 			*ibmr_ret = ibmr;
 			goto out;
@@ -773,7 +480,7 @@ void rds_ib_free_mr(void *trans_private, int invalidate)
 	/* If we've pinned too many pages, request a flush */
 	if (atomic_read(&pool->free_pinned) >= pool->max_free_pinned ||
 	    atomic_read(&pool->dirty_count) >= pool->max_items / 5)
-		queue_delayed_work(rds_ib_fmr_wq, &pool->flush_worker, 10);
+		queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10);
 
 	if (invalidate) {
 		if (likely(!in_interrupt())) {
@@ -782,7 +489,7 @@ void rds_ib_free_mr(void *trans_private, int invalidate)
 			/* We get here if the user created a MR marked
 			 * as use_once and invalidate at the same time.
 			 */
-			queue_delayed_work(rds_ib_fmr_wq,
+			queue_delayed_work(rds_ib_mr_wq,
 					   &pool->flush_worker, 10);
 		}
 	}
@@ -849,3 +556,63 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 	return ibmr;
 }
 
+void rds_ib_destroy_mr_pool(struct rds_ib_mr_pool *pool)
+{
+	cancel_delayed_work_sync(&pool->flush_worker);
+	rds_ib_flush_mr_pool(pool, 1, NULL);
+	WARN_ON(atomic_read(&pool->item_count));
+	WARN_ON(atomic_read(&pool->free_pinned));
+	kfree(pool);
+}
+
+struct rds_ib_mr_pool *rds_ib_create_mr_pool(struct rds_ib_device *rds_ibdev,
+					     int pool_type)
+{
+	struct rds_ib_mr_pool *pool;
+
+	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+	if (!pool)
+		return ERR_PTR(-ENOMEM);
+
+	pool->pool_type = pool_type;
+	init_llist_head(&pool->free_list);
+	init_llist_head(&pool->drop_list);
+	init_llist_head(&pool->clean_list);
+	mutex_init(&pool->flush_lock);
+	init_waitqueue_head(&pool->flush_wait);
+	INIT_DELAYED_WORK(&pool->flush_worker, rds_ib_mr_pool_flush_worker);
+
+	if (pool_type == RDS_IB_MR_1M_POOL) {
+		/* +1 allows for unaligned MRs */
+		pool->fmr_attr.max_pages = RDS_MR_1M_MSG_SIZE + 1;
+		pool->max_items = RDS_MR_1M_POOL_SIZE;
+	} else {
+		/* pool_type == RDS_IB_MR_8K_POOL */
+		pool->fmr_attr.max_pages = RDS_MR_8K_MSG_SIZE + 1;
+		pool->max_items = RDS_MR_8K_POOL_SIZE;
+	}
+
+	pool->max_free_pinned = pool->max_items * pool->fmr_attr.max_pages / 4;
+	pool->fmr_attr.max_maps = rds_ibdev->fmr_max_remaps;
+	pool->fmr_attr.page_shift = PAGE_SHIFT;
+	pool->max_items_soft = rds_ibdev->max_mrs * 3 / 4;
+
+	return pool;
+}
+
+int rds_ib_mr_init(void)
+{
+	rds_ib_mr_wq = create_workqueue("rds_mr_flushd");
+	if (!rds_ib_mr_wq)
+		return -ENOMEM;
+	return 0;
+}
+
+/* By the time this is called all the IB devices should have been torn down and
+ * had their pools freed.  As each pool is freed its work struct is waited on,
+ * so the pool flushing work queue should be idle by the time we get here.
+ */
+void rds_ib_mr_exit(void)
+{
+	destroy_workqueue(rds_ib_mr_wq);
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 06/13] RDS: IB: create struct rds_ib_fmr
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
                   ` (4 preceding siblings ...)
  2016-02-28  2:19 ` [net-next][PATCH v2 05/13] RDS: IB: Re-organise ibmr code Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 07/13] RDS: IB: move FMR code to its own file Santosh Shilimkar
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

Keep fmr related filed in its own struct. Fastreg MR structure
will be added to the union.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/ib_fmr.c  | 17 ++++++++++-------
 net/rds/ib_mr.h   | 11 +++++++++--
 net/rds/ib_rdma.c | 14 ++++++++++----
 3 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/net/rds/ib_fmr.c b/net/rds/ib_fmr.c
index d4f200d..74f2c21 100644
--- a/net/rds/ib_fmr.c
+++ b/net/rds/ib_fmr.c
@@ -36,6 +36,7 @@ struct rds_ib_mr *rds_ib_alloc_fmr(struct rds_ib_device *rds_ibdev, int npages)
 {
 	struct rds_ib_mr_pool *pool;
 	struct rds_ib_mr *ibmr = NULL;
+	struct rds_ib_fmr *fmr;
 	int err = 0, iter = 0;
 
 	if (npages <= RDS_MR_8K_MSG_SIZE)
@@ -99,15 +100,16 @@ struct rds_ib_mr *rds_ib_alloc_fmr(struct rds_ib_device *rds_ibdev, int npages)
 		goto out_no_cigar;
 	}
 
-	ibmr->fmr = ib_alloc_fmr(rds_ibdev->pd,
+	fmr = &ibmr->u.fmr;
+	fmr->fmr = ib_alloc_fmr(rds_ibdev->pd,
 			(IB_ACCESS_LOCAL_WRITE |
 			 IB_ACCESS_REMOTE_READ |
 			 IB_ACCESS_REMOTE_WRITE |
 			 IB_ACCESS_REMOTE_ATOMIC),
 			&pool->fmr_attr);
-	if (IS_ERR(ibmr->fmr)) {
-		err = PTR_ERR(ibmr->fmr);
-		ibmr->fmr = NULL;
+	if (IS_ERR(fmr->fmr)) {
+		err = PTR_ERR(fmr->fmr);
+		fmr->fmr = NULL;
 		pr_warn("RDS/IB: %s failed (err=%d)\n", __func__, err);
 		goto out_no_cigar;
 	}
@@ -122,8 +124,8 @@ struct rds_ib_mr *rds_ib_alloc_fmr(struct rds_ib_device *rds_ibdev, int npages)
 
 out_no_cigar:
 	if (ibmr) {
-		if (ibmr->fmr)
-			ib_dealloc_fmr(ibmr->fmr);
+		if (fmr->fmr)
+			ib_dealloc_fmr(fmr->fmr);
 		kfree(ibmr);
 	}
 	atomic_dec(&pool->item_count);
@@ -134,6 +136,7 @@ int rds_ib_map_fmr(struct rds_ib_device *rds_ibdev, struct rds_ib_mr *ibmr,
 		   struct scatterlist *sg, unsigned int nents)
 {
 	struct ib_device *dev = rds_ibdev->dev;
+	struct rds_ib_fmr *fmr = &ibmr->u.fmr;
 	struct scatterlist *scat = sg;
 	u64 io_addr = 0;
 	u64 *dma_pages;
@@ -190,7 +193,7 @@ int rds_ib_map_fmr(struct rds_ib_device *rds_ibdev, struct rds_ib_mr *ibmr,
 				(dma_addr & PAGE_MASK) + j;
 	}
 
-	ret = ib_map_phys_fmr(ibmr->fmr, dma_pages, page_cnt, io_addr);
+	ret = ib_map_phys_fmr(fmr->fmr, dma_pages, page_cnt, io_addr);
 	if (ret)
 		goto out;
 
diff --git a/net/rds/ib_mr.h b/net/rds/ib_mr.h
index d88724f..309ad59 100644
--- a/net/rds/ib_mr.h
+++ b/net/rds/ib_mr.h
@@ -43,11 +43,15 @@
 #define RDS_MR_8K_SCALE			(256 / (RDS_MR_8K_MSG_SIZE + 1))
 #define RDS_MR_8K_POOL_SIZE		(RDS_MR_8K_SCALE * (8192 / 2))
 
+struct rds_ib_fmr {
+	struct ib_fmr		*fmr;
+	u64			*dma;
+};
+
 /* This is stored as mr->r_trans_private. */
 struct rds_ib_mr {
 	struct rds_ib_device	*device;
 	struct rds_ib_mr_pool	*pool;
-	struct ib_fmr		*fmr;
 
 	struct llist_node	llnode;
 
@@ -57,8 +61,11 @@ struct rds_ib_mr {
 
 	struct scatterlist	*sg;
 	unsigned int		sg_len;
-	u64			*dma;
 	int			sg_dma_len;
+
+	union {
+		struct rds_ib_fmr	fmr;
+	} u;
 };
 
 /* Our own little MR pool */
diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index c594519..9e608d9 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -334,6 +334,7 @@ int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
 			 int free_all, struct rds_ib_mr **ibmr_ret)
 {
 	struct rds_ib_mr *ibmr, *next;
+	struct rds_ib_fmr *fmr;
 	struct llist_node *clean_nodes;
 	struct llist_node *clean_tail;
 	LIST_HEAD(unmap_list);
@@ -395,8 +396,10 @@ int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
 		goto out;
 
 	/* String all ib_mr's onto one list and hand them to ib_unmap_fmr */
-	list_for_each_entry(ibmr, &unmap_list, unmap_list)
-		list_add(&ibmr->fmr->list, &fmr_list);
+	list_for_each_entry(ibmr, &unmap_list, unmap_list) {
+		fmr = &ibmr->u.fmr;
+		list_add(&fmr->fmr->list, &fmr_list);
+	}
 
 	ret = ib_unmap_fmr(&fmr_list);
 	if (ret)
@@ -405,6 +408,7 @@ int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
 	/* Now we can destroy the DMA mapping and unpin any pages */
 	list_for_each_entry_safe(ibmr, next, &unmap_list, unmap_list) {
 		unpinned += ibmr->sg_len;
+		fmr = &ibmr->u.fmr;
 		__rds_ib_teardown_mr(ibmr);
 		if (nfreed < free_goal ||
 		    ibmr->remap_count >= pool->fmr_attr.max_maps) {
@@ -413,7 +417,7 @@ int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
 			else
 				rds_ib_stats_inc(s_ib_rdma_mr_1m_free);
 			list_del(&ibmr->unmap_list);
-			ib_dealloc_fmr(ibmr->fmr);
+			ib_dealloc_fmr(fmr->fmr);
 			kfree(ibmr);
 			nfreed++;
 		}
@@ -517,6 +521,7 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 {
 	struct rds_ib_device *rds_ibdev;
 	struct rds_ib_mr *ibmr = NULL;
+	struct rds_ib_fmr *fmr;
 	int ret;
 
 	rds_ibdev = rds_ib_get_device(rs->rs_bound_addr);
@@ -536,9 +541,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 		return ibmr;
 	}
 
+	fmr = &ibmr->u.fmr;
 	ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
 	if (ret == 0)
-		*key_ret = ibmr->fmr->rkey;
+		*key_ret = fmr->fmr->rkey;
 	else
 		printk(KERN_WARNING "RDS/IB: map_fmr failed (errno=%d)\n", ret);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 07/13] RDS: IB: move FMR code to its own file
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
                   ` (5 preceding siblings ...)
  2016-02-28  2:19 ` [net-next][PATCH v2 06/13] RDS: IB: create struct rds_ib_fmr Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 08/13] RDS: IB: add connection info to ibmr Santosh Shilimkar
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

No functional change.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/ib_fmr.c  | 126 +++++++++++++++++++++++++++++++++---------------------
 net/rds/ib_mr.h   |   6 +++
 net/rds/ib_rdma.c | 108 ++++++++++++++++++++++------------------------
 3 files changed, 134 insertions(+), 106 deletions(-)

diff --git a/net/rds/ib_fmr.c b/net/rds/ib_fmr.c
index 74f2c21..4fe8f4f 100644
--- a/net/rds/ib_fmr.c
+++ b/net/rds/ib_fmr.c
@@ -37,61 +37,16 @@ struct rds_ib_mr *rds_ib_alloc_fmr(struct rds_ib_device *rds_ibdev, int npages)
 	struct rds_ib_mr_pool *pool;
 	struct rds_ib_mr *ibmr = NULL;
 	struct rds_ib_fmr *fmr;
-	int err = 0, iter = 0;
+	int err = 0;
 
 	if (npages <= RDS_MR_8K_MSG_SIZE)
 		pool = rds_ibdev->mr_8k_pool;
 	else
 		pool = rds_ibdev->mr_1m_pool;
 
-	if (atomic_read(&pool->dirty_count) >= pool->max_items / 10)
-		queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10);
-
-	/* Switch pools if one of the pool is reaching upper limit */
-	if (atomic_read(&pool->dirty_count) >=  pool->max_items * 9 / 10) {
-		if (pool->pool_type == RDS_IB_MR_8K_POOL)
-			pool = rds_ibdev->mr_1m_pool;
-		else
-			pool = rds_ibdev->mr_8k_pool;
-	}
-
-	while (1) {
-		ibmr = rds_ib_reuse_mr(pool);
-		if (ibmr)
-			return ibmr;
-
-		/* No clean MRs - now we have the choice of either
-		 * allocating a fresh MR up to the limit imposed by the
-		 * driver, or flush any dirty unused MRs.
-		 * We try to avoid stalling in the send path if possible,
-		 * so we allocate as long as we're allowed to.
-		 *
-		 * We're fussy with enforcing the FMR limit, though. If the
-		 * driver tells us we can't use more than N fmrs, we shouldn't
-		 * start arguing with it
-		 */
-		if (atomic_inc_return(&pool->item_count) <= pool->max_items)
-			break;
-
-		atomic_dec(&pool->item_count);
-
-		if (++iter > 2) {
-			if (pool->pool_type == RDS_IB_MR_8K_POOL)
-				rds_ib_stats_inc(s_ib_rdma_mr_8k_pool_depleted);
-			else
-				rds_ib_stats_inc(s_ib_rdma_mr_1m_pool_depleted);
-			return ERR_PTR(-EAGAIN);
-		}
-
-		/* We do have some empty MRs. Flush them out. */
-		if (pool->pool_type == RDS_IB_MR_8K_POOL)
-			rds_ib_stats_inc(s_ib_rdma_mr_8k_pool_wait);
-		else
-			rds_ib_stats_inc(s_ib_rdma_mr_1m_pool_wait);
-		rds_ib_flush_mr_pool(pool, 0, &ibmr);
-		if (ibmr)
-			return ibmr;
-	}
+	ibmr = rds_ib_try_reuse_ibmr(pool);
+	if (ibmr)
+		return ibmr;
 
 	ibmr = kzalloc_node(sizeof(*ibmr), GFP_KERNEL,
 			    rdsibdev_to_node(rds_ibdev));
@@ -218,3 +173,76 @@ out:
 
 	return ret;
 }
+
+struct rds_ib_mr *rds_ib_reg_fmr(struct rds_ib_device *rds_ibdev,
+				 struct scatterlist *sg,
+				 unsigned long nents,
+				 u32 *key)
+{
+	struct rds_ib_mr *ibmr = NULL;
+	struct rds_ib_fmr *fmr;
+	int ret;
+
+	ibmr = rds_ib_alloc_fmr(rds_ibdev, nents);
+	if (IS_ERR(ibmr))
+		return ibmr;
+
+	ibmr->device = rds_ibdev;
+	fmr = &ibmr->u.fmr;
+	ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
+	if (ret == 0)
+		*key = fmr->fmr->rkey;
+	else
+		rds_ib_free_mr(ibmr, 0);
+
+	return ibmr;
+}
+
+void rds_ib_unreg_fmr(struct list_head *list, unsigned int *nfreed,
+		      unsigned long *unpinned, unsigned int goal)
+{
+	struct rds_ib_mr *ibmr, *next;
+	struct rds_ib_fmr *fmr;
+	LIST_HEAD(fmr_list);
+	int ret = 0;
+	unsigned int freed = *nfreed;
+
+	/* String all ib_mr's onto one list and hand them to  ib_unmap_fmr */
+	list_for_each_entry(ibmr, list, unmap_list) {
+		fmr = &ibmr->u.fmr;
+		list_add(&fmr->fmr->list, &fmr_list);
+	}
+
+	ret = ib_unmap_fmr(&fmr_list);
+	if (ret)
+		pr_warn("RDS/IB: FMR invalidation failed (err=%d)\n", ret);
+
+	/* Now we can destroy the DMA mapping and unpin any pages */
+	list_for_each_entry_safe(ibmr, next, list, unmap_list) {
+		fmr = &ibmr->u.fmr;
+		*unpinned += ibmr->sg_len;
+		__rds_ib_teardown_mr(ibmr);
+		if (freed < goal ||
+		    ibmr->remap_count >= ibmr->pool->fmr_attr.max_maps) {
+			if (ibmr->pool->pool_type == RDS_IB_MR_8K_POOL)
+				rds_ib_stats_inc(s_ib_rdma_mr_8k_free);
+			else
+				rds_ib_stats_inc(s_ib_rdma_mr_1m_free);
+			list_del(&ibmr->unmap_list);
+			ib_dealloc_fmr(fmr->fmr);
+			kfree(ibmr);
+			freed++;
+		}
+	}
+	*nfreed = freed;
+}
+
+void rds_ib_free_fmr_list(struct rds_ib_mr *ibmr)
+{
+	struct rds_ib_mr_pool *pool = ibmr->pool;
+
+	if (ibmr->remap_count >= pool->fmr_attr.max_maps)
+		llist_add(&ibmr->llnode, &pool->drop_list);
+	else
+		llist_add(&ibmr->llnode, &pool->free_list);
+}
diff --git a/net/rds/ib_mr.h b/net/rds/ib_mr.h
index 309ad59..f5c1fcb 100644
--- a/net/rds/ib_mr.h
+++ b/net/rds/ib_mr.h
@@ -113,4 +113,10 @@ int rds_ib_map_fmr(struct rds_ib_device *, struct rds_ib_mr *,
 		   struct scatterlist *, unsigned int);
 struct rds_ib_mr *rds_ib_reuse_mr(struct rds_ib_mr_pool *);
 int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *, int, struct rds_ib_mr **);
+struct rds_ib_mr *rds_ib_reg_fmr(struct rds_ib_device *, struct scatterlist *,
+				 unsigned long, u32 *);
+struct rds_ib_mr *rds_ib_try_reuse_ibmr(struct rds_ib_mr_pool *);
+void rds_ib_unreg_fmr(struct list_head *, unsigned int *,
+		      unsigned long *, unsigned int);
+void rds_ib_free_fmr_list(struct rds_ib_mr *);
 #endif
diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 9e608d9..0e84843 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -333,15 +333,12 @@ static void list_to_llist_nodes(struct rds_ib_mr_pool *pool,
 int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
 			 int free_all, struct rds_ib_mr **ibmr_ret)
 {
-	struct rds_ib_mr *ibmr, *next;
-	struct rds_ib_fmr *fmr;
+	struct rds_ib_mr *ibmr;
 	struct llist_node *clean_nodes;
 	struct llist_node *clean_tail;
 	LIST_HEAD(unmap_list);
-	LIST_HEAD(fmr_list);
 	unsigned long unpinned = 0;
 	unsigned int nfreed = 0, dirty_to_clean = 0, free_goal;
-	int ret = 0;
 
 	if (pool->pool_type == RDS_IB_MR_8K_POOL)
 		rds_ib_stats_inc(s_ib_rdma_mr_8k_pool_flush);
@@ -395,33 +392,7 @@ int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
 	if (list_empty(&unmap_list))
 		goto out;
 
-	/* String all ib_mr's onto one list and hand them to ib_unmap_fmr */
-	list_for_each_entry(ibmr, &unmap_list, unmap_list) {
-		fmr = &ibmr->u.fmr;
-		list_add(&fmr->fmr->list, &fmr_list);
-	}
-
-	ret = ib_unmap_fmr(&fmr_list);
-	if (ret)
-		printk(KERN_WARNING "RDS/IB: ib_unmap_fmr failed (err=%d)\n", ret);
-
-	/* Now we can destroy the DMA mapping and unpin any pages */
-	list_for_each_entry_safe(ibmr, next, &unmap_list, unmap_list) {
-		unpinned += ibmr->sg_len;
-		fmr = &ibmr->u.fmr;
-		__rds_ib_teardown_mr(ibmr);
-		if (nfreed < free_goal ||
-		    ibmr->remap_count >= pool->fmr_attr.max_maps) {
-			if (ibmr->pool->pool_type == RDS_IB_MR_8K_POOL)
-				rds_ib_stats_inc(s_ib_rdma_mr_8k_free);
-			else
-				rds_ib_stats_inc(s_ib_rdma_mr_1m_free);
-			list_del(&ibmr->unmap_list);
-			ib_dealloc_fmr(fmr->fmr);
-			kfree(ibmr);
-			nfreed++;
-		}
-	}
+	rds_ib_unreg_fmr(&unmap_list, &nfreed, &unpinned, free_goal);
 
 	if (!list_empty(&unmap_list)) {
 		/* we have to make sure that none of the things we're about
@@ -454,7 +425,47 @@ out:
 	if (waitqueue_active(&pool->flush_wait))
 		wake_up(&pool->flush_wait);
 out_nolock:
-	return ret;
+	return 0;
+}
+
+struct rds_ib_mr *rds_ib_try_reuse_ibmr(struct rds_ib_mr_pool *pool)
+{
+	struct rds_ib_mr *ibmr = NULL;
+	int iter = 0;
+
+	if (atomic_read(&pool->dirty_count) >= pool->max_items_soft / 10)
+		queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10);
+
+	while (1) {
+		ibmr = rds_ib_reuse_mr(pool);
+		if (ibmr)
+			return ibmr;
+
+		if (atomic_inc_return(&pool->item_count) <= pool->max_items)
+			break;
+
+		atomic_dec(&pool->item_count);
+
+		if (++iter > 2) {
+			if (pool->pool_type == RDS_IB_MR_8K_POOL)
+				rds_ib_stats_inc(s_ib_rdma_mr_8k_pool_depleted);
+			else
+				rds_ib_stats_inc(s_ib_rdma_mr_1m_pool_depleted);
+			return ERR_PTR(-EAGAIN);
+		}
+
+		/* We do have some empty MRs. Flush them out. */
+		if (pool->pool_type == RDS_IB_MR_8K_POOL)
+			rds_ib_stats_inc(s_ib_rdma_mr_8k_pool_wait);
+		else
+			rds_ib_stats_inc(s_ib_rdma_mr_1m_pool_wait);
+
+		rds_ib_flush_mr_pool(pool, 0, &ibmr);
+		if (ibmr)
+			return ibmr;
+	}
+
+	return ibmr;
 }
 
 static void rds_ib_mr_pool_flush_worker(struct work_struct *work)
@@ -473,10 +484,7 @@ void rds_ib_free_mr(void *trans_private, int invalidate)
 	rdsdebug("RDS/IB: free_mr nents %u\n", ibmr->sg_len);
 
 	/* Return it to the pool's free list */
-	if (ibmr->remap_count >= pool->fmr_attr.max_maps)
-		llist_add(&ibmr->llnode, &pool->drop_list);
-	else
-		llist_add(&ibmr->llnode, &pool->free_list);
+	rds_ib_free_fmr_list(ibmr);
 
 	atomic_add(ibmr->sg_len, &pool->free_pinned);
 	atomic_inc(&pool->dirty_count);
@@ -521,7 +529,6 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 {
 	struct rds_ib_device *rds_ibdev;
 	struct rds_ib_mr *ibmr = NULL;
-	struct rds_ib_fmr *fmr;
 	int ret;
 
 	rds_ibdev = rds_ib_get_device(rs->rs_bound_addr);
@@ -535,30 +542,17 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 		goto out;
 	}
 
-	ibmr = rds_ib_alloc_fmr(rds_ibdev, nents);
-	if (IS_ERR(ibmr)) {
-		rds_ib_dev_put(rds_ibdev);
-		return ibmr;
-	}
-
-	fmr = &ibmr->u.fmr;
-	ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
-	if (ret == 0)
-		*key_ret = fmr->fmr->rkey;
-	else
-		printk(KERN_WARNING "RDS/IB: map_fmr failed (errno=%d)\n", ret);
-
-	ibmr->device = rds_ibdev;
-	rds_ibdev = NULL;
+	ibmr = rds_ib_reg_fmr(rds_ibdev, sg, nents, key_ret);
+	if (ibmr)
+		rds_ibdev = NULL;
 
  out:
-	if (ret) {
-		if (ibmr)
-			rds_ib_free_mr(ibmr, 0);
-		ibmr = ERR_PTR(ret);
-	}
+	if (!ibmr)
+		pr_warn("RDS/IB: rds_ib_get_mr failed (errno=%d)\n", ret);
+
 	if (rds_ibdev)
 		rds_ib_dev_put(rds_ibdev);
+
 	return ibmr;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 08/13] RDS: IB: add connection info to ibmr
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
                   ` (6 preceding siblings ...)
  2016-02-28  2:19 ` [net-next][PATCH v2 07/13] RDS: IB: move FMR code to its own file Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 09/13] RDS: IB: handle the RDMA CM time wait event Santosh Shilimkar
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

Preperatory patch for FRMR support. From connection info,
we can retrieve cm_id which contains qp handled needed for
work request posting.

We also need to drop the RDS connection on QP error states
where connection handle becomes useful.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/ib_mr.h | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/net/rds/ib_mr.h b/net/rds/ib_mr.h
index f5c1fcb..add7725 100644
--- a/net/rds/ib_mr.h
+++ b/net/rds/ib_mr.h
@@ -50,18 +50,19 @@ struct rds_ib_fmr {
 
 /* This is stored as mr->r_trans_private. */
 struct rds_ib_mr {
-	struct rds_ib_device	*device;
-	struct rds_ib_mr_pool	*pool;
+	struct rds_ib_device		*device;
+	struct rds_ib_mr_pool		*pool;
+	struct rds_ib_connection	*ic;
 
-	struct llist_node	llnode;
+	struct llist_node		llnode;
 
 	/* unmap_list is for freeing */
-	struct list_head	unmap_list;
-	unsigned int		remap_count;
+	struct list_head		unmap_list;
+	unsigned int			remap_count;
 
-	struct scatterlist	*sg;
-	unsigned int		sg_len;
-	int			sg_dma_len;
+	struct scatterlist		*sg;
+	unsigned int			sg_len;
+	int				sg_dma_len;
 
 	union {
 		struct rds_ib_fmr	fmr;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 09/13] RDS: IB: handle the RDMA CM time wait event
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
                   ` (7 preceding siblings ...)
  2016-02-28  2:19 ` [net-next][PATCH v2 08/13] RDS: IB: add connection info to ibmr Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 10/13] RDS: IB: add mr reused stats Santosh Shilimkar
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

Drop the RDS connection on RDMA_CM_EVENT_TIMEWAIT_EXIT so that
it can reconnect and resume.

While testing fastreg, this error happened in couple of tests but
was getting un-noticed.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/rdma_transport.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index 4f4b3d8..7220beb 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -117,6 +117,14 @@ int rds_rdma_cm_event_handler(struct rdma_cm_id *cm_id,
 		rds_conn_drop(conn);
 		break;
 
+	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
+		if (conn) {
+			pr_info("RDS: RDMA_CM_EVENT_TIMEWAIT_EXIT event: dropping connection %pI4->%pI4\n",
+				&conn->c_laddr, &conn->c_faddr);
+			rds_conn_drop(conn);
+		}
+		break;
+
 	default:
 		/* things like device disconnect? */
 		printk(KERN_ERR "RDS: unknown event %u (%s)!\n",
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 10/13] RDS: IB: add mr reused stats
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
                   ` (8 preceding siblings ...)
  2016-02-28  2:19 ` [net-next][PATCH v2 09/13] RDS: IB: handle the RDMA CM time wait event Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 11/13] RDS: IB: add Fastreg MR (FRMR) detection support Santosh Shilimkar
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

Add MR reuse statistics to RDS IB transport.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/ib.h       | 2 ++
 net/rds/ib_rdma.c  | 7 ++++++-
 net/rds/ib_stats.c | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/rds/ib.h b/net/rds/ib.h
index c88cb22..62fe7d5 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -259,6 +259,8 @@ struct rds_ib_statistics {
 	uint64_t	s_ib_rdma_mr_1m_pool_flush;
 	uint64_t	s_ib_rdma_mr_1m_pool_wait;
 	uint64_t	s_ib_rdma_mr_1m_pool_depleted;
+	uint64_t	s_ib_rdma_mr_8k_reused;
+	uint64_t	s_ib_rdma_mr_1m_reused;
 	uint64_t	s_ib_atomic_cswp;
 	uint64_t	s_ib_atomic_fadd;
 };
diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 0e84843..ec7ea32 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -188,8 +188,13 @@ struct rds_ib_mr *rds_ib_reuse_mr(struct rds_ib_mr_pool *pool)
 	flag = this_cpu_ptr(&clean_list_grace);
 	set_bit(CLEAN_LIST_BUSY_BIT, flag);
 	ret = llist_del_first(&pool->clean_list);
-	if (ret)
+	if (ret) {
 		ibmr = llist_entry(ret, struct rds_ib_mr, llnode);
+		if (pool->pool_type == RDS_IB_MR_8K_POOL)
+			rds_ib_stats_inc(s_ib_rdma_mr_8k_reused);
+		else
+			rds_ib_stats_inc(s_ib_rdma_mr_1m_reused);
+	}
 
 	clear_bit(CLEAN_LIST_BUSY_BIT, flag);
 	preempt_enable();
diff --git a/net/rds/ib_stats.c b/net/rds/ib_stats.c
index d77e044..7e78dca 100644
--- a/net/rds/ib_stats.c
+++ b/net/rds/ib_stats.c
@@ -73,6 +73,8 @@ static const char *const rds_ib_stat_names[] = {
 	"ib_rdma_mr_1m_pool_flush",
 	"ib_rdma_mr_1m_pool_wait",
 	"ib_rdma_mr_1m_pool_depleted",
+	"ib_rdma_mr_8k_reused",
+	"ib_rdma_mr_1m_reused",
 	"ib_atomic_cswp",
 	"ib_atomic_fadd",
 };
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 11/13] RDS: IB: add Fastreg MR (FRMR) detection support
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
                   ` (9 preceding siblings ...)
  2016-02-28  2:19 ` [net-next][PATCH v2 10/13] RDS: IB: add mr reused stats Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-02-28  9:08   ` Christoph Hellwig
  2016-02-28  2:19 ` [net-next][PATCH v2 12/13] RDS: IB: allocate extra space on queues for FRMR support Santosh Shilimkar
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

Discovere Fast Memmory Registration support using IB device
IB_DEVICE_MEM_MGT_EXTENSIONS. Certain HCA might support just FRMR
or FMR or both FMR and FRWR. In case both mr type are supported,
default FMR is used.

Default MR is still kept as FMR against what everyone else
is following. Default will be changed to FRMR once the
RDS performance with FRMR is comparable with FMR. The
work is in progress for the same.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
v2: Dropped the module parameter as suggested by David Miller

 net/rds/ib.c    | 10 ++++++++++
 net/rds/ib.h    |  4 ++++
 net/rds/ib_mr.h |  1 +
 3 files changed, 15 insertions(+)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index bb32cb9..b5342fd 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -140,6 +140,12 @@ static void rds_ib_add_one(struct ib_device *device)
 	rds_ibdev->max_wrs = device->attrs.max_qp_wr;
 	rds_ibdev->max_sge = min(device->attrs.max_sge, RDS_IB_MAX_SGE);
 
+	rds_ibdev->has_fr = (device->attrs.device_cap_flags &
+				  IB_DEVICE_MEM_MGT_EXTENSIONS);
+	rds_ibdev->has_fmr = (device->alloc_fmr && device->dealloc_fmr &&
+			    device->map_phys_fmr && device->unmap_fmr);
+	rds_ibdev->use_fastreg = (rds_ibdev->has_fr && !rds_ibdev->has_fmr);
+
 	rds_ibdev->fmr_max_remaps = device->attrs.max_map_per_fmr?: 32;
 	rds_ibdev->max_1m_mrs = device->attrs.max_mr ?
 		min_t(unsigned int, (device->attrs.max_mr / 2),
@@ -178,6 +184,10 @@ static void rds_ib_add_one(struct ib_device *device)
 		 rds_ibdev->fmr_max_remaps, rds_ibdev->max_1m_mrs,
 		 rds_ibdev->max_8k_mrs);
 
+	pr_info("RDS/IB: %s: %s supported and preferred\n",
+		device->name,
+		rds_ibdev->use_fastreg ? "FRMR" : "FMR");
+
 	INIT_LIST_HEAD(&rds_ibdev->ipaddr_list);
 	INIT_LIST_HEAD(&rds_ibdev->conn_list);
 
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 62fe7d5..c5eddc2 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -200,6 +200,10 @@ struct rds_ib_device {
 	struct list_head	conn_list;
 	struct ib_device	*dev;
 	struct ib_pd		*pd;
+	bool                    has_fmr;
+	bool                    has_fr;
+	bool                    use_fastreg;
+
 	unsigned int		max_mrs;
 	struct rds_ib_mr_pool	*mr_1m_pool;
 	struct rds_ib_mr_pool   *mr_8k_pool;
diff --git a/net/rds/ib_mr.h b/net/rds/ib_mr.h
index add7725..2f9b9c3 100644
--- a/net/rds/ib_mr.h
+++ b/net/rds/ib_mr.h
@@ -93,6 +93,7 @@ struct rds_ib_mr_pool {
 extern struct workqueue_struct *rds_ib_mr_wq;
 extern unsigned int rds_ib_mr_1m_pool_size;
 extern unsigned int rds_ib_mr_8k_pool_size;
+extern bool prefer_frmr;
 
 struct rds_ib_mr_pool *rds_ib_create_mr_pool(struct rds_ib_device *rds_dev,
 					     int npages);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 12/13] RDS: IB: allocate extra space on queues for FRMR support
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
                   ` (10 preceding siblings ...)
  2016-02-28  2:19 ` [net-next][PATCH v2 11/13] RDS: IB: add Fastreg MR (FRMR) detection support Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-02-28  2:19 ` [net-next][PATCH v2 13/13] RDS: IB: Support Fastreg MR (FRMR) memory registration mode Santosh Shilimkar
  2016-03-01 22:33 ` [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 David Miller
  13 siblings, 0 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

Fastreg MR(FRMR) memory registration and invalidation makes use
of work request and completion queues for its operation. Patch
allocates extra queue space towards these operation(s).

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/ib.h    |  4 ++++
 net/rds/ib_cm.c | 16 ++++++++++++----
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/net/rds/ib.h b/net/rds/ib.h
index c5eddc2..eeb0d6c 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -14,6 +14,7 @@
 
 #define RDS_IB_DEFAULT_RECV_WR		1024
 #define RDS_IB_DEFAULT_SEND_WR		256
+#define RDS_IB_DEFAULT_FR_WR		512
 
 #define RDS_IB_DEFAULT_RETRY_COUNT	2
 
@@ -122,6 +123,9 @@ struct rds_ib_connection {
 	struct ib_wc		i_send_wc[RDS_IB_WC_MAX];
 	struct ib_wc		i_recv_wc[RDS_IB_WC_MAX];
 
+	/* To control the number of wrs from fastreg */
+	atomic_t		i_fastreg_wrs;
+
 	/* interrupt handling */
 	struct tasklet_struct	i_send_tasklet;
 	struct tasklet_struct	i_recv_tasklet;
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index 7f68abc..83f4673 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -363,7 +363,7 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 	struct ib_qp_init_attr attr;
 	struct ib_cq_init_attr cq_attr = {};
 	struct rds_ib_device *rds_ibdev;
-	int ret;
+	int ret, fr_queue_space;
 
 	/*
 	 * It's normal to see a null device if an incoming connection races
@@ -373,6 +373,12 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 	if (!rds_ibdev)
 		return -EOPNOTSUPP;
 
+	/* The fr_queue_space is currently set to 512, to add extra space on
+	 * completion queue and send queue. This extra space is used for FRMR
+	 * registration and invalidation work requests
+	 */
+	fr_queue_space = (rds_ibdev->use_fastreg ? RDS_IB_DEFAULT_FR_WR : 0);
+
 	/* add the conn now so that connection establishment has the dev */
 	rds_ib_add_conn(rds_ibdev, conn);
 
@@ -384,7 +390,7 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 	/* Protection domain and memory range */
 	ic->i_pd = rds_ibdev->pd;
 
-	cq_attr.cqe = ic->i_send_ring.w_nr + 1;
+	cq_attr.cqe = ic->i_send_ring.w_nr + fr_queue_space + 1;
 
 	ic->i_send_cq = ib_create_cq(dev, rds_ib_cq_comp_handler_send,
 				     rds_ib_cq_event_handler, conn,
@@ -424,7 +430,7 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 	attr.event_handler = rds_ib_qp_event_handler;
 	attr.qp_context = conn;
 	/* + 1 to allow for the single ack message */
-	attr.cap.max_send_wr = ic->i_send_ring.w_nr + 1;
+	attr.cap.max_send_wr = ic->i_send_ring.w_nr + fr_queue_space + 1;
 	attr.cap.max_recv_wr = ic->i_recv_ring.w_nr + 1;
 	attr.cap.max_send_sge = rds_ibdev->max_sge;
 	attr.cap.max_recv_sge = RDS_IB_RECV_SGE;
@@ -432,6 +438,7 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 	attr.qp_type = IB_QPT_RC;
 	attr.send_cq = ic->i_send_cq;
 	attr.recv_cq = ic->i_recv_cq;
+	atomic_set(&ic->i_fastreg_wrs, RDS_IB_DEFAULT_FR_WR);
 
 	/*
 	 * XXX this can fail if max_*_wr is too large?  Are we supposed
@@ -751,7 +758,8 @@ void rds_ib_conn_shutdown(struct rds_connection *conn)
 		 */
 		wait_event(rds_ib_ring_empty_wait,
 			   rds_ib_ring_empty(&ic->i_recv_ring) &&
-			   (atomic_read(&ic->i_signaled_sends) == 0));
+			   (atomic_read(&ic->i_signaled_sends) == 0) &&
+			   (atomic_read(&ic->i_fastreg_wrs) == RDS_IB_DEFAULT_FR_WR));
 		tasklet_kill(&ic->i_send_tasklet);
 		tasklet_kill(&ic->i_recv_tasklet);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next][PATCH v2 13/13] RDS: IB: Support Fastreg MR (FRMR) memory registration mode
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
                   ` (11 preceding siblings ...)
  2016-02-28  2:19 ` [net-next][PATCH v2 12/13] RDS: IB: allocate extra space on queues for FRMR support Santosh Shilimkar
@ 2016-02-28  2:19 ` Santosh Shilimkar
  2016-03-01 22:33 ` [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 David Miller
  13 siblings, 0 replies; 23+ messages in thread
From: Santosh Shilimkar @ 2016-02-28  2:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar

From: Avinash Repaka <avinash.repaka@oracle.com>

Fastreg MR(FRMR) is another method with which one can
register memory to HCA. Some of the newer HCAs supports only fastreg
mr mode, so we need to add support for it to have RDS functional
on them.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/Makefile  |   2 +-
 net/rds/ib.h      |   1 +
 net/rds/ib_cm.c   |   7 +-
 net/rds/ib_frmr.c | 376 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/rds/ib_mr.h   |  24 ++++
 net/rds/ib_rdma.c |  17 ++-
 6 files changed, 422 insertions(+), 5 deletions(-)
 create mode 100644 net/rds/ib_frmr.c

diff --git a/net/rds/Makefile b/net/rds/Makefile
index bcf5591..0e72bec 100644
--- a/net/rds/Makefile
+++ b/net/rds/Makefile
@@ -6,7 +6,7 @@ rds-y :=	af_rds.o bind.o cong.o connection.o info.o message.o   \
 obj-$(CONFIG_RDS_RDMA) += rds_rdma.o
 rds_rdma-y :=	rdma_transport.o \
 			ib.o ib_cm.o ib_recv.o ib_ring.o ib_send.o ib_stats.o \
-			ib_sysctl.o ib_rdma.o ib_fmr.o
+			ib_sysctl.o ib_rdma.o ib_fmr.o ib_frmr.o
 
 
 obj-$(CONFIG_RDS_TCP) += rds_tcp.o
diff --git a/net/rds/ib.h b/net/rds/ib.h
index eeb0d6c..627fb79 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -349,6 +349,7 @@ int rds_ib_update_ipaddr(struct rds_ib_device *rds_ibdev, __be32 ipaddr);
 void rds_ib_add_conn(struct rds_ib_device *rds_ibdev, struct rds_connection *conn);
 void rds_ib_remove_conn(struct rds_ib_device *rds_ibdev, struct rds_connection *conn);
 void rds_ib_destroy_nodev_conns(void);
+void rds_ib_mr_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc);
 
 /* ib_recv.c */
 int rds_ib_recv_init(void);
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index 83f4673..8764970 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -249,7 +249,12 @@ static void poll_scq(struct rds_ib_connection *ic, struct ib_cq *cq,
 				 (unsigned long long)wc->wr_id, wc->status,
 				 wc->byte_len, be32_to_cpu(wc->ex.imm_data));
 
-			rds_ib_send_cqe_handler(ic, wc);
+			if (wc->wr_id <= ic->i_send_ring.w_nr ||
+			    wc->wr_id == RDS_IB_ACK_WR_ID)
+				rds_ib_send_cqe_handler(ic, wc);
+			else
+				rds_ib_mr_cqe_handler(ic, wc);
+
 		}
 	}
 }
diff --git a/net/rds/ib_frmr.c b/net/rds/ib_frmr.c
new file mode 100644
index 0000000..93ff038
--- /dev/null
+++ b/net/rds/ib_frmr.c
@@ -0,0 +1,376 @@
+/*
+ * Copyright (c) 2016 Oracle.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "ib_mr.h"
+
+static struct rds_ib_mr *rds_ib_alloc_frmr(struct rds_ib_device *rds_ibdev,
+					   int npages)
+{
+	struct rds_ib_mr_pool *pool;
+	struct rds_ib_mr *ibmr = NULL;
+	struct rds_ib_frmr *frmr;
+	int err = 0;
+
+	if (npages <= RDS_MR_8K_MSG_SIZE)
+		pool = rds_ibdev->mr_8k_pool;
+	else
+		pool = rds_ibdev->mr_1m_pool;
+
+	ibmr = rds_ib_try_reuse_ibmr(pool);
+	if (ibmr)
+		return ibmr;
+
+	ibmr = kzalloc_node(sizeof(*ibmr), GFP_KERNEL,
+			    rdsibdev_to_node(rds_ibdev));
+	if (!ibmr) {
+		err = -ENOMEM;
+		goto out_no_cigar;
+	}
+
+	frmr = &ibmr->u.frmr;
+	frmr->mr = ib_alloc_mr(rds_ibdev->pd, IB_MR_TYPE_MEM_REG,
+			 pool->fmr_attr.max_pages);
+	if (IS_ERR(frmr->mr)) {
+		pr_warn("RDS/IB: %s failed to allocate MR", __func__);
+		goto out_no_cigar;
+	}
+
+	ibmr->pool = pool;
+	if (pool->pool_type == RDS_IB_MR_8K_POOL)
+		rds_ib_stats_inc(s_ib_rdma_mr_8k_alloc);
+	else
+		rds_ib_stats_inc(s_ib_rdma_mr_1m_alloc);
+
+	if (atomic_read(&pool->item_count) > pool->max_items_soft)
+		pool->max_items_soft = pool->max_items;
+
+	frmr->fr_state = FRMR_IS_FREE;
+	return ibmr;
+
+out_no_cigar:
+	kfree(ibmr);
+	atomic_dec(&pool->item_count);
+	return ERR_PTR(err);
+}
+
+static void rds_ib_free_frmr(struct rds_ib_mr *ibmr, bool drop)
+{
+	struct rds_ib_mr_pool *pool = ibmr->pool;
+
+	if (drop)
+		llist_add(&ibmr->llnode, &pool->drop_list);
+	else
+		llist_add(&ibmr->llnode, &pool->free_list);
+	atomic_add(ibmr->sg_len, &pool->free_pinned);
+	atomic_inc(&pool->dirty_count);
+
+	/* If we've pinned too many pages, request a flush */
+	if (atomic_read(&pool->free_pinned) >= pool->max_free_pinned ||
+	    atomic_read(&pool->dirty_count) >= pool->max_items / 5)
+		queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10);
+}
+
+static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
+{
+	struct rds_ib_frmr *frmr = &ibmr->u.frmr;
+	struct ib_send_wr *failed_wr;
+	struct ib_reg_wr reg_wr;
+	int ret;
+
+	while (atomic_dec_return(&ibmr->ic->i_fastreg_wrs) <= 0) {
+		atomic_inc(&ibmr->ic->i_fastreg_wrs);
+		cpu_relax();
+	}
+
+	ret = ib_map_mr_sg_zbva(frmr->mr, ibmr->sg, ibmr->sg_len, PAGE_SIZE);
+	if (unlikely(ret != ibmr->sg_len))
+		return ret < 0 ? ret : -EINVAL;
+
+	/* Perform a WR for the fast_reg_mr. Each individual page
+	 * in the sg list is added to the fast reg page list and placed
+	 * inside the fast_reg_mr WR.  The key used is a rolling 8bit
+	 * counter, which should guarantee uniqueness.
+	 */
+	ib_update_fast_reg_key(frmr->mr, ibmr->remap_count++);
+	frmr->fr_state = FRMR_IS_INUSE;
+
+	memset(&reg_wr, 0, sizeof(reg_wr));
+	reg_wr.wr.wr_id = (unsigned long)(void *)ibmr;
+	reg_wr.wr.opcode = IB_WR_REG_MR;
+	reg_wr.wr.num_sge = 0;
+	reg_wr.mr = frmr->mr;
+	reg_wr.key = frmr->mr->rkey;
+	reg_wr.access = IB_ACCESS_LOCAL_WRITE |
+			IB_ACCESS_REMOTE_READ |
+			IB_ACCESS_REMOTE_WRITE;
+	reg_wr.wr.send_flags = IB_SEND_SIGNALED;
+
+	failed_wr = &reg_wr.wr;
+	ret = ib_post_send(ibmr->ic->i_cm_id->qp, &reg_wr.wr, &failed_wr);
+	WARN_ON(failed_wr != &reg_wr.wr);
+	if (unlikely(ret)) {
+		/* Failure here can be because of -ENOMEM as well */
+		frmr->fr_state = FRMR_IS_STALE;
+		atomic_inc(&ibmr->ic->i_fastreg_wrs);
+		if (printk_ratelimit())
+			pr_warn("RDS/IB: %s returned error(%d)\n",
+				__func__, ret);
+	}
+	return ret;
+}
+
+static int rds_ib_map_frmr(struct rds_ib_device *rds_ibdev,
+			   struct rds_ib_mr_pool *pool,
+			   struct rds_ib_mr *ibmr,
+			   struct scatterlist *sg, unsigned int sg_len)
+{
+	struct ib_device *dev = rds_ibdev->dev;
+	struct rds_ib_frmr *frmr = &ibmr->u.frmr;
+	int i;
+	u32 len;
+	int ret = 0;
+
+	/* We want to teardown old ibmr values here and fill it up with
+	 * new sg values
+	 */
+	rds_ib_teardown_mr(ibmr);
+
+	ibmr->sg = sg;
+	ibmr->sg_len = sg_len;
+	ibmr->sg_dma_len = 0;
+	frmr->sg_byte_len = 0;
+	WARN_ON(ibmr->sg_dma_len);
+	ibmr->sg_dma_len = ib_dma_map_sg(dev, ibmr->sg, ibmr->sg_len,
+					 DMA_BIDIRECTIONAL);
+	if (unlikely(!ibmr->sg_dma_len)) {
+		pr_warn("RDS/IB: %s failed!\n", __func__);
+		return -EBUSY;
+	}
+
+	frmr->sg_byte_len = 0;
+	frmr->dma_npages = 0;
+	len = 0;
+
+	ret = -EINVAL;
+	for (i = 0; i < ibmr->sg_dma_len; ++i) {
+		unsigned int dma_len = ib_sg_dma_len(dev, &ibmr->sg[i]);
+		u64 dma_addr = ib_sg_dma_address(dev, &ibmr->sg[i]);
+
+		frmr->sg_byte_len += dma_len;
+		if (dma_addr & ~PAGE_MASK) {
+			if (i > 0)
+				goto out_unmap;
+			else
+				++frmr->dma_npages;
+		}
+
+		if ((dma_addr + dma_len) & ~PAGE_MASK) {
+			if (i < ibmr->sg_dma_len - 1)
+				goto out_unmap;
+			else
+				++frmr->dma_npages;
+		}
+
+		len += dma_len;
+	}
+	frmr->dma_npages += len >> PAGE_SHIFT;
+
+	if (frmr->dma_npages > ibmr->pool->fmr_attr.max_pages) {
+		ret = -EMSGSIZE;
+		goto out_unmap;
+	}
+
+	ret = rds_ib_post_reg_frmr(ibmr);
+	if (ret)
+		goto out_unmap;
+
+	if (ibmr->pool->pool_type == RDS_IB_MR_8K_POOL)
+		rds_ib_stats_inc(s_ib_rdma_mr_8k_used);
+	else
+		rds_ib_stats_inc(s_ib_rdma_mr_1m_used);
+
+	return ret;
+
+out_unmap:
+	ib_dma_unmap_sg(rds_ibdev->dev, ibmr->sg, ibmr->sg_len,
+			DMA_BIDIRECTIONAL);
+	ibmr->sg_dma_len = 0;
+	return ret;
+}
+
+static int rds_ib_post_inv(struct rds_ib_mr *ibmr)
+{
+	struct ib_send_wr *s_wr, *failed_wr;
+	struct rds_ib_frmr *frmr = &ibmr->u.frmr;
+	struct rdma_cm_id *i_cm_id = ibmr->ic->i_cm_id;
+	int ret = -EINVAL;
+
+	if (!i_cm_id || !i_cm_id->qp || !frmr->mr)
+		goto out;
+
+	if (frmr->fr_state != FRMR_IS_INUSE)
+		goto out;
+
+	while (atomic_dec_return(&ibmr->ic->i_fastreg_wrs) <= 0) {
+		atomic_inc(&ibmr->ic->i_fastreg_wrs);
+		cpu_relax();
+	}
+
+	frmr->fr_inv = true;
+	s_wr = &frmr->fr_wr;
+
+	memset(s_wr, 0, sizeof(*s_wr));
+	s_wr->wr_id = (unsigned long)(void *)ibmr;
+	s_wr->opcode = IB_WR_LOCAL_INV;
+	s_wr->ex.invalidate_rkey = frmr->mr->rkey;
+	s_wr->send_flags = IB_SEND_SIGNALED;
+
+	failed_wr = s_wr;
+	ret = ib_post_send(i_cm_id->qp, s_wr, &failed_wr);
+	WARN_ON(failed_wr != s_wr);
+	if (unlikely(ret)) {
+		frmr->fr_state = FRMR_IS_STALE;
+		frmr->fr_inv = false;
+		atomic_inc(&ibmr->ic->i_fastreg_wrs);
+		pr_err("RDS/IB: %s returned error(%d)\n", __func__, ret);
+		goto out;
+	}
+out:
+	return ret;
+}
+
+void rds_ib_mr_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc)
+{
+	struct rds_ib_mr *ibmr = (void *)(unsigned long)wc->wr_id;
+	struct rds_ib_frmr *frmr = &ibmr->u.frmr;
+
+	if (wc->status != IB_WC_SUCCESS) {
+		frmr->fr_state = FRMR_IS_STALE;
+		if (rds_conn_up(ic->conn))
+			rds_ib_conn_error(ic->conn,
+					  "frmr completion <%pI4,%pI4> status %u(%s), vendor_err 0x%x, disconnecting and reconnecting\n",
+					  &ic->conn->c_laddr,
+					  &ic->conn->c_faddr,
+					  wc->status,
+					  ib_wc_status_msg(wc->status),
+					  wc->vendor_err);
+	}
+
+	if (frmr->fr_inv) {
+		frmr->fr_state = FRMR_IS_FREE;
+		frmr->fr_inv = false;
+	}
+
+	atomic_inc(&ic->i_fastreg_wrs);
+}
+
+void rds_ib_unreg_frmr(struct list_head *list, unsigned int *nfreed,
+		       unsigned long *unpinned, unsigned int goal)
+{
+	struct rds_ib_mr *ibmr, *next;
+	struct rds_ib_frmr *frmr;
+	int ret = 0;
+	unsigned int freed = *nfreed;
+
+	/* String all ib_mr's onto one list and hand them to ib_unmap_fmr */
+	list_for_each_entry(ibmr, list, unmap_list) {
+		if (ibmr->sg_dma_len)
+			ret |= rds_ib_post_inv(ibmr);
+	}
+	if (ret)
+		pr_warn("RDS/IB: %s failed (err=%d)\n", __func__, ret);
+
+	/* Now we can destroy the DMA mapping and unpin any pages */
+	list_for_each_entry_safe(ibmr, next, list, unmap_list) {
+		*unpinned += ibmr->sg_len;
+		frmr = &ibmr->u.frmr;
+		__rds_ib_teardown_mr(ibmr);
+		if (freed < goal || frmr->fr_state == FRMR_IS_STALE) {
+			/* Don't de-allocate if the MR is not free yet */
+			if (frmr->fr_state == FRMR_IS_INUSE)
+				continue;
+
+			if (ibmr->pool->pool_type == RDS_IB_MR_8K_POOL)
+				rds_ib_stats_inc(s_ib_rdma_mr_8k_free);
+			else
+				rds_ib_stats_inc(s_ib_rdma_mr_1m_free);
+			list_del(&ibmr->unmap_list);
+			if (frmr->mr)
+				ib_dereg_mr(frmr->mr);
+			kfree(ibmr);
+			freed++;
+		}
+	}
+	*nfreed = freed;
+}
+
+struct rds_ib_mr *rds_ib_reg_frmr(struct rds_ib_device *rds_ibdev,
+				  struct rds_ib_connection *ic,
+				  struct scatterlist *sg,
+				  unsigned long nents, u32 *key)
+{
+	struct rds_ib_mr *ibmr = NULL;
+	struct rds_ib_frmr *frmr;
+	int ret;
+
+	do {
+		if (ibmr)
+			rds_ib_free_frmr(ibmr, true);
+		ibmr = rds_ib_alloc_frmr(rds_ibdev, nents);
+		if (IS_ERR(ibmr))
+			return ibmr;
+		frmr = &ibmr->u.frmr;
+	} while (frmr->fr_state != FRMR_IS_FREE);
+
+	ibmr->ic = ic;
+	ibmr->device = rds_ibdev;
+	ret = rds_ib_map_frmr(rds_ibdev, ibmr->pool, ibmr, sg, nents);
+	if (ret == 0) {
+		*key = frmr->mr->rkey;
+	} else {
+		rds_ib_free_frmr(ibmr, false);
+		ibmr = ERR_PTR(ret);
+	}
+
+	return ibmr;
+}
+
+void rds_ib_free_frmr_list(struct rds_ib_mr *ibmr)
+{
+	struct rds_ib_mr_pool *pool = ibmr->pool;
+	struct rds_ib_frmr *frmr = &ibmr->u.frmr;
+
+	if (frmr->fr_state == FRMR_IS_STALE)
+		llist_add(&ibmr->llnode, &pool->drop_list);
+	else
+		llist_add(&ibmr->llnode, &pool->free_list);
+}
diff --git a/net/rds/ib_mr.h b/net/rds/ib_mr.h
index 2f9b9c3..1c754f4 100644
--- a/net/rds/ib_mr.h
+++ b/net/rds/ib_mr.h
@@ -48,6 +48,21 @@ struct rds_ib_fmr {
 	u64			*dma;
 };
 
+enum rds_ib_fr_state {
+	FRMR_IS_FREE,	/* mr invalidated & ready for use */
+	FRMR_IS_INUSE,	/* mr is in use or used & can be invalidated */
+	FRMR_IS_STALE,	/* Stale MR and needs to be dropped  */
+};
+
+struct rds_ib_frmr {
+	struct ib_mr		*mr;
+	enum rds_ib_fr_state	fr_state;
+	bool			fr_inv;
+	struct ib_send_wr	fr_wr;
+	unsigned int		dma_npages;
+	unsigned int		sg_byte_len;
+};
+
 /* This is stored as mr->r_trans_private. */
 struct rds_ib_mr {
 	struct rds_ib_device		*device;
@@ -66,6 +81,7 @@ struct rds_ib_mr {
 
 	union {
 		struct rds_ib_fmr	fmr;
+		struct rds_ib_frmr	frmr;
 	} u;
 };
 
@@ -88,6 +104,7 @@ struct rds_ib_mr_pool {
 	unsigned long		max_items_soft;
 	unsigned long		max_free_pinned;
 	struct ib_fmr_attr	fmr_attr;
+	bool			use_fastreg;
 };
 
 extern struct workqueue_struct *rds_ib_mr_wq;
@@ -121,4 +138,11 @@ struct rds_ib_mr *rds_ib_try_reuse_ibmr(struct rds_ib_mr_pool *);
 void rds_ib_unreg_fmr(struct list_head *, unsigned int *,
 		      unsigned long *, unsigned int);
 void rds_ib_free_fmr_list(struct rds_ib_mr *);
+struct rds_ib_mr *rds_ib_reg_frmr(struct rds_ib_device *rds_ibdev,
+				  struct rds_ib_connection *ic,
+				  struct scatterlist *sg,
+				  unsigned long nents, u32 *key);
+void rds_ib_unreg_frmr(struct list_head *list, unsigned int *nfreed,
+		       unsigned long *unpinned, unsigned int goal);
+void rds_ib_free_frmr_list(struct rds_ib_mr *);
 #endif
diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index ec7ea32..f7164ac 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -397,7 +397,10 @@ int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
 	if (list_empty(&unmap_list))
 		goto out;
 
-	rds_ib_unreg_fmr(&unmap_list, &nfreed, &unpinned, free_goal);
+	if (pool->use_fastreg)
+		rds_ib_unreg_frmr(&unmap_list, &nfreed, &unpinned, free_goal);
+	else
+		rds_ib_unreg_fmr(&unmap_list, &nfreed, &unpinned, free_goal);
 
 	if (!list_empty(&unmap_list)) {
 		/* we have to make sure that none of the things we're about
@@ -489,7 +492,10 @@ void rds_ib_free_mr(void *trans_private, int invalidate)
 	rdsdebug("RDS/IB: free_mr nents %u\n", ibmr->sg_len);
 
 	/* Return it to the pool's free list */
-	rds_ib_free_fmr_list(ibmr);
+	if (rds_ibdev->use_fastreg)
+		rds_ib_free_frmr_list(ibmr);
+	else
+		rds_ib_free_fmr_list(ibmr);
 
 	atomic_add(ibmr->sg_len, &pool->free_pinned);
 	atomic_inc(&pool->dirty_count);
@@ -534,6 +540,7 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 {
 	struct rds_ib_device *rds_ibdev;
 	struct rds_ib_mr *ibmr = NULL;
+	struct rds_ib_connection *ic = rs->rs_conn->c_transport_data;
 	int ret;
 
 	rds_ibdev = rds_ib_get_device(rs->rs_bound_addr);
@@ -547,7 +554,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 		goto out;
 	}
 
-	ibmr = rds_ib_reg_fmr(rds_ibdev, sg, nents, key_ret);
+	if (rds_ibdev->use_fastreg)
+		ibmr = rds_ib_reg_frmr(rds_ibdev, ic, sg, nents, key_ret);
+	else
+		ibmr = rds_ib_reg_fmr(rds_ibdev, sg, nents, key_ret);
 	if (ibmr)
 		rds_ibdev = NULL;
 
@@ -601,6 +611,7 @@ struct rds_ib_mr_pool *rds_ib_create_mr_pool(struct rds_ib_device *rds_ibdev,
 	pool->fmr_attr.max_maps = rds_ibdev->fmr_max_remaps;
 	pool->fmr_attr.page_shift = PAGE_SHIFT;
 	pool->max_items_soft = rds_ibdev->max_mrs * 3 / 4;
+	pool->use_fastreg = rds_ibdev->use_fastreg;
 
 	return pool;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [net-next][PATCH v2 01/13] RDS: Drop stale iWARP RDMA transport
  2016-02-28  2:19 ` [net-next][PATCH v2 01/13] RDS: Drop stale iWARP RDMA transport Santosh Shilimkar
@ 2016-02-28  9:05   ` Christoph Hellwig
  2016-02-28 10:11     ` santosh.shilimkar
  2016-02-28 19:51   ` Or Gerlitz
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-28  9:05 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem, linux-kernel

On Sat, Feb 27, 2016 at 06:19:38PM -0800, Santosh Shilimkar wrote:
> RDS iWarp support code has become stale and non testable. As
> indicated earlier, am dropping the support for it.
> 
> If new iWarp user(s) shows up in future, we can adapat the RDS IB
> transprt for the special RDMA READ sink case. iWarp needs an MR
> for the RDMA READ sink.

Please take a look at the RDMA RW API series I posted yesterday - if
you can adopt RDS to that you should get iWarp support for free.

But having two different codebases for IB/RoCE vs iWarp was always a bad
idea, so great to see the second one retired!

Acked-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next][PATCH v2 11/13] RDS: IB: add Fastreg MR (FRMR) detection support
  2016-02-28  2:19 ` [net-next][PATCH v2 11/13] RDS: IB: add Fastreg MR (FRMR) detection support Santosh Shilimkar
@ 2016-02-28  9:08   ` Christoph Hellwig
  2016-02-28 10:26     ` santosh.shilimkar
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-28  9:08 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem, linux-kernel

On Sat, Feb 27, 2016 at 06:19:48PM -0800, Santosh Shilimkar wrote:
> Discovere Fast Memmory Registration support using IB device
> IB_DEVICE_MEM_MGT_EXTENSIONS. Certain HCA might support just FRMR
> or FMR or both FMR and FRWR. In case both mr type are supported,
> default FMR is used.
> 
> Default MR is still kept as FMR against what everyone else
> is following. Default will be changed to FRMR once the
> RDS performance with FRMR is comparable with FMR. The
> work is in progress for the same.
> 
> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> ---
> v2: Dropped the module parameter as suggested by David Miller

This means we only use the safer method if the HCA doesn't support
the other one.  All other RDMA ULP that support both methods have
a module_param so the veto from Dave is a bit unfortunate.  Anyway,
let's get the code in for now and figure out what to use later.

Just curious:  where / how do you see worse peformance using FRs?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next][PATCH v2 01/13] RDS: Drop stale iWARP RDMA transport
  2016-02-28  9:05   ` Christoph Hellwig
@ 2016-02-28 10:11     ` santosh.shilimkar
  0 siblings, 0 replies; 23+ messages in thread
From: santosh.shilimkar @ 2016-02-28 10:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: netdev, davem, linux-kernel



On 2/28/16 1:05 AM, Christoph Hellwig wrote:
> On Sat, Feb 27, 2016 at 06:19:38PM -0800, Santosh Shilimkar wrote:
>> RDS iWarp support code has become stale and non testable. As
>> indicated earlier, am dropping the support for it.
>>
>> If new iWarp user(s) shows up in future, we can adapat the RDS IB
>> transprt for the special RDMA READ sink case. iWarp needs an MR
>> for the RDMA READ sink.
>
> Please take a look at the RDMA RW API series I posted yesterday - if
> you can adopt RDS to that you should get iWarp support for free.
>
Will have a look. Thanks for the pointer.

> But having two different codebases for IB/RoCE vs iWarp was always a bad
> idea, so great to see the second one retired!
>
> Acked-by: Christoph Hellwig <hch@lst.de>
>
Thanks !!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next][PATCH v2 11/13] RDS: IB: add Fastreg MR (FRMR) detection support
  2016-02-28  9:08   ` Christoph Hellwig
@ 2016-02-28 10:26     ` santosh.shilimkar
  0 siblings, 0 replies; 23+ messages in thread
From: santosh.shilimkar @ 2016-02-28 10:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: netdev, davem, linux-kernel


On 2/28/16 1:08 AM, Christoph Hellwig wrote:
> On Sat, Feb 27, 2016 at 06:19:48PM -0800, Santosh Shilimkar wrote:
>> Discovere Fast Memmory Registration support using IB device
>> IB_DEVICE_MEM_MGT_EXTENSIONS. Certain HCA might support just FRMR
>> or FMR or both FMR and FRWR. In case both mr type are supported,
>> default FMR is used.
>>
>> Default MR is still kept as FMR against what everyone else
>> is following. Default will be changed to FRMR once the
>> RDS performance with FRMR is comparable with FMR. The
>> work is in progress for the same.
>>
>> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>> ---
>> v2: Dropped the module parameter as suggested by David Miller
>
> This means we only use the safer method if the HCA doesn't support
> the other one.  All other RDMA ULP that support both methods have
> a module_param so the veto from Dave is a bit unfortunate.  Anyway,
> let's get the code in for now and figure out what to use later.
>
Indeed. It wasn't really deal breaker for RDS so I agreed to drop it.

> Just curious:  where / how do you see worse peformance using FRs?
>
I wouldn't call it worse but its not comparable. Use case
is multi-threaded RDS RDMA perf test(s). Hopefully the follow
up series which am working on should minimise the gap. I am
leaving the details for later, but one of the main issue I
saw was contention on driver post_send() lock from send, MR reg
and MR invalidation.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next][PATCH v2 01/13] RDS: Drop stale iWARP RDMA transport
  2016-02-28  2:19 ` [net-next][PATCH v2 01/13] RDS: Drop stale iWARP RDMA transport Santosh Shilimkar
  2016-02-28  9:05   ` Christoph Hellwig
@ 2016-02-28 19:51   ` Or Gerlitz
  2016-02-29  0:26     ` santosh.shilimkar
  1 sibling, 1 reply; 23+ messages in thread
From: Or Gerlitz @ 2016-02-28 19:51 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: Linux Netdev List, David Miller

On Sun, Feb 28, 2016 at 4:19 AM, Santosh Shilimkar
<santosh.shilimkar@oracle.com> wrote:
> RDS iWarp support code has become stale and non testable. As
> indicated earlier, am dropping the support for it.
>
> If new iWarp user(s) shows up in future, we can adapat the RDS IB
> transprt for the special RDMA READ sink case. iWarp needs an MR
> for the RDMA READ sink.
>
> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Hi, just wondered if there's any special reason that all this series
carries double S.O.B signature line with your name appearing twice on
two different email addresses?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next][PATCH v2 01/13] RDS: Drop stale iWARP RDMA transport
  2016-02-28 19:51   ` Or Gerlitz
@ 2016-02-29  0:26     ` santosh.shilimkar
  2016-03-02 10:54       ` Or Gerlitz
  0 siblings, 1 reply; 23+ messages in thread
From: santosh.shilimkar @ 2016-02-29  0:26 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Linux Netdev List, David Miller

On 2/28/16 11:51 AM, Or Gerlitz wrote:
> On Sun, Feb 28, 2016 at 4:19 AM, Santosh Shilimkar
> <santosh.shilimkar@oracle.com> wrote:
>> RDS iWarp support code has become stale and non testable. As
>> indicated earlier, am dropping the support for it.
>>
>> If new iWarp user(s) shows up in future, we can adapat the RDS IB
>> transprt for the special RDMA READ sink case. iWarp needs an MR
>> for the RDMA READ sink.
>>
>> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>
> Hi, just wondered if there's any special reason that all this series
> carries double S.O.B signature line with your name appearing twice on
> two different email addresses?
>
Nothing special. I sign of all my patches with k.org id and have to
keep Oracle id as well being a payed Oracle employee. ;-)

Regards,
Santosh

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6
  2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
                   ` (12 preceding siblings ...)
  2016-02-28  2:19 ` [net-next][PATCH v2 13/13] RDS: IB: Support Fastreg MR (FRMR) memory registration mode Santosh Shilimkar
@ 2016-03-01 22:33 ` David Miller
  2016-03-01 23:09   ` santosh shilimkar
  13 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2016-03-01 22:33 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: netdev, linux-kernel


When I try to apply this series, it (strangely) fails on the first patch with:

Applying: RDS: Drop stale iWARP RDMA transport
error: removal patch leaves file contents
error: net/rds/iw.c: patch does not apply
error: removal patch leaves file contents
error: net/rds/iw.h: patch does not apply
error: removal patch leaves file contents
error: net/rds/iw_cm.c: patch does not apply
error: removal patch leaves file contents
error: net/rds/iw_rdma.c: patch does not apply
error: removal patch leaves file contents
error: net/rds/iw_recv.c: patch does not apply
error: removal patch leaves file contents
error: net/rds/iw_ring.c: patch does not apply
error: removal patch leaves file contents
error: net/rds/iw_send.c: patch does not apply
error: removal patch leaves file contents
error: net/rds/iw_stats.c: patch does not apply
error: removal patch leaves file contents
error: net/rds/iw_sysctl.c: patch does not apply

Please sort this out and respin, thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6
  2016-03-01 22:33 ` [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 David Miller
@ 2016-03-01 23:09   ` santosh shilimkar
  0 siblings, 0 replies; 23+ messages in thread
From: santosh shilimkar @ 2016-03-01 23:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel

On 3/1/2016 2:33 PM, David Miller wrote:
>
> When I try to apply this series, it (strangely) fails on the first patch with:
>
Strange indeed since patches and the tree is against net-next.

> Applying: RDS: Drop stale iWARP RDMA transport
> error: removal patch leaves file contents
This patch has file removals and looks like git am/apply won't
work when patch is formatted with "-D". Its good
for review but didn't realize it will create problem for
apply. Sorry for that but I didn't know this issue.

git merge or pull seems to work though when tried from branch
directly.

>
> Please sort this out and respin, thanks.
>
OK. Will send the same series again just with first
patch generated without -D option. Thanks !!

Regards,
Santosh

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next][PATCH v2 01/13] RDS: Drop stale iWARP RDMA transport
  2016-02-29  0:26     ` santosh.shilimkar
@ 2016-03-02 10:54       ` Or Gerlitz
  0 siblings, 0 replies; 23+ messages in thread
From: Or Gerlitz @ 2016-03-02 10:54 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: Linux Netdev List, David Miller

On Mon, Feb 29, 2016 at 2:26 AM, santosh.shilimkar@oracle.com
<santosh.shilimkar@oracle.com> wrote:
> On 2/28/16 11:51 AM, Or Gerlitz wrote:
>> On Sun, Feb 28, 2016 at 4:19 AM, Santosh Shilimkar <santosh.shilimkar@oracle.com> wrote:

>>> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
>>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

>> Hi, just wondered if there's any special reason that all this series
>> carries double S.O.B signature line with your name appearing twice on
>> two different email addresses?

> Nothing special. I sign of all my patches with k.org id and have to
> keep Oracle id as well being a payed Oracle employee. ;-)

so the kernel logs are carrying forever either 64 or 55 bytes with
your 2nd signature
which are sort of not a must...

Or.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-03-02 10:54 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-28  2:19 [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 Santosh Shilimkar
2016-02-28  2:19 ` [net-next][PATCH v2 01/13] RDS: Drop stale iWARP RDMA transport Santosh Shilimkar
2016-02-28  9:05   ` Christoph Hellwig
2016-02-28 10:11     ` santosh.shilimkar
2016-02-28 19:51   ` Or Gerlitz
2016-02-29  0:26     ` santosh.shilimkar
2016-03-02 10:54       ` Or Gerlitz
2016-02-28  2:19 ` [net-next][PATCH v2 02/13] RDS: Add support for SO_TIMESTAMP for incoming messages Santosh Shilimkar
2016-02-28  2:19 ` [net-next][PATCH v2 03/13] MAINTAINERS: update RDS entry Santosh Shilimkar
2016-02-28  2:19 ` [net-next][PATCH v2 04/13] RDS: IB: Remove the RDS_IB_SEND_OP dependency Santosh Shilimkar
2016-02-28  2:19 ` [net-next][PATCH v2 05/13] RDS: IB: Re-organise ibmr code Santosh Shilimkar
2016-02-28  2:19 ` [net-next][PATCH v2 06/13] RDS: IB: create struct rds_ib_fmr Santosh Shilimkar
2016-02-28  2:19 ` [net-next][PATCH v2 07/13] RDS: IB: move FMR code to its own file Santosh Shilimkar
2016-02-28  2:19 ` [net-next][PATCH v2 08/13] RDS: IB: add connection info to ibmr Santosh Shilimkar
2016-02-28  2:19 ` [net-next][PATCH v2 09/13] RDS: IB: handle the RDMA CM time wait event Santosh Shilimkar
2016-02-28  2:19 ` [net-next][PATCH v2 10/13] RDS: IB: add mr reused stats Santosh Shilimkar
2016-02-28  2:19 ` [net-next][PATCH v2 11/13] RDS: IB: add Fastreg MR (FRMR) detection support Santosh Shilimkar
2016-02-28  9:08   ` Christoph Hellwig
2016-02-28 10:26     ` santosh.shilimkar
2016-02-28  2:19 ` [net-next][PATCH v2 12/13] RDS: IB: allocate extra space on queues for FRMR support Santosh Shilimkar
2016-02-28  2:19 ` [net-next][PATCH v2 13/13] RDS: IB: Support Fastreg MR (FRMR) memory registration mode Santosh Shilimkar
2016-03-01 22:33 ` [net-next][PATCH v2 00/13] RDS: Major clean-up with couple of new features for 4.6 David Miller
2016-03-01 23:09   ` santosh shilimkar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.