linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/24] Second batch of LNet updates
@ 2016-02-22 22:29 James Simmons
  2016-02-22 22:29 ` [PATCH 01/24] staging: lustre: Dynamic LNet Configuration (DLC) IOCTL changes James Simmons
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, James Simmons

This patch set fixes many of the LNet issues encountered in
production environments. One long-standing issue was that LNet
could not be reconfigured after initialization; attempting to do
so left it in a broken state. Several other issues are also
addressed in this patch set. It also incorporates the improvements
that Dan Carpenter suggested when the original patch set was posted.

Amir Shehata (14):
  staging: lustre: Dynamic LNet Configuration (DLC) IOCTL changes
  staging: lustre: Dynamic LNet Configuration (DLC) show command
  staging: lustre: fix crash due to NULL networks string
  staging: lustre: DLC user/kernel space glue code
  staging: lustre: improve LNet clean up code and API
  staging: lustre: return appropriate errno when adding route
  staging: lustre: startup lnet acceptor thread dynamically
  staging: lustre: reject invalid net configuration for lnet
  staging: lustre: return -EEXIST if NI is not unique
  staging: lustre: handle lnet_check_routes() errors
  staging: lustre: improvement to router checker
  staging: lustre: prevent assert on LNet module unload
  staging: lustre: remove messages from lazy portal on NI shutdown
  staging: lustre: Allocate the correct number of rtr buffers

Bruno Faccini (1):
  staging: lustre: avoid race during lnet acceptor thread termination

Chris Horn (1):
  staging: lustre: Use lnet_is_route_alive for router aliveness

Doug Oucharek (1):
  staging: lustre: Remove LASSERTS from router checker

Frank Zago (4):
  staging: lustre: make local functions static for LNet ni
  staging: lustre: make some lnet functions static
  staging: lustre: missed a few cases of using NULL instead of 0
  staging: lustre: remove unnecessary EXPORT_SYMBOL from lnet layer

James Simmons (1):
  staging: lustre: use sock.h in only acceptor.c

John L. Hammond (2):
  staging: lustre: remove LUSTRE_{,SRV_}LNET_PID
  staging: lustre: assume a kernel build

 .../staging/lustre/include/linux/libcfs/libcfs.h   |    2 -
 .../lustre/include/linux/libcfs/libcfs_ioctl.h     |   49 +-
 .../lustre/include/linux/libcfs/linux/libcfs.h     |    3 -
 .../staging/lustre/include/linux/lnet/lib-dlc.h    |  122 ++++
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |   22 +
 .../staging/lustre/include/linux/lnet/lib-types.h  |   25 +-
 .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |    7 +-
 drivers/staging/lustre/lnet/lnet/acceptor.c        |   21 +-
 drivers/staging/lustre/lnet/lnet/api-ni.c          |  761 +++++++++++++-------
 drivers/staging/lustre/lnet/lnet/config.c          |   25 +-
 drivers/staging/lustre/lnet/lnet/lib-eq.c          |    3 -
 drivers/staging/lustre/lnet/lnet/lib-md.c          |    3 -
 drivers/staging/lustre/lnet/lnet/lib-me.c          |    3 -
 drivers/staging/lustre/lnet/lnet/lib-move.c        |   10 +-
 drivers/staging/lustre/lnet/lnet/lib-msg.c         |   20 +-
 drivers/staging/lustre/lnet/lnet/lib-ptl.c         |   54 +-
 drivers/staging/lustre/lnet/lnet/lib-socket.c      |    3 -
 drivers/staging/lustre/lnet/lnet/module.c          |   81 ++-
 drivers/staging/lustre/lnet/lnet/peer.c            |   63 ++
 drivers/staging/lustre/lnet/lnet/router.c          |  181 ++++--
 drivers/staging/lustre/lnet/lnet/router_proc.c     |    2 -
 drivers/staging/lustre/lnet/selftest/conctl.c      |    9 +-
 drivers/staging/lustre/lnet/selftest/console.c     |    6 +-
 drivers/staging/lustre/lnet/selftest/console.h     |    1 -
 drivers/staging/lustre/lnet/selftest/framework.c   |   10 -
 drivers/staging/lustre/lnet/selftest/module.c      |    4 +-
 drivers/staging/lustre/lnet/selftest/rpc.c         |    4 +-
 .../lustre/lustre/libcfs/linux/linux-module.c      |   55 +-
 drivers/staging/lustre/lustre/libcfs/module.c      |   58 ++-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |    2 +-
 .../staging/lustre/lustre/obdclass/obd_config.c    |    4 +-
 drivers/staging/lustre/lustre/ptlrpc/events.c      |    4 +-
 32 files changed, 1122 insertions(+), 495 deletions(-)
 create mode 100644 drivers/staging/lustre/include/linux/lnet/lib-dlc.h


* [PATCH 01/24] staging: lustre: Dynamic LNet Configuration (DLC) IOCTL changes
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 02/24] staging: lustre: Dynamic LNet Configuration (DLC) show command James Simmons
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

This is the fourth patch of a set of patches that enables DLC.

This patch changes the IOCTL infrastructure in preparation for
adding extra IOCTL communication between user and kernel space.
The changes include:
- Adding a common header to be passed to the ioctl infrastructure
  functions instead of an exact structure.  This header is meant
  to be included in all structures passed through that interface.
  The IOCTL handler casts this header to the particular type that
  it expects.
- Performing all sanity testing on the passed-in structure in the
  generic ioctl infrastructure code.
- Changing all ioctl handlers to take the header instead of a
  particular structure type.
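
The common-header pattern described above can be sketched in plain
userspace C roughly as follows.  This is only an illustration, not
the kernel code: struct demo_ioctl_data, demo_handler() and
demo_dispatch() are hypothetical names, container_of() is redefined
locally as a stand-in for the kernel macro, and the length check in
the handler is an assumption about what a careful handler would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Common header, mirroring struct libcfs_ioctl_hdr from the patch. */
struct ioctl_hdr {
	uint32_t ioc_len;
	uint32_t ioc_version;
};

/* Hypothetical payload that embeds the header as its first member. */
struct demo_ioctl_data {
	struct ioctl_hdr ioc_hdr;
	uint32_t opc;
};

/* Handlers take the generic header and recover the type they expect. */
static int demo_handler(struct ioctl_hdr *hdr)
{
	struct demo_ioctl_data *data;

	/* Assumed check: refuse buffers too short for the payload. */
	if (hdr->ioc_len < sizeof(*data))
		return -1;

	data = container_of(hdr, struct demo_ioctl_data, ioc_hdr);
	return (int)data->opc;
}

/* Build a request and pass only its header to the handler. */
static int demo_dispatch(uint32_t opc)
{
	struct demo_ioctl_data req = {
		.ioc_hdr = { .ioc_len = sizeof(req),
			     .ioc_version = 0x0001000a },
		.opc = opc,
	};

	/* The header must sit at offset 0 for the cast to be valid. */
	assert(offsetof(struct demo_ioctl_data, ioc_hdr) == 0);
	return demo_handler(&req.ioc_hdr);
}
```

Because every handler sees only the header, the generic code can
validate ioc_len and ioc_version once, and each handler decides
which enclosing structure to recover.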

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2456
Reviewed-on: http://review.whamcloud.com/8021
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../lustre/include/linux/libcfs/libcfs_ioctl.h     |   23 ++++----
 drivers/staging/lustre/lnet/lnet/module.c          |    4 +-
 drivers/staging/lustre/lnet/selftest/conctl.c      |    9 +++-
 drivers/staging/lustre/lnet/selftest/console.c     |    2 +-
 drivers/staging/lustre/lnet/selftest/console.h     |    1 -
 .../lustre/lustre/libcfs/linux/linux-module.c      |   54 ++++++++------------
 drivers/staging/lustre/lustre/libcfs/module.c      |   51 +++++++++++++-----
 7 files changed, 80 insertions(+), 64 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h
index e4463ad..0598702 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h
@@ -43,9 +43,13 @@
 
 #define LIBCFS_IOCTL_VERSION 0x0001000a
 
-struct libcfs_ioctl_data {
+struct libcfs_ioctl_hdr {
 	__u32 ioc_len;
 	__u32 ioc_version;
+};
+
+struct libcfs_ioctl_data {
+	struct libcfs_ioctl_hdr ioc_hdr;
 
 	__u64 ioc_nid;
 	__u64 ioc_u64[1];
@@ -70,11 +74,6 @@ struct libcfs_ioctl_data {
 
 #define ioc_priority ioc_u32[0]
 
-struct libcfs_ioctl_hdr {
-	__u32 ioc_len;
-	__u32 ioc_version;
-};
-
 struct libcfs_debug_ioctl_data {
 	struct libcfs_ioctl_hdr hdr;
 	unsigned int subs;
@@ -90,7 +89,7 @@ do {						    \
 
 struct libcfs_ioctl_handler {
 	struct list_head item;
-	int (*handle_ioctl)(unsigned int cmd, struct libcfs_ioctl_data *data);
+	int (*handle_ioctl)(unsigned int cmd, struct libcfs_ioctl_hdr *hdr);
 };
 
 #define DECLARE_IOCTL_HANDLER(ident, func)		      \
@@ -148,9 +147,9 @@ static inline int libcfs_ioctl_packlen(struct libcfs_ioctl_data *data)
 	return len;
 }
 
-static inline int libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data)
+static inline bool libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data)
 {
-	if (data->ioc_len > (1<<30)) {
+	if (data->ioc_hdr.ioc_len > (1 << 30)) {
 		CERROR("LIBCFS ioctl: ioc_len larger than 1<<30\n");
 		return 1;
 	}
@@ -186,7 +185,7 @@ static inline int libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data)
 		CERROR("LIBCFS ioctl: plen2 nonzero but no pbuf2 pointer\n");
 		return 1;
 	}
-	if ((__u32)libcfs_ioctl_packlen(data) != data->ioc_len) {
+	if ((__u32)libcfs_ioctl_packlen(data) != data->ioc_hdr.ioc_len) {
 		CERROR("LIBCFS ioctl: packlen != ioc_len\n");
 		return 1;
 	}
@@ -206,7 +205,9 @@ static inline int libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data)
 
 int libcfs_register_ioctl(struct libcfs_ioctl_handler *hand);
 int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand);
-int libcfs_ioctl_getdata(char *buf, char *end, void __user *arg);
+int libcfs_ioctl_getdata_len(const struct libcfs_ioctl_hdr __user *arg,
+			     __u32 *buf_len);
 int libcfs_ioctl_popdata(void __user *arg, void *buf, int size);
+int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data);
 
 #endif /* __LIBCFS_IOCTL_H__ */
diff --git a/drivers/staging/lustre/lnet/lnet/module.c b/drivers/staging/lustre/lnet/lnet/module.c
index cd37303..46f5241 100644
--- a/drivers/staging/lustre/lnet/lnet/module.c
+++ b/drivers/staging/lustre/lnet/lnet/module.c
@@ -84,7 +84,7 @@ lnet_unconfigure(void)
 }
 
 static int
-lnet_ioctl(unsigned int cmd, struct libcfs_ioctl_data *data)
+lnet_ioctl(unsigned int cmd, struct libcfs_ioctl_hdr *hdr)
 {
 	int rc;
 
@@ -103,7 +103,7 @@ lnet_ioctl(unsigned int cmd, struct libcfs_ioctl_data *data)
 		 */
 		rc = LNetNIInit(LNET_PID_ANY);
 		if (rc >= 0) {
-			rc = LNetCtl(cmd, data);
+			rc = LNetCtl(cmd, hdr);
 			LNetNIFini();
 		}
 		return rc;
diff --git a/drivers/staging/lustre/lnet/selftest/conctl.c b/drivers/staging/lustre/lnet/selftest/conctl.c
index 210e24e..90b7771 100644
--- a/drivers/staging/lustre/lnet/selftest/conctl.c
+++ b/drivers/staging/lustre/lnet/selftest/conctl.c
@@ -801,15 +801,20 @@ out:
 }
 
 int
-lstcon_ioctl_entry(unsigned int cmd, struct libcfs_ioctl_data *data)
+lstcon_ioctl_entry(unsigned int cmd, struct libcfs_ioctl_hdr *hdr)
 {
 	char   *buf;
-	int     opc = data->ioc_u32[0];
+	struct libcfs_ioctl_data *data;
+	int     opc;
 	int     rc;
 
 	if (cmd != IOC_LIBCFS_LNETST)
 		return -EINVAL;
 
+	data = container_of(hdr, struct libcfs_ioctl_data, ioc_hdr);
+
+	opc = data->ioc_u32[0];
+
 	if (data->ioc_plen1 > PAGE_CACHE_SIZE)
 		return -EINVAL;
 
diff --git a/drivers/staging/lustre/lnet/selftest/console.c b/drivers/staging/lustre/lnet/selftest/console.c
index 1385dc0..badc696 100644
--- a/drivers/staging/lustre/lnet/selftest/console.c
+++ b/drivers/staging/lustre/lnet/selftest/console.c
@@ -1983,7 +1983,7 @@ static void lstcon_init_acceptor_service(void)
 	lstcon_acceptor_service.sv_wi_total = SFW_FRWK_WI_MAX;
 }
 
-extern int lstcon_ioctl_entry(unsigned int cmd, struct libcfs_ioctl_data *data);
+extern int lstcon_ioctl_entry(unsigned int cmd, struct libcfs_ioctl_hdr *hdr);
 
 static DECLARE_IOCTL_HANDLER(lstcon_ioctl_handler, lstcon_ioctl_entry);
 
diff --git a/drivers/staging/lustre/lnet/selftest/console.h b/drivers/staging/lustre/lnet/selftest/console.h
index b7e14e4..c9d1081 100644
--- a/drivers/staging/lustre/lnet/selftest/console.h
+++ b/drivers/staging/lustre/lnet/selftest/console.h
@@ -184,7 +184,6 @@ lstcon_id2hash(lnet_process_id_t id, struct list_head *hash)
 }
 
 int lstcon_console_init(void);
-int lstcon_ioctl_entry(unsigned int cmd, struct libcfs_ioctl_data *data);
 int lstcon_console_fini(void);
 int lstcon_session_match(lst_sid_t sid);
 int lstcon_session_new(char *name, int key, unsigned version,
diff --git a/drivers/staging/lustre/lustre/libcfs/linux/linux-module.c b/drivers/staging/lustre/lustre/libcfs/linux/linux-module.c
index ff90772..f62c5bc 100644
--- a/drivers/staging/lustre/lustre/libcfs/linux/linux-module.c
+++ b/drivers/staging/lustre/lustre/libcfs/linux/linux-module.c
@@ -40,41 +40,10 @@
 
 #define LNET_MINOR 240
 
-int libcfs_ioctl_getdata(char *buf, char *end, void __user *arg)
+int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data)
 {
-	struct libcfs_ioctl_hdr   *hdr;
-	struct libcfs_ioctl_data  *data;
-	int orig_len;
-
-	hdr = (struct libcfs_ioctl_hdr *)buf;
-	data = (struct libcfs_ioctl_data *)buf;
-
-	if (copy_from_user(buf, arg, sizeof(*hdr)))
-		return -EFAULT;
-
-	if (hdr->ioc_version != LIBCFS_IOCTL_VERSION) {
-		CERROR("PORTALS: version mismatch kernel vs application\n");
-		return -EINVAL;
-	}
-
-	if (hdr->ioc_len >= end - buf) {
-		CERROR("PORTALS: user buffer exceeds kernel buffer\n");
-		return -EINVAL;
-	}
-
-	if (hdr->ioc_len < sizeof(struct libcfs_ioctl_data)) {
-		CERROR("PORTALS: user buffer too small for ioctl\n");
-		return -EINVAL;
-	}
-
-	orig_len = hdr->ioc_len;
-	if (copy_from_user(buf, arg, hdr->ioc_len))
-		return -EFAULT;
-	if (orig_len != data->ioc_len)
-		return -EINVAL;
-
 	if (libcfs_ioctl_is_invalid(data)) {
-		CERROR("PORTALS: ioctl not correctly formatted\n");
+		CERROR("LNET: ioctl not correctly formatted\n");
 		return -EINVAL;
 	}
 
@@ -88,6 +57,25 @@ int libcfs_ioctl_getdata(char *buf, char *end, void __user *arg)
 	return 0;
 }
 
+int libcfs_ioctl_getdata_len(const struct libcfs_ioctl_hdr __user *arg,
+			     __u32 *len)
+{
+	struct libcfs_ioctl_hdr hdr;
+
+	if (copy_from_user(&hdr, arg, sizeof(hdr)))
+		return -EFAULT;
+
+	if (hdr.ioc_version != LIBCFS_IOCTL_VERSION) {
+		CERROR("LNET: version mismatch expected %#x, got %#x\n",
+		       LIBCFS_IOCTL_VERSION, hdr.ioc_version);
+		return -EINVAL;
+	}
+
+	*len = hdr.ioc_len;
+
+	return 0;
+}
+
 int libcfs_ioctl_popdata(void __user *arg, void *data, int size)
 {
 	if (copy_to_user(arg, data, size))
diff --git a/drivers/staging/lustre/lustre/libcfs/module.c b/drivers/staging/lustre/lustre/libcfs/module.c
index ea3dc9b..1cb6c80 100644
--- a/drivers/staging/lustre/lustre/libcfs/module.c
+++ b/drivers/staging/lustre/lustre/libcfs/module.c
@@ -54,6 +54,8 @@
 
 # define DEBUG_SUBSYSTEM S_LNET
 
+#define LIBCFS_MAX_IOCTL_BUF_LEN 2048
+
 #include "../../include/linux/libcfs/libcfs.h"
 #include <asm/div64.h>
 
@@ -115,11 +117,20 @@ int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand)
 }
 EXPORT_SYMBOL(libcfs_deregister_ioctl);
 
-static int libcfs_ioctl_int(struct cfs_psdev_file *pfile, unsigned long cmd,
-			    void __user *arg, struct libcfs_ioctl_data *data)
+static int libcfs_ioctl_handle(struct cfs_psdev_file *pfile, unsigned long cmd,
+			       void *arg, struct libcfs_ioctl_hdr *hdr)
 {
+	struct libcfs_ioctl_data *data = NULL;
 	int err = -EINVAL;
 
+	if ((cmd <= IOC_LIBCFS_LNETST) ||
+	    (cmd >= IOC_LIBCFS_REGISTER_MYNID)) {
+		data = container_of(hdr, struct libcfs_ioctl_data, ioc_hdr);
+		err = libcfs_ioctl_data_adjust(data);
+		if (err)
+			return err;
+	}
+
 	switch (cmd) {
 	case IOC_LIBCFS_CLEAR_DEBUG:
 		libcfs_debug_clear_buffer();
@@ -141,11 +152,11 @@ static int libcfs_ioctl_int(struct cfs_psdev_file *pfile, unsigned long cmd,
 		err = -EINVAL;
 		down_read(&ioctl_list_sem);
 		list_for_each_entry(hand, &ioctl_list, item) {
-			err = hand->handle_ioctl(cmd, data);
+			err = hand->handle_ioctl(cmd, hdr);
 			if (err != -EINVAL) {
 				if (err == 0)
 					err = libcfs_ioctl_popdata(arg,
-							data, sizeof(*data));
+							hdr, hdr->ioc_len);
 				break;
 			}
 		}
@@ -160,26 +171,38 @@ static int libcfs_ioctl_int(struct cfs_psdev_file *pfile, unsigned long cmd,
 static int libcfs_ioctl(struct cfs_psdev_file *pfile, unsigned long cmd,
 			void __user *arg)
 {
-	char    *buf;
-	struct libcfs_ioctl_data *data;
+	struct libcfs_ioctl_hdr *hdr;
 	int err = 0;
+	__u32 buf_len;
 
-	LIBCFS_ALLOC_GFP(buf, 1024, GFP_KERNEL);
-	if (!buf)
+	err = libcfs_ioctl_getdata_len(arg, &buf_len);
+	if (err)
+		return err;
+
+	/*
+	 * do a check here to restrict the size of the memory
+	 * to allocate to guard against DoS attacks.
+	 */
+	if (buf_len > LIBCFS_MAX_IOCTL_BUF_LEN) {
+		CERROR("LNET: user buffer exceeds kernel buffer\n");
+		return -EINVAL;
+	}
+
+	LIBCFS_ALLOC_GFP(hdr, buf_len, GFP_KERNEL);
+	if (!hdr)
 		return -ENOMEM;
 
 	/* 'cmd' and permissions get checked in our arch-specific caller */
-	if (libcfs_ioctl_getdata(buf, buf + 800, arg)) {
-		CERROR("PORTALS ioctl: data error\n");
-		err = -EINVAL;
+	if (copy_from_user(hdr, arg, buf_len)) {
+		CERROR("LNET ioctl: data error\n");
+		err = -EFAULT;
 		goto out;
 	}
-	data = (struct libcfs_ioctl_data *)buf;
 
-	err = libcfs_ioctl_int(pfile, cmd, arg, data);
+	err = libcfs_ioctl_handle(pfile, cmd, arg, hdr);
 
 out:
-	LIBCFS_FREE(buf, 1024);
+	LIBCFS_FREE(hdr, buf_len);
 	return err;
 }
 
-- 
1.7.1


* [PATCH 02/24] staging: lustre: Dynamic LNet Configuration (DLC) show command
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
  2016-02-22 22:29 ` [PATCH 01/24] staging: lustre: Dynamic LNet Configuration (DLC) IOCTL changes James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 03/24] staging: lustre: fix crash due to NULL networks string James Simmons
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

This is the fifth patch of a set of patches that enables DLC.

This patch adds the new structures which will be used
in the IOCTL communication.  It also adds a set of
show operations to show buffers, networks, statistics
and peer information.
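
As a rough userspace sketch of how a request using one of the new
structures might be validated before use: the header is accepted
under either ioctl version, and the handler checks ioc_len against
the structure it expects before casting.  struct demo_config,
demo_check_hdr() and demo_get_net() are hypothetical names, and the
lower bound on ioc_len in demo_check_hdr() is an assumption that is
not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define IOCTL_VERSION	0x0001000a	/* legacy libcfs_ioctl_data layout */
#define IOCTL_VERSION2	0x0001000b	/* newer DLC structures */

struct ioctl_hdr {
	uint32_t ioc_len;
	uint32_t ioc_version;
};

/* Hypothetical DLC-style request: header first, then payload. */
struct demo_config {
	struct ioctl_hdr cfg_hdr;
	uint32_t cfg_net;
	uint32_t cfg_count;
	uint64_t cfg_nid;
};

/* Generic check: known version and a length within [hdr, max_len]. */
static int demo_check_hdr(const struct ioctl_hdr *hdr, size_t max_len)
{
	if (hdr->ioc_version != IOCTL_VERSION &&
	    hdr->ioc_version != IOCTL_VERSION2)
		return -EINVAL;
	if (hdr->ioc_len < sizeof(*hdr) || hdr->ioc_len > max_len)
		return -EINVAL;
	return 0;
}

/* Handler-side check: buffer must cover the structure we cast to. */
static int demo_get_net(struct ioctl_hdr *hdr, uint32_t *net)
{
	struct demo_config *config;

	assert(net != NULL);
	if (hdr->ioc_len < sizeof(*config))
		return -EINVAL;

	/* Valid because cfg_hdr is the first member of demo_config. */
	config = (struct demo_config *)hdr;
	*net = config->cfg_net;
	return 0;
}
```

Splitting the checks this way keeps the version and size limits in
one generic place while leaving each handler responsible for the
minimum length of its own structure.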

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2456
Reviewed-on: http://review.whamcloud.com/8022
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../lustre/include/linux/libcfs/libcfs_ioctl.h     |   26 ++++-
 .../staging/lustre/include/linux/lnet/lib-dlc.h    |  122 ++++++++++++++++++++
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |    5 +
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   56 ++++++++-
 drivers/staging/lustre/lnet/lnet/module.c          |    4 +
 drivers/staging/lustre/lnet/lnet/peer.c            |   61 ++++++++++
 .../lustre/lustre/libcfs/linux/linux-module.c      |    3 +-
 drivers/staging/lustre/lustre/libcfs/module.c      |   15 ++-
 8 files changed, 277 insertions(+), 15 deletions(-)
 create mode 100644 drivers/staging/lustre/include/linux/lnet/lib-dlc.h

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h
index 0598702..f788631 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h
@@ -41,7 +41,8 @@
 #ifndef __LIBCFS_IOCTL_H__
 #define __LIBCFS_IOCTL_H__
 
-#define LIBCFS_IOCTL_VERSION 0x0001000a
+#define LIBCFS_IOCTL_VERSION	0x0001000a
+#define LIBCFS_IOCTL_VERSION2	0x0001000b
 
 struct libcfs_ioctl_hdr {
 	__u32 ioc_len;
@@ -111,9 +112,6 @@ struct libcfs_ioctl_handler {
 /* lnet ioctls */
 #define IOC_LIBCFS_GET_NI		  _IOWR('e', 50, long)
 #define IOC_LIBCFS_FAIL_NID		_IOWR('e', 51, long)
-#define IOC_LIBCFS_ADD_ROUTE	       _IOWR('e', 52, long)
-#define IOC_LIBCFS_DEL_ROUTE	       _IOWR('e', 53, long)
-#define IOC_LIBCFS_GET_ROUTE	       _IOWR('e', 54, long)
 #define IOC_LIBCFS_NOTIFY_ROUTER	   _IOWR('e', 55, long)
 #define IOC_LIBCFS_UNCONFIGURE	     _IOWR('e', 56, long)
 /*	#define IOC_LIBCFS_PORTALS_COMPATIBILITY   _IOWR('e', 57, long) */
@@ -136,7 +134,25 @@ struct libcfs_ioctl_handler {
 #define IOC_LIBCFS_DEL_INTERFACE	   _IOWR('e', 79, long)
 #define IOC_LIBCFS_GET_INTERFACE	   _IOWR('e', 80, long)
 
-#define IOC_LIBCFS_MAX_NR			     80
+/*
+ * DLC Specific IOCTL numbers.
+ * In order to maintain backward compatibility with any possible external
+ * tools which might be accessing the IOCTL numbers, a new group of IOCTL
+ * numbers has been allocated.
+ */
+#define IOCTL_CONFIG_SIZE		struct lnet_ioctl_config_data
+#define IOC_LIBCFS_ADD_ROUTE		_IOWR(IOC_LIBCFS_TYPE, 81, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_DEL_ROUTE		_IOWR(IOC_LIBCFS_TYPE, 82, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_GET_ROUTE		_IOWR(IOC_LIBCFS_TYPE, 83, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_ADD_NET		_IOWR(IOC_LIBCFS_TYPE, 84, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_DEL_NET		_IOWR(IOC_LIBCFS_TYPE, 85, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_GET_NET		_IOWR(IOC_LIBCFS_TYPE, 86, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_CONFIG_RTR		_IOWR(IOC_LIBCFS_TYPE, 87, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_ADD_BUF		_IOWR(IOC_LIBCFS_TYPE, 88, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_GET_BUF		_IOWR(IOC_LIBCFS_TYPE, 89, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_GET_PEER_INFO	_IOWR(IOC_LIBCFS_TYPE, 90, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_GET_LNET_STATS	_IOWR(IOC_LIBCFS_TYPE, 91, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_MAX_NR		91
 
 static inline int libcfs_ioctl_packlen(struct libcfs_ioctl_data *data)
 {
diff --git a/drivers/staging/lustre/include/linux/lnet/lib-dlc.h b/drivers/staging/lustre/include/linux/lnet/lib-dlc.h
new file mode 100644
index 0000000..84a19e9
--- /dev/null
+++ b/drivers/staging/lustre/include/linux/lnet/lib-dlc.h
@@ -0,0 +1,122 @@
+/*
+ * LGPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library.
+ *
+ * LGPL HEADER END
+ *
+ */
+/*
+ * Copyright (c) 2014, Intel Corporation.
+ */
+/*
+ * Author: Amir Shehata <amir.shehata@intel.com>
+ */
+
+#ifndef LNET_DLC_H
+#define LNET_DLC_H
+
+#include "../libcfs/libcfs_ioctl.h"
+#include "types.h"
+
+#define MAX_NUM_SHOW_ENTRIES	32
+#define LNET_MAX_STR_LEN	128
+#define LNET_MAX_SHOW_NUM_CPT	128
+#define LNET_UNDEFINED_HOPS	((__u32) -1)
+
+struct lnet_ioctl_net_config {
+	char ni_interfaces[LNET_MAX_INTERFACES][LNET_MAX_STR_LEN];
+	__u32 ni_status;
+	__u32 ni_cpts[LNET_MAX_SHOW_NUM_CPT];
+};
+
+#define LNET_TINY_BUF_IDX	0
+#define LNET_SMALL_BUF_IDX	1
+#define LNET_LARGE_BUF_IDX	2
+
+/* # different router buffer pools */
+#define LNET_NRBPOOLS		(LNET_LARGE_BUF_IDX + 1)
+
+struct lnet_ioctl_pool_cfg {
+	struct {
+		__u32 pl_npages;
+		__u32 pl_nbuffers;
+		__u32 pl_credits;
+		__u32 pl_mincredits;
+	} pl_pools[LNET_NRBPOOLS];
+	__u32 pl_routing;
+};
+
+struct lnet_ioctl_config_data {
+	struct libcfs_ioctl_hdr cfg_hdr;
+
+	__u32 cfg_net;
+	__u32 cfg_count;
+	__u64 cfg_nid;
+	__u32 cfg_ncpts;
+
+	union {
+		struct {
+			__u32 rtr_hop;
+			__u32 rtr_priority;
+			__u32 rtr_flags;
+		} cfg_route;
+		struct {
+			char net_intf[LNET_MAX_STR_LEN];
+			__s32 net_peer_timeout;
+			__s32 net_peer_tx_credits;
+			__s32 net_peer_rtr_credits;
+			__s32 net_max_tx_credits;
+			__u32 net_cksum_algo;
+			__u32 net_pad;
+		} cfg_net;
+		struct {
+			__u32 buf_enable;
+			__s32 buf_tiny;
+			__s32 buf_small;
+			__s32 buf_large;
+		} cfg_buffers;
+	} cfg_config_u;
+
+	char cfg_bulk[0];
+};
+
+struct lnet_ioctl_peer {
+	struct libcfs_ioctl_hdr pr_hdr;
+	__u32 pr_count;
+	__u32 pr_pad;
+	__u64 pr_nid;
+
+	union {
+		struct {
+			char cr_aliveness[LNET_MAX_STR_LEN];
+			__u32 cr_refcount;
+			__u32 cr_ni_peer_tx_credits;
+			__u32 cr_peer_tx_credits;
+			__u32 cr_peer_rtr_credits;
+			__u32 cr_peer_min_rtr_credits;
+			__u32 cr_peer_tx_qnob;
+			__u32 cr_ncpt;
+		} pr_peer_credits;
+	} pr_lnd_u;
+};
+
+struct lnet_ioctl_lnet_stats {
+	struct libcfs_ioctl_hdr st_hdr;
+	struct lnet_counters st_cntrs;
+};
+
+#endif /* LNET_DLC_H */
diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 0592e30..1157819 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -693,6 +693,11 @@ void lnet_peer_tables_cleanup(lnet_ni_t *ni);
 void lnet_peer_tables_destroy(void);
 int lnet_peer_tables_create(void);
 void lnet_debug_peer(lnet_nid_t nid);
+int lnet_get_peers(int count, __u64 *nid, char *alivness,
+		   int *ncpt, int *refcount,
+		   int *ni_peer_tx_credits, int *peer_tx_credits,
+		   int *peer_rtr_credits, int *peer_min_rtr_credtis,
+		   int *peer_tx_qnob);
 
 static inline void
 lnet_peer_set_alive(lnet_peer_t *lp)
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 933f345..b2b914a 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -39,6 +39,7 @@
 #include <linux/ktime.h>
 
 #include "../../include/linux/lnet/lib-lnet.h"
+#include "../../include/linux/lnet/lib-dlc.h"
 
 #define D_LNI D_CONSOLE
 
@@ -1743,6 +1744,7 @@ int
 LNetCtl(unsigned int cmd, void *arg)
 {
 	struct libcfs_ioctl_data *data = arg;
+	struct lnet_ioctl_config_data *config;
 	lnet_process_id_t id = {0};
 	lnet_ni_t *ni;
 	int rc;
@@ -1767,16 +1769,60 @@ LNetCtl(unsigned int cmd, void *arg)
 		return rc ? rc : lnet_check_routes();
 
 	case IOC_LIBCFS_DEL_ROUTE:
+		config = arg;
+
+		if (config->cfg_hdr.ioc_len < sizeof(*config))
+			return -EINVAL;
+
 		mutex_lock(&the_lnet.ln_api_mutex);
-		rc = lnet_del_route(data->ioc_net, data->ioc_nid);
+		rc = lnet_del_route(config->cfg_net, config->cfg_nid);
 		mutex_unlock(&the_lnet.ln_api_mutex);
 		return rc;
 
 	case IOC_LIBCFS_GET_ROUTE:
-		return lnet_get_route(data->ioc_count,
-				      &data->ioc_net, &data->ioc_count,
-				      &data->ioc_nid, &data->ioc_flags,
-				      &data->ioc_priority);
+		config = arg;
+
+		if (config->cfg_hdr.ioc_len < sizeof(*config))
+			return -EINVAL;
+
+		return lnet_get_route(config->cfg_count,
+				      &config->cfg_net,
+				      &config->cfg_config_u.cfg_route.rtr_hop,
+				      &config->cfg_nid,
+				      &config->cfg_config_u.cfg_route.rtr_flags,
+				      &config->cfg_config_u.cfg_route.rtr_priority);
+
+	case IOC_LIBCFS_ADD_NET:
+		return 0;
+
+	case IOC_LIBCFS_DEL_NET:
+		return 0;
+
+	case IOC_LIBCFS_GET_NET:
+		return 0;
+
+	case IOC_LIBCFS_GET_LNET_STATS: {
+		struct lnet_ioctl_lnet_stats *lnet_stats = arg;
+
+		if (lnet_stats->st_hdr.ioc_len < sizeof(*lnet_stats))
+			return -EINVAL;
+
+		lnet_counters_get(&lnet_stats->st_cntrs);
+		return 0;
+	}
+
+	case IOC_LIBCFS_CONFIG_RTR:
+		return 0;
+
+	case IOC_LIBCFS_ADD_BUF:
+		return 0;
+
+	case IOC_LIBCFS_GET_BUF:
+		return 0;
+
+	case IOC_LIBCFS_GET_PEER_INFO:
+		return 0;
+
 	case IOC_LIBCFS_NOTIFY_ROUTER:
 		secs_passed = (ktime_get_real_seconds() - data->ioc_u64[0]);
 		return lnet_notify(NULL, data->ioc_nid, data->ioc_flags,
diff --git a/drivers/staging/lustre/lnet/lnet/module.c b/drivers/staging/lustre/lnet/lnet/module.c
index 46f5241..27213f1 100644
--- a/drivers/staging/lustre/lnet/lnet/module.c
+++ b/drivers/staging/lustre/lnet/lnet/module.c
@@ -36,6 +36,7 @@
 
 #define DEBUG_SUBSYSTEM S_LNET
 #include "../../include/linux/lnet/lib-lnet.h"
+#include "../../include/linux/lnet/lib-dlc.h"
 
 static int config_on_load;
 module_param(config_on_load, int, 0444);
@@ -95,6 +96,9 @@ lnet_ioctl(unsigned int cmd, struct libcfs_ioctl_hdr *hdr)
 	case IOC_LIBCFS_UNCONFIGURE:
 		return lnet_unconfigure();
 
+	case IOC_LIBCFS_ADD_NET:
+		return LNetCtl(cmd, hdr);
+
 	default:
 		/*
 		 * Passing LNET_PID_ANY only gives me a ref if the net is up
diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
index 42b2d44..5771708 100644
--- a/drivers/staging/lustre/lnet/lnet/peer.c
+++ b/drivers/staging/lustre/lnet/lnet/peer.c
@@ -39,6 +39,7 @@
 #define DEBUG_SUBSYSTEM S_LNET
 
 #include "../../include/linux/lnet/lib-lnet.h"
+#include "../../include/linux/lnet/lib-dlc.h"
 
 int
 lnet_peer_tables_create(void)
@@ -392,3 +393,63 @@ lnet_debug_peer(lnet_nid_t nid)
 
 	lnet_net_unlock(cpt);
 }
+
+int lnet_get_peers(int count, __u64 *nid, char *aliveness,
+		   int *ncpt, int *refcount,
+		   int *ni_peer_tx_credits, int *peer_tx_credits,
+		   int *peer_rtr_credits, int *peer_min_rtr_credits,
+		   int *peer_tx_qnob)
+{
+	struct lnet_peer_table *peer_table;
+	lnet_peer_t *lp;
+	int j;
+	int lncpt, found = 0;
+
+	/* get the number of CPTs */
+	lncpt = cfs_percpt_number(the_lnet.ln_peer_tables);
+
+	/*
+	 * if the cpt number to be examined is >= the number of cpts in
+	 * the system then indicate that there are no more cpts to examine
+	 */
+	if (*ncpt >= lncpt)
+		return -ENOENT;
+
+	/* get the current table */
+	peer_table = the_lnet.ln_peer_tables[*ncpt];
+	/* if the ptable is NULL then there are no more cpts to examine */
+	if (!peer_table)
+		return -ENOENT;
+
+	lnet_net_lock(*ncpt);
+
+	for (j = 0; j < LNET_PEER_HASH_SIZE && !found; j++) {
+		struct list_head *peers = &peer_table->pt_hash[j];
+
+		list_for_each_entry(lp, peers, lp_hashlist) {
+			if (count-- > 0)
+				continue;
+
+			snprintf(aliveness, LNET_MAX_STR_LEN, "NA");
+			if (lnet_isrouter(lp) ||
+			    lnet_peer_aliveness_enabled(lp))
+				snprintf(aliveness, LNET_MAX_STR_LEN,
+					 lp->lp_alive ? "up" : "down");
+
+			*nid = lp->lp_nid;
+			*refcount = lp->lp_refcount;
+			*ni_peer_tx_credits = lp->lp_ni->ni_peertxcredits;
+			*peer_tx_credits = lp->lp_txcredits;
+			*peer_rtr_credits = lp->lp_rtrcredits;
+			*peer_min_rtr_credits = lp->lp_mintxcredits;
+			*peer_tx_qnob = lp->lp_txqnob;
+
+			found = 1;
+		}
+	}
+	lnet_net_unlock(*ncpt);
+
+	*ncpt = lncpt;
+
+	return found ? 0 : -ENOENT;
+}
diff --git a/drivers/staging/lustre/lustre/libcfs/linux/linux-module.c b/drivers/staging/lustre/lustre/libcfs/linux/linux-module.c
index f62c5bc..ebc60ac 100644
--- a/drivers/staging/lustre/lustre/libcfs/linux/linux-module.c
+++ b/drivers/staging/lustre/lustre/libcfs/linux/linux-module.c
@@ -65,7 +65,8 @@ int libcfs_ioctl_getdata_len(const struct libcfs_ioctl_hdr __user *arg,
 	if (copy_from_user(&hdr, arg, sizeof(hdr)))
 		return -EFAULT;
 
-	if (hdr.ioc_version != LIBCFS_IOCTL_VERSION) {
+	if (hdr.ioc_version != LIBCFS_IOCTL_VERSION &&
+	    hdr.ioc_version != LIBCFS_IOCTL_VERSION2) {
 		CERROR("LNET: version mismatch expected %#x, got %#x\n",
 		       LIBCFS_IOCTL_VERSION, hdr.ioc_version);
 		return -EINVAL;
diff --git a/drivers/staging/lustre/lustre/libcfs/module.c b/drivers/staging/lustre/lustre/libcfs/module.c
index 1cb6c80..05e2c56 100644
--- a/drivers/staging/lustre/lustre/libcfs/module.c
+++ b/drivers/staging/lustre/lustre/libcfs/module.c
@@ -54,13 +54,15 @@
 
 # define DEBUG_SUBSYSTEM S_LNET
 
-#define LIBCFS_MAX_IOCTL_BUF_LEN 2048
+#define LNET_MAX_IOCTL_BUF_LEN (sizeof(struct lnet_ioctl_net_config) + \
+				sizeof(struct lnet_ioctl_config_data))
 
 #include "../../include/linux/libcfs/libcfs.h"
 #include <asm/div64.h>
 
 #include "../../include/linux/libcfs/libcfs_crypto.h"
 #include "../../include/linux/lnet/lib-lnet.h"
+#include "../../include/linux/lnet/lib-dlc.h"
 #include "../../include/linux/lnet/lnet.h"
 #include "tracefile.h"
 
@@ -123,8 +125,13 @@ static int libcfs_ioctl_handle(struct cfs_psdev_file *pfile, unsigned long cmd,
 	struct libcfs_ioctl_data *data = NULL;
 	int err = -EINVAL;
 
-	if ((cmd <= IOC_LIBCFS_LNETST) ||
-	    (cmd >= IOC_LIBCFS_REGISTER_MYNID)) {
+	/*
+	 * The libcfs_ioctl_data_adjust() function performs adjustment
+	 * operations on the libcfs_ioctl_data structure to make
+	 * it usable by the code.  This doesn't need to be called
+	 * for new data structures added.
+	 */
+	if (hdr->ioc_version == LIBCFS_IOCTL_VERSION) {
 		data = container_of(hdr, struct libcfs_ioctl_data, ioc_hdr);
 		err = libcfs_ioctl_data_adjust(data);
 		if (err)
@@ -183,7 +190,7 @@ static int libcfs_ioctl(struct cfs_psdev_file *pfile, unsigned long cmd,
 	 * do a check here to restrict the size of the memory
 	 * to allocate to guard against DoS attacks.
 	 */
-	if (buf_len > LIBCFS_MAX_IOCTL_BUF_LEN) {
+	if (buf_len > LNET_MAX_IOCTL_BUF_LEN) {
 		CERROR("LNET: user buffer exceeds kernel buffer\n");
 		return -EINVAL;
 	}
-- 
1.7.1


* [PATCH 03/24] staging: lustre: fix crash due to NULL networks string
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
  2016-02-22 22:29 ` [PATCH 01/24] staging: lustre: Dynamic LNet Configuration (DLC) IOCTL changes James Simmons
  2016-02-22 22:29 ` [PATCH 02/24] staging: lustre: Dynamic LNet Configuration (DLC) show command James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 04/24] staging: lustre: DLC user/kernel space glue code James Simmons
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

If the networks or ip2nets setting is invalid, lnet_parse_networks()
gets called with a NULL 'networks' string parameter and crashes.

lnet_parse_networks() needs to sanitize its input string now that
it's being called from multiple places.  Check for a NULL string
inside the function every time it is called, which reduces the
probability of errors with other code modifications.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5540
Reviewed-on: http://review.whamcloud.com/11626
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
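Note: the ordering this patch enforces can be sketched outside the
kernel as below. This is only an illustration under stated assumptions,
not the LNet code: sketch_token_size() and SKETCH_TEXTBUF_MAX are
hypothetical names, and the limit value is a placeholder for
LNET_SINGLE_TEXTBUF_NOB, whose real value is not shown in this patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical placeholder for LNET_SINGLE_TEXTBUF_NOB (value assumed). */
#define SKETCH_TEXTBUF_MAX 4096

/*
 * Return the token buffer size needed for 'networks', or -EINVAL.
 * The NULL check comes first, so strlen() is never reached with a
 * NULL pointer; the size is computed only after both checks pass.
 */
int sketch_token_size(const char *networks)
{
	if (!networks)
		return -EINVAL;

	if (strlen(networks) > SKETCH_TEXTBUF_MAX)
		return -EINVAL;

	return (int)strlen(networks) + 1;
}
```

The point of the sketch is that the old initializer
(int tokensize = strlen(networks) + 1) ran before any check could
reject a NULL argument, which is why the patch moves the assignment
below the checks.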
 drivers/staging/lustre/lnet/lnet/api-ni.c |    5 +----
 drivers/staging/lustre/lnet/lnet/config.c |    9 ++++++++-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index b2b914a..c68d01e 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1535,7 +1535,6 @@ LNetNIInit(lnet_pid_t requested_pid)
 	lnet_ping_info_t *pinfo;
 	lnet_handle_md_t md_handle;
 	struct list_head net_head;
-	char *nets;
 
 	INIT_LIST_HEAD(&net_head);
 
@@ -1550,13 +1549,11 @@ LNetNIInit(lnet_pid_t requested_pid)
 		return rc;
 	}
 
-	nets = lnet_get_networks();
-
 	rc = lnet_prepare(requested_pid);
 	if (rc)
 		goto failed0;
 
-	rc = lnet_parse_networks(&net_head, nets);
+	rc = lnet_parse_networks(&net_head, lnet_get_networks());
 	if (rc < 0)
 		goto failed1;
 
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 1ef07cd..013d41b 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -184,7 +184,7 @@ int
 lnet_parse_networks(struct list_head *nilist, char *networks)
 {
 	struct cfs_expr_list *el = NULL;
-	int tokensize = strlen(networks) + 1;
+	int tokensize;
 	char *tokens;
 	char *str;
 	char *tmp;
@@ -192,6 +192,11 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
 	__u32 net;
 	int nnets = 0;
 
+	if (!networks) {
+		CERROR("networks string is undefined\n");
+		return -EINVAL;
+	}
+
 	if (strlen(networks) > LNET_SINGLE_TEXTBUF_NOB) {
 		/* _WAY_ conservative */
 		LCONSOLE_ERROR_MSG(0x112,
@@ -199,6 +204,8 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
 		return -EINVAL;
 	}
 
+	tokensize = strlen(networks) + 1;
+
 	LIBCFS_ALLOC(tokens, tokensize);
 	if (!tokens) {
 		CERROR("Can't allocate net tokens\n");
-- 
1.7.1


* [PATCH 04/24] staging: lustre: DLC user/kernel space glue code
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (2 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 03/24] staging: lustre: fix crash due to NULL networks string James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 05/24] staging: lustre: make local functions static for LNet ni James Simmons
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

This is the sixth patch of a set of patches that enables DLC.

This patch enables user space to call into the kernel space DLC
code.  It adds handlers in the LNetCtl function that call the new
functions added for Dynamic LNet Configuration.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2456
Reviewed-on: http://review.whamcloud.com/8023
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
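Note: each new handler in this patch validates the length claimed in
the ioctl header before touching the rest of the structure, and the
handlers that return a trailing payload (GET_NET, GET_BUF) also require
room for that payload. A minimal user-space sketch of that pattern
follows. All struct and function names here are hypothetical,
simplified stand-ins; the real layouts of libcfs_ioctl_hdr,
lnet_ioctl_config_data and lnet_ioctl_net_config are not reproduced.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct libcfs_ioctl_hdr. */
struct sketch_hdr {
	uint32_t ioc_len;	/* total length claimed by the caller */
	uint32_t ioc_version;
};

/* Hypothetical stand-in for struct lnet_ioctl_config_data. */
struct sketch_config {
	struct sketch_hdr cfg_hdr;
	uint32_t cfg_net;
	uint32_t cfg_count;
	char cfg_bulk[];	/* optional trailing payload */
};

/* Hypothetical stand-in for struct lnet_ioctl_net_config. */
struct sketch_net_config {
	uint32_t ni_status;
};

/* Handlers that only read the fixed part (e.g. ADD_ROUTE, ADD_BUF). */
int sketch_check_fixed(const struct sketch_hdr *hdr)
{
	return hdr->ioc_len < sizeof(struct sketch_config) ? -EINVAL : 0;
}

/* Handlers that also fill a payload behind the fixed part (e.g. GET_NET). */
int sketch_check_with_bulk(const struct sketch_hdr *hdr)
{
	size_t total = sizeof(struct sketch_config) +
		       sizeof(struct sketch_net_config);

	return hdr->ioc_len < total ? -EINVAL : 0;
}
```

A request sized for the fixed part alone passes sketch_check_fixed()
but is rejected by sketch_check_with_bulk(), which is the behaviour the
GET_NET and GET_BUF cases rely on before casting cfg_bulk.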
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |   22 ++-
 .../staging/lustre/include/linux/lnet/lib-types.h  |    8 +
 drivers/staging/lustre/lnet/lnet/api-ni.c          |  188 ++++++++++++++++++--
 drivers/staging/lustre/lnet/lnet/module.c          |   64 +++++++-
 drivers/staging/lustre/lnet/lnet/peer.c            |   30 ++--
 drivers/staging/lustre/lnet/lnet/router.c          |   67 ++++++--
 6 files changed, 329 insertions(+), 50 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 1157819..5e16fe0 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -39,6 +39,7 @@
 #include "api.h"
 #include "lnet.h"
 #include "lib-types.h"
+#include "lib-dlc.h"
 
 extern lnet_t	the_lnet;	/* THE network */
 
@@ -458,6 +459,12 @@ int lnet_del_route(__u32 net, lnet_nid_t gw_nid);
 void lnet_destroy_routes(void);
 int lnet_get_route(int idx, __u32 *net, __u32 *hops,
 		   lnet_nid_t *gateway, __u32 *alive, __u32 *priority);
+int lnet_get_net_config(int idx, __u32 *cpt_count, __u64 *nid,
+			int *peer_timeout, int *peer_tx_credits,
+			int *peer_rtr_cr, int *max_tx_credits,
+			struct lnet_ioctl_net_config *net_config);
+int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg);
+
 void lnet_router_debugfs_init(void);
 void lnet_router_debugfs_fini(void);
 int  lnet_rtrpools_alloc(int im_a_router);
@@ -467,6 +474,10 @@ int lnet_rtrpools_enable(void);
 void lnet_rtrpools_disable(void);
 void lnet_rtrpools_free(int keep_pools);
 lnet_remotenet_t *lnet_find_net_locked(__u32 net);
+int lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
+		    __s32 peer_timeout, __s32 peer_cr, __s32 peer_buf_cr,
+		    __s32 credits);
+int lnet_dyn_del_ni(__u32 net);
 
 int lnet_islocalnid(lnet_nid_t nid);
 int lnet_islocalnet(__u32 net);
@@ -693,11 +704,12 @@ void lnet_peer_tables_cleanup(lnet_ni_t *ni);
 void lnet_peer_tables_destroy(void);
 int lnet_peer_tables_create(void);
 void lnet_debug_peer(lnet_nid_t nid);
-int lnet_get_peers(int count, __u64 *nid, char *alivness,
-		   int *ncpt, int *refcount,
-		   int *ni_peer_tx_credits, int *peer_tx_credits,
-		   int *peer_rtr_credits, int *peer_min_rtr_credtis,
-		   int *peer_tx_qnob);
+int lnet_get_peer_info(__u32 peer_index, __u64 *nid,
+		       char alivness[LNET_MAX_STR_LEN],
+		       __u32 *cpt_iter, __u32 *refcount,
+		       __u32 *ni_peer_tx_credits, __u32 *peer_tx_credits,
+		       __u32 *peer_rtr_credits, __u32 *peer_min_rtr_credtis,
+		       __u32 *peer_tx_qnob);
 
 static inline void
 lnet_peer_set_alive(lnet_peer_t *lp)
diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index b0ba9d8..e4a8f6e 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -627,6 +627,14 @@ typedef struct {
 	/* test protocol compatibility flags */
 	int				  ln_testprotocompat;
 
+	/*
+	 * 0 - load the NIs from the mod params
+	 * 1 - do not load the NIs from the mod params
+	 * Reverse logic to ensure that other calls to LNetNIInit
+	 * need no change
+	 */
+	bool				  ln_nis_from_mod_params;
+
 } lnet_t;
 
 #endif
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index c68d01e..fa65797 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1553,7 +1553,9 @@ LNetNIInit(lnet_pid_t requested_pid)
 	if (rc)
 		goto failed0;
 
-	rc = lnet_parse_networks(&net_head, lnet_get_networks());
+	rc = lnet_parse_networks(&net_head,
+				 !the_lnet.ln_nis_from_mod_params ?
+				 lnet_get_networks() : "");
 	if (rc < 0)
 		goto failed1;
 
@@ -1668,6 +1670,94 @@ LNetNIFini(void)
 }
 EXPORT_SYMBOL(LNetNIFini);
 
+/**
+ * Grabs the ni data from the ni structure and fills the out
+ * parameters
+ *
+ * \param[in] ni               network interface structure
+ * \param[out] cpt_count       the number of cpts the ni is on
+ * \param[out] nid             Network Interface ID
+ * \param[out] peer_timeout    NI peer timeout
+ * \param[out] peer_tx_credits NI peer transmit credits
+ * \param[out] peer_rtr_credits NI peer router credits
+ * \param[out] max_tx_credits  NI max transmit credit
+ * \param[out] net_config      Network configuration
+ */
+static void
+lnet_fill_ni_info(struct lnet_ni *ni, __u32 *cpt_count, __u64 *nid,
+		  int *peer_timeout, int *peer_tx_credits,
+		  int *peer_rtr_credits, int *max_tx_credits,
+		  struct lnet_ioctl_net_config *net_config)
+{
+	int i;
+
+	if (!ni)
+		return;
+
+	if (!net_config)
+		return;
+
+	BUILD_BUG_ON(ARRAY_SIZE(ni->ni_interfaces) !=
+		     ARRAY_SIZE(net_config->ni_interfaces));
+
+	for (i = 0; i < ARRAY_SIZE(ni->ni_interfaces); i++) {
+		if (!ni->ni_interfaces[i])
+			break;
+
+		strncpy(net_config->ni_interfaces[i],
+			ni->ni_interfaces[i],
+			sizeof(net_config->ni_interfaces[i]));
+	}
+
+	*nid = ni->ni_nid;
+	*peer_timeout = ni->ni_peertimeout;
+	*peer_tx_credits = ni->ni_peertxcredits;
+	*peer_rtr_credits = ni->ni_peerrtrcredits;
+	*max_tx_credits = ni->ni_maxtxcredits;
+
+	net_config->ni_status = ni->ni_status->ns_status;
+
+	if (ni->ni_cpts) {
+		int num_cpts = min(ni->ni_ncpts, LNET_MAX_SHOW_NUM_CPT);
+
+		for (i = 0; i < num_cpts; i++)
+			net_config->ni_cpts[i] = ni->ni_cpts[i];
+
+		*cpt_count = num_cpts;
+	}
+}
+
+int
+lnet_get_net_config(int idx, __u32 *cpt_count, __u64 *nid, int *peer_timeout,
+		    int *peer_tx_credits, int *peer_rtr_credits,
+		    int *max_tx_credits,
+		    struct lnet_ioctl_net_config *net_config)
+{
+	struct lnet_ni *ni;
+	struct list_head *tmp;
+	int cpt, i = 0;
+	int rc = -ENOENT;
+
+	cpt = lnet_net_lock_current();
+
+	list_for_each(tmp, &the_lnet.ln_nis) {
+		if (i++ != idx)
+			continue;
+
+		ni = list_entry(tmp, lnet_ni_t, ni_list);
+		lnet_ni_lock(ni);
+		lnet_fill_ni_info(ni, cpt_count, nid, peer_timeout,
+				  peer_tx_credits, peer_rtr_credits,
+				  max_tx_credits, net_config);
+		lnet_ni_unlock(ni);
+		rc = 0;
+		break;
+	}
+
+	lnet_net_unlock(cpt);
+	return rc;
+}
+
 int
 lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
 		__s32 peer_timeout, __s32 peer_cr, __s32 peer_buf_cr,
@@ -1759,9 +1849,16 @@ LNetCtl(unsigned int cmd, void *arg)
 		return lnet_fail_nid(data->ioc_nid, data->ioc_count);
 
 	case IOC_LIBCFS_ADD_ROUTE:
+		config = arg;
+
+		if (config->cfg_hdr.ioc_len < sizeof(*config))
+			return -EINVAL;
+
 		mutex_lock(&the_lnet.ln_api_mutex);
-		rc = lnet_add_route(data->ioc_net, data->ioc_count,
-				    data->ioc_nid, data->ioc_priority);
+		rc = lnet_add_route(config->cfg_net,
+				    config->cfg_config_u.cfg_route.rtr_hop,
+				    config->cfg_nid,
+				    config->cfg_config_u.cfg_route.rtr_priority);
 		mutex_unlock(&the_lnet.ln_api_mutex);
 		return rc ? rc : lnet_check_routes();
 
@@ -1789,14 +1886,29 @@ LNetCtl(unsigned int cmd, void *arg)
 				      &config->cfg_config_u.cfg_route.rtr_flags,
 				      &config->cfg_config_u.cfg_route.rtr_priority);
 
-	case IOC_LIBCFS_ADD_NET:
-		return 0;
+	case IOC_LIBCFS_GET_NET: {
+		struct lnet_ioctl_net_config *net_config;
+		size_t total = sizeof(*config) + sizeof(*net_config);
 
-	case IOC_LIBCFS_DEL_NET:
-		return 0;
+		config = arg;
 
-	case IOC_LIBCFS_GET_NET:
-		return 0;
+		if (config->cfg_hdr.ioc_len < total)
+			return -EINVAL;
+
+		net_config = (struct lnet_ioctl_net_config *)
+				config->cfg_bulk;
+		if (!net_config)
+			return -EINVAL;
+
+		return lnet_get_net_config(config->cfg_count,
+					   &config->cfg_ncpts,
+					   &config->cfg_nid,
+					   &config->cfg_config_u.cfg_net.net_peer_timeout,
+					   &config->cfg_config_u.cfg_net.net_peer_tx_credits,
+					   &config->cfg_config_u.cfg_net.net_peer_rtr_credits,
+					   &config->cfg_config_u.cfg_net.net_max_tx_credits,
+					   net_config);
+	}
 
 	case IOC_LIBCFS_GET_LNET_STATS: {
 		struct lnet_ioctl_lnet_stats *lnet_stats = arg;
@@ -1809,16 +1921,64 @@ LNetCtl(unsigned int cmd, void *arg)
 	}
 
 	case IOC_LIBCFS_CONFIG_RTR:
+		config = arg;
+
+		if (config->cfg_hdr.ioc_len < sizeof(*config))
+			return -EINVAL;
+
+		mutex_lock(&the_lnet.ln_api_mutex);
+		if (config->cfg_config_u.cfg_buffers.buf_enable) {
+			rc = lnet_rtrpools_enable();
+			mutex_unlock(&the_lnet.ln_api_mutex);
+			return rc;
+		}
+		lnet_rtrpools_disable();
+		mutex_unlock(&the_lnet.ln_api_mutex);
 		return 0;
 
 	case IOC_LIBCFS_ADD_BUF:
-		return 0;
+		config = arg;
 
-	case IOC_LIBCFS_GET_BUF:
-		return 0;
+		if (config->cfg_hdr.ioc_len < sizeof(*config))
+			return -EINVAL;
 
-	case IOC_LIBCFS_GET_PEER_INFO:
-		return 0;
+		mutex_lock(&the_lnet.ln_api_mutex);
+		rc = lnet_rtrpools_adjust(config->cfg_config_u.cfg_buffers.buf_tiny,
+					  config->cfg_config_u.cfg_buffers.buf_small,
+					  config->cfg_config_u.cfg_buffers.buf_large);
+		mutex_unlock(&the_lnet.ln_api_mutex);
+		return rc;
+
+	case IOC_LIBCFS_GET_BUF: {
+		struct lnet_ioctl_pool_cfg *pool_cfg;
+		size_t total = sizeof(*config) + sizeof(*pool_cfg);
+
+		config = arg;
+
+		if (config->cfg_hdr.ioc_len < total)
+			return -EINVAL;
+
+		pool_cfg = (struct lnet_ioctl_pool_cfg *)config->cfg_bulk;
+		return lnet_get_rtr_pool_cfg(config->cfg_count, pool_cfg);
+	}
+
+	case IOC_LIBCFS_GET_PEER_INFO: {
+		struct lnet_ioctl_peer *peer_info = arg;
+
+		if (peer_info->pr_hdr.ioc_len < sizeof(*peer_info))
+			return -EINVAL;
+
+		return lnet_get_peer_info(peer_info->pr_count,
+			&peer_info->pr_nid,
+			peer_info->pr_lnd_u.pr_peer_credits.cr_aliveness,
+			&peer_info->pr_lnd_u.pr_peer_credits.cr_ncpt,
+			&peer_info->pr_lnd_u.pr_peer_credits.cr_refcount,
+			&peer_info->pr_lnd_u.pr_peer_credits.cr_ni_peer_tx_credits,
+			&peer_info->pr_lnd_u.pr_peer_credits.cr_peer_tx_credits,
+			&peer_info->pr_lnd_u.pr_peer_credits.cr_peer_rtr_credits,
+			&peer_info->pr_lnd_u.pr_peer_credits.cr_peer_min_rtr_credits,
+			&peer_info->pr_lnd_u.pr_peer_credits.cr_peer_tx_qnob);
+	}
 
 	case IOC_LIBCFS_NOTIFY_ROUTER:
 		secs_passed = (ktime_get_real_seconds() - data->ioc_u64[0]);
diff --git a/drivers/staging/lustre/lnet/lnet/module.c b/drivers/staging/lustre/lnet/lnet/module.c
index 27213f1..e9b1e69 100644
--- a/drivers/staging/lustre/lnet/lnet/module.c
+++ b/drivers/staging/lustre/lnet/lnet/module.c
@@ -85,19 +85,79 @@ lnet_unconfigure(void)
 }
 
 static int
+lnet_dyn_configure(struct libcfs_ioctl_hdr *hdr)
+{
+	struct lnet_ioctl_config_data *conf =
+		(struct lnet_ioctl_config_data *)hdr;
+	int rc;
+
+	if (conf->cfg_hdr.ioc_len < sizeof(*conf))
+		return -EINVAL;
+
+	mutex_lock(&lnet_config_mutex);
+	if (!the_lnet.ln_niinit_self) {
+		rc = -EINVAL;
+		goto out_unlock;
+	}
+	rc = lnet_dyn_add_ni(LUSTRE_SRV_LNET_PID,
+			     conf->cfg_config_u.cfg_net.net_intf,
+			     conf->cfg_config_u.cfg_net.net_peer_timeout,
+			     conf->cfg_config_u.cfg_net.net_peer_tx_credits,
+			     conf->cfg_config_u.cfg_net.net_peer_rtr_credits,
+			     conf->cfg_config_u.cfg_net.net_max_tx_credits);
+out_unlock:
+	mutex_unlock(&lnet_config_mutex);
+
+	return rc;
+}
+
+static int
+lnet_dyn_unconfigure(struct libcfs_ioctl_hdr *hdr)
+{
+	struct lnet_ioctl_config_data *conf =
+		(struct lnet_ioctl_config_data *)hdr;
+	int rc;
+
+	if (conf->cfg_hdr.ioc_len < sizeof(*conf))
+		return -EINVAL;
+
+	mutex_lock(&lnet_config_mutex);
+	if (!the_lnet.ln_niinit_self) {
+		rc = -EINVAL;
+		goto out_unlock;
+	}
+	rc = lnet_dyn_del_ni(conf->cfg_net);
+out_unlock:
+	mutex_unlock(&lnet_config_mutex);
+
+	return rc;
+}
+
+static int
 lnet_ioctl(unsigned int cmd, struct libcfs_ioctl_hdr *hdr)
 {
 	int rc;
 
 	switch (cmd) {
-	case IOC_LIBCFS_CONFIGURE:
+	case IOC_LIBCFS_CONFIGURE: {
+		struct libcfs_ioctl_data *data =
+			(struct libcfs_ioctl_data *)hdr;
+
+		if (data->ioc_hdr.ioc_len < sizeof(*data))
+			return -EINVAL;
+
+		the_lnet.ln_nis_from_mod_params = data->ioc_flags;
 		return lnet_configure(NULL);
+	}
 
 	case IOC_LIBCFS_UNCONFIGURE:
 		return lnet_unconfigure();
 
 	case IOC_LIBCFS_ADD_NET:
-		return LNetCtl(cmd, hdr);
+		return lnet_dyn_configure(hdr);
+
+	case IOC_LIBCFS_DEL_NET:
+		return lnet_dyn_unconfigure(hdr);
 
 	default:
 		/*
diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
index 5771708..19c80c9 100644
--- a/drivers/staging/lustre/lnet/lnet/peer.c
+++ b/drivers/staging/lustre/lnet/lnet/peer.c
@@ -394,16 +394,18 @@ lnet_debug_peer(lnet_nid_t nid)
 	lnet_net_unlock(cpt);
 }
 
-int lnet_get_peers(int count, __u64 *nid, char *aliveness,
-		   int *ncpt, int *refcount,
-		   int *ni_peer_tx_credits, int *peer_tx_credits,
-		   int *peer_rtr_credits, int *peer_min_rtr_credits,
-		   int *peer_tx_qnob)
+int
+lnet_get_peer_info(__u32 peer_index, __u64 *nid,
+		   char aliveness[LNET_MAX_STR_LEN],
+		   __u32 *cpt_iter, __u32 *refcount,
+		   __u32 *ni_peer_tx_credits, __u32 *peer_tx_credits,
+		   __u32 *peer_rtr_credits, __u32 *peer_min_rtr_credits,
+		   __u32 *peer_tx_qnob)
 {
 	struct lnet_peer_table *peer_table;
 	lnet_peer_t *lp;
-	int j;
-	int lncpt, found = 0;
+	bool found = false;
+	int lncpt, j;
 
 	/* get the number of CPTs */
 	lncpt = cfs_percpt_number(the_lnet.ln_peer_tables);
@@ -412,22 +414,22 @@ int lnet_get_peers(int count, __u64 *nid, char *aliveness,
 	 * if the cpt number to be examined is >= the number of cpts in
 	 * the system then indicate that there are no more cpts to examin
 	 */
-	if (*ncpt >= lncpt)
+	if (*cpt_iter >= lncpt)
 		return -ENOENT;
 
 	/* get the current table */
-	peer_table = the_lnet.ln_peer_tables[*ncpt];
+	peer_table = the_lnet.ln_peer_tables[*cpt_iter];
 	/* if the ptable is NULL then there are no more cpts to examine */
 	if (!peer_table)
 		return -ENOENT;
 
-	lnet_net_lock(*ncpt);
+	lnet_net_lock(*cpt_iter);
 
 	for (j = 0; j < LNET_PEER_HASH_SIZE && !found; j++) {
 		struct list_head *peers = &peer_table->pt_hash[j];
 
 		list_for_each_entry(lp, peers, lp_hashlist) {
-			if (count-- > 0)
+			if (peer_index-- > 0)
 				continue;
 
 			snprintf(aliveness, LNET_MAX_STR_LEN, "NA");
@@ -444,12 +446,12 @@ int lnet_get_peers(int count, __u64 *nid, char *aliveness,
 			*peer_min_rtr_credits = lp->lp_mintxcredits;
 			*peer_tx_qnob = lp->lp_txqnob;
 
-			found = 1;
+			found = true;
 		}
 	}
-	lnet_net_unlock(*ncpt);
+	lnet_net_unlock(*cpt_iter);
 
-	*ncpt = lncpt;
+	*cpt_iter = lncpt;
 
 	return found ? 0 : -ENOENT;
 }
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index e62a1b3..7e7afc4 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -542,6 +542,38 @@ lnet_destroy_routes(void)
 	lnet_del_route(LNET_NIDNET(LNET_NID_ANY), LNET_NID_ANY);
 }
 
+int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg)
+{
+	int i, rc = -ENOENT, j;
+
+	if (!the_lnet.ln_rtrpools)
+		return rc;
+
+	for (i = 0; i < LNET_NRBPOOLS; i++) {
+		lnet_rtrbufpool_t *rbp;
+
+		lnet_net_lock(LNET_LOCK_EX);
+		cfs_percpt_for_each(rbp, j, the_lnet.ln_rtrpools) {
+			if (i++ != idx)
+				continue;
+
+			pool_cfg->pl_pools[i].pl_npages = rbp[i].rbp_npages;
+			pool_cfg->pl_pools[i].pl_nbuffers = rbp[i].rbp_nbuffers;
+			pool_cfg->pl_pools[i].pl_credits = rbp[i].rbp_credits;
+			pool_cfg->pl_pools[i].pl_mincredits = rbp[i].rbp_mincredits;
+			rc = 0;
+			break;
+		}
+		lnet_net_unlock(LNET_LOCK_EX);
+	}
+
+	lnet_net_lock(LNET_LOCK_EX);
+	pool_cfg->pl_routing = the_lnet.ln_routing;
+	lnet_net_unlock(LNET_LOCK_EX);
+
+	return rc;
+}
+
 int
 lnet_get_route(int idx, __u32 *net, __u32 *hops,
 	       lnet_nid_t *gateway, __u32 *alive, __u32 *priority)
@@ -1544,8 +1576,8 @@ lnet_rtrpools_alloc(int im_a_router)
 	return rc;
 }
 
-int
-lnet_rtrpools_adjust(int tiny, int small, int large)
+static int
+lnet_rtrpools_adjust_helper(int tiny, int small, int large)
 {
 	int nrb = 0;
 	int rc = 0;
@@ -1553,19 +1585,10 @@ lnet_rtrpools_adjust(int tiny, int small, int large)
 	lnet_rtrbufpool_t *rtrp;
 
 	/*
-	 * this function doesn't revert the changes if adding new buffers
-	 * failed.  It's up to the user space caller to revert the
-	 * changes.
-	 */
-
-	if (!the_lnet.ln_routing)
-		return 0;
-
-	/*
 	 * If the provided values for each buffer pool are different than the
 	 * configured values, we need to take action.
 	 */
-	if (tiny >= 0 && tiny != tiny_router_buffers) {
+	if (tiny >= 0) {
 		tiny_router_buffers = tiny;
 		nrb = lnet_nrb_tiny_calculate();
 		cfs_percpt_for_each(rtrp, i, the_lnet.ln_rtrpools) {
@@ -1575,7 +1598,7 @@ lnet_rtrpools_adjust(int tiny, int small, int large)
 				return rc;
 		}
 	}
-	if (small >= 0 && small != small_router_buffers) {
+	if (small >= 0) {
 		small_router_buffers = small;
 		nrb = lnet_nrb_small_calculate();
 		cfs_percpt_for_each(rtrp, i, the_lnet.ln_rtrpools) {
@@ -1585,7 +1608,7 @@ lnet_rtrpools_adjust(int tiny, int small, int large)
 				return rc;
 		}
 	}
-	if (large >= 0 && large != large_router_buffers) {
+	if (large >= 0) {
 		large_router_buffers = large;
 		nrb = lnet_nrb_large_calculate();
 		cfs_percpt_for_each(rtrp, i, the_lnet.ln_rtrpools) {
@@ -1600,6 +1623,20 @@ lnet_rtrpools_adjust(int tiny, int small, int large)
 }
 
 int
+lnet_rtrpools_adjust(int tiny, int small, int large)
+{
+	/*
+	 * this function doesn't revert the changes if adding new buffers
+	 * failed.  It's up to the user space caller to revert the
+	 * changes.
+	 */
+	if (!the_lnet.ln_routing)
+		return 0;
+
+	return lnet_rtrpools_adjust_helper(tiny, small, large);
+}
+
+int
 lnet_rtrpools_enable(void)
 {
 	int rc;
@@ -1617,7 +1654,7 @@ lnet_rtrpools_enable(void)
 		 */
 		return lnet_rtrpools_alloc(1);
 
-	rc = lnet_rtrpools_adjust(0, 0, 0);
+	rc = lnet_rtrpools_adjust_helper(0, 0, 0);
 	if (rc)
 		return rc;
 
-- 
1.7.1


* [PATCH 05/24] staging: lustre: make local functions static for LNet ni
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (3 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 04/24] staging: lustre: DLC user/kernel space glue code James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 06/24] staging: lustre: remove LUSTRE_{,SRV_}LNET_PID James Simmons
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Frank Zago

From: Frank Zago <fzago@cray.com>

The function lnet_unprepare can be made static.

Signed-off-by: Frank Zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5396
Reviewed-on: http://review.whamcloud.com/11306
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
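Note: the patch has to touch both the forward declaration and the
definition because the two must agree on linkage. A small sketch of
the same shape, using hypothetical function names rather than the LNet
ones:

```c
/* Forward declaration with internal linkage, needed because the
 * caller below appears before the definition. */
static int sketch_unprepare(void);

/* Caller placed ahead of the definition, as lnet_prepare() is. */
int sketch_prepare(int fail)
{
	if (fail)
		return sketch_unprepare();
	return 0;
}

/* The definition repeats 'static' so it matches the declaration. */
static int sketch_unprepare(void)
{
	return -1;
}
```

With 'static' on both, the symbol stays private to the file; leaving it
on only one of the two would give the declaration and the definition
mismatched linkage.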
 drivers/staging/lustre/lnet/lnet/api-ni.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index fa65797..7583ae4 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -525,7 +525,7 @@ lnet_res_lh_initialize(struct lnet_res_container *rec, lnet_libhandle_t *lh)
 	list_add(&lh->lh_hash_chain, &rec->rec_lh_hash[hash]);
 }
 
-int lnet_unprepare(void);
+static int lnet_unprepare(void);
 
 static int
 lnet_prepare(lnet_pid_t requested_pid)
@@ -611,7 +611,7 @@ lnet_prepare(lnet_pid_t requested_pid)
 	return rc;
 }
 
-int
+static int
 lnet_unprepare(void)
 {
 	/*
-- 
1.7.1


* [PATCH 06/24] staging: lustre: remove LUSTRE_{,SRV_}LNET_PID
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (4 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 05/24] staging: lustre: make local functions static for LNet ni James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 07/24] staging: lustre: improve LNet clean up code and API James Simmons
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, John L. Hammond

From: John L. Hammond <john.hammond@intel.com>

Remove LUSTRE_LNET_PID (12345) and LUSTRE_SRV_LNET_PID (12345) from
the libcfs headers and replace their uses with a new macro
LNET_PID_LUSTRE (also 12345) in lnet/types.h.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/11985
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
---
 .../staging/lustre/include/linux/libcfs/libcfs.h   |    2 --
 .../lustre/include/linux/libcfs/linux/libcfs.h     |    3 ---
 .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |    7 +++++--
 drivers/staging/lustre/lnet/lnet/api-ni.c          |    2 +-
 drivers/staging/lustre/lnet/lnet/lib-move.c        |    2 +-
 drivers/staging/lustre/lnet/lnet/module.c          |    4 ++--
 drivers/staging/lustre/lnet/lnet/router.c          |    2 +-
 drivers/staging/lustre/lnet/selftest/rpc.c         |    2 +-
 drivers/staging/lustre/lustre/ptlrpc/events.c      |    4 ++--
 9 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
index dc9b88f..5c598c8 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
@@ -51,8 +51,6 @@
 #define LERRCHKSUM(hexnum) (((hexnum) & 0xf) ^ ((hexnum) >> 4 & 0xf) ^ \
 			   ((hexnum) >> 8 & 0xf))
 
-#define LUSTRE_SRV_LNET_PID      LUSTRE_LNET_PID
-
 #include <linux/list.h>
 
 /* need both kernel and user-land acceptor */
diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
index aac5900..d94b266 100644
--- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
+++ b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
@@ -118,9 +118,6 @@ do {								    \
 #define CDEBUG_STACK() (0L)
 #endif /* __x86_64__ */
 
-/* initial pid  */
-#define LUSTRE_LNET_PID	  12345
-
 #define __current_nesting_level() (0)
 
 /**
diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
index 49d716d..854814c 100644
--- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
+++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
@@ -1842,7 +1842,10 @@ ksocknal_query(lnet_ni_t *ni, lnet_nid_t nid, unsigned long *when)
 	unsigned long now = cfs_time_current();
 	ksock_peer_t *peer = NULL;
 	rwlock_t *glock = &ksocknal_data.ksnd_global_lock;
-	lnet_process_id_t id = {.nid = nid, .pid = LUSTRE_SRV_LNET_PID};
+	lnet_process_id_t id = {
+		.nid = nid,
+		.pid = LNET_PID_LUSTRE,
+	};
 
 	read_lock(glock);
 
@@ -2187,7 +2190,7 @@ ksocknal_ctl(lnet_ni_t *ni, unsigned int cmd, void *arg)
 
 	case IOC_LIBCFS_ADD_PEER:
 		id.nid = data->ioc_nid;
-		id.pid = LUSTRE_SRV_LNET_PID;
+		id.pid = LNET_PID_LUSTRE;
 		return ksocknal_add_peer(ni, id,
 					  data->ioc_u32[0], /* IP */
 					  data->ioc_u32[1]); /* port */
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 7583ae4..3bed4c3 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -2117,7 +2117,7 @@ static int lnet_ping(lnet_process_id_t id, int timeout_ms,
 		return -EINVAL;
 
 	if (id.pid == LNET_PID_ANY)
-		id.pid = LUSTRE_SRV_LNET_PID;
+		id.pid = LNET_PID_LUSTRE;
 
 	LIBCFS_ALLOC(info, infosz);
 	if (!info)
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index f2b1116..a342ce0 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -1407,7 +1407,7 @@ lnet_send(lnet_nid_t src_nid, lnet_msg_t *msg, lnet_nid_t rtr_nid)
 
 		msg->msg_target_is_router = 1;
 		msg->msg_target.nid = lp->lp_nid;
-		msg->msg_target.pid = LUSTRE_SRV_LNET_PID;
+		msg->msg_target.pid = LNET_PID_LUSTRE;
 	}
 
 	/* 'lp' is our best choice of peer */
diff --git a/drivers/staging/lustre/lnet/lnet/module.c b/drivers/staging/lustre/lnet/lnet/module.c
index e9b1e69..8f053d7 100644
--- a/drivers/staging/lustre/lnet/lnet/module.c
+++ b/drivers/staging/lustre/lnet/lnet/module.c
@@ -53,7 +53,7 @@ lnet_configure(void *arg)
 	mutex_lock(&lnet_config_mutex);
 
 	if (!the_lnet.ln_niinit_self) {
-		rc = LNetNIInit(LUSTRE_SRV_LNET_PID);
+		rc = LNetNIInit(LNET_PID_LUSTRE);
 		if (rc >= 0) {
 			the_lnet.ln_niinit_self = 1;
 			rc = 0;
@@ -99,7 +99,7 @@ lnet_dyn_configure(struct libcfs_ioctl_hdr *hdr)
 		rc = -EINVAL;
 		goto out_unlock;
 	}
-	rc = lnet_dyn_add_ni(LUSTRE_SRV_LNET_PID,
+	rc = lnet_dyn_add_ni(LNET_PID_LUSTRE,
 			     conf->cfg_config_u.cfg_net.net_intf,
 			     conf->cfg_config_u.cfg_net.net_peer_timeout,
 			     conf->cfg_config_u.cfg_net.net_peer_tx_credits,
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 7e7afc4..d748931 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -1012,7 +1012,7 @@ lnet_ping_router_locked(lnet_peer_t *rtr)
 		lnet_handle_md_t mdh;
 
 		id.nid = rtr->lp_nid;
-		id.pid = LUSTRE_SRV_LNET_PID;
+		id.pid = LNET_PID_LUSTRE;
 		CDEBUG(D_NET, "Check: %s\n", libcfs_id2str(id));
 
 		rtr->lp_ping_notsent   = 1;
diff --git a/drivers/staging/lustre/lnet/selftest/rpc.c b/drivers/staging/lustre/lnet/selftest/rpc.c
index f95fd9b..4213198 100644
--- a/drivers/staging/lustre/lnet/selftest/rpc.c
+++ b/drivers/staging/lustre/lnet/selftest/rpc.c
@@ -1612,7 +1612,7 @@ srpc_startup(void)
 
 	srpc_data.rpc_state = SRPC_STATE_NONE;
 
-	rc = LNetNIInit(LUSTRE_SRV_LNET_PID);
+	rc = LNetNIInit(LNET_PID_LUSTRE);
 	if (rc < 0) {
 		CERROR("LNetNIInit() has failed: %d\n", rc);
 		return rc;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/events.c b/drivers/staging/lustre/lustre/ptlrpc/events.c
index 64eaa0e..ffceba5 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/events.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/events.c
@@ -443,7 +443,7 @@ int ptlrpc_uuid_to_peer(struct obd_uuid *uuid,
 	lnet_nid_t dst_nid;
 	lnet_nid_t src_nid;
 
-	peer->pid = LUSTRE_SRV_LNET_PID;
+	peer->pid = LNET_PID_LUSTRE;
 
 	/* Choose the matching UUID that's closest */
 	while (lustre_uuid_to_peer(uuid->uuid, &dst_nid, count++) == 0) {
@@ -513,7 +513,7 @@ static lnet_pid_t ptl_get_pid(void)
 {
 	lnet_pid_t pid;
 
-	pid = LUSTRE_SRV_LNET_PID;
+	pid = LNET_PID_LUSTRE;
 	return pid;
 }
 
-- 
1.7.1


* [PATCH 07/24] staging: lustre: improve LNet clean up code and API
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

This patch addresses a set of related issues: LU-5568, LU-5734,
LU-5839, LU-5849, LU-5850.

Create the local lnet_startup_lndni() API.  This function starts
up one LND.  lnet_startup_lndnis() calls this function in a loop
on every ni in the list passed in.  lnet_startup_lndni() is
responsible for cleaning up after itself in case of failure.
It calls lnet_free_ni() if the ni fails to start.  It calls
lnet_shutdown_lndni() if it successfully called the
lnd startup function, but fails later on.

lnet_startup_lndnis() also cleans up after itself.
If lnet_startup_lndni() fails then lnet_shutdown_lndnis() is
called to clean up all nis that might have been
started, and then free the rest of the nis on the list
which have not been started yet.
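
The ownership rules described above can be sketched as a small userspace
model. This is an illustration only, not the kernel code: struct toy_ni,
toy_startup_one(), toy_startup_all() and toy_shutdown_all() are
hypothetical stand-ins for lnet_ni_t, lnet_startup_lndni(),
lnet_startup_lndnis() and lnet_shutdown_lndnis(), and the failure is
simulated by a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for lnet_ni_t; not the kernel structure. */
struct toy_ni {
	int should_fail;	/* simulate an LND startup failure */
	struct toy_ni *next;
};

static struct toy_ni *toy_started;	/* models the_lnet.ln_nis */
static int toy_live;			/* NIs allocated and not yet freed */

static struct toy_ni *toy_ni_alloc(int should_fail, struct toy_ni *next)
{
	struct toy_ni *ni = calloc(1, sizeof(*ni));

	if (!ni)
		return NULL;
	ni->should_fail = should_fail;
	ni->next = next;
	toy_live++;
	return ni;
}

static void toy_ni_free(struct toy_ni *ni)
{
	assert(toy_live > 0);
	toy_live--;
	free(ni);
}

/* Models lnet_shutdown_lndnis(): tear down every started NI. */
static void toy_shutdown_all(void)
{
	while (toy_started) {
		struct toy_ni *ni = toy_started;

		toy_started = ni->next;
		toy_ni_free(ni);
	}
}

/* Models lnet_startup_lndni(): on failure the NI is freed here. */
static int toy_startup_one(struct toy_ni *ni)
{
	if (ni->should_fail) {
		toy_ni_free(ni);
		return -EINVAL;
	}
	ni->next = toy_started;
	toy_started = ni;
	return 0;
}

/* Models lnet_startup_lndnis(): returns the NI count or a negative errno. */
static int toy_startup_all(struct toy_ni **list)
{
	int count = 0;

	while (*list) {
		struct toy_ni *ni = *list;
		int rc;

		*list = ni->next;	/* unlink before starting */
		rc = toy_startup_one(ni);
		if (rc < 0) {
			toy_shutdown_all();	/* undo the NIs already started */
			while (*list) {		/* free the ones never started */
				ni = *list;
				*list = ni->next;
				toy_ni_free(ni);
			}
			return rc;
		}
		count++;
	}
	return count;
}
```

The point of the split is that each level owns exactly what it touched:
the single-NI function never leaves a half-started NI behind, so the
loop only has to undo NIs that were fully started and drop the ones it
never reached.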

To facilitate the above changes lnet_dyn_del_ni() now
manages the ping info.  It calls lnet_shutdown_lndni()
to shut down the NI.  lnet_shutdown_lndni() is no longer
an exposed API and doesn't manage the ping info, making
it callable from lnet_startup_lndni() as well.
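
The ordering that lnet_dyn_del_ni() now follows (prepare the
replacement ping info first, then shut the NI down, then publish) can
be sketched as below. This is a simplified userspace model under
stated assumptions: struct toy_ping_info and the toy_* variables are
hypothetical, and locking and MD handling are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for lnet_ping_info_t. */
struct toy_ping_info {
	int nnis;	/* number of NIs advertised to peers */
};

static struct toy_ping_info *toy_ping_target;	/* currently published */
static int toy_ni_count = 3;			/* NIs currently up */
static int toy_net_present = 1;			/* is the target net up? */

/* Models the order of operations in lnet_dyn_del_ni(). */
static int toy_dyn_del_ni(void)
{
	struct toy_ping_info *pinfo;

	/* 1. Build the replacement ping info before touching the NI. */
	assert(toy_ni_count > 0);
	pinfo = malloc(sizeof(*pinfo));
	if (!pinfo)
		return -ENOMEM;
	pinfo->nnis = toy_ni_count - 1;

	/* 2. Look up the NI; on failure discard the replacement. */
	if (!toy_net_present) {
		free(pinfo);
		return -EINVAL;
	}

	/* 3. Shut the NI down (models lnet_shutdown_lndni()). */
	toy_net_present = 0;
	toy_ni_count--;

	/* 4. Publish the new ping info and drop the old one. */
	free(toy_ping_target);
	toy_ping_target = pinfo;
	return 0;
}
```

Because step 3 no longer touches the ping info, the same shutdown
helper can be reused on the startup failure path, where no ping info
exists yet.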

There are two scenarios for calling lnet_startup_lndni():

1. From lnet_startup_lndnis()
If lnet_startup_lndni() fails, the ni must be shut down without
touching the ping information, as it hasn't been created yet.

2. From lnet_dyn_add_ni()
As above, lnet_startup_lndni() shuts down the ni on failure, and
then lnet_dyn_add_ni() takes care of managing the ping info.

The second part of this change ensures that the LOLND is no longer
added by lnet_parse_networks(); instead, the caller that needs it
(i.e. LNetNIInit()) adds it.

With this change lnet_dyn_add_ni() only needs to check that exactly
one net is being added.  If not, it frees everything; otherwise it
proceeds to start up the requested net.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5734
Reviewed-on: http://review.whamcloud.com/12658
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |    2 +
 drivers/staging/lustre/lnet/lnet/api-ni.c          |  468 ++++++++++----------
 drivers/staging/lustre/lnet/lnet/config.c          |   14 +-
 3 files changed, 251 insertions(+), 233 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 5e16fe0..2ee3d73 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -422,6 +422,8 @@ lnet_ni_decref(lnet_ni_t *ni)
 }
 
 void lnet_ni_free(lnet_ni_t *ni);
+lnet_ni_t *
+lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist);
 
 static inline int
 lnet_nid2peerhash(lnet_nid_t nid)
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 3bed4c3..3b7bc36 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1066,6 +1066,20 @@ lnet_ni_tq_credits(lnet_ni_t *ni)
 }
 
 static void
+lnet_ni_unlink_locked(lnet_ni_t *ni)
+{
+	if (!list_empty(&ni->ni_cptlist)) {
+		list_del_init(&ni->ni_cptlist);
+		lnet_ni_decref_locked(ni, 0);
+	}
+
+	/* move it to zombie list and nobody can find it anymore */
+	LASSERT(!list_empty(&ni->ni_list));
+	list_move(&ni->ni_list, &the_lnet.ln_nis_zombie);
+	lnet_ni_decref_locked(ni, 0);	/* drop ln_nis' ref */
+}
+
+static void
 lnet_clear_zombies_nis_locked(void)
 {
 	int i;
@@ -1148,14 +1162,7 @@ lnet_shutdown_lndnis(void)
 	while (!list_empty(&the_lnet.ln_nis)) {
 		ni = list_entry(the_lnet.ln_nis.next,
 				lnet_ni_t, ni_list);
-		/* move it to zombie list and nobody can find it anymore */
-		list_move(&ni->ni_list, &the_lnet.ln_nis_zombie);
-		lnet_ni_decref_locked(ni, 0);	/* drop ln_nis' ref */
-
-		if (!list_empty(&ni->ni_cptlist)) {
-			list_del_init(&ni->ni_cptlist);
-			lnet_ni_decref_locked(ni, 0);
-		}
+		lnet_ni_unlink_locked(ni);
 	}
 
 	/* Drop the cached eqwait NI. */
@@ -1192,228 +1199,196 @@ lnet_shutdown_lndnis(void)
 	lnet_net_unlock(LNET_LOCK_EX);
 }
 
-int
-lnet_shutdown_lndni(__u32 net)
+/* shut down the NI and release refcount */
+static void
+lnet_shutdown_lndni(struct lnet_ni *ni)
 {
-	lnet_ping_info_t *pinfo;
-	lnet_handle_md_t md_handle;
-	lnet_ni_t *found_ni = NULL;
-	int ni_count;
-	int rc;
-
-	if (LNET_NETTYP(net) == LOLND)
-		return -EINVAL;
-
-	ni_count = lnet_get_ni_count();
-
-	/* create and link a new ping info, before removing the old one */
-	rc = lnet_ping_info_setup(&pinfo, &md_handle, ni_count - 1, false);
-	if (rc)
-		return rc;
-
-	/* proceed with shutting down the NI */
 	lnet_net_lock(LNET_LOCK_EX);
-
-	found_ni = lnet_net2ni_locked(net, 0);
-	if (!found_ni) {
-		lnet_net_unlock(LNET_LOCK_EX);
-		lnet_ping_md_unlink(pinfo, &md_handle);
-		lnet_ping_info_free(pinfo);
-		return -EINVAL;
-	}
-
-	/*
-	 * decrement the reference counter on found_ni which was
-	 * incremented when we called lnet_net2ni_locked()
-	 */
-	lnet_ni_decref_locked(found_ni, 0);
-	/* Move ni to zombie list so nobody can find it anymore */
-	list_move(&found_ni->ni_list, &the_lnet.ln_nis_zombie);
-
-	/* Drop the lock reference for the ln_nis ref. */
-	lnet_ni_decref_locked(found_ni, 0);
-
-	if (!list_empty(&found_ni->ni_cptlist)) {
-		list_del_init(&found_ni->ni_cptlist);
-		lnet_ni_decref_locked(found_ni, 0);
-	}
-
+	lnet_ni_unlink_locked(ni);
 	lnet_net_unlock(LNET_LOCK_EX);
 
 	/* Do peer table cleanup for this ni */
-	lnet_peer_tables_cleanup(found_ni);
+	lnet_peer_tables_cleanup(ni);
 
 	lnet_net_lock(LNET_LOCK_EX);
 	lnet_clear_zombies_nis_locked();
 	lnet_net_unlock(LNET_LOCK_EX);
-
-	lnet_ping_target_update(pinfo, md_handle);
-
-	return 0;
 }
 
 static int
-lnet_startup_lndnis(struct list_head *nilist, __s32 peer_timeout,
-		    __s32 peer_cr, __s32 peer_buf_cr, __s32 credits,
-		    int *ni_count)
+lnet_startup_lndni(struct lnet_ni *ni, __s32 peer_timeout,
+		   __s32 peer_cr, __s32 peer_buf_cr, __s32 credits)
 {
+	int rc = 0;
+	int lnd_type;
 	lnd_t *lnd;
-	struct lnet_ni *ni;
 	struct lnet_tx_queue *tq;
 	int i;
-	int rc = 0;
-	__u32 lnd_type;
-
-	while (!list_empty(nilist)) {
-		ni = list_entry(nilist->next, lnet_ni_t, ni_list);
-		lnd_type = LNET_NETTYP(LNET_NIDNET(ni->ni_nid));
 
-		if (!libcfs_isknown_lnd(lnd_type))
-			goto failed;
+	lnd_type = LNET_NETTYP(LNET_NIDNET(ni->ni_nid));
 
-		if (lnd_type == CIBLND    ||
-		    lnd_type == OPENIBLND ||
-		    lnd_type == IIBLND    ||
-		    lnd_type == VIBLND) {
-			CERROR("LND %s obsoleted\n",
-			       libcfs_lnd2str(lnd_type));
-			goto failed;
-		}
+	LASSERT(libcfs_isknown_lnd(lnd_type));
 
-		/* Make sure this new NI is unique. */
-		lnet_net_lock(LNET_LOCK_EX);
-		if (!lnet_net_unique(LNET_NIDNET(ni->ni_nid),
-				     &the_lnet.ln_nis)) {
-			if (lnd_type == LOLND) {
-				lnet_net_unlock(LNET_LOCK_EX);
-				list_del(&ni->ni_list);
-				lnet_ni_free(ni);
-				continue;
-			}
+	if (lnd_type == CIBLND || lnd_type == OPENIBLND ||
+	    lnd_type == IIBLND || lnd_type == VIBLND) {
+		CERROR("LND %s obsoleted\n", libcfs_lnd2str(lnd_type));
+		goto failed0;
+	}
 
-			CERROR("Net %s is not unique\n",
-			       libcfs_net2str(LNET_NIDNET(ni->ni_nid)));
+	/* Make sure this new NI is unique. */
+	lnet_net_lock(LNET_LOCK_EX);
+	if (!lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nis)) {
+		if (lnd_type == LOLND) {
 			lnet_net_unlock(LNET_LOCK_EX);
-			goto failed;
+			lnet_ni_free(ni);
+			return 0;
 		}
 		lnet_net_unlock(LNET_LOCK_EX);
 
+		CERROR("Net %s is not unique\n",
+		       libcfs_net2str(LNET_NIDNET(ni->ni_nid)));
+		goto failed0;
+	}
+	lnet_net_unlock(LNET_LOCK_EX);
+
+	mutex_lock(&the_lnet.ln_lnd_mutex);
+	lnd = lnet_find_lnd_by_type(lnd_type);
+
+	if (!lnd) {
+		mutex_unlock(&the_lnet.ln_lnd_mutex);
+		rc = request_module("%s", libcfs_lnd2modname(lnd_type));
 		mutex_lock(&the_lnet.ln_lnd_mutex);
-		lnd = lnet_find_lnd_by_type(lnd_type);
 
+		lnd = lnet_find_lnd_by_type(lnd_type);
 		if (!lnd) {
 			mutex_unlock(&the_lnet.ln_lnd_mutex);
-			rc = request_module("%s",
-					    libcfs_lnd2modname(lnd_type));
-			mutex_lock(&the_lnet.ln_lnd_mutex);
-
-			lnd = lnet_find_lnd_by_type(lnd_type);
-			if (!lnd) {
-				mutex_unlock(&the_lnet.ln_lnd_mutex);
-				CERROR("Can't load LND %s, module %s, rc=%d\n",
-				       libcfs_lnd2str(lnd_type),
-				       libcfs_lnd2modname(lnd_type), rc);
-				goto failed;
-			}
+			CERROR("Can't load LND %s, module %s, rc=%d\n",
+			       libcfs_lnd2str(lnd_type),
+			       libcfs_lnd2modname(lnd_type), rc);
+			goto failed0;
 		}
+	}
+
+	lnet_net_lock(LNET_LOCK_EX);
+	lnd->lnd_refcount++;
+	lnet_net_unlock(LNET_LOCK_EX);
+
+	ni->ni_lnd = lnd;
+
+	rc = lnd->lnd_startup(ni);
 
+	mutex_unlock(&the_lnet.ln_lnd_mutex);
+
+	if (rc) {
+		LCONSOLE_ERROR_MSG(0x105, "Error %d starting up LNI %s\n",
+				   rc, libcfs_lnd2str(lnd->lnd_type));
 		lnet_net_lock(LNET_LOCK_EX);
-		lnd->lnd_refcount++;
+		lnd->lnd_refcount--;
 		lnet_net_unlock(LNET_LOCK_EX);
+		goto failed0;
+	}
 
-		ni->ni_lnd = lnd;
+	/*
+	 * If given some LND tunable parameters, parse those now to
+	 * override the values in the NI structure.
+	 */
+	if (peer_buf_cr >= 0)
+		ni->ni_peerrtrcredits = peer_buf_cr;
+	if (peer_timeout >= 0)
+		ni->ni_peertimeout = peer_timeout;
+	/*
+	 * TODO
+	 * Note: For now, don't allow the user to change
+	 * peertxcredits as this number is used in the
+	 * IB LND to control queue depth.
+	 * if (peer_cr != -1)
+	 *	ni->ni_peertxcredits = peer_cr;
+	 */
+	if (credits >= 0)
+		ni->ni_maxtxcredits = credits;
 
-		rc = lnd->lnd_startup(ni);
+	LASSERT(ni->ni_peertimeout <= 0 || lnd->lnd_query);
 
-		mutex_unlock(&the_lnet.ln_lnd_mutex);
+	lnet_net_lock(LNET_LOCK_EX);
+	/* refcount for ln_nis */
+	lnet_ni_addref_locked(ni, 0);
+	list_add_tail(&ni->ni_list, &the_lnet.ln_nis);
+	if (ni->ni_cpts) {
+		lnet_ni_addref_locked(ni, 0);
+		list_add_tail(&ni->ni_cptlist, &the_lnet.ln_nis_cpt);
+	}
 
-		if (rc) {
-			LCONSOLE_ERROR_MSG(0x105, "Error %d starting up LNI %s\n",
-					   rc, libcfs_lnd2str(lnd->lnd_type));
-			lnet_net_lock(LNET_LOCK_EX);
-			lnd->lnd_refcount--;
-			lnet_net_unlock(LNET_LOCK_EX);
-			goto failed;
-		}
+	lnet_net_unlock(LNET_LOCK_EX);
 
+	if (lnd->lnd_type == LOLND) {
+		lnet_ni_addref(ni);
+		LASSERT(!the_lnet.ln_loni);
+		the_lnet.ln_loni = ni;
+		return 0;
+	}
+
+	if (!ni->ni_peertxcredits || !ni->ni_maxtxcredits) {
+		LCONSOLE_ERROR_MSG(0x107, "LNI %s has no %scredits\n",
+				   libcfs_lnd2str(lnd->lnd_type),
+				   !ni->ni_peertxcredits ?
+				   "" : "per-peer ");
 		/*
-		 * If given some LND tunable parameters, parse those now to
-		 * override the values in the NI structure.
-		 */
-		if (peer_buf_cr >= 0)
-			ni->ni_peerrtrcredits = peer_buf_cr;
-		if (peer_timeout >= 0)
-			ni->ni_peertimeout = peer_timeout;
-		/*
-		 * TODO
-		 * Note: For now, don't allow the user to change
-		 * peertxcredits as this number is used in the
-		 * IB LND to control queue depth.
-		 * if (peer_cr != -1)
-		 *	ni->ni_peertxcredits = peer_cr;
+		 * shutdown the NI since if we get here then it must've already
+		 * been started
 		 */
-		if (credits >= 0)
-			ni->ni_maxtxcredits = credits;
-
-		LASSERT(ni->ni_peertimeout <= 0 || lnd->lnd_query);
+		lnet_shutdown_lndni(ni);
+		return -EINVAL;
+	}
 
-		list_del(&ni->ni_list);
+	cfs_percpt_for_each(tq, i, ni->ni_tx_queues) {
+		tq->tq_credits_min =
+		tq->tq_credits_max =
+		tq->tq_credits = lnet_ni_tq_credits(ni);
+	}
 
-		lnet_net_lock(LNET_LOCK_EX);
-		/* refcount for ln_nis */
-		lnet_ni_addref_locked(ni, 0);
-		list_add_tail(&ni->ni_list, &the_lnet.ln_nis);
-		if (ni->ni_cpts) {
-			list_add_tail(&ni->ni_cptlist,
-				      &the_lnet.ln_nis_cpt);
-			lnet_ni_addref_locked(ni, 0);
-		}
+	CDEBUG(D_LNI, "Added LNI %s [%d/%d/%d/%d]\n",
+	       libcfs_nid2str(ni->ni_nid), ni->ni_peertxcredits,
+	       lnet_ni_tq_credits(ni) * LNET_CPT_NUMBER,
+	       ni->ni_peerrtrcredits, ni->ni_peertimeout);
 
-		lnet_net_unlock(LNET_LOCK_EX);
+	return 0;
+failed0:
+	lnet_ni_free(ni);
+	return -EINVAL;
+}
 
-		/* increment the ni_count here to account for the LOLND as
-		 * well.  If we increment past this point then the number
-		 * of count will be missing the LOLND, and then ping and
-		 * will not report the LOLND
-		 */
-		if (ni_count)
-			(*ni_count)++;
+static int
+lnet_startup_lndnis(struct list_head *nilist)
+{
+	struct lnet_ni *ni;
+	int rc;
+	int lnd_type;
+	int ni_count = 0;
 
-		if (lnd->lnd_type == LOLND) {
-			lnet_ni_addref(ni);
-			LASSERT(!the_lnet.ln_loni);
-			the_lnet.ln_loni = ni;
-			continue;
-		}
+	while (!list_empty(nilist)) {
+		ni = list_entry(nilist->next, lnet_ni_t, ni_list);
+		list_del(&ni->ni_list);
+		rc = lnet_startup_lndni(ni, -1, -1, -1, -1);
 
-		if (!ni->ni_peertxcredits || !ni->ni_maxtxcredits) {
-			LCONSOLE_ERROR_MSG(0x107, "LNI %s has no %scredits\n",
-					   libcfs_lnd2str(lnd->lnd_type),
-					   !ni->ni_peertxcredits ?
-					   "" : "per-peer ");
+		if (rc < 0)
 			goto failed;
-		}
 
-		cfs_percpt_for_each(tq, i, ni->ni_tx_queues) {
-			tq->tq_credits_min =
-			tq->tq_credits_max =
-			tq->tq_credits = lnet_ni_tq_credits(ni);
-		}
+		ni_count++;
+	}
 
-		CDEBUG(D_LNI, "Added LNI %s [%d/%d/%d/%d]\n",
-		       libcfs_nid2str(ni->ni_nid), ni->ni_peertxcredits,
-		       lnet_ni_tq_credits(ni) * LNET_CPT_NUMBER,
-		       ni->ni_peerrtrcredits, ni->ni_peertimeout);
+	if (the_lnet.ln_eq_waitni && ni_count > 1) {
+		lnd_type = the_lnet.ln_eq_waitni->ni_lnd->lnd_type;
+		LCONSOLE_ERROR_MSG(0x109, "LND %s can only run single-network\n",
+				   libcfs_lnd2str(lnd_type));
+		rc = -EINVAL;
+		goto failed;
 	}
 
-	return 0;
+	return ni_count;
 failed:
-	while (!list_empty(nilist)) {
-		ni = list_entry(nilist->next, lnet_ni_t, ni_list);
-		list_del(&ni->ni_list);
-		lnet_ni_free(ni);
-	}
-	return -EINVAL;
+	lnet_shutdown_lndnis();
+
+	return rc;
 }
 
 /**
@@ -1528,10 +1503,8 @@ int
 LNetNIInit(lnet_pid_t requested_pid)
 {
 	int im_a_router = 0;
-	int rc;
-	int ni_count = 0;
-	int lnd_type;
-	struct lnet_ni *ni;
+	int rc, rc2;
+	int ni_count;
 	lnet_ping_info_t *pinfo;
 	lnet_handle_md_t md_handle;
 	struct list_head net_head;
@@ -1550,54 +1523,67 @@ LNetNIInit(lnet_pid_t requested_pid)
 	}
 
 	rc = lnet_prepare(requested_pid);
-	if (rc)
-		goto failed0;
+	if (rc) {
+		mutex_unlock(&the_lnet.ln_api_mutex);
+		return rc;
+	}
 
-	rc = lnet_parse_networks(&net_head,
-				 !the_lnet.ln_nis_from_mod_params ?
-				 lnet_get_networks() : "");
-	if (rc < 0)
-		goto failed1;
+	/* Add in the loopback network */
+	if (!lnet_ni_alloc(LNET_MKNET(LOLND, 0), NULL, &net_head)) {
+		rc = -ENOMEM;
+		goto err_empty_list;
+	}
 
-	rc = lnet_startup_lndnis(&net_head, -1, -1, -1, -1, &ni_count);
-	if (rc)
-		goto failed1;
+	/*
+	 * If LNet is being initialized via DLC it is possible
+	 * that the user requests not to load module parameters (ones which
+	 * are supported by DLC) on initialization.  Therefore, make sure not
+	 * to load networks, routes and forwarding from module parameters
+	 * in this case. On cleanup in case of failure only clean up
+	 * routes if it has been loaded
+	 */
+	if (!the_lnet.ln_nis_from_mod_params) {
+		rc = lnet_parse_networks(&net_head, lnet_get_networks());
+		if (rc < 0)
+			goto err_empty_list;
+	}
 
-	if (the_lnet.ln_eq_waitni && ni_count > 1) {
-		lnd_type = the_lnet.ln_eq_waitni->ni_lnd->lnd_type;
-		LCONSOLE_ERROR_MSG(0x109, "LND %s can only run single-network\n",
-				   libcfs_lnd2str(lnd_type));
-		goto failed2;
+	ni_count = lnet_startup_lndnis(&net_head);
+	if (ni_count < 0) {
+		rc = ni_count;
+		goto err_empty_list;
 	}
 
-	rc = lnet_parse_routes(lnet_get_routes(), &im_a_router);
-	if (rc)
-		goto failed2;
+	if (!the_lnet.ln_nis_from_mod_params) {
+		rc = lnet_parse_routes(lnet_get_routes(), &im_a_router);
+		if (rc)
+			goto err_shutdown_lndnis;
 
-	rc = lnet_check_routes();
-	if (rc)
-		goto failed2;
+		rc = lnet_check_routes();
+		if (rc)
+			goto err_destory_routes;
 
-	rc = lnet_rtrpools_alloc(im_a_router);
-	if (rc)
-		goto failed2;
+		rc = lnet_rtrpools_alloc(im_a_router);
+		if (rc)
+			goto err_destory_routes;
+	}
 
 	rc = lnet_acceptor_start();
 	if (rc)
-		goto failed2;
+		goto err_destory_routes;
 
 	the_lnet.ln_refcount = 1;
 	/* Now I may use my own API functions... */
 
 	rc = lnet_ping_info_setup(&pinfo, &md_handle, ni_count, true);
 	if (rc)
-		goto failed3;
+		goto err_acceptor_stop;
 
 	lnet_ping_target_update(pinfo, md_handle);
 
 	rc = lnet_router_checker_start();
 	if (rc)
-		goto failed4;
+		goto err_stop_ping;
 
 	lnet_router_debugfs_init();
 
@@ -1605,23 +1591,26 @@ LNetNIInit(lnet_pid_t requested_pid)
 
 	return 0;
 
- failed4:
-	the_lnet.ln_refcount = 0;
+err_stop_ping:
 	lnet_ping_md_unlink(pinfo, &md_handle);
 	lnet_ping_info_free(pinfo);
- failed3:
+	rc2 = LNetEQFree(the_lnet.ln_ping_target_eq);
+	LASSERT(!rc2);
+err_acceptor_stop:
+	the_lnet.ln_refcount = 0;
 	lnet_acceptor_stop();
-	rc = LNetEQFree(the_lnet.ln_ping_target_eq);
-	LASSERT(!rc);
- failed2:
-	lnet_destroy_routes();
+err_destory_routes:
+	if (!the_lnet.ln_nis_from_mod_params)
+		lnet_destroy_routes();
+err_shutdown_lndnis:
 	lnet_shutdown_lndnis();
- failed1:
+err_empty_list:
 	lnet_unprepare();
- failed0:
 	LASSERT(rc < 0);
 	mutex_unlock(&the_lnet.ln_api_mutex);
 	while (!list_empty(&net_head)) {
+		struct lnet_ni *ni;
+
 		ni = list_entry(net_head.next, struct lnet_ni, ni_list);
 		list_del_init(&ni->ni_list);
 		lnet_ni_free(ni);
@@ -1773,8 +1762,8 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
 
 	/* Create a ni structure for the network string */
 	rc = lnet_parse_networks(&net_head, nets);
-	if (rc < 0)
-		return rc;
+	if (rc <= 0)
+		return !rc ? -EINVAL : rc;
 
 	mutex_lock(&the_lnet.ln_api_mutex);
 
@@ -1788,8 +1777,11 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
 	if (rc)
 		goto failed0;
 
-	rc = lnet_startup_lndnis(&net_head, peer_timeout, peer_cr,
-				 peer_buf_cr, credits, NULL);
+	ni = list_entry(net_head.next, struct lnet_ni, ni_list);
+	list_del_init(&ni->ni_list);
+
+	rc = lnet_startup_lndni(ni, peer_timeout, peer_cr,
+				peer_buf_cr, credits);
 	if (rc)
 		goto failed1;
 
@@ -1814,10 +1806,38 @@ failed0:
 int
 lnet_dyn_del_ni(__u32 net)
 {
+	lnet_ni_t *ni;
+	lnet_ping_info_t *pinfo;
+	lnet_handle_md_t md_handle;
 	int rc;
 
+	/* don't allow userspace to shutdown the LOLND */
+	if (LNET_NETTYP(net) == LOLND)
+		return -EINVAL;
+
 	mutex_lock(&the_lnet.ln_api_mutex);
-	rc = lnet_shutdown_lndni(net);
+	/* create and link a new ping info, before removing the old one */
+	rc = lnet_ping_info_setup(&pinfo, &md_handle,
+				  lnet_get_ni_count() - 1, false);
+	if (rc)
+		goto out;
+
+	ni = lnet_net2ni(net);
+	if (!ni) {
+		rc = -EINVAL;
+		goto failed;
+	}
+
+	/* decrement the reference counter taken by lnet_net2ni() */
+	lnet_ni_decref_locked(ni, 0);
+
+	lnet_shutdown_lndni(ni);
+	lnet_ping_target_update(pinfo, md_handle);
+	goto out;
+failed:
+	lnet_ping_md_unlink(pinfo, &md_handle);
+	lnet_ping_info_free(pinfo);
+out:
 	mutex_unlock(&the_lnet.ln_api_mutex);
 
 	return rc;
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 013d41b..c04a0ef 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -114,7 +114,7 @@ lnet_ni_free(struct lnet_ni *ni)
 	LIBCFS_FREE(ni, sizeof(*ni));
 }
 
-static lnet_ni_t *
+lnet_ni_t *
 lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist)
 {
 	struct lnet_tx_queue *tq;
@@ -191,6 +191,7 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
 	struct lnet_ni *ni;
 	__u32 net;
 	int nnets = 0;
+	struct list_head *temp_node;
 
 	if (!networks) {
 		CERROR("networks string is undefined\n");
@@ -216,11 +217,6 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
 	tmp = tokens;
 	str = tokens;
 
-	/* Add in the loopback network */
-	ni = lnet_ni_alloc(LNET_MKNET(LOLND, 0), NULL, nilist);
-	if (!ni)
-		goto failed;
-
 	while (str && *str) {
 		char *comma = strchr(str, ',');
 		char *bracket = strchr(str, '(');
@@ -294,7 +290,6 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
 			goto failed_syntax;
 		}
 
-		nnets++;
 		ni = lnet_ni_alloc(net, el, nilist);
 		if (!ni)
 			goto failed;
@@ -372,10 +367,11 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
 		}
 	}
 
-	LASSERT(!list_empty(nilist));
+	list_for_each(temp_node, nilist)
+		nnets++;
 
 	LIBCFS_FREE(tokens, tokensize);
-	return 0;
+	return nnets;
 
  failed_syntax:
 	lnet_syntax("networks", networks, (int)(tmp - tokens), strlen(tmp));
-- 
1.7.1


* [PATCH 08/24] staging: lustre: return appropriate errno when adding route
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

When adding a route, lnet_add_route() silently ignored specific
scenarios, namely:
1. route already exists
2. route is on a local net
3. route is unreachable

This patch returns the appropriate error codes from the lower level
function lnet_add_route(), and then ignores the above cases in the
calling function, lnet_parse_route().  This is needed so we don't
halt processing routes in the module parameters.

However, routes can now be added dynamically, and the user should be
told whether adding the requested route succeeded or failed.

Userspace decides whether to continue adding routes or to
halt processing.  Currently "lnetctl import < config" continues
adding the rest of the configuration and reports at the end which
operations passed and which ones failed.
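
The split between "report the error" and "decide whether it matters"
can be sketched as follows. This is a minimal model, not the kernel
code: enum toy_route_case, toy_add_route() and
toy_parse_should_abort() are hypothetical names, and the real
lnet_add_route() derives these conditions from LNet state rather than
from a selector argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical selector for the outcome being simulated. */
enum toy_route_case {
	TOY_ROUTE_NEW,
	TOY_ROUTE_DUPLICATE,
	TOY_ROUTE_LOCAL_NET,
	TOY_ROUTE_UNREACHABLE,
};

/* Models the return codes lnet_add_route() uses after this patch. */
static int toy_add_route(enum toy_route_case c)
{
	switch (c) {
	case TOY_ROUTE_DUPLICATE:
	case TOY_ROUTE_LOCAL_NET:
		return -EEXIST;
	case TOY_ROUTE_UNREACHABLE:
		return -EHOSTUNREACH;
	default:
		return 0;
	}
}

/* Module-parameter path: benign errors must not stop route parsing. */
static bool toy_parse_should_abort(int rc)
{
	assert(rc <= 0);	/* errno style: zero or negative */
	return rc && rc != -EEXIST && rc != -EHOSTUNREACH;
}
```

A dynamic caller would simply pass the return value of the add
operation back to userspace, which is what lets a tool report per-route
success or failure.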

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6045
Reviewed-on: http://review.whamcloud.com/13116
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lnet/lnet/config.c |    2 +-
 drivers/staging/lustre/lnet/lnet/router.c |   11 +++++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index c04a0ef..8c80625 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -769,7 +769,7 @@ lnet_parse_route(char *str, int *im_a_router)
 			}
 
 			rc = lnet_add_route(net, hops, nid, priority);
-			if (rc) {
+			if (rc && rc != -EEXIST && rc != -EHOSTUNREACH) {
 				CERROR("Can't create route to %s via %s\n",
 				       libcfs_net2str(net),
 				       libcfs_nid2str(nid));
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index d748931..511e446 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -317,7 +317,7 @@ lnet_add_route(__u32 net, unsigned int hops, lnet_nid_t gateway,
 		return -EINVAL;
 
 	if (lnet_islocalnet(net))	       /* it's a local network */
-		return 0;		       /* ignore the route entry */
+		return -EEXIST;
 
 	/* Assume net, route, all new */
 	LIBCFS_ALLOC(route, sizeof(*route));
@@ -348,7 +348,7 @@ lnet_add_route(__u32 net, unsigned int hops, lnet_nid_t gateway,
 		LIBCFS_FREE(rnet, sizeof(*rnet));
 
 		if (rc == -EHOSTUNREACH) /* gateway is not on a local net */
-			return 0;	/* ignore the route entry */
+			return rc;	/* caller decides whether to ignore */
 		CERROR("Error %d creating route %s %d %s\n", rc,
 		       libcfs_net2str(net), hops,
 		       libcfs_nid2str(gateway));
@@ -395,14 +395,17 @@ lnet_add_route(__u32 net, unsigned int hops, lnet_nid_t gateway,
 	/* -1 for notify or !add_route */
 	lnet_peer_decref_locked(route->lr_gateway);
 	lnet_net_unlock(LNET_LOCK_EX);
+	rc = 0;
 
-	if (!add_route)
+	if (!add_route) {
+		rc = -EEXIST;
 		LIBCFS_FREE(route, sizeof(*route));
+	}
 
 	if (rnet != rnet2)
 		LIBCFS_FREE(rnet, sizeof(*rnet));
 
-	return 0;
+	return rc;
 }
 
 int
-- 
1.7.1


* [PATCH 09/24] staging: lustre: make some lnet functions static
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Frank Zago

From: Frank Zago <fzago@cray.com>

Some functions and variables are only used in their C file, so reduce
their scope. This reduces the code size, and fixes sparse warnings
such as:

warning: symbol 'proc_lnet_routes' was not declared.
        Should it be static?
warning: symbol 'proc_lnet_routers' was not declared.
        Should it be static?

Some prototypes were removed from C files and added to the proper
header.
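
As a rough illustration of the scoping rule, the fragment below keeps
file-local state and helpers static and leaves only the entry point
with external linkage. All names here are hypothetical and chosen for
the example; in a real tree the prototype of toy_arm_timer() would
live in a shared header rather than being repeated as an extern in
another C file.

```c
#include <assert.h>

/* File-local state: static keeps the symbol out of the global
 * namespace, so sparse has no undeclared symbol to warn about. */
static int toy_timer_count;

/* Only used in this file, so it is static as well. */
static void toy_count_timer(void)
{
	toy_timer_count++;
}

/* Public entry point; this prototype stands in for the one that a
 * shared header would provide. */
int toy_arm_timer(void);

/* Arms one timer and returns how many have been armed so far. */
int toy_arm_timer(void)
{
	toy_count_timer();
	assert(toy_timer_count > 0);
	return toy_timer_count;
}
```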

Signed-off-by: Frank Zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5396
Reviewed-on: http://review.whamcloud.com/12206
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |    2 ++
 drivers/staging/lustre/lnet/lnet/router_proc.c     |    2 --
 drivers/staging/lustre/lnet/selftest/console.c     |    4 +---
 drivers/staging/lustre/lnet/selftest/framework.c   |   10 ----------
 drivers/staging/lustre/lnet/selftest/module.c      |    4 +---
 drivers/staging/lustre/lnet/selftest/rpc.c         |    2 +-
 6 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 2ee3d73..0928bc9 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -448,6 +448,8 @@ lnet_ni_t *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
 lnet_ni_t *lnet_net2ni_locked(__u32 net, int cpt);
 lnet_ni_t *lnet_net2ni(__u32 net);
 
+extern int portal_rotor;
+
 int lnet_init(void);
 void lnet_fini(void);
 
diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
index 5cb87b8..fc643df 100644
--- a/drivers/staging/lustre/lnet/lnet/router_proc.c
+++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
@@ -800,8 +800,6 @@ static struct lnet_portal_rotors	portal_rotors[] = {
 	},
 };
 
-extern int portal_rotor;
-
 static int __proc_lnet_portal_rotor(void *data, int write,
 				    loff_t pos, void __user *buffer, int nob)
 {
diff --git a/drivers/staging/lustre/lnet/selftest/console.c b/drivers/staging/lustre/lnet/selftest/console.c
index badc696..e8ca1bf 100644
--- a/drivers/staging/lustre/lnet/selftest/console.c
+++ b/drivers/staging/lustre/lnet/selftest/console.c
@@ -1693,8 +1693,6 @@ lstcon_new_session_id(lst_sid_t *sid)
 	sid->ses_stamp = cfs_time_current();
 }
 
-extern srpc_service_t lstcon_acceptor_service;
-
 int
 lstcon_session_new(char *name, int key, unsigned feats,
 		   int timeout, int force, lst_sid_t __user *sid_up)
@@ -1973,7 +1971,7 @@ out:
 	return rc;
 }
 
-srpc_service_t lstcon_acceptor_service;
+static srpc_service_t lstcon_acceptor_service;
 static void lstcon_init_acceptor_service(void)
 {
 	/* initialize selftest console acceptor service table */
diff --git a/drivers/staging/lustre/lnet/selftest/framework.c b/drivers/staging/lustre/lnet/selftest/framework.c
index 7eca046..3bbc720 100644
--- a/drivers/staging/lustre/lnet/selftest/framework.c
+++ b/drivers/staging/lustre/lnet/selftest/framework.c
@@ -1629,16 +1629,6 @@ static srpc_service_t sfw_services[] = {
 	}
 };
 
-extern sfw_test_client_ops_t ping_test_client;
-extern srpc_service_t	ping_test_service;
-extern void ping_init_test_client(void);
-extern void ping_init_test_service(void);
-
-extern sfw_test_client_ops_t brw_test_client;
-extern srpc_service_t	brw_test_service;
-extern void brw_init_test_client(void);
-extern void brw_init_test_service(void);
-
 int
 sfw_startup(void)
 {
diff --git a/drivers/staging/lustre/lnet/selftest/module.c b/drivers/staging/lustre/lnet/selftest/module.c
index c4bf442..cbb7884 100644
--- a/drivers/staging/lustre/lnet/selftest/module.c
+++ b/drivers/staging/lustre/lnet/selftest/module.c
@@ -37,6 +37,7 @@
 #define DEBUG_SUBSYSTEM S_LNET
 
 #include "selftest.h"
+#include "console.h"
 
 enum {
 	LST_INIT_NONE = 0,
@@ -47,9 +48,6 @@ enum {
 	LST_INIT_CONSOLE
 };
 
-extern int lstcon_console_init(void);
-extern int lstcon_console_fini(void);
-
 static int lst_init_step = LST_INIT_NONE;
 
 struct cfs_wi_sched *lst_sched_serial;
diff --git a/drivers/staging/lustre/lnet/selftest/rpc.c b/drivers/staging/lustre/lnet/selftest/rpc.c
index 4213198..1b76933 100644
--- a/drivers/staging/lustre/lnet/selftest/rpc.c
+++ b/drivers/staging/lustre/lnet/selftest/rpc.c
@@ -1097,7 +1097,7 @@ srpc_client_rpc_expired(void *data)
 	spin_unlock(&srpc_data.rpc_glock);
 }
 
-inline void
+static void
 srpc_add_client_rpc_timer(srpc_client_rpc_t *rpc)
 {
 	stt_timer_t *timer = &rpc->crpc_timer;
-- 
1.7.1


* [PATCH 10/24] staging: lustre: missed a few cases of using NULL instead of 0
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (8 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 09/24] staging: lustre: make some lnet functions static James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 11/24] staging: lustre: startup lnet acceptor thread dynamically James Simmons
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Frank Zago

From: Frank Zago <fzago@cray.com>

It is preferable to use NULL instead of 0 for pointers. This fixes sparse
warnings such as:

lustre/fld/fld_request.c:126:17: warning: Using plain integer as NULL pointer

The second parameter of class_match_param() is now a const char *,
which lets one caller drop a cast and so avoids splitting a long
line. No other code change.

Signed-off-by: Frank Zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5396
Reviewed-on: http://review.whamcloud.com/12567
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c          |    2 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |    2 +-
 .../staging/lustre/lustre/obdclass/obd_config.c    |    4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)
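
Note (illustration only, not part of the patch): the change from { 0 } to
{ NULL } is aimed at sparse and does not alter what gets initialized. A
minimal userspace sketch, assuming a hypothetical struct model_md whose
first member is a pointer, as the sparse warning implies for the real
types:

```c
#include <stddef.h>

/* Hypothetical stand-in for a descriptor whose first member is a pointer. */
struct model_md {
	void *start;
	unsigned int length;
	int threshold;
};

/*
 * Returns 1 when every member is zero after a "{ NULL }" initializer.
 * Members without an explicit initializer are zero-initialized, so the
 * result matches what "{ 0 }" would give.
 */
int model_md_is_zeroed(void)
{
	struct model_md md = { NULL };

	return md.start == NULL && md.length == 0 && md.threshold == 0;
}
```

The choice between the two spellings is therefore about matching the type
of the first member, not about run-time behaviour.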

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 3b7bc36..f223d5d 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -904,7 +904,7 @@ lnet_ping_info_setup(lnet_ping_info_t **ppinfo, lnet_handle_md_t *md_handle,
 {
 	lnet_process_id_t id = {LNET_NID_ANY, LNET_PID_ANY};
 	lnet_handle_me_t me_handle;
-	lnet_md_t md = {0};
+	lnet_md_t md = { NULL };
 	int rc, rc2;
 
 	if (set_eq) {
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 4de085d..2440b07 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1940,7 +1940,7 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req,
 		  struct super_block *sb, struct lookup_intent *it)
 {
 	struct ll_sb_info *sbi = NULL;
-	struct lustre_md md;
+	struct lustre_md md = { NULL };
 	int rc;
 
 	LASSERT(*inode || sb);
diff --git a/drivers/staging/lustre/lustre/obdclass/obd_config.c b/drivers/staging/lustre/lustre/obdclass/obd_config.c
index c4128ac..6417946 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_config.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_config.c
@@ -72,7 +72,7 @@ EXPORT_SYMBOL(class_find_param);
 
 /* returns 0 if this is the first key in the buffer, else 1.
    valp points to first char after key. */
-static int class_match_param(char *buf, char *key, char **valp)
+static int class_match_param(char *buf, const char *key, char **valp)
 {
 	if (!buf)
 		return 1;
@@ -1008,7 +1008,7 @@ int class_process_proc_param(char *prefix, struct lprocfs_vars *lvars,
 		/* Search proc entries */
 		while (lvars[j].name) {
 			var = &lvars[j];
-			if (class_match_param(key, (char *)var->name, NULL) == 0
+			if (!class_match_param(key, var->name, NULL)
 			    && keylen == strlen(var->name)) {
 				matched++;
 				rc = -EROFS;
-- 
1.7.1


* [PATCH 11/24] staging: lustre: startup lnet acceptor thread dynamically
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (9 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 10/24] staging: lustre: missed a few cases of using NULL instead of 0 James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 12/24] staging: lustre: reject invalid net configuration for lnet James Simmons
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

With DLC it is possible to start up a system with no NIs that require
the acceptor thread, in which case the thread is not started.  If the
user later adds an NI that requires the acceptor thread, the thread
must be started at that point.

If the user removes an NI and as a result no remaining NIs require
the acceptor thread, then the thread should be stopped.  This patch
adds that logic to the code that dynamically adds and removes NIs.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6002
Reviewed-on: http://review.whamcloud.com/13010
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lnet/lnet/acceptor.c |   10 ++++++++--
 drivers/staging/lustre/lnet/lnet/api-ni.c   |   14 ++++++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)
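
Note (illustration only, not part of the patch): the start/stop rules
described above can be sketched as a small userspace model. All names
below (acceptor_model, model_add_ni, model_del_ni, acceptor_nis) are
hypothetical. The model clears the shutdown flag directly in the start
function, which is an assumption; the hunks below do not show where the
kernel clears pta_shutdown.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the acceptor state; not kernel code. */
struct acceptor_model {
	int shutdown;	/* 1 = not running, mirrors pta_shutdown */
	int starts;	/* times the thread was actually started */
	int stops;	/* times the thread was actually stopped */
};

/* Starts out "shut down", like the new static initializer in the patch. */
struct acceptor_model acceptor = { .shutdown = 1 };

/* Number of configured NIs that need the acceptor (model only). */
int acceptor_nis;

int acceptor_model_start(void)
{
	if (!acceptor.shutdown)	/* already running: nothing to do */
		return 0;
	acceptor.shutdown = 0;
	acceptor.starts++;
	return 0;
}

void acceptor_model_stop(void)
{
	if (acceptor.shutdown)	/* not running: nothing to do */
		return;
	acceptor.shutdown = 1;
	acceptor.stops++;
}

/* Adding an NI starts the acceptor only if that NI needs it. */
int model_add_ni(bool needs_acceptor)
{
	if (needs_acceptor) {
		int rc = acceptor_model_start();

		if (rc < 0)
			return rc;
		acceptor_nis++;
	}
	return 0;
}

/* Removing an NI stops the acceptor once no NI needs it any more. */
void model_del_ni(bool needs_acceptor)
{
	if (needs_acceptor && acceptor_nis > 0)
		acceptor_nis--;
	if (!acceptor_nis)
		acceptor_model_stop();
}
```

Keying both functions off the same flag is what makes repeated start or
stop calls harmless in this model.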

diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
index 07df727..9fe3ff7 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -46,7 +46,9 @@ static struct {
 	int			pta_shutdown;
 	struct socket		*pta_sock;
 	struct completion	pta_signal;
-} lnet_acceptor_state;
+} lnet_acceptor_state = {
+	.pta_shutdown = 1
+};
 
 int
 lnet_acceptor_port(void)
@@ -444,6 +446,10 @@ lnet_acceptor_start(void)
 	long rc2;
 	long secure;
 
+	/* if acceptor is already running return immediately */
+	if (!lnet_acceptor_state.pta_shutdown)
+		return 0;
+
 	LASSERT(!lnet_acceptor_state.pta_sock);
 
 	rc = lnet_acceptor_get_tunables();
@@ -484,7 +490,7 @@ lnet_acceptor_start(void)
 void
 lnet_acceptor_stop(void)
 {
-	if (!lnet_acceptor_state.pta_sock) /* not running */
+	if (lnet_acceptor_state.pta_shutdown) /* not running */
 		return;
 
 	lnet_acceptor_state.pta_shutdown = 1;
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index f223d5d..9497ce1 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1785,6 +1785,16 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
 	if (rc)
 		goto failed1;
 
+	if (ni->ni_lnd->lnd_accept) {
+		rc = lnet_acceptor_start();
+		if (rc < 0) {
+			/* shutdown the ni that we just started */
+			CERROR("Failed to start up acceptor thread\n");
+			lnet_shutdown_lndni(ni);
+			goto failed1;
+		}
+	}
+
 	lnet_ping_target_update(pinfo, md_handle);
 	mutex_unlock(&the_lnet.ln_api_mutex);
 
@@ -1832,6 +1842,10 @@ lnet_dyn_del_ni(__u32 net)
 	lnet_ni_decref_locked(ni, 0);
 
 	lnet_shutdown_lndni(ni);
+
+	if (!lnet_count_acceptor_nis())
+		lnet_acceptor_stop();
+
 	lnet_ping_target_update(pinfo, md_handle);
 	goto out;
 failed:
-- 
1.7.1


* [PATCH 12/24] staging: lustre: reject invalid net configuration for lnet
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (10 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 11/24] staging: lustre: startup lnet acceptor thread dynamically James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 13/24] staging: lustre: return -EEXIST if NI is not unique James Simmons
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

Currently, if a route exists over a remote net and that net is
then added dynamically as a local net, traffic stops.  The code
in lnet_send() determines that the destination nid can be
reached from another local_ni, but the src_nid is still stuck
on the earlier NI, because the src_nid is stored in the ptlrpc
layer and is not updated when a local NI is configured.

Reject such a configuration: lnet_dyn_add_ni() now fails with
-EUSERS if the net being added is already known as a remote net.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5874
Reviewed-on: http://review.whamcloud.com/12912
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)
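
Note (illustration only, not part of the patch): the new check amounts to
"refuse to add a local net that is already reachable through a route". A
minimal userspace sketch, assuming the remote nets are held in a plain
array; model_check_new_local_net() is a hypothetical name, and the kernel
uses lnet_find_net_locked() on its own data structures instead.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 0 if 'net' may be added as a local net, or -EUSERS if a route
 * to it already exists (it appears in the remote_nets array).
 */
int model_check_new_local_net(const uint32_t *remote_nets, size_t count,
			      uint32_t net)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (remote_nets[i] == net)
			return -EUSERS;
	}
	return 0;
}
```

Doing the lookup before any ping-info setup means a rejected request
leaves nothing to unwind in this model.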

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 9497ce1..62a9e45 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1756,6 +1756,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
 	lnet_handle_md_t md_handle;
 	struct lnet_ni *ni;
 	struct list_head net_head;
+	lnet_remotenet_t *rnet;
 	int rc;
 
 	INIT_LIST_HEAD(&net_head);
@@ -1772,12 +1773,27 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
 		goto failed0;
 	}
 
+	ni = list_entry(net_head.next, struct lnet_ni, ni_list);
+
+	lnet_net_lock(LNET_LOCK_EX);
+	rnet = lnet_find_net_locked(LNET_NIDNET(ni->ni_nid));
+	lnet_net_unlock(LNET_LOCK_EX);
+	/*
+	 * make sure that the net added doesn't invalidate the current
+	 * configuration LNet is keeping
+	 */
+	if (rnet) {
+		CERROR("Adding net %s will invalidate routing configuration\n",
+		       nets);
+		rc = -EUSERS;
+		goto failed0;
+	}
+
 	rc = lnet_ping_info_setup(&pinfo, &md_handle, 1 + lnet_get_ni_count(),
 				  false);
 	if (rc)
 		goto failed0;
 
-	ni = list_entry(net_head.next, struct lnet_ni, ni_list);
 	list_del_init(&ni->ni_list);
 
 	rc = lnet_startup_lndni(ni, peer_timeout, peer_cr,
-- 
1.7.1


* [PATCH 13/24] staging: lustre: return -EEXIST if NI is not unique
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (11 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 12/24] staging: lustre: reject invalid net configuration for lnet James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 14/24] staging: lustre: handle lnet_check_routes() errors James Simmons
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

Return -EEXIST and not -EINVAL when trying to add a
network interface which is not unique.

Also includes some minor cleanup in api-ni.c.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5875
Reviewed-on: http://review.whamcloud.com/13056
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |   20 +++++++++-----------
 1 files changed, 9 insertions(+), 11 deletions(-)
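
Note (illustration only, not part of the patch): the error handling after
this change follows the usual "set rc, jump to a single exit label"
pattern, so each failure can report its own errno. A minimal userspace
sketch; model_startup_ni() and its boolean parameters are hypothetical
stand-ins for the checks made in lnet_startup_lndni().

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Returns 0 on success, -EEXIST for a duplicate net, and -EINVAL for
 * other failures.  A duplicate loopback net is silently accepted.
 */
int model_startup_ni(bool net_is_unique, bool is_loopback, bool lnd_loadable)
{
	int rc = -EINVAL;	/* default for failures that set nothing else */

	if (!net_is_unique) {
		if (is_loopback)
			return 0;
		rc = -EEXIST;
		goto failed;
	}

	if (!lnd_loadable) {
		rc = -EINVAL;
		goto failed;
	}

	return 0;

failed:
	/* the kernel function frees the NI here before returning */
	return rc;
}
```

Returning rc from the shared label is what lets callers tell a duplicate
net apart from other configuration errors.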

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 62a9e45..ccd7dcd 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1219,7 +1219,7 @@ static int
 lnet_startup_lndni(struct lnet_ni *ni, __s32 peer_timeout,
 		   __s32 peer_cr, __s32 peer_buf_cr, __s32 credits)
 {
-	int rc = 0;
+	int rc = -EINVAL;
 	int lnd_type;
 	lnd_t *lnd;
 	struct lnet_tx_queue *tq;
@@ -1237,19 +1237,19 @@ lnet_startup_lndni(struct lnet_ni *ni, __s32 peer_timeout,
 
 	/* Make sure this new NI is unique. */
 	lnet_net_lock(LNET_LOCK_EX);
-	if (!lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nis)) {
+	rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nis);
+	lnet_net_unlock(LNET_LOCK_EX);
+	if (!rc) {
 		if (lnd_type == LOLND) {
-			lnet_net_unlock(LNET_LOCK_EX);
 			lnet_ni_free(ni);
 			return 0;
 		}
-		lnet_net_unlock(LNET_LOCK_EX);
 
 		CERROR("Net %s is not unique\n",
 		       libcfs_net2str(LNET_NIDNET(ni->ni_nid)));
+		rc = -EEXIST;
 		goto failed0;
 	}
-	lnet_net_unlock(LNET_LOCK_EX);
 
 	mutex_lock(&the_lnet.ln_lnd_mutex);
 	lnd = lnet_find_lnd_by_type(lnd_type);
@@ -1265,6 +1265,7 @@ lnet_startup_lndni(struct lnet_ni *ni, __s32 peer_timeout,
 			CERROR("Can't load LND %s, module %s, rc=%d\n",
 			       libcfs_lnd2str(lnd_type),
 			       libcfs_lnd2modname(lnd_type), rc);
+			rc = -EINVAL;
 			goto failed0;
 		}
 	}
@@ -1354,7 +1355,7 @@ lnet_startup_lndni(struct lnet_ni *ni, __s32 peer_timeout,
 	return 0;
 failed0:
 	lnet_ni_free(ni);
-	return -EINVAL;
+	return rc;
 }
 
 static int
@@ -1503,7 +1504,7 @@ int
 LNetNIInit(lnet_pid_t requested_pid)
 {
 	int im_a_router = 0;
-	int rc, rc2;
+	int rc;
 	int ni_count;
 	lnet_ping_info_t *pinfo;
 	lnet_handle_md_t md_handle;
@@ -1592,10 +1593,7 @@ LNetNIInit(lnet_pid_t requested_pid)
 	return 0;
 
 err_stop_ping:
-	lnet_ping_md_unlink(pinfo, &md_handle);
-	lnet_ping_info_free(pinfo);
-	rc2 = LNetEQFree(the_lnet.ln_ping_target_eq);
-	LASSERT(!rc2);
+	lnet_ping_target_fini();
 err_acceptor_stop:
 	the_lnet.ln_refcount = 0;
 	lnet_acceptor_stop();
-- 
1.7.1


* [PATCH 14/24] staging: lustre: handle lnet_check_routes() errors
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (12 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 13/24] staging: lustre: return -EEXIST if NI is not unique James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 15/24] staging: lustre: improvement to router checker James Simmons
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

After adding a route, lnet_check_routes() is called to ensure that
the route added doesn't invalidate the routing configuration.  If
lnet_check_routes() fails, then the route just added, which caused
the current configuration to be invalidated, is deleted and an error
is returned to the user.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6218
Reviewed-on: http://review.whamcloud.com/13445
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)
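
Note (illustration only, not part of the patch): the add, validate, roll
back sequence can be sketched in userspace as follows. Every name here is
hypothetical, and the validity rule in model_check_routes() (all routes to
one remote net must use gateways on the same local net) is only a stand-in
chosen for the example; it is not claimed to match lnet_check_routes().

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_ROUTES 8

struct model_route {
	unsigned int net;	/* remote net the route leads to */
	unsigned int gw_net;	/* local net the gateway sits on */
};

struct model_route model_routes[MODEL_MAX_ROUTES];
size_t model_nroutes;

int model_add_route(unsigned int net, unsigned int gw_net)
{
	if (model_nroutes == MODEL_MAX_ROUTES)
		return -ENOSPC;
	model_routes[model_nroutes].net = net;
	model_routes[model_nroutes].gw_net = gw_net;
	model_nroutes++;
	return 0;
}

void model_del_route(unsigned int net, unsigned int gw_net)
{
	size_t i;

	for (i = 0; i < model_nroutes; i++) {
		if (model_routes[i].net == net &&
		    model_routes[i].gw_net == gw_net) {
			/* fill the hole with the last entry */
			model_routes[i] = model_routes[--model_nroutes];
			return;
		}
	}
}

/* Stand-in validity rule, see the note above. */
int model_check_routes(void)
{
	size_t i, j;

	for (i = 0; i < model_nroutes; i++)
		for (j = i + 1; j < model_nroutes; j++)
			if (model_routes[i].net == model_routes[j].net &&
			    model_routes[i].gw_net != model_routes[j].gw_net)
				return -EINVAL;
	return 0;
}

/* Add the route, validate the table, and undo the add on failure. */
int model_ioctl_add_route(unsigned int net, unsigned int gw_net)
{
	int rc = model_add_route(net, gw_net);

	if (!rc) {
		rc = model_check_routes();
		if (rc)
			model_del_route(net, gw_net);
	}
	return rc;
}
```

The point of the rollback is that a rejected request leaves the table
exactly as it was before the call.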

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index ccd7dcd..ed121a8 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1907,8 +1907,14 @@ LNetCtl(unsigned int cmd, void *arg)
 				    config->cfg_config_u.cfg_route.rtr_hop,
 				    config->cfg_nid,
 				    config->cfg_config_u.cfg_route.rtr_priority);
+		if (!rc) {
+			rc = lnet_check_routes();
+			if (rc)
+				lnet_del_route(config->cfg_net,
+					       config->cfg_nid);
+		}
 		mutex_unlock(&the_lnet.ln_api_mutex);
-		return rc ? rc : lnet_check_routes();
+		return rc;
 
 	case IOC_LIBCFS_DEL_ROUTE:
 		config = arg;
-- 
1.7.1


* [PATCH 15/24] staging: lustre: improvement to router checker
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (13 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 14/24] staging: lustre: handle lnet_check_routes() errors James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 16/24] staging: lustre: assume a kernel build James Simmons
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

This patch makes the router checker thread start unconditionally.

The router checker only pings routes if live_router_check_interval
or dead_router_check_interval is set to something other than 0 and
there are routes configured.

If these conditions are not met, the router checker sleeps until it
is woken up by a route being added.  It is also woken up whenever
the RC is being stopped, to ensure the thread doesn't hang.

In the future, when DLC starts configuring the live and dead
router_check_interval parameters, the user will be able to turn the
router checker on and off by manipulating them.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6003
Reviewed-on: http://review.whamcloud.com/13035
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |    7 +++
 drivers/staging/lustre/lnet/lnet/api-ni.c          |    1 +
 drivers/staging/lustre/lnet/lnet/router.c          |   51 +++++++++++++++++---
 3 files changed, 52 insertions(+), 7 deletions(-)
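
Note (illustration only, not part of the patch): the decision between
"poll every second" and "sleep until woken" can be modelled in userspace
as a pure predicate. The struct and function names below are hypothetical;
the conditions follow lnet_router_checker_active() as added in router.c,
including the routing-enabled case that the description above does not
mention.

```c
#include <stdbool.h>

enum model_rc_state {
	MODEL_RC_SHUTDOWN,
	MODEL_RC_RUNNING,
	MODEL_RC_STOPPING
};

/* Hypothetical snapshot of the fields the predicate looks at. */
struct model_lnet {
	enum model_rc_state rc_state;
	bool routing;		/* this node forwards traffic */
	int nrouters;		/* number of known gateways */
	int live_interval;	/* live_router_check_interval */
	int dead_interval;	/* dead_router_check_interval */
};

/* True when the checker must not block indefinitely. */
bool model_rc_active(const struct model_lnet *ln)
{
	/* never sleep forever while stopping, or shutdown would hang */
	if (ln->rc_state != MODEL_RC_RUNNING)
		return true;

	if (ln->routing)
		return true;

	return ln->nrouters > 0 &&
	       (ln->live_interval > 0 || ln->dead_interval > 0);
}

/* 0 means "block until woken"; otherwise the poll period in seconds. */
int model_rc_sleep_seconds(const struct model_lnet *ln)
{
	return model_rc_active(ln) ? 1 : 0;
}
```

Using the same predicate both for the decision and as the wait condition
is what avoids a lost wake-up when a route is added just before the
thread goes to sleep.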

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index e4a8f6e..06d4656 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -635,6 +635,13 @@ typedef struct {
 	 */
 	bool				  ln_nis_from_mod_params;
 
+	/*
+	 * waitq for router checker.  As long as there are no routes in
+	 * the list, the router checker will sleep on this queue.  when
+	 * routes are added the thread will wake up
+	 */
+	wait_queue_head_t		  ln_rc_waitq;
+
 } lnet_t;
 
 #endif
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index ed121a8..0ec656a 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -98,6 +98,7 @@ lnet_init_locks(void)
 {
 	spin_lock_init(&the_lnet.ln_eq_wait_lock);
 	init_waitqueue_head(&the_lnet.ln_eq_waitq);
+	init_waitqueue_head(&the_lnet.ln_rc_waitq);
 	mutex_init(&the_lnet.ln_lnd_mutex);
 	mutex_init(&the_lnet.ln_api_mutex);
 }
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 511e446..ad9cd44 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -405,6 +405,9 @@ lnet_add_route(__u32 net, unsigned int hops, lnet_nid_t gateway,
 	if (rnet != rnet2)
 		LIBCFS_FREE(rnet, sizeof(*rnet));
 
+	/* indicate to startup the router checker if configured */
+	wake_up(&the_lnet.ln_rc_waitq);
+
 	return rc;
 }
 
@@ -1056,11 +1059,6 @@ lnet_router_checker_start(void)
 		return -EINVAL;
 	}
 
-	if (!the_lnet.ln_routing &&
-	    live_router_check_interval <= 0 &&
-	    dead_router_check_interval <= 0)
-		return 0;
-
 	sema_init(&the_lnet.ln_rc_signal, 0);
 	/*
 	 * EQ size doesn't matter; the callback is guaranteed to get every
@@ -1109,6 +1107,8 @@ lnet_router_checker_stop(void)
 
 	LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING);
 	the_lnet.ln_rc_state = LNET_RC_STATE_STOPPING;
+	/* wakeup the RC thread if it's sleeping */
+	wake_up(&the_lnet.ln_rc_waitq);
 
 	/* block until event callback signals exit */
 	down(&the_lnet.ln_rc_signal);
@@ -1199,6 +1199,33 @@ lnet_prune_rc_data(int wait_unlink)
 	lnet_net_unlock(LNET_LOCK_EX);
 }
 
+/*
+ * This function is called to check if the RC should block indefinitely.
+ * It's called from lnet_router_checker() as well as being passed to
+ * wait_event_interruptible() to avoid the lost wake_up problem.
+ *
+ * When it's called from wait_event_interruptible() it is necessary to
+ * also not sleep if the rc state is not running to avoid a deadlock
+ * when the system is shutting down
+ */
+static inline bool
+lnet_router_checker_active(void)
+{
+	if (the_lnet.ln_rc_state != LNET_RC_STATE_RUNNING)
+		return true;
+
+	/*
+	 * Router Checker thread needs to run when routing is enabled in
+	 * order to call lnet_update_ni_status_locked()
+	 */
+	if (the_lnet.ln_routing)
+		return true;
+
+	return !list_empty(&the_lnet.ln_routers) &&
+		(live_router_check_interval > 0 ||
+		 dead_router_check_interval > 0);
+}
+
 static int
 lnet_router_checker(void *arg)
 {
@@ -1252,8 +1279,18 @@ rescan:
 		 * because kernel counts # active tasks as nr_running
 		 * + nr_uninterruptible.
 		 */
-		set_current_state(TASK_INTERRUPTIBLE);
-		schedule_timeout(cfs_time_seconds(1));
+		/*
+		 * if there are any routes then wakeup every second.  If
+		 * there are no routes then sleep indefinitely until woken
+		 * up by a user adding a route
+		 */
+		if (!lnet_router_checker_active())
+			wait_event_interruptible(the_lnet.ln_rc_waitq,
+						 lnet_router_checker_active());
+		else
+			wait_event_interruptible_timeout(the_lnet.ln_rc_waitq,
+							 false,
+							 cfs_time_seconds(1));
 	}
 
 	LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_STOPPING);
-- 
1.7.1


* [PATCH 16/24] staging: lustre: assume a kernel build
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (14 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 15/24] staging: lustre: improvement to router checker James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 17/24] staging: lustre: prevent assert on LNet module unload James Simmons
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, John L. Hammond

From: John L. Hammond <john.hammond@intel.com>

In lnet/lnet/ and lnet/selftest/ assume a kernel build (assume that
__KERNEL__ is defined). Remove some common code only needed for user
space LNet.

Only part of this patch was merged previously. These are the
remaining bits.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/13121
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |    4 --
 drivers/staging/lustre/lnet/lnet/acceptor.c        |    2 -
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   39 ++------------------
 drivers/staging/lustre/lnet/lnet/lib-eq.c          |    3 --
 drivers/staging/lustre/lnet/lnet/lib-md.c          |    3 --
 drivers/staging/lustre/lnet/lnet/lib-me.c          |    3 --
 drivers/staging/lustre/lnet/lnet/lib-move.c        |    5 ---
 drivers/staging/lustre/lnet/lnet/lib-msg.c         |   20 +----------
 drivers/staging/lustre/lnet/lnet/router.c          |   11 ++----
 9 files changed, 7 insertions(+), 83 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 06d4656..f588e06 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -577,8 +577,6 @@ typedef struct {
 	/* dying LND instances */
 	struct list_head		  ln_nis_zombie;
 	lnet_ni_t			 *ln_loni;	/* the loopback NI */
-	/* NI to wait for events in */
-	lnet_ni_t			 *ln_eq_waitni;
 
 	/* remote networks with routes to them */
 	struct list_head		 *ln_remote_nets_hash;
@@ -608,8 +606,6 @@ typedef struct {
 
 	struct mutex			  ln_api_mutex;
 	struct mutex			  ln_lnd_mutex;
-	int				  ln_init;	/* lnet_init()
-							   called? */
 	/* Have I called LNetNIInit myself? */
 	int				  ln_niinit_self;
 	/* LNetNIInit/LNetNIFini counter */
diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
index 9fe3ff7..8f9876b 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -206,8 +206,6 @@ lnet_connect(struct socket **sockp, lnet_nid_t peer_nid,
 }
 EXPORT_SYMBOL(lnet_connect);
 
-/* Below is the code common for both kernel and MT user-space */
-
 static int
 lnet_accept(struct socket *sock, __u32 magic)
 {
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 0ec656a..0c7db19 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -291,7 +291,6 @@ lnet_register_lnd(lnd_t *lnd)
 {
 	mutex_lock(&the_lnet.ln_lnd_mutex);
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(libcfs_isknown_lnd(lnd->lnd_type));
 	LASSERT(!lnet_find_lnd_by_type(lnd->lnd_type));
 
@@ -309,7 +308,6 @@ lnet_unregister_lnd(lnd_t *lnd)
 {
 	mutex_lock(&the_lnet.ln_lnd_mutex);
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(lnet_find_lnd_by_type(lnd->lnd_type) == lnd);
 	LASSERT(!lnd->lnd_refcount);
 
@@ -1166,12 +1164,6 @@ lnet_shutdown_lndnis(void)
 		lnet_ni_unlink_locked(ni);
 	}
 
-	/* Drop the cached eqwait NI. */
-	if (the_lnet.ln_eq_waitni) {
-		lnet_ni_decref_locked(the_lnet.ln_eq_waitni, 0);
-		the_lnet.ln_eq_waitni = NULL;
-	}
-
 	/* Drop the cached loopback NI. */
 	if (the_lnet.ln_loni) {
 		lnet_ni_decref_locked(the_lnet.ln_loni, 0);
@@ -1364,7 +1356,6 @@ lnet_startup_lndnis(struct list_head *nilist)
 {
 	struct lnet_ni *ni;
 	int rc;
-	int lnd_type;
 	int ni_count = 0;
 
 	while (!list_empty(nilist)) {
@@ -1378,14 +1369,6 @@ lnet_startup_lndnis(struct list_head *nilist)
 		ni_count++;
 	}
 
-	if (the_lnet.ln_eq_waitni && ni_count > 1) {
-		lnd_type = the_lnet.ln_eq_waitni->ni_lnd->lnd_type;
-		LCONSOLE_ERROR_MSG(0x109, "LND %s can only run single-network\n",
-				   libcfs_lnd2str(lnd_type));
-		rc = -EINVAL;
-		goto failed;
-	}
-
 	return ni_count;
 failed:
 	lnet_shutdown_lndnis();
@@ -1396,10 +1379,9 @@ failed:
 /**
  * Initialize LNet library.
  *
- * Only userspace program needs to call this function - it's automatically
- * called in the kernel at module loading time. Caller has to call lnet_fini()
- * after a call to lnet_init(), if and only if the latter returned 0. It must
- * be called exactly once.
+ * Automatically called at module loading time. Caller has to call
+ * lnet_exit() after a call to lnet_init(), if and only if the
+ * latter returned 0. It must be called exactly once.
  *
  * \return 0 on success, and -ve on failures.
  */
@@ -1409,7 +1391,6 @@ lnet_init(void)
 	int rc;
 
 	lnet_assert_wire_constants();
-	LASSERT(!the_lnet.ln_init);
 
 	memset(&the_lnet, 0, sizeof(the_lnet));
 
@@ -1435,7 +1416,6 @@ lnet_init(void)
 	}
 
 	the_lnet.ln_refcount = 0;
-	the_lnet.ln_init = 1;
 	LNetInvalidateHandle(&the_lnet.ln_rc_eqh);
 	INIT_LIST_HEAD(&the_lnet.ln_lnds);
 	INIT_LIST_HEAD(&the_lnet.ln_rcd_zombie);
@@ -1465,30 +1445,23 @@ lnet_init(void)
 /**
  * Finalize LNet library.
  *
- * Only userspace program needs to call this function. It can be called
- * at most once.
- *
  * \pre lnet_init() called with success.
  * \pre All LNet users called LNetNIFini() for matching LNetNIInit() calls.
  */
 void
 lnet_fini(void)
 {
-	LASSERT(the_lnet.ln_init);
 	LASSERT(!the_lnet.ln_refcount);
 
 	while (!list_empty(&the_lnet.ln_lnds))
 		lnet_unregister_lnd(list_entry(the_lnet.ln_lnds.next,
 					       lnd_t, lnd_list));
 	lnet_destroy_locks();
-
-	the_lnet.ln_init = 0;
 }
 
 /**
  * Set LNet PID and start LNet interfaces, routing, and forwarding.
  *
- * Userspace program should call this after a successful call to lnet_init().
  * Users must call this function at least once before any other functions.
  * For each successful call there must be a corresponding call to
  * LNetNIFini(). For subsequent calls to LNetNIInit(), \a requested_pid is
@@ -1515,7 +1488,6 @@ LNetNIInit(lnet_pid_t requested_pid)
 
 	mutex_lock(&the_lnet.ln_api_mutex);
 
-	LASSERT(the_lnet.ln_init);
 	CDEBUG(D_OTHER, "refs %d\n", the_lnet.ln_refcount);
 
 	if (the_lnet.ln_refcount > 0) {
@@ -1632,7 +1604,6 @@ LNetNIFini(void)
 {
 	mutex_lock(&the_lnet.ln_api_mutex);
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	if (the_lnet.ln_refcount != 1) {
@@ -1886,8 +1857,6 @@ LNetCtl(unsigned int cmd, void *arg)
 	int rc;
 	unsigned long secs_passed;
 
-	LASSERT(the_lnet.ln_init);
-
 	switch (cmd) {
 	case IOC_LIBCFS_GET_NI:
 		rc = LNetGetId(data->ioc_count, &id);
@@ -2107,8 +2076,6 @@ LNetGetId(unsigned int index, lnet_process_id_t *id)
 	int cpt;
 	int rc = -ENOENT;
 
-	LASSERT(the_lnet.ln_init);
-
 	/* LNetNI initilization failed? */
 	if (!the_lnet.ln_refcount)
 		return rc;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-eq.c b/drivers/staging/lustre/lnet/lnet/lib-eq.c
index b8f248e..042e974 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-eq.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-eq.c
@@ -72,7 +72,6 @@ LNetEQAlloc(unsigned int count, lnet_eq_handler_t callback,
 {
 	lnet_eq_t *eq;
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	/*
@@ -167,7 +166,6 @@ LNetEQFree(lnet_handle_eq_t eqh)
 	int size = 0;
 	int i;
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	lnet_res_lock(LNET_LOCK_EX);
@@ -383,7 +381,6 @@ LNetEQPoll(lnet_handle_eq_t *eventqs, int neq, int timeout_ms,
 	int rc;
 	int i;
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	if (neq < 1)
diff --git a/drivers/staging/lustre/lnet/lnet/lib-md.c b/drivers/staging/lustre/lnet/lnet/lib-md.c
index f26bb03..c74514f 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-md.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-md.c
@@ -281,7 +281,6 @@ LNetMDAttach(lnet_handle_me_t meh, lnet_md_t umd,
 	int cpt;
 	int rc;
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	if (lnet_md_validate(&umd))
@@ -360,7 +359,6 @@ LNetMDBind(lnet_md_t umd, lnet_unlink_t unlink, lnet_handle_md_t *handle)
 	int cpt;
 	int rc;
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	if (lnet_md_validate(&umd))
@@ -435,7 +433,6 @@ LNetMDUnlink(lnet_handle_md_t mdh)
 	lnet_libmd_t *md;
 	int cpt;
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	cpt = lnet_cpt_of_cookie(mdh.cookie);
diff --git a/drivers/staging/lustre/lnet/lnet/lib-me.c b/drivers/staging/lustre/lnet/lnet/lib-me.c
index 3c59c88..e671aed 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-me.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-me.c
@@ -83,7 +83,6 @@ LNetMEAttach(unsigned int portal,
 	struct lnet_me *me;
 	struct list_head *head;
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	if ((int)portal >= the_lnet.ln_nportals)
@@ -156,7 +155,6 @@ LNetMEInsert(lnet_handle_me_t current_meh,
 	struct lnet_portal *ptl;
 	int cpt;
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	if (pos == LNET_INS_LOCAL)
@@ -233,7 +231,6 @@ LNetMEUnlink(lnet_handle_me_t meh)
 	lnet_event_t ev;
 	int cpt;
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	cpt = lnet_cpt_of_cookie(meh.cookie);
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index a342ce0..e5a8dbc 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -59,8 +59,6 @@ lnet_fail_nid(lnet_nid_t nid, unsigned int threshold)
 	struct list_head *next;
 	struct list_head cull;
 
-	LASSERT(the_lnet.ln_init);
-
 	/* NB: use lnet_net_lock(0) to serialize operations on test peers */
 	if (threshold) {
 		/* Adding a new entry */
@@ -2162,7 +2160,6 @@ LNetPut(lnet_nid_t self, lnet_handle_md_t mdh, lnet_ack_req_t ack,
 	int cpt;
 	int rc;
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	if (!list_empty(&the_lnet.ln_test_peers) && /* normally we don't */
@@ -2367,7 +2364,6 @@ LNetGet(lnet_nid_t self, lnet_handle_md_t mdh,
 	int cpt;
 	int rc;
 
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	if (!list_empty(&the_lnet.ln_test_peers) && /* normally we don't */
@@ -2467,7 +2463,6 @@ LNetDist(lnet_nid_t dstnid, lnet_nid_t *srcnidp, __u32 *orderp)
 	 * keep order 0 free for 0@lo and order 1 free for a local NID
 	 * match
 	 */
-	LASSERT(the_lnet.ln_init);
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	cpt = lnet_net_lock_current();
diff --git a/drivers/staging/lustre/lnet/lnet/lib-msg.c b/drivers/staging/lustre/lnet/lnet/lib-msg.c
index 749e76a..c372390 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-msg.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-msg.c
@@ -571,35 +571,17 @@ lnet_msg_container_cleanup(struct lnet_msg_container *container)
 			    sizeof(*container->msc_finalizers));
 		container->msc_finalizers = NULL;
 	}
-#ifdef LNET_USE_LIB_FREELIST
-	lnet_freelist_fini(&container->msc_freelist);
-#endif
 	container->msc_init = 0;
 }
 
 int
 lnet_msg_container_setup(struct lnet_msg_container *container, int cpt)
 {
-	int rc;
-
 	container->msc_init = 1;
 
 	INIT_LIST_HEAD(&container->msc_active);
 	INIT_LIST_HEAD(&container->msc_finalizing);
 
-#ifdef LNET_USE_LIB_FREELIST
-	memset(&container->msc_freelist, 0, sizeof(lnet_freelist_t));
-
-	rc = lnet_freelist_init(&container->msc_freelist,
-				LNET_FL_MAX_MSGS, sizeof(lnet_msg_t));
-	if (rc) {
-		CERROR("Failed to init freelist for message container\n");
-		lnet_msg_container_cleanup(container);
-		return rc;
-	}
-#else
-	rc = 0;
-#endif
 	/* number of CPUs */
 	container->msc_nfinalizers = cfs_cpt_weight(lnet_cpt_table(), cpt);
 
@@ -613,7 +595,7 @@ lnet_msg_container_setup(struct lnet_msg_container *container, int cpt)
 		return -ENOMEM;
 	}
 
-	return rc;
+	return 0;
 }
 
 void
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index ad9cd44..c1e7bc5 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -1049,7 +1049,7 @@ lnet_router_checker_start(void)
 {
 	struct task_struct *task;
 	int rc;
-	int eqsz;
+	int eqsz = 0;
 
 	LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_SHUTDOWN);
 
@@ -1060,13 +1060,8 @@ lnet_router_checker_start(void)
 	}
 
 	sema_init(&the_lnet.ln_rc_signal, 0);
-	/*
-	 * EQ size doesn't matter; the callback is guaranteed to get every
-	 * event
-	 */
-	eqsz = 0;
-	rc = LNetEQAlloc(eqsz, lnet_router_checker_event,
-			 &the_lnet.ln_rc_eqh);
+
+	rc = LNetEQAlloc(0, lnet_router_checker_event, &the_lnet.ln_rc_eqh);
 	if (rc) {
 		CERROR("Can't allocate EQ(%d): %d\n", eqsz, rc);
 		return -ENOMEM;
-- 
1.7.1


* [PATCH 17/24] staging: lustre: prevent assert on LNet module unload
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (15 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 16/24] staging: lustre: assume a kernel build James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 18/24] staging: lustre: remove messages from lazy portal on NI shutdown James Simmons
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

There is a use case where lnet can be unloaded while there are
no NIs configured.  Removing lnet in this case will cause
LNetFini() to be called without a prior call to LNetNIFini().
This will cause the LASSERT(the_lnet.ln_refcount == 0) to be
triggered.

To deal with this use case, a reference count on the module is
taken with try_module_get() when LNet is configured.  This way
LNet must be unconfigured before it can be removed, which avoids
the above case.  When LNet is unconfigured, module_put() is
called to release the reference.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6010
Reviewed-on: http://review.whamcloud.com/13110
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lnet/lnet/module.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)
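
As a rough illustration of the pinning scheme described above, here is a
user-space sketch.  It is not the kernel code: struct fake_module and the
fake_*() helpers are hypothetical stand-ins for THIS_MODULE,
try_module_get(), module_put() and the ln_niinit_self flag, and the -1
error return on a failed get is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the module and the LNet configured flag. */
struct fake_module {
	int refs;		/* models the module use count */
	bool configured;	/* models the_lnet.ln_niinit_self */
};

/* Models try_module_get(); always succeeds in this sketch. */
static bool fake_try_get(struct fake_module *m)
{
	m->refs++;
	return true;
}

/* Models module_put(). */
static void fake_put(struct fake_module *m)
{
	m->refs--;
}

/* ni_init_rc is the result a pretend LNetNIInit() call would return. */
static int fake_configure(struct fake_module *m, int ni_init_rc)
{
	if (m->configured)
		return 0;

	if (!fake_try_get(m))
		return -1;	/* assumption: report the failed get */

	if (ni_init_rc < 0) {
		/* init failed, so drop the reference taken above */
		fake_put(m);
		return ni_init_rc;
	}

	m->configured = true;
	return 0;
}

static void fake_unconfigure(struct fake_module *m)
{
	if (m->configured) {
		m->configured = false;
		fake_put(m);	/* release the reference held since configure */
	}
}

/* The module may only go away once nobody holds a reference. */
static bool fake_can_unload(const struct fake_module *m)
{
	return m->refs == 0;
}
```

In this model a configured LNet always holds one reference, so an unload
attempt is refused until fake_unconfigure() has run, and a failed
configure leaves the count unchanged.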

diff --git a/drivers/staging/lustre/lnet/lnet/module.c b/drivers/staging/lustre/lnet/lnet/module.c
index 8f053d7..e12fe37 100644
--- a/drivers/staging/lustre/lnet/lnet/module.c
+++ b/drivers/staging/lustre/lnet/lnet/module.c
@@ -53,13 +53,21 @@ lnet_configure(void *arg)
 	mutex_lock(&lnet_config_mutex);
 
 	if (!the_lnet.ln_niinit_self) {
+		rc = try_module_get(THIS_MODULE);
+
+		if (rc != 1)
+			goto out;
+
 		rc = LNetNIInit(LNET_PID_LUSTRE);
 		if (rc >= 0) {
 			the_lnet.ln_niinit_self = 1;
 			rc = 0;
+		} else {
+			module_put(THIS_MODULE);
 		}
 	}
 
+out:
 	mutex_unlock(&lnet_config_mutex);
 	return rc;
 }
@@ -74,6 +82,7 @@ lnet_unconfigure(void)
 	if (the_lnet.ln_niinit_self) {
 		the_lnet.ln_niinit_self = 0;
 		LNetNIFini();
+		module_put(THIS_MODULE);
 	}
 
 	mutex_lock(&the_lnet.ln_api_mutex);
-- 
1.7.1


* [PATCH 18/24] staging: lustre: remove messages from lazy portal on NI shutdown
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (16 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 17/24] staging: lustre: prevent assert on LNet module unload James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 19/24] staging: lustre: remove unnecessary EXPORT_SYMBOL from lnet layer James Simmons
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

When shutting down an NI in a busy system, some messages received
on this NI might be on the lazy portal.  They would have grabbed
a ref count on the NI.  Therefore the NI will not be removed until
those messages are processed.

In order to avoid this scenario, when an NI is shut down, go through
all messages queued on the lazy portal and drop the messages for the
NI being shut down.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6040
Reviewed-on: http://review.whamcloud.com/13836
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |    1 +
 drivers/staging/lustre/lnet/lnet/api-ni.c          |    6 ++
 drivers/staging/lustre/lnet/lnet/lib-ptl.c         |   54 +++++++++++++-------
 3 files changed, 43 insertions(+), 18 deletions(-)
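
The selection logic described above (take only the delayed messages that
belong to one NI, or take everything when no NI is given) can be sketched
in user space as below.  This is only a model: struct msg, its ni_id field
and collect_zombies() are hypothetical names, a negative ni_id plays the
role of the NULL ni argument, and a plain singly linked list replaces the
kernel's struct list_head.  Locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical delayed message; ni_id identifies the receiving NI. */
struct msg {
	int ni_id;
	struct msg *next;
};

/*
 * Move messages from *delayed to *zombies.  With ni_id >= 0 only the
 * messages for that NI are moved; with ni_id < 0 every message is moved.
 * Returns the number of messages moved.
 */
static int collect_zombies(struct msg **delayed, struct msg **zombies,
			   int ni_id)
{
	struct msg **pp = delayed;
	int moved = 0;

	while (*pp) {
		struct msg *m = *pp;

		if (ni_id < 0 || m->ni_id == ni_id) {
			*pp = m->next;		/* unlink from delayed list */
			m->next = *zombies;	/* push onto zombie list */
			*zombies = m;
			moved++;
		} else {
			pp = &m->next;		/* keep, advance */
		}
	}

	return moved;
}
```

Messages for other NIs stay queued in their original order, which is the
point of filtering rather than splicing the whole list.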

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 0928bc9..a5f1aec 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -482,6 +482,7 @@ int lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
 		    __s32 peer_timeout, __s32 peer_cr, __s32 peer_buf_cr,
 		    __s32 credits);
 int lnet_dyn_del_ni(__u32 net);
+int lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason);
 
 int lnet_islocalnid(lnet_nid_t nid);
 int lnet_islocalnet(__u32 net);
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 0c7db19..3ecc96a 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1196,10 +1196,16 @@ lnet_shutdown_lndnis(void)
 static void
 lnet_shutdown_lndni(struct lnet_ni *ni)
 {
+	int i;
+
 	lnet_net_lock(LNET_LOCK_EX);
 	lnet_ni_unlink_locked(ni);
 	lnet_net_unlock(LNET_LOCK_EX);
 
+	/* clear messages for this NI on the lazy portal */
+	for (i = 0; i < the_lnet.ln_nportals; i++)
+		lnet_clear_lazy_portal(ni, i, "Shutting down NI");
+
 	/* Do peer table cleanup for this ni */
 	lnet_peer_tables_cleanup(ni);
 
diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
index 0cdeea9..5a9ab87 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
@@ -902,17 +902,8 @@ LNetSetLazyPortal(int portal)
 }
 EXPORT_SYMBOL(LNetSetLazyPortal);
 
-/**
- * Turn off the lazy portal attribute. Delayed requests on the portal,
- * if any, will be all dropped when this function returns.
- *
- * \param portal Index of the portal to disable the lazy attribute on.
- *
- * \retval 0       On success.
- * \retval -EINVAL If \a portal is not a valid index.
- */
 int
-LNetClearLazyPortal(int portal)
+lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason)
 {
 	struct lnet_portal *ptl;
 	LIST_HEAD(zombies);
@@ -931,21 +922,48 @@ LNetClearLazyPortal(int portal)
 		return 0;
 	}
 
-	if (the_lnet.ln_shutdown)
-		CWARN("Active lazy portal %d on exit\n", portal);
-	else
-		CDEBUG(D_NET, "clearing portal %d lazy\n", portal);
+	if (ni) {
+		struct lnet_msg *msg, *tmp;
 
-	/* grab all the blocked messages atomically */
-	list_splice_init(&ptl->ptl_msg_delayed, &zombies);
+		/* grab all messages which are on the NI passed in */
+		list_for_each_entry_safe(msg, tmp, &ptl->ptl_msg_delayed,
+					 msg_list) {
+			if (msg->msg_rxpeer->lp_ni == ni)
+				list_move(&msg->msg_list, &zombies);
+		}
+	} else {
+		if (the_lnet.ln_shutdown)
+			CWARN("Active lazy portal %d on exit\n", portal);
+		else
+			CDEBUG(D_NET, "clearing portal %d lazy\n", portal);
+
+		/* grab all the blocked messages atomically */
+		list_splice_init(&ptl->ptl_msg_delayed, &zombies);
 
-	lnet_ptl_unsetopt(ptl, LNET_PTL_LAZY);
+		lnet_ptl_unsetopt(ptl, LNET_PTL_LAZY);
+	}
 
 	lnet_ptl_unlock(ptl);
 	lnet_res_unlock(LNET_LOCK_EX);
 
-	lnet_drop_delayed_msg_list(&zombies, "Clearing lazy portal attr");
+	lnet_drop_delayed_msg_list(&zombies, reason);
 
 	return 0;
 }
+
+/**
+ * Turn off the lazy portal attribute. Delayed requests on the portal,
+ * if any, will be all dropped when this function returns.
+ *
+ * \param portal Index of the portal to disable the lazy attribute on.
+ *
+ * \retval 0       On success.
+ * \retval -EINVAL If \a portal is not a valid index.
+ */
+int
+LNetClearLazyPortal(int portal)
+{
+	return lnet_clear_lazy_portal(NULL, portal,
+				      "Clearing lazy portal attr");
+}
 EXPORT_SYMBOL(LNetClearLazyPortal);
-- 
1.7.1


* [PATCH 19/24] staging: lustre: remove unnecessary EXPORT_SYMBOL from lnet layer
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (17 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 18/24] staging: lustre: remove messages from lazy portal on NI shutdown James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 20/24] staging: lustre: avoid race during lnet acceptor thread termination James Simmons
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Frank Zago

From: Frank Zago <fzago@cray.com>

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Signed-off-by: Frank Zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829
Reviewed-on: http://review.whamcloud.com/13320
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lnet/lnet/lib-socket.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/lib-socket.c b/drivers/staging/lustre/lnet/lnet/lib-socket.c
index 53dd0bd..88905d5 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-socket.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-socket.c
@@ -514,7 +514,6 @@ lnet_sock_listen(struct socket **sockp, __u32 local_ip, int local_port,
 	sock_release(*sockp);
 	return rc;
 }
-EXPORT_SYMBOL(lnet_sock_listen);
 
 int
 lnet_sock_accept(struct socket **newsockp, struct socket *sock)
@@ -558,7 +557,6 @@ failed:
 	sock_release(newsock);
 	return rc;
 }
-EXPORT_SYMBOL(lnet_sock_accept);
 
 int
 lnet_sock_connect(struct socket **sockp, int *fatal, __u32 local_ip,
@@ -596,4 +594,3 @@ lnet_sock_connect(struct socket **sockp, int *fatal, __u32 local_ip,
 	sock_release(*sockp);
 	return rc;
 }
-EXPORT_SYMBOL(lnet_sock_connect);
-- 
1.7.1


* [PATCH 20/24] staging: lustre: avoid race during lnet acceptor thread termination
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (18 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 19/24] staging: lustre: remove unnecessary EXPORT_SYMBOL from lnet layer James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 21/24] staging: lustre: use sock.h in only acceptor.c James Simmons
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Bruno Faccini

From: Bruno Faccini <bruno.faccini@intel.com>

This patch avoids a potential race around the socket sleepers'
wait list during acceptor thread termination by using
sk_callback_lock RW-lock protection.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6476
Reviewed-on: http://review.whamcloud.com/14503
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lnet/lnet/acceptor.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
index 8f9876b..3468433 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -488,11 +488,17 @@ lnet_acceptor_start(void)
 void
 lnet_acceptor_stop(void)
 {
+	struct sock *sk;
+
 	if (lnet_acceptor_state.pta_shutdown) /* not running */
 		return;
 
 	lnet_acceptor_state.pta_shutdown = 1;
-	wake_up_all(sk_sleep(lnet_acceptor_state.pta_sock->sk));
+
+	sk = lnet_acceptor_state.pta_sock->sk;
+
+	/* awake any sleepers using safe method */
+	sk->sk_state_change(sk);
 
 	/* block until acceptor signals exit */
 	wait_for_completion(&lnet_acceptor_state.pta_signal);
-- 
1.7.1


* [PATCH 21/24] staging: lustre: use sock.h in only acceptor.c
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (19 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 20/24] staging: lustre: avoid race during lnet acceptor thread termination James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 22/24] staging: lustre: Allocate the correct number of rtr buffers James Simmons
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, James Simmons

On some platforms having sock.h in lib-types.h would collide with
other included header files being used in the LNet layer. Looking
at what is needed from sock.h, only acceptor.c depends on it.
To avoid these collisions, include sock.h only in acceptor.c.

Signed-off-by: James Simmons <jsimmons@infradead.org>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6763
Reviewed-on: http://review.whamcloud.com/15386
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |    1 -
 drivers/staging/lustre/lnet/lnet/acceptor.c        |    1 +
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index f588e06..c10f03b 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -38,7 +38,6 @@
 #include <linux/kthread.h>
 #include <linux/uio.h>
 #include <linux/types.h>
-#include <net/sock.h>
 
 #include "types.h"
 
diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
index 3468433..1452bb3 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -36,6 +36,7 @@
 
 #define DEBUG_SUBSYSTEM S_LNET
 #include <linux/completion.h>
+#include <net/sock.h>
 #include "../../include/linux/lnet/lib-lnet.h"
 
 static int   accept_port    = 988;
-- 
1.7.1


* [PATCH 22/24] staging: lustre: Allocate the correct number of rtr buffers
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (20 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 21/24] staging: lustre: use sock.h in only acceptor.c James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 23/24] staging: lustre: Use lnet_is_route_alive for router aliveness James Simmons
  2016-02-22 22:29 ` [PATCH 24/24] staging: lustre: Remove LASSERTS from router checker James Simmons
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Amir Shehata

From: Amir Shehata <amir.shehata@intel.com>

This patch ensures that the correct number of router buffers is
allocated.  One count tracks the number of buffers actually
allocated; another tracks the number of buffers requested.  The
number of buffers allocated is set when creating new buffers and
reduced when buffers are freed.

The number of requested buffers is set when the buffers are
allocated and is checked when credits are returned to determine
whether the buffer should be freed or kept.

In lnet_rtrpool_adjust_bufs() grab lnet_net_lock() before using
rbp_nbuffers to ensure that it is not changed by
lnet_return_rx_credits_locked() during the process of allocating
new buffers.  All other accesses to rbp_nbuffers are already
protected by lnet_net_lock().

This avoids the case where we allocate fewer than the desired
number of buffers.

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6122
Reviewed-on: http://review.whamcloud.com/13519
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |    5 ++-
 drivers/staging/lustre/lnet/lnet/lib-move.c        |    3 +-
 drivers/staging/lustre/lnet/lnet/router.c          |   32 +++++++++++++++-----
 3 files changed, 30 insertions(+), 10 deletions(-)
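
The requested/allocated bookkeeping described above can be modelled in a
few lines of user-space C.  This is a single-threaded sketch with no
locking and no real allocation: struct pool and the pool_*() helpers are
hypothetical names, and the way the success path adds the new buffers to
the counts is an assumption, since that part of the function is not shown
in this diff.

```c
#include <assert.h>

/* Hypothetical model of the router buffer pool counters. */
struct pool {
	int req_nbuffers;	/* number of buffers requested */
	int nbuffers;		/* number of buffers actually allocated */
	int credits;		/* number of free buffers */
};

/* Resize the pool to nbufs; returns how many buffers were allocated. */
static int pool_adjust(struct pool *p, int nbufs)
{
	int need = nbufs - p->nbuffers;

	/*
	 * A decrease, or an increase already covered by allocated
	 * buffers, only changes the requested count.
	 */
	if (nbufs <= p->req_nbuffers || need <= 0) {
		p->req_nbuffers = nbufs;
		return 0;
	}

	p->req_nbuffers = nbufs;
	p->nbuffers += need;	/* assumption: all allocations succeed */
	p->credits += need;
	return need;
}

/* A message takes a free buffer. */
static void pool_take(struct pool *p)
{
	p->credits--;
}

/* A buffer comes back; returns 1 if it was freed, 0 if it was kept. */
static int pool_return(struct pool *p)
{
	if (p->credits >= p->req_nbuffers) {
		p->nbuffers--;	/* excess buffer, discard it */
		return 1;
	}

	p->credits++;
	return 0;
}
```

For example, shrinking from 4 to 2 while two buffers are in flight and then
growing back to 4 allocates nothing in this model, because the allocated
count never dropped below 4.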

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index c10f03b..07b8db1 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -396,7 +396,10 @@ typedef struct {
 	struct list_head	rbp_msgs;	/* messages blocking
 						   for a buffer */
 	int			rbp_npages;	/* # pages in each buffer */
-	int			rbp_nbuffers;	/* # buffers */
+	/* requested number of buffers */
+	int			rbp_req_nbuffers;
+	/* # buffers actually allocated */
+	int			rbp_nbuffers;
 	int			rbp_credits;	/* # free buffers /
 						     blocked messages */
 	int			rbp_mincredits;	/* low water mark */
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index e5a8dbc..7bc3e91 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -1113,9 +1113,10 @@ lnet_return_rx_credits_locked(lnet_msg_t *msg)
 		 * buffers in this pool.  Make sure we never put back
 		 * more buffers than the stated number.
 		 */
-		if (rbp->rbp_credits >= rbp->rbp_nbuffers) {
+		if (unlikely(rbp->rbp_credits >= rbp->rbp_req_nbuffers)) {
 			/* Discard this buffer so we don't have too many. */
 			lnet_destroy_rtrbuf(rb, rbp->rbp_npages);
+			rbp->rbp_nbuffers--;
 		} else {
 			list_add(&rb->rb_list, &rbp->rbp_bufs);
 			rbp->rbp_credits++;
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index c1e7bc5..198ff03 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -1359,6 +1359,7 @@ lnet_rtrpool_free_bufs(lnet_rtrbufpool_t *rbp, int cpt)
 	lnet_net_lock(cpt);
 	lnet_drop_routed_msgs_locked(&rbp->rbp_msgs, cpt);
 	list_splice_init(&rbp->rbp_bufs, &tmp);
+	rbp->rbp_req_nbuffers = 0;
 	rbp->rbp_nbuffers = 0;
 	rbp->rbp_credits = 0;
 	rbp->rbp_mincredits = 0;
@@ -1379,20 +1380,33 @@ lnet_rtrpool_adjust_bufs(lnet_rtrbufpool_t *rbp, int nbufs, int cpt)
 	lnet_rtrbuf_t *rb;
 	int num_rb;
 	int num_buffers = 0;
+	int old_req_nbufs;
 	int npages = rbp->rbp_npages;
 
+	lnet_net_lock(cpt);
 	/*
 	 * If we are called for less buffers than already in the pool, we
-	 * just lower the nbuffers number and excess buffers will be
+	 * just lower the req_nbuffers number and excess buffers will be
 	 * thrown away as they are returned to the free list.  Credits
 	 * then get adjusted as well.
+	 * If we already have enough buffers allocated to serve the
+	 * increase requested, then we can treat that the same way as we
+	 * do the decrease.
 	 */
-	if (nbufs <= rbp->rbp_nbuffers) {
-		lnet_net_lock(cpt);
-		rbp->rbp_nbuffers = nbufs;
+	num_rb = nbufs - rbp->rbp_nbuffers;
+	if (nbufs <= rbp->rbp_req_nbuffers || num_rb <= 0) {
+		rbp->rbp_req_nbuffers = nbufs;
 		lnet_net_unlock(cpt);
 		return 0;
 	}
+	/*
+	 * store the older value of rbp_req_nbuffers and then set it to
+	 * the new request to prevent lnet_return_rx_credits_locked() from
+	 * freeing buffers that we need to keep around
+	 */
+	old_req_nbufs = rbp->rbp_req_nbuffers;
+	rbp->rbp_req_nbuffers = nbufs;
+	lnet_net_unlock(cpt);
 
 	INIT_LIST_HEAD(&rb_list);
 
@@ -1401,19 +1415,21 @@ lnet_rtrpool_adjust_bufs(lnet_rtrbufpool_t *rbp, int nbufs, int cpt)
 	 * allocated successfully then join this list to the rbp buffer
 	 * list. If not then free all allocated buffers.
 	 */
-	num_rb = rbp->rbp_nbuffers;
-
-	while (num_rb < nbufs) {
+	while (num_rb-- > 0) {
 		rb = lnet_new_rtrbuf(rbp, cpt);
 		if (!rb) {
 			CERROR("Failed to allocate %d route bufs of %d pages\n",
 			       nbufs, npages);
+
+			lnet_net_lock(cpt);
+			rbp->rbp_req_nbuffers = old_req_nbufs;
+			lnet_net_unlock(cpt);
+
 			goto failed;
 		}
 
 		list_add(&rb->rb_list, &rb_list);
 		num_buffers++;
-		num_rb++;
 	}
 
 	lnet_net_lock(cpt);
-- 
1.7.1


* [PATCH 23/24] staging: lustre: Use lnet_is_route_alive for router aliveness
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (21 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 22/24] staging: lustre: Allocate the correct number of rtr buffers James Simmons
@ 2016-02-22 22:29 ` James Simmons
  2016-02-22 22:29 ` [PATCH 24/24] staging: lustre: Remove LASSERTS from router checker James Simmons
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Chris Horn

From: Chris Horn <hornc@cray.com>

lctl show_route and lctl route_list will output router aliveness
information via lnet_get_route(). lnet_get_route() should use the
lnet_is_route_alive() function, introduced in e8a1124
http://review.whamcloud.com/7857, to determine route aliveness.

Signed-off-by: Chris Horn <hornc@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5733
Reviewed-on: http://review.whamcloud.com/14055
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lnet/lnet/router.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 198ff03..5a6086b 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -607,8 +607,7 @@ lnet_get_route(int idx, __u32 *net, __u32 *hops,
 					*hops     = route->lr_hops;
 					*priority = route->lr_priority;
 					*gateway  = route->lr_gateway->lp_nid;
-					*alive = route->lr_gateway->lp_alive &&
-						 !route->lr_downis;
+					*alive = lnet_is_route_alive(route);
 					lnet_net_unlock(cpt);
 					return 0;
 				}
-- 
1.7.1


* [PATCH 24/24] staging: lustre: Remove LASSERTS from router checker
  2016-02-22 22:29 [PATCH 00/24] Second batch of LNet updates James Simmons
                   ` (22 preceding siblings ...)
  2016-02-22 22:29 ` [PATCH 23/24] staging: lustre: Use lnet_is_route_alive for router aliveness James Simmons
@ 2016-02-22 22:29 ` James Simmons
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Doug Oucharek

From: Doug Oucharek <doug.s.oucharek@intel.com>

In lnet_router_checker(), there are two LASSERTs.  Neither protects
us from anything, and one of them triggered for a customer, crashing
the system unnecessarily.  This patch removes them.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7362
Reviewed-on: http://review.whamcloud.com/17003
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Matt Ezell <ezellma@ornl.gov>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
---
 drivers/staging/lustre/lnet/lnet/router.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 5a6086b..5e8b0ba 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -1228,8 +1228,6 @@ lnet_router_checker(void *arg)
 
 	cfs_block_allsigs();
 
-	LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING);
-
 	while (the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING) {
 		__u64 version;
 		int cpt;
@@ -1287,8 +1285,6 @@ rescan:
 							 cfs_time_seconds(1));
 	}
 
-	LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_STOPPING);
-
 	lnet_prune_rc_data(1); /* wait for UNLINK */
 
 	the_lnet.ln_rc_state = LNET_RC_STATE_SHUTDOWN;
-- 
1.7.1

