lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021
@ 2021-04-15  4:01 James Simmons
  2021-04-15  4:01 ` [lustre-devel] [PATCH 01/49] lnet: libcfs: Fix for unconfigured arch_stackwalk James Simmons
                   ` (48 more replies)
  0 siblings, 49 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:01 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

New backport of patches from the OpenSFS tree as of March 30, 2021.
It includes a few fixes that were still missing from the Linux client,
covering several pool quota patches and the NRS delay policy.

Andreas Dilger (4):
  lustre: llite: mark extended attr and inode flags
  lustre: mds: add enums for MDS_ATTR flags
  lustre: uapi: remove OBD_IOC_LOV_GET_CONFIG
  lustre: lmv: handle default stripe_count=-1 properly

Bobi Jam (1):
  lustre: lov: fault page update cp_lov_index

Chris Horn (10):
  lnet: lnet_notify sets route aliveness incorrectly
  lnet: Prevent discovery on peer marked deletion
  lnet: Prevent discovery on deleted peer
  lnet: Transfer disc src NID when merging peers
  lnet: Lookup lpni after discovery
  lnet: Correct asymmetric route detection
  lustre: ptlrpc: Implement NRS Delay Policy
  lnet: Age peer NI out of recovery
  lnet: Only recover known good peer NIs
  lnet: Recover peer NI w/exponential backoff interval

Emoly Liu (1):
  lustre: lov: return valid stripe_count/size for PFL files

Etienne AUJAMES (1):
  lustre: obdclass: Protect cl_env_percpu[]

James Simmons (4):
  lustre: llite: update and fix module loading bug in mounting code
  lustre: llite: create file_operations registration function.
  lnet: uapi: move userland only nidstr.h handling
  lustre: use tgt_pool for lov layer

Lai Siyao (1):
  lustre: lmv: striped directory as subdirectory mount

Lei Feng (1):
  lustre: log: Add ending newline for some messages.

Mikhail Pershin (1):
  lustre: llite: mirror extend/copy keeps sparseness

Mr NeilBrown (14):
  lustre: lmv: iput() can safely be passed NULL.
  lnet: socklnd: change various ints to bool.
  lustre: fixup ldlm_pool and lu_object shrinker failure cases
  lustre: use with_imp_locked() more broadly.
  lnet: o2iblnd: change some ints to bool.
  lustre: osc: fix performance regression in osc_extent_merge()
  lnet: libcfs: restore LNET_DUMP_ON_PANIC functionality.
  lnet: libcfs: don't depend on sysctl support for debugfs
  lustre: ptlrpc: rename cfs_binheap to simply binheap
  lustre: ptlrpc: mark some functions as static
  lustre: quota: call rhashtable_lookup near params decl
  lnet: libcfs: discard cfs_trace_console_buffers[]
  lnet: libcfs: discard cfs_trace_copyin_string()
  lustre: lmv: don't use lqr_alloc spinlock in lmv

Mr. NeilBrown (2):
  lnet: convert lpni_refcount to a kref
  lnet: libcfs: discard cfs_array_alloc()

NeilBrown (1):
  lustre: ptlrpc: don't use list_for_each_entry_safe unnecessarily.

Nikitas Angelinas (1):
  lustre: ptlrpc: Add a binary heap implementation

Oleg Drokin (1):
  lustre: update version to 2.14.51

Sebastien Buisson (1):
  lustre: sec: fix migrate for encrypted dir

Sergey Cheremencev (1):
  lustre: quota: make used for pool correct

Shaun Tancheff (1):
  lnet: libcfs: Fix for unconfigured arch_stackwalk

Vitaly Fertman (2):
  lustre: ldlm: not freed req on enqueue
  lustre: lov: cancel layout lock on replay deadlock

Yang Sheng (1):
  lustre: ptlrpc: fix ASSERTION on scp_rqbd_posted

 fs/lustre/include/lprocfs_status.h       |  11 +-
 fs/lustre/include/lustre_disk.h          |   5 +-
 fs/lustre/include/lustre_lmv.h           |   3 +-
 fs/lustre/include/lustre_net.h           |   1 -
 fs/lustre/include/lustre_nrs.h           |  16 +
 fs/lustre/include/lustre_nrs_delay.h     |  87 ++++
 fs/lustre/include/obd_class.h            |   5 +-
 fs/lustre/include/obd_support.h          |  30 +-
 fs/lustre/ldlm/ldlm_pool.c               |  43 +-
 fs/lustre/ldlm/ldlm_request.c            |   4 +
 fs/lustre/llite/crypto.c                 |   1 +
 fs/lustre/llite/dir.c                    |  13 +-
 fs/lustre/llite/file.c                   | 143 ++++--
 fs/lustre/llite/llite_internal.h         |  24 +-
 fs/lustre/llite/llite_lib.c              |  14 +-
 fs/lustre/llite/namei.c                  |   2 +
 fs/lustre/llite/super25.c                |  60 ++-
 fs/lustre/llite/xattr.c                  |  11 +
 fs/lustre/lmv/lmv_intent.c               |  17 +-
 fs/lustre/lmv/lmv_obd.c                  |  12 +-
 fs/lustre/lov/lov_cl_internal.h          |  10 +-
 fs/lustre/lov/lov_internal.h             |  10 +-
 fs/lustre/lov/lov_io.c                   |  37 ++
 fs/lustre/lov/lov_obd.c                  |  84 ++-
 fs/lustre/lov/lov_object.c               |  44 +-
 fs/lustre/lov/lov_pack.c                 |   7 -
 fs/lustre/lov/lov_pool.c                 | 130 +----
 fs/lustre/mdc/lproc_mdc.c                |  14 +-
 fs/lustre/mdc/mdc_lib.c                  |  10 +-
 fs/lustre/mdc/mdc_request.c              |  12 +-
 fs/lustre/mgc/mgc_request.c              |  94 ++--
 fs/lustre/obdclass/Makefile              |   4 +-
 fs/lustre/obdclass/cl_object.c           |   3 +
 fs/lustre/obdclass/class_obd.c           |  15 +-
 fs/lustre/obdclass/lu_object.c           |  39 +-
 fs/lustre/obdclass/lu_tgt_pool.c         | 241 +++++++++
 fs/lustre/osc/lproc_osc.c                |  10 +-
 fs/lustre/osc/osc_cache.c                |   5 +-
 fs/lustre/osc/osc_request.c              |  12 +-
 fs/lustre/ptlrpc/Makefile                |   3 +-
 fs/lustre/ptlrpc/client.c                |  10 +-
 fs/lustre/ptlrpc/heap.c                  | 502 ++++++++++++++++++
 fs/lustre/ptlrpc/heap.h                  | 189 +++++++
 fs/lustre/ptlrpc/import.c                |  14 +-
 fs/lustre/ptlrpc/llog_client.c           |   2 +-
 fs/lustre/ptlrpc/nrs.c                   |   4 +
 fs/lustre/ptlrpc/nrs_delay.c             | 851 +++++++++++++++++++++++++++++++
 fs/lustre/ptlrpc/pinger.c                |  21 +-
 fs/lustre/ptlrpc/ptlrpc_internal.h       |   7 +-
 fs/lustre/ptlrpc/ptlrpcd.c               |   4 +-
 fs/lustre/ptlrpc/recover.c               |  14 +-
 fs/lustre/ptlrpc/sec_config.c            |   8 +-
 fs/lustre/ptlrpc/service.c               |  26 +-
 include/linux/libcfs/libcfs.h            |   3 +
 include/linux/libcfs/libcfs_debug.h      |   2 -
 include/linux/libcfs/libcfs_private.h    |   7 -
 include/linux/lnet/lib-lnet.h            |  38 +-
 include/linux/lnet/lib-types.h           |  28 +-
 include/uapi/linux/lnet/nidstr.h         |  15 -
 include/uapi/linux/lustre/lustre_idl.h   |  50 +-
 include/uapi/linux/lustre/lustre_ioctl.h |   3 +-
 include/uapi/linux/lustre/lustre_ver.h   |   4 +-
 net/lnet/Kconfig                         |   9 +
 net/lnet/klnds/o2iblnd/o2iblnd.c         |   8 +-
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c      |  26 +-
 net/lnet/klnds/socklnd/socklnd.c         |  13 +-
 net/lnet/klnds/socklnd/socklnd_cb.c      |  22 +-
 net/lnet/libcfs/debug.c                  |  37 +-
 net/lnet/libcfs/libcfs_mem.c             |  52 --
 net/lnet/libcfs/module.c                 | 133 ++++-
 net/lnet/libcfs/tracefile.c              | 220 ++++----
 net/lnet/libcfs/tracefile.h              |   2 -
 net/lnet/lnet/api-ni.c                   |  12 +
 net/lnet/lnet/lib-move.c                 | 158 +++---
 net/lnet/lnet/lib-msg.c                  |  31 +-
 net/lnet/lnet/lib-ptl.c                  |  19 +-
 net/lnet/lnet/nidstrings.c               |   1 +
 net/lnet/lnet/peer.c                     | 303 +++++++----
 net/lnet/lnet/router.c                   |  39 +-
 net/lnet/lnet/router_proc.c              |  26 +-
 80 files changed, 3194 insertions(+), 1005 deletions(-)
 create mode 100644 fs/lustre/include/lustre_nrs_delay.h
 create mode 100644 fs/lustre/obdclass/lu_tgt_pool.c
 create mode 100644 fs/lustre/ptlrpc/heap.c
 create mode 100644 fs/lustre/ptlrpc/heap.h
 create mode 100644 fs/lustre/ptlrpc/nrs_delay.c

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 01/49] lnet: libcfs: Fix for unconfigured arch_stackwalk
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
@ 2021-04-15  4:01 ` James Simmons
  2021-04-15  4:01 ` [lustre-devel] [PATCH 02/49] lustre: lmv: iput() can safely be passed NULL James Simmons
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:01 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Shaun Tancheff, Lustre Development List

From: Shaun Tancheff <shaun.tancheff@hpe.com>

On some aarch64 kernels CONFIG_ARCH_STACKWALK is not defined and the
newer stack_trace_save_tsk() API is not available, so fall back to the
older save_stack_trace_tsk() interface in that case.
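
For reference, a condensed sketch of what libcfs_call_trace() looks
like with this change, with the locking and banner printing stripped
out (illustrative only, not part of the patch; the sketch_ name is
made up for this example):

#include <linux/sched.h>
#include <linux/stacktrace.h>

#define MAX_ST_ENTRIES	100

static void sketch_call_trace(struct task_struct *tsk)
{
	static unsigned long entries[MAX_ST_ENTRIES];
#ifdef CONFIG_ARCH_STACKWALK
	/* newer interface: the arch implements arch_stack_walk() */
	unsigned int nr_entries;

	nr_entries = stack_trace_save_tsk(tsk, entries, MAX_ST_ENTRIES, 0);
	stack_trace_print(entries, nr_entries, 0);
#else
	/* older interface: fill a struct stack_trace by hand */
	struct stack_trace trace = {
		.entries	= entries,
		.max_entries	= MAX_ST_ENTRIES,
	};

	save_stack_trace_tsk(tsk, &trace);
	stack_trace_print(entries, trace.nr_entries, 0);
#endif
}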

HPE-bug-id: LUS-9518
WC-bug-id: https://jira.whamcloud.com/browse/LU-14099
Lustre-commit: 58ac9d3f1844701 ("LU-14099 build: Fix for unconfigured arch_stackwalk")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/40503
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/libcfs/debug.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/net/lnet/libcfs/debug.c b/net/lnet/libcfs/debug.c
index ba32a99..e68dd91 100644
--- a/net/lnet/libcfs/debug.c
+++ b/net/lnet/libcfs/debug.c
@@ -451,32 +451,40 @@ void __noreturn lbug_with_loc(struct libcfs_debug_msg_data *msgdata)
 EXPORT_SYMBOL(lbug_with_loc);
 
 #ifdef CONFIG_STACKTRACE
+
 #define MAX_ST_ENTRIES 100
 static DEFINE_SPINLOCK(st_lock);
 
 static void libcfs_call_trace(struct task_struct *tsk)
 {
 	static unsigned long entries[MAX_ST_ENTRIES];
+#ifdef CONFIG_ARCH_STACKWALK
 	unsigned int nr_entries;
 
+	spin_lock(&st_lock);
 	pr_info("Pid: %d, comm: %.20s %s %s\n", tsk->pid, tsk->comm,
 		init_utsname()->release, init_utsname()->version);
 	pr_info("Call Trace:\n");
-
-	spin_lock(&st_lock);
 	nr_entries = stack_trace_save_tsk(tsk, entries,
 					  MAX_ST_ENTRIES, 0);
-
 	stack_trace_print(entries, nr_entries, 0);
 	spin_unlock(&st_lock);
-}
 #else /* !CONFIG_STACKTRACE */
-static void libcfs_call_trace(struct task_struct *tsk)
-{
-	if (tsk == current)
-		dump_stack();
-	else
-		CWARN("can't show stack: kernel doesn't export show_task\n");
+	struct stack_trace trace;
+
+	trace.nr_entries = 0;
+	trace.max_entries = MAX_ST_ENTRIES;
+	trace.entries = entries;
+	trace.skip = 0;
+
+	spin_lock(&st_lock);
+	pr_info("Pid: %d, comm: %.20s %s %s\n", tsk->pid, tsk->comm,
+		init_utsname()->release, init_utsname()->version);
+	pr_info("Call Trace:\n");
+	save_stack_trace_tsk(tsk, &trace);
+	stack_trace_print(entries, trace.nr_entries, 0);
+	spin_unlock(&st_lock);
+#endif
 }
 #endif /* !CONFIG_STACKTRACE */
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/49] lustre: lmv: iput() can safely be passed NULL.
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
  2021-04-15  4:01 ` [lustre-devel] [PATCH 01/49] lnet: libcfs: Fix for unconfigured arch_stackwalk James Simmons
@ 2021-04-15  4:01 ` James Simmons
  2021-04-15  4:01 ` [lustre-devel] [PATCH 03/49] lustre: llite: mark extended attr and inode flags James Simmons
                   ` (46 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:01 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

iput() is a no-op when passed a NULL pointer, so there is no
need to test for NULL before calling it - doing so clutters
the code.

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: 650a6dec18306e56 ("LU-6142 lustre: iput() can safely be passed NULL.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/40291
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lmv/lmv_obd.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 747786e..9c0a0cf 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -3141,10 +3141,8 @@ static int lmv_unpackmd(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 		}
 
 		if (lmv_dir_striped(lsm)) {
-			for (i = 0; i < lsm->lsm_md_stripe_count; i++) {
-				if (lsm->lsm_md_oinfo[i].lmo_root)
-					iput(lsm->lsm_md_oinfo[i].lmo_root);
-			}
+			for (i = 0; i < lsm->lsm_md_stripe_count; i++)
+				iput(lsm->lsm_md_oinfo[i].lmo_root);
 			lsm_size = lmv_stripe_md_size(lsm->lsm_md_stripe_count);
 		} else {
 			lsm_size = lmv_stripe_md_size(0);
-- 
1.8.3.1


* [lustre-devel] [PATCH 03/49] lustre: llite: mark extended attr and inode flags
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
  2021-04-15  4:01 ` [lustre-devel] [PATCH 01/49] lnet: libcfs: Fix for unconfigured arch_stackwalk James Simmons
  2021-04-15  4:01 ` [lustre-devel] [PATCH 02/49] lustre: lmv: iput() can safely be passed NULL James Simmons
@ 2021-04-15  4:01 ` James Simmons
  2021-04-15  4:01 ` [lustre-devel] [PATCH 04/49] lnet: lnet_notify sets route aliveness incorrectly James Simmons
                   ` (45 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:01 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Clearly name the extended attribute and inode flags so that it is
possible to distinguish them more easily.
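
Three flag name spaces are in play here, which is why distinct
parameter names help: FS_XFLAG_* from the FS_IOC_FSSETXATTR ioctl,
S_* flags on the VFS inode, and the LUSTRE_*_FL flags carried in the
protocol. As a hypothetical helper (not part of the patch), the
conversion chain used by ll_ioctl_fssetxattr() is:

static int sketch_xflags_to_ext_flags(int xflags)
{
	/* FS_XFLAG_* -> S_* inode flags -> LUSTRE_*_FL */
	int inode_flags = ll_xflags_to_inode_flags(xflags);

	return ll_inode_to_ext_flags(inode_flags);
}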

WC-bug-id: https://jira.whamcloud.com/browse/LU-12885
Lustre-commit: b51a1e1140cac80c ("LU-12885 llite: mark extended attr and inode flags")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36519
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_support.h        | 28 ++++++++++++++--------------
 fs/lustre/llite/file.c                 | 25 ++++++++++---------------
 fs/lustre/llite/llite_internal.h       | 12 ++++++------
 fs/lustre/llite/llite_lib.c            |  2 +-
 include/uapi/linux/lustre/lustre_idl.h |  4 +++-
 5 files changed, 34 insertions(+), 37 deletions(-)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index c678c8b..152f95c 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -554,28 +554,28 @@
  * versions. These flags are set/cleared via FSFILT_IOC_{GET,SET}_FLAGS.
  * See b=16526 for a full history.
  */
-static inline int ll_ext_to_inode_flags(int flags)
+static inline int ll_ext_to_inode_flags(int ext_flags)
 {
-	return (((flags & LUSTRE_SYNC_FL)      ? S_SYNC      : 0) |
-		((flags & LUSTRE_NOATIME_FL)   ? S_NOATIME   : 0) |
-		((flags & LUSTRE_APPEND_FL)    ? S_APPEND    : 0) |
-		((flags & LUSTRE_DIRSYNC_FL)   ? S_DIRSYNC   : 0) |
+	return (((ext_flags & LUSTRE_SYNC_FL)      ? S_SYNC      : 0) |
+		((ext_flags & LUSTRE_NOATIME_FL)   ? S_NOATIME   : 0) |
+		((ext_flags & LUSTRE_APPEND_FL)    ? S_APPEND    : 0) |
+		((ext_flags & LUSTRE_DIRSYNC_FL)   ? S_DIRSYNC   : 0) |
 #if defined(S_ENCRYPTED)
-		((flags & LUSTRE_ENCRYPT_FL)   ? S_ENCRYPTED : 0) |
+		((ext_flags & LUSTRE_ENCRYPT_FL)   ? S_ENCRYPTED : 0) |
 #endif
-		((flags & LUSTRE_IMMUTABLE_FL) ? S_IMMUTABLE : 0));
+		((ext_flags & LUSTRE_IMMUTABLE_FL) ? S_IMMUTABLE : 0));
 }
 
-static inline int ll_inode_to_ext_flags(int iflags)
+static inline int ll_inode_to_ext_flags(int inode_flags)
 {
-	return (((iflags & S_SYNC)      ? LUSTRE_SYNC_FL      : 0) |
-		((iflags & S_NOATIME)   ? LUSTRE_NOATIME_FL   : 0) |
-		((iflags & S_APPEND)    ? LUSTRE_APPEND_FL    : 0) |
-		((iflags & S_DIRSYNC)   ? LUSTRE_DIRSYNC_FL   : 0) |
+	return (((inode_flags & S_SYNC)      ? LUSTRE_SYNC_FL      : 0) |
+		((inode_flags & S_NOATIME)   ? LUSTRE_NOATIME_FL   : 0) |
+		((inode_flags & S_APPEND)    ? LUSTRE_APPEND_FL    : 0) |
+		((inode_flags & S_DIRSYNC)   ? LUSTRE_DIRSYNC_FL   : 0) |
 #if defined(S_ENCRYPTED)
-		((iflags & S_ENCRYPTED) ? LUSTRE_ENCRYPT_FL   : 0) |
+		((inode_flags & S_ENCRYPTED) ? LUSTRE_ENCRYPT_FL   : 0) |
 #endif
-		((iflags & S_IMMUTABLE) ? LUSTRE_IMMUTABLE_FL : 0));
+		((inode_flags & S_IMMUTABLE) ? LUSTRE_IMMUTABLE_FL : 0));
 }
 
 struct obd_heat_instance {
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index c6d53b1..0d866ec 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -3182,9 +3182,8 @@ int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd,
 	struct md_op_data *op_data;
 	struct fsxattr fsxattr;
 	struct cl_object *obj;
-	struct iattr *attr;
+	unsigned int inode_flags;
 	int rc = 0;
-	int flags;
 
 	if (copy_from_user(&fsxattr,
 			   (const struct fsxattr __user *)arg,
@@ -3200,8 +3199,8 @@ int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd,
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
-	flags = ll_xflags_to_inode_flags(fsxattr.fsx_xflags);
-	op_data->op_attr_flags = ll_inode_to_ext_flags(flags);
+	inode_flags = ll_xflags_to_inode_flags(fsxattr.fsx_xflags);
+	op_data->op_attr_flags = ll_inode_to_ext_flags(inode_flags);
 	if (fsxattr.fsx_xflags & FS_XFLAG_PROJINHERIT)
 		op_data->op_attr_flags |= LUSTRE_PROJINHERIT_FL;
 	op_data->op_projid = fsxattr.fsx_projid;
@@ -3213,24 +3212,20 @@ int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd,
 		goto out_fsxattr;
 
 	ll_update_inode_flags(inode, op_data->op_attr_flags);
-	obj = ll_i2info(inode)->lli_clob;
-	if (!obj)
-		goto out_fsxattr;
 
-	/* Avoiding OST RPC if this is only project ioctl */
+	/* Avoid OST RPC if this is only ioctl setting project inherit flag */
 	if (fsxattr.fsx_xflags == 0 ||
 	    fsxattr.fsx_xflags == FS_XFLAG_PROJINHERIT)
 		goto out_fsxattr;
 
-	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
-	if (!attr) {
-		rc = -ENOMEM;
-		goto out_fsxattr;
+	obj = ll_i2info(inode)->lli_clob;
+	if (obj) {
+		struct iattr attr = { 0 };
+
+		rc = cl_setattr_ost(obj, &attr, OP_XVALID_FLAGS,
+				    fsxattr.fsx_xflags);
 	}
 
-	rc = cl_setattr_ost(obj, attr, OP_XVALID_FLAGS,
-			    fsxattr.fsx_xflags);
-	kfree(attr);
 out_fsxattr:
 	ll_finish_md_op_data(op_data);
 
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 677106d..041f6d3 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1081,12 +1081,12 @@ static inline int ll_xflags_to_inode_flags(int xflags)
 	       ((xflags & FS_XFLAG_IMMUTABLE) ? S_IMMUTABLE : 0);
 }
 
-static inline int ll_inode_flags_to_xflags(int flags)
+static inline int ll_inode_flags_to_xflags(int inode_flags)
 {
-	return ((flags & S_SYNC)      ? FS_XFLAG_SYNC      : 0) |
-	       ((flags & S_NOATIME)   ? FS_XFLAG_NOATIME   : 0) |
-	       ((flags & S_APPEND)    ? FS_XFLAG_APPEND    : 0) |
-	       ((flags & S_IMMUTABLE) ? FS_XFLAG_IMMUTABLE : 0);
+	return ((inode_flags & S_SYNC)      ? FS_XFLAG_SYNC      : 0) |
+	       ((inode_flags & S_NOATIME)   ? FS_XFLAG_NOATIME   : 0) |
+	       ((inode_flags & S_APPEND)    ? FS_XFLAG_APPEND    : 0) |
+	       ((inode_flags & S_IMMUTABLE) ? FS_XFLAG_IMMUTABLE : 0);
 }
 
 int ll_migrate(struct inode *parent, struct file *file,
@@ -1148,7 +1148,7 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr,
 int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs,
 		       u32 flags);
 int ll_update_inode(struct inode *inode, struct lustre_md *md);
-void ll_update_inode_flags(struct inode *inode, int ext_flags);
+void ll_update_inode_flags(struct inode *inode, unsigned int ext_flags);
 int ll_read_inode2(struct inode *inode, void *opaque);
 void ll_delete_inode(struct inode *inode);
 int ll_iocontrol(struct inode *inode, struct file *file,
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 3139669..ca6e736 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -2250,7 +2250,7 @@ void ll_inode_size_unlock(struct inode *inode)
 	mutex_unlock(&lli->lli_size_mutex);
 }
 
-void ll_update_inode_flags(struct inode *inode, int ext_flags)
+void ll_update_inode_flags(struct inode *inode, unsigned int ext_flags)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
 
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index 449ac47..3a33657 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -1614,7 +1614,9 @@ struct mdt_body {
 	__u32	mbo_mode;
 	__u32	mbo_uid;
 	__u32	mbo_gid;
-	__u32	mbo_flags;	/* LUSTRE_*_FL file attributes */
+	__u32	mbo_flags;	/* most replies: LUSTRE_*_FL file attributes,
+				 * data_version: OBD_FL_* flags
+				 */
 	__u32	mbo_rdev;
 	__u32	mbo_nlink;	/* #bytes to read in the case of MDS_READPAGE */
 	__u32	mbo_layout_gen;	/* was "generation" until 2.4.0 */
-- 
1.8.3.1


* [lustre-devel] [PATCH 04/49] lnet: lnet_notify sets route aliveness incorrectly
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (2 preceding siblings ...)
  2021-04-15  4:01 ` [lustre-devel] [PATCH 03/49] lustre: llite: mark extended attr and inode flags James Simmons
@ 2021-04-15  4:01 ` James Simmons
  2021-04-15  4:01 ` [lustre-devel] [PATCH 05/49] lnet: Prevent discovery on peer marked deletion James Simmons
                   ` (44 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:01 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

lnet_notify() modifies route aliveness in two ways:
1. By setting the lp_alive field of the lnet_peer struct.
2. By setting the lr_alive field of the lnet_route struct (via a call
   to lnet_set_route_aliveness()).

In both cases, the aliveness value assigned is determined by a call
to lnet_is_peer_ni_alive(), but that value only reflects the aliveness
of a particular peer NI. A gateway may have multiple peer NIs, so the
aliveness of a gateway peer (lp_alive) is not necessarily equivalent
to the aliveness of one of its NIs. Furthermore, the lr_alive field
is only used to determine route aliveness for path selection if
discovery is disabled locally or on the gateway (see
lnet_find_route_locked() and lnet_is_route_alive()).

In general, we should not set lp_alive based on an lnet_notify()
call, and we should only set lr_alive if discovery is disabled. For
lr_alive specifically, we should only set it for those routes that
have the peer NI as a next-hop.

An exception to the above exists when the reset argument to
lnet_notify() is set. The gnilnd uses this flag in its calls to
lnet_notify() because gnilnd receives out-of-band notifications of
node up and down events. Thus, when gnilnd calls lnet_notify() we
actually know whether the gateway peer is up or down and we can set
lp_alive appropriately.

The net_lock/EX is held by the other callers of
lnet_set_route_aliveness(), so we do the same in lnet_notify().
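
Put as a single predicate, lr_alive is only updated here when all of
the following hold (illustrative sketch of the conditions; this helper
does not exist in the tree):

static bool sketch_route_needs_update(struct lnet_peer *lp,
				      struct lnet_peer_ni *lpni,
				      struct lnet_route *route,
				      bool alive)
{
	return lp->lp_rtr_refcount &&		/* peer is a gateway */
	       lnet_is_discovery_disabled(lp) &&	/* lr_alive only used then */
	       route->lr_nid == lpni->lpni_nid &&	/* lpni is the next-hop */
	       route->lr_alive != alive;	/* aliveness actually changed */
}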

Fixes: 938a22eaf2 ("lnet: discovery off route state update")
Fixes: 1cadb960f1 ("lnet: Add peer level aliveness information")
HPE-bug-id: LUS-9034
WC-bug-id: https://jira.whamcloud.com/browse/LU-13708
Lustre-commit: e24471a722a6f23f ("LU-13708 lnet: lnet_notify sets route aliveness incorrectly")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39160
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index e030b16..ee3c15f 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -391,6 +391,7 @@ bool lnet_is_route_alive(struct lnet_route *route)
 	lnet_check_route_inconsistency(route);
 }
 
+/* Must hold net_lock/EX */
 static inline void
 lnet_set_route_aliveness(struct lnet_route *route, bool alive)
 {
@@ -405,6 +406,7 @@ bool lnet_is_route_alive(struct lnet_route *route)
 	}
 }
 
+/* Must hold net_lock/EX */
 void
 lnet_router_discovery_ping_reply(struct lnet_peer *lp)
 {
@@ -1745,6 +1747,37 @@ bool lnet_router_checker_active(void)
 
 	/* recalculate aliveness */
 	alive = lnet_is_peer_ni_alive(lpni);
+
+	lp = lpni->lpni_peer_net->lpn_peer;
+	/* If this is an LNet router then update route aliveness */
+	if (lp->lp_rtr_refcount) {
+		if (reset)
+			/* reset flag indicates gateway peer went up or down */
+			lp->lp_alive = alive;
+
+		/* If discovery is disabled, locally or on the gateway, then
+		 * any routes using lpni as next-hop need to be updated
+		 *
+		 * NB: We can get many notifications while a route is down, so
+		 * we try and avoid the expensive net_lock/EX here for the
+		 * common case of receiving duplicate lnet_notify() calls (i.e.
+		 * only grab EX lock when we actually need to update the route
+		 * aliveness).
+		 */
+		if (lnet_is_discovery_disabled(lp)) {
+			list_for_each_entry(route, &lp->lp_routes, lr_gwlist) {
+				if (route->lr_nid == lpni->lpni_nid &&
+				    route->lr_alive != alive) {
+					lnet_net_unlock(0);
+					lnet_net_lock(LNET_LOCK_EX);
+					lnet_set_route_aliveness(route, alive);
+					lnet_net_unlock(LNET_LOCK_EX);
+					lnet_net_lock(0);
+				}
+			}
+		}
+	}
+
 	lnet_net_unlock(0);
 
 	if (ni && !alive)
@@ -1753,12 +1786,6 @@ bool lnet_router_checker_active(void)
 	cpt = lpni->lpni_cpt;
 	lnet_net_lock(cpt);
 	lnet_peer_ni_decref_locked(lpni);
-	if (lpni && lpni->lpni_peer_net && lpni->lpni_peer_net->lpn_peer) {
-		lp = lpni->lpni_peer_net->lpn_peer;
-		lp->lp_alive = alive;
-		list_for_each_entry(route, &lp->lp_routes, lr_gwlist)
-			lnet_set_route_aliveness(route, alive);
-	}
 	lnet_net_unlock(cpt);
 
 	return 0;
-- 
1.8.3.1


* [lustre-devel] [PATCH 05/49] lnet: Prevent discovery on peer marked deletion
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (3 preceding siblings ...)
  2021-04-15  4:01 ` [lustre-devel] [PATCH 04/49] lnet: lnet_notify sets route aliveness incorrectly James Simmons
@ 2021-04-15  4:01 ` James Simmons
  2021-04-15  4:01 ` [lustre-devel] [PATCH 06/49] lnet: Prevent discovery on deleted peer James Simmons
                   ` (43 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:01 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

If a peer has been marked for deletion then we needn't perform any
other discovery operation on it. Integrate this peer state into the
top level of the discovery state machine so that it is checked before
any other state.
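
The top of the discovery state machine in lnet_peer_discovery() then
dispatches roughly as follows, with the deletion check ahead of every
other state (sketch of the ordering only; see the second hunk of the
diff below for the real code):

	if (lp->lp_state & LNET_PEER_MARK_DELETION)
		rc = lnet_peer_deletion(lp);	/* checked before anything else */
	else if (lp->lp_state & LNET_PEER_DATA_PRESENT)
		rc = lnet_peer_data_present(lp);
	else if (lp->lp_state & LNET_PEER_PING_FAILED)
		rc = lnet_peer_ping_failed(lp);
	/* ... remaining discovery states are handled as before ... */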

HPE-bug-id: LUS-9192
WC-bug-id: https://jira.whamcloud.com/browse/LU-13895
Lustre-commit: aa7de0af6969df77 ("LU-13895 lnet: Prevent discovery on peer marked deletion")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39604
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 109 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 65 insertions(+), 44 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 8ee5ec3..48f78ef 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2934,6 +2934,68 @@ static bool lnet_is_nid_in_ping_info(lnet_nid_t nid,
 	return false;
 }
 
+/* Delete a peer that has been marked for deletion. NB: when this peer was added
+ * to the discovery queue a reference was taken that will prevent the peer from
+ * actually being freed by this function. After this function exits the
+ * discovery thread should call lnet_peer_discovery_complete() which will
+ * drop that reference as well as wake any waiters that may also be holding a
+ * ref on the peer
+ */
+static int lnet_peer_deletion(struct lnet_peer *lp)
+__must_hold(&lp->lp_lock)
+{
+	struct list_head rlist;
+	struct lnet_route *route, *tmp;
+	int sensitivity = lp->lp_health_sensitivity;
+
+	INIT_LIST_HEAD(&rlist);
+
+	lp->lp_state &= ~(LNET_PEER_DISCOVERING | LNET_PEER_FORCE_PING |
+			  LNET_PEER_FORCE_PUSH);
+	CDEBUG(D_NET, "peer %s(%p) state %#x\n",
+	       libcfs_nid2str(lp->lp_primary_nid), lp, lp->lp_state);
+
+	if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING)
+		return -ESHUTDOWN;
+
+	spin_unlock(&lp->lp_lock);
+
+	mutex_lock(&the_lnet.ln_api_mutex);
+
+	lnet_net_lock(LNET_LOCK_EX);
+	/* remove the peer from the discovery work
+	 * queue if it's on there in preparation
+	 * of deleting it.
+	 */
+	if (!list_empty(&lp->lp_dc_list))
+		list_del(&lp->lp_dc_list);
+	list_for_each_entry_safe(route, tmp,
+				 &lp->lp_routes,
+				 lr_gwlist)
+		lnet_move_route(route, NULL, &rlist);
+	lnet_net_unlock(LNET_LOCK_EX);
+
+	/* lnet_peer_del() deletes all the peer NIs owned by this peer */
+	lnet_peer_del(lp);
+
+	list_for_each_entry_safe(route, tmp,
+				 &rlist, lr_list) {
+		/* re-add these routes */
+		lnet_add_route(route->lr_net,
+			       route->lr_hops,
+			       route->lr_nid,
+			       route->lr_priority,
+			       sensitivity);
+		kfree(route);
+	}
+
+	mutex_unlock(&the_lnet.ln_api_mutex);
+
+	spin_lock(&lp->lp_lock);
+
+	return 0;
+}
+
 /*
  * Update a peer using the data received.
  */
@@ -3504,7 +3566,9 @@ static int lnet_peer_discovery(void *arg)
 			CDEBUG(D_NET, "peer %s(%p) state %#x\n",
 			       libcfs_nid2str(lp->lp_primary_nid), lp,
 			       lp->lp_state);
-			if (lp->lp_state & LNET_PEER_DATA_PRESENT)
+			if (lp->lp_state & LNET_PEER_MARK_DELETION)
+				rc = lnet_peer_deletion(lp);
+			else if (lp->lp_state & LNET_PEER_DATA_PRESENT)
 				rc = lnet_peer_data_present(lp);
 			else if (lp->lp_state & LNET_PEER_PING_FAILED)
 				rc = lnet_peer_ping_failed(lp);
@@ -3536,49 +3600,6 @@ static int lnet_peer_discovery(void *arg)
 				lnet_peer_discovery_complete(lp);
 			if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING)
 				break;
-
-			if (lp->lp_state & LNET_PEER_MARK_DELETION) {
-				struct list_head rlist;
-				struct lnet_route *route, *tmp;
-				int sensitivity = lp->lp_health_sensitivity;
-
-				INIT_LIST_HEAD(&rlist);
-
-				/* remove the peer from the discovery work
-				 * queue if it's on there in preparation
-				 * of deleting it.
-				 */
-				if (!list_empty(&lp->lp_dc_list))
-					list_del(&lp->lp_dc_list);
-
-				lnet_net_unlock(LNET_LOCK_EX);
-
-				mutex_lock(&the_lnet.ln_api_mutex);
-
-				lnet_net_lock(LNET_LOCK_EX);
-				list_for_each_entry_safe(route, tmp,
-							 &lp->lp_routes,
-							 lr_gwlist)
-					lnet_move_route(route, NULL, &rlist);
-				lnet_net_unlock(LNET_LOCK_EX);
-
-				/* delete the peer */
-				lnet_peer_del(lp);
-
-				list_for_each_entry_safe(route, tmp,
-							 &rlist, lr_list) {
-					/* re-add these routes */
-					lnet_add_route(route->lr_net,
-						       route->lr_hops,
-						       route->lr_nid,
-						       route->lr_priority,
-						       sensitivity);
-					kfree(route);
-				}
-				mutex_unlock(&the_lnet.ln_api_mutex);
-
-				lnet_net_lock(LNET_LOCK_EX);
-			}
 		}
 
 		lnet_net_unlock(LNET_LOCK_EX);
-- 
1.8.3.1


* [lustre-devel] [PATCH 06/49] lnet: Prevent discovery on deleted peer
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (4 preceding siblings ...)
  2021-04-15  4:01 ` [lustre-devel] [PATCH 05/49] lnet: Prevent discovery on peer marked deletion James Simmons
@ 2021-04-15  4:01 ` James Simmons
  2021-04-15  4:01 ` [lustre-devel] [PATCH 07/49] lnet: Transfer disc src NID when merging peers James Simmons
                   ` (42 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:01 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

We needn't perform any discovery activities on a peer that has had
lnet_peer_del() called on it.

HPE-bug-id: LUS-9192
WC-bug-id: https://jira.whamcloud.com/browse/LU-13895
Lustre-commit: fd32cd817cba336c ("LU-13895 lnet: Prevent discovery on deleted peer")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39605
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h  |  2 ++
 include/linux/lnet/lib-types.h |  4 ++-
 net/lnet/lnet/peer.c           | 73 ++++++++++++++++++++++++------------------
 3 files changed, 46 insertions(+), 33 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 1efac9b..2741c6f 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -893,6 +893,8 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid,
 {
 	if (!(lp->lp_state & LNET_PEER_MULTI_RAIL))
 		return false;
+	if (lp->lp_state & LNET_PEER_MARK_DELETED)
+		return false;
 	if (lp->lp_state & LNET_PEER_FORCE_PUSH)
 		return true;
 	if (lp->lp_state & LNET_PEER_NO_DISCOVERY)
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index f1f4eac5..2424993 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -750,7 +750,9 @@ struct lnet_peer {
 #define LNET_PEER_RTR_DISCOVERED BIT(17)
 
 /* peer is marked for deletion */
-#define LNET_PEER_MARK_DELETION BIT(18)
+#define LNET_PEER_MARK_DELETION		BIT(18)
+/* lnet_peer_del()/lnet_peer_del_locked() has been called on the peer */
+#define LNET_PEER_MARK_DELETED		BIT(19)
 
 struct lnet_peer_net {
 	/* chain on lp_peer_nets */
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 48f78ef..34153a8 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -450,6 +450,10 @@ void lnet_peer_uninit(void)
 
 	CDEBUG(D_NET, "peer %s\n", libcfs_nid2str(peer->lp_primary_nid));
 
+	spin_lock(&peer->lp_lock);
+	peer->lp_state |= LNET_PEER_MARK_DELETED;
+	spin_unlock(&peer->lp_lock);
+
 	lpni = lnet_get_next_peer_ni_locked(peer, NULL, lpni);
 	while (lpni) {
 		lpni2 = lnet_get_next_peer_ni_locked(peer, NULL, lpni);
@@ -462,9 +466,40 @@ void lnet_peer_uninit(void)
 	return rc2;
 }
 
+/* Discovering this peer is taking too long. Cancel any Ping or Push
+ * that discovery is waiting on by unlinking the relevant MDs. The
+ * lnet_discovery_event_handler() will proceed from here and complete
+ * the cleanup.
+ */
+static void lnet_peer_cancel_discovery(struct lnet_peer *lp)
+{
+	struct lnet_handle_md ping_mdh;
+	struct lnet_handle_md push_mdh;
+
+	LNetInvalidateMDHandle(&ping_mdh);
+	LNetInvalidateMDHandle(&push_mdh);
+
+	spin_lock(&lp->lp_lock);
+	if (lp->lp_state & LNET_PEER_PING_SENT) {
+		ping_mdh = lp->lp_ping_mdh;
+		LNetInvalidateMDHandle(&lp->lp_ping_mdh);
+	}
+	if (lp->lp_state & LNET_PEER_PUSH_SENT) {
+		push_mdh = lp->lp_push_mdh;
+		LNetInvalidateMDHandle(&lp->lp_push_mdh);
+	}
+	spin_unlock(&lp->lp_lock);
+
+	if (!LNetMDHandleIsInvalid(ping_mdh))
+		LNetMDUnlink(ping_mdh);
+	if (!LNetMDHandleIsInvalid(push_mdh))
+		LNetMDUnlink(push_mdh);
+}
+
 static int
 lnet_peer_del(struct lnet_peer *peer)
 {
+	lnet_peer_cancel_discovery(peer);
 	lnet_net_lock(LNET_LOCK_EX);
 	lnet_peer_del_locked(peer);
 	lnet_net_unlock(LNET_LOCK_EX);
@@ -2955,6 +2990,10 @@ static int lnet_peer_deletion(struct lnet_peer *lp)
 	CDEBUG(D_NET, "peer %s(%p) state %#x\n",
 	       libcfs_nid2str(lp->lp_primary_nid), lp, lp->lp_state);
 
+	/* no-op if lnet_peer_del() has already been called on this peer */
+	if (lp->lp_state & LNET_PEER_MARK_DELETED)
+		return 0;
+
 	if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING)
 		return -ESHUTDOWN;
 
@@ -3382,37 +3421,6 @@ static void lnet_peer_discovery_error(struct lnet_peer *lp, int error)
 }
 
 /*
- * Discovering this peer is taking too long. Cancel any Ping or Push
- * that discovery is waiting on by unlinking the relevant MDs. The
- * lnet_discovery_event_handler() will proceed from here and complete
- * the cleanup.
- */
-static void lnet_peer_cancel_discovery(struct lnet_peer *lp)
-{
-	struct lnet_handle_md ping_mdh;
-	struct lnet_handle_md push_mdh;
-
-	LNetInvalidateMDHandle(&ping_mdh);
-	LNetInvalidateMDHandle(&push_mdh);
-
-	spin_lock(&lp->lp_lock);
-	if (lp->lp_state & LNET_PEER_PING_SENT) {
-		ping_mdh = lp->lp_ping_mdh;
-		LNetInvalidateMDHandle(&lp->lp_ping_mdh);
-	}
-	if (lp->lp_state & LNET_PEER_PUSH_SENT) {
-		push_mdh = lp->lp_push_mdh;
-		LNetInvalidateMDHandle(&lp->lp_push_mdh);
-	}
-	spin_unlock(&lp->lp_lock);
-
-	if (!LNetMDHandleIsInvalid(ping_mdh))
-		LNetMDUnlink(ping_mdh);
-	if (!LNetMDHandleIsInvalid(push_mdh))
-		LNetMDUnlink(push_mdh);
-}
-
-/*
  * Wait for work to be queued or some other change that must be
  * attended to. Returns non-zero if the discovery thread should shut
  * down.
@@ -3566,7 +3574,8 @@ static int lnet_peer_discovery(void *arg)
 			CDEBUG(D_NET, "peer %s(%p) state %#x\n",
 			       libcfs_nid2str(lp->lp_primary_nid), lp,
 			       lp->lp_state);
-			if (lp->lp_state & LNET_PEER_MARK_DELETION)
+			if (lp->lp_state & (LNET_PEER_MARK_DELETION |
+					    LNET_PEER_MARK_DELETED))
 				rc = lnet_peer_deletion(lp);
 			else if (lp->lp_state & LNET_PEER_DATA_PRESENT)
 				rc = lnet_peer_data_present(lp);
-- 
1.8.3.1


* [lustre-devel] [PATCH 07/49] lnet: Transfer disc src NID when merging peers
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (5 preceding siblings ...)
  2021-04-15  4:01 ` [lustre-devel] [PATCH 06/49] lnet: Prevent discovery on deleted peer James Simmons
@ 2021-04-15  4:01 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 08/49] lnet: Lookup lpni after discovery James Simmons
                   ` (41 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:01 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

If we're merging two peers in lnet_peer_data_present() then we need
to transfer the discovery source NID stored in the peer whose ping
buffer we are processing to the peer that actually owns the NIDs in
that ping buffer. Otherwise the subsequent push to the peer being
discovered may go out over an interface that the peer does not know
about, and the push will be dropped.

HPE-bug-id: LUS-9193
WC-bug-id: https://jira.whamcloud.com/browse/LU-13894
Lustre-commit: e65d8ba583858ae1 ("LU-13894 lnet: Transfer disc src NID when merging peers")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39607
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 34153a8..1b240f1 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -3116,7 +3116,7 @@ static int lnet_peer_data_present(struct lnet_peer *lp)
 		rc = lnet_peer_merge_data(lp, pbuf);
 	} else {
 		lpni = lnet_find_peer_ni_locked(nid);
-		if (!lpni) {
+		if (!lpni || lp == lpni->lpni_peer_net->lpn_peer) {
 			rc = lnet_peer_set_primary_nid(lp, nid, flags);
 			if (rc) {
 				CERROR("Primary NID error %s versus %s: %d\n",
@@ -3125,6 +3125,8 @@ static int lnet_peer_data_present(struct lnet_peer *lp)
 			} else {
 				rc = lnet_peer_merge_data(lp, pbuf);
 			}
+			if (lpni)
+				lnet_peer_ni_decref_locked(lpni);
 		} else {
 			struct lnet_peer *new_lp;
 
@@ -3133,10 +3135,22 @@ static int lnet_peer_data_present(struct lnet_peer *lp)
 			 * should have discovery/MR enabled as well, since
 			 * it's the same peer, which we're about to merge
 			 */
+			spin_lock(&lp->lp_lock);
+			spin_lock(&new_lp->lp_lock);
 			if (!(lp->lp_state & LNET_PEER_NO_DISCOVERY))
 				new_lp->lp_state &= ~LNET_PEER_NO_DISCOVERY;
 			if (lp->lp_state & LNET_PEER_MULTI_RAIL)
 				new_lp->lp_state |= LNET_PEER_MULTI_RAIL;
+			/* If we're processing a ping reply then we may be
+			 * about to send a push to the peer that we ping'd.
+			 * Since the ping reply that we're processing was
+			 * received by lp, we need to set the discovery source
+			 * NID for new_lp to the NID stored in lp.
+			 */
+			if (lp->lp_disc_src_nid != LNET_NID_ANY)
+				new_lp->lp_disc_src_nid = lp->lp_disc_src_nid;
+			spin_unlock(&new_lp->lp_lock);
+			spin_unlock(&lp->lp_lock);
 
 			rc = lnet_peer_set_primary_data(new_lp, pbuf);
 			lnet_consolidate_routes_locked(lp, new_lp);
-- 
1.8.3.1


* [lustre-devel] [PATCH 08/49] lnet: Lookup lpni after discovery
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (6 preceding siblings ...)
  2021-04-15  4:01 ` [lustre-devel] [PATCH 07/49] lnet: Transfer disc src NID when merging peers James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 09/49] lustre: llite: update and fix module loading bug in mounting code James Simmons
                   ` (40 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

The lpni for a nid can change as part of the discovery process (see
lnet_peer_add_nid()). As such, callers of lnet_discover_peer_locked()
need to lookup the lpni again after discovery completes to make sure
they get the correct peer.

An exception is lnet_check_routers() which doesn't do anything with
the peer or peer NI after the call to lnet_discover_peer_locked().
If the router list is changed then lnet_check_routers() will already
repeat discovery.
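
The resulting caller pattern, common to the call sites touched below,
is roughly as follows (illustrative fragment; the surrounding locking,
variable declarations and cleanup paths are trimmed):

	rc = lnet_discover_peer_locked(lpni, cpt, block);
	if (rc)
		return rc;	/* real callers unwind their refs here */

	/* discovery may have replaced the lpni for this NID; drop our
	 * ref and look it up again before touching the peer
	 */
	lnet_peer_ni_decref_locked(lpni);
	lpni = lnet_find_peer_ni_locked(nid);
	if (!lpni)
		return -ENOENT;
	lp = lpni->lpni_peer_net->lpn_peer;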

HPE-bug-id: LUS-9167
WC-bug-id: https://jira.whamcloud.com/browse/LU-13883
Lustre-commit: 584d9e46053234d0 ("LU-13883 lnet: Lookup lpni after discovery")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39747
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |  1 +
 net/lnet/lnet/api-ni.c        | 12 ++++++++++++
 net/lnet/lnet/lib-move.c      | 30 ++++++++++++++++++++++++------
 net/lnet/lnet/peer.c          | 30 ++++++++++++++++++++++++++++++
 4 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 2741c6f..1954614 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -817,6 +817,7 @@ struct lnet_peer_ni *lnet_peer_get_ni_locked(struct lnet_peer *lp,
 void lnet_peer_net_added(struct lnet_net *net);
 lnet_nid_t lnet_peer_primary_nid_locked(lnet_nid_t nid);
 int lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt, bool block);
+void lnet_peer_queue_message(struct lnet_peer *lp, struct lnet_msg *msg);
 int lnet_peer_discovery_start(void);
 void lnet_peer_discovery_stop(void);
 void lnet_push_update_to_peers(int force);
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 542cc2e..0c0b304 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -4540,6 +4540,18 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout,
 	if (rc)
 		goto out_decref;
 
+	/* The lpni (or lp) for this NID may have changed and our ref is
+	 * the only thing keeping the old one around. Release the ref
+	 * and lookup the lpni again
+	 */
+	lnet_peer_ni_decref_locked(lpni);
+	lpni = lnet_find_peer_ni_locked(id.nid);
+	if (!lpni) {
+		rc = -ENOENT;
+		goto out;
+	}
+	lp = lpni->lpni_peer_net->lpn_peer;
+
 	i = 0;
 	p = NULL;
 	while ((p = lnet_get_next_peer_ni_locked(lp, NULL, p)) != NULL) {
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index de17de4b..25e0fd2 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1834,6 +1834,7 @@ struct lnet_ni *
 			     int cpt)
 {
 	struct lnet_peer *peer;
+	struct lnet_peer_ni *new_lpni;
 	int rc;
 
 	lnet_peer_ni_addref_locked(lpni);
@@ -1855,21 +1856,38 @@ struct lnet_ni *
 		lnet_peer_ni_decref_locked(lpni);
 		return rc;
 	}
-	/* The peer may have changed. */
-	peer = lpni->lpni_peer_net->lpn_peer;
+
+	new_lpni = lnet_find_peer_ni_locked(lpni->lpni_nid);
+	if (!new_lpni) {
+		lnet_peer_ni_decref_locked(lpni);
+		return -ENOENT;
+	}
+
+	peer = new_lpni->lpni_peer_net->lpn_peer;
 	spin_lock(&peer->lp_lock);
-	if (lnet_peer_is_uptodate_locked(peer)) {
+	if (lpni == new_lpni && lnet_peer_is_uptodate_locked(peer)) {
+		/* The peer NI did not change and the peer is up to date.
+		 * Nothing more to do.
+		 */
 		spin_unlock(&peer->lp_lock);
 		lnet_peer_ni_decref_locked(lpni);
+		lnet_peer_ni_decref_locked(new_lpni);
 		return 0;
 	}
-	/* queue message and return */
+	spin_unlock(&peer->lp_lock);
+
+	/* Either the peer NI changed during discovery, or the peer isn't up
+	 * to date. In both cases we want to queue the message on the
+	 * (possibly new) peer's pending queue and queue the peer for discovery
+	 */
 	msg->msg_sending = 0;
 	msg->msg_txpeer = NULL;
-	list_add_tail(&msg->msg_list, &peer->lp_dc_pendq);
-	spin_unlock(&peer->lp_lock);
+	lnet_net_unlock(cpt);
+	lnet_peer_queue_message(peer, msg);
+	lnet_net_lock(cpt);
 
 	lnet_peer_ni_decref_locked(lpni);
+	lnet_peer_ni_decref_locked(new_lpni);
 
 	CDEBUG(D_NET, "msg %p delayed. %s pending discovery\n",
 	       msg, libcfs_nid2str(peer->lp_primary_nid));
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 1b240f1..ba41d86 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -1346,6 +1346,16 @@ struct lnet_peer_ni *
 		rc = lnet_discover_peer_locked(lpni, cpt, true);
 		if (rc)
 			goto out_decref;
+		/* The lpni (or lp) for this NID may have changed and our ref is
+		 * the only thing keeping the old one around. Release the ref
+		 * and lookup the lpni again
+		 */
+		lnet_peer_ni_decref_locked(lpni);
+		lpni = lnet_find_peer_ni_locked(nid);
+		if (!lpni) {
+			rc = -ENOENT;
+			goto out_unlock;
+		}
 		lp = lpni->lpni_peer_net->lpn_peer;
 
 		/* Only try once if discovery is disabled */
@@ -2054,6 +2064,26 @@ struct lnet_peer_ni *
 	return rc;
 }
 
+/* Add the message to the peer's lp_dc_pendq and queue the peer for discovery */
+void
+lnet_peer_queue_message(struct lnet_peer *lp, struct lnet_msg *msg)
+{
+	/* The discovery thread holds net_lock/EX and lp_lock when it splices
+	 * the lp_dc_pendq onto a local list for resending. Thus, we do the same
+	 * when adding to the list and queuing the peer to ensure that we do not
+	 * strand any messages on the lp_dc_pendq. This scheme ensures the
+	 * message will be resent even if the peer is already being discovered.
+	 * Therefore we needn't check the return value of
+	 * lnet_peer_queue_for_discovery(lp).
+	 */
+	lnet_net_lock(LNET_LOCK_EX);
+	spin_lock(&lp->lp_lock);
+	list_add_tail(&msg->msg_list, &lp->lp_dc_pendq);
+	spin_unlock(&lp->lp_lock);
+	lnet_peer_queue_for_discovery(lp);
+	lnet_net_unlock(LNET_LOCK_EX);
+}
+
 /*
  * Queue a peer for the attention of the discovery thread.  Call with
  * lnet_net_lock/EX held. Returns 0 if the peer was queued, and
-- 
1.8.3.1


* [lustre-devel] [PATCH 09/49] lustre: llite: update and fix module loading bug in mounting code
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (7 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 08/49] lnet: Lookup lpni after discovery James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 10/49] lnet: socklnd: change various ints to bool James Simmons
                   ` (39 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

If an MGS was not up and you attempted to mount a client, the mount
would fail as expected. The issue was that attempting to unload the
lustre module afterwards would oops. Remove the module_put() that was
causing this problem, and update the llite module code to sync it
with the OpenSFS tree.

Fixes: a989830c88 ("lustre: llite: move client mounting from obdclass to llite")
WC-bug-id: https://jira.whamcloud.com/browse/LU-12514
Lustre-commit: 53fa81765750e38f ("LU-12514 llite: move client mounting from obdclass to llite")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37693
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 fs/lustre/include/lustre_disk.h |  5 ++--
 fs/lustre/llite/llite_lib.c     |  5 ----
 fs/lustre/llite/super25.c       | 60 ++++++++++++++++++++++-------------------
 fs/lustre/obdclass/class_obd.c  | 15 +++++------
 4 files changed, 42 insertions(+), 43 deletions(-)

diff --git a/fs/lustre/include/lustre_disk.h b/fs/lustre/include/lustre_disk.h
index b6b693f..a54b5fd 100644
--- a/fs/lustre/include/lustre_disk.h
+++ b/fs/lustre/include/lustre_disk.h
@@ -150,12 +150,13 @@ struct lustre_sb_info {
 int lustre_start_mgc(struct super_block *sb);
 int lustre_common_put_super(struct super_block *sb);
 
-int mgc_fsname2resid(char *fsname, struct ldlm_res_id *res_id, int type);
-
 struct lustre_sb_info *lustre_init_lsi(struct super_block *sb);
 int lustre_put_lsi(struct super_block *sb);
 int lmd_parse(char *options, struct lustre_mount_data *lmd);
 
+/* mgc_request.c */
+int mgc_fsname2resid(char *fsname, struct ldlm_res_id *res_id, int type);
+
 /** @} disk */
 
 #endif /* _LUSTRE_DISK_H */
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index ca6e736..e7c1b73 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -1052,8 +1052,6 @@ int ll_fill_super(struct super_block *sb)
 
 	CDEBUG(D_VFSTRACE, "VFS Op: sb %p\n", sb);
 
-	try_module_get(THIS_MODULE);
-
 	cfg = kzalloc(sizeof(*cfg), GFP_NOFS);
 	if (!cfg) {
 		err = -ENOMEM;
@@ -1252,9 +1250,6 @@ void ll_put_super(struct super_block *sb)
 	ll_common_put_super(sb);
 
 	cl_env_cache_purge(~0);
-
-	module_put(THIS_MODULE);
-
 } /* client_put_super */
 
 struct inode *ll_inode_from_resource_lock(struct ldlm_lock *lock)
diff --git a/fs/lustre/llite/super25.c b/fs/lustre/llite/super25.c
index e3194a5..1b074a6 100644
--- a/fs/lustre/llite/super25.c
+++ b/fs/lustre/llite/super25.c
@@ -32,6 +32,7 @@
  */
 
 #define DEBUG_SUBSYSTEM S_LLITE
+
 #define D_MOUNT (D_SUPER | D_CONFIG/*|D_WARNING */)
 
 #include <linux/module.h>
@@ -95,11 +96,12 @@ static int ll_drop_inode(struct inode *inode)
 	.show_options		= ll_show_options,
 };
 
-/** This is the entry point for the mount call into Lustre.
+/**
+ * This is the entry point for the mount call into Lustre.
  * This is called when a server or client is mounted,
  * and this is where we start setting things up.
  *
- * @data:	Mount options (e.g. -o flock,abort_recov)
+ * @lmd2_data	Mount options (e.g. -o flock,abort_recov)
  */
 static int lustre_fill_super(struct super_block *sb, void *lmd2_data,
 			     int silent)
@@ -132,30 +134,30 @@ static int lustre_fill_super(struct super_block *sb, void *lmd2_data,
 		goto out_put_lsi;
 	}
 
-	if (lmd_is_client(lmd)) {
-		CDEBUG(D_MOUNT, "Mounting client %s\n", lmd->lmd_profile);
-
-		rc = ptlrpc_inc_ref();
-		if (rc)
-			goto out_put_lsi;
-		rc = lustre_start_mgc(sb);
-		if (rc) {
-			/* This will put_lsi and ptlrpc_dec_ref */
-			ll_common_put_super(sb);
-			goto out;
-		}
-		/* Connect and start */
-		rc = ll_fill_super(sb);
-		/*
-		 * c_f_s will call ll_common_put_super on failure, otherwise
-		 * c_f_s will have taken another reference to the module
-		 */
-	} else {
-		CERROR("This is client-side-only module, cannot handle server mount.\n");
-		rc = -EINVAL;
+	if (!lmd_is_client(lmd)) {
+		rc = -ENODEV;
+		CERROR("%s: This is client-side-only module, cannot handle server mount: rc = %d\n",
+		       lmd->lmd_profile, rc);
+		goto out_put_lsi;
 	}
 
-	/* If error happens in fill_super() call, @lsi will be killed there.
+	CDEBUG(D_MOUNT, "Mounting client %s\n", lmd->lmd_profile);
+	rc = ptlrpc_inc_ref();
+	if (rc)
+		goto out_put_lsi;
+
+	rc = lustre_start_mgc(sb);
+	if (rc) {
+		/* This will put_lsi and ptlrpc_dec_ref */
+		ll_common_put_super(sb);
+		goto out;
+	}
+	/* Connect and start */
+	rc = ll_fill_super(sb);
+	/* ll_fill_super will call lustre_common_put_super on failure,
+	 * which takes care of the module reference.
+	 *
+	 * If error happens in fill_super() call, @lsi will be killed there.
 	 * This is why we do not put it here.
 	 */
 	goto out;
@@ -163,10 +165,10 @@ static int lustre_fill_super(struct super_block *sb, void *lmd2_data,
 	lustre_put_lsi(sb);
 out:
 	if (rc) {
-		CERROR("Unable to mount %s (%d)\n",
-		       s2lsi(sb) ? lmd->lmd_dev : "", rc);
+		CERROR("llite: Unable to mount %s: rc = %d\n",
+		       s2lsi(sb) ? lmd->lmd_dev : "<unknown>", rc);
 	} else {
-		CDEBUG(D_SUPER, "Mount %s complete\n",
+		CDEBUG(D_SUPER, "%s: Mount complete\n",
 		       lmd->lmd_dev);
 	}
 	lockdep_on();
@@ -268,10 +270,12 @@ static int __init lustre_init(void)
 
 	rc = register_filesystem(&lustre_fs_type);
 	if (rc)
-		goto out_inode_fini_env;
+		goto out_xattr;
 
 	return 0;
 
+out_xattr:
+	ll_xattr_fini();
 out_inode_fini_env:
 	cl_env_put(cl_inode_fini_env, &cl_inode_fini_refcheck);
 out_vvp:
diff --git a/fs/lustre/obdclass/class_obd.c b/fs/lustre/obdclass/class_obd.c
index 76664bf..38b8967 100644
--- a/fs/lustre/obdclass/class_obd.c
+++ b/fs/lustre/obdclass/class_obd.c
@@ -719,12 +719,14 @@ static int __init obdclass_init(void)
 	/* simulate a late OOM situation now to require all
 	 * alloc'ed/initialized resources to be freed
 	 */
-	if (!OBD_FAIL_CHECK(OBD_FAIL_OBDCLASS_MODULE_LOAD))
-		return 0;
-
-	/* force error to ensure module will be unloaded/cleaned */
-	err = -ENOMEM;
+	if (OBD_FAIL_CHECK(OBD_FAIL_OBDCLASS_MODULE_LOAD)) {
+		/* force error to ensure module will be unloaded/cleaned */
+		err = -ENOMEM;
+		goto cleanup_all;
+	}
+	return 0;
 
+cleanup_all:
 	llog_info_fini();
 
 cleanup_cl_global:
@@ -748,9 +750,6 @@ static int __init obdclass_init(void)
 cleanup_zombie_impexp:
 	obd_zombie_impexp_stop();
 
-	if (err)
-		return err;
-
 	return err;
 }
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 10/49] lnet: socklnd: change various ints to bool.
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (8 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 09/49] lustre: llite: update and fix module loading bug in mounting code James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 11/49] lnet: Correct asymmetric route detection James Simmons
                   ` (38 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

These int variables, and one int function, are really truth
values, so change them to bool.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12678
Lustre-commit: a3275d1d79df5ab5 ("LU-12678 socklnd: change various ints to bool.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39302
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/socklnd/socklnd.c    | 13 ++++++-------
 net/lnet/klnds/socklnd/socklnd_cb.c | 22 +++++++++++-----------
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index a7c0b65..589a835 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -1533,7 +1533,7 @@ struct ksock_peer_ni *
 void
 ksocknal_peer_failed(struct ksock_peer_ni *peer_ni)
 {
-	int notify = 0;
+	bool notify = false;
 	time64_t last_alive = 0;
 
 	/*
@@ -1547,7 +1547,7 @@ struct ksock_peer_ni *
 	    list_empty(&peer_ni->ksnp_conns) &&
 	    !peer_ni->ksnp_accepting &&
 	    !ksocknal_find_connecting_route_locked(peer_ni)) {
-		notify = 1;
+		notify = true;
 		last_alive = peer_ni->ksnp_last_alive;
 	}
 
@@ -1598,15 +1598,14 @@ struct ksock_peer_ni *
 void
 ksocknal_terminate_conn(struct ksock_conn *conn)
 {
-	/*
-	 * This gets called by the reaper (guaranteed thread context) to
+	/* This gets called by the reaper (guaranteed thread context) to
 	 * disengage the socket from its callbacks and close it.
 	 * ksnc_refcount will eventually hit zero, and then the reaper will
 	 * destroy it.
 	 */
 	struct ksock_peer_ni *peer_ni = conn->ksnc_peer;
 	struct ksock_sched *sched = conn->ksnc_scheduler;
-	int failed = 0;
+	bool failed = false;
 
 	LASSERT(conn->ksnc_closing);
 
@@ -1643,7 +1642,7 @@ struct ksock_peer_ni *
 	if (peer_ni->ksnp_error) {
 		/* peer_ni's last conn closed in error */
 		LASSERT(list_empty(&peer_ni->ksnp_conns));
-		failed = 1;
+		failed = true;
 		peer_ni->ksnp_error = 0;     /* avoid multiple notifications */
 	}
 
@@ -2493,7 +2492,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id)
 	for (i = 0; i < net->ksnn_ninterfaces; i++) {
 		char *ifnam = &net->ksnn_interfaces[i].ksni_name[0];
 		char *colon = strchr(ifnam, ':');
-		int found  = 0;
+		bool found  = false;
 		struct ksock_net *tmp;
 		int j;
 
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index 7fa2d58..b1146dc 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -1337,7 +1337,7 @@ int ksocknal_scheduler(void *arg)
 	spin_lock_bh(&sched->kss_lock);
 
 	while (!ksocknal_data.ksnd_shuttingdown) {
-		int did_something = 0;
+		bool did_something = false;
 
 		/* Ensure I progress everything semi-fairly */
 
@@ -1387,7 +1387,7 @@ int ksocknal_scheduler(void *arg)
 				ksocknal_conn_decref(conn);
 			}
 
-			did_something = 1;
+			did_something = true;
 		}
 
 		if (!list_empty(&sched->kss_tx_conns)) {
@@ -1463,7 +1463,7 @@ int ksocknal_scheduler(void *arg)
 				ksocknal_conn_decref(conn);
 			}
 
-			did_something = 1;
+			did_something = true;
 		}
 		if (!did_something ||	/* nothing to do */
 		    need_resched()) {	/* hogging CPU? */
@@ -1767,7 +1767,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 	return 0;
 }
 
-static int
+static bool
 ksocknal_connect(struct ksock_route *route)
 {
 	LIST_HEAD(zombies);
@@ -1776,7 +1776,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 	int wanted;
 	struct socket *sock;
 	time64_t deadline;
-	int retry_later = 0;
+	bool retry_later = false;
 	int rc = 0;
 
 	deadline = ktime_get_seconds() + ksocknal_timeout();
@@ -1797,7 +1797,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 		 */
 		if (peer_ni->ksnp_closing || route->ksnr_deleted ||
 		    !wanted) {
-			retry_later = 0;
+			retry_later = false;
 			break;
 		}
 
@@ -1807,7 +1807,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 			       "peer_ni %s(%d) already connecting to me, retry later.\n",
 			       libcfs_nid2str(peer_ni->ksnp_id.nid),
 			       peer_ni->ksnp_accepting);
-			retry_later = 1;
+			retry_later = true;
 		}
 
 		if (retry_later) /* needs reschedule */
@@ -2087,7 +2087,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 		struct ksock_route *route = NULL;
 		time64_t sec = ktime_get_real_seconds();
 		long timeout = MAX_SCHEDULE_TIMEOUT;
-		int dropped_lock = 0;
+		bool dropped_lock = false;
 
 		if (ksocknal_connd_check_stop(sec, &timeout)) {
 			/* wakeup another one to check stop */
@@ -2097,7 +2097,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 
 		if (ksocknal_connd_check_start(sec, &timeout)) {
 			/* created new thread */
-			dropped_lock = 1;
+			dropped_lock = true;
 		}
 
 		cr = list_first_entry_or_null(&ksocknal_data.ksnd_connd_connreqs,
@@ -2107,7 +2107,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 
 			list_del(&cr->ksncr_list);
 			spin_unlock_bh(connd_lock);
-			dropped_lock = 1;
+			dropped_lock = true;
 
 			ksocknal_create_conn(cr->ksncr_ni, NULL,
 					     cr->ksncr_sock, SOCKLND_CONN_NONE);
@@ -2130,7 +2130,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 			list_del(&route->ksnr_connd_list);
 			ksocknal_data.ksnd_connd_connecting++;
 			spin_unlock_bh(connd_lock);
-			dropped_lock = 1;
+			dropped_lock = true;
 
 			if (ksocknal_connect(route)) {
 				/* consecutive retry */
-- 
1.8.3.1


* [lustre-devel] [PATCH 11/49] lnet: Correct asymmetric route detection
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (9 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 10/49] lnet: socklnd: change various ints to bool James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 12/49] lustre: fixup ldlm_pool and lu_object shrinker failure cases James Simmons
                   ` (37 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

Failure to lookup the remote net for LNET_NIDNET(src_nid) indicates an
asymmetric route, but we do not drop the message in this case. Another
problem with this code is that there is no guarantee that we'll have a
route->lr_lnet that matches the net of ni->ni_nid.

We can move the asymmetric route detection to after we have looked up
the lpni of from_nid. Then, we can look at just the routes associated
with the gateway that owns the lpni. If one of those routes has
lr_net == LNET_NIDNET(src_nid), then the route is symmetrical.

Fixes: ed7389fa9f ("lnet: check for asymmetrical route messages")
HPE-bug-id: LUS-9087
WC-bug-id: https://jira.whamcloud.com/browse/LU-13779
Lustre-commit: 955080c3ae3f33c ("LU-13779 lnet: Correct asymmetric route detection")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39349
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 80 ++++++++++++++++--------------------------------
 1 file changed, 27 insertions(+), 53 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 25e0fd2..1868506 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -4308,59 +4308,6 @@ void lnet_monitor_thr_stop(void)
 		goto drop;
 	}
 
-	if (lnet_drop_asym_route && for_me &&
-	    LNET_NIDNET(src_nid) != LNET_NIDNET(from_nid)) {
-		struct lnet_net *net;
-		struct lnet_remotenet *rnet;
-		bool found = true;
-
-		/* we are dealing with a routed message,
-		 * so see if route to reach src_nid goes through from_nid
-		 */
-		lnet_net_lock(cpt);
-		net = lnet_get_net_locked(LNET_NIDNET(ni->ni_nid));
-		if (!net) {
-			lnet_net_unlock(cpt);
-			CERROR("net %s not found\n",
-			       libcfs_net2str(LNET_NIDNET(ni->ni_nid)));
-			return -EPROTO;
-		}
-
-		rnet = lnet_find_rnet_locked(LNET_NIDNET(src_nid));
-		if (rnet) {
-			struct lnet_peer *gw = NULL;
-			struct lnet_peer_ni *lpni = NULL;
-			struct lnet_route *route;
-
-			list_for_each_entry(route, &rnet->lrn_routes, lr_list) {
-				found = false;
-				gw = route->lr_gateway;
-				if (route->lr_lnet != net->net_id)
-					continue;
-				/* if the nid is one of the gateway's NIDs
-				 * then this is a valid gateway
-				 */
-				while ((lpni = lnet_get_next_peer_ni_locked(gw, NULL, lpni)) != NULL) {
-					if (lpni->lpni_nid == from_nid) {
-						found = true;
-						break;
-					}
-				}
-			}
-		}
-		lnet_net_unlock(cpt);
-		if (!found) {
-			/* we would not use from_nid to route a message to
-			 * src_nid
-			 * => asymmetric routing detected but forbidden
-			 */
-			CERROR("%s, src %s: Dropping asymmetrical route %s\n",
-			       libcfs_nid2str(from_nid),
-			       libcfs_nid2str(src_nid), lnet_msgtyp2str(type));
-			goto drop;
-		}
-	}
-
 	msg = kmem_cache_zalloc(lnet_msg_cachep, GFP_NOFS);
 	if (!msg) {
 		CERROR("%s, src %s: Dropping %s (out of memory)\n",
@@ -4410,6 +4357,33 @@ void lnet_monitor_thr_stop(void)
 		goto drop;
 	}
 
+	if (lnet_drop_asym_route && for_me &&
+	    LNET_NIDNET(src_nid) != LNET_NIDNET(from_nid)) {
+		u32 src_net_id = LNET_NIDNET(src_nid);
+		struct lnet_peer *gw = lpni->lpni_peer_net->lpn_peer;
+		struct lnet_route *route;
+		bool found = false;
+
+		list_for_each_entry(route, &gw->lp_routes, lr_gwlist) {
+			if (route->lr_net == src_net_id) {
+				found = true;
+				break;
+			}
+		}
+		if (!found) {
+			lnet_net_unlock(cpt);
+			/* we would not use from_nid to route a message to
+			 * src_nid
+			 * => asymmetric routing detected but forbidden
+			 */
+			CERROR("%s, src %s: Dropping asymmetrical route %s\n",
+			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(src_nid), lnet_msgtyp2str(type));
+			kfree(msg);
+			goto drop;
+		}
+	}
+
 	if (the_lnet.ln_routing)
 		lpni->lpni_last_alive = ktime_get_seconds();
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 12/49] lustre: fixup ldlm_pool and lu_object shrinker failure cases
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (10 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 11/49] lnet: Correct asymmetric route detection James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 13/49] lustre: log: Add ending newline for some messages James Simmons
                   ` (36 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

For ldlm_pools, ldlm_pools_fini() can be called when ldlm_pools_init()
fails, or even in cases where it hasn't been called at all.  So add a
static flag to ensure that ldlm_pools_fini() doesn't undo things that
were never done.

For lu_global_init() we need to add proper cleanup if anything fails.
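
As a rough illustration of both points, here is a minimal standalone C
sketch.  The pools_init()/pools_fini() and fake_*() names are made-up
stand-ins for this note, not the real Lustre or kernel symbols: the
init path unwinds only the already-completed steps with gotos, and the
fini path is guarded by an init-done flag so it is safe even when init
failed or was never called.

#include <stdbool.h>
#include <stdio.h>

static bool pools_init_done;

static int fake_register_shrinker(void)    { return 0; }
static void fake_unregister_shrinker(void) { printf("unregister shrinker\n"); }
static int fake_start_worker(void)         { return 0; }
static void fake_stop_worker(void)         { printf("stop worker\n"); }

static int pools_init(void)
{
	int rc;

	rc = fake_register_shrinker();
	if (rc)
		goto out;		/* nothing to unwind yet */

	rc = fake_start_worker();
	if (rc)
		goto out_shrinker;	/* unwind only what was done */

	pools_init_done = true;
	return 0;

out_shrinker:
	fake_unregister_shrinker();
out:
	return rc;
}

static void pools_fini(void)
{
	/* safe even if pools_init() failed or was never called */
	if (!pools_init_done)
		return;
	fake_stop_worker();
	fake_unregister_shrinker();
}

int main(void)
{
	pools_fini();		/* no-op: init never ran */
	if (pools_init() == 0)
		pools_fini();	/* full teardown in reverse order */
	return 0;
}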

Fixes: ab33cb5ad1e57 ("drivers: lustre: obdclass: check result of register_shrinker()")
WC-bug-id: https://jira.whamcloud.com/browse/LU-12477
Lustre-commit: 812b2ccf0284df42 ("LU-12477 lustre: check return status of register_shrinker()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/40883
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_pool.c     | 43 +++++++++++++++++++-----------------------
 fs/lustre/obdclass/lu_object.c | 39 +++++++++++++++++++-------------------
 2 files changed, 38 insertions(+), 44 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c
index 2e4d16b..b93ad4d 100644
--- a/fs/lustre/ldlm/ldlm_pool.c
+++ b/fs/lustre/ldlm/ldlm_pool.c
@@ -889,6 +889,12 @@ static unsigned long ldlm_pools_cli_scan(struct shrinker *s,
 			       sc->gfp_mask);
 }
 
+static struct shrinker ldlm_pools_cli_shrinker = {
+	.count_objects	= ldlm_pools_cli_count,
+	.scan_objects	= ldlm_pools_cli_scan,
+	.seeks		= DEFAULT_SEEKS,
+};
+
 static void ldlm_pools_recalc(struct work_struct *ws);
 static DECLARE_DELAYED_WORK(ldlm_recalc_pools, ldlm_pools_recalc);
 
@@ -976,40 +982,29 @@ static void ldlm_pools_recalc(struct work_struct *ws)
 	schedule_delayed_work(&ldlm_recalc_pools, delay * HZ);
 }
 
-static int ldlm_pools_thread_start(void)
-{
-	time64_t delay = LDLM_POOL_CLI_DEF_RECALC_PERIOD;
-
-	schedule_delayed_work(&ldlm_recalc_pools, delay);
-
-	return 0;
-}
-
-static void ldlm_pools_thread_stop(void)
-{
-	cancel_delayed_work_sync(&ldlm_recalc_pools);
-}
-
-static struct shrinker ldlm_pools_cli_shrinker = {
-	.count_objects	= ldlm_pools_cli_count,
-	.scan_objects	= ldlm_pools_cli_scan,
-	.seeks		= DEFAULT_SEEKS,
-};
+static bool ldlm_pools_init_done;
 
 int ldlm_pools_init(void)
 {
+	time64_t delay = LDLM_POOL_CLI_DEF_RECALC_PERIOD;
 	int rc;
 
-	rc = ldlm_pools_thread_start();
-	if (!rc)
-		rc = register_shrinker(&ldlm_pools_cli_shrinker);
+	rc = register_shrinker(&ldlm_pools_cli_shrinker);
+	if (rc)
+		goto out;
 
+	schedule_delayed_work(&ldlm_recalc_pools, delay);
+	ldlm_pools_init_done = true;
+out:
 	return rc;
 }
 
 void ldlm_pools_fini(void)
 {
-	unregister_shrinker(&ldlm_pools_cli_shrinker);
+	if (ldlm_pools_init_done) {
+		unregister_shrinker(&ldlm_pools_cli_shrinker);
 
-	ldlm_pools_thread_stop();
+		cancel_delayed_work_sync(&ldlm_recalc_pools);
+	}
 }
+
diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c
index e8fc328..fcf0739 100644
--- a/fs/lustre/obdclass/lu_object.c
+++ b/fs/lustre/obdclass/lu_object.c
@@ -2138,10 +2138,8 @@ int lu_global_init(void)
 
 	LU_CONTEXT_KEY_INIT(&lu_global_key);
 	result = lu_context_key_register(&lu_global_key);
-	if (result != 0) {
-		lu_ref_global_fini();
-		return result;
-	}
+	if (result != 0)
+		goto out_lu_ref;
 
 	/*
 	 * At this level, we don't know what tags are needed, so allocate them
@@ -2153,8 +2151,7 @@ int lu_global_init(void)
 	up_write(&lu_sites_guard);
 	if (result != 0) {
 		lu_context_key_degister(&lu_global_key);
-		lu_ref_global_fini();
-		return result;
+		goto out_lu_ref;
 	}
 
 	/*
@@ -2163,23 +2160,25 @@ int lu_global_init(void)
 	 * lu_object/inode cache consuming all the memory.
 	 */
 	result = register_shrinker(&lu_site_shrinker);
-	if (result == 0) {
-		result = rhashtable_init(&lu_env_rhash, &lu_env_rhash_params);
-		if (result != 0)
-			unregister_shrinker(&lu_site_shrinker);
-	}
-	if (result != 0) {
-		/* Order explained in lu_global_fini(). */
-		lu_context_key_degister(&lu_global_key);
+	if (result)
+		goto out_env;
 
-		down_write(&lu_sites_guard);
-		lu_env_fini(&lu_shrink_env);
-		up_write(&lu_sites_guard);
+	result = rhashtable_init(&lu_env_rhash, &lu_env_rhash_params);
+	if (result)
+		goto out_shrinker;
 
-		lu_ref_global_fini();
-		return result;
-	}
+	return result;
 
+out_shrinker:
+	unregister_shrinker(&lu_site_shrinker);
+out_env:
+	/* Order explained in lu_global_fini(). */
+	lu_context_key_degister(&lu_global_key);
+	down_write(&lu_sites_guard);
+	lu_env_fini(&lu_shrink_env);
+	up_write(&lu_sites_guard);
+out_lu_ref:
+	lu_ref_global_fini();
 	return result;
 }
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 13/49] lustre: log: Add ending newline for some messages.
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (11 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 12/49] lustre: fixup ldlm_pool and lu_object shrinker failure cases James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 14/49] lustre: use with_imp_locked() more broadly James Simmons
                   ` (35 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lei Feng, Lustre Development List

From: Lei Feng <flei@whamcloud.com>

Some log messages don't end with a newline, so two log messages
can be merged into one line and cause errors for parsing programs.
Add the missing ending newline to these messages.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14431
Lustre-commit: 503bf7f29a499140 ("LU-14431 log: Add ending newline for some messages.")
Signed-off-by: Lei Feng <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41723
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Anjus George <georgea@ornl.gov>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/llog_client.c | 2 +-
 fs/lustre/ptlrpc/service.c     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/ptlrpc/llog_client.c b/fs/lustre/ptlrpc/llog_client.c
index 79764cf..8bbff60 100644
--- a/fs/lustre/ptlrpc/llog_client.c
+++ b/fs/lustre/ptlrpc/llog_client.c
@@ -53,7 +53,7 @@
 	} else {							\
 		CERROR("ctxt->loc_imp == NULL for context idx %d."	\
 		       "Unable to complete MDS/OSS recovery,"		\
-		       "but I'll try again next time.  Not fatal.\n",	\
+		       "but I'll try again next time. Not fatal.\n",	\
 		       ctxt->loc_idx);					\
 		imp = NULL;						\
 		mutex_unlock(&ctxt->loc_mutex);				\
diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c
index b341877..f3f94d4 100644
--- a/fs/lustre/ptlrpc/service.c
+++ b/fs/lustre/ptlrpc/service.c
@@ -596,7 +596,7 @@ struct ptlrpc_service *ptlrpc_register_service(struct ptlrpc_service_conf *conf,
 						 strlen(cconf->cc_pattern),
 						 0, ncpts - 1, &el);
 			if (rc != 0) {
-				CERROR("%s: invalid CPT pattern string: %s",
+				CERROR("%s: invalid CPT pattern string: %s\n",
 				       conf->psc_name, cconf->cc_pattern);
 				return ERR_PTR(-EINVAL);
 			}
-- 
1.8.3.1


* [lustre-devel] [PATCH 14/49] lustre: use with_imp_locked() more broadly.
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (12 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 13/49] lustre: log: Add ending newline for some messages James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 15/49] lnet: o2iblnd: change some ints to bool James Simmons
                   ` (34 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Several places in lustre take u.cli.cl_sem to protect access to
u.cli.cl_import, and so could use with_imp_locked() achieving cleaner
code.

Using with_imp_locked() in functions calling
ptlrpc_set_import_active() requires care as that function gets a
write-lock on ->cl_sem.  So they need to use with_imp_locked() only to
get a counted reference on the imp, and must drop the lock before
calling ptlrpc_set_import_active().
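
To make the with_imp_locked() shape concrete, here is a small
standalone mock of the pattern.  struct mock_obd, the mock_*() helpers
and the -19 errno value are illustrative stand-ins, not the real
Lustre types; the point is that the unlock lives in the loop
condition, so the body must leave via 'continue' (or fall off the end)
rather than 'break' or 'return':

#include <stdio.h>

struct mock_obd {
	int lock_depth;		/* stands in for the cl_sem rwsem */
	void *import;		/* stands in for u.cli.cl_import */
};

static void mock_down_read(struct mock_obd *obd) { obd->lock_depth++; }
static void mock_up_read(struct mock_obd *obd)   { obd->lock_depth--; }

/* Same shape as with_imp_locked(): take the lock, run the body once,
 * drop the lock from the loop condition.  'continue' still reaches the
 * unlock; 'break' or 'return' from the body would skip it.
 */
#define with_mock_imp_locked(__obd, __imp, __rc)			\
	for (mock_down_read(__obd),					\
	     __imp = (__obd)->import,					\
	     __rc = __imp ? 0 : -19 /* stand-in for -ENODEV */;	\
	     __imp ? 1 : (mock_up_read(__obd), 0);			\
	     __imp = NULL)

int main(void)
{
	struct mock_obd obd = { .lock_depth = 0, .import = &obd };
	void *imp;
	int rc;

	with_mock_imp_locked(&obd, imp, rc) {
		printf("body: lock depth %d, rc %d\n", obd.lock_depth, rc);
		if (rc)
			continue;	/* the safe way out of the body */
	}
	printf("after: lock depth %d, unlock ran\n", obd.lock_depth);
	return 0;
}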

This patch makes those changes and also:

- introduces with_imp_locked_nested() for sptlrpc_conf_client_adapt(),
- re-indents obd_cleanup_client_import(), which is only tangentially
  related to the main purpose of this patch,
- removes code in ldlm_flock_completion_ast() which takes a copy
  of cl_import, and doesn't use it.
- adds with_imp_locked() to two functions named 'active_store' which
  weren't using it but should
- removes with_imp_locked() from ping_show() and instead includes it
  in ptlrpc_obd_ping() where 'imp' is actually used.

WC-bug-id: https://jira.whamcloud.com/browse/LU-9855
Lustre-commit: 168ec247779f3ab7 ("LU-9855 lustre: use with_imp_locked() more broadly.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39595
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lprocfs_status.h | 11 ++++-
 fs/lustre/include/obd_class.h      |  5 +-
 fs/lustre/mdc/lproc_mdc.c          | 14 +++---
 fs/lustre/mdc/mdc_request.c        | 12 ++---
 fs/lustre/mgc/mgc_request.c        | 94 +++++++++++++++++++-------------------
 fs/lustre/osc/lproc_osc.c          | 10 +++-
 fs/lustre/osc/osc_request.c        | 12 ++---
 fs/lustre/ptlrpc/pinger.c          | 21 +++++----
 fs/lustre/ptlrpc/sec_config.c      |  8 +---
 9 files changed, 98 insertions(+), 89 deletions(-)

diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h
index 33d78de..9051de7 100644
--- a/fs/lustre/include/lprocfs_status.h
+++ b/fs/lustre/include/lprocfs_status.h
@@ -488,14 +488,21 @@ void lprocfs_stats_collect(struct lprocfs_stats *stats, int idx,
 
 /* You must use these macros when you want to refer to
  * the import in a client obd_device for a lprocfs entry
+ * Note that it is not safe to 'goto', 'return' or 'break'
+ * out of the body of this statement.  It *IS* safe to
+ * 'goto' the a label inside the statement, or to 'continue'
+ * to get out of the statement.
  */
-#define with_imp_locked(__obd, __imp, __rc)				\
-	for (down_read(&(__obd)->u.cli.cl_sem),				\
+#define with_imp_locked_nested(__obd, __imp, __rc, __nested)		\
+	for (down_read_nested(&(__obd)->u.cli.cl_sem, __nested),	\
 	     __imp = (__obd)->u.cli.cl_import,				\
 	     __rc = __imp ? 0 : -ENODEV;				\
 	     __imp ? 1 : (up_read(&(__obd)->u.cli.cl_sem), 0);		\
 	     __imp = NULL)
 
+#define with_imp_locked(__obd, __imp, __rc)	\
+	with_imp_locked_nested(__obd, __imp, __rc, 0)
+
 /* write the name##_seq_show function, call LDEBUGFS_SEQ_FOPS_RO for read-only
  * debugfs entries; otherwise, you will define name##_seq_write function also
  * for a read-write debugfs entry, and then call LDEBUGFS_SEQ_SEQ instead.
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index b441215..6bcd89b 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -520,9 +520,8 @@ static inline int obd_cleanup(struct obd_device *obd)
 
 static inline void obd_cleanup_client_import(struct obd_device *obd)
 {
-	/*
-	 * If we set up but never connected, the
-	 * client import will not have been cleaned.
+	/* If we set up but never connected, the client import will not
+	 * have been cleaned.
 	 */
 	down_write(&obd->u.cli.cl_sem);
 	if (obd->u.cli.cl_import) {
diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c
index 3a2c37a2..af2b725 100644
--- a/fs/lustre/mdc/lproc_mdc.c
+++ b/fs/lustre/mdc/lproc_mdc.c
@@ -342,6 +342,7 @@ static ssize_t active_store(struct kobject *kobj, struct attribute *attr,
 {
 	struct obd_device *obd = container_of(kobj, struct obd_device,
 					      obd_kset.kobj);
+	struct obd_import *imp, *imp0;
 	bool val;
 	int rc;
 
@@ -349,15 +350,16 @@ static ssize_t active_store(struct kobject *kobj, struct attribute *attr,
 	if (rc)
 		return rc;
 
+	with_imp_locked(obd, imp0, rc)
+		imp = class_import_get(imp0);
 	/* opposite senses */
-	if (obd->u.cli.cl_import->imp_deactive == val) {
+	if (imp->imp_deactive == val)
 		rc = ptlrpc_set_import_active(obd->u.cli.cl_import, val);
-		if (rc)
-			count = rc;
-	} else {
+	else
 		CDEBUG(D_CONFIG, "activate %u: ignoring repeat request\n", val);
-	}
-	return count;
+
+	class_import_put(imp);
+	return rc ?: count;
 }
 LUSTRE_RW_ATTR(active);
 
diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index a146af8..ef27af6 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -1589,19 +1589,17 @@ static int mdc_statfs(const struct lu_env *env,
 	struct req_format *fmt;
 	struct ptlrpc_request *req;
 	struct obd_statfs *msfs;
-	struct obd_import *imp = NULL;
+	struct obd_import *imp, *imp0;
 	int rc;
 
 	/*
 	 * Since the request might also come from lprocfs, so we need
 	 * sync this with client_disconnect_export Bug15684
 	 */
-	down_read(&obd->u.cli.cl_sem);
-	if (obd->u.cli.cl_import)
-		imp = class_import_get(obd->u.cli.cl_import);
-	up_read(&obd->u.cli.cl_sem);
-	if (!imp)
-		return -ENODEV;
+	with_imp_locked(obd, imp0, rc)
+		imp = class_import_get(imp0);
+	if (rc)
+		return rc;
 
 	fmt = &RQF_MDS_STATFS;
 	if ((exp_connect_flags2(exp) & OBD_CONNECT2_SUM_STATFS) &&
diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c
index 8133f27..f115479 100644
--- a/fs/lustre/mgc/mgc_request.c
+++ b/fs/lustre/mgc/mgc_request.c
@@ -1133,6 +1133,7 @@ static int mgc_apply_recover_logs(struct obd_device *mgc,
 		int entry_len = sizeof(*entry);
 		int is_ost;
 		struct obd_device *obd;
+		struct obd_import *imp;
 		char *obdname;
 		char *cname;
 		char *params;
@@ -1210,8 +1211,8 @@ static int mgc_apply_recover_logs(struct obd_device *mgc,
 		pos += sprintf(obdname + pos, "-%s%04x",
 				  is_ost ? "OST" : "MDT", entry->mne_index);
 
-		cname = is_ost ? "osc" : "mdc";
-		pos += sprintf(obdname + pos, "-%s-%s", cname, inst);
+		cname = is_ost ? "osc" : "mdc",
+			pos += sprintf(obdname + pos, "-%s-%s", cname, inst);
 		lustre_cfg_bufs_reset(&bufs, obdname);
 
 		/* find the obd by obdname */
@@ -1230,54 +1231,56 @@ static int mgc_apply_recover_logs(struct obd_device *mgc,
 		pos += sprintf(params, "%s.import=%s", cname, "connection=");
 		uuid = buf + pos;
 
-		down_read(&obd->u.cli.cl_sem);
-		if (!obd->u.cli.cl_import) {
-			/* client does not connect to the OST yet */
-			up_read(&obd->u.cli.cl_sem);
-			rc = 0;
-			continue;
-		}
-
-		/* iterate all nids to find one */
-		/* find uuid by nid */
-		/* create import entries if they don't exist */
-		rc = client_import_add_nids_to_conn(obd->u.cli.cl_import,
-						    entry->u.nids,
-						    entry->mne_nid_count,
-						    (struct obd_uuid *)uuid);
-		if (rc == -ENOENT && dynamic_nids) {
-			/* create a new connection for this import */
-			char *primary_nid = libcfs_nid2str(entry->u.nids[0]);
-			int prim_nid_len = strlen(primary_nid) + 1;
-			struct obd_uuid server_uuid;
-
-			if (prim_nid_len > UUID_MAX)
-				goto fail;
-			strncpy(server_uuid.uuid, primary_nid, prim_nid_len);
-
-			CDEBUG(D_INFO, "Adding a connection for %s\n",
-			       primary_nid);
-
-			rc = client_import_dyn_add_conn(obd->u.cli.cl_import,
-							&server_uuid,
-							entry->u.nids[0], 1);
-			if (rc < 0) {
-				CERROR("%s: Failed to add new connection with NID '%s' to import: rc = %d\n",
-				       obd->obd_name, primary_nid, rc);
-				goto fail;
-			}
-			rc = client_import_add_nids_to_conn(obd->u.cli.cl_import,
+		with_imp_locked(obd, imp, rc) {
+			/* iterate all nids to find one */
+			/* find uuid by nid */
+			/* create import entries if they don't exist */
+			rc = client_import_add_nids_to_conn(imp,
 							    entry->u.nids,
 							    entry->mne_nid_count,
 							    (struct obd_uuid *)uuid);
-			if (rc < 0) {
-				CERROR("%s: failed to lookup UUID: rc = %d\n",
-				       obd->obd_name, rc);
-				goto fail;
+			if (rc == -ENOENT && dynamic_nids) {
+				/* create a new connection for this import */
+				char *primary_nid =
+					libcfs_nid2str(entry->u.nids[0]);
+				int prim_nid_len = strlen(primary_nid) + 1;
+				struct obd_uuid server_uuid;
+
+				if (prim_nid_len > UUID_MAX)
+					goto fail;
+				strncpy(server_uuid.uuid, primary_nid,
+					prim_nid_len);
+
+				CDEBUG(D_INFO, "Adding a connection for %s\n",
+				       primary_nid);
+
+				rc = client_import_dyn_add_conn(imp,
+								&server_uuid,
+								entry->u.nids[0],
+								1);
+				if (rc < 0) {
+					CERROR("%s: Failed to add new connection with NID '%s' to import: rc = %d\n",
+					       obd->obd_name, primary_nid, rc);
+					goto fail;
+				}
+				rc = client_import_add_nids_to_conn(imp,
+								    entry->u.nids,
+								    entry->mne_nid_count,
+								    (struct obd_uuid *)uuid);
+				if (rc < 0) {
+					CERROR("%s: failed to lookup UUID: rc = %d\n",
+					       obd->obd_name, rc);
+					goto fail;
+				}
 			}
+fail:;
 		}
-fail:
-		up_read(&obd->u.cli.cl_sem);
+		if (rc == -ENODEV) {
+			/* client does not connect to the OST yet */
+			rc = 0;
+			continue;
+		}
+
 		if (rc < 0 && rc != -ENOSPC) {
 			CERROR("mgc: cannot find UUID by nid '%s': rc = %d\n",
 			       libcfs_nid2str(entry->u.nids[0]), rc);
@@ -1293,7 +1296,6 @@ static int mgc_apply_recover_logs(struct obd_device *mgc,
 
 		lustre_cfg_bufs_set_string(&bufs, 1, params);
 
-		rc = -ENOMEM;
 		len = lustre_cfg_len(bufs.lcfg_bufcount, bufs.lcfg_buflen);
 		lcfg = kzalloc(len, GFP_NOFS);
 		if (!lcfg) {
diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c
index e64176e..df48c76 100644
--- a/fs/lustre/osc/lproc_osc.c
+++ b/fs/lustre/osc/lproc_osc.c
@@ -61,6 +61,7 @@ static ssize_t active_store(struct kobject *kobj, struct attribute *attr,
 {
 	struct obd_device *obd = container_of(kobj, struct obd_device,
 					      obd_kset.kobj);
+	struct obd_import *imp, *imp0;
 	bool val;
 	int rc;
 
@@ -68,14 +69,19 @@ static ssize_t active_store(struct kobject *kobj, struct attribute *attr,
 	if (rc)
 		return rc;
 
+	with_imp_locked(obd, imp0, rc)
+		imp = class_import_get(imp0);
+	if (rc)
+		return rc;
 	/* opposite senses */
-	if (obd->u.cli.cl_import->imp_deactive == val)
+	if (imp->imp_deactive == val)
 		rc = ptlrpc_set_import_active(obd->u.cli.cl_import, val);
 	else
 		CDEBUG(D_CONFIG, "activate %u: ignoring repeat request\n",
 		       val);
+	class_import_put(imp);
 
-	return count;
+	return rc ?: count;
 }
 LUSTRE_RW_ATTR(active);
 
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 066ecdb..8046e33 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -3069,18 +3069,16 @@ static int osc_statfs(const struct lu_env *env, struct obd_export *exp,
 	struct obd_device *obd = class_exp2obd(exp);
 	struct obd_statfs *msfs;
 	struct ptlrpc_request *req;
-	struct obd_import *imp = NULL;
+	struct obd_import *imp, *imp0;
 	int rc;
 
 	/* Since the request might also come from lprocfs, so we need
 	 * sync this with client_disconnect_export Bug15684
 	 */
-	down_read(&obd->u.cli.cl_sem);
-	if (obd->u.cli.cl_import)
-		imp = class_import_get(obd->u.cli.cl_import);
-	up_read(&obd->u.cli.cl_sem);
-	if (!imp)
-		return -ENODEV;
+	with_imp_locked(obd, imp0, rc)
+		imp = class_import_get(imp0);
+	if (rc)
+		return rc;
 
 	/* We could possibly pass max_age in the request (as an absolute
 	 * timestamp or a "seconds.usec ago") so the target can avoid doing
diff --git a/fs/lustre/ptlrpc/pinger.c b/fs/lustre/ptlrpc/pinger.c
index 178153c..99d077b 100644
--- a/fs/lustre/ptlrpc/pinger.c
+++ b/fs/lustre/ptlrpc/pinger.c
@@ -71,17 +71,18 @@ int ptlrpc_obd_ping(struct obd_device *obd)
 {
 	int rc;
 	struct ptlrpc_request *req;
+	struct obd_import *imp;
 
-	req = ptlrpc_prep_ping(obd->u.cli.cl_import);
-	if (!req)
-		return -ENOMEM;
-
-	req->rq_send_state = LUSTRE_IMP_FULL;
-
-	rc = ptlrpc_queue_wait(req);
-
-	ptlrpc_req_finished(req);
-
+	with_imp_locked(obd, imp, rc) {
+		req = ptlrpc_prep_ping(imp);
+		if (!req) {
+			rc = -ENOMEM;
+			continue;
+		}
+		req->rq_send_state = LUSTRE_IMP_FULL;
+		rc = ptlrpc_queue_wait(req);
+		ptlrpc_req_finished(req);
+	}
 	return rc;
 }
 EXPORT_SYMBOL(ptlrpc_obd_ping);
diff --git a/fs/lustre/ptlrpc/sec_config.c b/fs/lustre/ptlrpc/sec_config.c
index 0891f2f..d9e3520 100644
--- a/fs/lustre/ptlrpc/sec_config.c
+++ b/fs/lustre/ptlrpc/sec_config.c
@@ -836,24 +836,20 @@ void sptlrpc_conf_choose_flavor(enum lustre_sec_part from,
 void sptlrpc_conf_client_adapt(struct obd_device *obd)
 {
 	struct obd_import *imp;
+	int rc;
 
 	LASSERT(strcmp(obd->obd_type->typ_name, LUSTRE_MDC_NAME) == 0 ||
 		strcmp(obd->obd_type->typ_name, LUSTRE_OSC_NAME) == 0);
 	CDEBUG(D_SEC, "obd %s\n", obd->u.cli.cl_target_uuid.uuid);
 
 	/* serialize with connect/disconnect import */
-	down_read_nested(&obd->u.cli.cl_sem, OBD_CLI_SEM_MDCOSC);
-
-	imp = obd->u.cli.cl_import;
-	if (imp) {
+	with_imp_locked_nested(obd, imp, rc, OBD_CLI_SEM_MDCOSC) {
 		write_lock(&imp->imp_sec_lock);
 		if (imp->imp_sec)
 			imp->imp_sec_expire = ktime_get_real_seconds() +
 				SEC_ADAPT_DELAY;
 		write_unlock(&imp->imp_sec_lock);
 	}
-
-	up_read(&obd->u.cli.cl_sem);
 }
 EXPORT_SYMBOL(sptlrpc_conf_client_adapt);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 15/49] lnet: o2iblnd: change some ints to bool.
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (13 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 14/49] lustre: use with_imp_locked() more broadly James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 16/49] lustre: lmv: striped directory as subdirectory mount James Simmons
                   ` (33 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Each of these ints can suitably be bool.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12678
Lustre-commit: 86e192059905ac49 ("LU-12678 o2iblnd: change some ints to bool.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39304
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c    |  8 ++++----
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 26 +++++++++++++-------------
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index c8cebf6..01cc1ed 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -127,7 +127,7 @@ static int kiblnd_msgtype2size(int type)
 	}
 }
 
-static int kiblnd_unpack_rd(struct kib_msg *msg, int flip)
+static int kiblnd_unpack_rd(struct kib_msg *msg, bool flip)
 {
 	struct kib_rdma_desc *rd;
 	int nob;
@@ -206,7 +206,7 @@ int kiblnd_unpack_msg(struct kib_msg *msg, int nob)
 	u32 msg_cksum;
 	u16 version;
 	int msg_nob;
-	int flip;
+	bool flip;
 
 	/* 6 bytes are enough to have received magic + version */
 	if (nob < 6) {
@@ -215,9 +215,9 @@ int kiblnd_unpack_msg(struct kib_msg *msg, int nob)
 	}
 
 	if (msg->ibm_magic == IBLND_MSG_MAGIC) {
-		flip = 0;
+		flip = false;
 	} else if (msg->ibm_magic == __swab32(IBLND_MSG_MAGIC)) {
-		flip = 1;
+		flip = true;
 	} else {
 		CERROR("Bad magic: %08x\n", msg->ibm_magic);
 		return -EPROTO;
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 2ebda4e..5066c93 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -2758,7 +2758,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		if (priv_nob >= offsetof(struct kib_rej, ibr_padding)) {
 			struct kib_rej *rej = priv;
 			struct kib_connparams *cp = NULL;
-			int flip = 0;
+			bool flip = false;
 			u64 incarnation = -1;
 
 			/* NB. default incarnation is -1 because:
@@ -2777,7 +2777,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			    rej->ibr_magic == __swab32(LNET_PROTO_MAGIC)) {
 				__swab32s(&rej->ibr_magic);
 				__swab16s(&rej->ibr_version);
-				flip = 1;
+				flip = true;
 			}
 
 			if (priv_nob >= sizeof(struct kib_rej) &&
@@ -3385,7 +3385,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	struct kib_conn *conn;
 	int timeout;
 	int i;
-	int dropped_lock;
+	bool dropped_lock;
 	int peer_index = 0;
 	unsigned long deadline = jiffies;
 
@@ -3397,7 +3397,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	while (!kiblnd_data.kib_shutdown) {
 		int reconn = 0;
 
-		dropped_lock = 0;
+		dropped_lock = false;
 
 		conn = list_first_entry_or_null(
 			&kiblnd_data.kib_connd_zombies,
@@ -3412,7 +3412,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			}
 
 			spin_unlock_irqrestore(lock, flags);
-			dropped_lock = 1;
+			dropped_lock = true;
 
 			kiblnd_destroy_conn(conn);
 
@@ -3439,7 +3439,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			list_del(&conn->ibc_list);
 
 			spin_unlock_irqrestore(lock, flags);
-			dropped_lock = 1;
+			dropped_lock = true;
 
 			kiblnd_disconnect_conn(conn);
 			wait = conn->ibc_waits;
@@ -3469,7 +3469,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			list_del(&conn->ibc_list);
 
 			spin_unlock_irqrestore(lock, flags);
-			dropped_lock = 1;
+			dropped_lock = true;
 
 			reconn += kiblnd_reconnect_peer(conn->ibc_peer);
 			kiblnd_peer_decref(conn->ibc_peer);
@@ -3503,7 +3503,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			unsigned int lnd_timeout;
 
 			spin_unlock_irqrestore(lock, flags);
-			dropped_lock = 1;
+			dropped_lock = true;
 
 			/*
 			 * Time to check for RDMA timeouts on a few more
@@ -3677,7 +3677,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	wait_queue_entry_t wait;
 	unsigned long flags;
 	struct ib_wc wc;
-	int did_something;
+	bool did_something;
 	int rc;
 
 	init_wait(&wait);
@@ -3701,7 +3701,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			spin_lock_irqsave(&sched->ibs_lock, flags);
 		}
 
-		did_something = 0;
+		did_something = false;
 
 		conn = list_first_entry_or_null(&sched->ibs_conns,
 						struct kib_conn,
@@ -3778,7 +3778,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			}
 
 			kiblnd_conn_decref(conn); /* ...drop my ref from above */
-			did_something = 1;
+			did_something = true;
 		}
 
 		if (did_something)
@@ -3816,14 +3816,14 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	write_lock_irqsave(glock, flags);
 
 	while (!kiblnd_data.kib_shutdown) {
-		int do_failover = 0;
+		bool do_failover = false;
 		int long_sleep;
 
 		list_for_each_entry(dev, &kiblnd_data.kib_failed_devs,
 				    ibd_fail_list) {
 			if (ktime_get_seconds() < dev->ibd_next_failover)
 				continue;
-			do_failover = 1;
+			do_failover = true;
 			break;
 		}
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 16/49] lustre: lmv: striped directory as subdirectory mount
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (14 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 15/49] lnet: o2iblnd: change some ints to bool James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 17/49] lustre: llite: create file_operations registration function James Simmons
                   ` (32 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

lmv_intent_lookup() will replace fid1 with the stripe FID, but if a
striped directory is mounted as a subdirectory mount it should be
handled differently: fid2 is the directory master object, and if the
stripe is located on a different MDT than the master object, the
server will treat it as a remote object and won't send the LOOKUP
lock back in its reply, so each file access needs to look up "/" again.

A remote directory (either plain or striped) shouldn't be used as a
subdirectory mount, because a remote object can't get a LOOKUP lock.
Add an option "mdt_enable_remote_subdir_mount" (1 by default for
backward compatibility); mdt_get_root() will return -EREMOTE if the
user-specified subdir is a remote directory and this option is
disabled.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14490
Lustre-commit: 775f88ed6c8b623 ("LU-14490 lmv: striped directory as subdirectory mount")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41893
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lmv/lmv_intent.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c
index 38b8c75..2a15ec2 100644
--- a/fs/lustre/lmv/lmv_intent.c
+++ b/fs/lustre/lmv/lmv_intent.c
@@ -451,11 +451,20 @@ static int lmv_intent_lookup(struct obd_export *exp,
 
 retry:
 	if (op_data->op_flags & MF_GETATTR_BY_FID) {
-		/* getattr by FID, replace fid1 with stripe FID */
+		/* getattr by FID, replace fid1 with stripe FID,
+		 * NB, don't replace if name is "/", because it may be a subtree
+		 * mount, and if it's a striped directory, fid1 will be replaced
+		 * to stripe FID by hash, while fid2 is master object FID, which
+		 * will be treated as a remote object if the two FIDs are
+		 * located on different MDTs, and LOOKUP lock can't be fetched.
+		 */
 		LASSERT(op_data->op_name);
-		tgt = lmv_locate_tgt(lmv, op_data);
-		if (IS_ERR(tgt))
-			return PTR_ERR(tgt);
+		if (op_data->op_namelen != 1 ||
+		    strncmp(op_data->op_name, "/", 1) != 0) {
+			tgt = lmv_locate_tgt(lmv, op_data);
+			if (IS_ERR(tgt))
+				return PTR_ERR(tgt);
+		}
 
 		/* name is used to locate stripe target, clear it here
 		 * to avoid packing name in request, so that MDS knows
-- 
1.8.3.1


* [lustre-devel] [PATCH 17/49] lustre: llite: create file_operations registration function.
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (15 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 16/49] lustre: lmv: striped directory as subdirectory mount James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 18/49] lustre: osc: fix performance regression in osc_extent_merge() James Simmons
                   ` (31 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Create a new ll_select_file_operations() helper to set sbi->ll_fop to
the correct struct file_operations. This allows all of the struct
file_operations to be made static.

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: cfa2c25f1b9b0344 ("LU-6142 llite: create file_operations registration function.")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/40608
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 fs/lustre/llite/file.c           | 18 +++++++++++++++---
 fs/lustre/llite/llite_internal.h |  4 +---
 fs/lustre/llite/llite_lib.c      |  7 +------
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 0d866ec..767eafa 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -5116,7 +5116,7 @@ int ll_inode_permission(struct inode *inode, int mask)
 }
 
 /* -o localflock - only provides locally consistent flock locks */
-const struct file_operations ll_file_operations = {
+static const struct file_operations ll_file_operations = {
 	.read_iter		= ll_file_read_iter,
 	.write_iter		= ll_file_write_iter,
 	.unlocked_ioctl		= ll_file_ioctl,
@@ -5130,7 +5130,7 @@ int ll_inode_permission(struct inode *inode, int mask)
 	.fallocate		= ll_fallocate,
 };
 
-const struct file_operations ll_file_operations_flock = {
+static const struct file_operations ll_file_operations_flock = {
 	.read_iter		= ll_file_read_iter,
 	.write_iter		= ll_file_write_iter,
 	.unlocked_ioctl		= ll_file_ioctl,
@@ -5147,7 +5147,7 @@ int ll_inode_permission(struct inode *inode, int mask)
 };
 
 /* These are for -o noflock - to return ENOSYS on flock calls */
-const struct file_operations ll_file_operations_noflock = {
+static const struct file_operations ll_file_operations_noflock = {
 	.read_iter		= ll_file_read_iter,
 	.write_iter		= ll_file_write_iter,
 	.unlocked_ioctl		= ll_file_ioctl,
@@ -5172,6 +5172,18 @@ int ll_inode_permission(struct inode *inode, int mask)
 	.get_acl		= ll_get_acl,
 };
 
+const struct file_operations *ll_select_file_operations(struct ll_sb_info *sbi)
+{
+	const struct file_operations *fops = &ll_file_operations_noflock;
+
+	if (sbi->ll_flags & LL_SBI_FLOCK)
+		fops = &ll_file_operations_flock;
+	else if (sbi->ll_flags & LL_SBI_LOCALFLOCK)
+		fops = &ll_file_operations;
+
+	return fops;
+}
+
 int ll_layout_conf(struct inode *inode, const struct cl_object_conf *conf)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 041f6d3..0d97253 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1049,10 +1049,8 @@ void ll_cl_add(struct file *file, const struct lu_env *env, struct cl_io *io,
 extern const struct address_space_operations ll_aops;
 
 /* llite/file.c */
-extern const struct file_operations ll_file_operations;
-extern const struct file_operations ll_file_operations_flock;
-extern const struct file_operations ll_file_operations_noflock;
 extern const struct inode_operations ll_file_inode_operations;
+const struct file_operations *ll_select_file_operations(struct ll_sb_info *sbi);
 int ll_have_md_lock(struct inode *inode, u64 *bits,
 		    enum ldlm_mode l_req_mode);
 enum ldlm_mode ll_take_md_lock(struct inode *inode, u64 bits,
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index e7c1b73..e15962e 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -293,12 +293,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	 */
 	sb->s_flags |= SB_NOSEC;
 
-	if (sbi->ll_flags & LL_SBI_FLOCK)
-		sbi->ll_fop = &ll_file_operations_flock;
-	else if (sbi->ll_flags & LL_SBI_LOCALFLOCK)
-		sbi->ll_fop = &ll_file_operations;
-	else
-		sbi->ll_fop = &ll_file_operations_noflock;
+	sbi->ll_fop = ll_select_file_operations(sbi);
 
 	/* always ping even if server suppress_pings */
 	if (sbi->ll_flags & LL_SBI_ALWAYS_PING)
-- 
1.8.3.1


* [lustre-devel] [PATCH 18/49] lustre: osc: fix performance regression in osc_extent_merge()
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (16 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 17/49] lustre: llite: create file_operations registration function James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 19/49] lustre: mds: add enums for MDS_ATTR flags James Simmons
                   ` (30 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The following IOR performance regression was reported at LU-14424:

Client Version		NP	Write(MB/s)	Read(MB/s)
Lustre-2.13.0		1	803		3293
Lustre-2.14.0-RC1	1	529	 	3092
Lustre-2.13.0		16	6962		12021
Lustre-2.14.0-RC1	16	5127		11951

This was tracked down to commit 85ebb57ddc5b. Restore the original
performance with the fix in this patch.

Fixes: 85ebb57ddc5b ("lustre: osc: simplify osc_extent_find()")
WC-bug-id: https://jira.whamcloud.com/browse/LU-9679
WC-bug-id: https://jira.whamcloud.com/browse/LU-14424
Lustre-commit: 3e6b7a785cab514a ("LU-9679 osc: simplify osc_extent_find()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41691
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_cache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index 4abe8ba..e85f320 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -568,7 +568,8 @@ static int osc_extent_merge(const struct lu_env *env, struct osc_extent *cur,
 	if (!victim)
 		return -EINVAL;
 
-	if (victim->oe_state != OES_CACHE || victim->oe_fsync_wait)
+	if (victim->oe_state != OES_INV &&
+	    (victim->oe_state != OES_CACHE || victim->oe_fsync_wait))
 		return -EBUSY;
 
 	if (cur->oe_max_end != victim->oe_max_end)
@@ -809,7 +810,6 @@ static struct osc_extent *osc_extent_find(const struct lu_env *env,
 		if (osc_extent_merge(env, ext, cur) == 0) {
 			LASSERT(*grants >= chunksize);
 			*grants -= chunksize;
-			found = osc_extent_hold(ext);
 
 			/*
 			 * Try to merge with the next one too because we
@@ -819,6 +819,7 @@ static struct osc_extent *osc_extent_find(const struct lu_env *env,
 				/* we can save extent tax from next extent */
 				*grants += cli->cl_grant_extent_tax;
 
+			found = osc_extent_hold(ext);
 			break;
 		}
 	}
-- 
1.8.3.1


* [lustre-devel] [PATCH 19/49] lustre: mds: add enums for MDS_ATTR flags
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (17 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 18/49] lustre: osc: fix performance regression in osc_extent_merge() James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 20/49] lustre: uapi: remove OBD_IOC_LOV_GET_CONFIG James Simmons
                   ` (29 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Add mds_attr_flags to the code to make it easier to follow the logic.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12885
Lustre-commit: 6c6b8cb972ac92c0 ("LU-12885 mds: add enums for MDS_ATTR flags")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/33512
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/mdc_lib.c                | 10 +++++---
 include/uapi/linux/lustre/lustre_idl.h | 46 ++++++++++++++++++----------------
 2 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c
index 6cb14c1..9251aec 100644
--- a/fs/lustre/mdc/mdc_lib.c
+++ b/fs/lustre/mdc/mdc_lib.c
@@ -325,9 +325,10 @@ void mdc_open_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 	set_mrc_cr_flags(rec, cr_flags);
 }
 
-static inline u64 attr_pack(unsigned int ia_valid, enum op_xvalid ia_xvalid)
+static inline enum mds_attr_flags mdc_attr_pack(unsigned int ia_valid,
+						enum op_xvalid ia_xvalid)
 {
-	u64 sa_valid = 0;
+	enum mds_attr_flags sa_valid = 0;
 
 	if (ia_valid & ATTR_MODE)
 		sa_valid |= MDS_ATTR_MODE;
@@ -370,6 +371,7 @@ static inline u64 attr_pack(unsigned int ia_valid, enum op_xvalid ia_xvalid)
 		sa_valid |= MDS_ATTR_LSIZE;
 	if (ia_xvalid & OP_XVALID_LAZYBLOCKS)
 		sa_valid |= MDS_ATTR_LBLOCKS;
+
 	return sa_valid;
 }
 
@@ -383,8 +385,8 @@ static void mdc_setattr_pack_rec(struct mdt_rec_setattr *rec,
 	rec->sa_suppgid = -1;
 
 	rec->sa_fid = op_data->op_fid1;
-	rec->sa_valid  = attr_pack(op_data->op_attr.ia_valid,
-				   op_data->op_xvalid);
+	rec->sa_valid  = mdc_attr_pack(op_data->op_attr.ia_valid,
+				       op_data->op_xvalid);
 	rec->sa_mode = op_data->op_attr.ia_mode;
 	rec->sa_uid = from_kuid(&init_user_ns, op_data->op_attr.ia_uid);
 	rec->sa_gid = from_kgid(&init_user_ns, op_data->op_attr.ia_gid);
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index 3a33657..b0c6191 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -1683,28 +1683,30 @@ struct mdt_rec_setattr {
  * since the client and MDS may run different kernels (see bug 13828)
  * Therefore, we should only use MDS_ATTR_* attributes for sa_valid.
  */
-#define MDS_ATTR_MODE		   0x1ULL /* = 1 */
-#define MDS_ATTR_UID		   0x2ULL /* = 2 */
-#define MDS_ATTR_GID		   0x4ULL /* = 4 */
-#define MDS_ATTR_SIZE		   0x8ULL /* = 8 */
-#define MDS_ATTR_ATIME		  0x10ULL /* = 16 */
-#define MDS_ATTR_MTIME		  0x20ULL /* = 32 */
-#define MDS_ATTR_CTIME		  0x40ULL /* = 64 */
-#define MDS_ATTR_ATIME_SET	  0x80ULL /* = 128 */
-#define MDS_ATTR_MTIME_SET	 0x100ULL /* = 256 */
-#define MDS_ATTR_FORCE		 0x200ULL /* = 512, Not a change, but a change it */
-#define MDS_ATTR_ATTR_FLAG	 0x400ULL /* = 1024 */
-#define MDS_ATTR_KILL_SUID	 0x800ULL /* = 2048 */
-#define MDS_ATTR_KILL_SGID	0x1000ULL /* = 4096 */
-#define MDS_ATTR_CTIME_SET	0x2000ULL /* = 8192 */
-#define MDS_ATTR_FROM_OPEN	0x4000ULL /* = 16384, called from open path,
-					   * ie O_TRUNC
-					   */
-#define MDS_ATTR_BLOCKS		0x8000ULL  /* = 32768 */
-#define MDS_ATTR_PROJID		0x10000ULL /* = 65536 */
-#define MDS_ATTR_LSIZE		0x20000ULL /* = 131072 */
-#define MDS_ATTR_LBLOCKS	0x40000ULL /* = 262144 */
-#define MDS_ATTR_OVERRIDE	0x2000000ULL /* = 33554432 */
+enum mds_attr_flags {
+	MDS_ATTR_MODE =		      0x1ULL, /* = 1 */
+	MDS_ATTR_UID =		      0x2ULL, /* = 2 */
+	MDS_ATTR_GID =		      0x4ULL, /* = 4 */
+	MDS_ATTR_SIZE =		      0x8ULL, /* = 8 */
+	MDS_ATTR_ATIME =	     0x10ULL, /* = 16 */
+	MDS_ATTR_MTIME =	     0x20ULL, /* = 32 */
+	MDS_ATTR_CTIME =	     0x40ULL, /* = 64 */
+	MDS_ATTR_ATIME_SET =	     0x80ULL, /* = 128 */
+	MDS_ATTR_MTIME_SET =	    0x100ULL, /* = 256 */
+	MDS_ATTR_FORCE =	    0x200ULL, /* = 512, Not a change, but a change it */
+	MDS_ATTR_ATTR_FLAG =	    0x400ULL, /* = 1024 */
+	MDS_ATTR_KILL_SUID =	    0x800ULL, /* = 2048 */
+	MDS_ATTR_KILL_SGID =	   0x1000ULL, /* = 4096 */
+	MDS_ATTR_CTIME_SET =	   0x2000ULL, /* = 8192 */
+	MDS_ATTR_FROM_OPEN =	   0x4000ULL, /* = 16384, called from open path,
+					       * ie O_TRUNC
+					       */
+	MDS_ATTR_BLOCKS =	   0x8000ULL, /* = 32768 */
+	MDS_ATTR_PROJID	=	  0x10000ULL, /* = 65536 */
+	MDS_ATTR_LSIZE =	  0x20000ULL, /* = 131072 */
+	MDS_ATTR_LBLOCKS =	  0x40000ULL, /* = 262144 */
+	MDS_ATTR_OVERRIDE =	0x2000000ULL, /* = 33554432 */
+};
 
 enum mds_op_bias {
 /*	MDS_CHECK_SPLIT		= 1 << 0, obsolete before 2.3.58 */
-- 
1.8.3.1


* [lustre-devel] [PATCH 20/49] lustre: uapi: remove OBD_IOC_LOV_GET_CONFIG
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (18 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 19/49] lustre: mds: add enums for MDS_ATTR flags James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 21/49] lustre: sec: fix migrate for encrypted dir James Simmons
                   ` (28 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

The "lctl lov_getconfig" command has been obsolete for some time,
but was kept around for sanity test_44a to work properly.  Now
that LU-11656 has landed, "lfs getstripe -d $DIR" can be used to
get the actual layout used for files created in a directory.

Remove the lov_getconfig command along with the IOC definition
it was using.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13107
Lustre-commit: d2cb789537485eec ("LU-13107 utils: remove lctl lov_getconfig command")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37106
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_obd.c                  | 43 --------------------------------
 include/uapi/linux/lustre/lustre_ioctl.h |  3 +--
 2 files changed, 1 insertion(+), 45 deletions(-)

diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c
index 9554d85..2939d66 100644
--- a/fs/lustre/lov/lov_obd.c
+++ b/fs/lustre/lov/lov_obd.c
@@ -964,7 +964,6 @@ static int lov_iocontrol(unsigned int cmd, struct obd_export *exp, int len,
 	struct obd_device *obd = class_exp2obd(exp);
 	struct lov_obd *lov = &obd->u.lov;
 	int i = 0, rc = 0, count = lov->desc.ld_tgt_count;
-	struct obd_uuid *uuidp;
 
 	switch (cmd) {
 	case IOC_OBD_STATFS: {
@@ -1013,48 +1012,6 @@ static int lov_iocontrol(unsigned int cmd, struct obd_export *exp, int len,
 			return -EFAULT;
 		break;
 	}
-	case OBD_IOC_LOV_GET_CONFIG: {
-		struct obd_ioctl_data *data;
-		struct lov_desc *desc;
-		u32 *genp;
-
-		len = 0;
-		if (obd_ioctl_getdata(&data, &len, uarg))
-			return -EINVAL;
-
-		if (sizeof(*desc) > data->ioc_inllen1) {
-			kvfree(data);
-			return -EINVAL;
-		}
-
-		if (sizeof(uuidp->uuid) * count > data->ioc_inllen2) {
-			kvfree(data);
-			return -EINVAL;
-		}
-
-		if (sizeof(u32) * count > data->ioc_inllen3) {
-			kvfree(data);
-			return -EINVAL;
-		}
-
-		desc = (struct lov_desc *)data->ioc_inlbuf1;
-		memcpy(desc, &lov->desc, sizeof(*desc));
-
-		uuidp = (struct obd_uuid *)data->ioc_inlbuf2;
-		genp = (u32 *)data->ioc_inlbuf3;
-		/* the uuid will be empty for deleted OSTs */
-		for (i = 0; i < count; i++, uuidp++, genp++) {
-			if (!lov->lov_tgts[i])
-				continue;
-			*uuidp = lov->lov_tgts[i]->ltd_uuid;
-			*genp = lov->lov_tgts[i]->ltd_gen;
-		}
-
-		if (copy_to_user(uarg, data, len))
-			rc = -EFAULT;
-		kvfree(data);
-		break;
-	}
 	case OBD_IOC_QUOTACTL: {
 		struct if_quotactl *qctl = karg;
 		struct lov_tgt_desc *tgt = NULL;
diff --git a/include/uapi/linux/lustre/lustre_ioctl.h b/include/uapi/linux/lustre/lustre_ioctl.h
index 7d48e27..be388fc 100644
--- a/include/uapi/linux/lustre/lustre_ioctl.h
+++ b/include/uapi/linux/lustre/lustre_ioctl.h
@@ -163,10 +163,9 @@ static inline __u32 obd_ioctl_packlen(struct obd_ioctl_data *data)
 #define OBD_IOC_GETNAME		_IOWR('f', 131, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_GETMDNAME	_IOR('f', 131, char[MAX_OBD_NAME])
 #define OBD_IOC_GETDTNAME	OBD_IOC_GETNAME
-#define OBD_IOC_LOV_GET_CONFIG	_IOWR('f', 132, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_LOV_GET_CONFIG	_IOWR('f', 132, OBD_IOC_DATA_TYPE) until 2.14 */
 #define OBD_IOC_CLIENT_RECOVER	_IOW('f', 133, OBD_IOC_DATA_TYPE)
 /* was	OBD_IOC_PING_TARGET	_IOW('f', 136, OBD_IOC_DATA_TYPE) until 2.11 */
-
 /*	OBD_IOC_DEC_FS_USE_COUNT _IO('f', 139) */
 /* was OBD_IOC_NO_TRANSNO      _IOW('f', 140, OBD_IOC_DATA_TYPE) until 2.14 */
 #define OBD_IOC_SET_READONLY	_IOW('f', 141, OBD_IOC_DATA_TYPE)
-- 
1.8.3.1


* [lustre-devel] [PATCH 21/49] lustre: sec: fix migrate for encrypted dir
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (19 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 20/49] lustre: uapi: remove OBD_IOC_LOV_GET_CONFIG James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 22/49] lnet: libcfs: restore LNET_DUMP_ON_PANIC functionality James Simmons
                   ` (27 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

When setting an encryption policy on a directory that we want to
be encrypted, we need to make sure it is empty.
But, in some cases, setting the LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr
should be allowed on non-empty directories, for instance when a
directory is migrated across MDTs into new shard directories.
Also, it is required for the encryption key to be available on the
client when migrating a directory so that the filenames can be
properly rehashed for the new MDT directory shard.
And, in any case, we need to prevent explicit setting of
LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr outside of encryption policy
definition.
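
In outline, the patch gates the xattr on a per-inode flag. The sketch
below condenses the crypto.c and xattr.c hunks into one place purely
for illustration; it is not additional code beyond what the patch adds:

	/* ll_set_context(): mark that this xattr write comes from the
	 * encryption policy code before issuing it.
	 */
	set_bit(LLIF_SET_ENC_CTX, &ll_i2info(inode)->lli_flags);
	rc = __vfs_setxattr(dentry, inode, LL_XATTR_NAME_ENCRYPTION_CONTEXT,
			    ctx, len, XATTR_CREATE);

	/* ll_xattr_set_common(): reject any other attempt to set the
	 * context; test_and_clear_bit() consumes the flag atomically so
	 * a policy-driven write cannot be replayed later.
	 */
	if (!test_and_clear_bit(LLIF_SET_ENC_CTX,
				&ll_i2info(inode)->lli_flags))
		return -EPERM;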

WC-bug-id: https://jira.whamcloud.com/browse/LU-14401
Lustre-commit: 67c4cffac6dbd30c ("LU-14401 sec: fix migrate for encrypted dir")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/41413
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/crypto.c         |  1 +
 fs/lustre/llite/file.c           | 12 ++++++++++++
 fs/lustre/llite/llite_internal.h |  8 +++++---
 fs/lustre/llite/xattr.c          | 11 +++++++++++
 4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/llite/crypto.c b/fs/lustre/llite/crypto.c
index 0598b3c..8bbb766 100644
--- a/fs/lustre/llite/crypto.c
+++ b/fs/lustre/llite/crypto.c
@@ -104,6 +104,7 @@ static int ll_set_context(struct inode *inode, const void *ctx, size_t len,
 		return -EPERM;
 
 	dentry = (struct dentry *)fs_data;
+	set_bit(LLIF_SET_ENC_CTX, &ll_i2info(inode)->lli_flags);
 	rc = __vfs_setxattr(dentry, inode, LL_XATTR_NAME_ENCRYPTION_CONTEXT,
 			    ctx, len, XATTR_CREATE);
 	if (rc)
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 767eafa..225008e 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -4455,6 +4455,18 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum,
 		goto out_iput;
 	}
 
+	if (IS_ENCRYPTED(child_inode)) {
+		rc = llcrypt_get_encryption_info(child_inode);
+		if (rc)
+			goto out_iput;
+		if (!llcrypt_has_encryption_key(child_inode)) {
+			CDEBUG(D_SEC, "no enc key for "DFID"\n",
+			       PFID(ll_inode2fid(child_inode)));
+			rc = -ENOKEY;
+			goto out_iput;
+		}
+	}
+
 	op_data = ll_prep_md_op_data(NULL, parent, NULL, name, namelen,
 				     child_inode->i_mode, LUSTRE_OPC_ANY, NULL);
 	if (IS_ERR(op_data)) {
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 0d97253..dc9ea03 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -97,12 +97,14 @@ enum ll_file_flags {
 	LLIF_FILE_RESTORING	= 1,
 	/* Xattr cache is attached to the file */
 	LLIF_XATTR_CACHE	= 2,
+	/* Project inherit */
+	LLIF_PROJECT_INHERIT	= 3,
 	/* update atime from MDS no matter if it's older than
 	 * local inode atime.
 	 */
-	LLIF_UPDATE_ATIME,
-	/* Project inherit */
-	LLIF_PROJECT_INHERIT	= 3,
+	LLIF_UPDATE_ATIME	= 4,
+	/* setting encryption context in progress */
+	LLIF_SET_ENC_CTX	= 6,
 };
 
 /* See comment on trunc_sem_down_read_nowait */
diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c
index 119fb26..7004893 100644
--- a/fs/lustre/llite/xattr.c
+++ b/fs/lustre/llite/xattr.c
@@ -133,6 +133,17 @@ static int ll_xattr_set_common(const struct xattr_handler *handler,
 			return -EPERM;
 	}
 
+	/* Setting LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr is only allowed
+	 * when defining an encryption policy on a directory, ie when it
+	 * comes from ll_set_context().
+	 * When new files/dirs are created in an encrypted dir, the xattr
+	 * is set directly in the create request.
+	 */
+	if (handler->flags == XATTR_SECURITY_T &&
+	    !strcmp(name, "c") &&
+	    !test_and_clear_bit(LLIF_SET_ENC_CTX, &ll_i2info(inode)->lli_flags))
+		return -EPERM;
+
 	fullname = kasprintf(GFP_KERNEL, "%s%s", xattr_prefix(handler), name);
 	if (!fullname)
 		return -ENOMEM;
-- 
1.8.3.1


* [lustre-devel] [PATCH 22/49] lnet: libcfs: restore LNET_DUMP_ON_PANIC functionality.
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (20 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 21/49] lustre: sec: fix migrate for encrypted dir James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 23/49] lustre: ptlrpc: fix ASSERTION on scp_rqbd_posted James Simmons
                   ` (26 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The functionality enabled by CONFIG_LNET_DUMP_ON_PANIC was never
implemented for the Linux client.

Restore this functionality.

While we are here, add conditional compilation for other code that is
only needed when this option is enabled.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14427
Lustre-commit: f9b75c5fb4d4e397 ("LU-14427 libcfs: restore LNET_DUMP_ON_PANIC functionality.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41488
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/Kconfig            | 9 +++++++++
 net/lnet/libcfs/debug.c     | 9 +++++++++
 net/lnet/libcfs/tracefile.c | 2 ++
 3 files changed, 20 insertions(+)

diff --git a/net/lnet/Kconfig b/net/lnet/Kconfig
index 6062a82..b5ee5fa 100644
--- a/net/lnet/Kconfig
+++ b/net/lnet/Kconfig
@@ -8,6 +8,15 @@ config LNET
 	  case of Lustre routers only the LNet layer is required. Lately other
 	  projects are also looking into using LNet as their networking API as well.
 
+config LNET_DUMP_ON_PANIC
+	bool "LNet dump logs on panic"
+	depends on LNET
+	help
+	  Special functionality to enable collecting extra logs when LNet panics.
+	  Normally only used by developers for debugging purposes.
+
+	  If unsure, say N.
+
 config LNET_SELFTEST
 	tristate "Lustre networking self testing"
 	depends on LNET
diff --git a/net/lnet/libcfs/debug.c b/net/lnet/libcfs/debug.c
index e68dd91..e519fdb 100644
--- a/net/lnet/libcfs/debug.c
+++ b/net/lnet/libcfs/debug.c
@@ -503,6 +503,15 @@ static int panic_notifier(struct notifier_block *self, unsigned long unused1,
 	libcfs_panic_in_progress = 1;
 	mb();
 
+#ifdef CONFIG_LNET_DUMP_ON_PANIC
+	/* This is currently disabled because it spews far too much to the
+	 * console on the rare cases it is ever triggered.
+	 */
+	if (in_interrupt())
+		cfs_trace_debug_print();
+	else
+		libcfs_debug_dumplog_internal((void *)(long)current->pid);
+#endif
 	return 0;
 }
 
diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c
index 4e1900d..32bab98 100644
--- a/net/lnet/libcfs/tracefile.c
+++ b/net/lnet/libcfs/tracefile.c
@@ -764,6 +764,7 @@ static void put_pages_on_daemon_list(struct page_collection *pc)
 	}
 }
 
+#ifdef CONFIG_LNET_DUMP_ON_PANIC
 void cfs_trace_debug_print(void)
 {
 	struct page_collection pc;
@@ -801,6 +802,7 @@ void cfs_trace_debug_print(void)
 		cfs_tage_free(tage);
 	}
 }
+#endif /* CONFIG_LNET_DUMP_ON_PANIC */
 
 int cfs_tracefile_dump_all_pages(char *filename)
 {
-- 
1.8.3.1


* [lustre-devel] [PATCH 23/49] lustre: ptlrpc: fix ASSERTION on scp_rqbd_posted
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (21 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 22/49] lnet: libcfs: restore LNET_DUMP_ON_PANIC functionality James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 24/49] lustre: ldlm: not freed req on enqueue James Simmons
                   ` (25 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Yang Sheng, Lustre Development List

From: Yang Sheng <ys@whamcloud.com>

The request may still be referenced by another target even after the
service threads have stopped. This is caused by some portals being
shared among different services. As a workaround, just wait for the
request to be released.

LustreError: (service.c::ptlrpc_service_purge_all())
    ASSERTION( list_empty(&svcpt->scp_rqbd_posted) ) failed:
LustreError: (service.c::ptlrpc_service_purge_all()) LBUG
Pid: 21, comm: umount 3.10.0 #1 SMP
Call Trace:
  [<a01c47dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
  [<a01c488c>] lbug_with_loc+0x4c/0xa0 [libcfs]
  [<a0b534dd>] ptlrpc_unregister_service+0xced/0xd90 [ptlrpc]
  [<a005e122>] ost_cleanup+0x82/0x1b0 [ost]
  [<a08e0bfa>] class_free_dev+0x1ca/0x630 [obdclass]
  [<a08e1240>] class_export_put+0x1e0/0x2b0 [obdclass]
  [<a08e2cc5>] class_unlink_export+0x135/0x170 [obdclass]
  [<a08f8030>] class_decref+0x80/0x160 [obdclass]
  [<a08f8481>] class_detach+0x1b1/0x2e0 [obdclass]
  [<a08fef21>] class_process_config+0x1a91/0x2820 [obdclass]
  [<a08ffe90>] class_manual_cleanup+0x1e0/0x6d0 [obdclass]
  [<a092a115>] server_stop_servers+0xd5/0x160 [obdclass]
  [<a092f6c6>] server_put_super+0x126/0xca0 [obdclass]
  [<8121068a>] generic_shutdown_super+0x6a/0xf0
  [<81210a62>] kill_anon_super+0x12/0x20
  [<a09027e2>] lustre_kill_super+0x32/0x50 [obdclass]
  [<81210e59>] deactivate_locked_super+0x49/0x60
  [<812115a6>] deactivate_super+0x46/0x60
  [<8123019f>] cleanup_mnt+0x3f/0x80
  [<81230232>] __cleanup_mnt+0x12/0x20
  [<810ab085>] task_work_run+0xb5/0xf0
  [<8102ac12>] do_notify_resume+0x92/0xb0
  [<81783c83>] int_signal+0x12/0x17
   Kernel panic - not syncing: LBUG

WC-bug-id: https://jira.whamcloud.com/browse/LU-11289
Lustre-commit: b635a0435d13d843 ("LU-11289 ptlrpc: fix ASSERTION on scp_rqbd_posted")
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41936
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/service.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c
index f3f94d4..427215c 100644
--- a/fs/lustre/ptlrpc/service.c
+++ b/fs/lustre/ptlrpc/service.c
@@ -2922,7 +2922,23 @@ static void ptlrpc_wait_replies(struct ptlrpc_service_part *svcpt)
 			ptlrpc_server_finish_active_request(svcpt, req);
 		}
 
-		LASSERT(list_empty(&svcpt->scp_rqbd_posted));
+		/*
+		 * The portal may be shared by several services (eg: OUT_PORTAL),
+		 * so the request could still be referenced by another target.
+		 * We have to wait for ptlrpc_server_drop_request() to be invoked.
+		 *
+		 * TODO: make the req_buffer global rather than per service.
+		 */
+		spin_lock(&svcpt->scp_lock);
+		while (!list_empty(&svcpt->scp_rqbd_posted)) {
+			spin_unlock(&svcpt->scp_lock);
+			wait_event_idle_timeout(svcpt->scp_waitq,
+				list_empty(&svcpt->scp_rqbd_posted),
+				HZ);
+			spin_lock(&svcpt->scp_lock);
+		}
+		spin_unlock(&svcpt->scp_lock);
+
 		LASSERT(svcpt->scp_nreqs_incoming == 0);
 		LASSERT(svcpt->scp_nreqs_active == 0);
 		/*
-- 
1.8.3.1


* [lustre-devel] [PATCH 24/49] lustre: ldlm: not freed req on enqueue
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (22 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 23/49] lustre: ptlrpc: fix ASSERTION on scp_rqbd_posted James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 25/49] lnet: uapi: move userland only nidstr.h handling James Simmons
                   ` (24 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Vitaly Fertman, Lustre Development List

From: Vitaly Fertman <c17818@cray.com>

ldlm_cli_enqueue() may allocate a req but then fail to allocate a req
slot and return an error without freeing the req.

Fixes: b16c40c836 ("lustre: ldlm: FLOCK request can be processed twice")
HPE-bug-id: LUS-9337
WC-bug-id: https://jira.whamcloud.com/browse/LU-12828
Lustre-commit: ce9c1c11593814da ("LU-12828 ldlm: not freed req on enqueue")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://es-gerrit.dev.cray.com/158433
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/41818
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_request.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index 1c2ecf2..d8ca744 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -840,6 +840,8 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp,
 				ptlrpc_put_mod_rpc_slot(req);
 			failed_lock_cleanup(ns, lock, einfo->ei_mode);
 			LDLM_LOCK_RELEASE(lock);
+			if (!req_passed_in)
+				ptlrpc_req_finished(req);
 			goto out;
 		}
 	}
-- 
1.8.3.1


* [lustre-devel] [PATCH 25/49] lnet: uapi: move userland only nidstr.h handling
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (23 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 24/49] lustre: ldlm: not freed req on enqueue James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 26/49] lnet: libcfs: don't depend on sysctl support for debugfs James Simmons
                   ` (23 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

The function cfs_expand_nidlist() no longer exists in the kernel code,
so its prototype can be moved from the UAPI header to string.h, which
is a libcfs userland header.
The struct netstrfns that is defined in a UAPI header has been
accumulating userland-only handling. In addition, it uses
struct list_head, which will confuse reviewers since kernel
developers see it as a kernel-only type.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13903
Lustre-commit: 062809b1313ac700 ("LU-13903 utils: move userland only nidstr.h handling")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39115
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 include/linux/lnet/lib-types.h   | 15 +++++++++++++++
 include/uapi/linux/lnet/nidstr.h | 15 ---------------
 net/lnet/lnet/nidstrings.c       |  1 +
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 2424993..cc451cf 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -238,6 +238,21 @@ struct lnet_test_peer {
 #define LNET_COOKIE_TYPE_BITS	2
 #define LNET_COOKIE_MASK	((1ULL << LNET_COOKIE_TYPE_BITS) - 1ULL)
 
+struct netstrfns {
+	u32	nf_type;
+	char	*nf_name;
+	char	*nf_modname;
+	void	(*nf_addr2str)(u32 addr, char *str, size_t size);
+	int	(*nf_str2addr)(const char *str, int nob, u32 *addr);
+	int	(*nf_parse_addrlist)(char *str, int len,
+				     struct list_head *list);
+	int	(*nf_print_addrlist)(char *buffer, int count,
+				     struct list_head *list);
+	int	(*nf_match_addr)(u32 addr, struct list_head *list);
+	int	(*nf_min_max)(struct list_head *nidlist, u32 *min_nid,
+			      u32 *max_nid);
+};
+
 struct lnet_ni;			/* forward ref */
 
 struct lnet_lnd {
diff --git a/include/uapi/linux/lnet/nidstr.h b/include/uapi/linux/lnet/nidstr.h
index d402a6a..caf28e2 100644
--- a/include/uapi/linux/lnet/nidstr.h
+++ b/include/uapi/linux/lnet/nidstr.h
@@ -108,19 +108,4 @@ int cfs_match_net(__u32 net_id, __u32 net_type,
 int cfs_nidrange_find_min_max(struct list_head *nidlist, char *min_nid,
 			      char *max_nid, __kernel_size_t nidstr_length);
 
-struct netstrfns {
-	__u32	nf_type;
-	char	*nf_name;
-	char	*nf_modname;
-	void	(*nf_addr2str)(__u32 addr, char *str, __kernel_size_t size);
-	int	(*nf_str2addr)(const char *str, int nob, __u32 *addr);
-	int	(*nf_parse_addrlist)(char *str, int len,
-				     struct list_head *list);
-	int	(*nf_print_addrlist)(char *buffer, int count,
-				     struct list_head *list);
-	int	(*nf_match_addr)(__u32 addr, struct list_head *list);
-	int	(*nf_min_max)(struct list_head *nidlist, __u32 *min_nid,
-			      __u32 *max_nid);
-};
-
 #endif /* _LNET_NIDSTRINGS_H */
diff --git a/net/lnet/lnet/nidstrings.c b/net/lnet/lnet/nidstrings.c
index b1cd86b..cce2ae4 100644
--- a/net/lnet/lnet/nidstrings.c
+++ b/net/lnet/lnet/nidstrings.c
@@ -42,6 +42,7 @@
 #include <linux/libcfs/libcfs.h>
 #include <linux/libcfs/libcfs_string.h>
 #include <uapi/linux/lnet/nidstr.h>
+#include <linux/lnet/lib-types.h>
 
 /* max value for numeric network address */
 #define MAX_NUMERIC_VALUE 0xffffffff
-- 
1.8.3.1


* [lustre-devel] [PATCH 26/49] lnet: libcfs: don't depend on sysctl support for debugfs
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (24 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 25/49] lnet: uapi: move userland only nidstr.h handling James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 27/49] lustre: ptlrpc: Add a binary heap implementation James Simmons
                   ` (22 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Since Linux v5.8-rc1~55^2~6, sysctl support routines like
proc_dointvec() expect a pointer to kernel space, not user space.

So stop using these functions for debugfs files, and instead
provide bespoke functions.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13783
Lustre-commit: d707b390aec5e95a ("LU-13783 libcfs: don't depend on sysctl support for debugfs")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/40832
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/libcfs/libcfs.h |   3 ++
 net/lnet/libcfs/module.c      | 114 +++++++++++++++++++++++++++++++++++++++---
 net/lnet/lnet/router_proc.c   |   2 +-
 3 files changed, 110 insertions(+), 9 deletions(-)

diff --git a/include/linux/libcfs/libcfs.h b/include/linux/libcfs/libcfs.h
index c77e04b..c98d7a1 100644
--- a/include/linux/libcfs/libcfs.h
+++ b/include/linux/libcfs/libcfs.h
@@ -61,6 +61,9 @@ static inline int notifier_from_ioctl_errno(int err)
 void lnet_insert_debugfs(struct ctl_table *table);
 void lnet_remove_debugfs(struct ctl_table *table);
 
+int debugfs_doint(struct ctl_table *table, int write,
+		  void __user *buffer, size_t *lenp, loff_t *ppos);
+
 /*
  * Memory
  */
diff --git a/net/lnet/libcfs/module.c b/net/lnet/libcfs/module.c
index ad6935c..f9cc6df 100644
--- a/net/lnet/libcfs/module.c
+++ b/net/lnet/libcfs/module.c
@@ -377,13 +377,36 @@ static int libcfs_force_lbug(struct ctl_table *table, int write,
 }
 
 static int proc_fail_loc(struct ctl_table *table, int write,
-			 void __user *buffer,
-			 size_t *lenp, loff_t *ppos)
+			 void __user *buffer, size_t *lenp, loff_t *ppos)
 {
 	int rc;
 	long old_fail_loc = cfs_fail_loc;
 
-	rc = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
+	if (!*lenp || *ppos) {
+		*lenp = 0;
+		return 0;
+	}
+
+	if (write) {
+		char *kbuf = memdup_user_nul(buffer, *lenp);
+
+		if (IS_ERR(kbuf))
+			return PTR_ERR(kbuf);
+		rc = kstrtoul(kbuf, 0, &cfs_fail_loc);
+		kfree(kbuf);
+		*ppos += *lenp;
+	} else {
+		char kbuf[64 / 3 + 3];
+
+		rc = scnprintf(kbuf, sizeof(kbuf), "%lu\n", cfs_fail_loc);
+		if (copy_to_user(buffer, kbuf, rc)) {
+			rc = -EFAULT;
+		} else {
+			*lenp = rc;
+			*ppos += rc;
+		}
+	}
+
 	if (old_fail_loc != cfs_fail_loc) {
 		cfs_race_state = 1;
 		wake_up(&cfs_race_waitq);
@@ -391,6 +414,81 @@ static int proc_fail_loc(struct ctl_table *table, int write,
 	return rc;
 }
 
+int debugfs_doint(struct ctl_table *table, int write,
+		  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int rc;
+
+	if (!*lenp || *ppos) {
+		*lenp = 0;
+		return 0;
+	}
+
+	if (write) {
+		char *kbuf = memdup_user_nul(buffer, *lenp);
+		int val;
+
+		if (IS_ERR(kbuf))
+			return PTR_ERR(kbuf);
+
+		rc = kstrtoint(kbuf, 0, &val);
+		kfree(kbuf);
+		if (!rc) {
+			if (table->extra1 && val < *(int *)table->extra1)
+				val = *(int *)table->extra1;
+			if (table->extra2 && val > *(int *)table->extra2)
+				val = *(int *)table->extra2;
+			*(int *)table->data = val;
+		}
+		*ppos += *lenp;
+	} else {
+		char kbuf[64 / 3 + 3];
+
+		rc = scnprintf(kbuf, sizeof(kbuf), "%u\n", *(int *)table->data);
+		if (copy_to_user(buffer, kbuf, rc)) {
+			rc = -EFAULT;
+		} else {
+			*lenp = rc;
+			*ppos += rc;
+		}
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL(debugfs_doint);
+
+static int debugfs_dostring(struct ctl_table *table, int write,
+			    void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int len = *lenp;
+	char *kbuf = table->data;
+
+	if (!len || *ppos) {
+		*lenp = 0;
+		return 0;
+	}
+	if (len > table->maxlen)
+		len = table->maxlen;
+	if (write) {
+		if (copy_from_user(kbuf, buffer, len))
+			return -EFAULT;
+		memset(kbuf + len, 0, table->maxlen - len);
+		*ppos = *lenp;
+	} else {
+		len = strnlen(kbuf, len);
+		if (copy_to_user(buffer, kbuf, len))
+			return -EFAULT;
+		if (len < *lenp) {
+			if (copy_to_user(buffer + len, "\n", 1))
+				return -EFAULT;
+			len += 1;
+		}
+		*ppos += len;
+		*lenp -= len;
+	}
+	return len;
+}
+
 static int proc_cpt_table(struct ctl_table *table, int write,
 			  void __user *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -512,14 +610,14 @@ static int proc_cpt_distance(struct ctl_table *table, int write,
 		.data		= lnet_debug_log_upcall,
 		.maxlen		= sizeof(lnet_debug_log_upcall),
 		.mode		= 0644,
-		.proc_handler	= &proc_dostring,
+		.proc_handler	= &debugfs_dostring,
 	},
 	{
 		.procname	= "catastrophe",
 		.data		= &libcfs_catastrophe,
 		.maxlen		= sizeof(int),
 		.mode		= 0444,
-		.proc_handler	= &proc_dointvec,
+		.proc_handler	= &debugfs_doint,
 	},
 	{
 		.procname	= "dump_kernel",
@@ -538,7 +636,7 @@ static int proc_cpt_distance(struct ctl_table *table, int write,
 		.data		= &libcfs_watchdog_ratelimit,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= &proc_dointvec_minmax,
+		.proc_handler	= &debugfs_doint,
 		.extra1		= &min_watchdog_ratelimit,
 		.extra2		= &max_watchdog_ratelimit,
 	},
@@ -561,14 +659,14 @@ static int proc_cpt_distance(struct ctl_table *table, int write,
 		.data		= &cfs_fail_val,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= &proc_dointvec
+		.proc_handler	= &debugfs_doint
 	},
 	{
 		.procname	= "fail_err",
 		.data		= &cfs_fail_err,
 		.maxlen		= sizeof(cfs_fail_err),
 		.mode		= 0644,
-		.proc_handler	= &proc_dointvec,
+		.proc_handler	= &debugfs_doint,
 	},
 	{
 	}
diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c
index 96cc506..623899e 100644
--- a/net/lnet/lnet/router_proc.c
+++ b/net/lnet/lnet/router_proc.c
@@ -880,7 +880,7 @@ static int proc_lnet_portal_rotor(struct ctl_table *table, int write,
 		.data		= &lnet_lnd_timeout,
 		.maxlen		= sizeof(lnet_lnd_timeout),
 		.mode		= 0444,
-		.proc_handler	= &proc_dointvec,
+		.proc_handler	= &debugfs_doint,
 	},
 	{
 	}
-- 
1.8.3.1


* [lustre-devel] [PATCH 27/49] lustre: ptlrpc: Add a binary heap implementation
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (25 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 26/49] lnet: libcfs: don't depend on sysctl support for debugfs James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 28/49] lustre: ptlrpc: Implement NRS Delay Policy James Simmons
                   ` (21 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Nikitas Angelinas, Lustre Development List

From: Nikitas Angelinas <nikitas_angelinas@xyratex.com>

The heap can be used to build and maintain sorted arrays and lists,
and prioritized queues of large numbers of elements, with minimal
insertion and removal time. The first user for the data structure are
NRS policies, which use it to maintain prioritized queues of RPCs at
PTLRPC services.

There is no 'search' operation, but the data type aims to be useful
in cases where performing searches on the data set is not required,
and instead the lowest priority element is usually removed from the
data set for consumption.
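
As a usage sketch only (the element type and caller below are
hypothetical; the API names come from heap.h added by this patch), an
NRS policy embeds a cfs_binheap_node in its request wrapper, supplies
a compare callback, and consumes elements in priority order:

	struct my_req {					/* hypothetical element type */
		struct cfs_binheap_node	mr_node;
		u64			mr_key;
	};

	/* nonzero means "a" ranks before "b" (closer to the root),
	 * which yields a min-heap on mr_key.
	 */
	static int my_req_compare(struct cfs_binheap_node *a,
				  struct cfs_binheap_node *b)
	{
		return container_of(a, struct my_req, mr_node)->mr_key <
		       container_of(b, struct my_req, mr_node)->mr_key;
	}

	static struct cfs_binheap_ops my_req_ops = {
		.hop_compare	= my_req_compare,
	};

	/* heap instances are tied to one CPT; callers do their own locking */
	heap = cfs_binheap_create(&my_req_ops, CBH_FLAG_ATOMIC_GROW,
				  256, NULL, cptab, cptid);
	cfs_binheap_insert(heap, &req->mr_node);
	node = cfs_binheap_remove_root(heap);	/* lowest-priority element */
	cfs_binheap_destroy(heap);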

WC-bug-id: https://jira.whamcloud.com/browse/LU-398
Lustre-commit: 2070cf5996a140 ("LU-398 libcfs: Add libcfs heap, a binary heap implementation")
Original-author: Eric Barton <eeb@whamcloud.com>
Original-author: Liang Zhen <liang@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Oracle-bug-id: b=13634
Xyratex-bug-id: MRP-73
Reviewed-on: http://review.whamcloud.com/4412
Reviewed-by: Liang Zhen <liang@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 fs/lustre/include/lustre_nrs.h     |  10 +
 fs/lustre/ptlrpc/Makefile          |   3 +-
 fs/lustre/ptlrpc/heap.c            | 502 +++++++++++++++++++++++++++++++++++++
 fs/lustre/ptlrpc/heap.h            | 189 ++++++++++++++
 fs/lustre/ptlrpc/ptlrpc_internal.h |   1 +
 5 files changed, 704 insertions(+), 1 deletion(-)
 create mode 100644 fs/lustre/ptlrpc/heap.c
 create mode 100644 fs/lustre/ptlrpc/heap.h

diff --git a/fs/lustre/include/lustre_nrs.h b/fs/lustre/include/lustre_nrs.h
index f57756a..f15fb03 100644
--- a/fs/lustre/include/lustre_nrs.h
+++ b/fs/lustre/include/lustre_nrs.h
@@ -671,6 +671,16 @@ enum {
 };
 
 #include <lustre_nrs_fifo.h>
+/**
+ * Binary heap node.
+ *
+ * Objects of this type are embedded into objects of the ordered set that is to
+ * be maintained by a \e struct cfs_binheap instance.
+ */
+struct cfs_binheap_node {
+	/** Index into the binary tree */
+	unsigned int	chn_index;
+};
 
 /**
  * NRS request
diff --git a/fs/lustre/ptlrpc/Makefile b/fs/lustre/ptlrpc/Makefile
index b989146..adffb231 100644
--- a/fs/lustre/ptlrpc/Makefile
+++ b/fs/lustre/ptlrpc/Makefile
@@ -15,7 +15,8 @@ ptlrpc_objs += events.o ptlrpc_module.o service.o pinger.o
 ptlrpc_objs += llog_net.o llog_client.o import.o ptlrpcd.o
 ptlrpc_objs += pers.o lproc_ptlrpc.o wiretest.o layout.o
 ptlrpc_objs += sec.o sec_bulk.o sec_gc.o sec_config.o
-ptlrpc_objs += sec_null.o sec_plain.o nrs.o nrs_fifo.o
+ptlrpc_objs += sec_null.o sec_plain.o
+ptlrpc_objs += heap.o nrs.o nrs_fifo.o
 
 ptlrpc-y := $(ldlm_objs) $(ptlrpc_objs) sec_lproc.o
 ptlrpc-$(CONFIG_LUSTRE_TRANSLATE_ERRNOS) += errno.o
diff --git a/fs/lustre/ptlrpc/heap.c b/fs/lustre/ptlrpc/heap.c
new file mode 100644
index 0000000..92f8a2e
--- /dev/null
+++ b/fs/lustre/ptlrpc/heap.c
@@ -0,0 +1,502 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License version 2 for more details.  A copy is
+ * included in the COPYING file that accompanied this code.
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2011 Intel Corporation
+ */
+/*
+ * fs/lustre/ptlrpc/heap.c
+ *
+ * Author: Eric Barton	<eeb@whamcloud.com>
+ *	   Liang Zhen	<liang@whamcloud.com>
+ */
+/** \addtogroup heap
+ *
+ * @{
+ */
+
+#define DEBUG_SUBSYSTEM S_RPC
+
+#include <linux/libcfs/libcfs_cpu.h>
+#include <lustre_net.h>
+#include "heap.h"
+
+#define CBH_ALLOC(ptr, h)						      \
+do {									      \
+	if (h->cbh_cptab) {						      \
+		if ((h)->cbh_flags & CBH_FLAG_ATOMIC_GROW) {		      \
+			ptr = kzalloc_node(CBH_NOB, GFP_ATOMIC,		      \
+					   cfs_cpt_spread_node(h->cbh_cptab,  \
+							       h->cbh_cptid));\
+		} else {						      \
+			ptr = kzalloc_node(CBH_NOB, GFP_KERNEL,		      \
+					   cfs_cpt_spread_node(h->cbh_cptab,  \
+							       h->cbh_cptid));\
+		}							      \
+	} else {							      \
+		if ((h)->cbh_flags & CBH_FLAG_ATOMIC_GROW)		      \
+			ptr = kzalloc(CBH_NOB, GFP_ATOMIC);		      \
+		else							      \
+			ptr = kzalloc(CBH_NOB, GFP_KERNEL);		      \
+	}								      \
+} while (0)
+
+#define CBH_FREE(ptr)	kfree(ptr)
+
+/**
+ * Grows the capacity of a binary heap so that it can handle a larger number of
+ * \e struct cfs_binheap_node objects.
+ *
+ * \param[in] h The binary heap
+ *
+ * \retval 0	   Successfully grew the heap
+ * \retval -ENOMEM OOM error
+ */
+static int
+cfs_binheap_grow(struct cfs_binheap *h)
+{
+	struct cfs_binheap_node ***frag1 = NULL;
+	struct cfs_binheap_node  **frag2;
+	int hwm = h->cbh_hwm;
+
+	/* need a whole new chunk of pointers */
+	LASSERT((h->cbh_hwm & CBH_MASK) == 0);
+
+	if (hwm == 0) {
+		/* first use of single indirect */
+		CBH_ALLOC(h->cbh_elements1, h);
+		if (!h->cbh_elements1)
+			return -ENOMEM;
+
+		goto out;
+	}
+
+	hwm -= CBH_SIZE;
+	if (hwm < CBH_SIZE * CBH_SIZE) {
+		/* not filled double indirect */
+		CBH_ALLOC(frag2, h);
+		if (!frag2)
+			return -ENOMEM;
+
+		if (hwm == 0) {
+			/* first use of double indirect */
+			CBH_ALLOC(h->cbh_elements2, h);
+			if (!h->cbh_elements2) {
+				CBH_FREE(frag2);
+				return -ENOMEM;
+			}
+		}
+
+		h->cbh_elements2[hwm >> CBH_SHIFT] = frag2;
+		goto out;
+	}
+
+	hwm -= CBH_SIZE * CBH_SIZE;
+#if (CBH_SHIFT * 3 < 32)
+	if (hwm >= CBH_SIZE * CBH_SIZE * CBH_SIZE) {
+		/* filled triple indirect */
+		return -ENOMEM;
+	}
+#endif
+	CBH_ALLOC(frag2, h);
+	if (!frag2)
+		return -ENOMEM;
+
+	if (((hwm >> CBH_SHIFT) & CBH_MASK) == 0) {
+		/* first use of this 2nd level index */
+		CBH_ALLOC(frag1, h);
+		if (!frag1) {
+			CBH_FREE(frag2);
+			return -ENOMEM;
+		}
+	}
+
+	if (hwm == 0) {
+		/* first use of triple indirect */
+		CBH_ALLOC(h->cbh_elements3, h);
+		if (!h->cbh_elements3) {
+			CBH_FREE(frag2);
+			CBH_FREE(frag1);
+			return -ENOMEM;
+		}
+	}
+
+	if (frag1) {
+		LASSERT(!h->cbh_elements3[hwm >> (2 * CBH_SHIFT)]);
+		h->cbh_elements3[hwm >> (2 * CBH_SHIFT)] = frag1;
+	} else {
+		frag1 = h->cbh_elements3[hwm >> (2 * CBH_SHIFT)];
+		LASSERT(frag1);
+	}
+
+	frag1[(hwm >> CBH_SHIFT) & CBH_MASK] = frag2;
+
+ out:
+	h->cbh_hwm += CBH_SIZE;
+	return 0;
+}
+
+/**
+ * Creates and initializes a binary heap instance.
+ *
+ * \param[in] ops   The operations to be used
+ * \param[in] flags The heap flags
+ * \parm[in]  count The initial heap capacity in # of elements
+ * \param[in] arg   An optional private argument
+ * \param[in] cptab The CPT table this heap instance will operate over
+ * \param[in] cptid The CPT id of \a cptab this heap instance will operate over
+ *
+ * \retval valid-pointer A newly-created and initialized binary heap object
+ * \retval NULL		 error
+ */
+struct cfs_binheap *
+cfs_binheap_create(struct cfs_binheap_ops *ops, unsigned int flags,
+		   unsigned int count, void *arg, struct cfs_cpt_table *cptab,
+		   int cptid)
+{
+	struct cfs_binheap *h;
+
+	LASSERT(ops);
+	LASSERT(ops->hop_compare);
+	if (cptab) {
+		LASSERT(cptid == CFS_CPT_ANY ||
+		       (cptid >= 0 && cptid < cfs_cpt_number(cptab)));
+
+		h = kzalloc_node(sizeof(*h), GFP_KERNEL,
+				 cfs_cpt_spread_node(cptab, cptid));
+	} else {
+		h = kzalloc(sizeof(*h), GFP_KERNEL);
+	}
+	if (!h)
+		return NULL;
+
+	h->cbh_ops	  = ops;
+	h->cbh_nelements  = 0;
+	h->cbh_hwm	  = 0;
+	h->cbh_private	  = arg;
+	h->cbh_flags	  = flags & (~CBH_FLAG_ATOMIC_GROW);
+	h->cbh_cptab	  = cptab;
+	h->cbh_cptid	  = cptid;
+
+	while (h->cbh_hwm < count) { /* preallocate */
+		if (cfs_binheap_grow(h) != 0) {
+			cfs_binheap_destroy(h);
+			return NULL;
+		}
+	}
+
+	h->cbh_flags |= flags & CBH_FLAG_ATOMIC_GROW;
+
+	return h;
+}
+EXPORT_SYMBOL(cfs_binheap_create);
+
+/**
+ * Releases all resources associated with a binary heap instance.
+ *
+ * Deallocates memory for all indirection levels and the binary heap object
+ * itself.
+ *
+ * \param[in] h The binary heap object
+ */
+void
+cfs_binheap_destroy(struct cfs_binheap *h)
+{
+	int idx0;
+	int idx1;
+	int n;
+
+	LASSERT(h);
+
+	n = h->cbh_hwm;
+
+	if (n > 0) {
+		CBH_FREE(h->cbh_elements1);
+		n -= CBH_SIZE;
+	}
+
+	if (n > 0) {
+		for (idx0 = 0; idx0 < CBH_SIZE && n > 0; idx0++) {
+			CBH_FREE(h->cbh_elements2[idx0]);
+			n -= CBH_SIZE;
+		}
+
+		CBH_FREE(h->cbh_elements2);
+	}
+
+	if (n > 0) {
+		for (idx0 = 0; idx0 < CBH_SIZE && n > 0; idx0++) {
+
+			for (idx1 = 0; idx1 < CBH_SIZE && n > 0; idx1++) {
+				CBH_FREE(h->cbh_elements3[idx0][idx1]);
+				n -= CBH_SIZE;
+			}
+
+			CBH_FREE(h->cbh_elements3[idx0]);
+		}
+
+		CBH_FREE(h->cbh_elements3);
+	}
+
+	kfree(h);
+}
+EXPORT_SYMBOL(cfs_binheap_destroy);
+
+/**
+ * Obtains a double pointer to a heap element, given its index into the binary
+ * tree.
+ *
+ * \param[in] h	  The binary heap instance
+ * \param[in] idx The requested node's index
+ *
+ * \retval valid-pointer A double pointer to a heap pointer entry
+ */
+static struct cfs_binheap_node **
+cfs_binheap_pointer(struct cfs_binheap *h, unsigned int idx)
+{
+	if (idx < CBH_SIZE)
+		return &(h->cbh_elements1[idx]);
+
+	idx -= CBH_SIZE;
+	if (idx < CBH_SIZE * CBH_SIZE)
+		return &(h->cbh_elements2[idx >> CBH_SHIFT][idx & CBH_MASK]);
+
+	idx -= CBH_SIZE * CBH_SIZE;
+	return &(h->cbh_elements3[idx >> (2 * CBH_SHIFT)]
+				 [(idx >> CBH_SHIFT) & CBH_MASK]
+				 [idx & CBH_MASK]);
+}
+
+/**
+ * Obtains a pointer to a heap element, given its index into the binary tree.
+ *
+ * \param[in] h	  The binary heap
+ * \param[in] idx The requested node's index
+ *
+ * \retval valid-pointer The requested heap node
+ * \retval NULL		 Supplied index is out of bounds
+ */
+struct cfs_binheap_node *
+cfs_binheap_find(struct cfs_binheap *h, unsigned int idx)
+{
+	if (idx >= h->cbh_nelements)
+		return NULL;
+
+	return *cfs_binheap_pointer(h, idx);
+}
+EXPORT_SYMBOL(cfs_binheap_find);
+
+/**
+ * Moves a node upwards, towards the root of the binary tree.
+ *
+ * \param[in] h The heap
+ * \param[in] e The node
+ *
+ * \retval 1 The position of \a e in the tree was changed at least once
+ * \retval 0 The position of \a e in the tree was not changed
+ */
+static int
+cfs_binheap_bubble(struct cfs_binheap *h, struct cfs_binheap_node *e)
+{
+	unsigned int	     cur_idx = e->chn_index;
+	struct cfs_binheap_node **cur_ptr;
+	unsigned int	     parent_idx;
+	struct cfs_binheap_node **parent_ptr;
+	int		     did_sth = 0;
+
+	cur_ptr = cfs_binheap_pointer(h, cur_idx);
+	LASSERT(*cur_ptr == e);
+
+	while (cur_idx > 0) {
+		parent_idx = (cur_idx - 1) >> 1;
+
+		parent_ptr = cfs_binheap_pointer(h, parent_idx);
+		LASSERT((*parent_ptr)->chn_index == parent_idx);
+
+		if (h->cbh_ops->hop_compare(*parent_ptr, e))
+			break;
+
+		(*parent_ptr)->chn_index = cur_idx;
+		*cur_ptr = *parent_ptr;
+		cur_ptr = parent_ptr;
+		cur_idx = parent_idx;
+		did_sth = 1;
+	}
+
+	e->chn_index = cur_idx;
+	*cur_ptr = e;
+
+	return did_sth;
+}
+
+/**
+ * Moves a node downwards, towards the last level of the binary tree.
+ *
+ * \param[in] h The heap
+ * \param[in] e The node
+ *
+ * \retval 1 The position of \a e in the tree was changed at least once
+ * \retval 0 The position of \a e in the tree was not changed
+ */
+static int
+cfs_binheap_sink(struct cfs_binheap *h, struct cfs_binheap_node *e)
+{
+	unsigned int	     n = h->cbh_nelements;
+	unsigned int	     child_idx;
+	struct cfs_binheap_node **child_ptr;
+	struct cfs_binheap_node  *child;
+	unsigned int	     child2_idx;
+	struct cfs_binheap_node **child2_ptr;
+	struct cfs_binheap_node  *child2;
+	unsigned int	     cur_idx;
+	struct cfs_binheap_node **cur_ptr;
+	int		     did_sth = 0;
+
+	cur_idx = e->chn_index;
+	cur_ptr = cfs_binheap_pointer(h, cur_idx);
+	LASSERT(*cur_ptr == e);
+
+	while (cur_idx < n) {
+		child_idx = (cur_idx << 1) + 1;
+		if (child_idx >= n)
+			break;
+
+		child_ptr = cfs_binheap_pointer(h, child_idx);
+		child = *child_ptr;
+
+		child2_idx = child_idx + 1;
+		if (child2_idx < n) {
+			child2_ptr = cfs_binheap_pointer(h, child2_idx);
+			child2 = *child2_ptr;
+
+			if (h->cbh_ops->hop_compare(child2, child)) {
+				child_idx = child2_idx;
+				child_ptr = child2_ptr;
+				child = child2;
+			}
+		}
+
+		LASSERT(child->chn_index == child_idx);
+
+		if (h->cbh_ops->hop_compare(e, child))
+			break;
+
+		child->chn_index = cur_idx;
+		*cur_ptr = child;
+		cur_ptr = child_ptr;
+		cur_idx = child_idx;
+		did_sth = 1;
+	}
+
+	e->chn_index = cur_idx;
+	*cur_ptr = e;
+
+	return did_sth;
+}
+
+/**
+ * Sort-inserts a node into the binary heap.
+ *
+ * \param[in] h The heap
+ * \param[in] e The node
+ *
+ * \retval 0	Element inserted successfully
+ * \retval != 0 error
+ */
+int
+cfs_binheap_insert(struct cfs_binheap *h, struct cfs_binheap_node *e)
+{
+	struct cfs_binheap_node **new_ptr;
+	unsigned int	     new_idx = h->cbh_nelements;
+	int		     rc;
+
+	if (new_idx == h->cbh_hwm) {
+		rc = cfs_binheap_grow(h);
+		if (rc != 0)
+			return rc;
+	}
+
+	if (h->cbh_ops->hop_enter) {
+		rc = h->cbh_ops->hop_enter(h, e);
+		if (rc != 0)
+			return rc;
+	}
+
+	e->chn_index = new_idx;
+	new_ptr = cfs_binheap_pointer(h, new_idx);
+	h->cbh_nelements++;
+	*new_ptr = e;
+
+	cfs_binheap_bubble(h, e);
+
+	return 0;
+}
+EXPORT_SYMBOL(cfs_binheap_insert);
+
+/**
+ * Removes a node from the binary heap.
+ *
+ * \param[in] h The heap
+ * \param[in] e The node
+ */
+void
+cfs_binheap_remove(struct cfs_binheap *h, struct cfs_binheap_node *e)
+{
+	unsigned int	     n = h->cbh_nelements;
+	unsigned int	     cur_idx = e->chn_index;
+	struct cfs_binheap_node **cur_ptr;
+	struct cfs_binheap_node  *last;
+
+	LASSERT(cur_idx != CBH_POISON);
+	LASSERT(cur_idx < n);
+
+	cur_ptr = cfs_binheap_pointer(h, cur_idx);
+	LASSERT(*cur_ptr == e);
+
+	n--;
+	last = *cfs_binheap_pointer(h, n);
+	h->cbh_nelements = n;
+	if (last == e)
+		return;
+
+	last->chn_index = cur_idx;
+	*cur_ptr = last;
+	cfs_binheap_relocate(h, *cur_ptr);
+
+	e->chn_index = CBH_POISON;
+	if (h->cbh_ops->hop_exit)
+		h->cbh_ops->hop_exit(h, e);
+}
+EXPORT_SYMBOL(cfs_binheap_remove);
+
+/**
+ * Relocate a node in the binary heap.
+ * Should be called whenever a node's values
+ * which affects its ranking are changed.
+ *
+ * \param[in] h The heap
+ * \param[in] e The node
+ */
+void
+cfs_binheap_relocate(struct cfs_binheap *h, struct cfs_binheap_node *e)
+{
+	if (!cfs_binheap_bubble(h, e))
+		cfs_binheap_sink(h, e);
+}
+EXPORT_SYMBOL(cfs_binheap_relocate);
+/** @} heap */
diff --git a/fs/lustre/ptlrpc/heap.h b/fs/lustre/ptlrpc/heap.h
new file mode 100644
index 0000000..3972917
--- /dev/null
+++ b/fs/lustre/ptlrpc/heap.h
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License version 2 for more details.  A copy is
+ * included in the COPYING file that accompanied this code.
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2011 Intel Corporation
+ */
+/*
+ * libcfs/include/libcfs/heap.h
+ *
+ * Author: Eric Barton	<eeb@whamcloud.com>
+ *	   Liang Zhen	<liang@whamcloud.com>
+ */
+
+#ifndef __LIBCFS_HEAP_H__
+#define __LIBCFS_HEAP_H__
+
+/** \defgroup heap Binary heap
+ *
+ * The binary heap is a scalable data structure created using a binary tree. It
+ * is capable of maintaining large sets of elements sorted usually by one or
+ * more element properties, but really based on anything that can be used as a
+ * binary predicate in order to determine the relevant ordering of any two nodes
+ * that belong to the set. There is no search operation, rather the intention is
+ * for the element of the lowest priority which will always be at the root of
+ * the tree (as this is an implementation of a min-heap) to be removed by users
+ * for consumption.
+ *
+ * Users of the heap should embed a \e struct cfs_binheap_node object instance
+ * on every object of the set that they wish the binary heap instance to handle,
+ * and (at a minimum) provide a struct cfs_binheap_ops::hop_compare()
+ * implementation which is used by the heap as the binary predicate during its
+ * internal sorting operations.
+ *
+ * The current implementation enforces no locking scheme, and so assumes the
+ * user caters for locking between calls to insert, delete and lookup
+ * operations. Since the only consumer for the data structure at this point
+ * are NRS policies, and these operate on a per-CPT basis, binary heap instances
+ * are tied to a specific CPT.
+ * @{
+ */
+
+#define CBH_SHIFT	9
+#define CBH_SIZE	(1 << CBH_SHIFT)		/* # ptrs per level */
+#define CBH_MASK	(CBH_SIZE - 1)
+#define CBH_NOB		(CBH_SIZE * sizeof(struct cfs_binheap_node *))
+
+#define CBH_POISON	0xdeadbeef
+
+/**
+ * Binary heap flags.
+ */
+enum {
+	CBH_FLAG_ATOMIC_GROW	= 1,
+};
+
+struct cfs_binheap;
+
+/**
+ * Binary heap operations.
+ */
+struct cfs_binheap_ops {
+	/**
+	 * Called right before inserting a node into the binary heap.
+	 *
+	 * Implementing this operation is optional.
+	 *
+	 * \param[in] h The heap
+	 * \param[in] e The node
+	 *
+	 * \retval 0 success
+	 * \retval != 0 error
+	 */
+	int		(*hop_enter)(struct cfs_binheap *h,
+				     struct cfs_binheap_node *e);
+	/**
+	 * Called right after removing a node from the binary heap.
+	 *
+	 * Implementing this operation is optional.
+	 *
+	 * \param[in] h The heap
+	 * \param[in] e The node
+	 */
+	void		(*hop_exit)(struct cfs_binheap *h,
+				    struct cfs_binheap_node *e);
+	/**
+	 * A binary predicate which is called during internal heap sorting
+	 * operations, and used in order to determine the relevant ordering of
+	 * two heap nodes.
+	 *
+	 * Implementing this operation is mandatory.
+	 *
+	 * \param[in] a The first heap node
+	 * \param[in] b The second heap node
+	 *
+	 * \retval 0 Node a > node b
+	 * \retval 1 Node a < node b
+	 *
+	 * \see cfs_binheap_bubble()
+	 * \see cfs_biheap_sink()
+	 */
+	int		(*hop_compare)(struct cfs_binheap_node *a,
+				       struct cfs_binheap_node *b);
+};
+
+/**
+ * Binary heap object.
+ *
+ * Sorts elements of type \e struct cfs_binheap_node
+ */
+struct cfs_binheap {
+	/** Triple indirect */
+	struct cfs_binheap_node  ****cbh_elements3;
+	/** double indirect */
+	struct cfs_binheap_node   ***cbh_elements2;
+	/** single indirect */
+	struct cfs_binheap_node    **cbh_elements1;
+	/** # elements referenced */
+	unsigned int		cbh_nelements;
+	/** high water mark */
+	unsigned int		cbh_hwm;
+	/** user flags */
+	unsigned int		cbh_flags;
+	/** operations table */
+	struct cfs_binheap_ops *cbh_ops;
+	/** private data */
+	void		       *cbh_private;
+	/** associated CPT table */
+	struct cfs_cpt_table   *cbh_cptab;
+	/** associated CPT id of this struct cfs_binheap::cbh_cptab */
+	int			cbh_cptid;
+};
+
+void cfs_binheap_destroy(struct cfs_binheap *h);
+struct cfs_binheap *
+cfs_binheap_create(struct cfs_binheap_ops *ops, unsigned int flags,
+		   unsigned int count, void *arg, struct cfs_cpt_table *cptab,
+		   int cptid);
+struct cfs_binheap_node *
+cfs_binheap_find(struct cfs_binheap *h, unsigned int idx);
+int cfs_binheap_insert(struct cfs_binheap *h, struct cfs_binheap_node *e);
+void cfs_binheap_remove(struct cfs_binheap *h, struct cfs_binheap_node *e);
+void cfs_binheap_relocate(struct cfs_binheap *h, struct cfs_binheap_node *e);
+
+static inline int
+cfs_binheap_size(struct cfs_binheap *h)
+{
+	return h->cbh_nelements;
+}
+
+static inline int
+cfs_binheap_is_empty(struct cfs_binheap *h)
+{
+	return h->cbh_nelements == 0;
+}
+
+static inline struct cfs_binheap_node *
+cfs_binheap_root(struct cfs_binheap *h)
+{
+	return cfs_binheap_find(h, 0);
+}
+
+static inline struct cfs_binheap_node *
+cfs_binheap_remove_root(struct cfs_binheap *h)
+{
+	struct cfs_binheap_node *e = cfs_binheap_find(h, 0);
+
+	if (e != NULL)
+		cfs_binheap_remove(h, e);
+	return e;
+}
+
+/** @} heap */
+
+#endif /* __LIBCFS_HEAP_H__ */
diff --git a/fs/lustre/ptlrpc/ptlrpc_internal.h b/fs/lustre/ptlrpc/ptlrpc_internal.h
index 83995cc..190c2b1 100644
--- a/fs/lustre/ptlrpc/ptlrpc_internal.h
+++ b/fs/lustre/ptlrpc/ptlrpc_internal.h
@@ -37,6 +37,7 @@
 #define PTLRPC_INTERNAL_H
 
 #include "../ldlm/ldlm_internal.h"
+#include "heap.h"
 
 struct ldlm_namespace;
 struct obd_import;
-- 
1.8.3.1


* [lustre-devel] [PATCH 28/49] lustre: ptlrpc: Implement NRS Delay Policy
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (26 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 27/49] lustre: ptlrpc: Add a binary heap implementation James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 29/49] lustre: ptlrpc: rename cfs_binheap to simply binheap James Simmons
                   ` (20 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Chris Horn <hornc@cray.com>

The NRS Delay policy seeks to perturb the timing of request processing
at the PtlRPC layer, with the goal of simulating high server load, and
finding and exposing timing-related problems. When this policy is
active, upon arrival of a request the policy will calculate an offset,
within a defined, user-configurable range, from the request arrival
time, to determine a time after which the request should be handled.
The request is then stored using the cfs_binheap implementation,
which sorts requests according to their assigned start times.
Requests are removed from the binheap for handling once their start
time has passed.
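
In rough outline (illustration only: the exact arithmetic lives in
nrs_delay.c, and delay_min, delay_max, delay_pct, start_time, heap and
req are stand-ins for the policy's internal state and the NRS request,
not literal field names):

	if (prandom_u32_max(100) < delay_pct) {
		/* pick a start time in [arrival + min, arrival + max] */
		start_time = ktime_get_real_seconds() + delay_min +
			     prandom_u32_max(delay_max - delay_min + 1);
		/* queue on the binheap, which is ordered by start_time;
		 * requests are only dequeued once that time has passed
		 */
		cfs_binheap_insert(heap, &req->nr_node);
	} else {
		/* not selected: handled by the fallback (e.g. FIFO) policy */
	}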

The behavior of the policy can be controlled via three proc files
which can be written to via lctl similar to other policies.

nrs_delay_min: Controls the minimum amount of time, in seconds, that a
request will be delayed by this policy. The default is 5 seconds.

nrs_delay_max: Controls the maximum amount of time, in seconds, that a
request will be delayed by this policy. The default is 300 seconds.

nrs_delay_pct: Controls the percentage of requests that will be delayed
by this policy. The default is 100. Note that when a request is not
selected for handling by the delay policy due to this variable, the
request will be handled by whatever fallback policy is defined for
that service. If no other fallback policy is defined, the
request will be handled by the FIFO policy.

Some examples:

lctl set_param *.*.*.nrs_delay_min=reg_delay_min:5, to set the regular
request minimum delay on all PtlRPC services to 5 seconds.

lctl set_param *.*.*.nrs_delay_min=hp_delay_min:2, to set the
high-priority request minimum delay on all PtlRPC services to 2
seconds.

lctl set_param *.*.ost_io.nrs_delay_min=8, to set both the regular and
high-priority request minimum delay of the ost_io service to 8
seconds.

lctl set_param *.*.*.nrs_delay_max=reg_delay_max:20, to set the
regular request maximum delay on all PtlRPC services to 20 seconds.

lctl set_param *.*.*.nrs_delay_max=hp_delay_max:10, to set the
high-priority request maximum delay on all PtlRPC services to 10
seconds.

lctl set_param *.*.ost_io.nrs_delay_max=35, to set both the regular
and high-priority request maximum delay of the ost_io service to 35
seconds.

lctl set_param *.*.*.nrs_delay_pct=reg_delay_pct:5, to delay 5
percent of regular requests on all PtlRPC services.

lctl set_param *.*.*.nrs_delay_pct=hp_delay_pct:2, to delay 2 percent
of high-priority requests on all PtlRPC services.

lctl set_param *.*.ost_io.nrs_delay_pct=8, to delay 8 percent of both
regular and high-priority requests of the ost_io service.

WC-bug-id: https://jira.whamcloud.com/browse/LU-6283
Lustre-commit: 588831e9eac38b8 ("LU-6283 ptlrpc: Implement NRS Delay Policy")
Signed-off-by: Chris Horn <hornc@cray.com>
Reviewed-on: https://review.whamcloud.com/14701
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_nrs.h       |   6 +
 fs/lustre/include/lustre_nrs_delay.h |  87 ++++
 fs/lustre/ptlrpc/Makefile            |   2 +-
 fs/lustre/ptlrpc/nrs.c               |   4 +
 fs/lustre/ptlrpc/nrs_delay.c         | 852 +++++++++++++++++++++++++++++++++++
 fs/lustre/ptlrpc/ptlrpc_internal.h   |   5 +-
 6 files changed, 952 insertions(+), 4 deletions(-)
 create mode 100644 fs/lustre/include/lustre_nrs_delay.h
 create mode 100644 fs/lustre/ptlrpc/nrs_delay.c

diff --git a/fs/lustre/include/lustre_nrs.h b/fs/lustre/include/lustre_nrs.h
index f15fb03..0fc9e94 100644
--- a/fs/lustre/include/lustre_nrs.h
+++ b/fs/lustre/include/lustre_nrs.h
@@ -681,6 +681,7 @@ struct cfs_binheap_node {
 	/** Index into the binary tree */
 	unsigned int	chn_index;
 };
+#include <lustre_nrs_delay.h>
 
 /**
  * NRS request
@@ -706,6 +707,7 @@ struct ptlrpc_nrs_request {
 	unsigned int			nr_enqueued:1;
 	unsigned int			nr_started:1;
 	unsigned int			nr_finalized:1;
+	struct cfs_binheap_node		nr_node;
 
 	/**
 	 * Policy-specific fields, used for determining a request's scheduling
@@ -716,6 +718,10 @@ struct ptlrpc_nrs_request {
 		 * Fields for the FIFO policy
 		 */
 		struct nrs_fifo_req	fifo;
+		/**
+		 * Fields for the delay policy
+		 */
+		struct nrs_delay_req	delay;
 	} nr_u;
 	/**
 	 * Externally-registering policies may want to use this to allocate
diff --git a/fs/lustre/include/lustre_nrs_delay.h b/fs/lustre/include/lustre_nrs_delay.h
new file mode 100644
index 0000000..01f0725
--- /dev/null
+++ b/fs/lustre/include/lustre_nrs_delay.h
@@ -0,0 +1,87 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License version 2 for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2015, Cray Inc. All Rights Reserved.
+ *
+ * Copyright (c) 2015, Intel Corporation.
+ */
+/*
+ *
+ * Network Request Scheduler (NRS) Delay policy
+ *
+ */
+
+#ifndef _LUSTRE_NRS_DELAY_H
+#define _LUSTRE_NRS_DELAY_H
+
+/* \name delay
+ *
+ * Delay policy
+ * @{
+ */
+
+/**
+ * Private data structure for the delay policy
+ */
+struct nrs_delay_data {
+	struct ptlrpc_nrs_resource	 delay_res;
+
+	/**
+	 * Delayed requests are stored in this binheap until they are
+	 * removed for handling.
+	 */
+	struct cfs_binheap		*delay_binheap;
+
+	/**
+	 * Minimum service time
+	 */
+	u32				 min_delay;
+
+	/**
+	 * Maximum service time
+	 */
+	u32				 max_delay;
+
+	/**
+	 * We'll delay this percent of requests
+	 */
+	u32				 delay_pct;
+};
+
+struct nrs_delay_req {
+	/**
+	 * This is the time at which a request becomes eligible for handling
+	 */
+	time64_t	req_start_time;
+};
+
+enum nrs_ctl_delay {
+	NRS_CTL_DELAY_RD_MIN = PTLRPC_NRS_CTL_1ST_POL_SPEC,
+	NRS_CTL_DELAY_WR_MIN,
+	NRS_CTL_DELAY_RD_MAX,
+	NRS_CTL_DELAY_WR_MAX,
+	NRS_CTL_DELAY_RD_PCT,
+	NRS_CTL_DELAY_WR_PCT,
+};
+
+/** @} delay */
+
+#endif
diff --git a/fs/lustre/ptlrpc/Makefile b/fs/lustre/ptlrpc/Makefile
index adffb231..3badb05 100644
--- a/fs/lustre/ptlrpc/Makefile
+++ b/fs/lustre/ptlrpc/Makefile
@@ -16,7 +16,7 @@ ptlrpc_objs += llog_net.o llog_client.o import.o ptlrpcd.o
 ptlrpc_objs += pers.o lproc_ptlrpc.o wiretest.o layout.o
 ptlrpc_objs += sec.o sec_bulk.o sec_gc.o sec_config.o
 ptlrpc_objs += sec_null.o sec_plain.o
-ptlrpc_objs += heap.o nrs.o nrs_fifo.o
+ptlrpc_objs += heap.o nrs.o nrs_fifo.o nrs_delay.o
 
 ptlrpc-y := $(ldlm_objs) $(ptlrpc_objs) sec_lproc.o
 ptlrpc-$(CONFIG_LUSTRE_TRANSLATE_ERRNOS) += errno.o
diff --git a/fs/lustre/ptlrpc/nrs.c b/fs/lustre/ptlrpc/nrs.c
index 953a0b8..dd36d18 100644
--- a/fs/lustre/ptlrpc/nrs.c
+++ b/fs/lustre/ptlrpc/nrs.c
@@ -1579,6 +1579,10 @@ int ptlrpc_nrs_init(void)
 	if (rc != 0)
 		goto fail;
 
+	rc = ptlrpc_nrs_policy_register(&nrs_conf_delay);
+	if (rc != 0)
+		goto fail;
+
 	return rc;
 fail:
 	/**
diff --git a/fs/lustre/ptlrpc/nrs_delay.c b/fs/lustre/ptlrpc/nrs_delay.c
new file mode 100644
index 0000000..8ff8e8d
--- /dev/null
+++ b/fs/lustre/ptlrpc/nrs_delay.c
@@ -0,0 +1,852 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License version 2 for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2017, Cray Inc. All Rights Reserved.
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ */
+/*
+ * lustre/ptlrpc/nrs_delay.c
+ *
+ * Network Request Scheduler (NRS) Delay policy
+ *
+ * This policy will delay request handling for some configurable amount of
+ * time.
+ *
+ * Author: Chris Horn <hornc@cray.com>
+ */
+/**
+ * \addtogoup nrs
+ * @{
+ */
+
+#define DEBUG_SUBSYSTEM S_RPC
+
+#include <linux/random.h>
+
+#include <linux/libcfs/libcfs_cpu.h>
+#include <obd_support.h>
+#include <obd_class.h>
+#include "ptlrpc_internal.h"
+
+/**
+ * \name delay
+ *
+ * The delay policy schedules RPCs so that they are only processed after some
+ * configurable amount of time (in seconds) has passed.
+ *
+ * The defaults were chosen arbitrarily.
+ *
+ * @{
+ */
+
+#define NRS_POL_NAME_DELAY	"delay"
+
+/* Default minimum delay in seconds. */
+#define NRS_DELAY_MIN_DEFAULT	5
+/* Default maximum delay, in seconds. */
+#define NRS_DELAY_MAX_DEFAULT	300
+/* Default percentage of delayed RPCs. */
+#define NRS_DELAY_PCT_DEFAULT	100
+
+/**
+ * Binary heap predicate.
+ *
+ * Elements are sorted according to the start time assigned to the requests
+ * upon enqueue. An element with an earlier start time is "less than" an
+ * element with a later start time.
+ *
+ * \retval 0 start_time(e1) > start_time(e2)
+ * \retval 1 start_time(e1) <= start_time(e2)
+ */
+static int delay_req_compare(struct cfs_binheap_node *e1,
+			     struct cfs_binheap_node *e2)
+{
+	struct ptlrpc_nrs_request *nrq1;
+	struct ptlrpc_nrs_request *nrq2;
+
+	nrq1 = container_of(e1, struct ptlrpc_nrs_request, nr_node);
+	nrq2 = container_of(e2, struct ptlrpc_nrs_request, nr_node);
+
+	return nrq1->nr_u.delay.req_start_time <=
+	       nrq2->nr_u.delay.req_start_time;
+}
+
+static struct cfs_binheap_ops nrs_delay_heap_ops = {
+	.hop_enter	= NULL,
+	.hop_exit	= NULL,
+	.hop_compare	= delay_req_compare,
+};
+
+/**
+ * Is called before the policy transitions into
+ * ptlrpc_nrs_pol_state::NRS_POL_STATE_STARTED; allocates and initializes
+ * the delay-specific private data structure.
+ *
+ * @policy	The policy to start
+ *
+ * Return:	-ENOMEM OOM error
+ *		0 success
+ *
+ * \see nrs_policy_register()
+ * \see nrs_policy_ctl()
+ */
+static int nrs_delay_start(struct ptlrpc_nrs_policy *policy)
+{
+	struct nrs_delay_data *delay_data;
+
+	delay_data = kzalloc_node(sizeof(*delay_data), GFP_NOFS,
+				  cfs_cpt_spread_node(nrs_pol2cptab(policy),
+						      nrs_pol2cptid(policy)));
+	if (!delay_data)
+		return -ENOMEM;
+
+	delay_data->delay_binheap = cfs_binheap_create(&nrs_delay_heap_ops,
+						       CBH_FLAG_ATOMIC_GROW,
+						       4096, NULL,
+						       nrs_pol2cptab(policy),
+						       nrs_pol2cptid(policy));
+
+	if (!delay_data->delay_binheap) {
+		kfree(delay_data);
+		return -ENOMEM;
+	}
+
+	delay_data->min_delay = NRS_DELAY_MIN_DEFAULT;
+	delay_data->max_delay = NRS_DELAY_MAX_DEFAULT;
+	delay_data->delay_pct = NRS_DELAY_PCT_DEFAULT;
+
+	policy->pol_private = delay_data;
+
+	return 0;
+}
+
+/**
+ * Is called before the policy transitions into
+ * ptlrpc_nrs_pol_state::NRS_POL_STATE_STOPPED; deallocates the delay-specific
+ * private data structure.
+ *
+ * @policy	The policy to stop
+ *
+ * \see nrs_policy_stop0()
+ */
+static void nrs_delay_stop(struct ptlrpc_nrs_policy *policy)
+{
+	struct nrs_delay_data *delay_data = policy->pol_private;
+
+	LASSERT(delay_data);
+	LASSERT(delay_data->delay_binheap);
+	LASSERT(cfs_binheap_is_empty(delay_data->delay_binheap));
+
+	cfs_binheap_destroy(delay_data->delay_binheap);
+
+	kfree(delay_data);
+}
+
+/**
+ * Is called for obtaining a delay policy resource.
+ *
+ * @policy		The policy on which the request is being asked for
+ * @nrq		The request for which resources are being taken
+ * @parent		Parent resource, unused in this policy
+ * @resp		Resources references are placed in this array
+ * @moving_req		Signifies limited caller context; unused in this
+ *			policy
+ *
+ * Return:		1 The delay policy only has a one-level resource
+ *			hierarchy
+ *
+ * \see nrs_resource_get_safe()
+ */
+static int nrs_delay_res_get(struct ptlrpc_nrs_policy *policy,
+			     struct ptlrpc_nrs_request *nrq,
+			     const struct ptlrpc_nrs_resource *parent,
+			     struct ptlrpc_nrs_resource **resp, bool moving_req)
+{
+	/**
+	 * Just return the resource embedded inside nrs_delay_data, and end this
+	 * resource hierarchy reference request.
+	 */
+	*resp = &((struct nrs_delay_data *)policy->pol_private)->delay_res;
+	return 1;
+}
+
+/**
+ * Called when getting a request from the delay policy for handling, or just
+ * peeking; removes the request from the policy when it is to be handled.
+ * Requests are only removed from this policy when their start time has
+ * passed.
+ *
+ * @policy	The policy
+ * @peek	When set, signifies that we just want to examine the
+ *		request, and not handle it, so the request is not removed
+ *		from the policy.
+ * @force	Force the policy to return a request
+ *
+ * Return:	The request to be handled
+ *		NULL no request available
+ *
+ * \see ptlrpc_nrs_req_get_nolock()
+ * \see nrs_request_get()
+ */
+static
+struct ptlrpc_nrs_request *nrs_delay_req_get(struct ptlrpc_nrs_policy *policy,
+					     bool peek, bool force)
+{
+	struct nrs_delay_data *delay_data = policy->pol_private;
+	struct cfs_binheap_node *node;
+	struct ptlrpc_nrs_request *nrq;
+
+	node = cfs_binheap_root(delay_data->delay_binheap);
+	nrq = unlikely(!node) ? NULL :
+	      container_of(node, struct ptlrpc_nrs_request, nr_node);
+
+	if (likely(nrq)) {
+		if (!force &&
+		    ktime_get_real_seconds() < nrq->nr_u.delay.req_start_time)
+			nrq = NULL;
+		else if (likely(!peek))
+			cfs_binheap_remove(delay_data->delay_binheap,
+					   &nrq->nr_node);
+	}
+
+	return nrq;
+}
+
+/**
+ * Adds request \a nrq to a delay \a policy instance's set of queued requests
+ *
+ * A percentage (delay_pct) of incoming requests are delayed by this policy.
+ * If selected for delay a request start time is calculated. A start time
+ * is the current time plus a random offset in the range [min_delay, max_delay]
+ * The start time is recorded in the request, and is then used by
+ * delay_req_compare() to maintain a set of requests ordered by their start
+ * times.
+ *
+ * @policy	The policy
+ * @nrq	The request to add
+ *
+ * Return:	0 request added
+ *		1 request not added
+ *
+ */
+static int nrs_delay_req_add(struct ptlrpc_nrs_policy *policy,
+			     struct ptlrpc_nrs_request *nrq)
+{
+	struct nrs_delay_data *delay_data = policy->pol_private;
+
+	if (delay_data->delay_pct == 0 || /* Not delaying anything */
+	    (delay_data->delay_pct != 100 &&
+	     delay_data->delay_pct < prandom_u32_max(100)))
+		return 1;
+
+	nrq->nr_u.delay.req_start_time = ktime_get_real_seconds() +
+					 prandom_u32_max(delay_data->max_delay - delay_data->min_delay + 1) +
+					 delay_data->min_delay;
+
+	return cfs_binheap_insert(delay_data->delay_binheap, &nrq->nr_node);
+}
+
+/**
+ * Removes request \a nrq from \a policy's list of queued requests.
+ *
+ * @policy	The policy
+ * @nrq	The request to remove
+ */
+static void nrs_delay_req_del(struct ptlrpc_nrs_policy *policy,
+			      struct ptlrpc_nrs_request *nrq)
+{
+	struct nrs_delay_data *delay_data = policy->pol_private;
+
+	cfs_binheap_remove(delay_data->delay_binheap, &nrq->nr_node);
+}
+
+/**
+ * Prints a debug statement right before the request \a nrq stops being
+ * handled.
+ *
+ * @policy	The policy handling the request
+ * @nrq	The request being handled
+ *
+ * \see ptlrpc_server_finish_request()
+ * \see ptlrpc_nrs_req_stop_nolock()
+ */
+static void nrs_delay_req_stop(struct ptlrpc_nrs_policy *policy,
+			       struct ptlrpc_nrs_request *nrq)
+{
+	struct ptlrpc_request *req = container_of(nrq, struct ptlrpc_request,
+						  rq_nrq);
+
+	DEBUG_REQ(D_RPCTRACE, req,
+		  "NRS: finished delayed request from %s after %llds",
+		  libcfs_id2str(req->rq_peer),
+		  (s64)(nrq->nr_u.delay.req_start_time -
+			req->rq_srv.sr_arrival_time.tv_sec));
+}
+
+/**
+ * Performs ctl functions specific to delay policy instances; similar to ioctl
+ *
+ * @policy		the policy instance
+ * @opc		the opcode
+ * @arg		used for passing parameters and information
+ *
+ * \pre assert_spin_locked(&policy->pol_nrs->->nrs_lock)
+ * \post assert_spin_locked(&policy->pol_nrs->->nrs_lock)
+ *
+ * Return:		0   operation carried out successfully
+ *			-ve error
+ */
+static int nrs_delay_ctl(struct ptlrpc_nrs_policy *policy,
+			 enum ptlrpc_nrs_ctl opc, void *arg)
+{
+	struct nrs_delay_data *delay_data = policy->pol_private;
+	u32 *val = (u32 *)arg;
+
+	assert_spin_locked(&policy->pol_nrs->nrs_lock);
+
+	switch ((enum nrs_ctl_delay)opc) {
+	default:
+		return -EINVAL;
+
+	case NRS_CTL_DELAY_RD_MIN:
+		*val = delay_data->min_delay;
+		break;
+
+	case NRS_CTL_DELAY_WR_MIN:
+		if (*val > delay_data->max_delay)
+			return -EINVAL;
+
+		delay_data->min_delay = *val;
+		break;
+
+	case NRS_CTL_DELAY_RD_MAX:
+		*val = delay_data->max_delay;
+		break;
+
+	case NRS_CTL_DELAY_WR_MAX:
+		if (*val < delay_data->min_delay)
+			return -EINVAL;
+
+		delay_data->max_delay = *val;
+		break;
+
+	case NRS_CTL_DELAY_RD_PCT:
+		*val = delay_data->delay_pct;
+		break;
+
+	case NRS_CTL_DELAY_WR_PCT:
+		if (*val < 0 || *val > 100)
+			return -EINVAL;
+
+		delay_data->delay_pct = *val;
+		break;
+	}
+	return 0;
+}
+
+/**
+ * debugfs interface
+ */
+
+/* nrs_delay_min and nrs_delay_max are bounded by these values */
+#define LPROCFS_NRS_DELAY_LOWER_BOUND		0
+#define LPROCFS_NRS_DELAY_UPPER_BOUND		65535
+
+#define LPROCFS_NRS_DELAY_MIN_NAME		"delay_min:"
+#define LPROCFS_NRS_DELAY_MIN_NAME_REG		"reg_delay_min:"
+#define LPROCFS_NRS_DELAY_MIN_NAME_HP		"hp_delay_min:"
+
+/**
+ * Max size of the nrs_delay_min seq_write buffer. Needs to be large enough
+ * to hold the string: "reg_min_delay:65535 hp_min_delay:65535"
+ */
+#define LPROCFS_NRS_DELAY_MIN_SIZE					       \
+	sizeof(LPROCFS_NRS_DELAY_MIN_NAME_REG				       \
+	       __stringify(LPROCFS_NRS_DELAY_UPPER_BOUND)		       \
+	       " " LPROCFS_NRS_DELAY_MIN_NAME_HP			       \
+	       __stringify(LPROCFS_NRS_DELAY_UPPER_BOUND))
+
+#define LPROCFS_NRS_DELAY_MAX_NAME		"delay_max:"
+#define LPROCFS_NRS_DELAY_MAX_NAME_REG		"reg_delay_max:"
+#define LPROCFS_NRS_DELAY_MAX_NAME_HP		"hp_delay_max:"
+
+/**
+ * Similar to LPROCFS_NRS_DELAY_MIN_SIZE above, but for the nrs_delay_max
+ * variable.
+ */
+#define LPROCFS_NRS_DELAY_MAX_SIZE					       \
+	sizeof(LPROCFS_NRS_DELAY_MAX_NAME_REG				       \
+	       __stringify(LPROCFS_NRS_DELAY_UPPER_BOUND)		       \
+	       " " LPROCFS_NRS_DELAY_MAX_NAME_HP			       \
+	       __stringify(LPROCFS_NRS_DELAY_UPPER_BOUND))
+
+#define LPROCFS_NRS_DELAY_PCT_MIN_VAL		0
+#define LPROCFS_NRS_DELAY_PCT_MAX_VAL		100
+#define LPROCFS_NRS_DELAY_PCT_NAME		"delay_pct:"
+#define LPROCFS_NRS_DELAY_PCT_NAME_REG		"reg_delay_pct:"
+#define LPROCFS_NRS_DELAY_PCT_NAME_HP		"hp_delay_pct:"
+
+/**
+ * Similar to LPROCFS_NRS_DELAY_MIN_SIZE above, but for the nrs_delay_pct
+ * variable.
+ */
+#define LPROCFS_NRS_DELAY_PCT_SIZE					       \
+	sizeof(LPROCFS_NRS_DELAY_PCT_NAME_REG				       \
+	       __stringify(LPROCFS_NRS_DELAY_PCT_MAX_VAL)		       \
+	       " " LPROCFS_NRS_DELAY_PCT_NAME_HP			       \
+	       __stringify(LPROCFS_NRS_DELAY_PCT_MAX_VAL))
+
+/**
+ * Helper for delay's seq_write functions.
+ */
+static ssize_t
+lprocfs_nrs_delay_seq_write_common(const char __user *buffer,
+				   unsigned int bufsize, size_t count,
+				   const char *var_name, unsigned int min_val,
+				   unsigned int max_val,
+				   struct ptlrpc_service *svc, char *pol_name,
+				   enum ptlrpc_nrs_ctl opc, bool single)
+{
+	enum ptlrpc_nrs_queue_type queue = 0;
+	char *kernbuf;
+	char *val_str;
+	unsigned long val_reg;
+	unsigned long val_hp;
+	size_t count_copy;
+	int rc = 0;
+	char *tmp = NULL;
+	int tmpsize = 0;
+
+	if (count > bufsize - 1)
+		return -EINVAL;
+
+	kernbuf = kzalloc(bufsize, GFP_KERNEL);
+	if (!kernbuf)
+		return -ENOMEM;
+
+	if (copy_from_user(kernbuf, buffer, count)) {
+		rc = -EFAULT;
+		goto free_kernbuf;
+	}
+
+	tmpsize = strlen("reg_") + strlen(var_name) + 1;
+	tmp = kzalloc(tmpsize, GFP_KERNEL);
+	if (!tmp) {
+		rc = -ENOMEM;
+		goto free_tmp;
+	}
+
+	/* look for "reg_<var_name>" in kernbuf */
+	snprintf(tmp, tmpsize, "reg_%s", var_name);
+	count_copy = count;
+	val_str = lprocfs_find_named_value(kernbuf, tmp, &count_copy);
+	if (val_str != kernbuf) {
+		rc = kstrtoul(val_str, 10, &val_reg);
+		if (rc != 0) {
+			rc = -EINVAL;
+			goto free_tmp;
+		}
+		queue |= PTLRPC_NRS_QUEUE_REG;
+	}
+
+	/* look for "hp_<var_name>" in kernbuf */
+	snprintf(tmp, tmpsize, "hp_%s", var_name);
+	count_copy = count;
+	val_str = lprocfs_find_named_value(kernbuf, tmp, &count_copy);
+	if (val_str != kernbuf) {
+		if (!nrs_svc_has_hp(svc)) {
+			rc = -ENODEV;
+			goto free_tmp;
+		}
+
+		rc = kstrtoul(val_str, 10, &val_hp);
+		if (rc != 0) {
+			rc = -EINVAL;
+			goto free_tmp;
+		}
+		queue |= PTLRPC_NRS_QUEUE_HP;
+	}
+
+	if (queue == 0) {
+		if (!isdigit(kernbuf[0])) {
+			rc = -EINVAL;
+			goto free_tmp;
+		}
+
+		rc = kstrtoul(kernbuf, 10, &val_reg);
+		if (rc != 0) {
+			rc = -EINVAL;
+			goto free_tmp;
+		}
+
+		queue = PTLRPC_NRS_QUEUE_REG;
+
+		if (nrs_svc_has_hp(svc)) {
+			queue |= PTLRPC_NRS_QUEUE_HP;
+			val_hp = val_reg;
+		}
+	}
+
+	if (queue & PTLRPC_NRS_QUEUE_REG) {
+		if (val_reg > max_val || val_reg < min_val) {
+			rc = -EINVAL;
+			goto free_tmp;
+		}
+
+		rc = ptlrpc_nrs_policy_control(svc, PTLRPC_NRS_QUEUE_REG,
+					       pol_name, opc, single, &val_reg);
+		if ((rc < 0 && rc != -ENODEV) ||
+		    (rc == -ENODEV && queue == PTLRPC_NRS_QUEUE_REG))
+			goto free_tmp;
+	}
+
+	if (queue & PTLRPC_NRS_QUEUE_HP) {
+		int rc2 = 0;
+
+		if (val_hp > max_val || val_hp < min_val) {
+			rc = -EINVAL;
+			goto free_tmp;
+		}
+
+		rc2 = ptlrpc_nrs_policy_control(svc, PTLRPC_NRS_QUEUE_HP,
+						pol_name, opc, single, &val_hp);
+		if ((rc2 < 0 && rc2 != -ENODEV) ||
+		    (rc2 == -ENODEV && queue == PTLRPC_NRS_QUEUE_HP)) {
+			rc = rc2;
+			goto free_tmp;
+		}
+	}
+
+	/* If we've reached here then we want to return count */
+	rc = count;
+
+free_tmp:
+	kfree(tmp);
+free_kernbuf:
+	kfree(kernbuf);
+
+	return rc;
+}
+
+/**
+ * Retrieves the value of the minimum delay for delay policy instances on both
+ * the regular and high-priority NRS head of a service, as long as a policy
+ * instance is not in the ptlrpc_nrs_pol_state::NRS_POL_STATE_STOPPED state;
+ */
+static int
+ptlrpc_lprocfs_nrs_delay_min_seq_show(struct seq_file *m, void *data)
+{
+	struct ptlrpc_service *svc = m->private;
+	unsigned int min_delay;
+	int rc;
+
+	rc = ptlrpc_nrs_policy_control(svc, PTLRPC_NRS_QUEUE_REG,
+				       NRS_POL_NAME_DELAY,
+				       NRS_CTL_DELAY_RD_MIN,
+				       true, &min_delay);
+
+	if (rc == 0)
+		seq_printf(m, LPROCFS_NRS_DELAY_MIN_NAME_REG"%-5d\n",
+			   min_delay);
+		/**
+		 * Ignore -ENODEV as the regular NRS head's policy may be in
+		 * the ptlrpc_nrs_pol_state::NRS_POL_STATE_STOPPED state.
+		 */
+	else if (rc != -ENODEV)
+		return rc;
+
+	if (!nrs_svc_has_hp(svc))
+		return 0;
+
+	rc = ptlrpc_nrs_policy_control(svc, PTLRPC_NRS_QUEUE_HP,
+				       NRS_POL_NAME_DELAY,
+				       NRS_CTL_DELAY_RD_MIN,
+				       true, &min_delay);
+	if (rc == 0)
+		seq_printf(m, LPROCFS_NRS_DELAY_MIN_NAME_HP"%-5d\n",
+			   min_delay);
+		/**
+		 * Ignore -ENODEV as the regular NRS head's policy may be in
+		 * the ptlrpc_nrs_pol_state::NRS_POL_STATE_STOPPED state.
+		 */
+	else if (rc == -ENODEV)
+		rc = 0;
+
+	return rc;
+}
+
+/**
+ * Sets the value of the minimum request delay for delay policy instances of a
+ * service. The user can set the minimum request delay for the regular or high
+ * priority NRS head individually by specifying each value, or both together in
+ * a single invocation.
+ *
+ * For example:
+ *
+ * lctl set_param *.*.*.nrs_delay_min=reg_delay_min:5, to set the regular
+ * request minimum delay on all PtlRPC services to 5 seconds
+ *
+ * lctl set_param *.*.*.nrs_delay_min=hp_delay_min:2, to set the high-priority
+ * request minimum delay on all PtlRPC services to 2 seconds, and
+ *
+ * lctl set_param *.*.ost_io.nrs_delay_min=8, to set both the regular and
+ * high priority request minimum delay of the ost_io service to 8 seconds.
+ */
+static ssize_t
+ptlrpc_lprocfs_nrs_delay_min_seq_write(struct file *file,
+				       const char __user *buffer, size_t count,
+				       loff_t *off)
+{
+	struct seq_file *m = file->private_data;
+	struct ptlrpc_service *svc = m->private;
+
+	return lprocfs_nrs_delay_seq_write_common(buffer,
+						  LPROCFS_NRS_DELAY_MIN_SIZE,
+						  count,
+						  LPROCFS_NRS_DELAY_MIN_NAME,
+						  LPROCFS_NRS_DELAY_LOWER_BOUND,
+						  LPROCFS_NRS_DELAY_UPPER_BOUND,
+						  svc, NRS_POL_NAME_DELAY,
+						  NRS_CTL_DELAY_WR_MIN, false);
+}
+
+LDEBUGFS_SEQ_FOPS(ptlrpc_lprocfs_nrs_delay_min);
+
+/**
+ * Retrieves the value of the maximum delay for delay policy instances on both
+ * the regular and high-priority NRS head of a service, as long as a policy
+ * instance is not in the ptlrpc_nrs_pol_state::NRS_POL_STATE_STOPPED state;
+ */
+static int
+ptlrpc_lprocfs_nrs_delay_max_seq_show(struct seq_file *m, void *data)
+{
+	struct ptlrpc_service *svc = m->private;
+	unsigned int max_delay;
+	int rc;
+
+	rc = ptlrpc_nrs_policy_control(svc, PTLRPC_NRS_QUEUE_REG,
+				       NRS_POL_NAME_DELAY,
+				       NRS_CTL_DELAY_RD_MAX,
+				       true, &max_delay);
+
+	if (rc == 0)
+		seq_printf(m, LPROCFS_NRS_DELAY_MAX_NAME_REG"%-5d\n",
+			   max_delay);
+		/**
+		 * Ignore -ENODEV as the regular NRS head's policy may be in
+		 * the ptlrpc_nrs_pol_state::NRS_POL_STATE_STOPPED state.
+		 */
+	else if (rc != -ENODEV)
+		return rc;
+
+	if (!nrs_svc_has_hp(svc))
+		return 0;
+
+	rc = ptlrpc_nrs_policy_control(svc, PTLRPC_NRS_QUEUE_HP,
+				       NRS_POL_NAME_DELAY,
+				       NRS_CTL_DELAY_RD_MAX,
+				       true, &max_delay);
+	if (rc == 0)
+		seq_printf(m, LPROCFS_NRS_DELAY_MAX_NAME_HP"%-5d\n",
+			   max_delay);
+		/**
+		 * Ignore -ENODEV as the regular NRS head's policy may be in
+		 * the ptlrpc_nrs_pol_state::NRS_POL_STATE_STOPPED state.
+		 */
+	else if (rc == -ENODEV)
+		rc = 0;
+
+	return rc;
+}
+
+/**
+ * Sets the value of the maximum request delay for delay policy instances of a
+ * service. The user can set the maximum request delay for the regular or high
+ * priority NRS head individually by specifying each value, or both together in
+ * a single invocation.
+ *
+ * For example:
+ *
+ * lctl set_param *.*.*.nrs_delay_max=reg_delay_max:20, to set the regular
+ * request maximum delay on all PtlRPC services to 20 seconds
+ *
+ * lctl set_param *.*.*.nrs_delay_max=hp_delay_max:10, to set the high-priority
+ * request maximum delay on all PtlRPC services to 10 seconds, and
+ *
+ * lctl set_param *.*.ost_io.nrs_delay_max=35, to set both the regular and
+ * high priority request maximum delay of the ost_io service to 35 seconds.
+ */
+static ssize_t
+ptlrpc_lprocfs_nrs_delay_max_seq_write(struct file *file,
+				       const char __user *buffer, size_t count,
+				       loff_t *off)
+{
+	struct seq_file *m = file->private_data;
+	struct ptlrpc_service *svc = m->private;
+
+	return lprocfs_nrs_delay_seq_write_common(buffer,
+						  LPROCFS_NRS_DELAY_MAX_SIZE,
+						  count,
+						  LPROCFS_NRS_DELAY_MAX_NAME,
+						  LPROCFS_NRS_DELAY_LOWER_BOUND,
+						  LPROCFS_NRS_DELAY_UPPER_BOUND,
+						  svc, NRS_POL_NAME_DELAY,
+						  NRS_CTL_DELAY_WR_MAX, false);
+}
+
+LDEBUGFS_SEQ_FOPS(ptlrpc_lprocfs_nrs_delay_max);
+
+/**
+ * Retrieves the value of the percentage of requests which should be delayed
+ * for delay policy instances on both the regular and high-priority NRS head
+ * of a service, as long as a policy instance is not in the
+ * ptlrpc_nrs_pol_state::NRS_POL_STATE_STOPPED state;
+ */
+static int
+ptlrpc_lprocfs_nrs_delay_pct_seq_show(struct seq_file *m, void *data)
+{
+	struct ptlrpc_service *svc = m->private;
+	unsigned int delay_pct;
+	int rc;
+
+	rc = ptlrpc_nrs_policy_control(svc, PTLRPC_NRS_QUEUE_REG,
+				       NRS_POL_NAME_DELAY,
+				       NRS_CTL_DELAY_RD_PCT,
+				       true, &delay_pct);
+
+	if (rc == 0)
+		seq_printf(m, LPROCFS_NRS_DELAY_PCT_NAME_REG"%-3d\n",
+			   delay_pct);
+		/**
+		 * Ignore -ENODEV as the regular NRS head's policy may be in
+		 * the ptlrpc_nrs_pol_state::NRS_POL_STATE_STOPPED state.
+		 */
+	else if (rc != -ENODEV)
+		return rc;
+
+	if (!nrs_svc_has_hp(svc))
+		return 0;
+
+	rc = ptlrpc_nrs_policy_control(svc, PTLRPC_NRS_QUEUE_HP,
+				       NRS_POL_NAME_DELAY,
+				       NRS_CTL_DELAY_RD_PCT,
+				       true, &delay_pct);
+	if (rc == 0)
+		seq_printf(m, LPROCFS_NRS_DELAY_PCT_NAME_HP"%-3d\n",
+			   delay_pct);
+		/**
+		 * Ignore -ENODEV as the regular NRS head's policy may be in
+		 * the ptlrpc_nrs_pol_state::NRS_POL_STATE_STOPPED state.
+		 */
+	else if (rc == -ENODEV)
+		rc = 0;
+
+	return rc;
+}
+
+/**
+ * Sets the value of the percentage of requests to be delayed for delay policy
+ * instances of a service. The user can set the percentage for the regular or
+ * high-priority NRS head individually by specifying each value, or both
+ * together in a single invocation.
+ *
+ * For example:
+ *
+ * lctl set_param *.*.*.nrs_delay_pct=reg_delay_pct:5, to delay 5 percent of
+ * regular requests on all PtlRPC services
+ *
+ * lctl set_param *.*.*.nrs_delay_pct=hp_delay_pct:2, to delay 2 percent of
+ * high-priority requests on all PtlRPC services, and
+ *
+ * lctl set_param *.*.ost_io.nrs_delay_pct=8, to delay 8 percent of both
+ * regular and high-priority requests of the ost_io service.
+ */
+static ssize_t
+ptlrpc_lprocfs_nrs_delay_pct_seq_write(struct file *file,
+				       const char __user *buffer, size_t count,
+				       loff_t *off)
+{
+	struct seq_file *m = file->private_data;
+	struct ptlrpc_service *svc = m->private;
+
+	return lprocfs_nrs_delay_seq_write_common(buffer,
+						  LPROCFS_NRS_DELAY_PCT_SIZE,
+						  count,
+						  LPROCFS_NRS_DELAY_PCT_NAME,
+						  LPROCFS_NRS_DELAY_PCT_MIN_VAL,
+						  LPROCFS_NRS_DELAY_PCT_MAX_VAL,
+						  svc, NRS_POL_NAME_DELAY,
+						  NRS_CTL_DELAY_WR_PCT, false);
+}
+
+LDEBUGFS_SEQ_FOPS(ptlrpc_lprocfs_nrs_delay_pct);
+
+static int nrs_delay_lprocfs_init(struct ptlrpc_service *svc)
+{
+	struct ldebugfs_vars nrs_delay_lprocfs_vars[] = {
+		{ .name		= "nrs_delay_min",
+		  .fops		= &ptlrpc_lprocfs_nrs_delay_min_fops,
+		  .data		= svc },
+		{ .name		= "nrs_delay_max",
+		  .fops		= &ptlrpc_lprocfs_nrs_delay_max_fops,
+		  .data		= svc },
+		{ .name		= "nrs_delay_pct",
+		  .fops		= &ptlrpc_lprocfs_nrs_delay_pct_fops,
+		  .data		= svc },
+		{ NULL }
+	};
+
+	if (!svc->srv_debugfs_entry)
+		return 0;
+
+	ldebugfs_add_vars(svc->srv_debugfs_entry, nrs_delay_lprocfs_vars, NULL);
+
+	return 0;
+}
+
+/**
+ * Delay policy operations
+ */
+static const struct ptlrpc_nrs_pol_ops nrs_delay_ops = {
+	.op_policy_start	= nrs_delay_start,
+	.op_policy_stop		= nrs_delay_stop,
+	.op_policy_ctl		= nrs_delay_ctl,
+	.op_res_get		= nrs_delay_res_get,
+	.op_req_get		= nrs_delay_req_get,
+	.op_req_enqueue		= nrs_delay_req_add,
+	.op_req_dequeue		= nrs_delay_req_del,
+	.op_req_stop		= nrs_delay_req_stop,
+	.op_lprocfs_init	= nrs_delay_lprocfs_init,
+};
+
+/**
+ * Delay policy configuration
+ */
+struct ptlrpc_nrs_pol_conf nrs_conf_delay = {
+	.nc_name		= NRS_POL_NAME_DELAY,
+	.nc_ops			= &nrs_delay_ops,
+	.nc_compat		= nrs_policy_compat_all,
+};
+
+/** @} delay */
+
+/** @} nrs */
diff --git a/fs/lustre/ptlrpc/ptlrpc_internal.h b/fs/lustre/ptlrpc/ptlrpc_internal.h
index 190c2b1..bb4759ce 100644
--- a/fs/lustre/ptlrpc/ptlrpc_internal.h
+++ b/fs/lustre/ptlrpc/ptlrpc_internal.h
@@ -46,6 +46,8 @@
 extern int test_req_buffer_pressure;
 extern struct mutex ptlrpc_all_services_mutex;
 extern struct list_head ptlrpc_all_services;
+extern struct ptlrpc_nrs_pol_conf nrs_conf_fifo;
+extern struct ptlrpc_nrs_pol_conf nrs_conf_delay;
 
 extern struct mutex ptlrpcd_mutex;
 extern struct mutex pinger_mutex;
@@ -232,9 +234,6 @@ struct ptlrpc_nrs_policy *nrs_request_policy(struct ptlrpc_nrs_request *nrq)
 	sizeof(NRS_LPROCFS_QUANTUM_NAME_REG __stringify(LPROCFS_NRS_QUANTUM_MAX) " "  \
 	       NRS_LPROCFS_QUANTUM_NAME_HP __stringify(LPROCFS_NRS_QUANTUM_MAX))
 
-/* ptlrpc/nrs_fifo.c */
-extern struct ptlrpc_nrs_pol_conf nrs_conf_fifo;
-
 /* recovd_thread.c */
 
 int ptlrpc_expire_one_request(struct ptlrpc_request *req, int async_unlink);
-- 
1.8.3.1

* [lustre-devel] [PATCH 29/49] lustre: ptlrpc: rename cfs_binheap to simply binheap
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (27 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 28/49] lustre: ptlrpc: Implement NRS Delay Policy James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 30/49] lustre: ptlrpc: mark some functions as static James Simmons
                   ` (19 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

As the binheap code is no longer part of libcfs, the cfs_ prefix is
misleading.  As this code is local to one module and doesn't conflict
with anything global, there is no need for a prefix at all.  So change
cfs_binheap to binheap.

This patch was prepared using 'sed', then a few text-alignment issues
caused by the loss of those four characters were fixed by hand.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14289
Lustre-commit: 8587508f5ddd7b8 ("LU-14289 ptlrpc: rename cfs_binheap to simply binheap")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41375
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_nrs.h       |  6 +--
 fs/lustre/include/lustre_nrs_delay.h |  2 +-
 fs/lustre/ptlrpc/heap.c              | 98 ++++++++++++++++++------------------
 fs/lustre/ptlrpc/heap.h              | 72 +++++++++++++-------------
 fs/lustre/ptlrpc/nrs_delay.c         | 31 ++++++------
 5 files changed, 104 insertions(+), 105 deletions(-)

diff --git a/fs/lustre/include/lustre_nrs.h b/fs/lustre/include/lustre_nrs.h
index 0fc9e94..7e0a840 100644
--- a/fs/lustre/include/lustre_nrs.h
+++ b/fs/lustre/include/lustre_nrs.h
@@ -675,9 +675,9 @@ enum {
  * Binary heap node.
  *
  * Objects of this type are embedded into objects of the ordered set that is to
- * be maintained by a \e struct cfs_binheap instance.
+ * be maintained by a \e struct binheap instance.
  */
-struct cfs_binheap_node {
+struct binheap_node {
 	/** Index into the binary tree */
 	unsigned int	chn_index;
 };
@@ -707,7 +707,7 @@ struct ptlrpc_nrs_request {
 	unsigned int			nr_enqueued:1;
 	unsigned int			nr_started:1;
 	unsigned int			nr_finalized:1;
-	struct cfs_binheap_node		nr_node;
+	struct binheap_node		nr_node;
 
 	/**
 	 * Policy-specific fields, used for determining a request's scheduling
diff --git a/fs/lustre/include/lustre_nrs_delay.h b/fs/lustre/include/lustre_nrs_delay.h
index 01f0725..52c3885 100644
--- a/fs/lustre/include/lustre_nrs_delay.h
+++ b/fs/lustre/include/lustre_nrs_delay.h
@@ -48,7 +48,7 @@ struct nrs_delay_data {
 	 * Delayed requests are stored in this binheap until they are
 	 * removed for handling.
 	 */
-	struct cfs_binheap		*delay_binheap;
+	struct binheap			*delay_binheap;
 
 	/**
 	 * Minimum service time
diff --git a/fs/lustre/ptlrpc/heap.c b/fs/lustre/ptlrpc/heap.c
index 92f8a2e..0c5e29d 100644
--- a/fs/lustre/ptlrpc/heap.c
+++ b/fs/lustre/ptlrpc/heap.c
@@ -60,7 +60,7 @@
 
 /**
  * Grows the capacity of a binary heap so that it can handle a larger number of
- * \e struct cfs_binheap_node objects.
+ * \e struct binheap_node objects.
  *
  * \param[in] h The binary heap
  *
@@ -68,10 +68,10 @@
  * \retval -ENOMEM OOM error
  */
 static int
-cfs_binheap_grow(struct cfs_binheap *h)
+binheap_grow(struct binheap *h)
 {
-	struct cfs_binheap_node ***frag1 = NULL;
-	struct cfs_binheap_node  **frag2;
+	struct binheap_node ***frag1 = NULL;
+	struct binheap_node  **frag2;
 	int hwm = h->cbh_hwm;
 
 	/* need a whole new chunk of pointers */
@@ -164,12 +164,12 @@
  * \retval valid-pointer A newly-created and initialized binary heap object
  * \retval NULL		 error
  */
-struct cfs_binheap *
-cfs_binheap_create(struct cfs_binheap_ops *ops, unsigned int flags,
+struct binheap *
+binheap_create(struct binheap_ops *ops, unsigned int flags,
 		   unsigned int count, void *arg, struct cfs_cpt_table *cptab,
 		   int cptid)
 {
-	struct cfs_binheap *h;
+	struct binheap *h;
 
 	LASSERT(ops);
 	LASSERT(ops->hop_compare);
@@ -194,8 +194,8 @@ struct cfs_binheap *
 	h->cbh_cptid	  = cptid;
 
 	while (h->cbh_hwm < count) { /* preallocate */
-		if (cfs_binheap_grow(h) != 0) {
-			cfs_binheap_destroy(h);
+		if (binheap_grow(h) != 0) {
+			binheap_destroy(h);
 			return NULL;
 		}
 	}
@@ -204,7 +204,7 @@ struct cfs_binheap *
 
 	return h;
 }
-EXPORT_SYMBOL(cfs_binheap_create);
+EXPORT_SYMBOL(binheap_create);
 
 /**
  * Releases all resources associated with a binary heap instance.
@@ -215,7 +215,7 @@ struct cfs_binheap *
  * \param[in] h The binary heap object
  */
 void
-cfs_binheap_destroy(struct cfs_binheap *h)
+binheap_destroy(struct binheap *h)
 {
 	int idx0;
 	int idx1;
@@ -255,7 +255,7 @@ struct cfs_binheap *
 
 	kfree(h);
 }
-EXPORT_SYMBOL(cfs_binheap_destroy);
+EXPORT_SYMBOL(binheap_destroy);
 
 /**
  * Obtains a double pointer to a heap element, given its index into the binary
@@ -266,8 +266,8 @@ struct cfs_binheap *
  *
  * \retval valid-pointer A double pointer to a heap pointer entry
  */
-static struct cfs_binheap_node **
-cfs_binheap_pointer(struct cfs_binheap *h, unsigned int idx)
+static struct binheap_node **
+binheap_pointer(struct binheap *h, unsigned int idx)
 {
 	if (idx < CBH_SIZE)
 		return &(h->cbh_elements1[idx]);
@@ -291,15 +291,15 @@ struct cfs_binheap *
  * \retval valid-pointer The requested heap node
  * \retval NULL		 Supplied index is out of bounds
  */
-struct cfs_binheap_node *
-cfs_binheap_find(struct cfs_binheap *h, unsigned int idx)
+struct binheap_node *
+binheap_find(struct binheap *h, unsigned int idx)
 {
 	if (idx >= h->cbh_nelements)
 		return NULL;
 
-	return *cfs_binheap_pointer(h, idx);
+	return *binheap_pointer(h, idx);
 }
-EXPORT_SYMBOL(cfs_binheap_find);
+EXPORT_SYMBOL(binheap_find);
 
 /**
  * Moves a node upwards, towards the root of the binary tree.
@@ -311,21 +311,21 @@ struct cfs_binheap_node *
  * \retval 0 The position of \a e in the tree was not changed
  */
 static int
-cfs_binheap_bubble(struct cfs_binheap *h, struct cfs_binheap_node *e)
+binheap_bubble(struct binheap *h, struct binheap_node *e)
 {
 	unsigned int	     cur_idx = e->chn_index;
-	struct cfs_binheap_node **cur_ptr;
+	struct binheap_node **cur_ptr;
 	unsigned int	     parent_idx;
-	struct cfs_binheap_node **parent_ptr;
+	struct binheap_node **parent_ptr;
 	int		     did_sth = 0;
 
-	cur_ptr = cfs_binheap_pointer(h, cur_idx);
+	cur_ptr = binheap_pointer(h, cur_idx);
 	LASSERT(*cur_ptr == e);
 
 	while (cur_idx > 0) {
 		parent_idx = (cur_idx - 1) >> 1;
 
-		parent_ptr = cfs_binheap_pointer(h, parent_idx);
+		parent_ptr = binheap_pointer(h, parent_idx);
 		LASSERT((*parent_ptr)->chn_index == parent_idx);
 
 		if (h->cbh_ops->hop_compare(*parent_ptr, e))
@@ -354,21 +354,21 @@ struct cfs_binheap_node *
  * \retval 0 The position of \a e in the tree was not changed
  */
 static int
-cfs_binheap_sink(struct cfs_binheap *h, struct cfs_binheap_node *e)
+binheap_sink(struct binheap *h, struct binheap_node *e)
 {
 	unsigned int	     n = h->cbh_nelements;
 	unsigned int	     child_idx;
-	struct cfs_binheap_node **child_ptr;
-	struct cfs_binheap_node  *child;
+	struct binheap_node **child_ptr;
+	struct binheap_node  *child;
 	unsigned int	     child2_idx;
-	struct cfs_binheap_node **child2_ptr;
-	struct cfs_binheap_node  *child2;
+	struct binheap_node **child2_ptr;
+	struct binheap_node  *child2;
 	unsigned int	     cur_idx;
-	struct cfs_binheap_node **cur_ptr;
+	struct binheap_node **cur_ptr;
 	int		     did_sth = 0;
 
 	cur_idx = e->chn_index;
-	cur_ptr = cfs_binheap_pointer(h, cur_idx);
+	cur_ptr = binheap_pointer(h, cur_idx);
 	LASSERT(*cur_ptr == e);
 
 	while (cur_idx < n) {
@@ -376,12 +376,12 @@ struct cfs_binheap_node *
 		if (child_idx >= n)
 			break;
 
-		child_ptr = cfs_binheap_pointer(h, child_idx);
+		child_ptr = binheap_pointer(h, child_idx);
 		child = *child_ptr;
 
 		child2_idx = child_idx + 1;
 		if (child2_idx < n) {
-			child2_ptr = cfs_binheap_pointer(h, child2_idx);
+			child2_ptr = binheap_pointer(h, child2_idx);
 			child2 = *child2_ptr;
 
 			if (h->cbh_ops->hop_compare(child2, child)) {
@@ -419,14 +419,14 @@ struct cfs_binheap_node *
  * \retval != 0 error
  */
 int
-cfs_binheap_insert(struct cfs_binheap *h, struct cfs_binheap_node *e)
+binheap_insert(struct binheap *h, struct binheap_node *e)
 {
-	struct cfs_binheap_node **new_ptr;
+	struct binheap_node **new_ptr;
 	unsigned int	     new_idx = h->cbh_nelements;
 	int		     rc;
 
 	if (new_idx == h->cbh_hwm) {
-		rc = cfs_binheap_grow(h);
+		rc = binheap_grow(h);
 		if (rc != 0)
 			return rc;
 	}
@@ -438,15 +438,15 @@ struct cfs_binheap_node *
 	}
 
 	e->chn_index = new_idx;
-	new_ptr = cfs_binheap_pointer(h, new_idx);
+	new_ptr = binheap_pointer(h, new_idx);
 	h->cbh_nelements++;
 	*new_ptr = e;
 
-	cfs_binheap_bubble(h, e);
+	binheap_bubble(h, e);
 
 	return 0;
 }
-EXPORT_SYMBOL(cfs_binheap_insert);
+EXPORT_SYMBOL(binheap_insert);
 
 /**
  * Removes a node from the binary heap.
@@ -455,34 +455,34 @@ struct cfs_binheap_node *
  * \param[in] e The node
  */
 void
-cfs_binheap_remove(struct cfs_binheap *h, struct cfs_binheap_node *e)
+binheap_remove(struct binheap *h, struct binheap_node *e)
 {
 	unsigned int	     n = h->cbh_nelements;
 	unsigned int	     cur_idx = e->chn_index;
-	struct cfs_binheap_node **cur_ptr;
-	struct cfs_binheap_node  *last;
+	struct binheap_node **cur_ptr;
+	struct binheap_node  *last;
 
 	LASSERT(cur_idx != CBH_POISON);
 	LASSERT(cur_idx < n);
 
-	cur_ptr = cfs_binheap_pointer(h, cur_idx);
+	cur_ptr = binheap_pointer(h, cur_idx);
 	LASSERT(*cur_ptr == e);
 
 	n--;
-	last = *cfs_binheap_pointer(h, n);
+	last = *binheap_pointer(h, n);
 	h->cbh_nelements = n;
 	if (last == e)
 		return;
 
 	last->chn_index = cur_idx;
 	*cur_ptr = last;
-	cfs_binheap_relocate(h, *cur_ptr);
+	binheap_relocate(h, *cur_ptr);
 
 	e->chn_index = CBH_POISON;
 	if (h->cbh_ops->hop_exit)
 		h->cbh_ops->hop_exit(h, e);
 }
-EXPORT_SYMBOL(cfs_binheap_remove);
+EXPORT_SYMBOL(binheap_remove);
 
 /**
  * Relocate a node in the binary heap.
@@ -493,10 +493,10 @@ struct cfs_binheap_node *
  * \param[in] e The node
  */
 void
-cfs_binheap_relocate(struct cfs_binheap *h, struct cfs_binheap_node *e)
+binheap_relocate(struct binheap *h, struct binheap_node *e)
 {
-	if (!cfs_binheap_bubble(h, e))
-		cfs_binheap_sink(h, e);
+	if (!binheap_bubble(h, e))
+		binheap_sink(h, e);
 }
-EXPORT_SYMBOL(cfs_binheap_relocate);
+EXPORT_SYMBOL(binheap_relocate);
 /** @} heap */
diff --git a/fs/lustre/ptlrpc/heap.h b/fs/lustre/ptlrpc/heap.h
index 3972917..bc8fb19 100644
--- a/fs/lustre/ptlrpc/heap.h
+++ b/fs/lustre/ptlrpc/heap.h
@@ -40,9 +40,9 @@
  * the tree (as this is an implementation of a min-heap) to be removed by users
  * for consumption.
  *
- * Users of the heap should embed a \e struct cfs_binheap_node object instance
+ * Users of the heap should embed a \e struct binheap_node object instance
  * on every object of the set that they wish the binary heap instance to handle,
- * and (at a minimum) provide a struct cfs_binheap_ops::hop_compare()
+ * and (at a minimum) provide a struct binheap_ops::hop_compare()
  * implementation which is used by the heap as the binary predicate during its
  * internal sorting operations.
  *
@@ -57,7 +57,7 @@
 #define CBH_SHIFT	9
 #define CBH_SIZE	(1 << CBH_SHIFT)		/* # ptrs per level */
 #define CBH_MASK	(CBH_SIZE - 1)
-#define CBH_NOB		(CBH_SIZE * sizeof(struct cfs_binheap_node *))
+#define CBH_NOB		(CBH_SIZE * sizeof(struct binheap_node *))
 
 #define CBH_POISON	0xdeadbeef
 
@@ -68,12 +68,12 @@ enum {
 	CBH_FLAG_ATOMIC_GROW	= 1,
 };
 
-struct cfs_binheap;
+struct binheap;
 
 /**
  * Binary heap operations.
  */
-struct cfs_binheap_ops {
+struct binheap_ops {
 	/**
 	 * Called right before inserting a node into the binary heap.
 	 *
@@ -85,8 +85,8 @@ struct cfs_binheap_ops {
 	 * \retval 0 success
 	 * \retval != 0 error
 	 */
-	int		(*hop_enter)(struct cfs_binheap *h,
-				     struct cfs_binheap_node *e);
+	int		(*hop_enter)(struct binheap *h,
+				     struct binheap_node *e);
 	/**
 	 * Called right after removing a node from the binary heap.
 	 *
@@ -95,8 +95,8 @@ struct cfs_binheap_ops {
 	 * \param[in] h The heap
 	 * \param[in] e The node
 	 */
-	void		(*hop_exit)(struct cfs_binheap *h,
-				    struct cfs_binheap_node *e);
+	void		(*hop_exit)(struct binheap *h,
+				    struct binheap_node *e);
 	/**
 	 * A binary predicate which is called during internal heap sorting
 	 * operations, and used in order to determine the relevant ordering of
@@ -110,25 +110,25 @@ struct cfs_binheap_ops {
 	 * \retval 0 Node a > node b
 	 * \retval 1 Node a < node b
 	 *
-	 * \see cfs_binheap_bubble()
+	 * \see binheap_bubble()
 	 * \see cfs_biheap_sink()
 	 */
-	int		(*hop_compare)(struct cfs_binheap_node *a,
-				       struct cfs_binheap_node *b);
+	int		(*hop_compare)(struct binheap_node *a,
+				       struct binheap_node *b);
 };
 
 /**
  * Binary heap object.
  *
- * Sorts elements of type \e struct cfs_binheap_node
+ * Sorts elements of type \e struct binheap_node
  */
-struct cfs_binheap {
+struct binheap {
 	/** Triple indirect */
-	struct cfs_binheap_node  ****cbh_elements3;
+	struct binheap_node  ****cbh_elements3;
 	/** double indirect */
-	struct cfs_binheap_node   ***cbh_elements2;
+	struct binheap_node   ***cbh_elements2;
 	/** single indirect */
-	struct cfs_binheap_node    **cbh_elements1;
+	struct binheap_node    **cbh_elements1;
 	/** # elements referenced */
 	unsigned int		cbh_nelements;
 	/** high water mark */
@@ -136,51 +136,51 @@ struct cfs_binheap {
 	/** user flags */
 	unsigned int		cbh_flags;
 	/** operations table */
-	struct cfs_binheap_ops *cbh_ops;
+	struct binheap_ops *cbh_ops;
 	/** private data */
 	void		       *cbh_private;
 	/** associated CPT table */
 	struct cfs_cpt_table   *cbh_cptab;
-	/** associated CPT id of this struct cfs_binheap::cbh_cptab */
+	/** associated CPT id of this struct binheap::cbh_cptab */
 	int			cbh_cptid;
 };
 
-void cfs_binheap_destroy(struct cfs_binheap *h);
-struct cfs_binheap *
-cfs_binheap_create(struct cfs_binheap_ops *ops, unsigned int flags,
+void binheap_destroy(struct binheap *h);
+struct binheap *
+binheap_create(struct binheap_ops *ops, unsigned int flags,
 		   unsigned int count, void *arg, struct cfs_cpt_table *cptab,
 		   int cptid);
-struct cfs_binheap_node *
-cfs_binheap_find(struct cfs_binheap *h, unsigned int idx);
-int cfs_binheap_insert(struct cfs_binheap *h, struct cfs_binheap_node *e);
-void cfs_binheap_remove(struct cfs_binheap *h, struct cfs_binheap_node *e);
-void cfs_binheap_relocate(struct cfs_binheap *h, struct cfs_binheap_node *e);
+struct binheap_node *
+binheap_find(struct binheap *h, unsigned int idx);
+int binheap_insert(struct binheap *h, struct binheap_node *e);
+void binheap_remove(struct binheap *h, struct binheap_node *e);
+void binheap_relocate(struct binheap *h, struct binheap_node *e);
 
 static inline int
-cfs_binheap_size(struct cfs_binheap *h)
+binheap_size(struct binheap *h)
 {
 	return h->cbh_nelements;
 }
 
 static inline int
-cfs_binheap_is_empty(struct cfs_binheap *h)
+binheap_is_empty(struct binheap *h)
 {
 	return h->cbh_nelements == 0;
 }
 
-static inline struct cfs_binheap_node *
-cfs_binheap_root(struct cfs_binheap *h)
+static inline struct binheap_node *
+binheap_root(struct binheap *h)
 {
-	return cfs_binheap_find(h, 0);
+	return binheap_find(h, 0);
 }
 
-static inline struct cfs_binheap_node *
-cfs_binheap_remove_root(struct cfs_binheap *h)
+static inline struct binheap_node *
+binheap_remove_root(struct binheap *h)
 {
-	struct cfs_binheap_node *e = cfs_binheap_find(h, 0);
+	struct binheap_node *e = binheap_find(h, 0);
 
 	if (e != NULL)
-		cfs_binheap_remove(h, e);
+		binheap_remove(h, e);
 	return e;
 }
 
diff --git a/fs/lustre/ptlrpc/nrs_delay.c b/fs/lustre/ptlrpc/nrs_delay.c
index 8ff8e8d..5b4c2a9 100644
--- a/fs/lustre/ptlrpc/nrs_delay.c
+++ b/fs/lustre/ptlrpc/nrs_delay.c
@@ -77,8 +77,8 @@
  * \retval 0 start_time(e1) > start_time(e2)
  * \retval 1 start_time(e1) <= start_time(e2)
  */
-static int delay_req_compare(struct cfs_binheap_node *e1,
-			     struct cfs_binheap_node *e2)
+static int delay_req_compare(struct binheap_node *e1,
+			     struct binheap_node *e2)
 {
 	struct ptlrpc_nrs_request *nrq1;
 	struct ptlrpc_nrs_request *nrq2;
@@ -90,7 +90,7 @@ static int delay_req_compare(struct cfs_binheap_node *e1,
 	       nrq2->nr_u.delay.req_start_time;
 }
 
-static struct cfs_binheap_ops nrs_delay_heap_ops = {
+static struct binheap_ops nrs_delay_heap_ops = {
 	.hop_enter	= NULL,
 	.hop_exit	= NULL,
 	.hop_compare	= delay_req_compare,
@@ -119,12 +119,11 @@ static int nrs_delay_start(struct ptlrpc_nrs_policy *policy)
 	if (!delay_data)
 		return -ENOMEM;
 
-	delay_data->delay_binheap = cfs_binheap_create(&nrs_delay_heap_ops,
-						       CBH_FLAG_ATOMIC_GROW,
-						       4096, NULL,
-						       nrs_pol2cptab(policy),
-						       nrs_pol2cptid(policy));
-
+	delay_data->delay_binheap = binheap_create(&nrs_delay_heap_ops,
+						   CBH_FLAG_ATOMIC_GROW,
+						   4096, NULL,
+						   nrs_pol2cptab(policy),
+						   nrs_pol2cptid(policy));
 	if (!delay_data->delay_binheap) {
 		kfree(delay_data);
 		return -ENOMEM;
@@ -154,9 +153,9 @@ static void nrs_delay_stop(struct ptlrpc_nrs_policy *policy)
 
 	LASSERT(delay_data);
 	LASSERT(delay_data->delay_binheap);
-	LASSERT(cfs_binheap_is_empty(delay_data->delay_binheap));
+	LASSERT(binheap_is_empty(delay_data->delay_binheap));
 
-	cfs_binheap_destroy(delay_data->delay_binheap);
+	binheap_destroy(delay_data->delay_binheap);
 
 	kfree(delay_data);
 }
@@ -212,10 +211,10 @@ struct ptlrpc_nrs_request *nrs_delay_req_get(struct ptlrpc_nrs_policy *policy,
 					     bool peek, bool force)
 {
 	struct nrs_delay_data *delay_data = policy->pol_private;
-	struct cfs_binheap_node *node;
+	struct binheap_node *node;
 	struct ptlrpc_nrs_request *nrq;
 
-	node = cfs_binheap_root(delay_data->delay_binheap);
+	node = binheap_root(delay_data->delay_binheap);
 	nrq = unlikely(!node) ? NULL :
 	      container_of(node, struct ptlrpc_nrs_request, nr_node);
 
@@ -224,7 +223,7 @@ struct ptlrpc_nrs_request *nrs_delay_req_get(struct ptlrpc_nrs_policy *policy,
 		    ktime_get_real_seconds() < nrq->nr_u.delay.req_start_time)
 			nrq = NULL;
 		else if (likely(!peek))
-			cfs_binheap_remove(delay_data->delay_binheap,
+			binheap_remove(delay_data->delay_binheap,
 					   &nrq->nr_node);
 	}
 
@@ -262,7 +261,7 @@ static int nrs_delay_req_add(struct ptlrpc_nrs_policy *policy,
 					 prandom_u32_max(delay_data->max_delay - delay_data->min_delay + 1) +
 					 delay_data->min_delay;
 
-	return cfs_binheap_insert(delay_data->delay_binheap, &nrq->nr_node);
+	return binheap_insert(delay_data->delay_binheap, &nrq->nr_node);
 }
 
 /**
@@ -276,7 +275,7 @@ static void nrs_delay_req_del(struct ptlrpc_nrs_policy *policy,
 {
 	struct nrs_delay_data *delay_data = policy->pol_private;
 
-	cfs_binheap_remove(delay_data->delay_binheap, &nrq->nr_node);
+	binheap_remove(delay_data->delay_binheap, &nrq->nr_node);
 }
 
 /**
-- 
1.8.3.1

* [lustre-devel] [PATCH 30/49] lustre: ptlrpc: mark some functions as static
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (28 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 29/49] lustre: ptlrpc: rename cfs_binheap to simply binheap James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 31/49] lustre: use tgt_pool for lov layer James Simmons
                   ` (18 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The functions ptlrpc_start_threads() and ptlrpc_start_thread() are
only used in the file that defines them, so mark them as 'static'
and remove their declarations from the include files.

WC-bug-id: https://jira.whamcloud.com/browse/LU-8837
Lustre-commit: f77e53d3656504c ("LU-8837 ptlrpc: mark some functions as static")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41947
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_net.h     | 1 -
 fs/lustre/ptlrpc/ptlrpc_internal.h | 1 -
 fs/lustre/ptlrpc/service.c         | 6 ++++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index a9aa363..2b98468 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -2009,7 +2009,6 @@ struct ptlrpc_service *ptlrpc_register_service(struct ptlrpc_service_conf *conf,
 					       struct kset *parent,
 					       struct dentry *debugfs_entry);
 
-int ptlrpc_start_threads(struct ptlrpc_service *svc);
 int ptlrpc_unregister_service(struct ptlrpc_service *service);
 
 int ptlrpc_hr_init(void);
diff --git a/fs/lustre/ptlrpc/ptlrpc_internal.h b/fs/lustre/ptlrpc/ptlrpc_internal.h
index bb4759ce..248f9cb 100644
--- a/fs/lustre/ptlrpc/ptlrpc_internal.h
+++ b/fs/lustre/ptlrpc/ptlrpc_internal.h
@@ -55,7 +55,6 @@
 extern lnet_handler_t ptlrpc_handler;
 extern struct percpu_ref ptlrpc_pending;
 
-int ptlrpc_start_thread(struct ptlrpc_service_part *svcpt, int wait);
 /* ptlrpcd.c */
 int ptlrpcd_start(struct ptlrpcd_ctl *pc);
 
diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c
index 427215c..070eecc 100644
--- a/fs/lustre/ptlrpc/service.c
+++ b/fs/lustre/ptlrpc/service.c
@@ -67,6 +67,8 @@
 static int ptlrpc_server_post_idle_rqbds(struct ptlrpc_service_part *svcpt);
 static void ptlrpc_server_hpreq_fini(struct ptlrpc_request *req);
 static void ptlrpc_at_remove_timed(struct ptlrpc_request *req);
+static int ptlrpc_start_threads(struct ptlrpc_service *svc);
+static int ptlrpc_start_thread(struct ptlrpc_service_part *svcpt, int wait);
 
 /** Holds a list of all PTLRPC services */
 LIST_HEAD(ptlrpc_all_services);
@@ -2566,7 +2568,7 @@ static void ptlrpc_stop_all_threads(struct ptlrpc_service *svc)
 	}
 }
 
-int ptlrpc_start_threads(struct ptlrpc_service *svc)
+static int ptlrpc_start_threads(struct ptlrpc_service *svc)
 {
 	int rc = 0;
 	int i;
@@ -2596,7 +2598,7 @@ int ptlrpc_start_threads(struct ptlrpc_service *svc)
 	return rc;
 }
 
-int ptlrpc_start_thread(struct ptlrpc_service_part *svcpt, int wait)
+static int ptlrpc_start_thread(struct ptlrpc_service_part *svcpt, int wait)
 {
 	struct ptlrpc_thread *thread;
 	struct ptlrpc_service *svc;
-- 
1.8.3.1

* [lustre-devel] [PATCH 31/49] lustre: use tgt_pool for lov layer
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (29 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 30/49] lustre: ptlrpc: mark some functions as static James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 32/49] lustre: quota: make used for pool correct James Simmons
                   ` (17 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Sergey Cheremencev, Lustre Development List

New generic code was created for server target pool handling. We can
reuse this code in the lov layer. Place lu_tgt_pool.c in obdclass
instead of having a special target directory just to build this code
for the client.
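
As an illustration, the lov call sites end up using the shared API
roughly as sketched below (a hypothetical fragment based on the hunks
that follow; example_pool_usage() is not part of the patch):

static int example_pool_usage(struct lu_tgt_pool *pool, u32 idx,
			      unsigned int tgt_size)
{
	int rc;

	rc = tgt_pool_init(pool, 0);		/* was lov_ost_pool_init() */
	if (rc)
		return rc;

	rc = tgt_pool_add(pool, idx, tgt_size);	/* was lov_ost_pool_add() */
	if (rc)
		goto out;

	tgt_pool_remove(pool, idx);		/* was lov_ost_pool_remove() */
out:
	tgt_pool_free(pool);			/* was lov_ost_pool_free() */
	return rc;
}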

WC-bug-id: https://jira.whamcloud.com/browse/LU-14291
Lustre-commit: 01d23cc780c6c7f ("LU-14291 build: use tgt_pool for lov layer")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39683
WC-bug-id: https://jira.whamcloud.com/browse/LU-11023
Lustre-commit: 09f9fb3211cd998 ("LU-11023 quota: quota pools for OSTs")
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 fs/lustre/lov/lov_internal.h     |   7 --
 fs/lustre/lov/lov_obd.c          |  10 +-
 fs/lustre/lov/lov_pool.c         | 114 +-----------------
 fs/lustre/obdclass/Makefile      |   4 +-
 fs/lustre/obdclass/lu_tgt_pool.c | 241 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 253 insertions(+), 123 deletions(-)
 create mode 100644 fs/lustre/obdclass/lu_tgt_pool.c

diff --git a/fs/lustre/lov/lov_internal.h b/fs/lustre/lov/lov_internal.h
index 81adce4..2e1e2dd 100644
--- a/fs/lustre/lov/lov_internal.h
+++ b/fs/lustre/lov/lov_internal.h
@@ -333,13 +333,6 @@ struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, void *buf,
 
 #define LOV_MDC_TGT_MAX 256
 
-/* lu_tgt_pool methods */
-int lov_ost_pool_init(struct lu_tgt_pool *op, unsigned int count);
-int lov_ost_pool_extend(struct lu_tgt_pool *op, unsigned int min_count);
-int lov_ost_pool_add(struct lu_tgt_pool *op, u32 idx, unsigned int min_count);
-int lov_ost_pool_remove(struct lu_tgt_pool *op, u32 idx);
-int lov_ost_pool_free(struct lu_tgt_pool *op);
-
 /* high level pool methods */
 int lov_pool_new(struct obd_device *obd, char *poolname);
 int lov_pool_del(struct obd_device *obd, char *poolname);
diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c
index 2939d66..4f574ad 100644
--- a/fs/lustre/lov/lov_obd.c
+++ b/fs/lustre/lov/lov_obd.c
@@ -96,7 +96,7 @@ void lov_tgts_putref(struct obd_device *obd)
 			 * being the maximum tgt index for computing the
 			 * mds_max_easize. So we can't shrink it.
 			 */
-			lov_ost_pool_remove(&lov->lov_packed, i);
+			tgt_pool_remove(&lov->lov_packed, i);
 			lov->lov_tgts[i] = NULL;
 			lov->lov_death_row--;
 		}
@@ -545,7 +545,7 @@ static int lov_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
 		return -ENOMEM;
 	}
 
-	rc = lov_ost_pool_add(&lov->lov_packed, index, lov->lov_tgt_size);
+	rc = tgt_pool_add(&lov->lov_packed, index, lov->lov_tgt_size);
 	if (rc) {
 		mutex_unlock(&lov->lov_lock);
 		kfree(tgt);
@@ -764,7 +764,7 @@ int lov_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 	if (rc)
 		goto out_hash;
 
-	rc = lov_ost_pool_init(&lov->lov_packed, 0);
+	rc = tgt_pool_init(&lov->lov_packed, 0);
 	if (rc)
 		goto out_pool;
 
@@ -778,7 +778,7 @@ int lov_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 	return 0;
 
 out_tunables:
-	lov_ost_pool_free(&lov->lov_packed);
+	tgt_pool_free(&lov->lov_packed);
 out_pool:
 	lov_pool_hash_destroy(&lov->lov_pools_hash_body);
 out_hash:
@@ -805,7 +805,7 @@ static int lov_cleanup(struct obd_device *obd)
 		lov_pool_del(obd, pool->pool_name);
 	}
 	lov_pool_hash_destroy(&lov->lov_pools_hash_body);
-	lov_ost_pool_free(&lov->lov_packed);
+	tgt_pool_free(&lov->lov_packed);
 
 	lprocfs_obd_cleanup(obd);
 	if (lov->lov_tgts) {
diff --git a/fs/lustre/lov/lov_pool.c b/fs/lustre/lov/lov_pool.c
index f8f14f9..2617974 100644
--- a/fs/lustre/lov/lov_pool.c
+++ b/fs/lustre/lov/lov_pool.c
@@ -83,7 +83,7 @@ void lov_pool_putref(struct pool_desc *pool)
 	CDEBUG(D_INFO, "pool %p\n", pool);
 	if (atomic_dec_and_test(&pool->pool_refcount)) {
 		LASSERT(list_empty(&pool->pool_list));
-		lov_ost_pool_free(&pool->pool_obds);
+		tgt_pool_free(&pool->pool_obds);
 		kfree_rcu(pool, rcu);
 	}
 }
@@ -230,110 +230,6 @@ static int pool_proc_open(struct inode *inode, struct file *file)
 	.release	= seq_release,
 };
 
-#define LOV_POOL_INIT_COUNT 2
-int lov_ost_pool_init(struct lu_tgt_pool *op, unsigned int count)
-{
-	if (count == 0)
-		count = LOV_POOL_INIT_COUNT;
-	op->op_array = NULL;
-	op->op_count = 0;
-	init_rwsem(&op->op_rw_sem);
-	op->op_size = count * sizeof(op->op_array[0]);
-	op->op_array = kcalloc(count, sizeof(op->op_array[0]),
-			       GFP_KERNEL);
-	if (!op->op_array) {
-		op->op_size = 0;
-		return -ENOMEM;
-	}
-	return 0;
-}
-
-/* Caller must hold write op_rwlock */
-int lov_ost_pool_extend(struct lu_tgt_pool *op, unsigned int min_count)
-{
-	int new_count;
-	u32 *new;
-
-	LASSERT(min_count != 0);
-
-	if (op->op_count * sizeof(op->op_array[0]) < op->op_size)
-		return 0;
-
-	new_count = max_t(u32, min_count,
-			  2 * op->op_size / sizeof(op->op_array[0]));
-	new = kcalloc(new_count, sizeof(op->op_array[0]), GFP_KERNEL);
-	if (!new)
-		return -ENOMEM;
-
-	/* copy old array to new one */
-	memcpy(new, op->op_array, op->op_size);
-	kfree(op->op_array);
-	op->op_array = new;
-	op->op_size = new_count * sizeof(op->op_array[0]);
-	return 0;
-}
-
-int lov_ost_pool_add(struct lu_tgt_pool *op, u32 idx, unsigned int min_count)
-{
-	int rc = 0, i;
-
-	down_write(&op->op_rw_sem);
-
-	rc = lov_ost_pool_extend(op, min_count);
-	if (rc)
-		goto out;
-
-	/* search ost in pool array */
-	for (i = 0; i < op->op_count; i++) {
-		if (op->op_array[i] == idx) {
-			rc = -EEXIST;
-			goto out;
-		}
-	}
-	/* ost not found we add it */
-	op->op_array[op->op_count] = idx;
-	op->op_count++;
-out:
-	up_write(&op->op_rw_sem);
-	return rc;
-}
-
-int lov_ost_pool_remove(struct lu_tgt_pool *op, u32 idx)
-{
-	int i;
-
-	down_write(&op->op_rw_sem);
-
-	for (i = 0; i < op->op_count; i++) {
-		if (op->op_array[i] == idx) {
-			memmove(&op->op_array[i], &op->op_array[i + 1],
-				(op->op_count - i - 1) * sizeof(op->op_array[0]));
-			op->op_count--;
-			up_write(&op->op_rw_sem);
-			return 0;
-		}
-	}
-
-	up_write(&op->op_rw_sem);
-	return -EINVAL;
-}
-
-int lov_ost_pool_free(struct lu_tgt_pool *op)
-{
-	if (op->op_size == 0)
-		return 0;
-
-	down_write(&op->op_rw_sem);
-
-	kfree(op->op_array);
-	op->op_array = NULL;
-	op->op_count = 0;
-	op->op_size = 0;
-
-	up_write(&op->op_rw_sem);
-	return 0;
-}
-
 static void
 pools_hash_exit(void *vpool, void *data)
 {
@@ -373,7 +269,7 @@ int lov_pool_new(struct obd_device *obd, char *poolname)
 	 * up to deletion
 	 */
 	atomic_set(&new_pool->pool_refcount, 1);
-	rc = lov_ost_pool_init(&new_pool->pool_obds, 0);
+	rc = tgt_pool_init(&new_pool->pool_obds, 0);
 	if (rc)
 		goto out_err;
 
@@ -415,7 +311,7 @@ int lov_pool_new(struct obd_device *obd, char *poolname)
 	lov->lov_pool_count--;
 	spin_unlock(&obd->obd_dev_lock);
 	debugfs_remove_recursive(new_pool->pool_debugfs_entry);
-	lov_ost_pool_free(&new_pool->pool_obds);
+	tgt_pool_free(&new_pool->pool_obds);
 	kfree(new_pool);
 
 	return rc;
@@ -490,7 +386,7 @@ int lov_pool_add(struct obd_device *obd, char *poolname, char *ostname)
 		goto out;
 	}
 
-	rc = lov_ost_pool_add(&pool->pool_obds, lov_idx, lov->lov_tgt_size);
+	rc = tgt_pool_add(&pool->pool_obds, lov_idx, lov->lov_tgt_size);
 	if (rc)
 		goto out;
 
@@ -542,7 +438,7 @@ int lov_pool_remove(struct obd_device *obd, char *poolname, char *ostname)
 		goto out;
 	}
 
-	lov_ost_pool_remove(&pool->pool_obds, lov_idx);
+	tgt_pool_remove(&pool->pool_obds, lov_idx);
 
 	CDEBUG(D_CONFIG, "%s removed from " LOV_POOLNAMEF "\n", ostname,
 	       poolname);
diff --git a/fs/lustre/obdclass/Makefile b/fs/lustre/obdclass/Makefile
index de37a89..1c46ea4 100644
--- a/fs/lustre/obdclass/Makefile
+++ b/fs/lustre/obdclass/Makefile
@@ -8,5 +8,5 @@ obdclass-y := llog.o llog_cat.o llog_obd.o llog_swab.o class_obd.o \
 	      lustre_handles.o lustre_peer.o statfs_pack.o linkea.o \
 	      obdo.o obd_config.o obd_mount.o lu_object.o lu_ref.o \
 	      cl_object.o cl_page.o cl_lock.o cl_io.o kernelcomm.o \
-	      jobid.o integrity.o obd_cksum.o lu_tgt_descs.o \
-	      range_lock.o
+	      jobid.o integrity.o obd_cksum.o range_lock.o \
+	      lu_tgt_descs.o lu_tgt_pool.o
diff --git a/fs/lustre/obdclass/lu_tgt_pool.c b/fs/lustre/obdclass/lu_tgt_pool.c
new file mode 100644
index 0000000..fc5e298
--- /dev/null
+++ b/fs/lustre/obdclass/lu_tgt_pool.c
@@ -0,0 +1,241 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright  2008 Sun Microsystems, Inc. All rights reserved
+ * Use is subject to license terms.
+ *
+ * Copyright (c) 2012, 2017, Intel Corporation.
+ */
+/*
+ * This file is part of Lustre, http://www.lustre.org/
+ * Lustre is a trademark of Sun Microsystems, Inc.
+ */
+/*
+ * lustre/target/tgt_pool.c
+ *
+ * This file handles creation, lookup, and removal of pools themselves, as
+ * well as adding and removing targets to pools.
+ *
+ * Author: Jacques-Charles LAFOUCRIERE <jc.lafoucriere@cea.fr>
+ * Author: Alex Lyashkov <Alexey.Lyashkov@Sun.COM>
+ * Author: Nathaniel Rutman <Nathan.Rutman@Sun.COM>
+ */
+
+#define DEBUG_SUBSYSTEM S_CLASS
+
+#include <linux/libcfs/libcfs_private.h>
+#include <obd_target.h>
+#include <obd_support.h>
+
+/**
+ * Initialize the pool data structures at startup.
+ *
+ * Allocate and initialize the pool data structures with the specified
+ * array size.  If pool count is not specified (\a count == 0), then
+ * POOL_INIT_COUNT will be used.  Allocating a non-zero initial array
+ * size avoids the need to reallocate as new pools are added.
+ *
+ * @op		pool structure
+ * @count	initial size of the target op_array[] array
+ *
+ * Return:	0 indicates successful pool initialization
+ *		negative error number on failure
+ */
+#define POOL_INIT_COUNT 2
+int tgt_pool_init(struct lu_tgt_pool *op, unsigned int count)
+{
+	if (count == 0)
+		count = POOL_INIT_COUNT;
+	op->op_array = NULL;
+	op->op_count = 0;
+	init_rwsem(&op->op_rw_sem);
+	op->op_size = count * sizeof(op->op_array[0]);
+	op->op_array = kcalloc(count, sizeof(op->op_array[0]),
+			       GFP_KERNEL);
+	if (!op->op_array) {
+		op->op_size = 0;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(tgt_pool_init);
+
+/**
+ * Increase the op_array size to hold more targets in this pool.
+ *
+ * The size is increased to at least \a min_count, but may be larger
+ * for an existing pool since ->op_array[] is growing exponentially.
+ * Caller must hold write op_rwlock.
+ *
+ * @op		pool structure
+ * @min_count	minimum number of entries to handle
+ *
+ * Return:	0 on success
+ *		negative error number on failure.
+ */
+int tgt_pool_extend(struct lu_tgt_pool *op, unsigned int min_count)
+{
+	u32 *new;
+	u32 new_size;
+
+	LASSERT(min_count != 0);
+
+	if (op->op_count * sizeof(op->op_array[0]) < op->op_size)
+		return 0;
+
+	new_size = max_t(u32, min_count * sizeof(op->op_array[0]),
+			 2 * op->op_size);
+	new = kzalloc(new_size, GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	/* copy old array to new one */
+	memcpy(new, op->op_array, op->op_size);
+	kfree(op->op_array);
+	op->op_array = new;
+	op->op_size = new_size;
+
+	return 0;
+}
+EXPORT_SYMBOL(tgt_pool_extend);
+
+/**
+ * Add a new target to an existing pool.
+ *
+ * Add a new target device to the pool previously created and returned by
+ * lod_pool_new().  Each target can only be in each pool at most one time.
+ *
+ * @op		target pool to add new entry
+ * @idx		pool index number to add to the \a op array
+ * @min_count	minimum number of entries to expect in the pool
+ *
+ * Return:	0 if target could be added to the pool
+ *		negative error if target \a idx was not added
+ */
+int tgt_pool_add(struct lu_tgt_pool *op, u32 idx, unsigned int min_count)
+{
+	unsigned int i;
+	int rc = 0;
+
+	down_write(&op->op_rw_sem);
+
+	rc = tgt_pool_extend(op, min_count);
+	if (rc)
+		goto out;
+
+	/* search ost in pool array */
+	for (i = 0; i < op->op_count; i++) {
+		if (op->op_array[i] == idx) {
+			rc = -EEXIST;
+			goto out;
+		}
+	}
+	/* ost not found we add it */
+	op->op_array[op->op_count] = idx;
+	op->op_count++;
+out:
+	up_write(&op->op_rw_sem);
+	return rc;
+}
+EXPORT_SYMBOL(tgt_pool_add);
+
+/**
+ * Remove an existing pool from the system.
+ *
+ * The specified pool must have previously been allocated by
+ * lod_pool_new() and not have any target members in the pool.
+ * If the removed target is not the last, compact the array
+ * to remove empty spaces.
+ *
+ * @op		pointer to the original data structure
+ * @idx		target index to be removed
+ *
+ * Return:	0 on success
+ *		negative error number on failure
+ */
+int tgt_pool_remove(struct lu_tgt_pool *op, u32 idx)
+{
+	unsigned int i;
+
+	down_write(&op->op_rw_sem);
+
+	for (i = 0; i < op->op_count; i++) {
+		if (op->op_array[i] == idx) {
+			memmove(&op->op_array[i], &op->op_array[i + 1],
+				(op->op_count - i - 1) *
+				sizeof(op->op_array[0]));
+			op->op_count--;
+			up_write(&op->op_rw_sem);
+			return 0;
+		}
+	}
+
+	up_write(&op->op_rw_sem);
+	return -EINVAL;
+}
+EXPORT_SYMBOL(tgt_pool_remove);
+
+int tgt_check_index(int idx, struct lu_tgt_pool *osts)
+{
+	int rc = 0, i;
+
+	down_read(&osts->op_rw_sem);
+	for (i = 0; i < osts->op_count; i++) {
+		if (osts->op_array[i] == idx)
+			goto out;
+	}
+	rc = -ENOENT;
+out:
+	up_read(&osts->op_rw_sem);
+	return rc;
+}
+EXPORT_SYMBOL(tgt_check_index);
+
+/**
+ * Free the pool after it was emptied and removed from /proc.
+ *
+ * Note that all of the child/target entries referenced by this pool
+ * must have been removed by lod_ost_pool_remove() before it can be
+ * deleted from memory.
+ *
+ * @op		pool to be freed.
+ *
+ * Return:	0 on success or if pool was already freed
+ */
+int tgt_pool_free(struct lu_tgt_pool *op)
+{
+	if (op->op_size == 0)
+		return 0;
+
+	down_write(&op->op_rw_sem);
+
+	kfree(op->op_array);
+	op->op_array = NULL;
+	op->op_count = 0;
+	op->op_size = 0;
+
+	up_write(&op->op_rw_sem);
+	return 0;
+}
+EXPORT_SYMBOL(tgt_pool_free);
-- 
1.8.3.1


* [lustre-devel] [PATCH 32/49] lustre: quota: make used for pool correct
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (30 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 31/49] lustre: use tgt_pool for lov layer James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 33/49] lustre: quota: call rhashtable_lookup near params decl James Simmons
                   ` (16 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Sergey Cheremencev, Lustre Development List

From: Sergey Cheremencev <sergey.cheremencev@hpe.com>

Before this patch, the used space reported for a quota pool was
the sum of the space used by the user on all OSTs in the system.
Now lfs quota --pool takes into account only the OSTs from the
pool. With option -v it also shows only the OSTs from the pool.
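
The core of the change is in lov_quotactl() (full hunks below): for
LUSTRE_Q_GETQUOTAPOOL the pool is looked up and referenced first, and
any OST that is not a member of it is skipped while usage is
aggregated. In outline (simplified; the loop bound and the
accumulation details are elided):

        /* simplified outline of the per-target loop after this patch */
        for (i = 0; i < tgt_count; i++) {  /* tgt_count stands in for the real bound */
                tgt = lov->lov_tgts[i];
                if (!tgt)
                        continue;

                /* pool is only non-NULL for LUSTRE_Q_GETQUOTAPOOL */
                if (pool && tgt_check_index(tgt->ltd_index, &pool->pool_obds))
                        continue;       /* OST is not in the pool: don't count it */

                /* ... obd_quotactl() on this target, accumulate curspace ... */
        }
        if (pool)
                lov_pool_putref(pool);  /* drop the reference taken by the lookup */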

WC-bug-id: https://jira.whamcloud.com/browse/LU-13359
Lustre-commit: 6b9f849fd5f49ce6 ("LU-13359 quota: make used for pool correct")
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/39298
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c        | 13 +++++++++++--
 fs/lustre/lov/lov_internal.h |  3 +++
 fs/lustre/lov/lov_obd.c      | 37 ++++++++++++++++++++++++++++++++-----
 fs/lustre/lov/lov_pool.c     |  2 +-
 4 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index c42cff7..5dc93f4 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -1169,14 +1169,22 @@ int quotactl_ioctl(struct ll_sb_info *sbi, struct if_quotactl *qctl)
 		    !(oqctl->qc_dqblk.dqb_valid & QIF_SPACE) &&
 		    !oqctl->qc_dqblk.dqb_curspace) {
 			struct obd_quotactl *oqctl_tmp;
+			int qctl_len = sizeof(*oqctl_tmp) + LOV_MAXPOOLNAME + 1;
 
-			oqctl_tmp = kzalloc(sizeof(*oqctl_tmp), GFP_NOFS);
+			oqctl_tmp = kzalloc(qctl_len, GFP_NOFS);
 			if (!oqctl_tmp) {
 				rc = -ENOMEM;
 				goto out;
 			}
 
-			oqctl_tmp->qc_cmd = Q_GETOQUOTA;
+			if (cmd == LUSTRE_Q_GETQUOTAPOOL) {
+				oqctl_tmp->qc_cmd = LUSTRE_Q_GETQUOTAPOOL;
+				memcpy(oqctl_tmp->qc_poolname,
+				       qctl->qc_poolname,
+				       LOV_MAXPOOLNAME + 1);
+			} else {
+				oqctl_tmp->qc_cmd = Q_GETOQUOTA;
+			}
 			oqctl_tmp->qc_id = oqctl->qc_id;
 			oqctl_tmp->qc_type = oqctl->qc_type;
 
@@ -1190,6 +1198,7 @@ int quotactl_ioctl(struct ll_sb_info *sbi, struct if_quotactl *qctl)
 			}
 
 			/* collect space & inode usage from MDTs */
+			oqctl_tmp->qc_cmd = Q_GETOQUOTA;
 			oqctl_tmp->qc_dqblk.dqb_curspace = 0;
 			oqctl_tmp->qc_dqblk.dqb_curinodes = 0;
 			rc = obd_quotactl(sbi->ll_md_exp, oqctl_tmp);
diff --git a/fs/lustre/lov/lov_internal.h b/fs/lustre/lov/lov_internal.h
index 2e1e2dd..d26e68b 100644
--- a/fs/lustre/lov/lov_internal.h
+++ b/fs/lustre/lov/lov_internal.h
@@ -379,4 +379,7 @@ static inline void lov_lsm2layout(struct lov_stripe_md *lsm,
 		ol->ol_comp_id = 0;
 	}
 }
+
+extern const struct rhashtable_params pools_hash_params;
+extern void lov_pool_putref(struct pool_desc *pool);
 #endif
diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c
index 4f574ad..95fcd57 100644
--- a/fs/lustre/lov/lov_obd.c
+++ b/fs/lustre/lov/lov_obd.c
@@ -1220,14 +1220,34 @@ static int lov_quotactl(struct obd_device *obd, struct obd_export *exp,
 {
 	struct lov_obd *lov = &obd->u.lov;
 	struct lov_tgt_desc *tgt;
+	struct pool_desc *pool = NULL;
 	u64 curspace = 0;
 	u64 bhardlimit = 0;
 	int i, rc = 0;
 
 	if (oqctl->qc_cmd != Q_GETOQUOTA &&
-	    oqctl->qc_cmd != LUSTRE_Q_SETQUOTA) {
-		CERROR("bad quota opc %x for lov obd\n", oqctl->qc_cmd);
-		return -EFAULT;
+	    oqctl->qc_cmd != LUSTRE_Q_SETQUOTA &&
+	    oqctl->qc_cmd != LUSTRE_Q_GETQUOTAPOOL) {
+		rc = -EFAULT;
+		CERROR("%s: bad quota opc %x for lov obd: rc = %d\n",
+		       obd->obd_name, oqctl->qc_cmd, rc);
+		return rc;
+	}
+
+	if (oqctl->qc_cmd == LUSTRE_Q_GETQUOTAPOOL) {
+		rcu_read_lock();
+		pool = rhashtable_lookup(&lov->lov_pools_hash_body,
+					 oqctl->qc_poolname,
+					 pools_hash_params);
+		if (pool && !atomic_inc_not_zero(&pool->pool_refcount))
+			pool = NULL;
+		rcu_read_unlock();
+		if (!pool)
+			return -ENOENT;
+		/* Set Q_GETOQUOTA back as targets report it's own
+		 * usage and doesn't care about pools
+		 */
+		oqctl->qc_cmd = Q_GETOQUOTA;
 	}
 
 	/* for lov tgt */
@@ -1240,11 +1260,16 @@ static int lov_quotactl(struct obd_device *obd, struct obd_export *exp,
 		if (!tgt)
 			continue;
 
+		if (pool &&
+		    tgt_check_index(tgt->ltd_index, &pool->pool_obds))
+			continue;
+
 		if (!tgt->ltd_active || tgt->ltd_reap) {
 			if (oqctl->qc_cmd == Q_GETOQUOTA &&
 			    lov->lov_tgts[i]->ltd_activate) {
-				rc = -EREMOTEIO;
-				CERROR("ost %d is inactive\n", i);
+				rc = -ENETDOWN;
+				CERROR("%s: ost %d is inactive: rc = %d\n",
+				       obd->obd_name, i, rc);
 			} else {
 				CDEBUG(D_HA, "ost %d is inactive\n", i);
 			}
@@ -1264,6 +1289,8 @@ static int lov_quotactl(struct obd_device *obd, struct obd_export *exp,
 		}
 	}
 	lov_tgts_putref(obd);
+	if (pool)
+		lov_pool_putref(pool);
 
 	if (oqctl->qc_cmd == Q_GETOQUOTA) {
 		oqctl->qc_dqblk.dqb_curspace = curspace;
diff --git a/fs/lustre/lov/lov_pool.c b/fs/lustre/lov/lov_pool.c
index 2617974..de8aed9 100644
--- a/fs/lustre/lov/lov_pool.c
+++ b/fs/lustre/lov/lov_pool.c
@@ -63,7 +63,7 @@ static int pool_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
 	return strcmp(pool_name, pool->pool_name);
 }
 
-static const struct rhashtable_params pools_hash_params = {
+const struct rhashtable_params pools_hash_params = {
 	.key_len	= 1, /* actually variable */
 	.key_offset	= offsetof(struct pool_desc, pool_name),
 	.head_offset	= offsetof(struct pool_desc, pool_hash),
-- 
1.8.3.1


* [lustre-devel] [PATCH 33/49] lustre: quota: call rhashtable_lookup near params decl
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (31 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 32/49] lustre: quota: make used for pool correct James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 34/49] lustre: lov: cancel layout lock on replay deadlock James Simmons
                   ` (15 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

rhashtable_lookup() is an inline function which depends, for
performance, on the 'rhashtable_params' being visible and
constant.  So it should only be called in the same file that
declares the params.

A recent patch made pools_hash_params an external variable and
calls rhashtable_lookup() from a separate file, which breaks the
optimisation.

So add lov_pool_find() and use it to maintain the optimisation.
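
A minimal sketch of the pattern being restored, with hypothetical
names rather than anything from the patch: keep the rhashtable_params
static const in the one file that performs lookups, and export a
small wrapper instead of the params.

#include <linux/rhashtable.h>
#include <linux/rcupdate.h>

struct demo_obj {
        u32                     key;
        struct rhash_head       hash;
};

static const struct rhashtable_params demo_params = {
        .key_len        = sizeof(u32),
        .key_offset     = offsetof(struct demo_obj, key),
        .head_offset    = offsetof(struct demo_obj, hash),
};

/* exported wrapper: callers in other files never see demo_params, so
 * the inline rhashtable_lookup() is expanded with constant params here */
struct demo_obj *demo_find(struct rhashtable *ht, u32 key)
{
        struct demo_obj *obj;

        rcu_read_lock();
        obj = rhashtable_lookup(ht, &key, demo_params);
        /* real code must take a reference before dropping the RCU lock,
         * as lov_pool_find() below does with atomic_inc_not_zero() */
        rcu_read_unlock();
        return obj;
}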

WC-bug-id: https://jira.whamcloud.com/browse/LU-13359
Fixes: 226a0a32c6b3 ("lustre: quota: make used for pool correct")
Lustre-commit: 1d116c8ff68fc784 ("LU-13359 quota: call rhashtable_lookup near params decl")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39676
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_internal.h |  4 ++--
 fs/lustre/lov/lov_obd.c      |  8 +-------
 fs/lustre/lov/lov_pool.c     | 18 +++++++++++++++++-
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/lustre/lov/lov_internal.h b/fs/lustre/lov/lov_internal.h
index d26e68b..e58235a 100644
--- a/fs/lustre/lov/lov_internal.h
+++ b/fs/lustre/lov/lov_internal.h
@@ -380,6 +380,6 @@ static inline void lov_lsm2layout(struct lov_stripe_md *lsm,
 	}
 }
 
-extern const struct rhashtable_params pools_hash_params;
-extern void lov_pool_putref(struct pool_desc *pool);
+struct pool_desc *lov_pool_find(struct obd_device *obd, char *poolname);
+void lov_pool_putref(struct pool_desc *pool);
 #endif
diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c
index 95fcd57..29b0645 100644
--- a/fs/lustre/lov/lov_obd.c
+++ b/fs/lustre/lov/lov_obd.c
@@ -1235,13 +1235,7 @@ static int lov_quotactl(struct obd_device *obd, struct obd_export *exp,
 	}
 
 	if (oqctl->qc_cmd == LUSTRE_Q_GETQUOTAPOOL) {
-		rcu_read_lock();
-		pool = rhashtable_lookup(&lov->lov_pools_hash_body,
-					 oqctl->qc_poolname,
-					 pools_hash_params);
-		if (pool && !atomic_inc_not_zero(&pool->pool_refcount))
-			pool = NULL;
-		rcu_read_unlock();
+		pool = lov_pool_find(obd, oqctl->qc_poolname);
 		if (!pool)
 			return -ENOENT;
 		/* Set Q_GETOQUOTA back as targets report it's own
diff --git a/fs/lustre/lov/lov_pool.c b/fs/lustre/lov/lov_pool.c
index de8aed9..8fbc6ee 100644
--- a/fs/lustre/lov/lov_pool.c
+++ b/fs/lustre/lov/lov_pool.c
@@ -63,7 +63,7 @@ static int pool_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
 	return strcmp(pool_name, pool->pool_name);
 }
 
-const struct rhashtable_params pools_hash_params = {
+static const struct rhashtable_params pools_hash_params = {
 	.key_len	= 1, /* actually variable */
 	.key_offset	= offsetof(struct pool_desc, pool_name),
 	.head_offset	= offsetof(struct pool_desc, pool_hash),
@@ -317,6 +317,22 @@ int lov_pool_new(struct obd_device *obd, char *poolname)
 	return rc;
 }
 
+struct pool_desc *lov_pool_find(struct obd_device *obd, char *poolname)
+{
+	struct pool_desc *pool;
+	struct lov_obd *lov = &obd->u.lov;
+
+	rcu_read_lock();
+	pool = rhashtable_lookup(&lov->lov_pools_hash_body,
+				 poolname,
+				 pools_hash_params);
+	if (pool && !atomic_inc_not_zero(&pool->pool_refcount))
+		pool = NULL;
+	rcu_read_unlock();
+
+	return pool;
+}
+
 int lov_pool_del(struct obd_device *obd, char *poolname)
 {
 	struct lov_obd *lov;
-- 
1.8.3.1


* [lustre-devel] [PATCH 34/49] lustre: lov: cancel layout lock on replay deadlock
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (32 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 33/49] lustre: quota: call rhashtable_lookup near params decl James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 35/49] lustre: obdclass: Protect cl_env_percpu[] James Simmons
                   ` (14 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Vitaly Fertman, Lustre Development List

From: Vitaly Fertman <c17818@cray.com>

Layout locks are not replayed and are instead cancelled as unused,
which requires taking lov_conf_lock. That semaphore may already be
held by cl_lock_flush(), which prepares a new IO that cannot be
sent to the MDS while it is in recovery.
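
The key change (see the lov_object.c hunks below) is that the
layout-invalid state becomes an atomic bit in lo_obj_flags, so
OBJECT_CONF_INVALIDATE can be handled before lov_conf_lock() is taken
and the replay-time cancel no longer blocks on it. A minimal sketch
of that bit-flag pattern, with hypothetical names:

#include <linux/bitops.h>
#include <linux/types.h>

#define DEMO_LAYOUT_INVALID     0       /* bit number for {set,test,clear}_bit() */

struct demo_obj {
        unsigned long   do_flags;
};

static void demo_invalidate(struct demo_obj *o)
{
        /* atomic bit op: no configuration lock needed on this path */
        set_bit(DEMO_LAYOUT_INVALID, &o->do_flags);
}

static bool demo_layout_valid(struct demo_obj *o)
{
        return !test_bit(DEMO_LAYOUT_INVALID, &o->do_flags);
}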

HPE-bug-id: LUS-9232
WC-bug-id: https://jira.whamcloud.com/browse/LU-14182
Lustre-commit: 68fb53ad4bb2dbc ("LU-14182 lov: cancel layout lock on replay deadlock")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/40867
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_support.h |  1 +
 fs/lustre/ldlm/ldlm_request.c   |  2 ++
 fs/lustre/llite/namei.c         |  2 ++
 fs/lustre/lov/lov_cl_internal.h | 10 +++++++---
 fs/lustre/lov/lov_object.c      | 44 ++++++++++++++++++++++++-----------------
 5 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index 152f95c..b2f97f1 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -308,6 +308,7 @@
 
 #define OBD_FAIL_LDLM_GRANT_CHECK			0x32a
 #define OBD_FAIL_LDLM_LOCAL_CANCEL_PAUSE		0x32c
+#define OBD_FAIL_LDLM_REPLAY_PAUSE			0x32e
 
 /* LOCKLESS IO */
 #define OBD_FAIL_LDLM_SET_CONTENTION			0x385
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index d8ca744..3527678 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -2220,6 +2220,8 @@ static void ldlm_cancel_unused_locks_for_replay(struct ldlm_namespace *ns)
 	       "Dropping as many unused locks as possible before replay for namespace %s (%d)\n",
 	       ldlm_ns_name(ns), ns->ns_nr_unused);
 
+	OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_REPLAY_PAUSE, cfs_fail_val);
+
 	/*
 	 * We don't need to care whether or not LRU resize is enabled
 	 * because the LDLM_LRU_FLAG_NO_WAIT policy doesn't use the
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 1095fa9..654d065 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -204,6 +204,8 @@ static int ll_dom_lock_cancel(struct inode *inode, struct ldlm_lock *lock)
 	if (IS_ERR(env))
 		return PTR_ERR(env);
 
+	OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_REPLAY_PAUSE, cfs_fail_val);
+
 	/* reach MDC layer to flush data under  the DoM ldlm lock */
 	rc = cl_object_flush(env, lli->lli_clob, lock);
 	if (rc == -ENODATA) {
diff --git a/fs/lustre/lov/lov_cl_internal.h b/fs/lustre/lov/lov_cl_internal.h
index e9ef5aa..f231be9 100644
--- a/fs/lustre/lov/lov_cl_internal.h
+++ b/fs/lustre/lov/lov_cl_internal.h
@@ -251,6 +251,11 @@ struct lov_mirror_entry {
 	unsigned short	lre_end;	/* end index of this mirror */
 };
 
+enum lov_object_flags {
+	/* Layout is invalid, set when layout lock is lost */
+	LO_LAYOUT_INVALID	= 0x1,
+};
+
 /**
  * lov-specific file state.
  *
@@ -281,10 +286,9 @@ struct lov_object {
 	 */
 	enum lov_layout_type	lo_type;
 	/**
-	 * True if layout is invalid. This bit is cleared when layout lock
-	 * is lost.
+	 * Object flags.
 	 */
-	bool			lo_layout_invalid;
+	unsigned long		lo_obj_flags;
 	/**
 	 * How many IOs are on going on this object. Layout can be changed
 	 * only if there is no active IO.
diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c
index abe1cee..db4070f 100644
--- a/fs/lustre/lov/lov_object.c
+++ b/fs/lustre/lov/lov_object.c
@@ -177,7 +177,7 @@ static int lov_init_sub(const struct lu_env *env, struct lov_object *lov,
 		old_obj = lu_object_locate(&parent->coh_lu, &lov_device_type);
 		LASSERT(old_obj);
 		old_lov = cl2lov(lu2cl(old_obj));
-		if (old_lov->lo_layout_invalid) {
+		if (test_bit(LO_LAYOUT_INVALID, &old_lov->lo_obj_flags)) {
 			/* the object's layout has already changed but isn't
 			 * refreshed
 			 */
@@ -628,7 +628,7 @@ static int lov_init_composite(const struct lu_env *env, struct lov_device *dev,
 	LASSERT(lsm->lsm_entry_count > 0);
 	LASSERT(!lov->lo_lsm);
 	lov->lo_lsm = lsm_addref(lsm);
-	lov->lo_layout_invalid = true;
+	set_bit(LO_LAYOUT_INVALID, &lov->lo_obj_flags);
 
 	dump_lsm(D_INODE, lsm);
 
@@ -910,7 +910,8 @@ static void lov_fini_released(const struct lu_env *env, struct lov_object *lov,
 static int lov_print_empty(const struct lu_env *env, void *cookie,
 			   lu_printer_t p, const struct lu_object *o)
 {
-	(*p)(env, cookie, "empty %d\n", lu2lov(o)->lo_layout_invalid);
+	(*p)(env, cookie, "empty %d\n",
+	     test_bit(LO_LAYOUT_INVALID, &lu2lov(o)->lo_obj_flags));
 	return 0;
 }
 
@@ -923,8 +924,8 @@ static int lov_print_composite(const struct lu_env *env, void *cookie,
 
 	(*p)(env, cookie, "entries: %d, %s, lsm{%p 0x%08X %d %u}:\n",
 	     lsm->lsm_entry_count,
-	     lov->lo_layout_invalid ? "invalid" : "valid", lsm,
-	     lsm->lsm_magic, atomic_read(&lsm->lsm_refc),
+	     test_bit(LO_LAYOUT_INVALID, &lov->lo_obj_flags) ? "invalid" :
+	     "valid", lsm, lsm->lsm_magic, atomic_read(&lsm->lsm_refc),
 	     lsm->lsm_layout_gen);
 
 	for (i = 0; i < lsm->lsm_entry_count; i++) {
@@ -953,8 +954,8 @@ static int lov_print_released(const struct lu_env *env, void *cookie,
 
 	(*p)(env, cookie,
 	     "released: %s, lsm{%p 0x%08X %d %u}:\n",
-	     lov->lo_layout_invalid ? "invalid" : "valid", lsm,
-	     lsm->lsm_magic, atomic_read(&lsm->lsm_refc),
+	     test_bit(LO_LAYOUT_INVALID, &lov->lo_obj_flags) ? "invalid" :
+	     "valid", lsm, lsm->lsm_magic, atomic_read(&lsm->lsm_refc),
 	     lsm->lsm_layout_gen);
 	return 0;
 }
@@ -967,7 +968,8 @@ static int lov_print_foreign(const struct lu_env *env, void *cookie,
 
 	(*p)(env, cookie,
 		"foreign: %s, lsm{%p 0x%08X %d %u}:\n",
-		lov->lo_layout_invalid ? "invalid" : "valid", lsm,
+		test_bit(LO_LAYOUT_INVALID, &lov->lo_obj_flags) ?
+		"invalid" : "valid", lsm,
 		lsm->lsm_magic, atomic_read(&lsm->lsm_refc),
 		lsm->lsm_layout_gen);
 	(*p)(env, cookie,
@@ -1352,15 +1354,15 @@ static int lov_conf_set(const struct lu_env *env, struct cl_object *obj,
 		dump_lsm(D_INODE, lsm);
 	}
 
-	lov_conf_lock(lov);
 	if (conf->coc_opc == OBJECT_CONF_INVALIDATE) {
-		lov->lo_layout_invalid = true;
+		set_bit(LO_LAYOUT_INVALID, &lov->lo_obj_flags);
 		result = 0;
-		goto out;
+		goto out_lsm;
 	}
 
+	lov_conf_lock(lov);
 	if (conf->coc_opc == OBJECT_CONF_WAIT) {
-		if (lov->lo_layout_invalid &&
+		if (test_bit(LO_LAYOUT_INVALID, &lov->lo_obj_flags) &&
 		    atomic_read(&lov->lo_active_ios) > 0) {
 			lov_conf_unlock(lov);
 			result = lov_layout_wait(env, lov);
@@ -1378,26 +1380,31 @@ static int lov_conf_set(const struct lu_env *env, struct cl_object *obj,
 	     (lov->lo_lsm->lsm_entries[0]->lsme_pattern ==
 	      lsm->lsm_entries[0]->lsme_pattern))) {
 		/* same version of layout */
-		lov->lo_layout_invalid = false;
+		clear_bit(LO_LAYOUT_INVALID, &lov->lo_obj_flags);
 		result = 0;
 		goto out;
 	}
 
 	/* will change layout - check if there still exists active IO. */
 	if (atomic_read(&lov->lo_active_ios) > 0) {
-		lov->lo_layout_invalid = true;
+		set_bit(LO_LAYOUT_INVALID, &lov->lo_obj_flags);
 		result = -EBUSY;
 		goto out;
 	}
 
 	result = lov_layout_change(env, lov, lsm, conf);
-	lov->lo_layout_invalid = result != 0;
+	if (result)
+		set_bit(LO_LAYOUT_INVALID, &lov->lo_obj_flags);
+	else
+		clear_bit(LO_LAYOUT_INVALID, &lov->lo_obj_flags);
 
 out:
 	lov_conf_unlock(lov);
+out_lsm:
 	lov_lsm_put(lsm);
-	CDEBUG(D_INODE, DFID " lo_layout_invalid=%d\n",
-	       PFID(lu_object_fid(lov2lu(lov))), lov->lo_layout_invalid);
+	CDEBUG(D_INODE, DFID " lo_layout_invalid=%u\n",
+	       PFID(lu_object_fid(lov2lu(lov))),
+	       test_bit(LO_LAYOUT_INVALID, &lov->lo_obj_flags));
 	return result;
 }
 
@@ -2254,7 +2261,8 @@ static struct lov_stripe_md *lov_lsm_addref(struct lov_object *lov)
 		lsm = lsm_addref(lov->lo_lsm);
 		CDEBUG(D_INODE, "lsm %p addref %d/%d by %p.\n",
 		       lsm, atomic_read(&lsm->lsm_refc),
-		       lov->lo_layout_invalid, current);
+		       test_bit(LO_LAYOUT_INVALID, &lov->lo_obj_flags),
+		       current);
 	}
 	lov_conf_thaw(lov);
 	return lsm;
-- 
1.8.3.1


* [lustre-devel] [PATCH 35/49] lustre: obdclass: Protect cl_env_percpu[]
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (33 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 34/49] lustre: lov: cancel layout lock on replay deadlock James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 36/49] lnet: libcfs: discard cfs_trace_console_buffers[] James Simmons
                   ` (13 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Etienne AUJAMES <eaujames@ddn.com>

cl_env_percpu is not protected against multiple client mounts on
the same node: "keys_fill" could be called with the same
cl_env_percpu context by several mount processes (race on
lu_context.lc_value).

This patch adds a mutex for cl_env_percpu to protect context
"refill".

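The fix itself is small (see cl_object.c below): a single mutex
serialises the per-cpu context refill so that two mounts cannot run
keys_fill on the same contexts at once. A stripped-down sketch of
that shape, with illustrative names:

#include <linux/cpumask.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(refill_mutex);

static void demo_refill_one(int cpu)
{
        /* stands in for lu_env_refill(&cl_env_percpu[cpu].ce_lu) */
}

static void demo_percpu_refill(void)
{
        int i;

        mutex_lock(&refill_mutex);      /* only one mount refills at a time */
        for_each_possible_cpu(i)
                demo_refill_one(i);
        mutex_unlock(&refill_mutex);
}
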
WC-bug-id: https://jira.whamcloud.com/browse/LU-14110
Lustre-commit: 881551fbb733569 ("LU-14110 obdclass: Protect cl_env_percpu[]")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-on: https://review.whamcloud.com/40565
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_support.h | 1 +
 fs/lustre/llite/llite_lib.c     | 2 ++
 fs/lustre/obdclass/cl_object.c  | 3 +++
 3 files changed, 6 insertions(+)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index b2f97f1..027667f 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -477,6 +477,7 @@
 #define OBD_FAIL_LLITE_PCC_ATTACH_PAUSE			0x1414
 #define OBD_FAIL_LLITE_SHORT_COMMIT			0x1415
 #define OBD_FAIL_LLITE_CREATE_FILE_PAUSE2		0x1416
+#define OBD_FAIL_LLITE_RACE_MOUNT			0x1417
 
 #define OBD_FAIL_FID_INDIR				0x1501
 #define OBD_FAIL_FID_INLMA				0x1502
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index e15962e..2f2d9f0 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -1047,6 +1047,8 @@ int ll_fill_super(struct super_block *sb)
 
 	CDEBUG(D_VFSTRACE, "VFS Op: sb %p\n", sb);
 
+	OBD_RACE(OBD_FAIL_LLITE_RACE_MOUNT);
+
 	cfg = kzalloc(sizeof(*cfg), GFP_NOFS);
 	if (!cfg) {
 		err = -ENOMEM;
diff --git a/fs/lustre/obdclass/cl_object.c b/fs/lustre/obdclass/cl_object.c
index 86434f1..aa3d928 100644
--- a/fs/lustre/obdclass/cl_object.c
+++ b/fs/lustre/obdclass/cl_object.c
@@ -823,6 +823,7 @@ void cl_lvb2attr(struct cl_attr *attr, const struct ost_lvb *lvb)
 EXPORT_SYMBOL(cl_lvb2attr);
 
 static struct cl_env cl_env_percpu[NR_CPUS];
+static DEFINE_MUTEX(cl_env_percpu_mutex);
 
 static int cl_env_percpu_init(void)
 {
@@ -888,8 +889,10 @@ static void cl_env_percpu_refill(void)
 {
 	int i;
 
+	mutex_lock(&cl_env_percpu_mutex);
 	for_each_possible_cpu(i)
 		lu_env_refill(&cl_env_percpu[i].ce_lu);
+	mutex_unlock(&cl_env_percpu_mutex);
 }
 
 void cl_env_percpu_put(struct lu_env *env)
-- 
1.8.3.1


* [lustre-devel] [PATCH 36/49] lnet: libcfs: discard cfs_trace_console_buffers[]
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (34 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 35/49] lustre: obdclass: Protect cl_env_percpu[] James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 37/49] lnet: libcfs: discard cfs_trace_copyin_string() James Simmons
                   ` (12 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

cfs_trace_console_buffers[] is a collection of buffers into which
various messages are formatted - with vscnprintf or similar - and
which are then passed to cfs_print_to_console which adds more
formatted information.

The two levels of formatting can instead be achieved using the "%pV"
format which takes a format-and-args.  If we do this, we don't need
cfs_trace_console_buffers[] any more.
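
For illustration, the "%pV" idiom wraps the caller's format string
and va_list in a struct va_format and hands both to a single printk
call; a minimal stand-alone sketch (names are illustrative, not from
the patch):

#include <linux/kernel.h>
#include <linux/printk.h>

static __printf(2, 3) void demo_print(const char *prefix, const char *fmt, ...)
{
        struct va_format vaf;
        va_list args;

        va_start(args, fmt);
        vaf.fmt = fmt;
        vaf.va = &args;
        /* printk expands fmt+args in place: no intermediate buffer needed */
        pr_info("%s: %pV", prefix, &vaf);
        va_end(args);
}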

One minor drawback is that cfs_tty_write_message() requires a final
string to print, not a format plus arguments.  This is only minor
because there is precisely one message that is ever sent to
cfs_tty_write_message(), and it contains no formatting.  So we now
generate a warning if the string passed with D_TTY ever contains
formatting, and just print that string ignoring any formatting.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14428
Lustre-commit: 95aa713f78e7acf9 ("LU-14428 libcfs: discard cfs_trace_console_buffers[]")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41489
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/libcfs/tracefile.c | 159 ++++++++++++++++++++++++--------------------
 1 file changed, 88 insertions(+), 71 deletions(-)

diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c
index 32bab98..e3a063f 100644
--- a/net/lnet/libcfs/tracefile.c
+++ b/net/lnet/libcfs/tracefile.c
@@ -44,10 +44,11 @@
 #include <linux/kthread.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
+#include <linux/poll.h>
+#include <linux/tty.h>
 #include <linux/uaccess.h>
 #include "tracefile.h"
 
-#define CFS_TRACE_CONSOLE_BUFFER_SIZE	1024
 
 enum cfs_trace_buf_type {
 	CFS_TCD_TYPE_PROC = 0,
@@ -58,7 +59,6 @@ enum cfs_trace_buf_type {
 
 union cfs_trace_data_union (*cfs_trace_data[CFS_TCD_TYPE_CNT])[NR_CPUS] __cacheline_aligned;
 
-char *cfs_trace_console_buffers[NR_CPUS][CFS_TCD_TYPE_CNT];
 char cfs_tracefile[TRACEFILE_NAME_SIZE];
 long long cfs_tracefile_size = CFS_TRACEFILE_SIZE;
 static struct tracefiled_ctl trace_tctl;
@@ -173,14 +173,6 @@ enum cfs_trace_buf_type cfs_trace_buf_idx_get(void)
 	return CFS_TCD_TYPE_PROC;
 }
 
-static inline char *cfs_trace_get_console_buffer(void)
-{
-	unsigned int i = get_cpu();
-	unsigned int j = cfs_trace_buf_idx_get();
-
-	return cfs_trace_console_buffers[i][j];
-}
-
 static inline struct cfs_trace_cpu_data *
 cfs_trace_get_tcd(void)
 {
@@ -373,9 +365,44 @@ static void cfs_set_ptldebug_header(struct ptldebug_header *header,
 	header->ph_extern_pid = 0;
 }
 
-static void cfs_print_to_console(struct ptldebug_header *hdr, int mask,
-				 const char *buf, int len, const char *file,
-				 const char *fn)
+/**
+ * tty_write_msg - write a message to a certain tty, not just the console.
+ * @tty: the destination tty_struct
+ * @msg: the message to write
+ *
+ * tty_write_message is not exported, so write a same function for it
+ *
+ */
+static void tty_write_msg(struct tty_struct *tty, const char *msg)
+{
+	mutex_lock(&tty->atomic_write_lock);
+	tty_lock(tty);
+	if (tty->ops->write && tty->count > 0)
+		tty->ops->write(tty, msg, strlen(msg));
+	tty_unlock(tty);
+	mutex_unlock(&tty->atomic_write_lock);
+	wake_up_interruptible_poll(&tty->write_wait, POLLOUT);
+}
+
+static void cfs_tty_write_message(const char *prefix, int mask, const char *msg)
+{
+	struct tty_struct *tty;
+
+	tty = get_current_tty();
+	if (!tty)
+		return;
+
+	tty_write_msg(tty, prefix);
+	if ((mask & D_EMERG) || (mask & D_ERROR))
+		tty_write_msg(tty, "Error");
+	tty_write_msg(tty, ": ");
+	tty_write_msg(tty, msg);
+	tty_kref_put(tty);
+}
+
+static void cfs_vprint_to_console(struct ptldebug_header *hdr, int mask,
+				  struct va_format *vaf, const char *file,
+				  const char *fn)
 {
 	char *prefix = "Lustre";
 
@@ -384,29 +411,46 @@ static void cfs_print_to_console(struct ptldebug_header *hdr, int mask,
 
 	if (mask & D_CONSOLE) {
 		if (mask & D_EMERG)
-			pr_emerg("%sError: %.*s", prefix, len, buf);
+			pr_emerg("%sError: %pV", prefix, vaf);
 		else if (mask & D_ERROR)
-			pr_err("%sError: %.*s", prefix, len, buf);
+			pr_err("%sError: %pV", prefix, vaf);
 		else if (mask & D_WARNING)
-			pr_warn("%s: %.*s", prefix, len, buf);
+			pr_warn("%s: %pV", prefix, vaf);
 		else if (mask & libcfs_printk)
-			pr_info("%s: %.*s", prefix, len, buf);
+			pr_info("%s: %pV", prefix, vaf);
 	} else {
 		if (mask & D_EMERG)
-			pr_emerg("%sError: %d:%d:(%s:%d:%s()) %.*s", prefix,
+			pr_emerg("%sError: %d:%d:(%s:%d:%s()) %pV", prefix,
 				 hdr->ph_pid, hdr->ph_extern_pid, file,
-				 hdr->ph_line_num, fn, len, buf);
+				 hdr->ph_line_num, fn, vaf);
 		else if (mask & D_ERROR)
-			pr_err("%sError: %d:%d:(%s:%d:%s()) %.*s", prefix,
+			pr_err("%sError: %d:%d:(%s:%d:%s()) %pV", prefix,
 			       hdr->ph_pid, hdr->ph_extern_pid, file,
-			       hdr->ph_line_num, fn, len, buf);
+			       hdr->ph_line_num, fn, vaf);
 		else if (mask & D_WARNING)
-			pr_warn("%s: %d:%d:(%s:%d:%s()) %.*s", prefix,
+			pr_warn("%s: %d:%d:(%s:%d:%s()) %pV", prefix,
 				hdr->ph_pid, hdr->ph_extern_pid, file,
-				hdr->ph_line_num, fn, len, buf);
+				hdr->ph_line_num, fn, vaf);
 		else if (mask & (D_CONSOLE | libcfs_printk))
-			pr_info("%s: %.*s", prefix, len, buf);
+			pr_info("%s: %pV", prefix, vaf);
 	}
+
+	if (mask & D_TTY)
+		/* tty_write_msg doesn't handle formatting */
+		cfs_tty_write_message(prefix, mask, vaf->fmt);
+}
+
+static void cfs_print_to_console(struct ptldebug_header *hdr, int mask,
+				 const char *file, const char *fn,
+				 const char *fmt, ...)
+{
+	struct va_format vaf;
+	va_list args;
+
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+	cfs_vprint_to_console(hdr, mask, &vaf, file, fn);
 }
 
 int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata,
@@ -508,6 +552,9 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata,
 		if (needed < 2 || *(string_buf + needed - 2) != '\r')
 			pr_info("Lustre: format at %s:%d:%s doesn't end in '\\r\\n'\n",
 				file, msgdata->msg_line, msgdata->msg_fn);
+		if (strnchr(string_buf, needed, '%'))
+			pr_info("Lustre: format at %s:%d:%s mustn't contain %%\n",
+				file, msgdata->msg_line, msgdata->msg_fn);
 	}
 
 	header.ph_len = known_size + needed;
@@ -578,35 +625,27 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata,
 	}
 
 	if (tcd) {
-		cfs_print_to_console(&header, mask, string_buf, needed, file,
-				     msgdata->msg_fn);
+		cfs_print_to_console(&header, mask, file, msgdata->msg_fn,
+				     "%s", string_buf);
 		cfs_trace_put_tcd(tcd);
 	} else {
-		string_buf = cfs_trace_get_console_buffer();
+		struct va_format vaf;
 
 		va_start(ap, format);
-		needed = vscnprintf(string_buf, CFS_TRACE_CONSOLE_BUFFER_SIZE,
-				    format, ap);
+		vaf.fmt = format;
+		vaf.va = &ap;
+		cfs_vprint_to_console(&header, mask,
+				      &vaf, file, msgdata->msg_fn);
 		va_end(ap);
-
-		cfs_print_to_console(&header, mask,
-				     string_buf, needed, file, msgdata->msg_fn);
-
-		put_cpu();
 	}
 
 	if (cdls && cdls->cdls_count) {
-		string_buf = cfs_trace_get_console_buffer();
-
-		needed = scnprintf(string_buf, CFS_TRACE_CONSOLE_BUFFER_SIZE,
-				   "Skipped %d previous similar message%s\n",
-				   cdls->cdls_count,
-				   (cdls->cdls_count > 1) ? "s" : "");
-
-		cfs_print_to_console(&header, mask,
-				     string_buf, needed, file, msgdata->msg_fn);
-
-		put_cpu();
+		/* Do not allow print this to TTY */
+		cfs_print_to_console(&header, mask & ~D_TTY, file,
+				     msgdata->msg_fn,
+				     "Skipped %d previous similar message%s\n",
+				     cdls->cdls_count,
+				     (cdls->cdls_count > 1) ? "s" : "");
 		cdls->cdls_count = 0;
 	}
 
@@ -626,8 +665,8 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata,
 
 	cfs_set_ptldebug_header(&hdr, msgdata, CDEBUG_STACK());
 
-	cfs_print_to_console(&hdr, D_EMERG, str, strlen(str),
-			     msgdata->msg_file, msgdata->msg_fn);
+	cfs_print_to_console(&hdr, D_EMERG, msgdata->msg_file, msgdata->msg_fn,
+			     "%s", str);
 
 	panic("Lustre debug assertion failure\n");
 
@@ -793,7 +832,8 @@ void cfs_trace_debug_print(void)
 			p += strlen(fn) + 1;
 			len = hdr->ph_len - (int)(p - (char *)hdr);
 
-			cfs_print_to_console(hdr, D_EMERG, p, len, file, fn);
+			cfs_print_to_console(hdr, D_EMERG, file, fn,
+					     "%.*s", len, p);
 
 			p += len;
 		}
@@ -1272,24 +1312,8 @@ int cfs_tracefile_init(int max_pages)
 		tcd->tcd_shutting_down = 0;
 	}
 
-	for (i = 0; i < num_possible_cpus(); i++)
-		for (j = 0; j < CFS_TCD_TYPE_CNT; j++) {
-			cfs_trace_console_buffers[i][j] =
-				kmalloc(CFS_TRACE_CONSOLE_BUFFER_SIZE,
-					GFP_KERNEL);
-
-			if (!cfs_trace_console_buffers[i][j])
-				goto out_buffers;
-		}
-
 	return 0;
 
-out_buffers:
-	for (i = 0; i < num_possible_cpus(); i++)
-		for (j = 0; j < 3; j++) {
-			kfree(cfs_trace_console_buffers[i][j]);
-			cfs_trace_console_buffers[i][j] = NULL;
-		}
 out_trace_data:
 	for (i = 0; cfs_trace_data[i]; i++) {
 		kfree(cfs_trace_data[i]);
@@ -1331,18 +1355,11 @@ static void cfs_trace_cleanup(void)
 {
 	struct page_collection pc;
 	int i;
-	int j;
 
 	INIT_LIST_HEAD(&pc.pc_pages);
 
 	trace_cleanup_on_all_cpus();
 
-	for (i = 0; i < num_possible_cpus(); i++)
-		for (j = 0; j < CFS_TCD_TYPE_CNT; j++) {
-			kfree(cfs_trace_console_buffers[i][j]);
-			cfs_trace_console_buffers[i][j] = NULL;
-		}
-
 	for (i = 0; i < CFS_TCD_TYPE_CNT && cfs_trace_data[i]; i++) {
 		kfree(cfs_trace_data[i]);
 		cfs_trace_data[i] = NULL;
-- 
1.8.3.1


* [lustre-devel] [PATCH 37/49] lnet: libcfs: discard cfs_trace_copyin_string()
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (35 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 36/49] lnet: libcfs: discard cfs_trace_console_buffers[] James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 38/49] lustre: lmv: don't use lqr_alloc spinlock in lmv James Simmons
                   ` (11 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Instead of cfs_trace_copyin_string(), use memdup_user_nul().
This combines the allocation with the copyin, and nul-terminates.

The resulting code is a lot simpler.
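
For illustration, a converted call site ends up shaped like the
sketch below (names are illustrative; note that memdup_user_nul()
reports failure with an ERR_PTR):

#include <linux/err.h>
#include <linux/slab.h>
#include <linux/string.h>

static int demo_handle_write(const char __user *ubuf, size_t nob)
{
        /* allocate + copy_from_user + NUL-terminate in one call */
        char *str = memdup_user_nul(ubuf, nob);

        if (IS_ERR(str))
                return PTR_ERR(str);

        /* strim() replaces the old trailing-whitespace stripping */
        /* ... act on strim(str) ... */

        kfree(str);
        return 0;
}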

WC-bug-id: https://jira.whamcloud.com/browse/LU-14428
Lustre-commit: 67af976c806994ce ("LU-14428 libcfs: discard cfs_trace_copyin_string()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41490
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/libcfs/libcfs_debug.h |  2 --
 net/lnet/libcfs/module.c            | 19 +++++-------
 net/lnet/libcfs/tracefile.c         | 59 +++++++------------------------------
 net/lnet/libcfs/tracefile.h         |  2 --
 net/lnet/lnet/router_proc.c         | 22 +++++++-------
 5 files changed, 30 insertions(+), 74 deletions(-)

diff --git a/include/linux/libcfs/libcfs_debug.h b/include/linux/libcfs/libcfs_debug.h
index 99905f7..93eb752 100644
--- a/include/linux/libcfs/libcfs_debug.h
+++ b/include/linux/libcfs/libcfs_debug.h
@@ -209,8 +209,6 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata,
 	__printf(2, 3);
 
 /* other external symbols that tracefile provides: */
-int cfs_trace_copyin_string(char *knl_buffer, int knl_buffer_nob,
-			    const char __user *usr_buffer, int usr_buffer_nob);
 int cfs_trace_copyout_string(char __user *usr_buffer, int usr_buffer_nob,
 			     const char *knl_buffer, char *append);
 
diff --git a/net/lnet/libcfs/module.c b/net/lnet/libcfs/module.c
index f9cc6df..93e9b9e 100644
--- a/net/lnet/libcfs/module.c
+++ b/net/lnet/libcfs/module.c
@@ -295,7 +295,7 @@ static int proc_dobitmasks(struct ctl_table *table, int write,
 			   void __user *buffer, size_t *lenp, loff_t *ppos)
 {
 	const int tmpstrlen = 512;
-	char *tmpstr;
+	char *tmpstr = NULL;
 	int rc;
 	size_t nob = *lenp;
 	loff_t pos = *ppos;
@@ -303,11 +303,10 @@ static int proc_dobitmasks(struct ctl_table *table, int write,
 	int is_subsys = (mask == &libcfs_subsystem_debug) ? 1 : 0;
 	int is_printk = (mask == &libcfs_printk) ? 1 : 0;
 
-	tmpstr = kzalloc(tmpstrlen, GFP_KERNEL);
-	if (!tmpstr)
-		return -ENOMEM;
-
 	if (!write) {
+		tmpstr = kzalloc(tmpstrlen, GFP_KERNEL);
+		if (!tmpstr)
+			return -ENOMEM;
 		libcfs_debug_mask2str(tmpstr, tmpstrlen, *mask, is_subsys);
 		rc = strlen(tmpstr);
 
@@ -318,13 +317,11 @@ static int proc_dobitmasks(struct ctl_table *table, int write,
 						      tmpstr + pos, "\n");
 		}
 	} else {
-		rc = cfs_trace_copyin_string(tmpstr, tmpstrlen, buffer, nob);
-		if (rc < 0) {
-			kfree(tmpstr);
-			return rc;
-		}
+		tmpstr = memdup_user_nul(buffer, nob);
+		if (!tmpstr)
+			return -ENOMEM;
 
-		rc = libcfs_debug_str2mask(mask, tmpstr, is_subsys);
+		rc = libcfs_debug_str2mask(mask, strim(tmpstr), is_subsys);
 		/* Always print LBUG/LASSERT to console, so keep this mask */
 		if (is_printk)
 			*mask |= D_EMERG;
diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c
index e3a063f..731623b 100644
--- a/net/lnet/libcfs/tracefile.c
+++ b/net/lnet/libcfs/tracefile.c
@@ -920,34 +920,6 @@ void cfs_trace_flush_pages(void)
 	}
 }
 
-int cfs_trace_copyin_string(char *knl_buffer, int knl_buffer_nob,
-			    const char __user *usr_buffer, int usr_buffer_nob)
-{
-	int nob;
-
-	if (usr_buffer_nob > knl_buffer_nob)
-		return -EOVERFLOW;
-
-	if (copy_from_user((void *)knl_buffer,
-			   usr_buffer, usr_buffer_nob))
-		return -EFAULT;
-
-	nob = strnlen(knl_buffer, usr_buffer_nob);
-	while (--nob >= 0)		/* strip trailing whitespace */
-		if (!isspace(knl_buffer[nob]))
-			break;
-
-	if (nob < 0)			/* empty string */
-		return -EINVAL;
-
-	if (nob == knl_buffer_nob)	/* no space to terminate */
-		return -EOVERFLOW;
-
-	knl_buffer[nob + 1] = 0;	/* terminate */
-	return 0;
-}
-EXPORT_SYMBOL(cfs_trace_copyin_string);
-
 int cfs_trace_copyout_string(char __user *usr_buffer, int usr_buffer_nob,
 			     const char *knl_buffer, char *append)
 {
@@ -977,26 +949,20 @@ int cfs_trace_copyout_string(char __user *usr_buffer, int usr_buffer_nob,
 int cfs_trace_dump_debug_buffer_usrstr(void __user *usr_str, int usr_str_nob)
 {
 	char *str;
+	char *path;
 	int rc;
 
-	if (usr_str_nob >= 2 * PAGE_SIZE)
-		return -EINVAL;
-	str = kzalloc(usr_str_nob + 1, GFP_KERNEL);
+	str = memdup_user_nul(usr_str, usr_str_nob);
 	if (!str)
 		return -ENOMEM;
 
-	rc = cfs_trace_copyin_string(str, usr_str_nob + 1,
-				     usr_str, usr_str_nob);
-	if (rc)
-		goto out;
-
-	if (str[0] != '/') {
+	path = strim(str);
+	if (path[0] != '/')
 		rc = -EINVAL;
-		goto out;
-	}
-	rc = cfs_tracefile_dump_all_pages(str);
-out:
+	else
+		rc = cfs_tracefile_dump_all_pages(str);
 	kfree(str);
+
 	return rc;
 }
 
@@ -1045,18 +1011,13 @@ int cfs_trace_daemon_command_usrstr(void __user *usr_str, int usr_str_nob)
 	char *str;
 	int rc;
 
-	if (usr_str_nob >= 2 * PAGE_SIZE)
-		return -EINVAL;
-	str = kzalloc(usr_str_nob + 1, GFP_KERNEL);
+	str = memdup_user_nul(usr_str, usr_str_nob);
 	if (!str)
 		return -ENOMEM;
 
-	rc = cfs_trace_copyin_string(str, usr_str_nob + 1,
-				     usr_str, usr_str_nob);
-	if (!rc)
-		rc = cfs_trace_daemon_command(str);
-
+	rc = cfs_trace_daemon_command(str);
 	kfree(str);
+
 	return rc;
 }
 
diff --git a/net/lnet/libcfs/tracefile.h b/net/lnet/libcfs/tracefile.h
index 5b90c1b..311ec8c 100644
--- a/net/lnet/libcfs/tracefile.h
+++ b/net/lnet/libcfs/tracefile.h
@@ -59,8 +59,6 @@
 int cfs_tracefile_init(int max_pages);
 void cfs_tracefile_exit(void);
 
-int cfs_trace_copyin_string(char *knl_buffer, int knl_buffer_nob,
-			    const char __user *usr_buffer, int usr_buffer_nob);
 int cfs_trace_copyout_string(char __user *usr_buffer, int usr_buffer_nob,
 			     const char *knl_str, char *append);
 int cfs_trace_dump_debug_buffer_usrstr(void __user *usr_str, int usr_str_nob);
diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c
index 623899e..25d172d 100644
--- a/net/lnet/lnet/router_proc.c
+++ b/net/lnet/lnet/router_proc.c
@@ -743,7 +743,7 @@ struct lnet_portal_rotors {
 	const char	*pr_desc;
 };
 
-static struct lnet_portal_rotors	portal_rotors[] = {
+static struct lnet_portal_rotors portal_rotors[] = {
 	{
 		.pr_value = LNET_PTL_ROTOR_OFF,
 		.pr_name  = "OFF",
@@ -783,11 +783,11 @@ static int proc_lnet_portal_rotor(struct ctl_table *table, int write,
 	int rc;
 	int i;
 
-	buf = kmalloc(buf_len, GFP_KERNEL);
-	if (!buf)
-		return -ENOMEM;
-
 	if (!write) {
+		buf = kmalloc(buf_len, GFP_KERNEL);
+		if (!buf)
+			return -ENOMEM;
+
 		lnet_res_lock(0);
 
 		for (i = 0; portal_rotors[i].pr_value >= 0; i++) {
@@ -810,12 +810,14 @@ static int proc_lnet_portal_rotor(struct ctl_table *table, int write,
 			rc = cfs_trace_copyout_string(buffer, nob,
 						      buf + pos, "\n");
 		}
-		goto out;
+		kfree(buf);
+
+		return rc;
 	}
 
-	rc = cfs_trace_copyin_string(buf, buf_len, buffer, nob);
-	if (rc < 0)
-		goto out;
+	buf = memdup_user_nul(buffer, nob);
+	if (!buf)
+		return -ENOMEM;
 
 	tmp = strim(buf);
 
@@ -830,8 +832,8 @@ static int proc_lnet_portal_rotor(struct ctl_table *table, int write,
 		}
 	}
 	lnet_res_unlock(0);
-out:
 	kfree(buf);
+
 	return rc;
 }
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 38/49] lustre: lmv: don't use lqr_alloc spinlock in lmv
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (36 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 37/49] lnet: libcfs: discard cfs_trace_copyin_string() James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 39/49] lustre: lov: fault page update cp_lov_index James Simmons
                   ` (10 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The only place the lqr_alloc spinlock is used in lmv is in
lmv_locate_tgt_rr().  The purpose here is presumably to protect
lmv_qos_rr_index from concurrent updates.  This is a field that is
only tangentially related to the structure that holds the spinlock.

lmv_qos_rr_index is directly in 'struct lmv_obd' while lqr_alloc
is in struct lu_qos_rr which is in struct lu_qos, which is in lmv_obd.

As there is a spinlock in 'struct lmv_obd' (lmv_lock) it makes more
sense to use that to protect lmv_qos_rr_index.  Then the entire
lu_qos_rr structure will be unused on the client and can be made
server-only.
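
For illustration only, this is the generic shape of the change: a
shared round-robin cursor only needs to be guarded by whatever lock
already covers the structure that owns it.  A hedged sketch with
hypothetical "rr"/"nr_tgts" names, not lmv code:

#include <linux/spinlock.h>

static unsigned int rr_next(spinlock_t *lock, unsigned int *rr,
			    unsigned int nr_tgts)
{
	unsigned int idx;

	spin_lock(lock);		/* any lock owned by the same struct */
	idx = *rr % nr_tgts;		/* pick the current target */
	*rr = (idx + 1) % nr_tgts;	/* advance for the next caller */
	spin_unlock(lock);

	return idx;
}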

WC-bug-id: https://jira.whamcloud.com/browse/LU-8837
Lustre-commit: 3e14a71d87efde0 ("LU-8837 lmv: don't use lqr_alloc spinlock in lmv")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41949
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lmv/lmv_obd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 9c0a0cf..6555c6e 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1493,7 +1493,7 @@ static struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt)
 	int i;
 	int index;
 
-	spin_lock(&lmv->lmv_qos.lq_rr.lqr_alloc);
+	spin_lock(&lmv->lmv_lock);
 	for (i = 0; i < lmv->lmv_mdt_descs.ltd_tgts_size; i++) {
 		index = (i + lmv->lmv_qos_rr_index) %
 			lmv->lmv_mdt_descs.ltd_tgts_size;
@@ -1504,11 +1504,11 @@ static struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt)
 		*mdt = tgt->ltd_index;
 		lmv->lmv_qos_rr_index = (*mdt + 1) %
 					lmv->lmv_mdt_descs.ltd_tgts_size;
-		spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc);
+		spin_unlock(&lmv->lmv_lock);
 
 		return tgt;
 	}
-	spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc);
+	spin_unlock(&lmv->lmv_lock);
 
 	return ERR_PTR(-ENODEV);
 }
-- 
1.8.3.1


* [lustre-devel] [PATCH 39/49] lustre: lov: fault page update cp_lov_index
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (37 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 38/49] lustre: lmv: don't use lqr_alloc spinlock in lmv James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 40/49] lustre: update version to 2.14.51 James Simmons
                   ` (9 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Bobi Jam <bobijam@whamcloud.com>

In fault IO, vvp_io_fault_start() could find an existing cl_page
associated with the vmpage covering the fault index, and the page
may still refer to another mirror of an old IO.

This patch updates the fault page's cp_lov_index in lov_io_fault_start().

WC-bug-id: https://jira.whamcloud.com/browse/LU-14502
Lustre-commit: e9bac5fa455eab5 ("LU-14502 lov: fault page update cp_lov_index")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41954
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_io.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c
index a8bba1c..9f67d16 100644
--- a/fs/lustre/lov/lov_io.c
+++ b/fs/lustre/lov/lov_io.c
@@ -1341,9 +1341,46 @@ static int lov_io_fault_start(const struct lu_env *env,
 	struct cl_fault_io *fio;
 	struct lov_io *lio;
 	struct lov_io_sub *sub;
+	loff_t offset;
+	int entry;
+	int stripe;
 
 	fio = &ios->cis_io->u.ci_fault;
 	lio = cl2lov_io(env, ios);
+
+	/**
+	 * LU-14502: ft_page could be an existing cl_page associated with
+	 * the vmpage covering the fault index, and the page may still
+	 * refer to another mirror of an old IO.
+	 */
+	if (lov_is_flr(lio->lis_object)) {
+		offset = cl_offset(ios->cis_obj, fio->ft_index);
+		entry = lov_io_layout_at(lio, offset);
+		if (entry < 0) {
+			CERROR(DFID": page fault index %lu invalid component: %d, mirror: %d\n",
+			       PFID(lu_object_fid(&ios->cis_obj->co_lu)),
+			       fio->ft_index, entry,
+			       lio->lis_mirror_index);
+			return -EIO;
+		}
+		stripe = lov_stripe_number(lio->lis_object->lo_lsm,
+					   entry, offset);
+
+		if (fio->ft_page->cp_lov_index !=
+		    lov_comp_index(entry, stripe)) {
+			CDEBUG(D_INFO,
+			       DFID": page fault at index %lu, at mirror %u comp entry %u stripe %u, been used with comp entry %u stripe %u\n",
+			       PFID(lu_object_fid(&ios->cis_obj->co_lu)),
+			       fio->ft_index, lio->lis_mirror_index,
+			       entry, stripe,
+			       lov_comp_entry(fio->ft_page->cp_lov_index),
+			       lov_comp_stripe(fio->ft_page->cp_lov_index));
+
+			fio->ft_page->cp_lov_index =
+					lov_comp_index(entry, stripe);
+		}
+	}
+
 	sub = lov_sub_get(env, lio, fio->ft_page->cp_lov_index);
 	if (IS_ERR(sub))
 		return PTR_ERR(sub);
-- 
1.8.3.1


* [lustre-devel] [PATCH 40/49] lustre: update version to 2.14.51
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (38 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 39/49] lustre: lov: fault page update cp_lov_index James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 41/49] lustre: llite: mirror extend/copy keeps sparseness James Simmons
                   ` (8 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

New tag 2.14.51

Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h
index 1e2b148..604b1df 100644
--- a/include/uapi/linux/lustre/lustre_ver.h
+++ b/include/uapi/linux/lustre/lustre_ver.h
@@ -3,9 +3,9 @@
 
 #define LUSTRE_MAJOR 2
 #define LUSTRE_MINOR 14
-#define LUSTRE_PATCH 50
+#define LUSTRE_PATCH 51
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.14.50"
+#define LUSTRE_VERSION_STRING "2.14.51"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
1.8.3.1


* [lustre-devel] [PATCH 41/49] lustre: llite: mirror extend/copy keeps sparseness
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (39 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 40/49] lustre: update version to 2.14.51 James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 42/49] lustre: ptlrpc: don't use list_for_each_entry_safe unnecessarily James Simmons
                   ` (7 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Mikhail Pershin, Lustre Development List

From: Mikhail Pershin <mpershin@whamcloud.com>

- make ll_lseek() work under a group lock and on the designated
  mirror

WC-bug-id: https://jira.whamcloud.com/browse/LU-13397
Lustre-commit: 0561c144cc1bb623 ("LU-13397 lfs: mirror extend/copy keeps sparseness")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/40772
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 225008e..bbb2ff9 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -4017,8 +4017,9 @@ static int ll_heat_set(struct inode *inode, enum lu_heat_flag flags)
 	}
 }
 
-loff_t ll_lseek(struct inode *inode, loff_t offset, int whence)
+loff_t ll_lseek(struct file *file, loff_t offset, int whence)
 {
+	struct inode *inode = file_inode(file);
 	struct lu_env *env;
 	struct cl_io *io;
 	struct cl_lseek_io *lsio;
@@ -4032,6 +4033,7 @@ loff_t ll_lseek(struct inode *inode, loff_t offset, int whence)
 
 	io = vvp_env_thread_io(env);
 	io->ci_obj = ll_i2info(inode)->lli_clob;
+	ll_io_set_mirror(io, file);
 
 	lsio = &io->u.ci_lseek;
 	lsio->ls_start = offset;
@@ -4040,10 +4042,14 @@ loff_t ll_lseek(struct inode *inode, loff_t offset, int whence)
 
 	do {
 		rc = cl_io_init(env, io, CIT_LSEEK, io->ci_obj);
-		if (!rc)
+		if (!rc) {
+			struct vvp_io *vio = vvp_env_io(env);
+
+			vio->vui_fd = file->private_data;
 			rc = cl_io_loop(env, io);
-		else
+		} else {
 			rc = io->ci_result;
+		}
 		retval = rc ? : lsio->ls_result;
 		cl_io_fini(env, io);
 	} while (unlikely(io->ci_need_restart));
@@ -4077,7 +4083,7 @@ static loff_t ll_file_seek(struct file *file, loff_t offset, int origin)
 		cl_sync_file_range(inode, offset, OBD_OBJECT_EOF,
 				   CL_FSYNC_LOCAL, 0);
 
-		retval = ll_lseek(inode, offset, origin);
+		retval = ll_lseek(file, offset, origin);
 		if (retval < 0)
 			return retval;
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 42/49] lustre: ptlrpc: don't use list_for_each_entry_safe unnecessarily.
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (40 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 41/49] lustre: llite: mirror extend/copy keeps sparseness James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 43/49] lnet: Age peer NI out of recovery James Simmons
                   ` (6 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: NeilBrown <neilb@suse.de>

list_for_each_entry_safe() is only needed if the body of the
loop might change the list, or if it might drop a lock that would
otherwise prevent the list from being changed.

When the body does neither of these, list_for_each_entry() should be
preferred as it makes the behaviour of the loop more clear to readers.

In each of the cases changed here, the list cannot change while the
loop proceeds.
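
As a generic illustration of that rule (a hedged sketch with a made-up
"item" type, not ptlrpc code): the _safe variant is only required when
the loop body may remove the entry it is currently visiting.

#include <linux/list.h>
#include <linux/slab.h>

struct item {
	struct list_head link;
	int val;
};

static void drop_negative(struct list_head *head)
{
	struct item *it, *tmp;

	/* the body deletes entries, so the _safe variant is required */
	list_for_each_entry_safe(it, tmp, head, link) {
		if (it->val < 0) {
			list_del(&it->link);
			kfree(it);
		}
	}
}

static int sum_all(struct list_head *head)
{
	struct item *it;
	int sum = 0;

	/* read-only walk: plain list_for_each_entry is clearer */
	list_for_each_entry(it, head, link)
		sum += it->val;

	return sum;
}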

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: 5c89fa57cb2dfff6 ("LU-6142 lustre: ptlrpc: don't use list_for_each_entry_safe unnecessarily.")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41939
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/client.c  | 10 +++++-----
 fs/lustre/ptlrpc/import.c  | 14 +++++++-------
 fs/lustre/ptlrpc/ptlrpcd.c |  4 ++--
 fs/lustre/ptlrpc/recover.c | 14 +++++++-------
 4 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index 20c00ad..04e8fec 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -3060,7 +3060,7 @@ int ptlrpc_replay_req(struct ptlrpc_request *req)
  */
 void ptlrpc_abort_inflight(struct obd_import *imp)
 {
-	struct ptlrpc_request *req, *n;
+	struct ptlrpc_request *req;
 
 	/*
 	 * Make sure that no new requests get processed for this import.
@@ -3074,7 +3074,7 @@ void ptlrpc_abort_inflight(struct obd_import *imp)
 	 * locked?  Also, how do we know if the requests on the list are
 	 * being freed at this time?
 	 */
-	list_for_each_entry_safe(req, n, &imp->imp_sending_list, rq_list) {
+	list_for_each_entry(req, &imp->imp_sending_list, rq_list) {
 		DEBUG_REQ(D_RPCTRACE, req, "inflight");
 
 		spin_lock(&req->rq_lock);
@@ -3086,7 +3086,7 @@ void ptlrpc_abort_inflight(struct obd_import *imp)
 		spin_unlock(&req->rq_lock);
 	}
 
-	list_for_each_entry_safe(req, n, &imp->imp_delayed_list, rq_list) {
+	list_for_each_entry(req, &imp->imp_delayed_list, rq_list) {
 		DEBUG_REQ(D_RPCTRACE, req, "aborting waiting req");
 
 		spin_lock(&req->rq_lock);
@@ -3112,9 +3112,9 @@ void ptlrpc_abort_inflight(struct obd_import *imp)
  */
 void ptlrpc_abort_set(struct ptlrpc_request_set *set)
 {
-	struct ptlrpc_request *req, *tmp;
+	struct ptlrpc_request *req;
 
-	list_for_each_entry_safe(req, tmp, &set->set_requests, rq_set_chain) {
+	list_for_each_entry(req, &set->set_requests, rq_set_chain) {
 		spin_lock(&req->rq_lock);
 		if (req->rq_phase != RQ_PHASE_RPC) {
 			spin_unlock(&req->rq_lock);
diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c
index 35c4f83..5e33ebc 100644
--- a/fs/lustre/ptlrpc/import.c
+++ b/fs/lustre/ptlrpc/import.c
@@ -271,11 +271,11 @@ void ptlrpc_deactivate_import(struct obd_import *imp)
 static time64_t ptlrpc_inflight_timeout(struct obd_import *imp)
 {
 	time64_t now = ktime_get_real_seconds();
-	struct ptlrpc_request *req, *n;
+	struct ptlrpc_request *req;
 	time64_t timeout = 0;
 
 	spin_lock(&imp->imp_lock);
-	list_for_each_entry_safe(req, n, &imp->imp_sending_list, rq_list)
+	list_for_each_entry(req, &imp->imp_sending_list, rq_list)
 		timeout = max(ptlrpc_inflight_deadline(req, now), timeout);
 
 	spin_unlock(&imp->imp_lock);
@@ -290,7 +290,7 @@ static time64_t ptlrpc_inflight_timeout(struct obd_import *imp)
  */
 void ptlrpc_invalidate_import(struct obd_import *imp)
 {
-	struct ptlrpc_request *req, *n;
+	struct ptlrpc_request *req;
 	time64_t timeout;
 	int rc;
 
@@ -370,13 +370,13 @@ void ptlrpc_invalidate_import(struct obd_import *imp)
 				 */
 				rc = 0;
 			} else {
-				list_for_each_entry_safe(req, n,
-							 &imp->imp_sending_list, rq_list) {
+				list_for_each_entry(req, &imp->imp_sending_list,
+						    rq_list) {
 					DEBUG_REQ(D_ERROR, req,
 						  "still on sending list");
 				}
-				list_for_each_entry_safe(req, n,
-							 &imp->imp_delayed_list, rq_list) {
+				list_for_each_entry(req, &imp->imp_delayed_list,
+						    rq_list) {
 					DEBUG_REQ(D_ERROR, req,
 						  "still on delayed list");
 				}
diff --git a/fs/lustre/ptlrpc/ptlrpcd.c b/fs/lustre/ptlrpc/ptlrpcd.c
index b0b81cc..ef24b0e 100644
--- a/fs/lustre/ptlrpc/ptlrpcd.c
+++ b/fs/lustre/ptlrpc/ptlrpcd.c
@@ -200,12 +200,12 @@ void ptlrpcd_wake(struct ptlrpc_request *req)
 static int ptlrpcd_steal_rqset(struct ptlrpc_request_set *des,
 			       struct ptlrpc_request_set *src)
 {
-	struct ptlrpc_request *req, *tmp;
+	struct ptlrpc_request *req;
 	int rc = 0;
 
 	spin_lock(&src->set_new_req_lock);
 	if (likely(!list_empty(&src->set_new_requests))) {
-		list_for_each_entry_safe(req, tmp, &src->set_new_requests, rq_set_chain)
+		list_for_each_entry(req, &src->set_new_requests, rq_set_chain)
 			req->rq_set = des;
 
 		list_splice_init(&src->set_new_requests, &des->set_requests);
diff --git a/fs/lustre/ptlrpc/recover.c b/fs/lustre/ptlrpc/recover.c
index 09ea3b3..104af56 100644
--- a/fs/lustre/ptlrpc/recover.c
+++ b/fs/lustre/ptlrpc/recover.c
@@ -66,7 +66,7 @@ void ptlrpc_initiate_recovery(struct obd_import *imp)
 int ptlrpc_replay_next(struct obd_import *imp, int *inflight)
 {
 	int rc = 0;
-	struct ptlrpc_request *req = NULL, *pos;
+	struct ptlrpc_request *req = NULL;
 	u64 last_transno;
 
 	*inflight = 0;
@@ -120,8 +120,8 @@ int ptlrpc_replay_next(struct obd_import *imp, int *inflight)
 	if (!req) {
 		struct ptlrpc_request *tmp;
 
-		list_for_each_entry_safe(tmp, pos, &imp->imp_replay_list,
-					 rq_replay_list) {
+		list_for_each_entry(tmp, &imp->imp_replay_list,
+				    rq_replay_list) {
 			if (tmp->rq_transno > last_transno) {
 				req = tmp;
 				break;
@@ -172,7 +172,7 @@ int ptlrpc_replay_next(struct obd_import *imp, int *inflight)
  */
 int ptlrpc_resend(struct obd_import *imp)
 {
-	struct ptlrpc_request *req, *next;
+	struct ptlrpc_request *req;
 
 	/* As long as we're in recovery, nothing should be added to the sending
 	 * list, so we don't need to hold the lock during this iteration and
@@ -186,7 +186,7 @@ int ptlrpc_resend(struct obd_import *imp)
 		return -1;
 	}
 
-	list_for_each_entry_safe(req, next, &imp->imp_sending_list, rq_list) {
+	list_for_each_entry(req, &imp->imp_sending_list, rq_list) {
 		LASSERTF((long)req > PAGE_SIZE && req != LP_POISON,
 			 "req %p bad\n", req);
 		LASSERTF(req->rq_type != LI_POISON, "req %p freed\n", req);
@@ -211,10 +211,10 @@ int ptlrpc_resend(struct obd_import *imp)
  */
 void ptlrpc_wake_delayed(struct obd_import *imp)
 {
-	struct ptlrpc_request *req, *pos;
+	struct ptlrpc_request *req;
 
 	spin_lock(&imp->imp_lock);
-	list_for_each_entry_safe(req, pos, &imp->imp_delayed_list, rq_list) {
+	list_for_each_entry(req, &imp->imp_delayed_list, rq_list) {
 		DEBUG_REQ(D_HA, req, "waking (set %p):", req->rq_set);
 		ptlrpc_client_wake_req(req);
 	}
-- 
1.8.3.1


* [lustre-devel] [PATCH 43/49] lnet: Age peer NI out of recovery
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (41 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 42/49] lustre: ptlrpc: don't use list_for_each_entry_safe unnecessarily James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 44/49] lnet: Only recover known good peer NIs James Simmons
                   ` (5 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

No longer send recovery pings to a peer NI that has been in recovery
for the recovery time limit. A peer NI will become eligible for
recovery again once we receive a message from it.

The existing lpni_last_alive field is utilized for this new purpose.

A check for NULL lpni is removed from
lnet_handle_remote_failure_locked() because all callers of that
function already ensure the lpni is non-NULL.

lnet_peer_ni_add_to_recoveryq_locked() now takes the recovery queue
as an argument rather than using the_lnet.ln_mt_peerNIRecovq. This
allows the function to be used by lnet_recover_peer_nis().
lnet_peer_ni_add_to_recoveryq_locked() is also modified to take a ref
on the peer NI if it is added to the recovery queue. Previously, it
was the responsibility of callers to take this ref.

HPE-bug-id: LUS-9109
WC-bug-id: https://jira.whamcloud.com/browse/LU-13569
Lustre-commit: cc27201a76574b5 ("LU-13569 lnet: Age peer NI out of recovery")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39718
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |  4 +++-
 net/lnet/lnet/lib-move.c      | 40 ++++++++++++++++---------------------
 net/lnet/lnet/lib-msg.c       | 25 ++++++++++++++---------
 net/lnet/lnet/peer.c          | 46 ++++++++++++++++++++++++++++++++-----------
 4 files changed, 70 insertions(+), 45 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 1954614..e30d0c4 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -513,7 +513,9 @@ struct lnet_ni *lnet_get_next_ni_locked(struct lnet_net *mynet,
 int lnet_get_peer_list(u32 *countp, u32 *sizep,
 		       struct lnet_process_id __user *ids);
 extern void lnet_peer_ni_set_healthv(lnet_nid_t nid, int value, bool all);
-extern void lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni);
+extern void lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni,
+						 struct list_head *queue,
+						 time64_t now);
 extern int lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
 extern void lnet_peer_clr_pref_nids(struct lnet_peer_ni *lpni);
 extern int lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 1868506..bdcba54 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -3356,6 +3356,7 @@ struct lnet_mt_event_info {
 	struct lnet_peer_ni *lpni;
 	struct lnet_peer_ni *tmp;
 	lnet_nid_t nid;
+	time64_t now;
 	int healthv;
 	int rc;
 
@@ -3367,6 +3368,8 @@ struct lnet_mt_event_info {
 			 &local_queue);
 	lnet_net_unlock(0);
 
+	now = ktime_get_seconds();
+
 	list_for_each_entry_safe(lpni, tmp, &local_queue,
 				 lpni_recovery) {
 		/* The same protection strategy is used here as is in the
@@ -3444,30 +3447,22 @@ struct lnet_mt_event_info {
 			}
 
 			lpni->lpni_recovery_ping_mdh = mdh;
-			/* While we're unlocked the lpni could've been
-			 * readded on the recovery queue. In this case we
-			 * don't need to add it to the local queue, since
-			 * it's already on there and the thread that added
-			 * it would've incremented the refcount on the
-			 * peer, which means we need to decref the refcount
-			 * that was implicitly grabbed by find_peer_ni_locked.
-			 * Otherwise, if the lpni is still not on
-			 * the recovery queue, then we'll add it to the
-			 * processed list.
-			 */
-			if (list_empty(&lpni->lpni_recovery))
-				list_add_tail(&lpni->lpni_recovery,
-					      &processed_list);
-			else
-				lnet_peer_ni_decref_locked(lpni);
-			lnet_net_unlock(0);
-
-			spin_lock(&lpni->lpni_lock);
-			if (rc)
+			lnet_peer_ni_add_to_recoveryq_locked(lpni,
+							     &processed_list,
+							     now);
+			if (rc) {
+				spin_lock(&lpni->lpni_lock);
 				lpni->lpni_state &=
 					~LNET_PEER_NI_RECOVERY_PENDING;
+				spin_unlock(&lpni->lpni_lock);
+			}
+
+			/* Drop the ref taken by lnet_find_peer_ni_locked() */
+			lnet_peer_ni_decref_locked(lpni);
+			lnet_net_unlock(0);
+		} else {
+			spin_unlock(&lpni->lpni_lock);
 		}
-		spin_unlock(&lpni->lpni_lock);
 	}
 
 	list_splice_init(&processed_list, &local_queue);
@@ -4384,8 +4379,7 @@ void lnet_monitor_thr_stop(void)
 		}
 	}
 
-	if (the_lnet.ln_routing)
-		lpni->lpni_last_alive = ktime_get_seconds();
+	lpni->lpni_last_alive = ktime_get_seconds();
 
 	msg->msg_rxpeer = lpni;
 	msg->msg_rxni = ni;
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index d888090..2e8fea7 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -488,19 +488,13 @@
 	lnet_net_unlock(0);
 }
 
+/* must hold net_lock/0 */
 void
 lnet_handle_remote_failure_locked(struct lnet_peer_ni *lpni)
 {
 	u32 sensitivity = lnet_health_sensitivity;
 	u32 lp_sensitivity;
 
-	/* NO-OP if:
-	 * 1. lpni could be NULL if we're in the LOLND case
-	 * 2. this is a recovery message
-	 */
-	if (!lpni)
-		return;
-
 	/* If there is a health sensitivity in the peer then use that
 	 * instead of the globally set one.
 	 */
@@ -519,7 +513,9 @@
 	 * value will not be reduced. In this case, there is no reason to
 	 * invoke recovery
 	 */
-	lnet_peer_ni_add_to_recoveryq_locked(lpni);
+	lnet_peer_ni_add_to_recoveryq_locked(lpni,
+					     &the_lnet.ln_mt_peerNIRecovq,
+					     ktime_get_seconds());
 }
 
 static void
@@ -892,8 +888,19 @@
 				u32 sensitivity;
 
 				lpn_peer = lpni->lpni_peer_net->lpn_peer;
-				sensitivity = lpn_peer->lp_health_sensitivity;
+				sensitivity = lpn_peer->lp_health_sensitivity ?
+					      lpn_peer->lp_health_sensitivity :
+					      lnet_health_sensitivity;
 				lnet_inc_lpni_healthv_locked(lpni, sensitivity);
+				/* This peer NI may have previously aged out
+				 * of recovery. Now that we've received a
+				 * message from it, we can continue recovery
+				 * if its health value is still below the
+				 * maximum.
+				 */
+				lnet_peer_ni_add_to_recoveryq_locked(lpni,
+								     &the_lnet.ln_mt_peerNIRecovq,
+								     ktime_get_seconds());
 			}
 			lnet_net_unlock(0);
 		}
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index ba41d86..fe80b81 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -3978,22 +3978,38 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 	return rc;
 }
 
+/* must hold net_lock/0 */
 void
-lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni)
+lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni,
+				     struct list_head *recovery_queue,
+				     time64_t now)
 {
 	/* the mt could've shutdown and cleaned up the queues */
 	if (the_lnet.ln_mt_state != LNET_MT_STATE_RUNNING)
 		return;
 
-	if (list_empty(&lpni->lpni_recovery) &&
-	    atomic_read(&lpni->lpni_healthv) < LNET_MAX_HEALTH_VALUE) {
-		CDEBUG(D_NET, "lpni %s added to recovery queue. Health = %d\n",
+	if (!list_empty(&lpni->lpni_recovery))
+		return;
+
+	if (atomic_read(&lpni->lpni_healthv) == LNET_MAX_HEALTH_VALUE)
+		return;
+
+	if (now > lpni->lpni_last_alive + lnet_recovery_limit) {
+		CDEBUG(D_NET, "lpni %s aged out last alive %lld\n",
 		       libcfs_nid2str(lpni->lpni_nid),
-		       atomic_read(&lpni->lpni_healthv));
-		list_add_tail(&lpni->lpni_recovery,
-			      &the_lnet.ln_mt_peerNIRecovq);
-		lnet_peer_ni_addref_locked(lpni);
+		       lpni->lpni_last_alive);
+		return;
 	}
+
+	/* This peer NI is going on the recovery queue, so take a ref on it */
+	lnet_peer_ni_addref_locked(lpni);
+
+	CDEBUG(D_NET, "%s added to recovery queue. last alive: %lld health: %d\n",
+	       libcfs_nid2str(lpni->lpni_nid),
+	       lpni->lpni_last_alive,
+	       atomic_read(&lpni->lpni_healthv));
+
+	list_add_tail(&lpni->lpni_recovery, recovery_queue);
 }
 
 /* Call with the ln_api_mutex held */
@@ -4006,10 +4022,13 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 	struct lnet_peer_ni *lpni;
 	int lncpt;
 	int cpt;
+	time64_t now;
 
 	if (the_lnet.ln_state != LNET_STATE_RUNNING)
 		return;
 
+	now = ktime_get_seconds();
+
 	if (!all) {
 		lnet_net_lock(LNET_LOCK_EX);
 		lpni = lnet_find_peer_ni_locked(nid);
@@ -4018,7 +4037,8 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 			return;
 		}
 		atomic_set(&lpni->lpni_healthv, value);
-		lnet_peer_ni_add_to_recoveryq_locked(lpni);
+		lnet_peer_ni_add_to_recoveryq_locked(lpni,
+						     &the_lnet.ln_mt_peerNIRecovq, now);
 		lnet_peer_ni_decref_locked(lpni);
 		lnet_net_unlock(LNET_LOCK_EX);
 		return;
@@ -4026,8 +4046,8 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 
 	lncpt = cfs_percpt_number(the_lnet.ln_peer_tables);
 
-	/* Walk all the peers and reset the healhv for each one to the
-	 * maximum value.
+	/* Walk all the peers and reset the health value for each one to the
+	 * specified value.
 	 */
 	lnet_net_lock(LNET_LOCK_EX);
 	for (cpt = 0; cpt < lncpt; cpt++) {
@@ -4038,7 +4058,9 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 				list_for_each_entry(lpni, &lpn->lpn_peer_nis,
 						    lpni_peer_nis) {
 					atomic_set(&lpni->lpni_healthv, value);
-					lnet_peer_ni_add_to_recoveryq_locked(lpni);
+					lnet_peer_ni_add_to_recoveryq_locked(lpni,
+									     &the_lnet.ln_mt_peerNIRecovq,
+									     now);
 				}
 			}
 		}
-- 
1.8.3.1


* [lustre-devel] [PATCH 44/49] lnet: Only recover known good peer NIs
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (42 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 43/49] lnet: Age peer NI out of recovery James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 45/49] lnet: Recover peer NI w/exponential backoff interval James Simmons
                   ` (4 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

A peer NI should not be eligible for recovery if we've never
received a message from it.

HPE-bug-id: LUS-9109
WC-bug-id: https://jira.whamcloud.com/browse/LU-13569
Lustre-commit: 39a169cd02738a1 ("LU-13569 lnet: Only recover known good peer NIs")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39719
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index fe80b81..f9af5da 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -3994,6 +3994,14 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 	if (atomic_read(&lpni->lpni_healthv) == LNET_MAX_HEALTH_VALUE)
 		return;
 
+	if (!lpni->lpni_last_alive) {
+		CDEBUG(D_NET,
+		       "lpni %s(%p) not eligible for recovery last alive %lld\n",
+		       libcfs_nid2str(lpni->lpni_nid), lpni,
+		       lpni->lpni_last_alive);
+		return;
+	}
+
 	if (now > lpni->lpni_last_alive + lnet_recovery_limit) {
 		CDEBUG(D_NET, "lpni %s aged out last alive %lld\n",
 		       libcfs_nid2str(lpni->lpni_nid),
-- 
1.8.3.1


* [lustre-devel] [PATCH 45/49] lnet: Recover peer NI w/exponential backoff interval
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (43 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 44/49] lnet: Only recover known good peer NIs James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 46/49] lustre: lov: return valid stripe_count/size for PFL files James Simmons
                   ` (3 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

Perform LNet recovery pings of peer NIs with an exponential backoff
interval.
 - The interval is equal to 2^(number of failed pings), up to a
   maximum of 900 seconds (15 minutes), as illustrated below.
 - When a message is received, the count of failed pings for the
   associated peer NI is reset to 0 so that recovery can happen more
   quickly.
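
As a worked example of the interval formula in the first bullet above
(a standalone sketch mirroring the described behaviour, not the
patch's actual helper):

/* interval = 2^ping_count seconds, capped once 2^10 = 1024 would
 * exceed the 900 second ceiling:
 *
 *   ping_count:  0  1  2  3  ...  9    10+
 *   interval:    1  2  4  8  ...  512  900   (seconds)
 */
static unsigned int recovery_interval(unsigned int ping_count)
{
	return ping_count > 9 ? 900 : 1U << ping_count;
}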

HPE-bug-id: LUS-9109
WC-bug-id: https://jira.whamcloud.com/browse/LU-13569
Lustre-commit: 917553c537a8860 ("LU-13569 lnet: Recover peer NI w/exponential backoff interval")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39720
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h  | 22 ++++++++++++++++++++++
 include/linux/lnet/lib-types.h |  6 ++++++
 net/lnet/lnet/lib-move.c       |  8 ++++++++
 net/lnet/lnet/lib-msg.c        |  6 +++++-
 net/lnet/lnet/peer.c           | 11 ++++++++++-
 5 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index e30d0c4..8b369dd 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -910,6 +910,28 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid,
 	return false;
 }
 
+#define LNET_RECOVERY_INTERVAL_MAX 900
+static inline unsigned int
+lnet_get_next_recovery_ping(unsigned int ping_count, time64_t now)
+{
+	unsigned int interval;
+
+	/* 2^9 = 512, 2^10 = 1024 */
+	if (ping_count > 9)
+		interval = LNET_RECOVERY_INTERVAL_MAX;
+	else
+		interval = 1 << ping_count;
+
+	return now + interval;
+}
+
+static inline void
+lnet_peer_ni_set_next_ping(struct lnet_peer_ni *lpni, time64_t now)
+{
+	lpni->lpni_next_ping =
+		lnet_get_next_recovery_ping(lpni->lpni_ping_count, now);
+}
+
 /*
  * A peer NI is alive if it satisfies the following two conditions:
  *  1. peer NI health >= LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index cc451cf..af8f61e 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -573,6 +573,12 @@ struct lnet_peer_ni {
 	atomic_t		 lpni_healthv;
 	/* recovery ping mdh */
 	struct lnet_handle_md	 lpni_recovery_ping_mdh;
+	/* When to send the next recovery ping */
+	time64_t		 lpni_next_ping;
+	/* How many pings sent during current recovery period did not receive
+	 * a reply. NB: reset whenever _any_ message arrives from this peer NI
+	 */
+	unsigned int		 lpni_ping_count;
 	/* CPT this peer attached on */
 	int			 lpni_cpt;
 	/* state flags -- protected by lpni_lock */
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index bdcba54..ad1517d 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -3398,6 +3398,12 @@ struct lnet_mt_event_info {
 		}
 
 		spin_unlock(&lpni->lpni_lock);
+
+		if (now < lpni->lpni_next_ping) {
+			lnet_net_unlock(0);
+			continue;
+		}
+
 		lnet_net_unlock(0);
 
 		/* NOTE: we're racing with peer deletion from user space.
@@ -3446,6 +3452,8 @@ struct lnet_mt_event_info {
 				continue;
 			}
 
+			lpni->lpni_ping_count++;
+
 			lpni->lpni_recovery_ping_mdh = mdh;
 			lnet_peer_ni_add_to_recoveryq_locked(lpni,
 							     &processed_list,
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 2e8fea7..0a4a317 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -863,8 +863,11 @@
 
 	switch (hstatus) {
 	case LNET_MSG_STATUS_OK:
-		/* increment the local ni health weather we successfully
+		/* increment the local ni health whether we successfully
 		 * received or sent a message on it.
+		 *
+		 * Ping counts are reset to 0 as appropriate to allow for
+		 * faster recovery.
 		 */
 		lnet_inc_healthv(&ni->ni_healthv, lnet_health_sensitivity);
 		/* It's possible msg_txpeer is NULL in the LOLND
@@ -875,6 +878,7 @@
 		 * as indication that the router is fully healthy.
 		 */
 		if (lpni && msg->msg_rx_committed) {
+			lpni->lpni_ping_count = 0;
 			/* If we're receiving a message from the router or
 			 * I'm a router, then set that lpni's health to
 			 * maximum so we can commence communication
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index f9af5da..15fcb5e 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -4006,14 +4006,23 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 		CDEBUG(D_NET, "lpni %s aged out last alive %lld\n",
 		       libcfs_nid2str(lpni->lpni_nid),
 		       lpni->lpni_last_alive);
+		/* Reset the ping count so that if this peer NI is added back to
+		 * the recovery queue we will send the first ping right away.
+		 */
+		lpni->lpni_ping_count = 0;
 		return;
 	}
 
 	/* This peer NI is going on the recovery queue, so take a ref on it */
 	lnet_peer_ni_addref_locked(lpni);
 
-	CDEBUG(D_NET, "%s added to recovery queue. last alive: %lld health: %d\n",
+	lnet_peer_ni_set_next_ping(lpni, now);
+
+	CDEBUG(D_NET,
+	       "%s added to recovery queue. ping count: %u next ping: %lld last alive: %lld health: %d\n",
 	       libcfs_nid2str(lpni->lpni_nid),
+	       lpni->lpni_ping_count,
+	       lpni->lpni_next_ping,
 	       lpni->lpni_last_alive,
 	       atomic_read(&lpni->lpni_healthv));
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 46/49] lustre: lov: return valid stripe_count/size for PFL files
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (44 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 45/49] lnet: Recover peer NI w/exponential backoff interval James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 47/49] lnet: convert lpni_refcount to a kref James Simmons
                   ` (2 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Emoly Liu <emoly@whamcloud.com>

Dump struct lov_comp_md_v1 in function ll_lov_getstripe_ea_info()
correctly to avoid stripe_count=0 or stripe_size=0 returned by
old interface llapi_file_get_stripe(), which will cause
divide-by-zero for older userspace that calls this ioctl,
e.g. lustre ADIO driver.
The rule is:
- if stripe_count=0, return stripe_count=1;
- if stripe_size=0,
  -- for DoM files, return the stripe size of the second component,
     since the first component of DoM file data is placed on the
     MDT for faster access;
  -- else, return the stripe size of the last component.
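
A minimal sketch of that selection rule, assuming a hypothetical
"ncomp"/"comp_stripe_size[]" view of the component entries (the real
patch walks the lov_comp_md_v1 lcm_entries in
ll_lov_getstripe_ea_info()):

#include <linux/types.h>

static unsigned int reported_stripe_size(bool is_dom, int ncomp,
					 const unsigned int *comp_stripe_size)
{
	int i;

	if (is_dom)			/* first component lives on the MDT */
		i = ncomp > 1 ? 1 : 0;
	else				/* plain PFL: use the last component */
		i = ncomp > 1 ? ncomp - 1 : 0;

	return comp_stripe_size[i];
}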

WC-bug-id: https://jira.whamcloud.com/browse/LU-14337
Lustre-commit: abf04e7ea356e8b ("LU-14337 lov: return valid stripe_count/size for PFL files")
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41803
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c   | 74 ++++++++++++++++++++++++++++++++++++++----------
 fs/lustre/lov/lov_pack.c |  7 -----
 2 files changed, 59 insertions(+), 22 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index bbb2ff9..2558a60 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -2059,6 +2059,7 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 	}
 
 	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
+	LASSERT(body); /* checked by mdc_getattr_name */
 
 	lmmsize = body->mbo_eadatasize;
 
@@ -2069,6 +2070,7 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 	}
 
 	lmm = req_capsule_server_sized_get(&req->rq_pill, &RMF_MDT_MD, lmmsize);
+	LASSERT(lmm);
 
 	if (lmm->lmm_magic != cpu_to_le32(LOV_MAGIC_V1) &&
 	    lmm->lmm_magic != cpu_to_le32(LOV_MAGIC_V3) &&
@@ -2083,8 +2085,7 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 	 * little endian. We convert it to host endian before
 	 * passing it to userspace.
 	 */
-	if ((lmm->lmm_magic & __swab32(LOV_MAGIC_MAGIC)) ==
-	    __swab32(LOV_MAGIC_MAGIC)) {
+	if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC) {
 		int stripe_count = 0;
 
 		if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_V1) ||
@@ -2093,24 +2094,67 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 			if (le32_to_cpu(lmm->lmm_pattern) &
 			    LOV_PATTERN_F_RELEASED)
 				stripe_count = 0;
+
+			lustre_swab_lov_user_md((struct lov_user_md *)lmm, 0);
+
+			/* if function called for directory - we should
+			 * avoid swab not existent lsm objects
+			 */
+			if (lmm->lmm_magic == LOV_MAGIC_V1 &&
+			    S_ISREG(body->mbo_mode))
+				lustre_swab_lov_user_md_objects(((struct lov_user_md_v1 *)lmm)->lmm_objects,
+								stripe_count);
+			else if (lmm->lmm_magic == LOV_MAGIC_V3 &&
+				 S_ISREG(body->mbo_mode))
+				lustre_swab_lov_user_md_objects(((struct lov_user_md_v3 *)lmm)->lmm_objects,
+								stripe_count);
+		} else if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_COMP_V1)) {
+			lustre_swab_lov_comp_md_v1((struct lov_comp_md_v1 *)lmm);
 		}
+	}
 
-		lustre_swab_lov_user_md((struct lov_user_md *)lmm, 0);
+	if (lmm->lmm_magic == LOV_MAGIC_COMP_V1) {
+		struct lov_comp_md_v1 *comp_v1 = NULL;
+		struct lov_comp_md_entry_v1 *ent;
+		struct lov_user_md_v1 *v1;
+		u32 off;
+		int i = 0;
+
+		comp_v1 = (struct lov_comp_md_v1 *)lmm;
+		/* Dump the striping information */
+		for (; i < comp_v1->lcm_entry_count; i++) {
+			ent = &comp_v1->lcm_entries[i];
+			off = ent->lcme_offset;
+			v1 = (struct lov_user_md_v1 *)((char *)lmm + off);
+			CDEBUG(D_INFO,
+			       "comp[%d]: stripe_count=%u, stripe_size=%u\n",
+			       i, v1->lmm_stripe_count, v1->lmm_stripe_size);
+		}
 
-		/* if function called for directory - we should
-		 * avoid swab not existent lsm objects
+		/**
+		 * Return valid stripe_count and stripe_size instead of 0 for
+		 * DoM files to avoid divide-by-zero for older userspace that
+		 * calls this ioctl, e.g. lustre ADIO driver.
 		 */
-		if (lmm->lmm_magic == LOV_MAGIC_V1 && S_ISREG(body->mbo_mode))
-			lustre_swab_lov_user_md_objects(
-				((struct lov_user_md_v1 *)lmm)->lmm_objects,
-				stripe_count);
-		else if (lmm->lmm_magic == LOV_MAGIC_V3 &&
-			 S_ISREG(body->mbo_mode))
-			lustre_swab_lov_user_md_objects(
-				((struct lov_user_md_v3 *)lmm)->lmm_objects,
-				stripe_count);
+		if (lmm->lmm_stripe_count == 0)
+			lmm->lmm_stripe_count = 1;
+		if (lmm->lmm_stripe_size == 0) {
+			/* Since the first component of the file data is placed
+			 * on the MDT for faster access, the stripe_size of the
+			 * second one is what applications doing large IOs
+			 * will actually use.
+			 */
+			if (lmm->lmm_pattern == LOV_PATTERN_MDT)
+				i = comp_v1->lcm_entry_count > 1 ? 1 : 0;
+			else
+				i = comp_v1->lcm_entry_count > 1 ?
+				    comp_v1->lcm_entry_count - 1 : 0;
+			ent = &comp_v1->lcm_entries[i];
+			off = ent->lcme_offset;
+			v1 = (struct lov_user_md_v1 *)((char *)lmm + off);
+			lmm->lmm_stripe_size = v1->lmm_stripe_size;
+		}
 	}
-
 out:
 	*lmmp = lmm;
 	*lmm_size = lmmsize;
diff --git a/fs/lustre/lov/lov_pack.c b/fs/lustre/lov/lov_pack.c
index 1962472..c97093e 100644
--- a/fs/lustre/lov/lov_pack.c
+++ b/fs/lustre/lov/lov_pack.c
@@ -450,13 +450,6 @@ int lov_getstripe(const struct lu_env *env, struct lov_object *obj,
 	}
 
 	/**
-	 * Return stripe_count=1 instead of 0 for DoM files to avoid
-	 * divide-by-zero for older userspace that calls this ioctl,
-	 * e.g. lustre ADIO driver.
-	 */
-	if ((lum.lmm_stripe_count == 0) && (lum.lmm_pattern & LOV_PATTERN_MDT))
-		lum.lmm_stripe_count = 1;
-	/**
 	 * User specified limited buffer size, usually the buffer is
 	 * from ll_lov_setstripe(), and the buffer can only hold basic
 	 * layout template info.
-- 
1.8.3.1


* [lustre-devel] [PATCH 47/49] lnet: convert lpni_refcount to a kref
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (45 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 46/49] lustre: lov: return valid stripe_count/size for PFL files James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 48/49] lustre: lmv: handle default stripe_count=-1 properly James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 49/49] lnet: libcfs: discard cfs_array_alloc() James Simmons
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: "Mr. NeilBrown" <neilb@suse.de>

This refcount is used exactly like a kref.  So change it to one.
kref uses refcount_t, which will warn on increment-from-zero and
similar problems (when the corresponding CONFIG option is enabled),
so we don't need the LASSERT calls.
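
For readers less familiar with the kref API, this is the generic
pattern being adopted (a hedged sketch around a hypothetical "foo"
object, not lnet code):

#include <linux/kref.h>
#include <linux/slab.h>

struct foo {
	struct kref f_ref;
	/* ... payload ... */
};

static void foo_release(struct kref *kref)
{
	struct foo *foo = container_of(kref, struct foo, f_ref);

	kfree(foo);
}

/* kref_init(&foo->f_ref) when the object is created, then: */
static void foo_get(struct foo *foo)
{
	kref_get(&foo->f_ref);		/* refcount_t flags increment-from-zero */
}

static void foo_put(struct foo *foo)
{
	kref_put(&foo->f_ref, foo_release);	/* frees on the final put */
}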

WC-bug-id: https://jira.whamcloud.com/browse/LU-12678
Lustre-commit: e520ee276800362c ("LU-12678 lnet: convert lpni_refcount to a kref")
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41941
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h  |  9 +++------
 include/linux/lnet/lib-types.h |  3 ++-
 net/lnet/lnet/peer.c           | 14 ++++++++------
 net/lnet/lnet/router_proc.c    |  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 8b369dd..e4dbe0e 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -351,18 +351,15 @@ void lnet_res_lh_initialize(struct lnet_res_container *rec,
 static inline void
 lnet_peer_ni_addref_locked(struct lnet_peer_ni *lp)
 {
-	LASSERT(atomic_read(&lp->lpni_refcount) > 0);
-	atomic_inc(&lp->lpni_refcount);
+	kref_get(&lp->lpni_kref);
 }
 
-void lnet_destroy_peer_ni_locked(struct lnet_peer_ni *lp);
+void lnet_destroy_peer_ni_locked(struct kref *ref);
 
 static inline void
 lnet_peer_ni_decref_locked(struct lnet_peer_ni *lp)
 {
-	LASSERT(atomic_read(&lp->lpni_refcount) > 0);
-	if (atomic_dec_and_test(&lp->lpni_refcount))
-		lnet_destroy_peer_ni_locked(lp);
+	kref_put(&lp->lpni_kref, lnet_destroy_peer_ni_locked);
 }
 
 static inline int
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index af8f61e..ce067b3 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -42,6 +42,7 @@
 #include <linux/uio.h>
 #include <linux/types.h>
 #include <linux/completion.h>
+#include <linux/kref.h>
 
 #include <uapi/linux/lnet/lnet-types.h>
 #include <uapi/linux/lnet/lnetctl.h>
@@ -568,7 +569,7 @@ struct lnet_peer_ni {
 	/* peer's NID */
 	lnet_nid_t		 lpni_nid;
 	/* # refs */
-	atomic_t		 lpni_refcount;
+	struct kref		 lpni_kref;
 	/* health value for the peer */
 	atomic_t		 lpni_healthv;
 	/* recovery ping mdh */
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 15fcb5e..c833e34 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -127,7 +127,7 @@
 	INIT_LIST_HEAD(&lpni->lpni_on_remote_peer_ni_list);
 	INIT_LIST_HEAD(&lpni->lpni_rtr_pref_nids);
 	LNetInvalidateMDHandle(&lpni->lpni_recovery_ping_mdh);
-	atomic_set(&lpni->lpni_refcount, 1);
+	kref_init(&lpni->lpni_kref);
 	lpni->lpni_sel_priority = LNET_MAX_SELECTION_PRIORITY;
 
 	spin_lock_init(&lpni->lpni_lock);
@@ -1864,14 +1864,16 @@ struct lnet_peer_net *
 }
 
 void
-lnet_destroy_peer_ni_locked(struct lnet_peer_ni *lpni)
+lnet_destroy_peer_ni_locked(struct kref *ref)
 {
+	struct lnet_peer_ni *lpni = container_of(ref, struct lnet_peer_ni,
+						 lpni_kref);
 	struct lnet_peer_table *ptable;
 	struct lnet_peer_net *lpn;
 
 	CDEBUG(D_NET, "%p nid %s\n", lpni, libcfs_nid2str(lpni->lpni_nid));
 
-	LASSERT(atomic_read(&lpni->lpni_refcount) == 0);
+	LASSERT(kref_read(&lpni->lpni_kref) == 0);
 	LASSERT(list_empty(&lpni->lpni_txq));
 	LASSERT(lpni->lpni_txqnob == 0);
 	LASSERT(list_empty(&lpni->lpni_peer_nis));
@@ -3779,7 +3781,7 @@ void lnet_peer_discovery_stop(void)
 		aliveness = (lnet_is_peer_ni_alive(lp)) ? "up" : "down";
 
 	CDEBUG(D_WARNING, "%-24s %4d %5s %5d %5d %5d %5d %5d %ld\n",
-	       libcfs_nid2str(lp->lpni_nid), atomic_read(&lp->lpni_refcount),
+	       libcfs_nid2str(lp->lpni_nid), kref_read(&lp->lpni_kref),
 	       aliveness, lp->lpni_net->net_tunables.lct_peer_tx_credits,
 	       lp->lpni_rtrcredits, lp->lpni_minrtrcredits,
 	       lp->lpni_txcredits, lp->lpni_mintxcredits, lp->lpni_txqnob);
@@ -3837,7 +3839,7 @@ void lnet_peer_discovery_stop(void)
 					 ? "up" : "down");
 
 			*nid = lp->lpni_nid;
-			*refcount = atomic_read(&lp->lpni_refcount);
+			*refcount = kref_read(&lp->lpni_kref);
 			*ni_peer_tx_credits =
 				lp->lpni_net->net_tunables.lct_peer_tx_credits;
 			*peer_tx_credits = lp->lpni_txcredits;
@@ -3922,7 +3924,7 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 			snprintf(lpni_info->cr_aliveness, LNET_MAX_STR_LEN,
 				 lnet_is_peer_ni_alive(lpni) ? "up" : "down");
 
-		lpni_info->cr_refcount = atomic_read(&lpni->lpni_refcount);
+		lpni_info->cr_refcount = kref_read(&lpni->lpni_kref);
 		lpni_info->cr_ni_peer_tx_credits = lpni->lpni_net ?
 			lpni->lpni_net->net_tunables.lct_peer_tx_credits : 0;
 		lpni_info->cr_peer_tx_credits = lpni->lpni_txcredits;
diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c
index 25d172d..dd52a08 100644
--- a/net/lnet/lnet/router_proc.c
+++ b/net/lnet/lnet/router_proc.c
@@ -475,7 +475,7 @@ static int proc_lnet_peers(struct ctl_table *table, int write,
 
 		if (peer) {
 			lnet_nid_t nid = peer->lpni_nid;
-			int nrefs = atomic_read(&peer->lpni_refcount);
+			int nrefs = kref_read(&peer->lpni_kref);
 			time64_t lastalive = -1;
 			char *aliveness = "NA";
 			int maxcr = peer->lpni_net ?
-- 
1.8.3.1


* [lustre-devel] [PATCH 48/49] lustre: lmv: handle default stripe_count=-1 properly
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (46 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 47/49] lnet: convert lpni_refcount to a kref James Simmons
@ 2021-04-15  4:02 ` James Simmons
  2021-04-15  4:02 ` [lustre-devel] [PATCH 49/49] lnet: libcfs: discard cfs_array_alloc() James Simmons
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

If the default LMV stripe_count=-1, print it as a signed value
instead of unsigned, to better match how it is set with "-c -1".

WC-bug-id: https://jira.whamcloud.com/browse/LU-14507
Lustre-commit: d9753b5ba6ad29f ("LU-14507 mdt: handle default stripe_count=-1 properly")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41983
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_lmv.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h
index afa4d60..aee8342 100644
--- a/fs/lustre/include/lustre_lmv.h
+++ b/fs/lustre/include/lustre_lmv.h
@@ -149,7 +149,8 @@ static inline void lmv1_le_to_cpu(struct lmv_mds_md_v1 *lmv_dst,
 		le32_to_cpu(lmv_src->lmv_master_mdt_index);
 	lmv_dst->lmv_hash_type = le32_to_cpu(lmv_src->lmv_hash_type);
 	lmv_dst->lmv_layout_version = le32_to_cpu(lmv_src->lmv_layout_version);
-
+	if (lmv_src->lmv_stripe_count > LMV_MAX_STRIPE_COUNT)
+		return;
 	for (i = 0; i < lmv_src->lmv_stripe_count; i++)
 		fid_le_to_cpu(&lmv_dst->lmv_stripe_fids[i],
 			      &lmv_src->lmv_stripe_fids[i]);
-- 
1.8.3.1


* [lustre-devel] [PATCH 49/49] lnet: libcfs: discard cfs_array_alloc()
  2021-04-15  4:01 [lustre-devel] [PATCH 00/49] lustre: sync to OpenSFS as of March 30 2021 James Simmons
                   ` (47 preceding siblings ...)
  2021-04-15  4:02 ` [lustre-devel] [PATCH 48/49] lustre: lmv: handle default stripe_count=-1 properly James Simmons
@ 2021-04-15  4:02 ` James Simmons
  48 siblings, 0 replies; 50+ messages in thread
From: James Simmons @ 2021-04-15  4:02 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: "Mr. NeilBrown" <neilb@suse.de>

cfs_array_alloc() and _free() are used for precisely one array, and
provide little value beyond open-coding the alloc and free.

So discard these functions and do the alloc/free in the loops that
already exist for setup and cleanup.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14289
Lustre-commit: 8ec9fe0d55b1f5a ("LU-14289 libcfs: discard cfs_array_alloc()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41992
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/libcfs/libcfs_private.h |  7 -----
 net/lnet/libcfs/libcfs_mem.c          | 52 -----------------------------------
 net/lnet/lnet/lib-ptl.c               | 19 ++++++++-----
 3 files changed, 12 insertions(+), 66 deletions(-)

diff --git a/include/linux/libcfs/libcfs_private.h b/include/linux/libcfs/libcfs_private.h
index cf13fe0..3996d0e 100644
--- a/include/linux/libcfs/libcfs_private.h
+++ b/include/linux/libcfs/libcfs_private.h
@@ -105,13 +105,6 @@
 int libcfs_debug_clear_buffer(void);
 int libcfs_debug_mark_buffer(const char *text);
 
-/*
- * allocate a variable array, returned value is an array of pointers.
- * Caller can specify length of array by count.
- */
-void *cfs_array_alloc(int count, unsigned int size);
-void  cfs_array_free(void *vars);
-
 #define LASSERT_ATOMIC_ENABLED	  (1)
 
 #if LASSERT_ATOMIC_ENABLED
diff --git a/net/lnet/libcfs/libcfs_mem.c b/net/lnet/libcfs/libcfs_mem.c
index 2d533be..6a49d39 100644
--- a/net/lnet/libcfs/libcfs_mem.c
+++ b/net/lnet/libcfs/libcfs_mem.c
@@ -117,55 +117,3 @@ struct cfs_var_array {
 	return arr->va_count;
 }
 EXPORT_SYMBOL(cfs_percpt_number);
-
-/*
- * free variable array, see more detail in cfs_array_alloc
- */
-void
-cfs_array_free(void *vars)
-{
-	struct cfs_var_array *arr;
-	int i;
-
-	arr = container_of(vars, struct cfs_var_array, va_ptrs[0]);
-
-	for (i = 0; i < arr->va_count; i++) {
-		if (!arr->va_ptrs[i])
-			continue;
-
-		kvfree(arr->va_ptrs[i]);
-	}
-	kvfree(arr);
-}
-EXPORT_SYMBOL(cfs_array_free);
-
-/*
- * allocate a variable array, returned value is an array of pointers.
- * Caller can specify length of array by @count, @size is size of each
- * memory block in array.
- */
-void *
-cfs_array_alloc(int count, unsigned int size)
-{
-	struct cfs_var_array *arr;
-	int i;
-
-	arr = kvmalloc(offsetof(struct cfs_var_array, va_ptrs[count]), GFP_KERNEL);
-	if (!arr)
-		return NULL;
-
-	arr->va_count = count;
-	arr->va_size = size;
-
-	for (i = 0; i < count; i++) {
-		arr->va_ptrs[i] = kvzalloc(size, GFP_KERNEL);
-
-		if (!arr->va_ptrs[i]) {
-			cfs_array_free((void *)&arr->va_ptrs[0]);
-			return NULL;
-		}
-	}
-
-	return (void *)&arr->va_ptrs[0];
-}
-EXPORT_SYMBOL(cfs_array_alloc);
diff --git a/net/lnet/lnet/lib-ptl.c b/net/lnet/lnet/lib-ptl.c
index ae38bc3..45d1be2 100644
--- a/net/lnet/lnet/lib-ptl.c
+++ b/net/lnet/lnet/lib-ptl.c
@@ -830,6 +830,7 @@ struct list_head *
 	return -ENOMEM;
 }
 
+#define PORTAL_SIZE (offsetof(struct lnet_portal, ptl_mt_maps[LNET_CPT_NUMBER]))
 void
 lnet_portals_destroy(void)
 {
@@ -839,9 +840,12 @@ struct list_head *
 		return;
 
 	for (i = 0; i < the_lnet.ln_nportals; i++)
-		lnet_ptl_cleanup(the_lnet.ln_portals[i]);
+		if (the_lnet.ln_portals[i]) {
+			lnet_ptl_cleanup(the_lnet.ln_portals[i]);
+			kfree(the_lnet.ln_portals[i]);
+		}
 
-	cfs_array_free(the_lnet.ln_portals);
+	kvfree(the_lnet.ln_portals);
 	the_lnet.ln_portals = NULL;
 	the_lnet.ln_nportals = 0;
 }
@@ -849,12 +853,11 @@ struct list_head *
 int
 lnet_portals_create(void)
 {
-	int size;
 	int i;
 
-	size = offsetof(struct lnet_portal, ptl_mt_maps[LNET_CPT_NUMBER]);
-
-	the_lnet.ln_portals = cfs_array_alloc(MAX_PORTALS, size);
+	the_lnet.ln_portals = kvmalloc_array(MAX_PORTALS,
+					     sizeof(*the_lnet.ln_portals),
+					     GFP_KERNEL);
 	if (!the_lnet.ln_portals) {
 		CERROR("Failed to allocate portals table\n");
 		return -ENOMEM;
@@ -862,7 +865,9 @@ struct list_head *
 	the_lnet.ln_nportals = MAX_PORTALS;
 
 	for (i = 0; i < the_lnet.ln_nportals; i++) {
-		if (lnet_ptl_setup(the_lnet.ln_portals[i], i)) {
+		the_lnet.ln_portals[i] = kzalloc(PORTAL_SIZE, GFP_KERNEL);
+		if (!the_lnet.ln_portals[i] ||
+		    lnet_ptl_setup(the_lnet.ln_portals[i], i)) {
 			lnet_portals_destroy();
 			return -ENOMEM;
 		}
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

