lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020
@ 2020-10-06  0:05 James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 01/42] lustre: ptlrpc: don't require CONFIG_CRYPTO_CRC32 James Simmons
                   ` (41 more replies)
  0 siblings, 42 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

Backport of various patches from the OpenSFS tree. Only the fscrypto
work has been held back. Some fixes are still needed for PCC as well.

Alexander Boyko (1):
  lustre: llite: add test to check client deadlock selinux

Alexander Zarochentsev (1):
  lustre: mdc: fix lovea for replay

Alexey Lyashkov (1):
  lustre: ldlm: don't use a locks without l_ast_data

Amir Shehata (1):
  lustre: mgc: Use IR for client->MDS/OST connections

Andreas Dilger (4):
  lustre: ptlrpc: don't require CONFIG_CRYPTO_CRC32
  lustre: llite: return -ENODATA if no default layout
  lustre: ptlrpc: prefer crc32_le() over CryptoAPI
  lustre: llite: increase readahead default values

Chris Horn (5):
  lnet: Do not set preferred NI for MR peer
  lnet: Conditionally attach rspt in LNetPut & LNetGet
  lnet: Loosen restrictions on LNet Health params
  lnet: Fix reference leak in lnet_select_pathway
  lnet: Do not overwrite destination when routing

James Simmons (1):
  lnet: don't read debugfs lnet stats when shutting down

Jinshan Xiong (1):
  lustre: ldlm: control lru_size for extent lock

Lai Siyao (2):
  lustre: llite: prune invalid dentries
  lustre: obdclass: don't initialize obj for zero FID

Mikhail Pershin (2):
  lustre: dom: lock cancel to drop pages
  lustre: osc: don't allow negative grants

Mr NeilBrown (10):
  lustre: lov: make various lov_object.c function static.
  lustre: sec: use memchr_inv() to check if page is zero.
  lnet: use init_wait(), not init_waitqueue_entry()
  lnet: libcfs: don't save journal_info in dumplog thread.
  lnet: call event handlers without res_lock
  lnet: Support checking for MD leaks.
  lustre: lov: don't use inline for operations functions.
  lustre: lov: discard unused lov_dump_lmm* functions
  lustre: lov: guard against class_exp2obd() returning NULL.
  lustre: don't take spinlock to read a 'long'.

NeilBrown (1):
  lustre: obdclass: fixes and improvements for jobid.

Oleg Drokin (1):
  lustre: update version to 2.13.56

Patrick Farrell (1):
  lustre: osc: Do ELC on locks with no OSC object

Serguei Smirnov (1):
  lnet: deadlock on LNet shutdown

Shaun Tancheff (1):
  lustre: llite: it_lock_bits should be bit-wise tested

Vitaly Fertman (4):
  lustre: ldlm: lru code cleanup
  lustre: ldlm: cancel LRU improvement
  lustre: ldlm: pool fixes
  lustre: ldlm: pool recalc forceful call

Wang Shilong (4):
  lustre: llite: reuse same cl_dio_aio for one IO
  lustre: llite: move iov iter forward by ourself
  lustre: llite: report client stats sumsq
  lustre: clio: don't call aio_complete() in lustre upon errors

 fs/lustre/Kconfig                      |   3 -
 fs/lustre/include/cl_object.h          |   1 +
 fs/lustre/include/lustre_dlm.h         |  72 +++++++--
 fs/lustre/include/lustre_net.h         |   6 +-
 fs/lustre/include/lustre_osc.h         |   4 +
 fs/lustre/include/obd.h                |   9 +-
 fs/lustre/include/obd_cksum.h          |  20 +--
 fs/lustre/include/obd_class.h          |   2 +
 fs/lustre/include/obd_support.h        |   1 +
 fs/lustre/ldlm/ldlm_internal.h         |  22 +--
 fs/lustre/ldlm/ldlm_lib.c              |  44 ++++++
 fs/lustre/ldlm/ldlm_lock.c             |  32 ++--
 fs/lustre/ldlm/ldlm_lockd.c            |  13 +-
 fs/lustre/ldlm/ldlm_pool.c             | 151 +++++++++++--------
 fs/lustre/ldlm/ldlm_request.c          | 268 +++++++++++++++------------------
 fs/lustre/ldlm/ldlm_resource.c         |  79 ++++++++--
 fs/lustre/llite/dcache.c               |  15 +-
 fs/lustre/llite/dir.c                  |  11 +-
 fs/lustre/llite/file.c                 | 141 +++++++++++------
 fs/lustre/llite/llite_internal.h       |  24 +--
 fs/lustre/llite/llite_lib.c            |   8 +-
 fs/lustre/llite/lproc_llite.c          |  31 +---
 fs/lustre/llite/namei.c                |  57 ++++---
 fs/lustre/llite/rw26.c                 |   5 +
 fs/lustre/llite/vvp_io.c               |  20 +--
 fs/lustre/lov/lov_cl_internal.h        |  12 --
 fs/lustre/lov/lov_ea.c                 |  33 +++-
 fs/lustre/lov/lov_internal.h           |   1 -
 fs/lustre/lov/lov_obd.c                |   7 +-
 fs/lustre/lov/lov_object.c             |  19 +--
 fs/lustre/lov/lov_pack.c               |  17 +--
 fs/lustre/lov/lovsub_object.c          |   4 +-
 fs/lustre/mdc/lproc_mdc.c              |   7 +-
 fs/lustre/mdc/mdc_dev.c                |  27 +++-
 fs/lustre/mdc/mdc_locks.c              |  37 ++---
 fs/lustre/mdc/mdc_request.c            |   6 +
 fs/lustre/mgc/lproc_mgc.c              |  30 ++++
 fs/lustre/mgc/mgc_request.c            |  58 +++++--
 fs/lustre/obdclass/cl_io.c             |   3 +-
 fs/lustre/obdclass/integrity.c         |   8 +-
 fs/lustre/obdclass/jobid.c             |  14 +-
 fs/lustre/obdclass/lu_object.c         |   7 +
 fs/lustre/obdclass/lustre_peer.c       |  37 ++++-
 fs/lustre/obdclass/obd_cksum.c         |   6 +-
 fs/lustre/osc/lproc_osc.c              |  23 +--
 fs/lustre/osc/osc_cache.c              |   2 +-
 fs/lustre/osc/osc_internal.h           |   3 +-
 fs/lustre/osc/osc_lock.c               |  14 +-
 fs/lustre/osc/osc_object.c             |   2 +-
 fs/lustre/osc/osc_request.c            |  41 +++--
 fs/lustre/ptlrpc/events.c              |   1 +
 fs/lustre/ptlrpc/pack_generic.c        |  18 ++-
 fs/lustre/ptlrpc/sec_null.c            |  10 +-
 fs/lustre/ptlrpc/sec_plain.c           |  26 +---
 include/linux/lnet/api.h               |   2 +
 include/linux/lnet/lib-lnet.h          |  25 ++-
 include/linux/lnet/lib-types.h         |   9 ++
 include/uapi/linux/lustre/lustre_idl.h |   2 +-
 include/uapi/linux/lustre/lustre_ver.h |   4 +-
 net/lnet/libcfs/debug.c                |   7 +-
 net/lnet/libcfs/tracefile.c            |   2 +-
 net/lnet/libcfs/tracefile.h            |   1 -
 net/lnet/lnet/api-ni.c                 |  86 ++++++-----
 net/lnet/lnet/lib-md.c                 |  45 +++++-
 net/lnet/lnet/lib-move.c               | 145 ++++++++++++------
 net/lnet/lnet/lib-msg.c                |  36 +++--
 net/lnet/lnet/peer.c                   |   1 +
 net/lnet/lnet/router_proc.c            |  17 +--
 net/lnet/selftest/rpc.c                |   1 +
 69 files changed, 1180 insertions(+), 715 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 01/42] lustre: ptlrpc: don't require CONFIG_CRYPTO_CRC32
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 02/42] lustre: dom: lock cancel to drop pages James Simmons
                   ` (40 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Andreas Dilger <adilger@whamcloud.com>

Don't require CONFIG_CRYPTO_CRC32 in order to build, as it may not
be available in all kernels and a fallback is easily provided.

Consolidate the early reply code in sec_plain.c to also call
lustre_msg_calc_cksum() to reduce code duplication.

Fixes: 984ea92e22a8 ("lnet: libcfs: make noise to console if CRC32 is missing")
WC-bug-id: https://jira.whamcloud.com/browse/LU-13127
Lustre-commit: 726897c87c42c ("LU-13127 ptlrpc: don't require CONFIG_CRYPTO_CRC32")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39201
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/Kconfig                      |  3 ---
 fs/lustre/include/lustre_net.h         |  2 +-
 fs/lustre/include/obd.h                |  4 ++--
 fs/lustre/include/obd_cksum.h          | 20 ++++++++++----------
 fs/lustre/obdclass/integrity.c         |  8 ++++----
 fs/lustre/obdclass/obd_cksum.c         |  6 +++---
 fs/lustre/osc/osc_request.c            | 10 +++++-----
 fs/lustre/ptlrpc/pack_generic.c        | 13 +++++++++++--
 fs/lustre/ptlrpc/sec_null.c            | 10 +++++-----
 fs/lustre/ptlrpc/sec_plain.c           | 26 ++++----------------------
 include/uapi/linux/lustre/lustre_idl.h |  2 +-
 11 files changed, 46 insertions(+), 58 deletions(-)

diff --git a/fs/lustre/Kconfig b/fs/lustre/Kconfig
index e2a06bb..bb0e4e7 100644
--- a/fs/lustre/Kconfig
+++ b/fs/lustre/Kconfig
@@ -2,9 +2,6 @@ config LUSTRE_FS
 	tristate "Lustre file system client support"
 	depends on LNET
 	select CRYPTO
-	select CRYPTO_CRC32
-	select CRYPTO_CRC32_PCLMUL if X86
-	select CRYPTO_CRC32C
 	select CRYPTO_MD5
 	select CRYPTO_SHA1
 	select CRYPTO_SHA256
diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index dd482bc..d199121 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -2103,7 +2103,7 @@ int lustre_shrink_msg(struct lustre_msg *msg, int segment,
 timeout_t lustre_msg_get_service_timeout(struct lustre_msg *msg);
 char *lustre_msg_get_jobid(struct lustre_msg *msg);
 u32 lustre_msg_get_cksum(struct lustre_msg *msg);
-u32 lustre_msg_calc_cksum(struct lustre_msg *msg);
+u32 lustre_msg_calc_cksum(struct lustre_msg *msg, u32 buf);
 void lustre_msg_set_handle(struct lustre_msg *msg,
 			   struct lustre_handle *handle);
 void lustre_msg_set_type(struct lustre_msg *msg, u32 type);
diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index cd3abfd..c73aebe 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -338,9 +338,9 @@ struct client_obd {
 	/* supported checksum types that are worked out at connect time */
 	u32			cl_supp_cksum_types;
 	/* checksum algorithm to be used */
-	enum cksum_type		cl_cksum_type;
+	enum cksum_types	cl_cksum_type;
 	/* preferred checksum algorithm to be used */
-	enum cksum_type		cl_preferred_cksum_type;
+	enum cksum_types	cl_preferred_cksum_type;
 
 	/* also protected by the poorly named _loi_list_lock lock above */
 	struct osc_async_rc     cl_ar;
diff --git a/fs/lustre/include/obd_cksum.h b/fs/lustre/include/obd_cksum.h
index c03d0e6..f7d316b 100644
--- a/fs/lustre/include/obd_cksum.h
+++ b/fs/lustre/include/obd_cksum.h
@@ -36,9 +36,9 @@
 #include <uapi/linux/lustre/lustre_idl.h>
 
 int obd_t10_cksum_speed(const char *obd_name,
-			enum cksum_type cksum_type);
+			enum cksum_types cksum_type);
 
-static inline unsigned char cksum_obd2cfs(enum cksum_type cksum_type)
+static inline unsigned char cksum_obd2cfs(enum cksum_types cksum_type)
 {
 	switch (cksum_type) {
 	case OBD_CKSUM_CRC32:
@@ -54,9 +54,9 @@ static inline unsigned char cksum_obd2cfs(enum cksum_type cksum_type)
 	return 0;
 }
 
-u32 obd_cksum_type_pack(const char *obd_name, enum cksum_type cksum_type);
+u32 obd_cksum_type_pack(const char *obd_name, enum cksum_types cksum_type);
 
-static inline enum cksum_type obd_cksum_type_unpack(u32 o_flags)
+static inline enum cksum_types obd_cksum_type_unpack(u32 o_flags)
 {
 	switch (o_flags & OBD_FL_CKSUM_ALL) {
 	case OBD_FL_CKSUM_CRC32C:
@@ -82,9 +82,9 @@ static inline enum cksum_type obd_cksum_type_unpack(u32 o_flags)
  * 1.8 supported ADLER it is base and not depend on hw
  * Client uses all available local algos
  */
-static inline enum cksum_type obd_cksum_types_supported_client(void)
+static inline enum cksum_types obd_cksum_types_supported_client(void)
 {
-	enum cksum_type ret = OBD_CKSUM_ADLER;
+	enum cksum_types ret = OBD_CKSUM_ADLER;
 
 	CDEBUG(D_INFO, "Crypto hash speed: crc %d, crc32c %d, adler %d\n",
 	       cfs_crypto_hash_speed(cksum_obd2cfs(OBD_CKSUM_CRC32)),
@@ -110,9 +110,9 @@ static inline enum cksum_type obd_cksum_types_supported_client(void)
  * not be the fastest or most efficient algorithm on the server.
  */
 static inline
-enum cksum_type obd_cksum_type_select(const char *obd_name,
-				       enum cksum_type cksum_types,
-				       enum cksum_type preferred)
+enum cksum_types obd_cksum_type_select(const char *obd_name,
+				       enum cksum_types cksum_types,
+				       enum cksum_types preferred)
 {
 	u32 flag;
 
@@ -143,7 +143,7 @@ int obd_page_dif_generate_buffer(const char *obd_name, struct page *page,
  * If checksum type is one T10 checksum types, init the csum_fn and sector
  * size. Otherwise, init them to NULL/zero.
  */
-static inline void obd_t10_cksum2dif(enum cksum_type cksum_type,
+static inline void obd_t10_cksum2dif(enum cksum_types cksum_type,
 				     obd_dif_csum_fn **fn, int *sector_size)
 {
 	*fn = NULL;
diff --git a/fs/lustre/obdclass/integrity.c b/fs/lustre/obdclass/integrity.c
index cbb91ed..7a95a11 100644
--- a/fs/lustre/obdclass/integrity.c
+++ b/fs/lustre/obdclass/integrity.c
@@ -80,7 +80,7 @@ int obd_page_dif_generate_buffer(const char *obd_name, struct page *page,
 EXPORT_SYMBOL(obd_page_dif_generate_buffer);
 
 static int __obd_t10_performance_test(const char *obd_name,
-				      enum cksum_type cksum_type,
+				      enum cksum_types cksum_type,
 				      struct page *data_page,
 				      int repeat_number)
 {
@@ -163,7 +163,7 @@ static int __obd_t10_performance_test(const char *obd_name,
 static int obd_t10_cksum_speeds[OBD_T10_CKSUM_MAX];
 
 static enum obd_t10_cksum_type
-obd_t10_cksum2type(enum cksum_type cksum_type)
+obd_t10_cksum2type(enum cksum_types cksum_type)
 {
 	switch (cksum_type) {
 	case OBD_CKSUM_T10IP512:
@@ -205,7 +205,7 @@ static const char *obd_t10_cksum_name(enum obd_t10_cksum_type index)
  * \param[in] cksum_type	checksum type (OBD_CKSUM_T10*)
  */
 static void obd_t10_performance_test(const char *obd_name,
-				     enum cksum_type cksum_type)
+				     enum cksum_types cksum_type)
 {
 	enum obd_t10_cksum_type index = obd_t10_cksum2type(cksum_type);
 	const int buf_len = max(PAGE_SIZE, 1048576UL);
@@ -255,7 +255,7 @@ static void obd_t10_performance_test(const char *obd_name,
 }
 
 int obd_t10_cksum_speed(const char *obd_name,
-			enum cksum_type cksum_type)
+			enum cksum_types cksum_type)
 {
 	enum obd_t10_cksum_type index = obd_t10_cksum2type(cksum_type);
 
diff --git a/fs/lustre/obdclass/obd_cksum.c b/fs/lustre/obdclass/obd_cksum.c
index 601feb7..e109ab71 100644
--- a/fs/lustre/obdclass/obd_cksum.c
+++ b/fs/lustre/obdclass/obd_cksum.c
@@ -30,9 +30,9 @@
 #include <obd_cksum.h>
 
 /* Server uses algos that perform at 50% or better of the Adler */
-enum cksum_type obd_cksum_types_supported_server(const char *obd_name)
+enum cksum_types obd_cksum_types_supported_server(const char *obd_name)
 {
-	enum cksum_type ret = OBD_CKSUM_ADLER;
+	enum cksum_types ret = OBD_CKSUM_ADLER;
 	int base_speed;
 
 	CDEBUG(D_INFO,
@@ -84,7 +84,7 @@ enum cksum_type obd_cksum_types_supported_server(const char *obd_name)
  *
  * In case multiple algorithms are supported the best one is used.
  */
-u32 obd_cksum_type_pack(const char *obd_name, enum cksum_type cksum_type)
+u32 obd_cksum_type_pack(const char *obd_name, enum cksum_types cksum_type)
 {
 	unsigned int performance = 0, tmp;
 	u32 flag = OBD_FL_CKSUM_ADLER;
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 3bbe3d9..1e56ca3 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1275,7 +1275,7 @@ static int osc_checksum_bulk_t10pi(const char *obd_name, int nob,
 
 static int osc_checksum_bulk(int nob, u32 pg_count,
 			     struct brw_page **pga, int opc,
-			     enum cksum_type cksum_type,
+			     enum cksum_types cksum_type,
 			     u32 *cksum)
 {
 	int i = 0;
@@ -1334,7 +1334,7 @@ static int osc_checksum_bulk(int nob, u32 pg_count,
 }
 
 static int osc_checksum_bulk_rw(const char *obd_name,
-				enum cksum_type cksum_type,
+				enum cksum_types cksum_type,
 				int nob, size_t pg_count,
 				struct brw_page **pga, int opc,
 				u32 *check_sum)
@@ -1641,7 +1641,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 			/* store cl_cksum_type in a local variable since
 			 * it can be changed via lprocfs
 			 */
-			enum cksum_type cksum_type = cli->cl_cksum_type;
+			enum cksum_types cksum_type = cli->cl_cksum_type;
 
 			if ((body->oa.o_valid & OBD_MD_FLFLAGS) == 0)
 				body->oa.o_flags = 0;
@@ -1790,7 +1790,7 @@ static int check_write_checksum(struct obdo *oa,
 	int sector_size = 0;
 	u32 new_cksum;
 	char *msg;
-	enum cksum_type cksum_type;
+	enum cksum_types cksum_type;
 	int rc;
 
 	if (server_cksum == client_cksum) {
@@ -1996,7 +1996,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 		u32 server_cksum = body->oa.o_cksum;
 		char *via = "";
 		char *router = "";
-		enum cksum_type cksum_type;
+		enum cksum_types cksum_type;
 		u32 o_flags = body->oa.o_valid & OBD_MD_FLFLAGS ?
 			body->oa.o_flags : 0;
 
diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c
index 55d9814..cbb65ce 100644
--- a/fs/lustre/ptlrpc/pack_generic.c
+++ b/fs/lustre/ptlrpc/pack_generic.c
@@ -41,6 +41,10 @@
 
 #define DEBUG_SUBSYSTEM S_RPC
 
+#ifndef CONFIG_CRYPTO_CRC32
+#include <linux/crc32.h>
+#endif
+
 #include <uapi/linux/lustre/lustre_fiemap.h>
 
 #include <llog_swab.h>
@@ -1229,18 +1233,23 @@ u32 lustre_msg_get_cksum(struct lustre_msg *msg)
 	}
 }
 
-u32 lustre_msg_calc_cksum(struct lustre_msg *msg)
+u32 lustre_msg_calc_cksum(struct lustre_msg *msg, u32 buf)
 {
 	switch (msg->lm_magic) {
 	case LUSTRE_MSG_MAGIC_V2: {
-		struct ptlrpc_body *pb = lustre_msg_ptlrpc_body(msg);
+		struct ptlrpc_body *pb = lustre_msg_buf_v2(msg, buf, 0);
+		u32 len = lustre_msg_buflen(msg, buf);
 		u32 crc;
+#ifdef CONFIG_CRYPTO_CRC32
 		unsigned int hsize = 4;
 
 		cfs_crypto_hash_digest(CFS_HASH_ALG_CRC32, (unsigned char *)pb,
 				       lustre_msg_buflen(msg,
 							 MSG_PTLRPC_BODY_OFF),
 				       NULL, 0, (unsigned char *)&crc, &hsize);
+#else
+		crc = crc32_le(~(__u32)0, (unsigned char *)pb, len);
+#endif
 		return crc;
 	}
 	default:
diff --git a/fs/lustre/ptlrpc/sec_null.c b/fs/lustre/ptlrpc/sec_null.c
index 2eaa788..14058bf 100644
--- a/fs/lustre/ptlrpc/sec_null.c
+++ b/fs/lustre/ptlrpc/sec_null.c
@@ -100,7 +100,8 @@ int null_ctx_verify(struct ptlrpc_cli_ctx *ctx, struct ptlrpc_request *req)
 
 	if (req->rq_early) {
 		cksums = lustre_msg_get_cksum(req->rq_repdata);
-		cksumc = lustre_msg_calc_cksum(req->rq_repmsg);
+		cksumc = lustre_msg_calc_cksum(req->rq_repmsg,
+					       MSG_PTLRPC_BODY_OFF);
 
 		if (cksumc != cksums) {
 			CDEBUG(D_SEC,
@@ -356,18 +357,17 @@ int null_authorize(struct ptlrpc_request *req)
 
 	rs->rs_repbuf->lm_secflvr = SPTLRPC_FLVR_NULL;
 	rs->rs_repdata_len = req->rq_replen;
+	req->rq_reply_off = 0;
 
 	if (likely(req->rq_packed_final)) {
 		if (lustre_msghdr_get_flags(req->rq_reqmsg) & MSGHDR_AT_SUPPORT)
 			req->rq_reply_off = lustre_msg_early_size();
-		else
-			req->rq_reply_off = 0;
 	} else {
 		u32 cksum;
 
-		cksum = lustre_msg_calc_cksum(rs->rs_repbuf);
+		cksum = lustre_msg_calc_cksum(rs->rs_repbuf,
+					      MSG_PTLRPC_BODY_OFF);
 		lustre_msg_set_cksum(rs->rs_repbuf, cksum);
-		req->rq_reply_off = 0;
 	}
 
 	return 0;
diff --git a/fs/lustre/ptlrpc/sec_plain.c b/fs/lustre/ptlrpc/sec_plain.c
index ce72f64..b487968 100644
--- a/fs/lustre/ptlrpc/sec_plain.c
+++ b/fs/lustre/ptlrpc/sec_plain.c
@@ -214,7 +214,6 @@ int plain_ctx_verify(struct ptlrpc_cli_ctx *ctx, struct ptlrpc_request *req)
 {
 	struct lustre_msg *msg = req->rq_repdata;
 	struct plain_header *phdr;
-	u32 cksum;
 	bool swabbed;
 
 	if (msg->lm_bufcount != PLAIN_PACK_SEGMENTS) {
@@ -248,15 +247,8 @@ int plain_ctx_verify(struct ptlrpc_cli_ctx *ctx, struct ptlrpc_request *req)
 	}
 
 	if (unlikely(req->rq_early)) {
-		unsigned int hsize = 4;
-
-		cfs_crypto_hash_digest(CFS_HASH_ALG_CRC32,
-				       lustre_msg_buf(msg, PLAIN_PACK_MSG_OFF,
-						      0),
-				       lustre_msg_buflen(msg,
-							 PLAIN_PACK_MSG_OFF),
-				       NULL, 0, (unsigned char *)&cksum,
-				       &hsize);
+		u32 cksum = lustre_msg_calc_cksum(msg, PLAIN_PACK_MSG_OFF);
+
 		if (cksum != msg->lm_cksum) {
 			CDEBUG(D_SEC,
 			       "early reply checksum mismatch: %08x != %08x\n",
@@ -863,23 +855,13 @@ int plain_authorize(struct ptlrpc_request *req)
 		phdr->ph_flags |= PLAIN_FL_BULK;
 
 	rs->rs_repdata_len = len;
+	req->rq_reply_off = 0;
 
 	if (likely(req->rq_packed_final)) {
 		if (lustre_msghdr_get_flags(req->rq_reqmsg) & MSGHDR_AT_SUPPORT)
 			req->rq_reply_off = plain_at_offset;
-		else
-			req->rq_reply_off = 0;
 	} else {
-		unsigned int hsize = 4;
-
-		cfs_crypto_hash_digest(CFS_HASH_ALG_CRC32,
-				       lustre_msg_buf(msg, PLAIN_PACK_MSG_OFF,
-						      0),
-				       lustre_msg_buflen(msg,
-							 PLAIN_PACK_MSG_OFF),
-				       NULL, 0, (unsigned char *)&msg->lm_cksum,
-				       &hsize);
-		req->rq_reply_off = 0;
+		msg->lm_cksum = lustre_msg_calc_cksum(msg, PLAIN_PACK_MSG_OFF);
 	}
 
 	return 0;
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index 2cdc230..fda56d8 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -918,7 +918,7 @@ struct obd_connect_data {
  * algorithm and also the OBD_FL_CKSUM* flags, OBD_CKSUM_ALL flag,
  * OBD_FL_CKSUM_ALL flag and potentially OBD_CKSUM_T10_ALL flag.
  */
-enum cksum_type {
+enum cksum_types {
 	OBD_CKSUM_CRC32		= 0x00000001,
 	OBD_CKSUM_ADLER		= 0x00000002,
 	OBD_CKSUM_CRC32C	= 0x00000004,
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/42] lustre: dom: lock cancel to drop pages
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 01/42] lustre: ptlrpc: don't require CONFIG_CRYPTO_CRC32 James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 03/42] lustre: sec: use memchr_inv() to check if page is zero James Simmons
                   ` (39 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Mikhail Pershin <mpershin@whamcloud.com>

Prevent stale pages after lock cancel by creating the
cl_page connection for read-on-open pages.

This reverts commit 99565b37fc to fix the problem.
Since the VM pages are now connected to a cl_object,
they can be found and discarded properly by CLIO.

Fixes: 99565b37fc ("lustre: llite: optimize read on open pages")
WC-bug-id: https://jira.whamcloud.com/browse/LU-13759
Lustre-commit: e95eca236471c ("LU-13759 dom: lock cancel to drop pages")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39401
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c           | 66 +++++++++++++++++++++++++++++-----------
 fs/lustre/llite/llite_internal.h |  3 +-
 fs/lustre/llite/namei.c          | 15 +++------
 fs/lustre/mdc/mdc_dev.c          |  5 +++
 4 files changed, 59 insertions(+), 30 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 17c2d88..d872cf3 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -460,9 +460,10 @@ static inline int ll_dom_readpage(void *data, struct page *page)
 	return rc;
 }
 
-void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req,
-			struct lookup_intent *it)
+void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req)
 {
+	struct lu_env *env;
+	struct cl_io *io;
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct cl_object *obj = lli->lli_clob;
 	struct address_space *mapping = inode->i_mapping;
@@ -472,6 +473,8 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req,
 	char *data;
 	unsigned long index, start;
 	struct niobuf_local lnb;
+	u16 refcheck;
+	int rc;
 
 	if (!obj)
 		return;
@@ -504,6 +507,16 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req,
 		return;
 	}
 
+	env = cl_env_get(&refcheck);
+	if (IS_ERR(env))
+		return;
+	io = vvp_env_thread_io(env);
+	io->ci_obj = obj;
+	io->ci_ignore_layout = 1;
+	rc = cl_io_init(env, io, CIT_MISC, obj);
+	if (rc)
+		goto out_io;
+
 	CDEBUG(D_INFO, "Get data along with open at %llu len %i, size %llu\n",
 	       rnb->rnb_offset, rnb->rnb_len, body->mbo_dom_size);
 
@@ -515,6 +528,8 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req,
 	LASSERT((lnb.lnb_file_offset & ~PAGE_MASK) == 0);
 	lnb.lnb_page_offset = 0;
 	do {
+		struct cl_page *page;
+
 		lnb.lnb_data = data + (index << PAGE_SHIFT);
 		lnb.lnb_len = rnb->rnb_len - (index << PAGE_SHIFT);
 		if (lnb.lnb_len > PAGE_SIZE)
@@ -530,9 +545,32 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req,
 			      PTR_ERR(vmpage));
 			break;
 		}
+		lock_page(vmpage);
+		if (!vmpage->mapping) {
+			unlock_page(vmpage);
+			put_page(vmpage);
+			/* page was truncated */
+			break;
+		}
+		/* attach VM page to CL page cache */
+		page = cl_page_find(env, obj, vmpage->index, vmpage,
+				    CPT_CACHEABLE);
+		if (IS_ERR(page)) {
+			ClearPageUptodate(vmpage);
+			unlock_page(vmpage);
+			put_page(vmpage);
+			break;
+		}
+		cl_page_export(env, page, 1);
+		cl_page_put(env, page);
+		unlock_page(vmpage);
 		put_page(vmpage);
 		index++;
 	} while (rnb->rnb_len > (index << PAGE_SHIFT));
+
+out_io:
+	cl_io_fini(env, io);
+	cl_env_put(env, &refcheck);
 }
 
 static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize,
@@ -616,27 +654,21 @@ static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize,
 	rc = ll_prep_inode(&inode, req, NULL, itp);
 
 	if (!rc && itp->it_lock_mode) {
-		struct lustre_handle handle = {.cookie = itp->it_lock_handle};
-		struct ldlm_lock *lock;
-		bool has_dom_bit = false;
+		u64 bits = 0;
 
 		/* If we got a lock back and it has a LOOKUP bit set,
 		 * make sure the dentry is marked as valid so we can find it.
 		 * We don't need to care about actual hashing since other bits
 		 * of kernel will deal with that later.
 		 */
-		lock = ldlm_handle2lock(&handle);
-		if (lock) {
-			has_dom_bit = ldlm_has_dom(lock);
-			if (lock->l_policy_data.l_inodebits.bits &
-			    MDS_INODELOCK_LOOKUP)
-				d_lustre_revalidate(de);
-
-			LDLM_LOCK_PUT(lock);
-		}
-		ll_set_lock_data(sbi->ll_md_exp, inode, itp, NULL);
-		if (has_dom_bit)
-			ll_dom_finish_open(inode, req, itp);
+		ll_set_lock_data(sbi->ll_md_exp, inode, itp, &bits);
+		if (bits & MDS_INODELOCK_LOOKUP)
+			d_lustre_revalidate(de);
+		/* if DoM bit returned along with LAYOUT bit then there
+		 * can be read-on-open data returned.
+		 */
+		if (bits & MDS_INODELOCK_DOM && bits & MDS_INODELOCK_LAYOUT)
+			ll_dom_finish_open(inode, req);
 	}
 
 out:
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 89783fb..8a0c40c 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1156,8 +1156,7 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 ssize_t ll_copy_user_md(const struct lov_user_md __user *md,
 			struct lov_user_md **kbuf);
 
-void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req,
-			struct lookup_intent *it);
+void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req);
 
 /* Compute expected user md size when passing in a md from user space */
 static inline ssize_t ll_lov_user_md_size(const struct lov_user_md *lum)
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index a268c93..014a470 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -186,14 +186,6 @@ static int ll_dom_lock_cancel(struct inode *inode, struct ldlm_lock *lock)
 	u16 refcheck;
 	int rc;
 
-	if (!lli->lli_clob) {
-		/* Due to DoM read on open, there may exist pages for Lustre
-		 * regular file even though cl_object is not set up yet.
-		 */
-		truncate_inode_pages(inode->i_mapping, 0);
-		return 0;
-	}
-
 	env = cl_env_get(&refcheck);
 	if (IS_ERR(env))
 		return PTR_ERR(env);
@@ -659,10 +651,11 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 			}
 		}
 
-		if (it->it_op & IT_OPEN)
-			ll_dom_finish_open(inode, request, it);
-
 		ll_set_lock_data(ll_i2sbi(parent)->ll_md_exp, inode, it, &bits);
+		/* OPEN can return data if lock has DoM+LAYOUT bits set */
+		if (it->it_op & IT_OPEN &&
+		    bits & MDS_INODELOCK_DOM && bits & MDS_INODELOCK_LAYOUT)
+			ll_dom_finish_open(inode, request);
 
 		/* We used to query real size from OSTs here, but actually
 		 * this is not needed. For stat() calls size would be updated
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index eb2cc91..d6d98ae 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -1406,6 +1406,11 @@ int mdc_object_prune(const struct lu_env *env, struct cl_object *obj)
 static int mdc_object_flush(const struct lu_env *env, struct cl_object *obj,
 			    struct ldlm_lock *lock)
 {
+	/* if lock cancel is initiated from llite then it is combined
+	 * lock with DOM bit and it may have no l_ast_data initialized yet,
+	 * so init it here with given osc_object.
+	 */
+	mdc_set_dom_lock_data(lock, cl2osc(obj));
 	return mdc_dlm_blocking_ast0(env, lock, LDLM_CB_CANCELING);
 }
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 03/42] lustre: sec: use memchr_inv() to check if page is zero.
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 01/42] lustre: ptlrpc: don't require CONFIG_CRYPTO_CRC32 James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 02/42] lustre: dom: lock cancel to drop pages James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 04/42] lustre: mdc: fix lovea for replay James Simmons
                   ` (38 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Mr NeilBrown <neilb@suse.de>

memchr_inv() is the preferred way to check if a memory region is all
zeros.  It is likely faster than memcmp() as it doesn't need to read
the ZERO_PAGE into cache, or into the CPU.  It was introduced in
Linux 3.2.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12275
Lustre-commit: afee2380c105c ("LU-12275 sec: use memchr_inv() to check if page is zero.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39459
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_request.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 1e56ca3..fbb8453 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -2077,9 +2077,8 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 			struct brw_page *pg = aa->aa_ppga[idx];
 
 			/* do not decrypt if page is all 0s */
-			if (memcmp(page_address(pg->pg),
-				   page_address(ZERO_PAGE(0)),
-				   PAGE_SIZE) == 0) {
+			if (memchr_inv(page_address(pg->pg), 0,
+				       PAGE_SIZE) == NULL) {
 				/* if page is empty forward info to upper layers
 				 * (ll_io_zero_page) by clearing PagePrivate2
 				 */
-- 
1.8.3.1


* [lustre-devel] [PATCH 04/42] lustre: mdc: fix lovea for replay
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (2 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 03/42] lustre: sec: use memchr_inv() to check if page is zero James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 05/42] lustre: llite: add test to check client deadlock selinux James Simmons
                   ` (37 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>

lmm->lmm_stripe_offset gets overwritten by the
layout generation in the server reply,
so the MDT does not recognize such a LOVEA as
valid striping at open request replay.
This patch extends the LU-7008 fix to also
support the PFL layout.

HPE-bug-id: LUS-8820
WC-bug-id: https://jira.whamcloud.com/browse/LU-13809
Lustre-commit: 72d45e1d344c5 ("LU-13809 mdc: fix lovea for replay")
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/39468
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd.h     |  2 ++
 fs/lustre/lov/lov_ea.c      | 31 +++++++++++++++++++++++++++++++
 fs/lustre/mdc/mdc_locks.c   | 32 ++++++++++++++------------------
 fs/lustre/mdc/mdc_request.c |  6 ++++++
 4 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index c73aebe..083884c9f 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -71,6 +71,8 @@ struct lov_oinfo {				/* per-stripe data structure */
 	struct osc_async_rc	loi_ar;
 };
 
+void lov_fix_ea_for_replay(void *lovea);
+
 static inline void loi_kms_set(struct lov_oinfo *oinfo, u64 kms)
 {
 	oinfo->loi_kms = kms;
diff --git a/fs/lustre/lov/lov_ea.c b/fs/lustre/lov/lov_ea.c
index e198536..1d105c0 100644
--- a/fs/lustre/lov/lov_ea.c
+++ b/fs/lustre/lov/lov_ea.c
@@ -656,3 +656,34 @@ int lov_lsm_entry(const struct lov_stripe_md *lsm, u64 offset)
 
 	return -1;
 }
+
+/**
+ * lmm_layout_gen overlaps stripe_offset field, it needs to be reset back when
+ * sending to MDT for passing striping checks
+ */
+void lov_fix_ea_for_replay(void *lovea)
+{
+	struct lov_user_md *lmm = lovea;
+	struct lov_comp_md_v1 *c1;
+	int i;
+
+	switch (le32_to_cpu(lmm->lmm_magic)) {
+	case LOV_USER_MAGIC_V1:
+	case LOV_USER_MAGIC_V3:
+		lmm->lmm_stripe_offset = LOV_OFFSET_DEFAULT;
+		break;
+
+	case LOV_USER_MAGIC_COMP_V1:
+		c1 = (void *)lmm;
+		for (i = 0; i < le16_to_cpu(c1->lcm_entry_count); i++) {
+			struct lov_comp_md_entry_v1 *ent = &c1->lcm_entries[i];
+
+			if (le32_to_cpu(ent->lcme_flags) & LCME_FL_INIT) {
+				lmm = (void *)((char *)c1 +
+				      le32_to_cpu(ent->lcme_offset));
+				lmm->lmm_stripe_offset = LOV_OFFSET_DEFAULT;
+			}
+		}
+	}
+}
+EXPORT_SYMBOL(lov_fix_ea_for_replay);
diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c
index ea78415..2d623ff 100644
--- a/fs/lustre/mdc/mdc_locks.c
+++ b/fs/lustre/mdc/mdc_locks.c
@@ -205,7 +205,8 @@ static inline void mdc_clear_replay_flag(struct ptlrpc_request *req, int rc)
 	}
 }
 
-/* Save a large LOV EA into the request buffer so that it is available
+/**
+ * Save a large LOV EA into the request buffer so that it is available
  * for replay. We don't do this in the initial request because the
  * original request doesn't need this buffer (at most it sends just the
  * lov_mds_md) and it is a waste of RAM/bandwidth to send the empty
@@ -217,16 +218,14 @@ static inline void mdc_clear_replay_flag(struct ptlrpc_request *req, int rc)
  * but this is incredibly unlikely, and questionable whether the client
  * could do MDS recovery under OOM anyways...
  */
-static int mdc_save_lovea(struct ptlrpc_request *req,
-			  const struct req_msg_field *field, void *data,
-			  u32 size)
+static int mdc_save_lovea(struct ptlrpc_request *req, void *data, u32 size)
 {
 	struct req_capsule *pill = &req->rq_pill;
-	struct lov_user_md *lmm;
+	void *lovea;
 	int rc = 0;
 
-	if (req_capsule_get_size(pill, field, RCL_CLIENT) < size) {
-		rc = sptlrpc_cli_enlarge_reqbuf(req, field, size);
+	if (req_capsule_get_size(pill, &RMF_EADATA, RCL_CLIENT) < size) {
+		rc = sptlrpc_cli_enlarge_reqbuf(req, &RMF_EADATA, size);
 		if (rc) {
 			CERROR("%s: Can't enlarge ea size to %d: rc = %d\n",
 			       req->rq_export->exp_obd->obd_name,
@@ -234,16 +233,14 @@ static int mdc_save_lovea(struct ptlrpc_request *req,
 			return rc;
 		}
 	} else {
-		req_capsule_shrink(pill, field, size, RCL_CLIENT);
+		req_capsule_shrink(pill, &RMF_EADATA, size, RCL_CLIENT);
 	}
 
-	req_capsule_set_size(pill, field, RCL_CLIENT, size);
-	lmm = req_capsule_client_get(pill, field);
-	if (lmm) {
-		memcpy(lmm, data, size);
-		/* overwrite layout generation returned from the MDS */
-		lmm->lmm_stripe_offset =
-		  (typeof(lmm->lmm_stripe_offset))LOV_OFFSET_DEFAULT;
+	req_capsule_set_size(pill, &RMF_EADATA, RCL_CLIENT, size);
+	lovea = req_capsule_client_get(pill, &RMF_EADATA);
+	if (lovea) {
+		memcpy(lovea, data, size);
+		lov_fix_ea_for_replay(lovea);
 	}
 
 	return rc;
@@ -788,7 +785,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 			 * (for example error one).
 			 */
 			if ((it->it_op & IT_OPEN) && req->rq_replay) {
-				rc = mdc_save_lovea(req, &RMF_EADATA, eadata,
+				rc = mdc_save_lovea(req, eadata,
 						    body->mbo_eadatasize);
 				if (rc) {
 					body->mbo_valid &= ~OBD_MD_FLEASIZE;
@@ -817,8 +814,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 			 * another set of OST objects).
 			 */
 			if (req->rq_transno)
-				(void)mdc_save_lovea(req, &RMF_EADATA, lvb_data,
-						     lvb_len);
+				(void)mdc_save_lovea(req, lvb_data, lvb_len);
 		}
 	}
 
diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index 40670cb..a146af8 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -646,6 +646,7 @@ void mdc_replay_open(struct ptlrpc_request *req)
 	struct obd_client_handle *och;
 	struct lustre_handle old_open_handle = { };
 	struct mdt_body *body;
+	struct ldlm_reply *rep;
 
 	if (!mod) {
 		DEBUG_REQ(D_ERROR, req,
@@ -655,6 +656,11 @@ void mdc_replay_open(struct ptlrpc_request *req)
 
 	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 
+	rep = req_capsule_server_get(&req->rq_pill, &RMF_DLM_REP);
+	if (rep && rep->lock_policy_res2 != 0)
+		DEBUG_REQ(D_ERROR, req, "Open request replay failed with %ld ",
+			  (long)rep->lock_policy_res2);
+
 	spin_lock(&req->rq_lock);
 	och = mod->mod_och;
 	if (och && och->och_open_handle.cookie)
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 05/42] lustre: llite: add test to check client deadlock selinux
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (3 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 04/42] lustre: mdc: fix lovea for replay James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 06/42] lnet: use init_wait(), not init_waitqueue_entry() James Simmons
                   ` (36 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Alexander Boyko <alexander.boyko@hpe.com>

Add kernel fault injection to check for a client deadlock
and the resulting MDS eviction.

Cray-bug-id: LUS-8924
WC-bug-id: https://jira.whamcloud.com/browse/LU-13617
Lustre-commit: f519f22c8ba3a ("LU-13617 tests: check client deadlock selinux")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/38793
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_support.h | 1 +
 fs/lustre/llite/namei.c         | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index 35c7ef3..c678c8b 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -475,6 +475,7 @@
 #define OBD_FAIL_LLITE_PCC_MKWRITE_PAUSE		0x1413
 #define OBD_FAIL_LLITE_PCC_ATTACH_PAUSE			0x1414
 #define OBD_FAIL_LLITE_SHORT_COMMIT			0x1415
+#define OBD_FAIL_LLITE_CREATE_FILE_PAUSE2		0x1416
 
 #define OBD_FAIL_FID_INDIR				0x1501
 #define OBD_FAIL_FID_INLMA				0x1502
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 014a470..ce6cd19 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -1113,6 +1113,8 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 		rc = 0;
 	}
 
+	OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_CREATE_FILE_PAUSE2, cfs_fail_val);
+
 	/* Dentry added to dcache tree in ll_lookup_it */
 	de = ll_lookup_it(dir, dentry, it, &secctx, &secctxlen, &pca, encrypt,
 			  &encctx, &encctxlen);
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 06/42] lnet: use init_wait(), not init_waitqueue_entry()
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (4 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 05/42] lustre: llite: add test to check client deadlock selinux James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 07/42] lustre: lov: make various lov_object.c function static James Simmons
                   ` (35 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Mr NeilBrown <neilb@suse.de>

    init_waitqueue_entry(foo, current)

is equivalent to

    init_wait(foo)

So use the shorter version - in lustre and libcfs.

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: f6aa7a46d36b3 ("LU-6142 lustre: use init_wait(), not init_waitqueue_entry()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39300
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/libcfs/tracefile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c
index f896321..14fcb2a 100644
--- a/net/lnet/libcfs/tracefile.c
+++ b/net/lnet/libcfs/tracefile.c
@@ -1167,7 +1167,7 @@ static int tracefiled(void *arg)
 				break;
 			}
 		}
-		init_waitqueue_entry(&__wait, current);
+		init_wait(&__wait);
 		add_wait_queue(&tctl->tctl_waitq, &__wait);
 		schedule_timeout_interruptible(HZ);
 		remove_wait_queue(&tctl->tctl_waitq, &__wait);
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 07/42] lustre: lov: make various lov_object.c function static.
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (5 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 06/42] lnet: use init_wait(), not init_waitqueue_entry() James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 08/42] lustre: llite: return -ENODATA if no default layout James Simmons
                   ` (34 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Mr NeilBrown <neilb@suse.de>

These functions in lov_object.c and lovsub_object.c are only
used in the files in which they are defined,
so mark them as static.

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: 52f04bb11195d ("LU-6142 lov: make various lov_object.c function static.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39385
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_cl_internal.h | 12 ------------
 fs/lustre/lov/lov_object.c      | 19 ++++++++++---------
 fs/lustre/lov/lovsub_object.c   |  4 ++--
 3 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/fs/lustre/lov/lov_cl_internal.h b/fs/lustre/lov/lov_cl_internal.h
index 6796d88..7128224 100644
--- a/fs/lustre/lov/lov_cl_internal.h
+++ b/fs/lustre/lov/lov_cl_internal.h
@@ -599,15 +599,6 @@ struct lov_session {
 
 extern struct kmem_cache *lovsub_object_kmem;
 
-int lov_object_init(const struct lu_env *env, struct lu_object *obj,
-		    const struct lu_object_conf *conf);
-int lovsub_object_init(const struct lu_env *env, struct lu_object *obj,
-		       const struct lu_object_conf *conf);
-int lov_lock_init(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_lock *lock, const struct cl_io *io);
-int lov_io_init(const struct lu_env *env, struct cl_object *obj,
-		struct cl_io *io);
-
 int lov_lock_init_composite(const struct lu_env *env, struct cl_object *obj,
 			    struct cl_lock *lock, const struct cl_io *io);
 int lov_lock_init_empty(const struct lu_env *env, struct cl_object *obj,
@@ -622,8 +613,6 @@ int lov_io_init_released(const struct lu_env *env, struct cl_object *obj,
 struct lov_io_sub *lov_sub_get(const struct lu_env *env, struct lov_io *lio,
 			       int stripe);
 
-int lov_page_init(const struct lu_env *env, struct cl_object *ob,
-		  struct cl_page *page, pgoff_t index);
 int lov_page_init_empty(const struct lu_env *env, struct cl_object *obj,
 			struct cl_page *page, pgoff_t index);
 int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj,
@@ -638,7 +627,6 @@ struct lu_object *lovsub_object_alloc(const struct lu_env *env,
 				      const struct lu_object_header *hdr,
 				      struct lu_device *dev);
 
-struct lov_stripe_md *lov_lsm_addref(struct lov_object *lov);
 bool lov_page_is_empty(const struct cl_page *page);
 int lov_lsm_entry(const struct lov_stripe_md *lsm, u64 offset);
 int lov_io_layout_at(struct lov_io *lio, u64 offset);
diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c
index 6f8f066..7285276 100644
--- a/fs/lustre/lov/lov_object.c
+++ b/fs/lustre/lov/lov_object.c
@@ -80,6 +80,7 @@ struct lov_layout_operations {
 };
 
 static int lov_layout_wait(const struct lu_env *env, struct lov_object *lov);
+static struct lov_stripe_md *lov_lsm_addref(struct lov_object *lov);
 
 static void lov_lsm_put(struct lov_stripe_md *lsm)
 {
@@ -1277,8 +1278,8 @@ static int lov_layout_change(const struct lu_env *unused,
  * Lov object operations.
  *
  */
-int lov_object_init(const struct lu_env *env, struct lu_object *obj,
-		    const struct lu_object_conf *conf)
+static int lov_object_init(const struct lu_env *env, struct lu_object *obj,
+			   const struct lu_object_conf *conf)
 {
 	struct lov_object *lov = lu2lov(obj);
 	struct lov_device *dev = lov_object_dev(lov);
@@ -1402,8 +1403,8 @@ static int lov_object_print(const struct lu_env *env, void *cookie,
 	return LOV_2DISPATCH_NOLOCK(lu2lov(o), llo_print, env, cookie, p, o);
 }
 
-int lov_page_init(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_page *page, pgoff_t index)
+static int lov_page_init(const struct lu_env *env, struct cl_object *obj,
+			 struct cl_page *page, pgoff_t index)
 {
 	return LOV_2DISPATCH_NOLOCK(cl2lov(obj), llo_page_init, env, obj, page,
 				    index);
@@ -1413,8 +1414,8 @@ int lov_page_init(const struct lu_env *env, struct cl_object *obj,
  * Implements cl_object_operations::clo_io_init() method for lov
  * layer. Dispatches to the appropriate layout io initialization method.
  */
-int lov_io_init(const struct lu_env *env, struct cl_object *obj,
-		struct cl_io *io)
+static int lov_io_init(const struct lu_env *env, struct cl_object *obj,
+		       struct cl_io *io)
 {
 	CL_IO_SLICE_CLEAN(lov_env_io(env), lis_preserved);
 
@@ -1455,8 +1456,8 @@ static int lov_attr_update(const struct lu_env *env, struct cl_object *obj,
 	return 0;
 }
 
-int lov_lock_init(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_lock *lock, const struct cl_io *io)
+static int lov_lock_init(const struct lu_env *env, struct cl_object *obj,
+			 struct cl_lock *lock, const struct cl_io *io)
 {
 	/* No need to lock because we've taken one refcount of layout.  */
 	return LOV_2DISPATCH_NOLOCK(cl2lov(obj), llo_lock_init, env, obj, lock,
@@ -2157,7 +2158,7 @@ struct lu_object *lov_object_alloc(const struct lu_env *env,
 	return obj;
 }
 
-struct lov_stripe_md *lov_lsm_addref(struct lov_object *lov)
+static struct lov_stripe_md *lov_lsm_addref(struct lov_object *lov)
 {
 	struct lov_stripe_md *lsm = NULL;
 
diff --git a/fs/lustre/lov/lovsub_object.c b/fs/lustre/lov/lovsub_object.c
index 046f5e8..eef1713 100644
--- a/fs/lustre/lov/lovsub_object.c
+++ b/fs/lustre/lov/lovsub_object.c
@@ -49,8 +49,8 @@
  *
  */
 
-int lovsub_object_init(const struct lu_env *env, struct lu_object *obj,
-		       const struct lu_object_conf *conf)
+static int lovsub_object_init(const struct lu_env *env, struct lu_object *obj,
+			      const struct lu_object_conf *conf)
 {
 	struct lovsub_device *dev = lu2lovsub_dev(obj->lo_dev);
 	struct lu_object *below;
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 08/42] lustre: llite: return -ENODATA if no default layout
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (6 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 07/42] lustre: lov: make various lov_object.c function static James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 09/42] lnet: libcfs: don't save journal_info in dumplog thread James Simmons
                   ` (33 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Andreas Dilger <adilger@whamcloud.com>

Don't return -ENOENT if fetching the default layout from the root
directory fails.  Otherwise, "lfs find" will print an error message
for every directory scanned in the filesystem:

     lfs find: /myth/tmp does not exist: No such file or directory
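The fix follows a common fallback pattern: try the secondary source, but only let its result overwrite the first error code when it succeeds. A minimal sketch of that pattern (the helper names here are hypothetical, not the Lustre API):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical primary/fallback lookups, for illustration only. */
static int lookup_primary(int *out)
{
	(void)out;
	return -ENODATA;	/* no layout stored here */
}

static int lookup_fallback(int *out)
{
	*out = 42;		/* pretend the root default was found */
	return 0;
}

static int get_layout(int *out)
{
	int rc = lookup_primary(out);

	if (rc == -ENODATA) {
		/* Keep the original rc unless the fallback succeeds,
		 * so callers see -ENODATA rather than whatever
		 * (possibly misleading) error the fallback returned. */
		int rc2 = lookup_fallback(out);

		if (rc2 == 0)
			rc = 0;
	}
	return rc;
}
```

The buggy version assigned the fallback's return value to `rc` unconditionally, which is how an -ENOENT from the root-layout fetch leaked out to "lfs find".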

Fixes: 96377f5b2ad6 ("lustre: llite: fetch default layout for a directory")
WC-bug-id: https://jira.whamcloud.com/browse/LU-13687
Lustre-commit: 7fb17eb7b7e60 ("LU-13687 llite: return -ENODATA if no default layout")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39200
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index c2a75ce..262aea0 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -710,10 +710,13 @@ int ll_dir_getstripe_default(struct inode *inode, void **plmm, int *plmm_size,
 	rc = ll_dir_get_default_layout(inode, (void **)&lmm, &lmm_size,
 				       &req, valid, 0);
 	if (rc == -ENODATA && !fid_is_root(ll_inode2fid(inode)) &&
-	    !(valid & (OBD_MD_MEA|OBD_MD_DEFAULT_MEA)) && root_request)
-		rc = ll_dir_get_default_layout(inode, (void **)&lmm, &lmm_size,
-					       &root_req, valid,
-					       GET_DEFAULT_LAYOUT_ROOT);
+	    !(valid & (OBD_MD_MEA|OBD_MD_DEFAULT_MEA)) && root_request) {
+		int rc2 = ll_dir_get_default_layout(inode, (void **)&lmm,
+						    &lmm_size, &root_req, valid,
+						    GET_DEFAULT_LAYOUT_ROOT);
+		if (rc2 == 0)
+			rc = 0;
+	}
 
 	*plmm = lmm;
 	*plmm_size = lmm_size;
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 09/42] lnet: libcfs: don't save journal_info in dumplog thread.
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (7 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 08/42] lustre: llite: return -ENODATA if no default layout James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 10/42] lustre: ldlm: lru code cleanup James Simmons
                   ` (32 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Mr NeilBrown <neilb@suse.de>

As this thread is started by kthread, it must have
a clean environment and cannot possibly be in a
filesystem transaction.  So current->journal_info
must be NULL, and preserving it serves no purpose.

Also change libcfs_debug_dumplog_internal() to 'static'
to make it clear that it shouldn't be called from
anywhere but this thread.

WC-bug-id: https://jira.whamcloud.com/browse/LU-9859
Lustre-commit: 2eba3f9c3de50 ("LU-9859 libcfs: don't save journal_info in dumplog thread.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39294
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/libcfs/debug.c     | 7 +------
 net/lnet/libcfs/tracefile.h | 1 -
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/lnet/libcfs/debug.c b/net/lnet/libcfs/debug.c
index dd03520..ba32a99 100644
--- a/net/lnet/libcfs/debug.c
+++ b/net/lnet/libcfs/debug.c
@@ -372,14 +372,11 @@ static void libcfs_run_debug_log_upcall(char *file)
 /**
  * Dump Lustre log to ::debug_file_path by calling tracefile_dump_all_pages()
  */
-void libcfs_debug_dumplog_internal(void *arg)
+static void libcfs_debug_dumplog_internal(void *arg)
 {
 	static time64_t last_dump_time;
 	time64_t current_time;
-	void *journal_info;
 
-	journal_info = current->journal_info;
-	current->journal_info = NULL;
 	current_time = ktime_get_real_seconds();
 
 	if (strncmp(libcfs_debug_file_path_arr, "NONE", 4) &&
@@ -392,8 +389,6 @@ void libcfs_debug_dumplog_internal(void *arg)
 		cfs_tracefile_dump_all_pages(debug_file_name);
 		libcfs_run_debug_log_upcall(debug_file_name);
 	}
-
-	current->journal_info = journal_info;
 }
 
 static int libcfs_debug_dumplog_thread(void *arg)
diff --git a/net/lnet/libcfs/tracefile.h b/net/lnet/libcfs/tracefile.h
index 88ff0d1..5b90c1b 100644
--- a/net/lnet/libcfs/tracefile.h
+++ b/net/lnet/libcfs/tracefile.h
@@ -69,7 +69,6 @@ int cfs_trace_copyout_string(char __user *usr_buffer, int usr_buffer_nob,
 int cfs_trace_set_debug_mb(int mb);
 int cfs_trace_get_debug_mb(void);
 
-void libcfs_debug_dumplog_internal(void *arg);
 extern int libcfs_panic_in_progress;
 
 #define TCD_MAX_PAGES (5 << (20 - PAGE_SHIFT))
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 10/42] lustre: ldlm: lru code cleanup
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (8 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 09/42] lnet: libcfs: don't save journal_info in dumplog thread James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 11/42] lustre: ldlm: cancel LRU improvement James Simmons
                   ` (31 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Vitaly Fertman <c17818@cray.com>

cleanup includes:
- no need for the unused locks parameter in the lru policy; better to
  take the current value right in the policy if needed;
- no need for a special SHRINKER policy, it is the same as the PASSED
  one;
- no need for a special DEFAULT policy, it is the same as the PASSED
  one;
- no need for a special PASSED policy, the LRU is to be cleaned anyway
  according to the LRU resize or AGED policy;

bug fixes:
- if the @min amount is given, it should not be increased by the
  number of locks exceeding the limit; the max of the two is to be
  taken instead;
- do not do ELC on enqueue if no LRU limits are reached;
- do not keep a lock in the LRUR policy once we have cancelled @min
  locks; instead, keep cancelling until we reach the @max limit, if
  given;
- cancel locks from the LRU with the new policy if it is changed in
  sysfs;
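The @min/@added ordering in the reworked LRUR policy can be sketched as follows (a simplified, hypothetical model of ldlm_cancel_lrur_policy(); the real policy additionally weighs the pool SLV against the lock's idle-time value, which is omitted here):

```c
#include <assert.h>

enum policy_res { POLICY_KEEP_LOCK, POLICY_CANCEL_LOCK };

/* Simplified decision: always cancel until @min locks have been
 * cancelled in this scan (@added counts them); past @min, cancel
 * only locks idle longer than @max_age_sec.  The SLV-based
 * comparison of the real policy is left out. */
static enum policy_res lru_policy(int added, int min,
				  long idle_sec, long max_age_sec)
{
	if (added < min)
		return POLICY_CANCEL_LOCK;

	if (idle_sec > max_age_sec)
		return POLICY_CANCEL_LOCK;

	return POLICY_KEEP_LOCK;
}
```

This is the shape of the fix: @min is a floor on how many locks get cancelled, not a cap on scanning, so the old early "keep" exit for `added >= count` is gone.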

HPE-bug-id: LUS-8678
WC-bug-id: https://jira.whamcloud.com/browse/LU-11518
Lustre-commit: 209a112eb153b ("LU-11518 ldlm: lru code cleanup")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://es-gerrit.dev.cray.com/157066
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-on: https://review.whamcloud.com/39560
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_internal.h |  12 +--
 fs/lustre/ldlm/ldlm_pool.c     |  14 +--
 fs/lustre/ldlm/ldlm_request.c  | 205 +++++++++++++----------------------------
 fs/lustre/ldlm/ldlm_resource.c |  19 +---
 4 files changed, 76 insertions(+), 174 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h
index 0dce219..788983f 100644
--- a/fs/lustre/ldlm/ldlm_internal.h
+++ b/fs/lustre/ldlm/ldlm_internal.h
@@ -88,14 +88,10 @@ void ldlm_namespace_move_to_inactive_locked(struct ldlm_namespace *ns,
 /* ldlm_request.c */
 /* Cancel lru flag, it indicates we cancel aged locks. */
 enum ldlm_lru_flags {
-	LDLM_LRU_FLAG_AGED	= BIT(0), /* Cancel old non-LRU resize locks */
-	LDLM_LRU_FLAG_PASSED	= BIT(1), /* Cancel passed number of locks. */
-	LDLM_LRU_FLAG_SHRINK	= BIT(2), /* Cancel locks from shrinker. */
-	LDLM_LRU_FLAG_LRUR	= BIT(3), /* Cancel locks from lru resize. */
-	LDLM_LRU_FLAG_NO_WAIT	= BIT(4), /* Cancel locks w/o blocking (neither
+	LDLM_LRU_FLAG_NO_WAIT	= BIT(1), /* Cancel locks w/o blocking (neither
 					   * sending nor waiting for any rpcs)
 					   */
-	LDLM_LRU_FLAG_CLEANUP	= BIT(5), /* Used when clearing lru, tells
+	LDLM_LRU_FLAG_CLEANUP	= BIT(2), /* Used when clearing lru, tells
 					   * prepare_lru_list to set discard
 					   * flag on PR extent locks so we
 					   * don't waste time saving pages
@@ -103,11 +99,11 @@ enum ldlm_lru_flags {
 					   */
 };
 
-int ldlm_cancel_lru(struct ldlm_namespace *ns, int nr,
+int ldlm_cancel_lru(struct ldlm_namespace *ns, int min,
 		    enum ldlm_cancel_flags cancel_flags,
 		    enum ldlm_lru_flags lru_flags);
 int ldlm_cancel_lru_local(struct ldlm_namespace *ns,
-			  struct list_head *cancels, int count, int max,
+			  struct list_head *cancels, int min, int max,
 			  enum ldlm_cancel_flags cancel_flags,
 			  enum ldlm_lru_flags lru_flags);
 extern unsigned int ldlm_enqueue_min;
diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c
index 40585f7..9e2a006 100644
--- a/fs/lustre/ldlm/ldlm_pool.c
+++ b/fs/lustre/ldlm/ldlm_pool.c
@@ -255,7 +255,6 @@ static void ldlm_cli_pool_pop_slv(struct ldlm_pool *pl)
 static int ldlm_cli_pool_recalc(struct ldlm_pool *pl)
 {
 	time64_t recalc_interval_sec;
-	enum ldlm_lru_flags lru_flags;
 	int ret;
 
 	recalc_interval_sec = ktime_get_real_seconds() - pl->pl_recalc_time;
@@ -280,22 +279,13 @@ static int ldlm_cli_pool_recalc(struct ldlm_pool *pl)
 	spin_unlock(&pl->pl_lock);
 
 	/*
-	 * Cancel aged locks if lru resize is disabled for this ns.
-	 */
-	if (!ns_connect_lru_resize(container_of(pl, struct ldlm_namespace,
-						ns_pool)))
-		lru_flags = LDLM_LRU_FLAG_LRUR;
-	else
-		lru_flags = LDLM_LRU_FLAG_AGED;
-
-	/*
 	 * In the time of canceling locks on client we do not need to maintain
 	 * sharp timing, we only want to cancel locks asap according to new SLV.
 	 * It may be called when SLV has changed much, this is why we do not
 	 * take into account pl->pl_recalc_time here.
 	 */
 	ret = ldlm_cancel_lru(container_of(pl, struct ldlm_namespace, ns_pool),
-			      0, LCF_ASYNC, lru_flags);
+			      0, LCF_ASYNC, 0);
 
 	spin_lock(&pl->pl_lock);
 	/*
@@ -340,7 +330,7 @@ static int ldlm_cli_pool_shrink(struct ldlm_pool *pl,
 	if (nr == 0)
 		return (unused / 100) * sysctl_vfs_cache_pressure;
 	else
-		return ldlm_cancel_lru(ns, nr, LCF_ASYNC, LDLM_LRU_FLAG_SHRINK);
+		return ldlm_cancel_lru(ns, nr, LCF_ASYNC, 0);
 }
 
 static const struct ldlm_pool_ops ldlm_cli_pool_ops = {
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index 6318137..4bd7372 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -606,8 +606,7 @@ int ldlm_prep_elc_req(struct obd_export *exp, struct ptlrpc_request *req,
 	struct ldlm_namespace *ns = exp->exp_obd->obd_namespace;
 	struct req_capsule *pill = &req->rq_pill;
 	struct ldlm_request *dlm = NULL;
-	enum ldlm_lru_flags lru_flags;
-	int avail, to_free, pack = 0;
+	int avail, to_free = 0, pack = 0;
 	LIST_HEAD(head);
 	int rc;
 
@@ -618,11 +617,10 @@ int ldlm_prep_elc_req(struct obd_export *exp, struct ptlrpc_request *req,
 		req_capsule_filled_sizes(pill, RCL_CLIENT);
 		avail = ldlm_capsule_handles_avail(pill, RCL_CLIENT, canceloff);
 
-		lru_flags = LDLM_LRU_FLAG_NO_WAIT |
-			    (ns_connect_lru_resize(ns) ?
-			     LDLM_LRU_FLAG_LRUR : LDLM_LRU_FLAG_AGED);
-		to_free = !ns_connect_lru_resize(ns) &&
-			  opc == LDLM_ENQUEUE ? 1 : 0;
+		/* If we have reached the limit, free +1 slot for the new one */
+		if (!ns_connect_lru_resize(ns) && opc == LDLM_ENQUEUE &&
+		    ns->ns_nr_unused >= ns->ns_max_unused)
+			to_free = 1;
 
 		/*
 		 * Cancel LRU locks here _only_ if the server supports
@@ -632,7 +630,7 @@ int ldlm_prep_elc_req(struct obd_export *exp, struct ptlrpc_request *req,
 		if (avail > count)
 			count += ldlm_cancel_lru_local(ns, cancels, to_free,
 						       avail - count, 0,
-						       lru_flags);
+						       LDLM_LRU_FLAG_NO_WAIT);
 		if (avail > count)
 			pack = count;
 		else
@@ -1216,7 +1214,6 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh,
 		    enum ldlm_cancel_flags cancel_flags)
 {
 	struct obd_export *exp;
-	enum ldlm_lru_flags lru_flags;
 	int avail, count = 1;
 	u64 rc = 0;
 	struct ldlm_namespace *ns;
@@ -1271,10 +1268,8 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh,
 		LASSERT(avail > 0);
 
 		ns = ldlm_lock_to_ns(lock);
-		lru_flags = ns_connect_lru_resize(ns) ?
-			    LDLM_LRU_FLAG_LRUR : LDLM_LRU_FLAG_AGED;
 		count += ldlm_cancel_lru_local(ns, &cancels, 0, avail - 1,
-					       LCF_BL_AST, lru_flags);
+					       LCF_BL_AST, 0);
 	}
 	ldlm_cli_cancel_list(&cancels, count, NULL, cancel_flags);
 	return 0;
@@ -1338,12 +1333,12 @@ int ldlm_cli_cancel_list_local(struct list_head *cancels, int count,
  */
 static enum ldlm_policy_res
 ldlm_cancel_no_wait_policy(struct ldlm_namespace *ns, struct ldlm_lock *lock,
-			   int unused, int added, int count)
+			   int added, int min)
 {
 	enum ldlm_policy_res result = LDLM_POLICY_CANCEL_LOCK;
 
 	/*
-	 * don't check added & count since we want to process all locks
+	 * don't check @added & @min since we want to process all locks
 	 * from unused list.
 	 * It's fine to not take lock to access lock->l_resource since
 	 * the lock has already been granted so it won't change.
@@ -1364,42 +1359,36 @@ int ldlm_cli_cancel_list_local(struct list_head *cancels, int count,
 
 /**
  * Callback function for LRU-resize policy. Decides whether to keep
- * @lock in LRU for current @LRU size @unused, added in current
- * scan @added and number of locks to be preferably canceled @count.
+ * @lock in LRU for @added in current scan and @min number of locks
+ * to be preferably canceled.
  *
  * Retun:	LDLM_POLICY_KEEP_LOCK keep lock in LRU in stop scanning
  *		LDLM_POLICY_CANCEL_LOCK cancel lock from LRU
  */
 static enum ldlm_policy_res ldlm_cancel_lrur_policy(struct ldlm_namespace *ns,
 						    struct ldlm_lock *lock,
-						    int unused, int added,
-						    int count)
+						    int added, int min)
 {
 	ktime_t cur = ktime_get();
 	struct ldlm_pool *pl = &ns->ns_pool;
 	u64 slv, lvf, lv;
 	s64 la;
 
-	/*
-	 * Stop LRU processing when we reach past @count or have checked all
-	 * locks in LRU.
-	 */
-	if (count && added >= count)
-		return LDLM_POLICY_KEEP_LOCK;
+	if (added < min)
+		return LDLM_POLICY_CANCEL_LOCK;
 
 	/*
 	 * Despite of the LV, It doesn't make sense to keep the lock which
 	 * is unused for ns_max_age time.
 	 */
-	if (ktime_after(ktime_get(),
-			ktime_add(lock->l_last_used, ns->ns_max_age)))
+	if (ktime_after(cur, ktime_add(lock->l_last_used, ns->ns_max_age)))
 		return LDLM_POLICY_CANCEL_LOCK;
 
 	slv = ldlm_pool_get_slv(pl);
 	lvf = ldlm_pool_get_lvf(pl);
 	la = div_u64(ktime_to_ns(ktime_sub(cur, lock->l_last_used)),
 		     NSEC_PER_SEC);
-	lv = lvf * la * unused;
+	lv = lvf * la * ns->ns_nr_unused;
 
 	/* Inform pool about current CLV to see it via debugfs. */
 	ldlm_pool_set_clv(pl, lv);
@@ -1414,41 +1403,33 @@ static enum ldlm_policy_res ldlm_cancel_lrur_policy(struct ldlm_namespace *ns,
 	return LDLM_POLICY_CANCEL_LOCK;
 }
 
-/**
- * Callback function for debugfs used policy. Makes decision whether to keep
- * @lock in LRU for current @LRU size @unused, added in current scan
- * @added and number of locks to be preferably canceled @count.
- *
- * Return:	LDLM_POLICY_KEEP_LOCK keep lock in LRU in stop scanning
- *		LDLM_POLICY_CANCEL_LOCK cancel lock from LRU
- */
-static enum ldlm_policy_res ldlm_cancel_passed_policy(struct ldlm_namespace *ns,
-						      struct ldlm_lock *lock,
-						      int unused, int added,
-						      int count)
+static enum ldlm_policy_res
+ldlm_cancel_lrur_no_wait_policy(struct ldlm_namespace *ns,
+				struct ldlm_lock *lock,
+				int added, int min)
 {
-	/*
-	 * Stop LRU processing when we reach past @count or have checked all
-	 * locks in LRU.
-	 */
-	return (added >= count) ?
-		LDLM_POLICY_KEEP_LOCK : LDLM_POLICY_CANCEL_LOCK;
+	enum ldlm_policy_res result;
+
+	result = ldlm_cancel_lrur_policy(ns, lock, added, min);
+	if (result == LDLM_POLICY_KEEP_LOCK)
+		return result;
+
+	return ldlm_cancel_no_wait_policy(ns, lock, added, min);
 }
 
 /**
- * Callback function for aged policy. Makes decision whether to keep @lock in
- * LRU for current LRU size @unused, added in current scan @added and
- * number of locks to be preferably canceled @count.
+ * Callback function for aged policy. Decides whether to keep
+ * @lock in LRU for @added in current scan and @min number of locks
+ * to be preferably canceled.
  *
  * Return:	LDLM_POLICY_KEEP_LOCK keep lock in LRU in stop scanning
  *		LDLM_POLICY_CANCEL_LOCK cancel lock from LRU
  */
 static enum ldlm_policy_res ldlm_cancel_aged_policy(struct ldlm_namespace *ns,
 						    struct ldlm_lock *lock,
-						    int unused, int added,
-						    int count)
+						    int added, int min)
 {
-	if ((added >= count) &&
+	if ((added >= min) &&
 	    ktime_before(ktime_get(),
 			 ktime_add(lock->l_last_used, ns->ns_max_age)))
 		return LDLM_POLICY_KEEP_LOCK;
@@ -1457,90 +1438,41 @@ static enum ldlm_policy_res ldlm_cancel_aged_policy(struct ldlm_namespace *ns,
 }
 
 static enum ldlm_policy_res
-ldlm_cancel_lrur_no_wait_policy(struct ldlm_namespace *ns,
-				struct ldlm_lock *lock,
-				int unused, int added,
-				int count)
-{
-	enum ldlm_policy_res result;
-
-	result = ldlm_cancel_lrur_policy(ns, lock, unused, added, count);
-	if (result == LDLM_POLICY_KEEP_LOCK)
-		return result;
-
-	return ldlm_cancel_no_wait_policy(ns, lock, unused, added, count);
-}
-
-static enum ldlm_policy_res
 ldlm_cancel_aged_no_wait_policy(struct ldlm_namespace *ns,
 				struct ldlm_lock *lock,
-				int unused, int added, int count)
+				int added, int min)
 {
 	enum ldlm_policy_res result;
 
-	result = ldlm_cancel_aged_policy(ns, lock, unused, added, count);
+	result = ldlm_cancel_aged_policy(ns, lock, added, min);
 	if (result == LDLM_POLICY_KEEP_LOCK)
 		return result;
 
-	return ldlm_cancel_no_wait_policy(ns, lock, unused, added, count);
-}
-
-/**
- * Callback function for default policy. Makes decision whether to keep @lock
- * in LRU for current LRU size @unused, added in current scan @added and
- * number of locks to be preferably canceled @count.
- *
- * Return:	LDLM_POLICY_KEEP_LOCK keep lock in LRU in stop scanning
- *		LDLM_POLICY_CANCEL_LOCK cancel lock from LRU
- */
-static enum ldlm_policy_res
-ldlm_cancel_default_policy(struct ldlm_namespace *ns, struct ldlm_lock *lock,
-			   int unused, int added, int count)
-{
-	/*
-	 * Stop LRU processing when we reach past count or have checked all
-	 * locks in LRU.
-	 */
-	return (added >= count) ?
-		LDLM_POLICY_KEEP_LOCK : LDLM_POLICY_CANCEL_LOCK;
+	return ldlm_cancel_no_wait_policy(ns, lock, added, min);
 }
 
-typedef enum ldlm_policy_res (*ldlm_cancel_lru_policy_t)(struct ldlm_namespace *,
-							 struct ldlm_lock *,
-							 int, int, int);
+typedef enum ldlm_policy_res
+(*ldlm_cancel_lru_policy_t)(struct ldlm_namespace *ns, struct ldlm_lock *lock,
+			    int added, int min);
 
 static ldlm_cancel_lru_policy_t
 ldlm_cancel_lru_policy(struct ldlm_namespace *ns, enum ldlm_lru_flags lru_flags)
 {
 	if (ns_connect_lru_resize(ns)) {
-		if (lru_flags & LDLM_LRU_FLAG_SHRINK) {
-			/* We kill passed number of old locks. */
-			return ldlm_cancel_passed_policy;
-		} else if (lru_flags & LDLM_LRU_FLAG_LRUR) {
-			if (lru_flags & LDLM_LRU_FLAG_NO_WAIT)
-				return ldlm_cancel_lrur_no_wait_policy;
-			else
-				return ldlm_cancel_lrur_policy;
-		} else if (lru_flags & LDLM_LRU_FLAG_PASSED) {
-			return ldlm_cancel_passed_policy;
-		}
+		if (lru_flags & LDLM_LRU_FLAG_NO_WAIT)
+			return ldlm_cancel_lrur_no_wait_policy;
+		else
+			return ldlm_cancel_lrur_policy;
 	} else {
-		if (lru_flags & LDLM_LRU_FLAG_AGED) {
-			if (lru_flags & LDLM_LRU_FLAG_NO_WAIT)
-				return ldlm_cancel_aged_no_wait_policy;
-			else
-				return ldlm_cancel_aged_policy;
-		}
+		if (lru_flags & LDLM_LRU_FLAG_NO_WAIT)
+			return ldlm_cancel_aged_no_wait_policy;
+		else
+			return ldlm_cancel_aged_policy;
 	}
-
-	if (lru_flags & LDLM_LRU_FLAG_NO_WAIT)
-		return ldlm_cancel_no_wait_policy;
-
-	return ldlm_cancel_default_policy;
 }
 
 /**
- * - Free space in LRU for @count new locks,
+ * - Free space in LRU for @min new locks,
  *   redundant unused locks are canceled locally;
  * - also cancel locally unused aged locks;
  * - do not cancel more than @max locks;
@@ -1554,39 +1486,32 @@ typedef enum ldlm_policy_res (*ldlm_cancel_lru_policy_t)(struct ldlm_namespace *
  * attempt to cancel a lock rely on this flag, l_bl_ast list is accessed
  * later without any special locking.
  *
- * Calling policies for enabled LRU resize:
- * ----------------------------------------
- * flags & LDLM_LRU_FLAG_LRUR	- use LRU resize policy (SLV from server) to
- *				  cancel not more than @count locks;
- *
- * flags & LDLM_LRU_FLAG_PASSED - cancel @count number of old locks (located
- *				  at the beginning of LRU list);
+ * Locks are cancelled according to the LRU resize policy (SLV from server)
+ * if LRU resize is enabled; otherwise, the "aged policy" is used;
  *
- * flags & LDLM_LRU_FLAG_SHRINK - cancel not more than @count locks according
- *				  to memory pressure policy function;
- *
- * flags & LDLM_LRU_FLAG_AGED   - cancel @count locks according to
- *				  "aged policy".
+ * LRU flags:
+ * ----------------------------------------
  *
- * flags & LDLM_LRU_FLAG_NO_WAIT - cancel as many unused locks as possible
- *				   (typically before replaying locks) w/o
- *				   sending any RPCs or waiting for any
- *				   outstanding RPC to complete.
+ * flags & LDLM_LRU_FLAG_NO_WAIT - cancel locks w/o sending any RPCs or waiting
+ *				   for any outstanding RPC to complete.
  *
  * flags & LDLM_CANCEL_CLEANUP - when cancelling read locks, do not check for
  *				 other read locks covering the same pages, just
  *				 discard those pages.
  */
 static int ldlm_prepare_lru_list(struct ldlm_namespace *ns,
-				 struct list_head *cancels, int count, int max,
+				 struct list_head *cancels,
+				 int min, int max,
 				 enum ldlm_lru_flags lru_flags)
 {
 	ldlm_cancel_lru_policy_t pf;
 	int added = 0;
 	int no_wait = lru_flags & LDLM_LRU_FLAG_NO_WAIT;
 
+	LASSERT(ergo(max, min <= max));
+
 	if (!ns_connect_lru_resize(ns))
-		count += ns->ns_nr_unused - ns->ns_max_unused;
+		min = max_t(int, min, ns->ns_nr_unused - ns->ns_max_unused);
 
 	pf = ldlm_cancel_lru_policy(ns, lru_flags);
 	LASSERT(pf);
@@ -1643,7 +1568,7 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns,
 		 * their weight. Big extent locks will stay in
 		 * the cache.
 		 */
-		result = pf(ns, lock, ns->ns_nr_unused, added, count);
+		result = pf(ns, lock, added, min);
 		if (result == LDLM_POLICY_KEEP_LOCK) {
 			lu_ref_del(&lock->l_reference, __func__, current);
 			LDLM_LOCK_RELEASE(lock);
@@ -1725,28 +1650,28 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns,
 	return added;
 }
 
-int ldlm_cancel_lru_local(struct ldlm_namespace *ns,
-			  struct list_head *cancels, int count, int max,
+int ldlm_cancel_lru_local(struct ldlm_namespace *ns, struct list_head *cancels,
+			  int min, int max,
 			  enum ldlm_cancel_flags cancel_flags,
 			  enum ldlm_lru_flags lru_flags)
 {
 	int added;
 
-	added = ldlm_prepare_lru_list(ns, cancels, count, max, lru_flags);
+	added = ldlm_prepare_lru_list(ns, cancels, min, max, lru_flags);
 	if (added <= 0)
 		return added;
 	return ldlm_cli_cancel_list_local(cancels, added, cancel_flags);
 }
 
 /**
- * Cancel at least @nr locks from given namespace LRU.
 * Cancel at least @min locks from given namespace LRU.
  *
  * When called with LCF_ASYNC the blocking callback will be handled
  * in a thread and this function will return after the thread has been
  * asked to call the callback.  When called with LCF_ASYNC the blocking
  * callback will be performed in this function.
  */
-int ldlm_cancel_lru(struct ldlm_namespace *ns, int nr,
+int ldlm_cancel_lru(struct ldlm_namespace *ns, int min,
 		    enum ldlm_cancel_flags cancel_flags,
 		    enum ldlm_lru_flags lru_flags)
 {
@@ -1757,7 +1682,7 @@ int ldlm_cancel_lru(struct ldlm_namespace *ns, int nr,
 	 * Just prepare the list of locks, do not actually cancel them yet.
 	 * Locks are cancelled later in a separate thread.
 	 */
-	count = ldlm_prepare_lru_list(ns, &cancels, nr, 0, lru_flags);
+	count = ldlm_prepare_lru_list(ns, &cancels, min, 0, lru_flags);
 	rc = ldlm_bl_to_thread_list(ns, NULL, &cancels, count, cancel_flags);
 	if (rc == 0)
 		return count;
diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c
index d0a59a8..4cf4358 100644
--- a/fs/lustre/ldlm/ldlm_resource.c
+++ b/fs/lustre/ldlm/ldlm_resource.c
@@ -191,17 +191,8 @@ static ssize_t lru_size_store(struct kobject *kobj, struct attribute *attr,
 		CDEBUG(D_DLMTRACE,
 		       "dropping all unused locks from namespace %s\n",
 		       ldlm_ns_name(ns));
-		if (ns_connect_lru_resize(ns)) {
-			ldlm_cancel_lru(ns, ns->ns_nr_unused, 0,
-					LDLM_LRU_FLAG_PASSED |
-					LDLM_LRU_FLAG_CLEANUP);
-		} else {
-			tmp = ns->ns_max_unused;
-			ns->ns_max_unused = 0;
-			ldlm_cancel_lru(ns, 0, 0, LDLM_LRU_FLAG_PASSED |
-					LDLM_LRU_FLAG_CLEANUP);
-			ns->ns_max_unused = tmp;
-		}
+		/* Try to cancel all @ns_nr_unused locks. */
+		ldlm_cancel_lru(ns, INT_MAX, 0, LDLM_LRU_FLAG_CLEANUP);
 		return count;
 	}
 
@@ -224,7 +215,6 @@ static ssize_t lru_size_store(struct kobject *kobj, struct attribute *attr,
 		       "changing namespace %s unused locks from %u to %u\n",
 		       ldlm_ns_name(ns), ns->ns_nr_unused,
 		       (unsigned int)tmp);
-		ldlm_cancel_lru(ns, tmp, LCF_ASYNC, LDLM_LRU_FLAG_PASSED);
 
 		if (!lru_resize) {
 			CDEBUG(D_DLMTRACE,
@@ -232,13 +222,12 @@ static ssize_t lru_size_store(struct kobject *kobj, struct attribute *attr,
 			       ldlm_ns_name(ns));
 			ns->ns_connect_flags &= ~OBD_CONNECT_LRU_RESIZE;
 		}
+		ldlm_cancel_lru(ns, tmp, LCF_ASYNC, 0);
 	} else {
 		CDEBUG(D_DLMTRACE,
 		       "changing namespace %s max_unused from %u to %u\n",
 		       ldlm_ns_name(ns), ns->ns_max_unused,
 		       (unsigned int)tmp);
-		ns->ns_max_unused = (unsigned int)tmp;
-		ldlm_cancel_lru(ns, 0, LCF_ASYNC, LDLM_LRU_FLAG_PASSED);
 
 		/* Make sure that LRU resize was originally supported before
 		 * turning it on here.
@@ -250,6 +239,8 @@ static ssize_t lru_size_store(struct kobject *kobj, struct attribute *attr,
 			       ldlm_ns_name(ns));
 			ns->ns_connect_flags |= OBD_CONNECT_LRU_RESIZE;
 		}
+		ns->ns_max_unused = (unsigned int)tmp;
+		ldlm_cancel_lru(ns, 0, LCF_ASYNC, 0);
 	}
 
 	return count;
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 11/42] lustre: ldlm: cancel LRU improvement
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (9 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 10/42] lustre: ldlm: lru code cleanup James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 12/42] lnet: Do not set preferred NI for MR peer James Simmons
                   ` (30 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Vitaly Fertman <c17818@cray.com>

Add a @batch parameter to LRU cancel: if at least one lock is
cancelled, try to cancel at least @batch locks. This functionality
will be used in later patches.

Limit LRU cancellation to a single thread at a time, except for
callers that pass a @max limit (ELC), since blocking those could
leave the LRU not cleaned up in full.

HPE-bug-id: LUS-8678
WC-bug-id: https://jira.whamcloud.com/browse/LU-11518
Lustre-commit: 3d4b5dacb3053 ("LU-11518 ldlm: cancel LRU improvement")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://es-gerrit.dev.cray.com/157067
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/39561
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_dlm.h | 13 +++++++++++++
 fs/lustre/ldlm/ldlm_request.c  | 33 ++++++++++++++++++++++++++++++---
 fs/lustre/ldlm/ldlm_resource.c |  1 +
 3 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index 28e766b..e2a7b6b 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -333,6 +333,14 @@ enum ldlm_ns_type {
 	LDLM_NS_TYPE_MGT,
 };
 
+enum ldlm_namespace_flags {
+	/**
+	 * Flag to indicate the LRU cancel is in progress.
+	 * Used to limit the process by 1 thread only.
+	 */
+	LDLM_LRU_CANCEL = 0
+};
+
 /**
  * LDLM Namespace.
  *
@@ -476,6 +484,11 @@ struct ldlm_namespace {
 
 	struct kobject		ns_kobj; /* sysfs object */
 	struct completion	ns_kobj_unregister;
+
+	/**
+	 * To avoid another ns_lock usage, a separate bitops field.
+	 */
+	unsigned long		ns_flags;
 };
 
 /**
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index 4bd7372..901e505 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -1476,6 +1476,7 @@ static enum ldlm_policy_res ldlm_cancel_aged_policy(struct ldlm_namespace *ns,
  *   redundant unused locks are canceled locally;
  * - also cancel locally unused aged locks;
  * - do not cancel more than @max locks;
+ * - if some locks are cancelled, try to cancel at least @batch locks
  * - GET the found locks and add them into the @cancels list.
  *
  * A client lock can be added to the l_bl_ast list only when it is
@@ -1501,18 +1502,37 @@ static enum ldlm_policy_res ldlm_cancel_aged_policy(struct ldlm_namespace *ns,
  */
 static int ldlm_prepare_lru_list(struct ldlm_namespace *ns,
 				 struct list_head *cancels,
-				 int min, int max,
+				 int min, int max, int batch,
 				 enum ldlm_lru_flags lru_flags)
 {
 	ldlm_cancel_lru_policy_t pf;
 	int added = 0;
 	int no_wait = lru_flags & LDLM_LRU_FLAG_NO_WAIT;
 
+	/*
+	 * Let only 1 thread to proceed. However, not for those which have the
+	 * @max limit given (ELC), as LRU may be left not cleaned up in full.
+	 */
+	if (max == 0) {
+		if (test_and_set_bit(LDLM_LRU_CANCEL, &ns->ns_flags))
+			return 0;
+	} else if (test_bit(LDLM_LRU_CANCEL, &ns->ns_flags)) {
+		return 0;
+	}
+
 	LASSERT(ergo(max, min <= max));
+	/* No sense to give @batch for ELC */
+	LASSERT(ergo(max, batch == 0));
 
 	if (!ns_connect_lru_resize(ns))
 		min = max_t(int, min, ns->ns_nr_unused - ns->ns_max_unused);
 
+	/* If at least 1 lock is to be cancelled, cancel at least @batch
+	 * locks
+	 */
+	if (min && min < batch)
+		min = batch;
+
 	pf = ldlm_cancel_lru_policy(ns, lru_flags);
 	LASSERT(pf);
 
@@ -1646,7 +1666,14 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns,
 		unlock_res_and_lock(lock);
 		lu_ref_del(&lock->l_reference, __func__, current);
 		added++;
+		/* Once a lock added, batch the requested amount */
+		if (min == 0)
+			min = batch;
 	}
+
+	if (max == 0)
+		clear_bit(LDLM_LRU_CANCEL, &ns->ns_flags);
+
 	return added;
 }
 
@@ -1657,7 +1684,7 @@ int ldlm_cancel_lru_local(struct ldlm_namespace *ns, struct list_head *cancels,
 {
 	int added;
 
-	added = ldlm_prepare_lru_list(ns, cancels, min, max, lru_flags);
+	added = ldlm_prepare_lru_list(ns, cancels, min, max, 0, lru_flags);
 	if (added <= 0)
 		return added;
 	return ldlm_cli_cancel_list_local(cancels, added, cancel_flags);
@@ -1682,7 +1709,7 @@ int ldlm_cancel_lru(struct ldlm_namespace *ns, int min,
 	 * Just prepare the list of locks, do not actually cancel them yet.
 	 * Locks are cancelled later in a separate thread.
 	 */
-	count = ldlm_prepare_lru_list(ns, &cancels, min, 0, lru_flags);
+	count = ldlm_prepare_lru_list(ns, &cancels, min, 0, 0, lru_flags);
 	rc = ldlm_bl_to_thread_list(ns, NULL, &cancels, count, cancel_flags);
 	if (rc == 0)
 		return count;
diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c
index 4cf4358..31e7513 100644
--- a/fs/lustre/ldlm/ldlm_resource.c
+++ b/fs/lustre/ldlm/ldlm_resource.c
@@ -641,6 +641,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
 	ns->ns_dirty_age_limit = ktime_set(LDLM_DIRTY_AGE_LIMIT, 0);
 	ns->ns_stopping = 0;
 	ns->ns_last_pos = &ns->ns_unused_list;
+	ns->ns_flags = 0;
 
 	rc = ldlm_namespace_sysfs_register(ns);
 	if (rc != 0) {
-- 
1.8.3.1


* [lustre-devel] [PATCH 12/42] lnet: Do not set preferred NI for MR peer
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (10 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 11/42] lustre: ldlm: cancel LRU improvement James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 13/42] lustre: ptlrpc: prefer crc32_le() over CryptoAPI James Simmons
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Chris Horn <chris.horn@hpe.com>

The preferred NI exists to ensure that a consistent source address is
used when communicating with a non-multi-rail peer, so there is never
a need to set a preferred NI for a multi-rail (MR) peer.

HPE-bug-id: LUS-9058
WC-bug-id: https://jira.whamcloud.com/browse/LU-13736
Lustre-commit: 4596ea5c247c9 ("LU-13736 lnet: Do not set preferred NI for MR peer")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39229
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index f521817..cf14f32 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1615,7 +1615,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 lnet_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, struct lnet_ni *lni,
 			 struct lnet_msg *msg)
 {
-	if (!lnet_msg_is_response(msg) && lpni->lpni_pref_nnids == 0) {
+	if (!lnet_peer_is_multi_rail(lpni->lpni_peer_net->lpn_peer) &&
+	    !lnet_msg_is_response(msg) && lpni->lpni_pref_nnids == 0) {
 		CDEBUG(D_NET, "Setting preferred local NID %s on NMR peer %s\n",
 		       libcfs_nid2str(lni->ni_nid),
 		       libcfs_nid2str(lpni->lpni_nid));
-- 
1.8.3.1


* [lustre-devel] [PATCH 13/42] lustre: ptlrpc: prefer crc32_le() over CryptoAPI
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (11 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 12/42] lnet: Do not set preferred NI for MR peer James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 14/42] lnet: call event handlers without res_lock James Simmons
                   ` (28 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Andreas Dilger <adilger@whamcloud.com>

Prefer to call the crc32_le() library function directly if available,
instead of cfs_crypto_hash(CFS_HASH_ALG_CRC32). It is about 10x faster
for the 156-byte struct ptlrpc_body being checked in this function.
A test of small buffers in that compares the two implementations, run
on a 2.9GHz Core i7-7820 shows the difference is significant here:

  buffer size   156 bytes   1536 bytes   4096 bytes     1 MiB
  -----------+------------+------------+-----------+-----------
  cfs_crypto |  182 MiB/s | 1794 MiB/s | 4163 MB/s | 9631 MiB/s
  crc32_le   | 1947 MiB/s | 1871 MiB/s | 1867 MB/s | 1823 MiB/s

This corresponds to 10x faster or 1/10 as many cycles for ptlrpc_body.
The CryptoAPI speed crosses over around 1536 bytes, which is still 10x
larger than the ptlrpc_body size, so it is unlikely to be faster here.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13127
Lustre-commit: 1dda0ef6a70b2 ("LU-13127 ptlrpc: prefer crc32_le() over CryptoAPI")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39614
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/pack_generic.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c
index cbb65ce..fbed952 100644
--- a/fs/lustre/ptlrpc/pack_generic.c
+++ b/fs/lustre/ptlrpc/pack_generic.c
@@ -41,9 +41,7 @@
 
 #define DEBUG_SUBSYSTEM S_RPC
 
-#ifndef CONFIG_CRYPTO_CRC32
 #include <linux/crc32.h>
-#endif
 
 #include <uapi/linux/lustre/lustre_fiemap.h>
 
@@ -1240,15 +1238,16 @@ u32 lustre_msg_calc_cksum(struct lustre_msg *msg, u32 buf)
 		struct ptlrpc_body *pb = lustre_msg_buf_v2(msg, buf, 0);
 		u32 len = lustre_msg_buflen(msg, buf);
 		u32 crc;
-#ifdef CONFIG_CRYPTO_CRC32
+
+#if IS_ENABLED(CONFIG_CRC32)
+		/* about 10x faster than crypto_hash for small buffers */
+		crc = crc32_le(~(__u32)0, (unsigned char *)pb, len);
+#elif IS_ENABLED(CONFIG_CRYPTO_CRC32)
 		unsigned int hsize = 4;
 
 		cfs_crypto_hash_digest(CFS_HASH_ALG_CRC32, (unsigned char *)pb,
-				       lustre_msg_buflen(msg,
-							 MSG_PTLRPC_BODY_OFF),
-				       NULL, 0, (unsigned char *)&crc, &hsize);
-#else
-		crc = crc32_le(~(__u32)0, (unsigned char *)pb, len);
+				       len, NULL, 0, (unsigned char *)&crc,
+				       &hsize);
 #endif
 		return crc;
 	}
-- 
1.8.3.1


* [lustre-devel] [PATCH 14/42] lnet: call event handlers without res_lock
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (12 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 13/42] lustre: ptlrpc: prefer crc32_le() over CryptoAPI James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 15/42] lnet: Conditionally attach rspt in LNetPut & LNetGet James Simmons
                   ` (27 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Mr NeilBrown <neilb@suse.de>

Currently event handlers are called with the lnet_res_lock()
(a spinlock) held.  This is a problem if the handler wants
to take a mutex, allocate memory, or sleep for some other
reason.

The lock is needed because handlers for a given md need to
be serialized.  At the very least, the final event which
reports that the md is "unlinked" needs to be called last,
after any other events have been handled.

Instead of using a spinlock to ensure this ordering, we can
use a flag bit in the md.

- Before considering whether to send an event we wait for the flag bit
  to be cleared.  This ensures serialization.
- Also wait for the flag to be cleared before final freeing of the md.
- If this is not an unlink event and we need to call the handler, we
  set the flag bit before dropping lnet_res_lock().  This ensures
  that no further events will be delivered and that the md won't be
  freed, so we can still clear the flag.
- Use wait_var_event() to wait for the flag to be cleared, and
  wake_up_var() to signal a wakeup.  After wait_var_event() returns,
  we must retake the spinlock and check again.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10428
Lustre-commit: d05427a7856e8 ("LU-10428 lnet: call event handlers without res_lock")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/37068
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h  | 23 +++++++++++++++++++++++
 include/linux/lnet/lib-types.h |  9 +++++++++
 net/lnet/lnet/lib-md.c         | 26 ++++++++++++++++++++------
 net/lnet/lnet/lib-msg.c        | 36 +++++++++++++++++++++++++++---------
 4 files changed, 79 insertions(+), 15 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index d2a39f6..6a9ea10 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -188,6 +188,29 @@ static inline int lnet_md_unlinkable(struct lnet_libmd *md)
 	cfs_percpt_unlock(the_lnet.ln_res_lock, cpt);
 }
 
+static inline void lnet_md_wait_handling(struct lnet_libmd *md, int cpt)
+{
+	wait_queue_head_t *wq = __var_waitqueue(md);
+	struct wait_bit_queue_entry entry;
+	wait_queue_entry_t *wqe = &entry.wq_entry;
+
+	init_wait_var_entry(&entry, md, 0);
+	prepare_to_wait_event(wq, wqe, TASK_IDLE);
+	if (md->md_flags & LNET_MD_FLAG_HANDLING) {
+		/* Race with unlocked call to ->md_handler.
+		 * It is safe to drop the res_lock here as the
+		 * caller has only just claimed it.
+		 */
+		lnet_res_unlock(cpt);
+		schedule();
+		/* Cannot check md now, it might be freed.  Caller
+		 * must reclaim reference and check.
+		 */
+		lnet_res_lock(cpt);
+	}
+	finish_wait(wq, wqe);
+}
+
 static inline void
 lnet_md_free(struct lnet_libmd *md)
 {
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 1c016fd..aaf2a46 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -213,6 +213,15 @@ struct lnet_libmd {
 #define LNET_MD_FLAG_ZOMBIE		BIT(0)
 #define LNET_MD_FLAG_AUTO_UNLINK	BIT(1)
 #define LNET_MD_FLAG_ABORTED		BIT(2)
+/* LNET_MD_FLAG_HANDLING is set when a non-unlink event handler
+ * is being called for an event relating to the md.
+ * It ensures only one such handler runs at a time.
+ * The final "unlink" event is only called once the
+ * md_refcount has reached zero, and this flag has been cleared,
+ * ensuring that it doesn't race with any other event handler
+ * call.
+ */
+#define LNET_MD_FLAG_HANDLING		BIT(3)
 
 struct lnet_test_peer {
 	/* info about peers we are trying to fail */
diff --git a/net/lnet/lnet/lib-md.c b/net/lnet/lnet/lib-md.c
index 48249f3..e2c3e90 100644
--- a/net/lnet/lnet/lib-md.c
+++ b/net/lnet/lnet/lib-md.c
@@ -75,6 +75,7 @@
 
 	LASSERT(!list_empty(&md->md_list));
 	list_del_init(&md->md_list);
+	LASSERT(!(md->md_flags & LNET_MD_FLAG_HANDLING));
 	lnet_md_free(md);
 }
 
@@ -448,7 +449,8 @@ int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset)
 LNetMDUnlink(struct lnet_handle_md mdh)
 {
 	struct lnet_event ev;
-	struct lnet_libmd *md;
+	struct lnet_libmd *md = NULL;
+	lnet_handler_t handler = NULL;
 	int cpt;
 
 	LASSERT(the_lnet.ln_refcount > 0);
@@ -456,10 +458,18 @@ int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset)
 	cpt = lnet_cpt_of_cookie(mdh.cookie);
 	lnet_res_lock(cpt);
 
-	md = lnet_handle2md(&mdh);
-	if (!md) {
-		lnet_res_unlock(cpt);
-		return -ENOENT;
+	while (!md) {
+		md = lnet_handle2md(&mdh);
+		if (!md) {
+			lnet_res_unlock(cpt);
+			return -ENOENT;
+		}
+		if (md->md_refcount == 0 &&
+		    md->md_flags & LNET_MD_FLAG_HANDLING) {
+			/* Race with unlocked call to ->md_handler. */
+			lnet_md_wait_handling(md, cpt);
+			md = NULL;
+		}
 	}
 
 	md->md_flags |= LNET_MD_FLAG_ABORTED;
@@ -470,7 +480,7 @@ int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset)
 	 */
 	if (md->md_handler && !md->md_refcount) {
 		lnet_build_unlink_event(md, &ev);
-		md->md_handler(&ev);
+		handler = md->md_handler;
 	}
 
 	if (md->md_rspt_ptr)
@@ -479,6 +489,10 @@ int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset)
 	lnet_md_unlink(md);
 
 	lnet_res_unlock(cpt);
+
+	if (handler)
+		handler(&ev);
+
 	return 0;
 }
 EXPORT_SYMBOL(LNetMDUnlink);
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index f759b2d..e84cf02 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -938,11 +938,20 @@
 }
 
 static void
-lnet_msg_detach_md(struct lnet_msg *msg, int cpt, int status)
+lnet_msg_detach_md(struct lnet_msg *msg, int status)
 {
 	struct lnet_libmd *md = msg->msg_md;
+	lnet_handler_t handler = NULL;
+	int cpt = lnet_cpt_of_cookie(md->md_lh.lh_cookie);
 	int unlink;
 
+	lnet_res_lock(cpt);
+	while (md->md_flags & LNET_MD_FLAG_HANDLING)
+		/* An event handler is running - wait for it to
+		 * complete to avoid races.
+		 */
+		lnet_md_wait_handling(md, cpt);
+
 	/* Now it's safe to drop my caller's ref */
 	md->md_refcount--;
 	LASSERT(md->md_refcount >= 0);
@@ -956,17 +965,30 @@
 			msg->msg_ev.status = status;
 		}
 		msg->msg_ev.unlinked = unlink;
-		md->md_handler(&msg->msg_ev);
+		handler = md->md_handler;
+		if (!unlink)
+			md->md_flags |= LNET_MD_FLAG_HANDLING;
 	}
 
 	if (unlink || (md->md_refcount == 0 &&
 		       md->md_threshold == LNET_MD_THRESH_INF))
 		lnet_detach_rsp_tracker(md, cpt);
 
+	msg->msg_md = NULL;
 	if (unlink)
 		lnet_md_unlink(md);
 
-	msg->msg_md = NULL;
+	lnet_res_unlock(cpt);
+
+	if (handler) {
+		handler(&msg->msg_ev);
+		if (!unlink) {
+			lnet_res_lock(cpt);
+			md->md_flags &= ~LNET_MD_FLAG_HANDLING;
+			wake_up_var(md);
+			lnet_res_unlock(cpt);
+		}
+	}
 }
 
 static bool
@@ -1101,12 +1123,8 @@
 	/* We're not going to resend this message so detach its MD and invoke
 	 * the appropriate callbacks
 	 */
-	if (msg->msg_md) {
-		cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie);
-		lnet_res_lock(cpt);
-		lnet_msg_detach_md(msg, cpt, status);
-		lnet_res_unlock(cpt);
-	}
+	if (msg->msg_md)
+		lnet_msg_detach_md(msg, status);
 
 again:
 	if (!msg->msg_tx_committed && !msg->msg_rx_committed) {
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 15/42] lnet: Conditionally attach rspt in LNetPut & LNetGet
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (13 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 14/42] lnet: call event handlers without res_lock James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 16/42] lustre: llite: reuse same cl_dio_aio for one IO James Simmons
                   ` (26 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Chris Horn <hornc@cray.com>

Create a function to interpret the message type and md options to
determine whether response tracking should be enabled for a particular
PUT or GET.

Use that function in LNetPut and LNetGet to determine whether we
attach the response tracker.

HPE-bug-id: LUS-8827
WC-bug-id: https://jira.whamcloud.com/browse/LU-13502
Lustre-commit: 0722e7601f0ba5 ("LU-13502 lnet: Conditionally attach rspt in LNetPut & LNetGet")
Signed-off-by: Chris Horn <hornc@cray.com>
Reviewed-on: https://review.whamcloud.com/38452
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index cf14f32..aa3f3ab 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -68,6 +68,30 @@ struct lnet_send_data {
 	return msg->msg_type == LNET_MSG_ACK || msg->msg_type == LNET_MSG_REPLY;
 }
 
+static inline bool
+lnet_response_tracking_enabled(u32 msg_type, unsigned int md_options)
+{
+	if (md_options & LNET_MD_NO_TRACK_RESPONSE)
+		/* Explicitly disabled in MD options */
+		return false;
+
+	if (md_options & LNET_MD_TRACK_RESPONSE)
+		/* Explicitly enabled in MD options */
+		return true;
+
+	if (lnet_response_tracking == 3)
+		/* Enabled for all message types */
+		return true;
+
+	if (msg_type == LNET_MSG_PUT)
+		return lnet_response_tracking == 2;
+
+	if (msg_type == LNET_MSG_GET)
+		return lnet_response_tracking == 1;
+
+	return false;
+}
+
 static inline struct lnet_comm_count *
 get_stats_counts(struct lnet_element_stats *stats,
 		 enum lnet_stats_type stats_type)
@@ -4458,7 +4482,9 @@ void lnet_monitor_thr_stop(void)
 			       md->md_me->me_portal);
 		lnet_res_unlock(cpt);
 
-		lnet_rspt_free(rspt, cpt);
+		if (rspt)
+			lnet_rspt_free(rspt, cpt);
+
 		kfree(msg);
 		return -ENOENT;
 	}
@@ -4491,8 +4517,11 @@ void lnet_monitor_thr_stop(void)
 
 	lnet_build_msg_event(msg, LNET_EVENT_SEND);
 
-	if (ack == LNET_ACK_REQ)
+	if (rspt && lnet_response_tracking_enabled(LNET_MSG_PUT,
+						   md->md_options))
 		lnet_attach_rsp_tracker(rspt, cpt, md, mdh);
+	else if (rspt)
+		lnet_rspt_free(rspt, cpt);
 
 	if (CFS_FAIL_CHECK_ORSET(CFS_FAIL_PTLRPC_OST_BULK_CB2,
 				 CFS_FAIL_ONCE))
@@ -4718,7 +4747,10 @@ struct lnet_msg *
 
 	lnet_build_msg_event(msg, LNET_EVENT_SEND);
 
-	lnet_attach_rsp_tracker(rspt, cpt, md, mdh);
+	if (lnet_response_tracking_enabled(LNET_MSG_GET, md->md_options))
+		lnet_attach_rsp_tracker(rspt, cpt, md, mdh);
+	else
+		lnet_rspt_free(rspt, cpt);
 
 	rc = lnet_send(self, msg, LNET_NID_ANY);
 	if (rc < 0) {
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 16/42] lustre: llite: reuse same cl_dio_aio for one IO
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (14 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 15/42] lnet: Conditionally attach rspt in LNetPut & LNetGet James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 17/42] lustre: llite: move iov iter forward by ourself James Simmons
                   ` (25 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Wang Shilong <wshilong@ddn.com>

IO might be restarted if the layout changed, which might
cause ki_complete() to be called several times for one IO.

Fixes: fde7ac1 ("lustre: clio: AIO support for direct IO")
Fixes: 52f2fc5 ("lustre: llite: fix short io for AIO")
WC-bug-id: https://jira.whamcloud.com/browse/LU-13835
Lustre-commit: db6f203965d91 ("LU-13835 llite: reuse same cl_dio_aio for one IO")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/39542
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 57 ++++++++++++++++++++++++++++----------------------
 fs/lustre/llite/rw26.c |  5 +++++
 2 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index d872cf3..251cca5 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -1563,13 +1563,26 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot,
 	unsigned int retried = 0;
 	unsigned int ignore_lockless = 0;
 	bool is_aio = false;
+	struct cl_dio_aio *ci_aio = NULL;
 
 	CDEBUG(D_VFSTRACE, "file: %pD, type: %d ppos: %llu, count: %zu\n",
 	       file, iot, *ppos, count);
 
+	io = vvp_env_thread_io(env);
+	if (file->f_flags & O_DIRECT) {
+		if (!is_sync_kiocb(args->u.normal.via_iocb))
+			is_aio = true;
+		ci_aio = cl_aio_alloc(args->u.normal.via_iocb);
+		if (!ci_aio) {
+			rc = -ENOMEM;
+			goto out;
+		}
+	}
+
 restart:
 	io = vvp_env_thread_io(env);
 	ll_io_init(io, file, iot == CIT_WRITE, args);
+	io->ci_aio = ci_aio;
 	io->ci_ignore_lockless = ignore_lockless;
 	io->ci_ndelay_tried = retried;
 
@@ -1585,15 +1598,6 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot,
 		vio->vui_fd  = file->private_data;
 		vio->vui_iter = args->u.normal.via_iter;
 		vio->vui_iocb = args->u.normal.via_iocb;
-		if (file->f_flags & O_DIRECT) {
-			if (!is_sync_kiocb(vio->vui_iocb))
-				is_aio = true;
-			io->ci_aio = cl_aio_alloc(vio->vui_iocb);
-			if (!io->ci_aio) {
-				rc = -ENOMEM;
-				goto out;
-			}
-		}
 		/*
 		 * Direct IO reads must also take range lock,
 		 * or multiple reads will try to work on the same pages
@@ -1632,29 +1636,18 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot,
 	 * EIOCBQUEUED to the caller, So we could only return
 	 * number of bytes in non-AIO case.
 	 */
-	if (io->ci_nob > 0 && !is_aio) {
-		result += io->ci_nob;
+	if (io->ci_nob > 0) {
+		if (!is_aio) {
+			result += io->ci_nob;
+			*ppos = io->u.ci_wr.wr.crw_pos; /* for splice */
+		}
 		count -= io->ci_nob;
-		*ppos = io->u.ci_wr.wr.crw_pos;
 
 		/* prepare IO restart */
 		if (count > 0)
 			args->u.normal.via_iter = vio->vui_iter;
 	}
 out:
-	if (io->ci_aio) {
-		/**
-		 * Drop one extra reference so that end_io() could be
-		 * called for this IO context, we could call it after
-		 * we make sure all AIO requests have been proceed.
-		 */
-		cl_sync_io_note(env, &io->ci_aio->cda_sync,
-				rc == -EIOCBQUEUED ? 0 : rc);
-		if (!is_aio) {
-			cl_aio_free(io->ci_aio);
-			io->ci_aio = NULL;
-		}
-	}
 	cl_io_fini(env, io);
 
 	CDEBUG(D_VFSTRACE,
@@ -1675,6 +1668,20 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot,
 		goto restart;
 	}
 
+	if (io->ci_aio) {
+		/**
+		 * Drop one extra reference so that end_io() could be
+		 * called for this IO context, we could call it after
+		 * we make sure all AIO requests have been proceed.
+		 */
+		cl_sync_io_note(env, &io->ci_aio->cda_sync,
+				rc == -EIOCBQUEUED ? 0 : rc);
+		if (!is_aio) {
+			cl_aio_free(io->ci_aio);
+			io->ci_aio = NULL;
+		}
+	}
+
 	if (iot == CIT_READ) {
 		if (result >= 0)
 			ll_stats_ops_tally(ll_i2sbi(file_inode(file)),
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index 7010fe8..a4ae211 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -243,6 +243,11 @@ struct ll_dio_pages {
 		int iot = rw == READ ? CRT_READ : CRT_WRITE;
 
 		atomic_add(io_pages, &anchor->csi_sync_nr);
+		/*
+		 * Avoid out-of-order execution of adding inflight
+		 * modifications count and io submit.
+		 */
+		smp_mb();
 		rc = cl_io_submit_rw(env, io, iot, queue);
 		if (rc == 0) {
 			cl_page_list_splice(&queue->c2_qout,
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 17/42] lustre: llite: move iov iter forward by ourself
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (15 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 16/42] lustre: llite: reuse same cl_dio_aio for one IO James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 18/42] lustre: llite: report client stats sumsq James Simmons
                   ` (24 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Wang Shilong <wshilong@ddn.com>

Newer kernels rewind the iov iter back to its original
position for direct IO, see the following code:

	iov_iter_revert(from, write_len -
       			iov_iter_count(from));--------->here
out:
	return written;
}
EXPORT_SYMBOL(generic_file_direct_write);

This breaks Lustre's assumptions and causes problems when Lustre
needs to split one IO into several IO loops. Consider a 4 MiB block
IO on a file with a 1 MiB stripe: the first 1 MiB would be submitted
4 times, ultimately causing data corruption.

Since this generic code varies between kernel versions, it is
better to fix the problem by having Lustre move the iov iter
forward itself.

A new test case, aiocp.c, is copied from xfstests,
with code style cleanups to make checkpatch.pl happy.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13846
Lustre-commit: 689714eb511d3 ("LU-13846 llite: move iov iter forward by ourself")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/39565
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/vvp_io.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index 59da56d..3a2e1cc 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -518,6 +518,12 @@ static void vvp_io_advance(const struct lu_env *env,
 
 	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
+	/*
+	 * Since 3.16(26978b8b4) vfs revert iov iter to
+	 * original position even io succeed, so instead
+	 * of relying on VFS, we move iov iter by ourselves.
+	 */
+	iov_iter_advance(vio->vui_iter, nob);
 	vio->vui_tot_count -= nob;
 	iov_iter_reexpand(vio->vui_iter, vio->vui_tot_count);
 }
@@ -771,11 +777,12 @@ static int vvp_io_read_start(const struct lu_env *env,
 	struct inode *inode = vvp_object_inode(obj);
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct file *file = vio->vui_fd->fd_file;
-	int result;
 	loff_t pos = io->u.ci_rd.rd.crw_pos;
 	size_t cnt = io->u.ci_rd.rd.crw_count;
 	size_t tot = vio->vui_tot_count;
 	int exceed = 0;
+	int result;
+	struct iov_iter iter;
 
 	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
@@ -825,7 +832,8 @@ static int vvp_io_read_start(const struct lu_env *env,
 	/* BUG: 5972 */
 	file_accessed(file);
 	LASSERT(vio->vui_iocb->ki_pos == pos);
-	result = generic_file_read_iter(vio->vui_iocb, vio->vui_iter);
+	iter = *vio->vui_iter;
+	result = generic_file_read_iter(vio->vui_iocb, &iter);
 	goto out;
 
 out:
@@ -1184,8 +1192,7 @@ static int vvp_io_write_start(const struct lu_env *env,
 
 		if (unlikely(lock_inode))
 			inode_lock(inode);
-		result = __generic_file_write_iter(vio->vui_iocb,
-						   vio->vui_iter);
+		result = __generic_file_write_iter(vio->vui_iocb, &iter);
 		if (unlikely(lock_inode))
 			inode_unlock(inode);
 
@@ -1223,11 +1230,6 @@ static int vvp_io_write_start(const struct lu_env *env,
 		 * successfully committed.
 		 */
 		vio->vui_iocb->ki_pos = pos + io->ci_nob - nob;
-		iov_iter_advance(&iter, io->ci_nob - nob);
-		vio->vui_iter->iov = iter.iov;
-		vio->vui_iter->nr_segs = iter.nr_segs;
-		vio->vui_iter->iov_offset = iter.iov_offset;
-		vio->vui_iter->count = iter.count;
 	}
 	if (result > 0 || result == -EIOCBQUEUED) {
 		set_bit(LLIF_DATA_MODIFIED, &(ll_i2info(inode))->lli_flags);
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 18/42] lustre: llite: report client stats sumsq
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (16 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 17/42] lustre: llite: move iov iter forward by ourself James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 19/42] lnet: Support checking for MD leaks James Simmons
                   ` (23 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Wang Shilong <wshilong@ddn.com>

Commit 2eeb6dba81bc tries to account sumsq for every client operation,
but lprocfs_counter_init() did not initialize the counters properly.
Also add a test case to verify the new format of client stats.

Fixes: 2eeb6dba81bc ("lustre: obd: add new LPROCFS_TYPE_*")
WC-bug-id: https://jira.whamcloud.com/browse/LU-13733
Lustre-commit: 8a1334626ec2f ("LU-13733 llite: report client stats sumsq")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/39223
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/lproc_llite.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index a742200..54db7eb 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -1673,19 +1673,16 @@ int ll_debugfs_register_super(struct super_block *sb, const char *name)
 	/* do counter init */
 	for (id = 0; id < LPROC_LL_FILE_OPCODES; id++) {
 		u32 type = llite_opcode_table[id].type;
-		void *ptr = NULL;
+		void *ptr = "unknown";
 
 		if (type & LPROCFS_TYPE_REQS)
 			ptr = "reqs";
 		else if (type & LPROCFS_TYPE_BYTES)
 			ptr = "bytes";
-		else if (type & LPROCFS_TYPE_PAGES)
-			ptr = "pages";
 		else if (type & LPROCFS_TYPE_USEC)
 			ptr = "usec";
 		lprocfs_counter_init(sbi->ll_stats,
-				     llite_opcode_table[id].opcode,
-				     (type & LPROCFS_CNTR_AVGMINMAX),
+				     llite_opcode_table[id].opcode, type,
 				     llite_opcode_table[id].opname, ptr);
 	}
 
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 19/42] lnet: Support checking for MD leaks.
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (17 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 18/42] lustre: llite: report client stats sumsq James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:05 ` [lustre-devel] [PATCH 20/42] lnet: don't read debugfs lnet stats when shutting down James Simmons
                   ` (22 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

From: Mr NeilBrown <neilb@suse.de>

Since we dropped the refcounting on LNetEQ we no longer get
confirmation that all MDs for a given handler are gone by the
time they should be.

So add lnet_assert_handler_unused(), which searches the per-cpt
containers and ensures there are no MDs for a given handler, and call
it at the same places where we used to call LNetEQFree().

WC-bug-id: https://jira.whamcloud.com/browse/LU-13005
Lustre-commit: b7278ecc699b5 ("LU-13005 lnet: Support checking for MD leaks.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/38059
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/events.c |  1 +
 include/linux/lnet/api.h  |  2 ++
 net/lnet/lnet/api-ni.c    |  3 +++
 net/lnet/lnet/lib-md.c    | 19 +++++++++++++++++++
 net/lnet/lnet/peer.c      |  1 +
 net/lnet/selftest/rpc.c   |  1 +
 6 files changed, 27 insertions(+)

diff --git a/fs/lustre/ptlrpc/events.c b/fs/lustre/ptlrpc/events.c
index eef40b3..0943612 100644
--- a/fs/lustre/ptlrpc/events.c
+++ b/fs/lustre/ptlrpc/events.c
@@ -517,6 +517,7 @@ static void ptlrpc_ni_fini(void)
 	percpu_ref_kill(&ptlrpc_pending);
 	wait_for_completion(&ptlrpc_done);
 
+	lnet_assert_handler_unused(ptlrpc_handler);
 	LNetNIFini();
 }
 
diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h
index 95805de..064c92e 100644
--- a/include/linux/lnet/api.h
+++ b/include/linux/lnet/api.h
@@ -126,6 +126,8 @@ int LNetMDBind(const struct lnet_md *md_in,
 	       struct lnet_handle_md *md_handle_out);
 
 int LNetMDUnlink(struct lnet_handle_md md_in);
+
+void lnet_assert_handler_unused(lnet_handler_t handler);
 /** @} lnet_md */
 
 /** \defgroup lnet_data Data movement operations
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index c90ab2e..0f325ec 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -1231,6 +1231,7 @@ struct list_head **
 		the_lnet.ln_mt_zombie_rstqs = NULL;
 	}
 
+	lnet_assert_handler_unused(the_lnet.ln_mt_handler);
 	the_lnet.ln_mt_handler = NULL;
 
 	lnet_portals_destroy();
@@ -1795,6 +1796,7 @@ struct lnet_ping_buffer *
 	lnet_ping_md_unlink(the_lnet.ln_ping_target,
 			    &the_lnet.ln_ping_target_md);
 
+	lnet_assert_handler_unused(the_lnet.ln_ping_target_handler);
 	lnet_ping_target_destroy();
 }
 
@@ -1969,6 +1971,7 @@ static void lnet_push_target_fini(void)
 	the_lnet.ln_push_target_nnis = 0;
 
 	LNetClearLazyPortal(LNET_RESERVED_PORTAL);
+	lnet_assert_handler_unused(the_lnet.ln_push_target_handler);
 	the_lnet.ln_push_target_handler = NULL;
 }
 
diff --git a/net/lnet/lnet/lib-md.c b/net/lnet/lnet/lib-md.c
index e2c3e90..203c794 100644
--- a/net/lnet/lnet/lib-md.c
+++ b/net/lnet/lnet/lib-md.c
@@ -262,6 +262,25 @@ int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset)
 	list_add(&md->md_list, &container->rec_active);
 }
 
+void lnet_assert_handler_unused(lnet_handler_t handler)
+{
+	struct lnet_res_container *container;
+	int cpt;
+
+	if (!handler)
+		return;
+
+	cfs_percpt_for_each(container, cpt, the_lnet.ln_md_containers) {
+		struct lnet_libmd *md;
+
+		lnet_res_lock(cpt);
+		list_for_each_entry(md, &container->rec_active, md_list)
+			LASSERT(md->md_handler != handler);
+		lnet_res_unlock(cpt);
+	}
+}
+EXPORT_SYMBOL(lnet_assert_handler_unused);
+
 /* must be called with lnet_res_lock held */
 void
 lnet_md_deconstruct(struct lnet_libmd *lmd, struct lnet_event *ev)
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 5ca6f68..3889310 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -3473,6 +3473,7 @@ static int lnet_peer_discovery(void *arg)
 	}
 	lnet_net_unlock(LNET_LOCK_EX);
 
+	lnet_assert_handler_unused(the_lnet.ln_dc_handler);
 	the_lnet.ln_dc_handler = NULL;
 
 	the_lnet.ln_dc_state = LNET_DC_STATE_SHUTDOWN;
diff --git a/net/lnet/selftest/rpc.c b/net/lnet/selftest/rpc.c
index a72e485..d012930 100644
--- a/net/lnet/selftest/rpc.c
+++ b/net/lnet/selftest/rpc.c
@@ -1672,6 +1672,7 @@ struct srpc_client_rpc *
 		rc = LNetClearLazyPortal(SRPC_FRAMEWORK_REQUEST_PORTAL);
 		rc = LNetClearLazyPortal(SRPC_REQUEST_PORTAL);
 		LASSERT(!rc);
+		lnet_assert_handler_unused(srpc_data.rpc_lnet_handler);
 		/* fall through */
 	case SRPC_STATE_NI_INIT:
 		LNetNIFini();
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 20/42] lnet: don't read debugfs lnet stats when shutting down
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (18 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 19/42] lnet: Support checking for MD leaks James Simmons
@ 2020-10-06  0:05 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 21/42] lnet: Loosen restrictions on LNet Health params James Simmons
                   ` (21 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:05 UTC (permalink / raw)
  To: lustre-devel

A race exists on shutdown when an external application reads the
debugfs file containing LNet stats, which causes a kernel crash.

[  257.192117] BUG: unable to handle kernel paging request at fffffffffffffff0
[  257.194859] IP: [<ffffffffc0bb95c6>] cfs_percpt_number+0x6/0x10 [libcfs]
[  257.196863] PGD 7c14067 PUD 7c16067 PMD 0
[  257.198665] Oops: 0000 [#1] SMP
[  257.200431] Modules linked in: ksocklnd(OE) lnet(OE) libcfs(OE) dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) ppdev iosf_mbi crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sg e1000 video parport_pc parport i2c_piix4 dm_multipath dm_mod ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi crct10dif_pclmul crct10dif_common ata_piix serio_raw libata [last unloaded: obdclass]
[  257.222895] CPU: 0 PID: 7331 Comm: lctl Tainted: P           OE  ------------   3.10.0-957.el7_lustre.x86_64 #1
[  257.229312] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  257.233659] task: ffff9c9fbaf15140 ti: ffff9c9fbabcc000 task.ti: ffff9c9fbabcc000
[  257.238388] RIP: 0010:[<ffffffffc0bb95c6>]  [<ffffffffc0bb95c6>] cfs_percpt_number+0x6/0x10 [libcfs]
[  257.243851] RSP: 0018:ffff9c9fbabcfdb0  EFLAGS: 00010296
[  257.246400] RAX: 0000000000000000 RBX: ffff9c9fba2a5200 RCX: 0000000000000000
[  257.250304] RDX: 0000000000000001 RSI: 00000000ffffffff RDI: 0000000000000000
[  257.253677] RBP: ffff9c9fbabcfdd0 R08: 000000000001f120 R09: ffff9c9fbe001700
[  257.257073] R10: ffffffffc0c376db R11: 0000000000000246 R12: 0000000000000000
[  257.260339] R13: 0000000000000000 R14: 0000000000001000 R15: ffff9c9fba2a5200
[  257.263204] FS:  00007fbdc89c6740(0000) GS:ffff9c9fbfc00000(0000) knlGS:0000000000000000
[  257.266409] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  257.269105] CR2: fffffffffffffff0 CR3: 0000000022e36000 CR4: 00000000000606f0
[  257.272529] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  257.275209] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  257.277936] Call Trace:
[  257.279245]  [<ffffffffc0c0a88b>] ? lnet_counters_get_common+0xeb/0x150 [lnet]
[  257.283071]  [<ffffffffc0c0a95c>] lnet_counters_get+0x6c/0x150 [lnet]
[  257.286224]  [<ffffffffc0c3771b>] __proc_lnet_stats+0xfb/0x810 [lnet]
[  257.288975]  [<ffffffffc0ba6602>] lprocfs_call_handler+0x22/0x50 [libcfs]
[  257.292387]  [<ffffffffc0c36bf5>] proc_lnet_stats+0x25/0x30 [lnet]
[  257.295184]  [<ffffffffc0ba665d>] lnet_debugfs_read+0x2d/0x40 [libcfs]

The solution is to only allow reading of the lnet stats when the
lnet state is LNET_STATE_RUNNING.

WC-bug-id: https://jira.whamcloud.com/browse/LU-11986
Lustre-commit: f53eea15d470c9 ("LU-11986 lnet: don't read debugfs lnet stats when shutting down")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39404
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 include/linux/lnet/lib-lnet.h |  2 +-
 net/lnet/lnet/api-ni.c        | 40 +++++++++++++++++++++++++++++-----------
 net/lnet/lnet/router_proc.c   | 17 ++++++-----------
 3 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 6a9ea10..6253c16 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -664,7 +664,7 @@ int lnet_delay_rule_list(int pos, struct lnet_fault_attr *attr,
 /** @} lnet_fault_simulation */
 
 void lnet_counters_get_common(struct lnet_counters_common *common);
-void lnet_counters_get(struct lnet_counters *counters);
+int lnet_counters_get(struct lnet_counters *counters);
 void lnet_counters_reset(void);
 
 unsigned int lnet_iov_nob(unsigned int niov, struct kvec *iov);
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 0f325ec..af2089e 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -853,16 +853,17 @@ static void lnet_assert_wire_constants(void)
 }
 EXPORT_SYMBOL(lnet_unregister_lnd);
 
-void
-lnet_counters_get_common(struct lnet_counters_common *common)
+static void
+lnet_counters_get_common_locked(struct lnet_counters_common *common)
 {
 	struct lnet_counters *ctr;
 	int i;
 
+	/* FIXME !!! There is no assert_lnet_net_locked() to ensure this
+	 * is actually called under the protection of the lnet_net_lock.
+	 */
 	memset(common, 0, sizeof(*common));
 
-	lnet_net_lock(LNET_LOCK_EX);
-
 	cfs_percpt_for_each(ctr, i, the_lnet.ln_counters) {
 		common->lcc_msgs_max	 += ctr->lct_common.lcc_msgs_max;
 		common->lcc_msgs_alloc   += ctr->lct_common.lcc_msgs_alloc;
@@ -876,23 +877,35 @@ static void lnet_assert_wire_constants(void)
 		common->lcc_route_length += ctr->lct_common.lcc_route_length;
 		common->lcc_drop_length  += ctr->lct_common.lcc_drop_length;
 	}
+}
+
+void
+lnet_counters_get_common(struct lnet_counters_common *common)
+{
+	lnet_net_lock(LNET_LOCK_EX);
+	lnet_counters_get_common_locked(common);
 	lnet_net_unlock(LNET_LOCK_EX);
 }
 EXPORT_SYMBOL(lnet_counters_get_common);
 
-void
+int
 lnet_counters_get(struct lnet_counters *counters)
 {
 	struct lnet_counters *ctr;
 	struct lnet_counters_health *health = &counters->lct_health;
-	int i;
+	int i, rc = 0;
 
 	memset(counters, 0, sizeof(*counters));
 
-	lnet_counters_get_common(&counters->lct_common);
-
 	lnet_net_lock(LNET_LOCK_EX);
 
+	if (the_lnet.ln_state != LNET_STATE_RUNNING) {
+		rc = -ENODEV;
+		goto out_unlock;
+	}
+
+	lnet_counters_get_common_locked(&counters->lct_common);
+
 	cfs_percpt_for_each(ctr, i, the_lnet.ln_counters) {
 		health->lch_rst_alloc += ctr->lct_health.lch_rst_alloc;
 		health->lch_resend_count += ctr->lct_health.lch_resend_count;
@@ -919,7 +932,9 @@ static void lnet_assert_wire_constants(void)
 		health->lch_network_timeout_count +=
 				ctr->lct_health.lch_network_timeout_count;
 	}
+out_unlock:
 	lnet_net_unlock(LNET_LOCK_EX);
+	return rc;
 }
 EXPORT_SYMBOL(lnet_counters_get);
 
@@ -931,9 +946,12 @@ static void lnet_assert_wire_constants(void)
 
 	lnet_net_lock(LNET_LOCK_EX);
 
+	if (the_lnet.ln_state != LNET_STATE_RUNNING)
+		goto avoid_reset;
+
 	cfs_percpt_for_each(counters, i, the_lnet.ln_counters)
 		memset(counters, 0, sizeof(struct lnet_counters));
-
+avoid_reset:
 	lnet_net_unlock(LNET_LOCK_EX);
 }
 
@@ -3680,9 +3698,9 @@ u32 lnet_get_dlc_seq_locked(void)
 			return -EINVAL;
 
 		mutex_lock(&the_lnet.ln_api_mutex);
-		lnet_counters_get(&lnet_stats->st_cntrs);
+		rc = lnet_counters_get(&lnet_stats->st_cntrs);
 		mutex_unlock(&the_lnet.ln_api_mutex);
-		return 0;
+		return rc;
 	}
 
 	case IOC_LIBCFS_CONFIG_RTR:
diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c
index 7fe8d33..96cc506 100644
--- a/net/lnet/lnet/router_proc.c
+++ b/net/lnet/lnet/router_proc.c
@@ -83,8 +83,7 @@ static int proc_lnet_stats(struct ctl_table *table, int write,
 	size_t nob = *lenp;
 	loff_t pos = *ppos;
 	int len;
-	char *tmpstr;
-	const int tmpsiz = 256; /* 7 %u and 4 %llu */
+	char tmpstr[256]; /* 7 %u and 4 %llu */
 
 	if (write) {
 		lnet_counters_reset();
@@ -96,16 +95,13 @@ static int proc_lnet_stats(struct ctl_table *table, int write,
 	if (!ctrs)
 		return -ENOMEM;
 
-	tmpstr = kmalloc(tmpsiz, GFP_KERNEL);
-	if (!tmpstr) {
-		kfree(ctrs);
-		return -ENOMEM;
-	}
+	rc = lnet_counters_get(ctrs);
+	if (rc)
+		goto out_no_ctrs;
 
-	lnet_counters_get(ctrs);
 	common = ctrs->lct_common;
 
-	len = scnprintf(tmpstr, tmpsiz,
+	len = scnprintf(tmpstr, sizeof(tmpstr),
 			"%u %u %u %u %u %u %u %llu %llu %llu %llu",
 			common.lcc_msgs_alloc, common.lcc_msgs_max,
 			common.lcc_errors,
@@ -119,8 +115,7 @@ static int proc_lnet_stats(struct ctl_table *table, int write,
 	else
 		rc = cfs_trace_copyout_string(buffer, nob,
 					      tmpstr + pos, "\n");
-
-	kfree(tmpstr);
+out_no_ctrs:
 	kfree(ctrs);
 	return rc;
 }
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 21/42] lnet: Loosen restrictions on LNet Health params
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (19 preceding siblings ...)
  2020-10-06  0:05 ` [lustre-devel] [PATCH 20/42] lnet: don't read debugfs lnet stats when shutting down James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 22/42] lnet: Fix reference leak in lnet_select_pathway James Simmons
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Chris Horn <chris.horn@hpe.com>

The functions that set various LNet Health related parameters require
that the parameters be set in a specific order depending on whether
health is enabled or disabled. This is not user-friendly.
 - Don't overwrite lnet_transaction_timeout when health is being
   enabled or disabled.
 - Don't overwrite lnet_retry_count when health is being enabled
   (still set it to zero when health is disabled).
 - Allow lnet_retry_count to be set to 0 when health is disabled
 - Correct off-by-one error in transaction_to_set() to ensure
   lnet_transaction_timeout is greater than lnet_retry_count

HPE-bug-id: LUS-8995
WC-bug-id: https://jira.whamcloud.com/browse/LU-13735
Lustre-commit: b8ff97611717d ("LU-13735 lnet: Loosen restrictions on LNet Health params")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39228
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/api-ni.c | 35 ++++++++++-------------------------
 1 file changed, 10 insertions(+), 25 deletions(-)

diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index af2089e..f678ae2 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -140,10 +140,8 @@ static int recovery_interval_set(const char *val,
 MODULE_PARM_DESC(lnet_drop_asym_route,
 		 "Set to 1 to drop asymmetrical route messages.");
 
-#define LNET_TRANSACTION_TIMEOUT_NO_HEALTH_DEFAULT 50
-#define LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT 50
-
-unsigned int lnet_transaction_timeout = LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT;
+#define LNET_TRANSACTION_TIMEOUT_DEFAULT 50
+unsigned int lnet_transaction_timeout = LNET_TRANSACTION_TIMEOUT_DEFAULT;
 static int transaction_to_set(const char *val, const struct kernel_param *kp);
 static struct kernel_param_ops param_ops_transaction_timeout = {
 	.set = transaction_to_set,
@@ -156,8 +154,8 @@ static int recovery_interval_set(const char *val,
 MODULE_PARM_DESC(lnet_transaction_timeout,
 		 "Maximum number of seconds to wait for a peer response.");
 
-#define LNET_RETRY_COUNT_HEALTH_DEFAULT 2
-unsigned int lnet_retry_count = LNET_RETRY_COUNT_HEALTH_DEFAULT;
+#define LNET_RETRY_COUNT_DEFAULT 2
+unsigned int lnet_retry_count = LNET_RETRY_COUNT_DEFAULT;
 static int retry_count_set(const char *val, const struct kernel_param *kp);
 static struct kernel_param_ops param_ops_retry_count = {
 	.set = retry_count_set,
@@ -184,8 +182,8 @@ static int response_tracking_set(const char *val,
 MODULE_PARM_DESC(lnet_response_tracking,
 		 "(0|1|2|3) LNet Internal Only|GET Reply only|PUT ACK only|Full Tracking (default)");
 
-#define LNET_LND_TIMEOUT_DEFAULT ((LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT - 1) / \
-				  (LNET_RETRY_COUNT_HEALTH_DEFAULT + 1))
+#define LNET_LND_TIMEOUT_DEFAULT ((LNET_TRANSACTION_TIMEOUT_DEFAULT - 1) / \
+				  (LNET_RETRY_COUNT_DEFAULT + 1))
 unsigned int lnet_lnd_timeout = LNET_LND_TIMEOUT_DEFAULT;
 static void lnet_set_lnd_timeout(void)
 {
@@ -235,20 +233,7 @@ static int lnet_discover(struct lnet_process_id id, u32 force,
 		return -EINVAL;
 	}
 
-	/* if we're turning on health then use the health timeout
-	 * defaults.
-	 */
-	if (*sensitivity == 0 && value != 0) {
-		lnet_transaction_timeout =
-			LNET_TRANSACTION_TIMEOUT_HEALTH_DEFAULT;
-		lnet_retry_count = LNET_RETRY_COUNT_HEALTH_DEFAULT;
-		lnet_set_lnd_timeout();
-	/* if we're turning off health then use the no health timeout
-	 * default.
-	 */
-	} else if (*sensitivity != 0 && value == 0) {
-		lnet_transaction_timeout =
-			LNET_TRANSACTION_TIMEOUT_NO_HEALTH_DEFAULT;
+	if (*sensitivity != 0 && value == 0 && lnet_retry_count != 0) {
 		lnet_retry_count = 0;
 		lnet_set_lnd_timeout();
 	}
@@ -396,7 +381,7 @@ static int lnet_discover(struct lnet_process_id id, u32 force,
 	 */
 	mutex_lock(&the_lnet.ln_api_mutex);
 
-	if (value < lnet_retry_count || value == 0) {
+	if (value <= lnet_retry_count || value == 0) {
 		mutex_unlock(&the_lnet.ln_api_mutex);
 		CERROR("Invalid value for lnet_transaction_timeout (%lu). Has to be greater than lnet_retry_count (%u)\n",
 		       value, lnet_retry_count);
@@ -437,9 +422,9 @@ static int lnet_discover(struct lnet_process_id id, u32 force,
 	 */
 	mutex_lock(&the_lnet.ln_api_mutex);
 
-	if (lnet_health_sensitivity == 0) {
+	if (lnet_health_sensitivity == 0 && value > 0) {
 		mutex_unlock(&the_lnet.ln_api_mutex);
-		CERROR("Can not set retry_count when health feature is turned off\n");
+		CERROR("Can not set lnet_retry_count when health feature is turned off\n");
 		return -EINVAL;
 	}
 
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 22/42] lnet: Fix reference leak in lnet_select_pathway
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (20 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 21/42] lnet: Loosen restrictions on LNet Health params James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 23/42] lustre: llite: prune invalid dentries James Simmons
                   ` (19 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Chris Horn <chris.horn@hpe.com>

We call lnet_nid2peerni_locked() to look up the peer NI for the
message originator. lnet_nid2peerni_locked() takes a reference on the
peer NI that is never dropped.
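The fix follows the usual pairing rule: a lookup that takes a reference must be matched by a decref once the caller has read what it needs. A minimal userspace model (names are illustrative, not the real LNet API):

```c
#include <assert.h>

struct peer_ni {
	int refcount;
};

/* model of lnet_nid2peerni_locked(): a successful lookup returns
 * the peer NI with an extra reference held */
static struct peer_ni *peerni_lookup(struct peer_ni *p)
{
	p->refcount++;
	return p;
}

/* model of lnet_peer_ni_decref_locked() */
static void peerni_decref(struct peer_ni *p)
{
	p->refcount--;
}
```

The leak was reading lpni_peer_net->lpn_peer from the lookup result and then discarding the pointer; the patch drops the reference as soon as those fields have been extracted.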

Fixes: 92d8fb2620 ("lnet: Allow router to forward to healthier NID")
HPE-bug-id: LUS-9185
WC-bug-id: https://jira.whamcloud.com/browse/LU-13896
Lustre-commit: 3420cd6158cb0 ("LU-13896 lnet: Fix reference leak in lnet_select_pathway")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39603
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index aa3f3ab..7474d44 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -2517,6 +2517,8 @@ struct lnet_ni *
 		 * try to send it via non-multi-rail criteria
 		 */
 		if (!IS_ERR(src_lpni)) {
+			/* Drop ref taken by lnet_nid2peerni_locked() */
+			lnet_peer_ni_decref_locked(src_lpni);
 			src_lp = lpni->lpni_peer_net->lpn_peer;
 			if (lnet_peer_is_multi_rail(src_lp) &&
 			    !lnet_is_peer_ni_alive(lpni))
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 23/42] lustre: llite: prune invalid dentries
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (21 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 22/42] lnet: Fix reference leak in lnet_select_pathway James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 24/42] lnet: Do not overwrite destination when routing James Simmons
                   ` (18 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Lai Siyao <lai.siyao@whamcloud.com>

When a file's LOOKUP lock is canceled on the client, its dentries are
marked invalid and also pruned to avoid OOM. To achieve this,
ll_invalidate_aliases() is renamed to ll_prune_aliases(), which calls
d_prune_aliases() to prune unused invalid dentries.

The same is done for negative dentries when the parent UPDATE lock is
canceled: ll_invalidate_negative_children() is renamed to
ll_prune_negative_children().

Since unused invalid dentries are now always pruned, it is no longer
necessary to call __d_drop() in d_lustre_invalidate().

Taking i_lock before d_lustre_invalidate() in ll_inode_revalidate()
is redundant because d_lustre_invalidate() takes d_lock, so remove
it.
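The split between marking and pruning can be sketched with a toy model (types and names are illustrative, not the kernel dcache API): invalidation only sets a flag, and a separate prune pass unhashes unused invalid aliases, as d_prune_aliases() does for the real dcache.

```c
#include <assert.h>

struct toy_dentry {
	int refcount;
	int invalid;
	int hashed;
};

static void toy_invalidate(struct toy_dentry *d)
{
	d->invalid = 1;	/* no __d_drop() here any more */
}

static void toy_prune_aliases(struct toy_dentry *aliases, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (aliases[i].invalid && aliases[i].refcount == 0)
			aliases[i].hashed = 0;	/* unhash -> reclaimable */
}
```

A busy alias (nonzero refcount) survives the prune and is killed by dput() of its last reference instead.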

WC-bug-id: https://jira.whamcloud.com/browse/LU-13909
Lustre-commit: 1f0b2a0dca6a3 ("LU-13909 llite: prune invalid dentries")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39685
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dcache.c         | 15 ++++++---------
 fs/lustre/llite/file.c           |  7 ++-----
 fs/lustre/llite/llite_internal.h | 14 ++++++--------
 fs/lustre/llite/namei.c          | 40 +++++++++++++++++++++++++++-------------
 4 files changed, 41 insertions(+), 35 deletions(-)

diff --git a/fs/lustre/llite/dcache.c b/fs/lustre/llite/dcache.c
index e8b6fe8..0a6d773 100644
--- a/fs/lustre/llite/dcache.c
+++ b/fs/lustre/llite/dcache.c
@@ -183,7 +183,8 @@ void ll_intent_release(struct lookup_intent *it)
 	it->it_request = NULL;
 }
 
-void ll_invalidate_aliases(struct inode *inode)
+/* mark aliases invalid and prune unused aliases */
+void ll_prune_aliases(struct inode *inode)
 {
 	struct dentry *dentry;
 
@@ -191,15 +192,11 @@ void ll_invalidate_aliases(struct inode *inode)
 	       PFID(ll_inode2fid(inode)), inode);
 
 	spin_lock(&inode->i_lock);
-	hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias) {
-		CDEBUG(D_DENTRY,
-		       "dentry in drop %pd (%p) parent %p inode %p flags %d\n",
-		       dentry, dentry, dentry->d_parent,
-		       d_inode(dentry), dentry->d_flags);
-
-		d_lustre_invalidate(dentry, 0);
-	}
+	hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias)
+		d_lustre_invalidate(dentry);
 	spin_unlock(&inode->i_lock);
+
+	d_prune_aliases(inode);
 }
 
 int ll_revalidate_it_finish(struct ptlrpc_request *request,
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 251cca5..babd24d 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -4614,11 +4614,8 @@ static int ll_inode_revalidate(struct dentry *dentry, enum ldlm_intent_flags op)
 	 * here to preserve get_cwd functionality on 2.6.
 	 * Bug 10503
 	 */
-	if (!d_inode(dentry)->i_nlink) {
-		spin_lock(&inode->i_lock);
-		d_lustre_invalidate(dentry, 0);
-		spin_unlock(&inode->i_lock);
-	}
+	if (!d_inode(dentry)->i_nlink)
+		d_lustre_invalidate(dentry);
 
 	ll_lookup_finish_locks(&oit, inode);
 out:
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 8a0c40c..7c6eddd 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1103,7 +1103,7 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size,
 extern const struct dentry_operations ll_d_ops;
 void ll_intent_drop_lock(struct lookup_intent *it);
 void ll_intent_release(struct lookup_intent *it);
-void ll_invalidate_aliases(struct inode *inode);
+void ll_prune_aliases(struct inode *inode);
 void ll_lookup_finish_locks(struct lookup_intent *it, struct inode *inode);
 int ll_revalidate_it_finish(struct ptlrpc_request *request,
 			    struct lookup_intent *it, struct inode *inode);
@@ -1560,21 +1560,19 @@ static inline int d_lustre_invalid(const struct dentry *dentry)
 
 /*
  * Mark dentry INVALID, if dentry refcount is zero (this is normally case for
- * ll_md_blocking_ast), unhash this dentry, and let dcache to reclaim it later;
- * else dput() of the last refcount will unhash this dentry and kill it.
+ * ll_md_blocking_ast), it will be pruned by ll_prune_aliases() and
+ * ll_prune_negative_children(); otherwise dput() of the last refcount will
+ * unhash this dentry and kill it.
  */
-static inline void d_lustre_invalidate(struct dentry *dentry, int nested)
+static inline void d_lustre_invalidate(struct dentry *dentry)
 {
 	CDEBUG(D_DENTRY,
 	       "invalidate dentry %pd (%p) parent %p inode %p refc %d\n",
 	       dentry, dentry,
 	       dentry->d_parent, d_inode(dentry), d_count(dentry));
 
-	spin_lock_nested(&dentry->d_lock,
-			 nested ? DENTRY_D_LOCK_NESTED : DENTRY_D_LOCK_NORMAL);
+	spin_lock(&dentry->d_lock);
 	ll_d2d(dentry)->lld_invalid = 1;
-	if (d_count(dentry) == 0)
-		__d_drop(dentry);
 	spin_unlock(&dentry->d_lock);
 }
 
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index ce6cd19..f9c10d0 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -152,22 +152,36 @@ struct inode *ll_iget(struct super_block *sb, ino_t hash,
 	return inode;
 }
 
-static void ll_invalidate_negative_children(struct inode *dir)
+/* mark negative sub file dentries invalid and prune unused dentries */
+static void ll_prune_negative_children(struct inode *dir)
 {
-	struct dentry *dentry, *tmp_subdir;
+	struct dentry *dentry;
+	struct dentry *child;
 
+restart:
 	spin_lock(&dir->i_lock);
 	hlist_for_each_entry(dentry, &dir->i_dentry, d_u.d_alias) {
 		spin_lock(&dentry->d_lock);
-		if (!list_empty(&dentry->d_subdirs)) {
-			struct dentry *child;
-
-			list_for_each_entry_safe(child, tmp_subdir,
-						 &dentry->d_subdirs,
-						 d_child) {
-				if (d_really_is_negative(child))
-					d_lustre_invalidate(child, 1);
+		list_for_each_entry(child, &dentry->d_subdirs, d_child) {
+			if (child->d_inode)
+				continue;
+
+			spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
+			ll_d2d(child)->lld_invalid = 1;
+			if (!d_count(child)) {
+				dget_dlock(child);
+				__d_drop(child);
+				spin_unlock(&child->d_lock);
+				spin_unlock(&dentry->d_lock);
+				spin_unlock(&dir->i_lock);
+
+				CDEBUG(D_DENTRY, "prune negative dentry %pd\n",
+				       child);
+
+				dput(child);
+				goto restart;
 			}
+			spin_unlock(&child->d_lock);
 		}
 		spin_unlock(&dentry->d_lock);
 	}
@@ -345,18 +359,18 @@ static void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel)
 							ll_test_inode_by_fid,
 							(void *)&lli->lli_pfid);
 			if (master_inode) {
-				ll_invalidate_negative_children(master_inode);
+				ll_prune_negative_children(master_inode);
 				iput(master_inode);
 			}
 		} else {
-			ll_invalidate_negative_children(inode);
+			ll_prune_negative_children(inode);
 		}
 	}
 
 	if ((bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)) &&
 	    inode->i_sb->s_root &&
 	    !is_root_inode(inode))
-		ll_invalidate_aliases(inode);
+		ll_prune_aliases(inode);
 
 	if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM))
 		forget_all_cached_acls(inode);
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 24/42] lnet: Do not overwrite destination when routing
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (22 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 23/42] lustre: llite: prune invalid dentries James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 25/42] lustre: lov: don't use inline for operations functions James Simmons
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Chris Horn <chris.horn@hpe.com>

MR path selection in a routed environment is supposed to allow the
originator of a message to set the final destination NID. On a
multi-hop route, intermediate routers execute the same code path as
the message originator (i.e. the remote send cases). This causes
them to overwrite the destination NID when forwarding the message.

Check the msg_routing flag to determine whether we should set the
final destination NID (i.e. LNet peer NI).

A somewhat related issue is that because intermediate routers are not
selecting a destination lpni, they need to pick the next-hop lpni
based on the destination NID's remote net.
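The round robin the originator performs over a peer's remote nets can be sketched as follows (a userspace model; field names mirror the patch but this is not the LNet code): pick the net with the lowest sequence number, i.e. the least recently used, then bump it. Intermediate routers skip this selection entirely and key the next hop off the destination NID's remote net.

```c
#include <assert.h>
#include <stddef.h>

struct peer_net {
	unsigned int lpn_seq;
};

/* mirror of the best_lpn selection loop: lowest lpn_seq wins,
 * first candidate wins ties, and the winner's seq is advanced */
static struct peer_net *pick_best_lpn(struct peer_net *nets, int n)
{
	struct peer_net *best = NULL;
	int i;

	for (i = 0; i < n; i++)
		if (!best || nets[i].lpn_seq < best->lpn_seq)
			best = &nets[i];
	if (best)
		best->lpn_seq++;	/* advance the round robin */
	return best;
}
```

Repeated calls alternate over nets with equal starting sequence numbers, which is the load-spreading behavior the originator relies on.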

Fixes: 111c56a3c7e ("lnet: fix remote peer ni selection")
HPE-bug-id: LUS-8919
WC-bug-id: https://jira.whamcloud.com/browse/LU-13605
Lustre-commit: ec94d6f77b61fe ("LU-13605 lnet: Do not overwrite destination when routing")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/38731
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 102 +++++++++++++++++++++++++++--------------------
 1 file changed, 59 insertions(+), 43 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 7474d44..1c9fb41 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1830,52 +1830,73 @@ struct lnet_ni *
 	}
 
 	if (!route_found) {
-		/* we've already looked up the initial lpni using dst_nid */
-		lpni = sd->sd_best_lpni;
-		/* the peer tree must be in existence */
-		LASSERT(lpni && lpni->lpni_peer_net &&
-			lpni->lpni_peer_net->lpn_peer);
-		lp = lpni->lpni_peer_net->lpn_peer;
-
-		list_for_each_entry(lpn, &lp->lp_peer_nets, lpn_peer_nets) {
-			/* is this remote network reachable?  */
-			rnet = lnet_find_rnet_locked(lpn->lpn_net_id);
-			if (!rnet)
-				continue;
+		if (sd->sd_msg->msg_routing) {
+			/* If I'm routing this message then I need to find the
+			 * next hop based on the destination NID
+			 */
+			best_rnet = lnet_find_rnet_locked(LNET_NIDNET(sd->sd_dst_nid));
+			if (!best_rnet) {
+				CERROR("Unable to route message to %s - Route table may be misconfigured\n",
+				       libcfs_nid2str(sd->sd_dst_nid));
+				return -EHOSTUNREACH;
+			}
+		} else {
+			/* we've already looked up the initial lpni using
+			 * dst_nid
+			 */
+			lpni = sd->sd_best_lpni;
+			/* the peer tree must be in existence */
+			LASSERT(lpni && lpni->lpni_peer_net &&
+				lpni->lpni_peer_net->lpn_peer);
+			lp = lpni->lpni_peer_net->lpn_peer;
+
+			list_for_each_entry(lpn, &lp->lp_peer_nets,
+					    lpn_peer_nets) {
+				/* is this remote network reachable?  */
+				rnet = lnet_find_rnet_locked(lpn->lpn_net_id);
+				if (!rnet)
+					continue;
+
+				if (!best_lpn) {
+					best_lpn = lpn;
+					best_rnet = rnet;
+				}
+
+				if (best_lpn->lpn_seq <= lpn->lpn_seq)
+					continue;
 
-			if (!best_lpn) {
 				best_lpn = lpn;
 				best_rnet = rnet;
 			}
 
-			if (best_lpn->lpn_seq <= lpn->lpn_seq)
-				continue;
+			if (!best_lpn) {
+				CERROR("peer %s has no available nets\n",
+				       libcfs_nid2str(sd->sd_dst_nid));
+				return -EHOSTUNREACH;
+			}
 
-			best_lpn = lpn;
-			best_rnet = rnet;
-		}
+			sd->sd_best_lpni = lnet_find_best_lpni(sd->sd_best_ni,
+							       sd->sd_dst_nid,
+							       lp,
+							       best_lpn->lpn_net_id);
+			if (!sd->sd_best_lpni) {
+				CERROR("peer %s is unreachable\n",
+				       libcfs_nid2str(sd->sd_dst_nid));
+				return -EHOSTUNREACH;
+			}
 
-		if (!best_lpn) {
-			CERROR("peer %s has no available nets\n",
-			       libcfs_nid2str(sd->sd_dst_nid));
-			return -EHOSTUNREACH;
-		}
+			/* We're attempting to round robin over the remote peer
+			 * NI's so update the final destination we selected
+			 */
+			sd->sd_final_dst_lpni = sd->sd_best_lpni;
 
-		sd->sd_best_lpni = lnet_find_best_lpni(sd->sd_best_ni,
-						       sd->sd_dst_nid,
-						       lp,
-						       best_lpn->lpn_net_id);
-		if (!sd->sd_best_lpni) {
-			CERROR("peer %s is unreachable\n",
-			       libcfs_nid2str(sd->sd_dst_nid));
-			return -EHOSTUNREACH;
+			/* Increment the sequence number of the remote lpni so
+			 * we can round robin over the different interfaces of
+			 * the remote lpni
+			 */
+			sd->sd_best_lpni->lpni_seq++;
 		}
 
-		/* We're attempting to round robin over the remote peer
-		 * NI's so update the final destination we selected
-		 */
-		sd->sd_final_dst_lpni = sd->sd_best_lpni;
-
 		/* find the best route. Restrict the selection on the net of the
 		 * local NI if we've already picked the local NI to send from.
 		 * Otherwise, let's pick any route we can find and then find
@@ -1903,12 +1924,6 @@ struct lnet_ni *
 		gw = best_route->lr_gateway;
 		LASSERT(gw == gwni->lpni_peer_net->lpn_peer);
 		local_lnet = best_route->lr_lnet;
-
-		/* Increment the sequence number of the remote lpni so we
-		 * can round robin over the different interfaces of the
-		 * remote lpni
-		 */
-		sd->sd_best_lpni->lpni_seq++;
 	}
 
 	/* Discover this gateway if it hasn't already been discovered.
@@ -1945,7 +1960,8 @@ struct lnet_ni *
 	if (sd->sd_rtr_nid == LNET_NID_ANY) {
 		LASSERT(best_route && last_route);
 		best_route->lr_seq = last_route->lr_seq + 1;
-		best_lpn->lpn_seq++;
+		if (best_lpn)
+			best_lpn->lpn_seq++;
 	}
 
 	return 0;
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 25/42] lustre: lov: don't use inline for operations functions.
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (23 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 24/42] lnet: Do not overwrite destination when routing James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 26/42] lustre: osc: don't allow negative grants James Simmons
                   ` (16 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Mr NeilBrown <neilb@suse.de>

These functions have their addresses taken and stored in an
'operations' structure, so they cannot possibly be compiled inline.
Remove the "inline" declaration.

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: e58cfb7ec5ed5d ("LU-6142 lov: don't use inline for operations functions.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39376
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_ea.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/lov/lov_ea.c b/fs/lustre/lov/lov_ea.c
index 1d105c0..13f47ee 100644
--- a/fs/lustre/lov/lov_ea.c
+++ b/fs/lustre/lov/lov_ea.c
@@ -303,7 +303,7 @@ void lsm_free(struct lov_stripe_md *lsm)
 	return ERR_PTR(rc);
 }
 
-static inline struct lov_stripe_md *
+static struct lov_stripe_md *
 lsm_unpackmd_v1v3(struct lov_obd *lov,
 		  struct lov_mds_md *lmm, size_t buf_size,
 		  const char *pool_name,
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 26/42] lustre: osc: don't allow negative grants
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (24 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 25/42] lustre: lov: don't use inline for operations functions James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 27/42] lustre: mgc: Use IR for client->MDS/OST connections James Simmons
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Mikhail Pershin <mpershin@whamcloud.com>

Add a check in osc_init_grant() to prevent possible underflow of
cl_avail_grant, and report an error if it happens.
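The guard amounts to a clamped subtraction (an illustrative userspace model, not the real client_obd accounting): if the server's grant is smaller than what the client has already consumed, available grant is pinned at zero instead of going negative.

```c
#include <assert.h>

/* Sketch of the check added to osc_init_grant(): "consumed" stands
 * for cl_reserved_grant plus the dirty grant/pages term. */
static long init_avail_grant(long granted, long consumed)
{
	if (granted < consumed)
		return 0;	/* the real code also logs a CERROR here */
	return granted - consumed;
}
```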

WC-bug-id: https://jira.whamcloud.com/browse/LU-13763
Lustre-commit: e05ccafd6ee214 ("LU-13763 osc: don't allow negative grants")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39827
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_request.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index fbb8453..53f87ea 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1030,12 +1030,19 @@ void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd)
 	cli->cl_avail_grant = ocd->ocd_grant;
 	spin_lock(&cli->cl_loi_list_lock);
 	if (cli->cl_import->imp_state != LUSTRE_IMP_EVICTED) {
-		cli->cl_avail_grant -= cli->cl_reserved_grant;
+		unsigned long consumed = cli->cl_reserved_grant;
+
 		if (OCD_HAS_FLAG(ocd, GRANT_PARAM))
-			cli->cl_avail_grant -= cli->cl_dirty_grant;
+			consumed += cli->cl_dirty_grant;
 		else
-			cli->cl_avail_grant -=
-				cli->cl_dirty_pages << PAGE_SHIFT;
+			consumed += cli->cl_dirty_pages << PAGE_SHIFT;
+		if (cli->cl_avail_grant < consumed) {
+			CERROR("%s: granted %ld but already consumed %ld\n",
+			       cli_name(cli), cli->cl_avail_grant, consumed);
+			cli->cl_avail_grant = 0;
+		} else {
+			cli->cl_avail_grant -= consumed;
+		}
 	}
 
 	if (OCD_HAS_FLAG(ocd, GRANT_PARAM)) {
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 27/42] lustre: mgc: Use IR for client->MDS/OST connections
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (25 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 26/42] lustre: osc: don't allow negative grants James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 28/42] lustre: ldlm: don't use a locks without l_ast_data James Simmons
                   ` (14 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Amir Shehata <ashehata@whamcloud.com>

When a target registers with the MGS, the MGS sends an IR log
to the client to speed up recovery.

The IR log contains the updated NID information on the target
which just registered.

This patch allows clients to update their imports with the latest
NIDs for the targets reported in the IR log. It also allows clients
to create new connections for targets which were not added via
the config log.

For example, if a target reboots and comes up with a new NID, the
client can continue using it.

This functionality is disabled by default and can be enabled by
setting a new file system specific module parameter, dynamic_nids.

    lctl set_param mgc.*.dynamic_nids=1

This parameter will need to be set on both the clients and the MGS.
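The overall flow can be sketched with a toy model (types and names are illustrative, not the real import API): look for an existing connection matching the target's primary NID from the IR log; if none exists and dynamic NIDs are enabled, add one so a target that rebooted with a new NID stays reachable.

```c
#include <assert.h>

#define MAX_CONNS 8

struct toy_import {
	unsigned long conns[MAX_CONNS];	/* known connection NIDs */
	int nconns;
	int dynamic_nids;
};

/* returns 0 on success, -1 if the NID is unknown and dynamic NIDs
 * are disabled (the pre-patch behaviour for new NIDs) */
static int toy_apply_recover_nid(struct toy_import *imp, unsigned long nid)
{
	int i;

	for (i = 0; i < imp->nconns; i++)
		if (imp->conns[i] == nid)
			return 0;	/* already known */
	if (!imp->dynamic_nids || imp->nconns >= MAX_CONNS)
		return -1;
	imp->conns[imp->nconns++] = nid;	/* create a new connection */
	return 0;
}
```

With dynamic_nids off this degrades to the old lookup-only behavior, which is why the feature is opt-in per file system.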

WC-bug-id: https://jira.whamcloud.com/browse/LU-10360
Lustre-commit: 37be05eca3f4ae ("LU-10360 mgc: Use IR for client->MDS/OST connections")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39613
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_net.h   |  4 +++
 fs/lustre/include/obd.h          |  3 ++-
 fs/lustre/include/obd_class.h    |  2 ++
 fs/lustre/ldlm/ldlm_lib.c        | 44 ++++++++++++++++++++++++++++++
 fs/lustre/mgc/lproc_mgc.c        | 30 +++++++++++++++++++++
 fs/lustre/mgc/mgc_request.c      | 58 +++++++++++++++++++++++++++++++---------
 fs/lustre/obdclass/lustre_peer.c | 37 ++++++++++++++++++++++---
 7 files changed, 160 insertions(+), 18 deletions(-)

diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index d199121..1e7fe03 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -2345,6 +2345,10 @@ int client_connect_import(const struct lu_env *env,
 int client_disconnect_export(struct obd_export *exp);
 int client_import_add_conn(struct obd_import *imp, struct obd_uuid *uuid,
 			   int priority);
+int client_import_dyn_add_conn(struct obd_import *imp, struct obd_uuid *uuid,
+			       lnet_nid_t prim_nid, int priority);
+int client_import_add_nids_to_conn(struct obd_import *imp, lnet_nid_t *nids,
+				   int nid_count, struct obd_uuid *uuid);
 int client_import_del_conn(struct obd_import *imp, struct obd_uuid *uuid);
 int client_import_find_conn(struct obd_import *imp, lnet_nid_t peer,
 			    struct obd_uuid *uuid);
diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index 083884c9f..39e3d51 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -564,7 +564,8 @@ struct obd_device {
 					    */
 		      obd_no_ir:1,	   /* no imperative recovery. */
 		      obd_process_conf:1,  /* device is processing mgs config */
-		      obd_checksum_dump:1; /* dump pages upon cksum error */
+		      obd_checksum_dump:1, /* dump pages upon cksum error */
+		      obd_dynamic_nids:1;  /* Allow dynamic NIDs on device */
 	/* use separate field as it is set in interrupt to don't mess with
 	 * protection of other bits using _bh lock
 	 */
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index a22581d..1ac9fcf 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -1694,6 +1694,8 @@ struct lwp_register_item {
 int lustre_uuid_to_peer(const char *uuid, lnet_nid_t *peer_nid, int index);
 int class_add_uuid(const char *uuid, u64 nid);
 int class_del_uuid(const char *uuid);
+int class_add_nids_to_uuid(struct obd_uuid *uuid, lnet_nid_t *nids,
+			   int nid_count);
 int class_check_uuid(struct obd_uuid *uuid, u64 nid);
 
 /* class_obd.c */
diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c
index 2698b93..713ca1c 100644
--- a/fs/lustre/ldlm/ldlm_lib.c
+++ b/fs/lustre/ldlm/ldlm_lib.c
@@ -140,6 +140,50 @@ int client_import_add_conn(struct obd_import *imp, struct obd_uuid *uuid,
 }
 EXPORT_SYMBOL(client_import_add_conn);
 
+int client_import_dyn_add_conn(struct obd_import *imp, struct obd_uuid *uuid,
+			       lnet_nid_t prim_nid, int priority)
+{
+	struct ptlrpc_connection *ptlrpc_conn;
+	int rc;
+
+	ptlrpc_conn = ptlrpc_uuid_to_connection(uuid, prim_nid);
+	if (!ptlrpc_conn) {
+		const char *str_uuid = obd_uuid2str(uuid);
+
+		rc = class_add_uuid(str_uuid, prim_nid);
+		if (rc) {
+			CERROR("%s: failed to add UUID '%s': rc = %d\n",
+			       imp->imp_obd->obd_name, str_uuid, rc);
+			return rc;
+		}
+	}
+	return import_set_conn(imp, uuid, priority, 1);
+}
+EXPORT_SYMBOL(client_import_dyn_add_conn);
+
+int client_import_add_nids_to_conn(struct obd_import *imp, lnet_nid_t *nids,
+				   int nid_count, struct obd_uuid *uuid)
+{
+	struct obd_import_conn *conn;
+	int rc = -ENOENT;
+
+	if (nid_count <= 0 || !nids)
+		return rc;
+
+	spin_lock(&imp->imp_lock);
+	list_for_each_entry(conn, &imp->imp_conn_list, oic_item) {
+		if (class_check_uuid(&conn->oic_uuid, nids[0])) {
+			*uuid = conn->oic_uuid;
+			rc = class_add_nids_to_uuid(&conn->oic_uuid, nids,
+						    nid_count);
+			break;
+		}
+	}
+	spin_unlock(&imp->imp_lock);
+	return rc;
+}
+EXPORT_SYMBOL(client_import_add_nids_to_conn);
+
 int client_import_del_conn(struct obd_import *imp, struct obd_uuid *uuid)
 {
 	struct obd_import_conn *imp_conn;
diff --git a/fs/lustre/mgc/lproc_mgc.c b/fs/lustre/mgc/lproc_mgc.c
index c22ec23..dd7ed0f 100644
--- a/fs/lustre/mgc/lproc_mgc.c
+++ b/fs/lustre/mgc/lproc_mgc.c
@@ -71,10 +71,40 @@ struct ldebugfs_vars lprocfs_mgc_obd_vars[] = {
 
 LUSTRE_RW_ATTR(ping);
 
+ssize_t dynamic_nids_show(struct kobject *kobj, struct attribute *attr,
+			  char *buf)
+{
+	struct obd_device *obd = container_of(kobj, struct obd_device,
+					      obd_kset.kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%u\n", obd->obd_dynamic_nids);
+}
+
+ssize_t dynamic_nids_store(struct kobject *kobj, struct attribute *attr,
+			   const char *buffer, size_t count)
+{
+	struct obd_device *obd = container_of(kobj, struct obd_device,
+					      obd_kset.kobj);
+	bool val;
+	int rc;
+
+	rc = kstrtobool(buffer, &val);
+	if (rc)
+		return rc;
+
+	spin_lock(&obd->obd_dev_lock);
+	obd->obd_dynamic_nids = val;
+	spin_unlock(&obd->obd_dev_lock);
+
+	return count;
+}
+LUSTRE_RW_ATTR(dynamic_nids);
+
 static struct attribute *mgc_attrs[] = {
 	&lustre_attr_mgs_conn_uuid.attr,
 	&lustre_attr_conn_uuid.attr,
 	&lustre_attr_ping.attr,
+	&lustre_attr_dynamic_nids.attr,
 	NULL,
 };
 
diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c
index cc3c82e..8133f27 100644
--- a/fs/lustre/mgc/mgc_request.c
+++ b/fs/lustre/mgc/mgc_request.c
@@ -1107,10 +1107,14 @@ static int mgc_apply_recover_logs(struct obd_device *mgc,
 	int pos;
 	int rc = 0;
 	int off = 0;
+	unsigned long dynamic_nids;
 
 	LASSERT(cfg->cfg_instance);
 	LASSERT(cfg->cfg_sb == cfg->cfg_instance);
 
+	/* get dynamic nids setting */
+	dynamic_nids = mgc->obd_dynamic_nids;
+
 	inst = kzalloc(PAGE_SIZE, GFP_KERNEL);
 	if (!inst)
 		return -ENOMEM;
@@ -1127,7 +1131,7 @@ static int mgc_apply_recover_logs(struct obd_device *mgc,
 
 	while (datalen > 0) {
 		int entry_len = sizeof(*entry);
-		int is_ost, i;
+		int is_ost;
 		struct obd_device *obd;
 		char *obdname;
 		char *cname;
@@ -1236,23 +1240,51 @@ static int mgc_apply_recover_logs(struct obd_device *mgc,
 
 		/* iterate all nids to find one */
 		/* find uuid by nid */
-		rc = -ENOENT;
-		for (i = 0; i < entry->mne_nid_count; i++) {
-			rc = client_import_find_conn(obd->u.cli.cl_import,
-						     entry->u.nids[0],
-						     (struct obd_uuid *)uuid);
-			if (!rc)
-				break;
+		/* create import entries if they don't exist */
+		rc = client_import_add_nids_to_conn(obd->u.cli.cl_import,
+						    entry->u.nids,
+						    entry->mne_nid_count,
+						    (struct obd_uuid *)uuid);
+		if (rc == -ENOENT && dynamic_nids) {
+			/* create a new connection for this import */
+			char *primary_nid = libcfs_nid2str(entry->u.nids[0]);
+			int prim_nid_len = strlen(primary_nid) + 1;
+			struct obd_uuid server_uuid;
+
+			if (prim_nid_len > UUID_MAX)
+				goto fail;
+			strncpy(server_uuid.uuid, primary_nid, prim_nid_len);
+
+			CDEBUG(D_INFO, "Adding a connection for %s\n",
+			       primary_nid);
+
+			rc = client_import_dyn_add_conn(obd->u.cli.cl_import,
+							&server_uuid,
+							entry->u.nids[0], 1);
+			if (rc < 0) {
+				CERROR("%s: Failed to add new connection with NID '%s' to import: rc = %d\n",
+				       obd->obd_name, primary_nid, rc);
+				goto fail;
+			}
+			rc = client_import_add_nids_to_conn(obd->u.cli.cl_import,
+							    entry->u.nids,
+							    entry->mne_nid_count,
+							    (struct obd_uuid *)uuid);
+			if (rc < 0) {
+				CERROR("%s: failed to lookup UUID: rc = %d\n",
+				       obd->obd_name, rc);
+				goto fail;
+			}
 		}
-
+fail:
 		up_read(&obd->u.cli.cl_sem);
-		if (rc < 0) {
-			CERROR("mgc: cannot find uuid by nid %s\n",
-			       libcfs_nid2str(entry->u.nids[0]));
+		if (rc < 0 && rc != -ENOSPC) {
+			CERROR("mgc: cannot find UUID by nid '%s': rc = %d\n",
+			       libcfs_nid2str(entry->u.nids[0]), rc);
 			break;
 		}
 
-		CDEBUG(D_INFO, "Find uuid %s by nid %s\n",
+		CDEBUG(D_INFO, "Found UUID '%s' by NID '%s'\n",
 		       uuid, libcfs_nid2str(entry->u.nids[0]));
 
 		pos += strlen(uuid);
diff --git a/fs/lustre/obdclass/lustre_peer.c b/fs/lustre/obdclass/lustre_peer.c
index 58b6e670..2675594 100644
--- a/fs/lustre/obdclass/lustre_peer.c
+++ b/fs/lustre/obdclass/lustre_peer.c
@@ -41,13 +41,11 @@
 #include <lustre_net.h>
 #include <lprocfs_status.h>
 
-#define NIDS_MAX	32
-
 struct uuid_nid_data {
 	struct list_head	un_list;
 	struct obd_uuid		un_uuid;
 	int			un_nid_count;
-	lnet_nid_t		un_nids[NIDS_MAX];
+	lnet_nid_t		un_nids[MTI_NIDS_MAX];
 };
 
 /* FIXME: This should probably become more elegant than a global linked list */
@@ -109,7 +107,7 @@ int class_add_uuid(const char *uuid, u64 nid)
 					break;
 
 			if (i == entry->un_nid_count) {
-				LASSERT(entry->un_nid_count < NIDS_MAX);
+				LASSERT(entry->un_nid_count < MTI_NIDS_MAX);
 				entry->un_nids[entry->un_nid_count++] = nid;
 			}
 			break;
@@ -128,6 +126,7 @@ int class_add_uuid(const char *uuid, u64 nid)
 	}
 	return 0;
 }
+EXPORT_SYMBOL(class_add_uuid);
 
 /* Delete the nids for one uuid if specified, otherwise delete all */
 int class_del_uuid(const char *uuid)
@@ -171,6 +170,36 @@ int class_del_uuid(const char *uuid)
 	return 0;
 }
 
+int class_add_nids_to_uuid(struct obd_uuid *uuid, lnet_nid_t *nids,
+			   int nid_count)
+{
+	struct uuid_nid_data *entry;
+	int i;
+
+	if (nid_count >= MTI_NIDS_MAX) {
+		CDEBUG(D_NET, "too many NIDs (%d) for UUID '%s'\n",
+		       nid_count, obd_uuid2str(uuid));
+		return -ENOSPC;
+	}
+
+	spin_lock(&g_uuid_lock);
+	list_for_each_entry(entry, &g_uuid_list, un_list) {
+		CDEBUG(D_NET, "Comparing %s with %s\n",
+		       obd_uuid2str(uuid), obd_uuid2str(&entry->un_uuid));
+
+		if (!obd_uuid_equals(&entry->un_uuid, uuid))
+			continue;
+		CDEBUG(D_NET, "Updating UUID '%s'\n", obd_uuid2str(uuid));
+		for (i = 0; i < nid_count; i++)
+			entry->un_nids[i] = nids[i];
+		entry->un_nid_count = nid_count;
+		break;
+	}
+	spin_unlock(&g_uuid_lock);
+	return 0;
+}
+EXPORT_SYMBOL(class_add_nids_to_uuid);
+
 /* check if @nid exists in nid list of @uuid */
 int class_check_uuid(struct obd_uuid *uuid, u64 nid)
 {
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 28/42] lustre: ldlm: don't use a locks without l_ast_data
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (26 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 27/42] lustre: mgc: Use IR for client->MDS/OST connections James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 29/42] lustre: lov: discard unused lov_dump_lmm* functions James Simmons
                   ` (13 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Alexey Lyashkov <alexey.lyashkov@hpe.com>

A partially initialized lock (one without l_ast_data set) caused
a failure with the blocking AST: the discard from the page cache
was skipped, so stale data could be read later via fast read.
A slow read has a chance to attach such a lock to the right IO,
but that is not always true, so matching these locks should be
disabled until l_ast_data is always set for DoM and Lock Ahead
locks.

HPE-bugid: LUS-8750
WC-bug-id: https://jira.whamcloud.com/browse/LU-13645
Lustre-commit: a6798c5806088d ("LU-13645 ldlm: don't use a locks without l_ast_data")
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-on: https://review.whamcloud.com/39318
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_dlm.h | 17 +++++++++++------
 fs/lustre/include/lustre_osc.h |  4 ++++
 fs/lustre/ldlm/ldlm_lock.c     | 15 ++++++++-------
 fs/lustre/mdc/mdc_dev.c        | 22 +++++++++++++++-------
 fs/lustre/mdc/mdc_locks.c      |  5 ++---
 fs/lustre/osc/osc_cache.c      |  2 +-
 fs/lustre/osc/osc_internal.h   |  3 ++-
 fs/lustre/osc/osc_lock.c       | 12 +++++++++---
 fs/lustre/osc/osc_object.c     |  2 +-
 fs/lustre/osc/osc_request.c    | 11 +++++++----
 10 files changed, 60 insertions(+), 33 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index e2a7b6b..871d4ff 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -821,6 +821,12 @@ struct ldlm_lock {
 #endif
 };
 
+enum ldlm_match_flags {
+	LDLM_MATCH_UNREF	= BIT(0),
+	LDLM_MATCH_AST		= BIT(1),
+	LDLM_MATCH_AST_ANY	= BIT(2),
+};
+
 /**
  * Describe the overlap between two locks.  itree_overlap_cb data.
  */
@@ -831,8 +837,7 @@ struct ldlm_match_data {
 	union ldlm_policy_data	*lmd_policy;
 	u64			 lmd_flags;
 	u64			 lmd_skip_flags;
-	int			 lmd_unref;
-	bool			 lmd_has_ast_data;
+	enum ldlm_match_flags	 lmd_match;
 };
 
 /**
@@ -1172,6 +1177,7 @@ void ldlm_lock_decref_and_cancel(const struct lustre_handle *lockh,
 void ldlm_lock_fail_match_locked(struct ldlm_lock *lock);
 void ldlm_lock_allow_match(struct ldlm_lock *lock);
 void ldlm_lock_allow_match_locked(struct ldlm_lock *lock);
+
 enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns,
 					 u64 flags, u64 skip_flags,
 					 const struct ldlm_res_id *res_id,
@@ -1179,18 +1185,17 @@ enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns,
 					 union ldlm_policy_data *policy,
 					 enum ldlm_mode mode,
 					 struct lustre_handle *lh,
-					 int unref);
+					 enum ldlm_match_flags match_flags);
 static inline enum ldlm_mode ldlm_lock_match(struct ldlm_namespace *ns,
 					     u64 flags,
 					     const struct ldlm_res_id *res_id,
 					     enum ldlm_type type,
 					     union ldlm_policy_data *policy,
 					     enum ldlm_mode mode,
-					     struct lustre_handle *lh,
-					     int unref)
+					     struct lustre_handle *lh)
 {
 	return ldlm_lock_match_with_skip(ns, flags, 0, res_id, type, policy,
-					 mode, lh, unref);
+					 mode, lh, 0);
 }
 struct ldlm_lock *search_itree(struct ldlm_resource *res,
 			       struct ldlm_match_data *data);
diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index 3956ab4..24cfec8 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -244,6 +244,10 @@ enum osc_dap_flags {
 	 * Return the lock even if it is being canceled.
 	 */
 	OSC_DAP_FL_CANCELING = BIT(1),
+	/**
+	 * check ast data is present, requested to cancel cb
+	 */
+	OSC_DAP_FL_AST	     = BIT(2),
 };
 
 /*
diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index 7fda7b8..f5dedc9 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -1074,7 +1074,7 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata)
 	    !(data->lmd_flags & LDLM_FL_CBPENDING))
 		return false;
 
-	if (!data->lmd_unref && ldlm_is_cbpending(lock) &&
+	if (!(data->lmd_match & LDLM_MATCH_UNREF) && ldlm_is_cbpending(lock) &&
 	    !lock->l_readers && !lock->l_writers)
 		return false;
 
@@ -1084,11 +1084,12 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata)
 	/* When we search for ast_data, we are not doing a traditional match,
 	 * so we don't worry about IBITS or extent matching.
 	 */
-	if (data->lmd_has_ast_data) {
+	if (data->lmd_match & (LDLM_MATCH_AST | LDLM_MATCH_AST_ANY)) {
 		if (!lock->l_ast_data)
 			return false;
 
-		goto matched;
+		if (data->lmd_match & LDLM_MATCH_AST_ANY)
+			goto matched;
 	}
 
 	match = lock->l_req_mode;
@@ -1121,7 +1122,7 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata)
 	 * We match if we have existing lock with same or wider set
 	 * of bits.
 	 */
-	if (!data->lmd_unref && LDLM_HAVE_MASK(lock, GONE))
+	if (!(data->lmd_match & LDLM_MATCH_UNREF) && LDLM_HAVE_MASK(lock, GONE))
 		return false;
 
 	if (!equi(data->lmd_flags & LDLM_FL_LOCAL_ONLY, ldlm_is_local(lock)))
@@ -1273,7 +1274,8 @@ enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns,
 					 enum ldlm_type type,
 					 union ldlm_policy_data *policy,
 					 enum ldlm_mode mode,
-					 struct lustre_handle *lockh, int unref)
+					 struct lustre_handle *lockh,
+					 enum ldlm_match_flags match_flags)
 {
 	struct ldlm_match_data data = {
 		.lmd_old	= NULL,
@@ -1282,8 +1284,7 @@ enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns,
 		.lmd_policy	= policy,
 		.lmd_flags	= flags,
 		.lmd_skip_flags	= skip_flags,
-		.lmd_unref	= unref,
-		.lmd_has_ast_data = false,
+		.lmd_match	= match_flags,
 	};
 	struct ldlm_resource *res;
 	struct ldlm_lock *lock;
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index d6d98ae..329371b 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -92,14 +92,16 @@ int mdc_dom_lock_match(const struct lu_env *env, struct obd_export *exp,
 		       struct ldlm_res_id *res_id, enum ldlm_type type,
 		       union ldlm_policy_data *policy, enum ldlm_mode mode,
 		       u64 *flags, struct osc_object *obj,
-		       struct lustre_handle *lockh, int unref)
+		       struct lustre_handle *lockh,
+		       enum ldlm_match_flags match_flags)
 {
 	struct obd_device *obd = exp->exp_obd;
 	u64 lflags = *flags;
 	enum ldlm_mode rc;
 
-	rc = ldlm_lock_match(obd->obd_namespace, lflags,
-			     res_id, type, policy, mode, lockh, unref);
+	rc = ldlm_lock_match_with_skip(obd->obd_namespace, lflags, 0,
+				       res_id, type, policy, mode, lockh,
+				       match_flags);
 	if (rc == 0 || lflags & LDLM_FL_TEST_LOCK)
 		return rc;
 
@@ -139,6 +141,7 @@ struct ldlm_lock *mdc_dlmlock_at_pgoff(const struct lu_env *env,
 	struct ldlm_lock *lock = NULL;
 	enum ldlm_mode mode;
 	u64 flags;
+	enum ldlm_match_flags match_flags = 0;
 
 	fid_build_reg_res_name(lu_object_fid(osc2lu(obj)), resname);
 	mdc_lock_build_policy(env, policy);
@@ -147,6 +150,12 @@ struct ldlm_lock *mdc_dlmlock_at_pgoff(const struct lu_env *env,
 	if (dap_flags & OSC_DAP_FL_TEST_LOCK)
 		flags |= LDLM_FL_TEST_LOCK;
 
+	if (dap_flags & OSC_DAP_FL_AST)
+		match_flags |= LDLM_MATCH_AST;
+
+	if (dap_flags & OSC_DAP_FL_CANCELING)
+		match_flags |= LDLM_MATCH_UNREF;
+
 again:
 	/* Next, search for already existing extent locks that will cover us */
 	/* If we're trying to read, we also search for an existing PW lock.  The
@@ -155,8 +164,7 @@ struct ldlm_lock *mdc_dlmlock_at_pgoff(const struct lu_env *env,
 	 */
 	mode = mdc_dom_lock_match(env, osc_export(obj), resname, LDLM_IBITS,
 				  policy, LCK_PR | LCK_PW | LCK_GROUP, &flags,
-				  obj, &lockh,
-				  dap_flags & OSC_DAP_FL_CANCELING);
+				  obj, &lockh, match_flags);
 	if (mode) {
 		lock = ldlm_handle2lock(&lockh);
 		/* RACE: the lock is cancelled so let's try again */
@@ -184,7 +192,7 @@ static bool mdc_check_and_discard_cb(const struct lu_env *env, struct cl_io *io,
 
 		/* refresh non-overlapped index */
 		tmp = mdc_dlmlock_at_pgoff(env, osc, index,
-					   OSC_DAP_FL_TEST_LOCK);
+					   OSC_DAP_FL_TEST_LOCK | OSC_DAP_FL_AST);
 		if (tmp) {
 			info->oti_fn_index = CL_PAGE_EOF;
 			LDLM_LOCK_PUT(tmp);
@@ -692,7 +700,7 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp,
 	 * such locks should be skipped.
 	 */
 	mode = ldlm_lock_match(obd->obd_namespace, match_flags, res_id,
-			       einfo->ei_type, policy, mode, &lockh, 0);
+			       einfo->ei_type, policy, mode, &lockh);
 	if (mode) {
 		struct ldlm_lock *matched;
 
diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c
index 2d623ff..72ee070 100644
--- a/fs/lustre/mdc/mdc_locks.c
+++ b/fs/lustre/mdc/mdc_locks.c
@@ -146,7 +146,7 @@ enum ldlm_mode mdc_lock_match(struct obd_export *exp, u64 flags,
 	/* LU-4405: Clear bits not supported by server */
 	policy->l_inodebits.bits &= exp_connect_ibits(exp);
 	rc = ldlm_lock_match(class_exp2obd(exp)->obd_namespace, flags,
-			     &res_id, type, policy, mode, lockh, 0);
+			     &res_id, type, policy, mode, lockh);
 	return rc;
 }
 
@@ -1185,8 +1185,7 @@ static int mdc_finish_intent_lock(struct obd_export *exp,
 
 		memcpy(&old_lock, lockh, sizeof(*lockh));
 		if (ldlm_lock_match(NULL, LDLM_FL_BLOCK_GRANTED, NULL,
-				    LDLM_IBITS, &policy, LCK_NL,
-				    &old_lock, 0)) {
+				    LDLM_IBITS, &policy, LCK_NL, &old_lock)) {
 			ldlm_lock_decref_and_cancel(lockh,
 						    it->it_lock_mode);
 			memcpy(lockh, &old_lock, sizeof(old_lock));
diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index c7aaabb..ddf6fb1 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -3216,7 +3216,7 @@ static bool check_and_discard_cb(const struct lu_env *env, struct cl_io *io,
 
 		/* refresh non-overlapped index */
 		tmp = osc_dlmlock_at_pgoff(env, osc, index,
-					   OSC_DAP_FL_TEST_LOCK);
+					   OSC_DAP_FL_TEST_LOCK | OSC_DAP_FL_AST);
 		if (tmp) {
 			u64 end = tmp->l_policy_data.l_extent.end;
 			/* Cache the first-non-overlapped index so as to skip
diff --git a/fs/lustre/osc/osc_internal.h b/fs/lustre/osc/osc_internal.h
index 6bec6bf..fc3ca8a 100644
--- a/fs/lustre/osc/osc_internal.h
+++ b/fs/lustre/osc/osc_internal.h
@@ -68,7 +68,8 @@ int osc_match_base(const struct lu_env *env, struct obd_export *exp,
 		   struct ldlm_res_id *res_id, enum ldlm_type type,
 		   union ldlm_policy_data *policy, enum ldlm_mode mode,
 		   u64 *flags, struct osc_object *obj,
-		   struct lustre_handle *lockh, int unref);
+		   struct lustre_handle *lockh,
+		   enum ldlm_match_flags match_flags);
 
 int osc_setattr_async(struct obd_export *exp, struct obdo *oa,
 		      obd_enqueue_update_f upcall, void *cookie,
diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c
index 27495e3..ed9f0a0 100644
--- a/fs/lustre/osc/osc_lock.c
+++ b/fs/lustre/osc/osc_lock.c
@@ -579,8 +579,7 @@ int osc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data)
 	matchdata.lmd_mode = &mode;
 	matchdata.lmd_policy = &policy;
 	matchdata.lmd_flags = LDLM_FL_TEST_LOCK | LDLM_FL_CBPENDING;
-	matchdata.lmd_unref = 1;
-	matchdata.lmd_has_ast_data = true;
+	matchdata.lmd_match = LDLM_MATCH_UNREF | LDLM_MATCH_AST_ANY;
 
 	LDLM_LOCK_GET(dlmlock);
 
@@ -1267,6 +1266,7 @@ struct ldlm_lock *osc_obj_dlmlock_at_pgoff(const struct lu_env *env,
 	struct ldlm_lock *lock = NULL;
 	enum ldlm_mode mode;
 	u64 flags;
+	enum ldlm_match_flags match_flags = 0;
 
 	ostid_build_res_name(&obj->oo_oinfo->loi_oi, resname);
 	osc_index2policy(policy, osc2cl(obj), index, index);
@@ -1276,6 +1276,12 @@ struct ldlm_lock *osc_obj_dlmlock_at_pgoff(const struct lu_env *env,
 	if (dap_flags & OSC_DAP_FL_TEST_LOCK)
 		flags |= LDLM_FL_TEST_LOCK;
 
+	if (dap_flags & OSC_DAP_FL_AST)
+		match_flags |= LDLM_MATCH_AST;
+
+	if (dap_flags & OSC_DAP_FL_CANCELING)
+		match_flags |= LDLM_MATCH_UNREF;
+
 	/*
 	 * It is fine to match any group lock since there could be only one
 	 * with a uniq gid and it conflicts with all other lock modes too
@@ -1283,7 +1289,7 @@ struct ldlm_lock *osc_obj_dlmlock_at_pgoff(const struct lu_env *env,
 again:
 	mode = osc_match_base(env, osc_export(obj), resname, LDLM_EXTENT,
 			      policy, LCK_PR | LCK_PW | LCK_GROUP, &flags,
-			      obj, &lockh, dap_flags & OSC_DAP_FL_CANCELING);
+			      obj, &lockh, match_flags);
 	if (mode != 0) {
 		lock = ldlm_handle2lock(&lockh);
 		/* RACE: the lock is cancelled so let's try again */
diff --git a/fs/lustre/osc/osc_object.c b/fs/lustre/osc/osc_object.c
index 1f7ff24..9a0fc54 100644
--- a/fs/lustre/osc/osc_object.c
+++ b/fs/lustre/osc/osc_object.c
@@ -275,7 +275,7 @@ static int osc_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 	mode = ldlm_lock_match(exp->exp_obd->obd_namespace,
 			       LDLM_FL_BLOCK_GRANTED | LDLM_FL_LVB_READY,
 			       &resid, LDLM_EXTENT, &policy,
-			       LCK_PR | LCK_PW, &lockh, 0);
+			       LCK_PR | LCK_PW, &lockh);
 	if (mode) { /* lock is cached on client */
 		if (mode != LCK_PR) {
 			ldlm_lock_addref(&lockh, LCK_PR);
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 53f87ea..8a8a624 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -2723,7 +2723,7 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 	if (intent != 0)
 		match_flags |= LDLM_FL_BLOCK_GRANTED;
 	mode = ldlm_lock_match(obd->obd_namespace, match_flags, res_id,
-			       einfo->ei_type, policy, mode, &lockh, 0);
+			       einfo->ei_type, policy, mode, &lockh);
 	if (mode) {
 		struct ldlm_lock *matched;
 
@@ -2830,7 +2830,8 @@ int osc_match_base(const struct lu_env *env, struct obd_export *exp,
 		   struct ldlm_res_id *res_id, enum ldlm_type type,
 		   union ldlm_policy_data *policy, enum ldlm_mode mode,
 		   u64 *flags, struct osc_object *obj,
-		   struct lustre_handle *lockh, int unref)
+		   struct lustre_handle *lockh,
+		   enum ldlm_match_flags match_flags)
 {
 	struct obd_device *obd = exp->exp_obd;
 	u64 lflags = *flags;
@@ -2853,8 +2854,10 @@ int osc_match_base(const struct lu_env *env, struct obd_export *exp,
 	rc = mode;
 	if (mode == LCK_PR)
 		rc |= LCK_PW;
-	rc = ldlm_lock_match(obd->obd_namespace, lflags,
-			     res_id, type, policy, rc, lockh, unref);
+
+	rc = ldlm_lock_match_with_skip(obd->obd_namespace, lflags, 0,
+				       res_id, type, policy, rc, lockh,
+				       match_flags);
 	if (!rc || lflags & LDLM_FL_TEST_LOCK)
 		return rc;
 
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 29/42] lustre: lov: discard unused lov_dump_lmm* functions
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (27 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 28/42] lustre: ldlm: don't use a locks without l_ast_data James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 30/42] lustre: lov: guard against class_exp2obd() returning NULL James Simmons
                   ` (12 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Mr NeilBrown <neilb@suse.de>

lov_dump_lmm_v3() is never used, so remove it.

Also mark lov_lsm_pack_v1v3() and lov_lsm_pack_foreign() as static.

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: c7d0d63b5191c6 ("LU-6142 lov: discard unused lov_dump_lmm* functions")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39378
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_internal.h |  1 -
 fs/lustre/lov/lov_pack.c     | 17 ++++-------------
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/fs/lustre/lov/lov_internal.h b/fs/lustre/lov/lov_internal.h
index 5f5f98e..202e4b5 100644
--- a/fs/lustre/lov/lov_internal.h
+++ b/fs/lustre/lov/lov_internal.h
@@ -308,7 +308,6 @@ struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, void *buf,
 int lov_free_memmd(struct lov_stripe_md **lsmp);
 
 void lov_dump_lmm_v1(int level, struct lov_mds_md_v1 *lmm);
-void lov_dump_lmm_v3(int level, struct lov_mds_md_v3 *lmm);
 void lov_dump_lmm_common(int level, void *lmmp);
 
 /* lov_ea.c */
diff --git a/fs/lustre/lov/lov_pack.c b/fs/lustre/lov/lov_pack.c
index a6a68b7..d200a62 100644
--- a/fs/lustre/lov/lov_pack.c
+++ b/fs/lustre/lov/lov_pack.c
@@ -92,15 +92,6 @@ void lov_dump_lmm_v1(int level, struct lov_mds_md_v1 *lmm)
 			     le16_to_cpu(lmm->lmm_stripe_count));
 }
 
-void lov_dump_lmm_v3(int level, struct lov_mds_md_v3 *lmm)
-{
-	lov_dump_lmm_common(level, lmm);
-	CDEBUG_LIMIT(level, "pool_name " LOV_POOLNAMEF "\n",
-		     lmm->lmm_pool_name);
-	lov_dump_lmm_objects(level, lmm->lmm_objects,
-			     le16_to_cpu(lmm->lmm_stripe_count));
-}
-
 /**
  * Pack LOV striping metadata for disk storage format (in little
  * endian byte order).
@@ -109,8 +100,8 @@ void lov_dump_lmm_v3(int level, struct lov_mds_md_v3 *lmm)
  * then return the size needed. If @buf_size is too small then
  * return -ERANGE. Otherwise return the size of the result.
  */
-ssize_t lov_lsm_pack_v1v3(const struct lov_stripe_md *lsm, void *buf,
-			  size_t buf_size)
+static ssize_t lov_lsm_pack_v1v3(const struct lov_stripe_md *lsm, void *buf,
+				 size_t buf_size)
 {
 	struct lov_ost_data_v1 *lmm_objects;
 	struct lov_mds_md_v1 *lmmv1 = buf;
@@ -162,8 +153,8 @@ ssize_t lov_lsm_pack_v1v3(const struct lov_stripe_md *lsm, void *buf,
 	return lmm_size;
 }
 
-ssize_t lov_lsm_pack_foreign(const struct lov_stripe_md *lsm, void *buf,
-			     size_t buf_size)
+static ssize_t lov_lsm_pack_foreign(const struct lov_stripe_md *lsm, void *buf,
+				    size_t buf_size)
 {
 	struct lov_foreign_md *lfm = buf;
 	size_t lfm_size;
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 30/42] lustre: lov: guard against class_exp2obd() returning NULL.
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (28 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 29/42] lustre: lov: discard unused lov_dump_lmm* functions James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 31/42] lustre: clio: don't call aio_complete() in lustre upon errors James Simmons
                   ` (11 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Mr NeilBrown <neilb@suse.de>

class_exp2obd() can return NULL.  Sometimes this is tested for,
sometimes not.
Add the appropriate tests so that a NULL return is never dereferenced.

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: 0aa3d883acfb05 ("LU-6142 lov: guard against class_exp2obd() returning NULL.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39379
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_obd.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c
index 7fe7372..d88d325 100644
--- a/fs/lustre/lov/lov_obd.c
+++ b/fs/lustre/lov/lov_obd.c
@@ -269,8 +269,8 @@ static int lov_disconnect_obd(struct obd_device *obd, struct lov_tgt_desc *tgt)
 	int rc;
 
 	osc_obd = class_exp2obd(tgt->ltd_exp);
-	CDEBUG(D_CONFIG, "%s: disconnecting target %s\n",
-	       obd->obd_name, osc_obd ? osc_obd->obd_name : "NULL");
+	CDEBUG(D_CONFIG, "%s: disconnecting target %s\n", obd->obd_name,
+	       osc_obd ? osc_obd->obd_name : "<no obd>");
 
 	if (tgt->ltd_active) {
 		tgt->ltd_active = 0;
@@ -1122,7 +1122,8 @@ static int lov_iocontrol(unsigned int cmd, struct obd_export *exp, int len,
 
 			/* ll_umount_begin() sets force on lov, pass to osc */
 			osc_obd = class_exp2obd(lov->lov_tgts[i]->ltd_exp);
-			osc_obd->obd_force = obd->obd_force;
+			if (osc_obd)
+				osc_obd->obd_force = obd->obd_force;
 			err = obd_iocontrol(cmd, lov->lov_tgts[i]->ltd_exp,
 					    len, karg, uarg);
 			if (err) {
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 31/42] lustre: clio: don't call aio_complete() in lustre upon errors
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (29 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 30/42] lustre: lov: guard against class_exp2obd() returning NULL James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 32/42] lustre: llite: it_lock_bits should be bit-wise tested James Simmons
                   ` (10 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Wang Shilong <wshilong@ddn.com>

On older kernels the VFS will call aio_complete() itself if the
return value is not -EIOCBQUEUED; this can happen when the user
buffer is not page aligned, or when some other error occurs in
Lustre.

So in Lustre we need to handle this case carefully to avoid having
aio_complete() called twice. Newer kernels don't have this issue,
but apply this change to keep in sync with the OpenSFS tree.

Fixes: fde7ac1 ("lustre: clio: AIO support for direct IO")
WC-bug-id: https://jira.whamcloud.com/browse/LU-13900
Lustre-commit: 2fb8444b5a6369 ("LU-13900 clio: don't call aio_complete() in lustre upon errors")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/39636
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h | 1 +
 fs/lustre/llite/file.c        | 7 +++++++
 fs/lustre/obdclass/cl_io.c    | 3 ++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index e849f23..56200d2 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -2572,6 +2572,7 @@ struct cl_dio_aio {
 	struct cl_page_list	cda_pages;
 	struct kiocb		*cda_iocb;
 	ssize_t			cda_bytes;
+	unsigned int		cda_no_aio_complete:1;
 };
 
 /** @} cl_sync_io */
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index babd24d..1d2ab11 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -1669,6 +1669,13 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot,
 	}
 
 	if (io->ci_aio) {
+		/*
+		 * VFS will call aio_complete() if no -EIOCBQUEUED
+		 * is returned for AIO, so we can not call aio_complete()
+		 * in our end_io().
+		 */
+		if (rc != -EIOCBQUEUED)
+			io->ci_aio->cda_no_aio_complete = 1;
 		/**
 		 * Drop one extra reference so that end_io() could be
 		 * called for this IO context, we could call it after
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index 1564d9f..aa3cb17 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -1087,7 +1087,7 @@ static void cl_aio_end(const struct lu_env *env, struct cl_sync_io *anchor)
 		cl_page_put(env, page);
 	}
 
-	if (!is_sync_kiocb(aio->cda_iocb) &&
+	if (!is_sync_kiocb(aio->cda_iocb) && !aio->cda_no_aio_complete &&
 	    aio->cda_iocb->ki_complete)
 		aio->cda_iocb->ki_complete(aio->cda_iocb,
 					   ret ?: aio->cda_bytes, 0);
@@ -1108,6 +1108,7 @@ struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb)
 				       cl_aio_end);
 		cl_page_list_init(&aio->cda_pages);
 		aio->cda_iocb = iocb;
+		aio->cda_no_aio_complete = 0;
 	}
 	return aio;
 }
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 32/42] lustre: llite: it_lock_bits should be bit-wise tested
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (30 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 31/42] lustre: clio: don't call aio_complete() in lustre upon errors James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 33/42] lustre: ldlm: control lru_size for extent lock James Simmons
                   ` (9 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Shaun Tancheff <shaun.tancheff@hpe.com>

Ensure that it_lock_mode is set and that it_lock_bits has the
MDS_INODELOCK_OPEN bit set, rather than testing it_lock_bits for
exact equality with MDS_INODELOCK_OPEN.

Fixes: e476f2e55aa9e ("staging/lustre/llite: flatten struct lookup_intent")
HPE-bug-id: LUS-9198
WC-bug-id: https://jira.whamcloud.com/browse/LU-13940
Lustre-commit: 86868afde5a5eb ("LU-13940 llite: it_lock_bits should be bit-wise tested")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/39797
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 1d2ab11..de22e191 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -1169,8 +1169,8 @@ static int ll_lease_och_release(struct inode *inode, struct file *file)
 
 	/* already get lease, handle lease lock */
 	ll_set_lock_data(sbi->ll_md_exp, inode, &it, NULL);
-	if (it.it_lock_mode == 0 ||
-	    it.it_lock_bits != MDS_INODELOCK_OPEN) {
+	if (!it.it_lock_mode ||
+	    !(it.it_lock_bits & MDS_INODELOCK_OPEN)) {
 		/* open lock must return for lease */
 		CERROR(DFID "lease granted but no open lock, %d/%llu.\n",
 		       PFID(ll_inode2fid(inode)), it.it_lock_mode,
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 33/42] lustre: ldlm: control lru_size for extent lock
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (31 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 32/42] lustre: llite: it_lock_bits should be bit-wise tested James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 34/42] lustre: ldlm: pool fixes James Simmons
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Jinshan Xiong <jinshan.xiong@uber.com>

We register ELC for extent locks to be canceled at enqueue time,
but it has no effect on locks that have dirty pages under them.
To keep the semantics of lru_size, the client should check how
many unused locks are cached after adding a lock to the LRU list.
If the hard limit (ns_max_unused) has already been exceeded, the
client initiates the async lock cancellation process in batch
mode (ns->ns_cancel_batch).

To do this, re-use the new batching LRU cancel functionality.

Wherever unlimited LRU cancel is called (i.e. not ELC), try to
cancel in batched mode.

A new sysfs attribute named *lru_cancel_batch* is introduced into
the ldlm namespace to control the batch count.

WC-bug-id: https://jira.whamcloud.com/browse/LU-11518
Lustre-commit: 6052cc88eb1232 ("LU-11518 ldlm: control lru_size for extent lock")
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Shuichi Ihara <sihara@ddn.com>
Signed-off-by: Gu Zheng <gzheng@ddn.com>
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://review.whamcloud.com/39562
Reviewed-on: https://es-gerrit.dev.cray.com/157068
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_dlm.h |  7 +++++++
 fs/lustre/ldlm/ldlm_lock.c     | 17 +++++++----------
 fs/lustre/ldlm/ldlm_request.c  |  3 ++-
 fs/lustre/ldlm/ldlm_resource.c | 28 ++++++++++++++++++++++++++++
 4 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index 871d4ff..682035a 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -65,6 +65,7 @@
  */
 #define LDLM_DIRTY_AGE_LIMIT (10)
 #define LDLM_DEFAULT_PARALLEL_AST_LIMIT 1024
+#define LDLM_DEFAULT_LRU_SHRINK_BATCH (16)
 
 /**
  * LDLM non-error return states
@@ -423,6 +424,12 @@ struct ldlm_namespace {
 	 */
 	unsigned int		ns_max_unused;
 
+	/**
+	 * Cancel in batches, if the unused lock count exceeds lru_size.
+	 * Only used if LRU resize (LRUR) is disabled.
+	 */
+	unsigned int		ns_cancel_batch;
+
 	/** Maximum allowed age (last used time) for locks in the LRU. Set in
 	 * seconds from userspace, but stored in ns to avoid repeat conversions.
 	 */
diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index f5dedc9..2931873 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -776,10 +776,14 @@ void ldlm_lock_decref_internal(struct ldlm_lock *lock, enum ldlm_mode mode)
 	}
 
 	if (!lock->l_readers && !lock->l_writers && ldlm_is_cbpending(lock)) {
+		unsigned int mask = D_DLMTRACE;
+
 		/* If we received a blocked AST and this was the last reference,
 		 * run the callback.
 		 */
-		LDLM_DEBUG(lock, "final decref done on cbpending lock");
+		LDLM_DEBUG_LIMIT(mask, lock,
+				 "final decref done on %sCBPENDING lock",
+				 mask & D_WARNING ? "non-local " : "");
 
 		LDLM_LOCK_GET(lock); /* dropped by bl thread */
 		ldlm_lock_remove_from_lru(lock);
@@ -794,24 +798,17 @@ void ldlm_lock_decref_internal(struct ldlm_lock *lock, enum ldlm_mode mode)
 	} else if (!lock->l_readers && !lock->l_writers &&
 		   !ldlm_is_no_lru(lock) && !ldlm_is_bl_ast(lock) &&
 		   !ldlm_is_converting(lock)) {
-		LDLM_DEBUG(lock, "add lock into lru list");
-
 		/* If this is a client-side namespace and this was the last
 		 * reference, put it on the LRU.
 		 */
 		ldlm_lock_add_to_lru(lock);
 		unlock_res_and_lock(lock);
+		LDLM_DEBUG(lock, "add lock into lru list");
 
 		if (ldlm_is_fail_loc(lock))
 			OBD_RACE(OBD_FAIL_LDLM_CP_BL_RACE);
 
-		/* Call ldlm_cancel_lru() only if EARLY_CANCEL and LRU RESIZE
-		 * are not supported by the server, otherwise, it is done on
-		 * enqueue.
-		 */
-		if (!exp_connect_cancelset(lock->l_conn_export) &&
-		    !ns_connect_lru_resize(ns))
-			ldlm_cancel_lru(ns, 0, LCF_ASYNC, 0);
+		ldlm_cancel_lru(ns, 0, LCF_ASYNC, 0);
 	} else {
 		LDLM_DEBUG(lock, "do not add lock into lru list");
 		unlock_res_and_lock(lock);
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index 901e505..c235915 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -1709,7 +1709,8 @@ int ldlm_cancel_lru(struct ldlm_namespace *ns, int min,
 	 * Just prepare the list of locks, do not actually cancel them yet.
 	 * Locks are cancelled later in a separate thread.
 	 */
-	count = ldlm_prepare_lru_list(ns, &cancels, min, 0, 0, lru_flags);
+	count = ldlm_prepare_lru_list(ns, &cancels, min, 0,
+				      ns->ns_cancel_batch, lru_flags);
 	rc = ldlm_bl_to_thread_list(ns, NULL, &cancels, count, cancel_flags);
 	if (rc == 0)
 		return count;
diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c
index 31e7513..3527e15 100644
--- a/fs/lustre/ldlm/ldlm_resource.c
+++ b/fs/lustre/ldlm/ldlm_resource.c
@@ -247,6 +247,32 @@ static ssize_t lru_size_store(struct kobject *kobj, struct attribute *attr,
 }
 LUSTRE_RW_ATTR(lru_size);
 
+static ssize_t lru_cancel_batch_show(struct kobject *kobj,
+				     struct attribute *attr, char *buf)
+{
+	struct ldlm_namespace *ns = container_of(kobj, struct ldlm_namespace,
+						 ns_kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%u\n", ns->ns_cancel_batch);
+}
+
+static ssize_t lru_cancel_batch_store(struct kobject *kobj,
+				      struct attribute *attr,
+				      const char *buffer, size_t count)
+{
+	struct ldlm_namespace *ns = container_of(kobj, struct ldlm_namespace,
+						 ns_kobj);
+	unsigned long tmp;
+
+	if (kstrtoul(buffer, 10, &tmp))
+		return -EINVAL;
+
+	ns->ns_cancel_batch = (unsigned int)tmp;
+
+	return count;
+}
+LUSTRE_RW_ATTR(lru_cancel_batch);
+
 static ssize_t lru_max_age_show(struct kobject *kobj, struct attribute *attr,
 				char *buf)
 {
@@ -350,6 +376,7 @@ static ssize_t dirty_age_limit_store(struct kobject *kobj,
 	&lustre_attr_lock_count.attr,
 	&lustre_attr_lock_unused_count.attr,
 	&lustre_attr_lru_size.attr,
+	&lustre_attr_lru_cancel_batch.attr,
 	&lustre_attr_lru_max_age.attr,
 	&lustre_attr_early_lock_cancel.attr,
 	&lustre_attr_dirty_age_limit.attr,
@@ -635,6 +662,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
 	ns->ns_max_parallel_ast = LDLM_DEFAULT_PARALLEL_AST_LIMIT;
 	ns->ns_nr_unused = 0;
 	ns->ns_max_unused = LDLM_DEFAULT_LRU_SIZE;
+	ns->ns_cancel_batch = LDLM_DEFAULT_LRU_SHRINK_BATCH;
 	ns->ns_max_age = ktime_set(LDLM_DEFAULT_MAX_ALIVE, 0);
 	ns->ns_orig_connect_flags = 0;
 	ns->ns_connect_flags = 0;
-- 
1.8.3.1


* [lustre-devel] [PATCH 34/42] lustre: ldlm: pool fixes
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (32 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 33/42] lustre: ldlm: control lru_size for extent lock James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 35/42] lustre: ldlm: pool recalc forceful call James Simmons
                   ` (7 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Vitaly Fertman <c17818@cray.com>

When the client-side recalc period was increased to 10 seconds, the
grant and cancel rates started reporting speeds per recalc interval
(i.e. per tens of seconds) rather than per second.

At pool initialization time, the server-side recalc job should not be
delayed by the client's recalc period.

The time spent on one namespace may be significant, comparable to (or
even greater than) the recalc period of the next namespace (and of
all following namespaces) in the list. Time already spent on the next
namespace does not mean we want to double the delay for the original
namespace and recalc only after another N seconds.

Make the lock volume factor more fine-grained (the default is now 100
vs. the original 1): cancelling locks on clients twice as fast as the
server requested is likely too fast.

Protect the previously missed pl_server_lock_volume update with the
pool lock.

Replace ktime_get_real_seconds with ktime_get_seconds for the recalc
interval.
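
The fixed-point representation of the lock volume factor introduced
here can be illustrated in isolation (the shift by 8 matches the
patch; the helper names below are illustrative only, not Lustre API):

```c
#include <assert.h>

/*
 * The lock volume factor is stored internally in 8-bit fixed point
 * and exposed to userspace as a percentage.
 */
static unsigned long lvf_from_percent(unsigned long pct)
{
	return (pct << 8) / 100;	/* store: percent -> fixed point */
}

static unsigned long lvf_to_percent(unsigned long lvf)
{
	return (lvf * 100) >> 8;	/* show: fixed point -> percent */
}
```

The default of 1 << 8 therefore reads back as 100 (i.e. 100%), and
intermediate values such as 150% become representable, which was
impossible with the old plain integer factor.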

HPE-bug-id: LUS-8678
WC-bug-id: https://jira.whamcloud.com/browse/LU-11518
Lustre-commit: 1806d6e8291758a ("LU-11518 ldlm: pool fixes")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/39563
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_dlm.h |   4 +-
 fs/lustre/ldlm/ldlm_pool.c     | 129 +++++++++++++++++++++++++++--------------
 fs/lustre/ldlm/ldlm_request.c  |   2 +-
 3 files changed, 88 insertions(+), 47 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index 682035a..bc6785f 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -250,8 +250,8 @@ struct ldlm_pool {
 	u64			pl_server_lock_volume;
 	/** Current biggest client lock volume. Protected by pl_lock. */
 	u64			pl_client_lock_volume;
-	/** Lock volume factor. SLV on client is calculated as following:
-	 *  server_slv * lock_volume_factor.
+	/** Lock volume factor, shown as a percentage in procfs but stored
+	 *  in fixed point; client SLV = server_slv * lock_volume_factor >> 8.
 	 */
 	atomic_t		pl_lock_volume_factor;
 	/** Time when last SLV from server was obtained. */
diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c
index 9e2a006..c37948a 100644
--- a/fs/lustre/ldlm/ldlm_pool.c
+++ b/fs/lustre/ldlm/ldlm_pool.c
@@ -209,13 +209,13 @@ static inline int ldlm_pool_t2gsp(unsigned int t)
  *
  * \pre ->pl_lock is locked.
  */
-static void ldlm_pool_recalc_stats(struct ldlm_pool *pl)
+static void ldlm_pool_recalc_stats(struct ldlm_pool *pl, timeout_t period)
 {
 	int grant_plan = pl->pl_grant_plan;
 	u64 slv = pl->pl_server_lock_volume;
 	int granted = atomic_read(&pl->pl_granted);
-	int grant_rate = atomic_read(&pl->pl_grant_rate);
-	int cancel_rate = atomic_read(&pl->pl_cancel_rate);
+	int grant_rate = atomic_read(&pl->pl_grant_rate) / period;
+	int cancel_rate = atomic_read(&pl->pl_cancel_rate) / period;
 
 	lprocfs_counter_add(pl->pl_stats, LDLM_POOL_SLV_STAT,
 			    slv);
@@ -254,10 +254,10 @@ static void ldlm_cli_pool_pop_slv(struct ldlm_pool *pl)
  */
 static int ldlm_cli_pool_recalc(struct ldlm_pool *pl)
 {
-	time64_t recalc_interval_sec;
+	timeout_t recalc_interval_sec;
 	int ret;
 
-	recalc_interval_sec = ktime_get_real_seconds() - pl->pl_recalc_time;
+	recalc_interval_sec = ktime_get_seconds() - pl->pl_recalc_time;
 	if (recalc_interval_sec < pl->pl_recalc_period)
 		return 0;
 
@@ -265,7 +265,7 @@ static int ldlm_cli_pool_recalc(struct ldlm_pool *pl)
 	/*
 	 * Check if we need to recalc lists now.
 	 */
-	recalc_interval_sec = ktime_get_real_seconds() - pl->pl_recalc_time;
+	recalc_interval_sec = ktime_get_seconds() - pl->pl_recalc_time;
 	if (recalc_interval_sec < pl->pl_recalc_period) {
 		spin_unlock(&pl->pl_lock);
 		return 0;
@@ -292,7 +292,7 @@ static int ldlm_cli_pool_recalc(struct ldlm_pool *pl)
 	 * Time of LRU resizing might be longer than period,
 	 * so update after LRU resizing rather than before it.
 	 */
-	pl->pl_recalc_time = ktime_get_real_seconds();
+	pl->pl_recalc_time = ktime_get_seconds();
 	lprocfs_counter_add(pl->pl_stats, LDLM_POOL_TIMING_STAT,
 			    recalc_interval_sec);
 	spin_unlock(&pl->pl_lock);
@@ -321,7 +321,9 @@ static int ldlm_cli_pool_shrink(struct ldlm_pool *pl,
 	/*
 	 * Make sure that pool knows last SLV and Limit from obd.
 	 */
+	spin_lock(&pl->pl_lock);
 	ldlm_cli_pool_pop_slv(pl);
+	spin_unlock(&pl->pl_lock);
 
 	spin_lock(&ns->ns_lock);
 	unused = ns->ns_nr_unused;
@@ -341,23 +343,25 @@ static int ldlm_cli_pool_shrink(struct ldlm_pool *pl,
 /**
  * Pool recalc wrapper. Will call either client or server pool recalc callback
  * depending what pool @pl is used.
+ *
+ * Returns	time in seconds for the next recalc of this pool
  */
-static int ldlm_pool_recalc(struct ldlm_pool *pl)
+static timeout_t ldlm_pool_recalc(struct ldlm_pool *pl)
 {
-	u32 recalc_interval_sec;
+	timeout_t recalc_interval_sec;
 	int count;
 
-	recalc_interval_sec = ktime_get_real_seconds() - pl->pl_recalc_time;
+	recalc_interval_sec = ktime_get_seconds() - pl->pl_recalc_time;
 	if (recalc_interval_sec > 0) {
 		spin_lock(&pl->pl_lock);
-		recalc_interval_sec = ktime_get_real_seconds() -
+		recalc_interval_sec = ktime_get_seconds() -
 				      pl->pl_recalc_time;
 
 		if (recalc_interval_sec > 0) {
 			/*
-			 * Update pool statistics every 1s.
+			 * Update pool statistics every recalc interval.
 			 */
-			ldlm_pool_recalc_stats(pl);
+			ldlm_pool_recalc_stats(pl, recalc_interval_sec);
 
 			/*
 			 * Zero out all rates and speed for the last period.
@@ -374,20 +378,7 @@ static int ldlm_pool_recalc(struct ldlm_pool *pl)
 				    count);
 	}
 
-	recalc_interval_sec = pl->pl_recalc_time - ktime_get_real_seconds() +
-			      pl->pl_recalc_period;
-	if (recalc_interval_sec <= 0) {
-		/* DEBUG: should be re-removed after LU-4536 is fixed */
-		CDEBUG(D_DLMTRACE,
-		       "%s: Negative interval(%ld), too short period(%ld)\n",
-		       pl->pl_name, (long)recalc_interval_sec,
-		       (long)pl->pl_recalc_period);
-
-		/* Prevent too frequent recalculation. */
-		recalc_interval_sec = 1;
-	}
-
-	return recalc_interval_sec;
+	return pl->pl_recalc_time + pl->pl_recalc_period;
 }
 
 /*
@@ -421,6 +412,7 @@ static int lprocfs_pool_state_seq_show(struct seq_file *m, void *unused)
 	int granted, grant_rate, cancel_rate;
 	int grant_speed, lvf;
 	struct ldlm_pool *pl = m->private;
+	timeout_t period;
 	u64 slv, clv;
 	u32 limit;
 
@@ -429,8 +421,11 @@ static int lprocfs_pool_state_seq_show(struct seq_file *m, void *unused)
 	clv = pl->pl_client_lock_volume;
 	limit = atomic_read(&pl->pl_limit);
 	granted = atomic_read(&pl->pl_granted);
-	grant_rate = atomic_read(&pl->pl_grant_rate);
-	cancel_rate = atomic_read(&pl->pl_cancel_rate);
+	period = ktime_get_seconds() - pl->pl_recalc_time;
+	if (period <= 0)
+		period = 1;
+	grant_rate = atomic_read(&pl->pl_grant_rate) / period;
+	cancel_rate = atomic_read(&pl->pl_cancel_rate) / period;
 	grant_speed = grant_rate - cancel_rate;
 	lvf = atomic_read(&pl->pl_lock_volume_factor);
 	spin_unlock(&pl->pl_lock);
@@ -439,7 +434,7 @@ static int lprocfs_pool_state_seq_show(struct seq_file *m, void *unused)
 		      "  SLV: %llu\n"
 		      "  CLV: %llu\n"
 		      "  LVF: %d\n",
-		      pl->pl_name, slv, clv, lvf);
+		      pl->pl_name, slv, clv, (lvf * 100) >> 8);
 
 	seq_printf(m, "  GR:  %d\n  CR:  %d\n  GS:  %d\n"
 		      "  G:   %d\n  L:   %d\n",
@@ -457,11 +452,15 @@ static ssize_t grant_speed_show(struct kobject *kobj, struct attribute *attr,
 	struct ldlm_pool *pl = container_of(kobj, struct ldlm_pool,
 					    pl_kobj);
 	int grant_speed;
+	timeout_t period;
 
 	spin_lock(&pl->pl_lock);
 	/* serialize with ldlm_pool_recalc */
-	grant_speed = atomic_read(&pl->pl_grant_rate) -
-			atomic_read(&pl->pl_cancel_rate);
+	period = ktime_get_seconds() - pl->pl_recalc_time;
+	if (period <= 0)
+		period = 1;
+	grant_speed = (atomic_read(&pl->pl_grant_rate) -
+		       atomic_read(&pl->pl_cancel_rate)) / period;
 	spin_unlock(&pl->pl_lock);
 	return sprintf(buf, "%d\n", grant_speed);
 }
@@ -477,6 +476,9 @@ static ssize_t grant_speed_show(struct kobject *kobj, struct attribute *attr,
 LDLM_POOL_SYSFS_READER_NOLOCK_SHOW(server_lock_volume, u64);
 LUSTRE_RO_ATTR(server_lock_volume);
 
+LDLM_POOL_SYSFS_READER_NOLOCK_SHOW(client_lock_volume, u64);
+LUSTRE_RO_ATTR(client_lock_volume);
+
 LDLM_POOL_SYSFS_READER_NOLOCK_SHOW(limit, atomic);
 LDLM_POOL_SYSFS_WRITER_NOLOCK_STORE(limit, atomic);
 LUSTRE_RW_ATTR(limit);
@@ -490,16 +492,56 @@ static ssize_t grant_speed_show(struct kobject *kobj, struct attribute *attr,
 LDLM_POOL_SYSFS_READER_NOLOCK_SHOW(grant_rate, atomic);
 LUSTRE_RO_ATTR(grant_rate);
 
-LDLM_POOL_SYSFS_READER_NOLOCK_SHOW(lock_volume_factor, atomic);
-LDLM_POOL_SYSFS_WRITER_NOLOCK_STORE(lock_volume_factor, atomic);
+static ssize_t lock_volume_factor_show(struct kobject *kobj,
+				       struct attribute *attr,
+				       char *buf)
+{
+	struct ldlm_pool *pl = container_of(kobj, struct ldlm_pool, pl_kobj);
+	unsigned long tmp;
+
+	tmp = (atomic_read(&pl->pl_lock_volume_factor) * 100) >> 8;
+	return sprintf(buf, "%lu\n", tmp);
+}
+
+static ssize_t lock_volume_factor_store(struct kobject *kobj,
+					struct attribute *attr,
+					const char *buffer,
+					size_t count)
+{
+	struct ldlm_pool *pl = container_of(kobj, struct ldlm_pool, pl_kobj);
+	unsigned long tmp;
+	int rc;
+
+	rc = kstrtoul(buffer, 10, &tmp);
+	if (rc < 0)
+		return rc;
+
+	tmp = (tmp << 8) / 100;
+	atomic_set(&pl->pl_lock_volume_factor, tmp);
+
+	return count;
+}
 LUSTRE_RW_ATTR(lock_volume_factor);
 
+static ssize_t recalc_time_show(struct kobject *kobj,
+				struct attribute *attr,
+				char *buf)
+{
+	struct ldlm_pool *pl = container_of(kobj, struct ldlm_pool, pl_kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n",
+			ktime_get_seconds() - pl->pl_recalc_time);
+}
+LUSTRE_RO_ATTR(recalc_time);
+
 /* These are for pools in /sys/fs/lustre/ldlm/namespaces/.../pool */
 static struct attribute *ldlm_pl_attrs[] = {
 	&lustre_attr_grant_speed.attr,
 	&lustre_attr_grant_plan.attr,
 	&lustre_attr_recalc_period.attr,
 	&lustre_attr_server_lock_volume.attr,
+	&lustre_attr_client_lock_volume.attr,
+	&lustre_attr_recalc_time.attr,
 	&lustre_attr_limit.attr,
 	&lustre_attr_granted.attr,
 	&lustre_attr_cancel_rate.attr,
@@ -625,8 +667,8 @@ int ldlm_pool_init(struct ldlm_pool *pl, struct ldlm_namespace *ns,
 
 	spin_lock_init(&pl->pl_lock);
 	atomic_set(&pl->pl_granted, 0);
-	pl->pl_recalc_time = ktime_get_real_seconds();
-	atomic_set(&pl->pl_lock_volume_factor, 1);
+	pl->pl_recalc_time = ktime_get_seconds();
+	atomic_set(&pl->pl_lock_volume_factor, 1 << 8);
 
 	atomic_set(&pl->pl_grant_rate, 0);
 	atomic_set(&pl->pl_cancel_rate, 0);
@@ -867,7 +909,7 @@ static void ldlm_pools_recalc(struct work_struct *ws)
 	struct ldlm_namespace *ns;
 	struct ldlm_namespace *ns_old = NULL;
 	/* seconds of sleep if no active namespaces */
-	time64_t time = LDLM_POOL_CLI_DEF_RECALC_PERIOD;
+	timeout_t delay = LDLM_POOL_CLI_DEF_RECALC_PERIOD;
 	int nr;
 
 	/*
@@ -933,11 +975,8 @@ static void ldlm_pools_recalc(struct work_struct *ws)
 		 * After setup is done - recalc the pool.
 		 */
 		if (!skip) {
-			time64_t ttime = ldlm_pool_recalc(&ns->ns_pool);
-
-			if (ttime < time)
-				time = ttime;
-
+			delay = min(delay,
+				    ldlm_pool_recalc(&ns->ns_pool));
 			ldlm_namespace_put(ns);
 		}
 	}
@@ -945,12 +984,14 @@ static void ldlm_pools_recalc(struct work_struct *ws)
 	/* Wake up the blocking threads from time to time. */
 	ldlm_bl_thread_wakeup();
 
-	schedule_delayed_work(&ldlm_recalc_pools, time * HZ);
+	schedule_delayed_work(&ldlm_recalc_pools, delay * HZ);
 }
 
 static int ldlm_pools_thread_start(void)
 {
-	schedule_delayed_work(&ldlm_recalc_pools, 0);
+	time64_t delay = LDLM_POOL_CLI_DEF_RECALC_PERIOD;
+
+	schedule_delayed_work(&ldlm_recalc_pools, delay);
 
 	return 0;
 }
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index c235915..a8d6df1 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -1388,7 +1388,7 @@ static enum ldlm_policy_res ldlm_cancel_lrur_policy(struct ldlm_namespace *ns,
 	lvf = ldlm_pool_get_lvf(pl);
 	la = div_u64(ktime_to_ns(ktime_sub(cur, lock->l_last_used)),
 		     NSEC_PER_SEC);
-	lv = lvf * la * ns->ns_nr_unused;
+	lv = lvf * la * ns->ns_nr_unused >> 8;
 
 	/* Inform pool about current CLV to see it via debugfs. */
 	ldlm_pool_set_clv(pl, lv);
-- 
1.8.3.1


* [lustre-devel] [PATCH 35/42] lustre: ldlm: pool recalc forceful call
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (33 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 34/42] lustre: ldlm: pool fixes James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 36/42] lustre: don't take spinlock to read a 'long' James Simmons
                   ` (6 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Vitaly Fertman <c17818@cray.com>

Allow pool recalc to be called forcefully, independently of the last
recalc time.

Call pool recalc forcefully on lock decref, instead of LRU cancel, to
take into account the fresh SLV obtained from the server.

Call LRU recalc from after_reply if a significant SLV change occurs,
and add a sysfs attribute to control what counts as 'a significant
SLV change'.
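
The "significant SLV change" test used in ldlm_cli_update_pool() can
be sketched standalone (illustrative userspace C; the function name is
hypothetical, but the percentage arithmetic mirrors the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the SLV reported by the server dropped enough,
 * relative to the pool's current SLV, to warrant an urgent recalc.
 * An SLV that grew (or stayed equal) never triggers one.
 */
static bool slv_drop_triggers_recalc(uint64_t cur_slv, uint64_t new_slv,
				     unsigned int recalc_pct)
{
	uint64_t ratio;

	if (cur_slv == 0 || new_slv >= cur_slv)
		return false;

	ratio = 100 * new_slv / cur_slv;	/* remaining SLV in percent */
	return 100 - ratio >= recalc_pct;
}
```

With the default ns_recalc_pct of 10, an SLV falling from 1000 to 850
(a 15% drop) triggers the urgent recalc, while a fall to 950 (5%)
does not.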

WC-bug-id: https://jira.whamcloud.com/browse/LU-11518
Lustre-commit: dd43ff345254f2 ("LU-11518 ldlm: pool recalc forceful call")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://es-gerrit.dev.cray.com/157134
Reviewed-on: https://review.whamcloud.com/39564
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_dlm.h | 31 +++++++++++++++++++++++++++++--
 fs/lustre/ldlm/ldlm_internal.h | 14 +-------------
 fs/lustre/ldlm/ldlm_lock.c     |  2 +-
 fs/lustre/ldlm/ldlm_lockd.c    | 13 ++++++++++++-
 fs/lustre/ldlm/ldlm_pool.c     | 12 ++++++------
 fs/lustre/ldlm/ldlm_request.c  | 35 +++++++++++++++++++++++++++++------
 fs/lustre/ldlm/ldlm_resource.c | 31 +++++++++++++++++++++++++++++++
 7 files changed, 109 insertions(+), 29 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index bc6785f..f056c2d 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -66,6 +66,7 @@
 #define LDLM_DIRTY_AGE_LIMIT (10)
 #define LDLM_DEFAULT_PARALLEL_AST_LIMIT 1024
 #define LDLM_DEFAULT_LRU_SHRINK_BATCH (16)
+#define LDLM_DEFAULT_SLV_RECALC_PCT (10)
 
 /**
  * LDLM non-error return states
@@ -193,6 +194,19 @@ static inline int lockmode_compat(enum ldlm_mode exist_mode,
  *
  */
 
+/* Cancel lru flag, it indicates we cancel aged locks. */
+enum ldlm_lru_flags {
+	LDLM_LRU_FLAG_NO_WAIT	= 0x1,	/* Cancel locks w/o blocking (neither
+					 * sending nor waiting for any RPCs)
+					 */
+	LDLM_LRU_FLAG_CLEANUP   = 0x2,	/* Used when clearing lru, tells
+					 * prepare_lru_list to set discard flag
+					 * on PR extent locks so we don't waste
+					 * time saving pages that will be
+					 * discarded momentarily
+					 */
+};
+
 struct ldlm_pool;
 struct ldlm_lock;
 struct ldlm_resource;
@@ -208,7 +222,7 @@ static inline int lockmode_compat(enum ldlm_mode exist_mode,
  */
 struct ldlm_pool_ops {
 	/** Recalculate pool @pl usage */
-	int (*po_recalc)(struct ldlm_pool *pl);
+	int (*po_recalc)(struct ldlm_pool *pl, bool force);
 	/** Cancel at least @nr locks from pool @pl */
 	int (*po_shrink)(struct ldlm_pool *pl, int nr,
 			 gfp_t gfp_mask);
@@ -430,6 +444,12 @@ struct ldlm_namespace {
 	 */
 	unsigned int		ns_cancel_batch;
 
+	/**
+	 * How much the SLV should decrease, in percent, to trigger an
+	 * urgent LRU cancel.
+	 */
+	unsigned int		ns_recalc_pct;
+
 	/** Maximum allowed age (last used time) for locks in the LRU. Set in
 	 * seconds from userspace, but stored in ns to avoid repeat conversions.
 	 */
@@ -487,7 +507,13 @@ struct ldlm_namespace {
 	 * Flag to indicate namespace is being freed. Used to determine if
 	 * recalculation of LDLM pool statistics should be skipped.
 	 */
-	unsigned		ns_stopping:1;
+	unsigned int		ns_stopping:1,
+
+	/**
+	 * Flag to indicate the LRU recalc on RPC reply is in progress.
+	 * Used to limit the process by 1 thread only.
+	 */
+				ns_rpc_recalc:1;
 
 	struct kobject		ns_kobj; /* sysfs object */
 	struct completion	ns_kobj_unregister;
@@ -1404,6 +1430,7 @@ static inline void check_res_locked(struct ldlm_resource *res)
 int ldlm_pool_init(struct ldlm_pool *pl, struct ldlm_namespace *ns,
 		   int idx, enum ldlm_side client);
 void ldlm_pool_fini(struct ldlm_pool *pl);
+timeout_t ldlm_pool_recalc(struct ldlm_pool *pl, bool force);
 void ldlm_pool_add(struct ldlm_pool *pl, struct ldlm_lock *lock);
 void ldlm_pool_del(struct ldlm_pool *pl, struct ldlm_lock *lock);
 /** @} */
diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h
index 788983f..9dc0561 100644
--- a/fs/lustre/ldlm/ldlm_internal.h
+++ b/fs/lustre/ldlm/ldlm_internal.h
@@ -86,19 +86,6 @@ void ldlm_namespace_move_to_inactive_locked(struct ldlm_namespace *ns,
 struct ldlm_namespace *ldlm_namespace_first_locked(enum ldlm_side client);
 
 /* ldlm_request.c */
-/* Cancel lru flag, it indicates we cancel aged locks. */
-enum ldlm_lru_flags {
-	LDLM_LRU_FLAG_NO_WAIT	= BIT(1), /* Cancel locks w/o blocking (neither
-					   * sending nor waiting for any rpcs)
-					   */
-	LDLM_LRU_FLAG_CLEANUP	= BIT(2), /* Used when clearing lru, tells
-					   * prepare_lru_list to set discard
-					   * flag on PR extent locks so we
-					   * don't waste time saving pages
-					   * that will be discarded momentarily
-					   */
-};
-
 int ldlm_cancel_lru(struct ldlm_namespace *ns, int min,
 		    enum ldlm_cancel_flags cancel_flags,
 		    enum ldlm_lru_flags lru_flags);
@@ -163,6 +150,7 @@ int ldlm_bl_to_thread_list(struct ldlm_namespace *ns,
 			   struct ldlm_lock_desc *ld,
 			   struct list_head *cancels, int count,
 			   enum ldlm_cancel_flags cancel_flags);
+int ldlm_bl_to_thread_ns(struct ldlm_namespace *ns);
 int ldlm_bl_thread_wakeup(void);
 
 void ldlm_handle_bl_callback(struct ldlm_namespace *ns,
diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index 2931873..0dbd4f3 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -808,7 +808,7 @@ void ldlm_lock_decref_internal(struct ldlm_lock *lock, enum ldlm_mode mode)
 		if (ldlm_is_fail_loc(lock))
 			OBD_RACE(OBD_FAIL_LDLM_CP_BL_RACE);
 
-		ldlm_cancel_lru(ns, 0, LCF_ASYNC, 0);
+		ldlm_pool_recalc(&ns->ns_pool, true);
 	} else {
 		LDLM_DEBUG(lock, "do not add lock into lru list");
 		unlock_res_and_lock(lock);
diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c
index 7df7af2..4a91a7f 100644
--- a/fs/lustre/ldlm/ldlm_lockd.c
+++ b/fs/lustre/ldlm/ldlm_lockd.c
@@ -504,6 +504,11 @@ int ldlm_bl_to_thread_list(struct ldlm_namespace *ns, struct ldlm_lock_desc *ld,
 	return ldlm_bl_to_thread(ns, ld, NULL, cancels, count, cancel_flags);
 }
 
+int ldlm_bl_to_thread_ns(struct ldlm_namespace *ns)
+{
+	return ldlm_bl_to_thread(ns, NULL, NULL, NULL, 0, LCF_ASYNC);
+}
+
 int ldlm_bl_thread_wakeup(void)
 {
 	wake_up(&ldlm_state->ldlm_bl_pool->blp_waitq);
@@ -856,9 +861,15 @@ static int ldlm_bl_thread_blwi(struct ldlm_bl_pool *blp,
 						   LCF_BL_AST);
 		ldlm_cli_cancel_list(&blwi->blwi_head, count, NULL,
 				     blwi->blwi_flags);
-	} else {
+	} else if (blwi->blwi_lock) {
 		ldlm_handle_bl_callback(blwi->blwi_ns, &blwi->blwi_ld,
 					blwi->blwi_lock);
+	} else {
+		ldlm_pool_recalc(&blwi->blwi_ns->ns_pool, true);
+		spin_lock(&blwi->blwi_ns->ns_lock);
+		blwi->blwi_ns->ns_rpc_recalc = 0;
+		spin_unlock(&blwi->blwi_ns->ns_lock);
+		ldlm_namespace_put(blwi->blwi_ns);
 	}
 	if (blwi->blwi_mem_pressure)
 		memalloc_noreclaim_restore(flags);
diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c
index c37948a..9cee24b 100644
--- a/fs/lustre/ldlm/ldlm_pool.c
+++ b/fs/lustre/ldlm/ldlm_pool.c
@@ -252,13 +252,13 @@ static void ldlm_cli_pool_pop_slv(struct ldlm_pool *pl)
 /**
  * Recalculates client size pool @pl according to current SLV and Limit.
  */
-static int ldlm_cli_pool_recalc(struct ldlm_pool *pl)
+static int ldlm_cli_pool_recalc(struct ldlm_pool *pl, bool force)
 {
 	timeout_t recalc_interval_sec;
 	int ret;
 
 	recalc_interval_sec = ktime_get_seconds() - pl->pl_recalc_time;
-	if (recalc_interval_sec < pl->pl_recalc_period)
+	if (!force && recalc_interval_sec < pl->pl_recalc_period)
 		return 0;
 
 	spin_lock(&pl->pl_lock);
@@ -266,7 +266,7 @@ static int ldlm_cli_pool_recalc(struct ldlm_pool *pl)
 	 * Check if we need to recalc lists now.
 	 */
 	recalc_interval_sec = ktime_get_seconds() - pl->pl_recalc_time;
-	if (recalc_interval_sec < pl->pl_recalc_period) {
+	if (!force && recalc_interval_sec < pl->pl_recalc_period) {
 		spin_unlock(&pl->pl_lock);
 		return 0;
 	}
@@ -346,7 +346,7 @@ static int ldlm_cli_pool_shrink(struct ldlm_pool *pl,
  *
  * Returns	time in seconds for the next recalc of this pool
  */
-static timeout_t ldlm_pool_recalc(struct ldlm_pool *pl)
+timeout_t ldlm_pool_recalc(struct ldlm_pool *pl, bool force)
 {
 	timeout_t recalc_interval_sec;
 	int count;
@@ -373,7 +373,7 @@ static timeout_t ldlm_pool_recalc(struct ldlm_pool *pl)
 	}
 
 	if (pl->pl_ops->po_recalc) {
-		count = pl->pl_ops->po_recalc(pl);
+		count = pl->pl_ops->po_recalc(pl, force);
 		lprocfs_counter_add(pl->pl_stats, LDLM_POOL_RECALC_STAT,
 				    count);
 	}
@@ -976,7 +976,7 @@ static void ldlm_pools_recalc(struct work_struct *ws)
 		 */
 		if (!skip) {
 			delay = min(delay,
-				    ldlm_pool_recalc(&ns->ns_pool));
+				    ldlm_pool_recalc(&ns->ns_pool, false));
 			ldlm_namespace_put(ns);
 		}
 	}
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index a8d6df1..dd897ec 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -1129,8 +1129,9 @@ static inline struct ldlm_pool *ldlm_imp2pl(struct obd_import *imp)
  */
 int ldlm_cli_update_pool(struct ptlrpc_request *req)
 {
+	struct ldlm_namespace *ns;
 	struct obd_device *obd;
-	u64 new_slv;
+	u64 new_slv, ratio;
 	u32 new_limit;
 
 	if (unlikely(!req->rq_import || !req->rq_import->imp_obd ||
@@ -1170,17 +1171,39 @@ int ldlm_cli_update_pool(struct ptlrpc_request *req)
 	read_unlock(&obd->obd_pool_lock);
 
 	/*
-	 * Set new SLV and limit in OBD fields to make them accessible
-	 * to the pool thread. We do not access obd_namespace and pool
-	 * directly here as there is no reliable way to make sure that
-	 * they are still alive at cleanup time. Evil races are possible
-	 * which may cause Oops at that time.
+	 * OBD device keeps the new pool attributes before they are handled by
+	 * the pool.
 	 */
 	write_lock(&obd->obd_pool_lock);
 	obd->obd_pool_slv = new_slv;
 	obd->obd_pool_limit = new_limit;
 	write_unlock(&obd->obd_pool_lock);
 
+	/*
+	 * Check if an urgent pool recalc is needed; by default a 10% SLV
+	 * drop triggers it. Applicable only when LRU resize is enabled.
+	 */
+	ns = obd->obd_namespace;
+	if (!ns_connect_lru_resize(ns) ||
+	    ldlm_pool_get_slv(&ns->ns_pool) < new_slv)
+		return 0;
+
+	ratio = 100 * new_slv / ldlm_pool_get_slv(&ns->ns_pool);
+	if (100 - ratio >= ns->ns_recalc_pct &&
+	    !ns->ns_stopping && !ns->ns_rpc_recalc) {
+		bool recalc = false;
+
+		spin_lock(&ns->ns_lock);
+		if (!ns->ns_stopping && !ns->ns_rpc_recalc) {
+			ldlm_namespace_get(ns);
+			recalc = true;
+			ns->ns_rpc_recalc = 1;
+		}
+		spin_unlock(&ns->ns_lock);
+		if (recalc)
+			ldlm_bl_to_thread_ns(ns);
+	}
+
 	return 0;
 }
 
diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c
index 3527e15..dab837d 100644
--- a/fs/lustre/ldlm/ldlm_resource.c
+++ b/fs/lustre/ldlm/ldlm_resource.c
@@ -273,6 +273,35 @@ static ssize_t lru_cancel_batch_store(struct kobject *kobj,
 }
 LUSTRE_RW_ATTR(lru_cancel_batch);
 
+static ssize_t ns_recalc_pct_show(struct kobject *kobj,
+				  struct attribute *attr, char *buf)
+{
+	struct ldlm_namespace *ns = container_of(kobj, struct ldlm_namespace,
+						 ns_kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%u\n", ns->ns_recalc_pct);
+}
+
+static ssize_t ns_recalc_pct_store(struct kobject *kobj,
+				   struct attribute *attr,
+				   const char *buffer, size_t count)
+{
+	struct ldlm_namespace *ns = container_of(kobj, struct ldlm_namespace,
+						 ns_kobj);
+	unsigned long tmp;
+
+	if (kstrtoul(buffer, 10, &tmp))
+		return -EINVAL;
+
+	if (tmp > 100)
+		return -ERANGE;
+
+	ns->ns_recalc_pct = (unsigned int)tmp;
+
+	return count;
+}
+LUSTRE_RW_ATTR(ns_recalc_pct);
+
 static ssize_t lru_max_age_show(struct kobject *kobj, struct attribute *attr,
 				char *buf)
 {
@@ -375,6 +404,7 @@ static ssize_t dirty_age_limit_store(struct kobject *kobj,
 	&lustre_attr_resource_count.attr,
 	&lustre_attr_lock_count.attr,
 	&lustre_attr_lock_unused_count.attr,
+	&lustre_attr_ns_recalc_pct.attr,
 	&lustre_attr_lru_size.attr,
 	&lustre_attr_lru_cancel_batch.attr,
 	&lustre_attr_lru_max_age.attr,
@@ -663,6 +693,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
 	ns->ns_nr_unused = 0;
 	ns->ns_max_unused = LDLM_DEFAULT_LRU_SIZE;
 	ns->ns_cancel_batch = LDLM_DEFAULT_LRU_SHRINK_BATCH;
+	ns->ns_recalc_pct = LDLM_DEFAULT_SLV_RECALC_PCT;
 	ns->ns_max_age = ktime_set(LDLM_DEFAULT_MAX_ALIVE, 0);
 	ns->ns_orig_connect_flags = 0;
 	ns->ns_connect_flags = 0;
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 36/42] lustre: don't take spinlock to read a 'long'.
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (34 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 35/42] lustre: ldlm: pool recalc forceful call James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 37/42] lustre: osc: Do ELC on locks with no OSC object James Simmons
                   ` (5 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Mr NeilBrown <neilb@suse.de>

Reading a 'long' (or unsigned long) is always an atomic operation.
There is never a need to take a spinlock to just read a single 'long'.

There are several procfs/debugfs/sysfs handlers which needlessly take
a spinlock for this purpose.

This patch:
 - removes the taking of the spinlock
 - changes the printf to scnprintf() as appropriate
 - directly returns the value returned by scnprintf rather than
   storing it in a variable
 - accesses the 'long' directly as an argument to scnprintf(), rather
   than introducing a variable to hold it.

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: 023a9e4cde5498 ("LU-6142 lustre: don't take spinlock to read a 'long'.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39743
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/lproc_llite.c | 24 ++++++------------------
 fs/lustre/mdc/lproc_mdc.c     |  7 +------
 fs/lustre/osc/lproc_osc.c     | 23 ++++++++---------------
 3 files changed, 15 insertions(+), 39 deletions(-)

diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index 54db7eb..9b1c392 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -326,13 +326,9 @@ static ssize_t max_read_ahead_mb_show(struct kobject *kobj,
 {
 	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
 					      ll_kset.kobj);
-	unsigned long ra_max_mb;
 
-	spin_lock(&sbi->ll_lock);
-	ra_max_mb = PAGES_TO_MiB(sbi->ll_ra_info.ra_max_pages);
-	spin_unlock(&sbi->ll_lock);
-
-	return scnprintf(buf, PAGE_SIZE, "%lu\n", ra_max_mb);
+	return scnprintf(buf, PAGE_SIZE, "%lu\n",
+			 PAGES_TO_MiB(sbi->ll_ra_info.ra_max_pages));
 }
 
 static ssize_t max_read_ahead_mb_store(struct kobject *kobj,
@@ -374,13 +370,9 @@ static ssize_t max_read_ahead_per_file_mb_show(struct kobject *kobj,
 {
 	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
 					      ll_kset.kobj);
-	unsigned long ra_max_file_mb;
 
-	spin_lock(&sbi->ll_lock);
-	ra_max_file_mb = PAGES_TO_MiB(sbi->ll_ra_info.ra_max_pages_per_file);
-	spin_unlock(&sbi->ll_lock);
-
-	return scnprintf(buf, PAGE_SIZE, "%lu\n", ra_max_file_mb);
+	return scnprintf(buf, PAGE_SIZE, "%lu\n",
+			 PAGES_TO_MiB(sbi->ll_ra_info.ra_max_pages_per_file));
 }
 
 static ssize_t max_read_ahead_per_file_mb_store(struct kobject *kobj,
@@ -419,13 +411,9 @@ static ssize_t max_read_ahead_whole_mb_show(struct kobject *kobj,
 {
 	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
 					      ll_kset.kobj);
-	unsigned long ra_max_whole_mb;
-
-	spin_lock(&sbi->ll_lock);
-	ra_max_whole_mb = PAGES_TO_MiB(sbi->ll_ra_info.ra_max_read_ahead_whole_pages);
-	spin_unlock(&sbi->ll_lock);
 
-	return scnprintf(buf, PAGE_SIZE, "%lu\n", ra_max_whole_mb);
+	return scnprintf(buf, PAGE_SIZE, "%lu\n",
+			 PAGES_TO_MiB(sbi->ll_ra_info.ra_max_read_ahead_whole_pages));
 }
 
 static ssize_t max_read_ahead_whole_mb_store(struct kobject *kobj,
diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c
index d7506ea..662be42 100644
--- a/fs/lustre/mdc/lproc_mdc.c
+++ b/fs/lustre/mdc/lproc_mdc.c
@@ -44,13 +44,8 @@ static int mdc_max_dirty_mb_seq_show(struct seq_file *m, void *v)
 {
 	struct obd_device *obd = m->private;
 	struct client_obd *cli = &obd->u.cli;
-	unsigned long val;
 
-	spin_lock(&cli->cl_loi_list_lock);
-	val = PAGES_TO_MiB(cli->cl_dirty_max_pages);
-	spin_unlock(&cli->cl_loi_list_lock);
-
-	seq_printf(m, "%lu\n", val);
+	seq_printf(m, "%lu\n", PAGES_TO_MiB(cli->cl_dirty_max_pages));
 	return 0;
 }
 
diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c
index 14cbe54..7ea9530 100644
--- a/fs/lustre/osc/lproc_osc.c
+++ b/fs/lustre/osc/lproc_osc.c
@@ -87,7 +87,7 @@ static ssize_t max_rpcs_in_flight_show(struct kobject *kobj,
 					      obd_kset.kobj);
 	struct client_obd *cli = &obd->u.cli;
 
-	return sprintf(buf, "%u\n", cli->cl_max_rpcs_in_flight);
+	return scnprintf(buf, PAGE_SIZE, "%u\n", cli->cl_max_rpcs_in_flight);
 }
 
 static ssize_t max_rpcs_in_flight_store(struct kobject *kobj,
@@ -139,13 +139,9 @@ static ssize_t max_dirty_mb_show(struct kobject *kobj,
 	struct obd_device *obd = container_of(kobj, struct obd_device,
 					      obd_kset.kobj);
 	struct client_obd *cli = &obd->u.cli;
-	unsigned long val;
 
-	spin_lock(&cli->cl_loi_list_lock);
-	val = PAGES_TO_MiB(cli->cl_dirty_max_pages);
-	spin_unlock(&cli->cl_loi_list_lock);
-
-	return scnprintf(buf, PAGE_SIZE, "%lu\n", val);
+	return scnprintf(buf, PAGE_SIZE, "%lu\n",
+			 PAGES_TO_MiB(cli->cl_dirty_max_pages));
 }
 
 static ssize_t max_dirty_mb_store(struct kobject *kobj,
@@ -252,7 +248,8 @@ static ssize_t cur_dirty_bytes_show(struct kobject *kobj,
 					      obd_kset.kobj);
 	struct client_obd *cli = &obd->u.cli;
 
-	return sprintf(buf, "%lu\n", cli->cl_dirty_pages << PAGE_SHIFT);
+	return scnprintf(buf, PAGE_SIZE, "%lu\n",
+			 cli->cl_dirty_pages << PAGE_SHIFT);
 }
 LUSTRE_RO_ATTR(cur_dirty_bytes);
 
@@ -264,7 +261,7 @@ static ssize_t cur_grant_bytes_show(struct kobject *kobj,
 					      obd_kset.kobj);
 	struct client_obd *cli = &obd->u.cli;
 
-	return sprintf(buf, "%lu\n", cli->cl_avail_grant);
+	return scnprintf(buf, PAGE_SIZE, "%lu\n", cli->cl_avail_grant);
 }
 
 static ssize_t cur_grant_bytes_store(struct kobject *kobj,
@@ -284,12 +281,8 @@ static ssize_t cur_grant_bytes_store(struct kobject *kobj,
 		return rc;
 
 	/* this is only for shrinking grant */
-	spin_lock(&cli->cl_loi_list_lock);
-	if (val >= cli->cl_avail_grant) {
-		spin_unlock(&cli->cl_loi_list_lock);
+	if (val >= cli->cl_avail_grant)
 		return -EINVAL;
-	}
-	spin_unlock(&cli->cl_loi_list_lock);
 
 	with_imp_locked(obd, imp, rc)
 		if (imp->imp_state == LUSTRE_IMP_FULL)
@@ -307,7 +300,7 @@ static ssize_t cur_lost_grant_bytes_show(struct kobject *kobj,
 					      obd_kset.kobj);
 	struct client_obd *cli = &obd->u.cli;
 
-	return sprintf(buf, "%lu\n", cli->cl_lost_grant);
+	return scnprintf(buf, PAGE_SIZE, "%lu\n", cli->cl_lost_grant);
 }
 LUSTRE_RO_ATTR(cur_lost_grant_bytes);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 37/42] lustre: osc: Do ELC on locks with no OSC object
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (35 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 36/42] lustre: don't take spinlock to read a 'long' James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 38/42] lnet: deadlock on LNet shutdown James Simmons
                   ` (4 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Patrick Farrell <pfarrell@whamcloud.com>

Currently, osc_ldlm_weigh_ast weighs locks with no OSC
object in their ast data as "1", meaning the lock is not
considered for ELC.

This doesn't make much sense, since if there is no OSC
object, it's unlikely there's any data under the lock, so
it's actually a good candidate for ELC.

WC-bug-id: https://jira.whamcloud.com/browse/LU-11518
Lustre-commit: 36eca1017fe464 ("LU-11518 osc: Do ELC on locks with no OSC object")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34584
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c
index ed9f0a0..7bfcbfb 100644
--- a/fs/lustre/osc/osc_lock.c
+++ b/fs/lustre/osc/osc_lock.c
@@ -722,7 +722,7 @@ unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock)
 	unlock_res_and_lock(dlmlock);
 
 	if (!obj) {
-		weight = 1;
+		weight = 0;
 		goto out;
 	}
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 38/42] lnet: deadlock on LNet shutdown
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (36 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 37/42] lustre: osc: Do ELC on locks with no OSC object James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 39/42] lustre: update version to 2.13.56 James Simmons
                   ` (3 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Serguei Smirnov <ssmirnov@whamcloud.com>

Release ln_api_mutex during LNet shutdown while waiting for the
zombie LNI. This allows other threads to read the LNet state updated
by the shutdown and fall through, avoiding the deadlock.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12233
Lustre-commit: e0c445648a38fb ("LU-12233 lnet: deadlock on LNet shutdown")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39933
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/api-ni.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index f678ae2..03473bf 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -2036,13 +2036,21 @@ static void lnet_push_target_fini(void)
 		}
 
 		if (!list_empty(&ni->ni_netlist)) {
+			/* Unlock mutex while waiting to allow other
+			 * threads to read the LNet state and fall through
+			 * to avoid deadlock
+			 */
 			lnet_net_unlock(LNET_LOCK_EX);
+			mutex_unlock(&the_lnet.ln_api_mutex);
+
 			++i;
 			if ((i & (-i)) == i) {
 				CDEBUG(D_WARNING, "Waiting for zombie LNI %s\n",
 				       libcfs_nid2str(ni->ni_nid));
 			}
 			schedule_timeout_uninterruptible(HZ);
+
+			mutex_lock(&the_lnet.ln_api_mutex);
 			lnet_net_lock(LNET_LOCK_EX);
 			continue;
 		}
-- 
1.8.3.1


* [lustre-devel] [PATCH 39/42] lustre: update version to 2.13.56
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (37 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 38/42] lnet: deadlock on LNet shutdown James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 40/42] lustre: llite: increase readahead default values James Simmons
                   ` (2 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Oleg Drokin <green@whamcloud.com>

New tag 2.13.56

Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h
index f206b6c..8d2f2e8 100644
--- a/include/uapi/linux/lustre/lustre_ver.h
+++ b/include/uapi/linux/lustre/lustre_ver.h
@@ -3,9 +3,9 @@
 
 #define LUSTRE_MAJOR 2
 #define LUSTRE_MINOR 13
-#define LUSTRE_PATCH 55
+#define LUSTRE_PATCH 56
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.13.55"
+#define LUSTRE_VERSION_STRING "2.13.56"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
1.8.3.1


* [lustre-devel] [PATCH 40/42] lustre: llite: increase readahead default values
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (38 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 39/42] lustre: update version to 2.13.56 James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 41/42] lustre: obdclass: don't initialize obj for zero FID James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 42/42] lustre: obdclass: fixes and improvements for jobid James Simmons
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Andreas Dilger <adilger@whamcloud.com>

It is commonly recommended to increase the readahead tunables on
clients to improve performance, since the current defaults are too
small, having been set several years ago for slower networks and
servers.

Increase the readahead defaults to better match values that are
recommended today:
- read_ahead_max_mb increased from 64MB to 1GB by default,
  or 1/32 RAM, whichever is less
- read_ahead_per_file_max_mb is increased from 64MB to 256MB,
  or 1/4 of read_ahead_max_mb, whichever is less

Modify the constant names to better match the variable and /proc
filenames.

Fix sanity test_101g to allow readahead to generate extra read
RPCs, as long as they are the expected size or larger.

WC-bug-id: https://jira.whamcloud.com/browse/LU-11548
Lustre-commit: 4b47ec5a8e6895 ("LU-11548 llite: increase readahead default values")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/33400
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_internal.h | 7 +++++--
 fs/lustre/llite/llite_lib.c      | 8 +++++---
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 7c6eddd..0bd6795 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -473,8 +473,11 @@ static inline struct pcc_inode *ll_i2pcci(struct inode *inode)
 /* default to use at least 16M for fast read if possible */
 #define RA_REMAIN_WINDOW_MIN			MiB_TO_PAGES(16UL)
 
-/* default readahead on a given system. */
-#define SBI_DEFAULT_READ_AHEAD_MAX		MiB_TO_PAGES(64UL)
+/* default read-ahead on a given client mountpoint. */
+#define SBI_DEFAULT_READ_AHEAD_MAX		MiB_TO_PAGES(1024UL)
+
+/* default read-ahead for a single file descriptor */
+#define SBI_DEFAULT_READ_AHEAD_PER_FILE_MAX	MiB_TO_PAGES(256UL)
 
 /* default read-ahead full files smaller than limit on the second read */
 #define SBI_DEFAULT_READ_AHEAD_WHOLE_MAX	MiB_TO_PAGES(2UL)
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index e732a82..8ef2437 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -123,11 +123,13 @@ static struct ll_sb_info *ll_init_sbi(void)
 		goto out_destroy_ra;
 	}
 
-	sbi->ll_ra_info.ra_max_pages_per_file = min(pages / 32,
-						    SBI_DEFAULT_READ_AHEAD_MAX);
+	sbi->ll_ra_info.ra_max_pages =
+		min(pages / 32, SBI_DEFAULT_READ_AHEAD_MAX);
+	sbi->ll_ra_info.ra_max_pages_per_file =
+		min(sbi->ll_ra_info.ra_max_pages / 4,
+		    SBI_DEFAULT_READ_AHEAD_PER_FILE_MAX);
 	sbi->ll_ra_info.ra_async_pages_per_file_threshold =
 				sbi->ll_ra_info.ra_max_pages_per_file;
-	sbi->ll_ra_info.ra_max_pages = sbi->ll_ra_info.ra_max_pages_per_file;
 	sbi->ll_ra_info.ra_max_read_ahead_whole_pages = -1;
 	atomic_set(&sbi->ll_ra_info.ra_async_inflight, 0);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 41/42] lustre: obdclass: don't initialize obj for zero FID
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (39 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 40/42] lustre: llite: increase readahead default values James Simmons
@ 2020-10-06  0:06 ` James Simmons
  2020-10-06  0:06 ` [lustre-devel] [PATCH 42/42] lustre: obdclass: fixes and improvements for jobid James Simmons
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: Lai Siyao <lai.siyao@whamcloud.com>

An object with a zero FID is used in stripe allocation, and it is
meaningless to initialize such an object via lu_object_find_at().
Return an error early to avoid an assertion in lu_object_put().

WC-bug-id: https://jira.whamcloud.com/browse/LU-13511
Lustre-commit: 22ea9767956c89 ("LU-13511 obdclass: don't initialize obj for zero FID")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39792
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdclass/lu_object.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c
index 42bb7a6..e8fc328 100644
--- a/fs/lustre/obdclass/lu_object.c
+++ b/fs/lustre/obdclass/lu_object.c
@@ -780,6 +780,13 @@ struct lu_object *lu_object_find_at(const struct lu_env *env,
 	struct rhashtable *hs;
 	int rc;
 
+	/* FID is from disk or network, zero FID is meaningless, return error
+	 * early to avoid assertion in lu_object_put. If a zero FID is wanted,
+	 * it should be allocated via lu_object_anon().
+	 */
+	if (fid_is_zero(f))
+		return ERR_PTR(-EINVAL);
+
 	/*
 	 * This uses standard index maintenance protocol:
 	 *
-- 
1.8.3.1


* [lustre-devel] [PATCH 42/42] lustre: obdclass: fixes and improvements for jobid.
  2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
                   ` (40 preceding siblings ...)
  2020-10-06  0:06 ` [lustre-devel] [PATCH 41/42] lustre: obdclass: don't initialize obj for zero FID James Simmons
@ 2020-10-06  0:06 ` James Simmons
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2020-10-06  0:06 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.de>

1/ When lustre_get_jobid() uses jobid_current(), it should
   copy into tmp_jobid, not into jobid.
2/ A '%j' in jobname should use jobid_current()
3/ If there is a '%j' in jobid_name, then it should
   be used even if jobid_var is 'session'.

Fixes: 2646ff0b3509 ("lustre: obdclass: allow per-session jobids.")
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/lustre/obdclass/jobid.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/lustre/obdclass/jobid.c b/fs/lustre/obdclass/jobid.c
index 98b3f39..3ee9d40 100644
--- a/fs/lustre/obdclass/jobid.c
+++ b/fs/lustre/obdclass/jobid.c
@@ -206,7 +206,7 @@ static void jobid_prune_expedite(void)
  *   %e = executable
  *   %g = gid
  *   %h = hostname
- *   %j = jobid from environment
+ *   %j = per-session jobid
  *   %p = pid
  *   %u = uid
  *
@@ -247,10 +247,9 @@ static int jobid_interpret_string(const char *jobfmt, char *jobid,
 			l = snprintf(jobid, joblen, "%s",
 				     init_utsname()->nodename);
 			break;
-		case 'j': /* jobid requested by process
-			   * - currently not supported
-			   */
-			l = snprintf(jobid, joblen, "%s", "jobid");
+		case 'j': /* jobid requested by process */
+			l = snprintf(jobid, joblen, "%s",
+				     jobid_current() ?: "jobid");
 			break;
 		case 'p': /* process ID */
 			l = snprintf(jobid, joblen, "%u", current->pid);
@@ -306,7 +305,8 @@ int lustre_get_jobid(char *jobid, size_t joblen)
 		goto out_cache_jobid;
 
 	/* Whole node dedicated to single job */
-	if (strcmp(obd_jobid_var, JOBSTATS_NODELOCAL) == 0) {
+	if (strcmp(obd_jobid_var, JOBSTATS_NODELOCAL) == 0 ||
+	    strnstr(obd_jobid_name, "%j", LUSTRE_JOBID_SIZE)) {
 		int rc2 = jobid_interpret_string(obd_jobid_name,
 						 tmp_jobid, joblen);
 		if (!rc2)
@@ -327,7 +327,7 @@ int lustre_get_jobid(char *jobid, size_t joblen)
 		rcu_read_lock();
 		jid = jobid_current();
 		if (jid)
-			strlcpy(jobid, jid, sizeof(jobid));
+			strlcpy(tmp_jobid, jid, sizeof(tmp_jobid));
 		rcu_read_unlock();
 		goto out_cache_jobid;
 	}
-- 
1.8.3.1


2020-10-06  0:05 [lustre-devel] [PATCH 00/42] lustre: OpenSFS backport for Oct 4 2020 James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 01/42] lustre: ptlrpc: don't require CONFIG_CRYPTO_CRC32 James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 02/42] lustre: dom: lock cancel to drop pages James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 03/42] lustre: sec: use memchr_inv() to check if page is zero James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 04/42] lustre: mdc: fix lovea for replay James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 05/42] lustre: llite: add test to check client deadlock selinux James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 06/42] lnet: use init_wait(), not init_waitqueue_entry() James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 07/42] lustre: lov: make various lov_object.c function static James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 08/42] lustre: llite: return -ENODATA if no default layout James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 09/42] lnet: libcfs: don't save journal_info in dumplog thread James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 10/42] lustre: ldlm: lru code cleanup James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 11/42] lustre: ldlm: cancel LRU improvement James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 12/42] lnet: Do not set preferred NI for MR peer James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 13/42] lustre: ptlrpc: prefer crc32_le() over CryptoAPI James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 14/42] lnet: call event handlers without res_lock James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 15/42] lnet: Conditionally attach rspt in LNetPut & LNetGet James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 16/42] lustre: llite: reuse same cl_dio_aio for one IO James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 17/42] lustre: llite: move iov iter forward by ourself James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 18/42] lustre: llite: report client stats sumsq James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 19/42] lnet: Support checking for MD leaks James Simmons
2020-10-06  0:05 ` [lustre-devel] [PATCH 20/42] lnet: don't read debugfs lnet stats when shutting down James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 21/42] lnet: Loosen restrictions on LNet Health params James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 22/42] lnet: Fix reference leak in lnet_select_pathway James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 23/42] lustre: llite: prune invalid dentries James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 24/42] lnet: Do not overwrite destination when routing James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 25/42] lustre: lov: don't use inline for operations functions James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 26/42] lustre: osc: don't allow negative grants James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 27/42] lustre: mgc: Use IR for client->MDS/OST connections James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 28/42] lustre: ldlm: don't use a locks without l_ast_data James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 29/42] lustre: lov: discard unused lov_dump_lmm* functions James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 30/42] lustre: lov: guard against class_exp2obd() returning NULL James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 31/42] lustre: clio: don't call aio_complete() in lustre upon errors James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 32/42] lustre: llite: it_lock_bits should be bit-wise tested James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 33/42] lustre: ldlm: control lru_size for extent lock James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 34/42] lustre: ldlm: pool fixes James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 35/42] lustre: ldlm: pool recalc forceful call James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 36/42] lustre: don't take spinlock to read a 'long' James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 37/42] lustre: osc: Do ELC on locks with no OSC object James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 38/42] lnet: deadlock on LNet shutdown James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 39/42] lustre: update version to 2.13.56 James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 40/42] lustre: llite: increase readahead default values James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 41/42] lustre: obdclass: don't initialize obj for zero FID James Simmons
2020-10-06  0:06 ` [lustre-devel] [PATCH 42/42] lustre: obdclass: fixes and improvements for jobid James Simmons
