lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021
@ 2021-01-21 17:16 James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 01/39] lustre: ldlm: page discard speedup James Simmons
                   ` (38 more replies)
  0 siblings, 39 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Update the Linux Lustre client to match the latest OpenSFS tree as of
Jan 21, 2021. Please review.

Alexander Zarochentsev (1):
  lustre: ldlm: page discard speedup

Andreas Dilger (1):
  lustre: ldlm: don't compute sumsq for pool stats

Andrew Perepechko (1):
  lustre: ptlrpc: fixes for RCU-related stalls

Andriy Skulysh (2):
  lustre: ldlm: Don't re-enqueue glimpse lock on read
  lustre: ldlm: osc_object_ast_clear() is called for mdc object on
    eviction

Bobi Jam (3):
  lustre: llite: don't check layout info for page discard
  lustre: lov: FIEMAP support for PFL and FLR file
  lustre: lov: correctly set OST obj size

Chris Horn (5):
  lnet: Correct handling of NETWORK_TIMEOUT status
  lnet: Introduce constant for net ID of LNET_NID_ANY
  lnet: lnd: Use NETWORK_TIMEOUT for txs on ibp_tx_queue
  lnet: lnd: Use NETWORK_TIMEOUT for some conn failures
  lnet: Introduce lnet_recovery_limit parameter

Etienne AUJAMES (1):
  lustre: mdc: process changelogs_catalog from the oldest rec

John L. Hammond (2):
  lustre: llite: return EOPNOTSUPP if fallocate is not supported
  lnet: use an unbound cred in kiblnd_resolve_addr()

Lai Siyao (1):
  lustre: lmv: optimize dir shard revalidate

Mikhail Pershin (3):
  lustre: lov: fix SEEK_HOLE calcs at component end
  lustre: dom: non-blocking enqueue for DOM locks
  lustre: cksum: add lprocfs checksum support in MDC/MDT

Mr NeilBrown (1):
  lnet: o2iblnd: remove FMR-pool support.

Olaf Faaland (1):
  lustre: osc: prevent overflow of o_dropped

Oleg Drokin (4):
  lustre: ldlm: Do not wait for lock replay sending if import
    dsconnected
  lustre: ldlm: Do not hang if recovery restarted during lock replay
  lustre: Use vfree_atomic instead of vfree
  lustre: update version to 2.13.57

Qian Yingjin (1):
  lustre: uapi: fix compatibility for LL_IOC_MDC_GETINFO

Sebastien Buisson (1):
  lustre: llite: fiemap set flags for encrypted files

Serguei Smirnov (2):
  lnet: o2iblnd: retry qp creation with reduced queue depth
  lnet: socklnd: announce deprecation of 'use_tcp_bonding'

Wang Shilong (5):
  lustre: llite: fix client evicition with DIO
  lustre: llite: allow DIO with unaligned IO count
  lustre: quota: df should return projid-specific values
  lustre: llite: try to improve mmap performance
  lustre: lov: instantiate components layout for fallocate

Yang Sheng (4):
  lustre: osc: skip 0 row for rpc_stats
  lnet: discard the callback
  lustre: mdc: avoid easize set to 0
  lustre: ldlm: Use req_mode while lock cleanup

 fs/lustre/include/cl_object.h             |   2 +-
 fs/lustre/include/lustre_dlm.h            |   4 +-
 fs/lustre/include/lustre_export.h         |   5 +
 fs/lustre/include/lustre_net.h            |  15 +-
 fs/lustre/include/lustre_osc.h            |   7 +-
 fs/lustre/include/obd.h                   |   2 +-
 fs/lustre/include/obd_class.h             |   3 +-
 fs/lustre/ldlm/ldlm_flock.c               |   2 +-
 fs/lustre/ldlm/ldlm_lib.c                 |   2 +-
 fs/lustre/ldlm/ldlm_lock.c                |  16 +-
 fs/lustre/ldlm/ldlm_pool.c                |  33 ++--
 fs/lustre/ldlm/ldlm_request.c             |  75 ++++----
 fs/lustre/llite/dir.c                     |  36 ++--
 fs/lustre/llite/file.c                    |  35 +++-
 fs/lustre/llite/llite_internal.h          |  19 ++
 fs/lustre/llite/llite_lib.c               |  57 +++++-
 fs/lustre/llite/lproc_llite.c             |  47 +++++
 fs/lustre/llite/rw.c                      | 146 +++++++++++++--
 fs/lustre/llite/rw26.c                    |  39 +++-
 fs/lustre/llite/vvp_io.c                  |   8 +-
 fs/lustre/lmv/lmv_intent.c                |  15 +-
 fs/lustre/lmv/lmv_internal.h              |   1 -
 fs/lustre/lmv/lmv_obd.c                   |   3 +-
 fs/lustre/lov/lov_cl_internal.h           |   5 +
 fs/lustre/lov/lov_internal.h              |   1 +
 fs/lustre/lov/lov_io.c                    | 134 +++++++++++--
 fs/lustre/lov/lov_lock.c                  |  31 ++-
 fs/lustre/lov/lov_object.c                | 249 +++++++++++++++---------
 fs/lustre/lov/lov_offset.c                |  10 +-
 fs/lustre/mdc/lproc_mdc.c                 | 128 ++++++++++++-
 fs/lustre/mdc/mdc_changelog.c             |   2 +-
 fs/lustre/mdc/mdc_dev.c                   | 144 ++++++++------
 fs/lustre/mdc/mdc_internal.h              |  10 +
 fs/lustre/mdc/mdc_locks.c                 |  14 +-
 fs/lustre/obdclass/lprocfs_status.c       |   1 +
 fs/lustre/osc/lproc_osc.c                 |  12 +-
 fs/lustre/osc/osc_cache.c                 |  48 +++--
 fs/lustre/osc/osc_lock.c                  |   3 +
 fs/lustre/osc/osc_object.c                |   2 +-
 fs/lustre/osc/osc_request.c               |  57 +++---
 fs/lustre/ptlrpc/client.c                 |  26 ++-
 fs/lustre/ptlrpc/niobuf.c                 |   7 +-
 fs/lustre/ptlrpc/sec.c                    |  21 ++-
 fs/lustre/ptlrpc/sec_bulk.c               |   6 +-
 fs/lustre/ptlrpc/sec_config.c             |  10 +-
 fs/lustre/ptlrpc/sec_null.c               |  24 ++-
 fs/lustre/ptlrpc/sec_plain.c              |  24 ++-
 fs/lustre/ptlrpc/service.c                |  21 ++-
 fs/lustre/ptlrpc/wiretest.c               |   3 +-
 include/linux/lnet/api.h                  |   3 +-
 include/linux/lnet/lib-lnet.h             |   2 +
 include/linux/lnet/lib-types.h            |   1 +
 include/uapi/linux/lnet/lnet-types.h      |   2 +
 include/uapi/linux/lustre/lustre_fiemap.h |  30 ++-
 include/uapi/linux/lustre/lustre_idl.h    |   8 +
 include/uapi/linux/lustre/lustre_user.h   |  21 +--
 include/uapi/linux/lustre/lustre_ver.h    |   4 +-
 mm/vmalloc.c                              |   1 +
 net/lnet/klnds/o2iblnd/o2iblnd.c          | 302 +++++++++++-------------------
 net/lnet/klnds/o2iblnd/o2iblnd.h          |  12 +-
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c       | 102 ++++++----
 net/lnet/lnet/api-ni.c                    |  13 +-
 net/lnet/lnet/config.c                    |   8 +-
 net/lnet/lnet/lib-md.c                    |  25 ++-
 net/lnet/lnet/lib-move.c                  |   3 +-
 net/lnet/lnet/lib-msg.c                   |   5 +
 net/lnet/lnet/nidstrings.c                |   2 +-
 net/lnet/lnet/peer.c                      |   2 +-
 net/lnet/lnet/router.c                    |   6 +-
 69 files changed, 1434 insertions(+), 683 deletions(-)

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 01/39] lustre: ldlm: page discard speedup
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 02/39] lustre: ptlrpc: fixes for RCU-related stalls James Simmons
                   ` (37 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexander Zarochentsev, Lustre Development List

From: Alexander Zarochentsev <c17826@cray.com>

Improve check_and_discard_cb() so that it can cache a negative
result of the dlm lock lookup and avoid excessive
osc_dlmlock_at_pgoff() calls.
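
The caching scheme can be summarized with a small sketch (the struct
and helper names here are hypothetical; the real logic is in
check_and_discard_cb() in the diff below):

  /* Sketch only: two cached indices avoid repeated DLM lookups. */
  static bool page_has_dlm_cover(struct scan_cache *c, pgoff_t index)
  {
          if (index < c->ng_index)        /* cached negative result */
                  return false;
          if (index < c->fn_index)        /* cached positive result */
                  return true;
          /* cache miss: one lookup, extended to the right
           * (OSC_DAP_FL_RIGHT), refreshes ng_index or fn_index
           */
          return refresh_cache_from_dlm_lookup(c, index);
  }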

HPE-bug-id: LUS-6432
WC-bug-id: https://jira.whamcloud.com/browse/LU-11290
Lustre-commit: 0f48cd0b9856fe ("LU-11290 ldlm: page discard speedup")
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/39327
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_dlm.h |  1 +
 fs/lustre/include/lustre_osc.h |  5 +++++
 fs/lustre/ldlm/ldlm_lock.c     | 16 +++++++++-----
 fs/lustre/osc/osc_cache.c      | 48 +++++++++++++++++++++++++++++++-----------
 fs/lustre/osc/osc_lock.c       |  3 +++
 5 files changed, 56 insertions(+), 17 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index f056c2d..e4c95a2 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -858,6 +858,7 @@ enum ldlm_match_flags {
 	LDLM_MATCH_UNREF	= BIT(0),
 	LDLM_MATCH_AST		= BIT(1),
 	LDLM_MATCH_AST_ANY	= BIT(2),
+	LDLM_MATCH_RIGHT	= BIT(3),
 };
 
 /**
diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index ef5237b..e7bf392 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -186,6 +186,7 @@ struct osc_thread_info {
 	 */
 	pgoff_t			oti_next_index;
 	pgoff_t			oti_fn_index; /* first non-overlapped index */
+	pgoff_t			oti_ng_index; /* negative lock caching */
 	struct cl_sync_io	oti_anchor;
 	struct cl_req_attr	oti_req_attr;
 	struct lu_buf		oti_ladvise_buf;
@@ -248,6 +249,10 @@ enum osc_dap_flags {
 	 * check ast data is present, requested to cancel cb
 	 */
 	OSC_DAP_FL_AST	     = BIT(2),
+	/**
+	 * look at right region for the desired lock
+	 */
+	OSC_DAP_FL_RIGHT     = BIT(3),
 };
 
 /*
diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index 56f1550..b7ce0bb 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -1093,8 +1093,9 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata)
 
 	switch (lock->l_resource->lr_type) {
 	case LDLM_EXTENT:
-		if (lpol->l_extent.start > data->lmd_policy->l_extent.start ||
-		    lpol->l_extent.end < data->lmd_policy->l_extent.end)
+		if (!(data->lmd_match & LDLM_MATCH_RIGHT) &&
+		    (lpol->l_extent.start > data->lmd_policy->l_extent.start ||
+		     lpol->l_extent.end < data->lmd_policy->l_extent.end))
 			return false;
 
 		if (unlikely(match == LCK_GROUP) &&
@@ -1160,10 +1161,17 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata)
 struct ldlm_lock *search_itree(struct ldlm_resource *res,
 			       struct ldlm_match_data *data)
 {
+	struct ldlm_extent ext = {
+		.start	= data->lmd_policy->l_extent.start,
+		.end	= data->lmd_policy->l_extent.end
+	};
 	int idx;
 
 	data->lmd_lock = NULL;
 
+	if (data->lmd_match & LDLM_MATCH_RIGHT)
+		ext.end = OBD_OBJECT_EOF;
+
 	for (idx = 0; idx < LCK_MODE_NUM; idx++) {
 		struct ldlm_interval_tree *tree = &res->lr_itree[idx];
 
@@ -1173,9 +1181,7 @@ struct ldlm_lock *search_itree(struct ldlm_resource *res,
 		if (!(tree->lit_mode & *data->lmd_mode))
 			continue;
 
-		ldlm_extent_search(&tree->lit_root,
-				   data->lmd_policy->l_extent.start,
-				   data->lmd_policy->l_extent.end,
+		ldlm_extent_search(&tree->lit_root, ext.start, ext.end,
 				   lock_matches, data);
 		if (data->lmd_lock)
 			return data->lmd_lock;
diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index ddf6fb1..d511ece 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -3207,28 +3207,51 @@ static bool check_and_discard_cb(const struct lu_env *env, struct cl_io *io,
 {
 	struct osc_thread_info *info = osc_env_info(env);
 	struct osc_object *osc = cbdata;
+	struct cl_page *page = ops->ops_cl.cpl_page;
 	pgoff_t index;
+	bool discard = false;
 
 	index = osc_index(ops);
-	if (index >= info->oti_fn_index) {
+	/* negative lock caching */
+	if (index < info->oti_ng_index) {
+		discard = true;
+	} else if (index >= info->oti_fn_index) {
 		struct ldlm_lock *tmp;
-		struct cl_page *page = ops->ops_cl.cpl_page;
 
 		/* refresh non-overlapped index */
 		tmp = osc_dlmlock_at_pgoff(env, osc, index,
-					   OSC_DAP_FL_TEST_LOCK | OSC_DAP_FL_AST);
+					   OSC_DAP_FL_TEST_LOCK |
+					   OSC_DAP_FL_AST | OSC_DAP_FL_RIGHT);
 		if (tmp) {
 			u64 end = tmp->l_policy_data.l_extent.end;
-			/* Cache the first-non-overlapped index so as to skip
-			 * all pages within [index, oti_fn_index). This is safe
-			 * because if tmp lock is canceled, it will discard
-			 * these pages.
-			 */
-			info->oti_fn_index = cl_index(osc2cl(osc), end + 1);
-			if (end == OBD_OBJECT_EOF)
-				info->oti_fn_index = CL_PAGE_EOF;
+			u64 start = tmp->l_policy_data.l_extent.start;
+
+			/* no lock covering this page */
+			if (index < cl_index(osc2cl(osc), start)) {
+				/* no lock at @index, first lock at @start */
+				info->oti_ng_index = cl_index(osc2cl(osc),
+							      start);
+				discard = true;
+			} else {
+				/* Cache the first-non-overlapped index so as to
+				 * skip all pages within [index, oti_fn_index).
+				 * This is safe because if tmp lock is canceled,
+				 * it will discard these pages.
+				 */
+				info->oti_fn_index = cl_index(osc2cl(osc),
+							      end + 1);
+				if (end == OBD_OBJECT_EOF)
+					info->oti_fn_index = CL_PAGE_EOF;
+			}
 			LDLM_LOCK_PUT(tmp);
-		} else if (cl_page_own(env, io, page) == 0) {
+		} else {
+			info->oti_ng_index = CL_PAGE_EOF;
+			discard = true;
+		}
+	}
+
+	if (discard) {
+		if (cl_page_own(env, io, page) == 0) {
 			/* discard the page */
 			cl_page_discard(env, io, page);
 			cl_page_disown(env, io, page);
@@ -3292,6 +3315,7 @@ int osc_lock_discard_pages(const struct lu_env *env, struct osc_object *osc,
 	cb = discard ? osc_discard_cb : check_and_discard_cb;
 	info->oti_fn_index = start;
 	info->oti_next_index = start;
+	info->oti_ng_index = 0;
 
 	osc_page_gang_lookup(env, io, osc,
 			     info->oti_next_index, end, cb, osc);
diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c
index 7bfcbfb..536142f2 100644
--- a/fs/lustre/osc/osc_lock.c
+++ b/fs/lustre/osc/osc_lock.c
@@ -1282,6 +1282,9 @@ struct ldlm_lock *osc_obj_dlmlock_at_pgoff(const struct lu_env *env,
 	if (dap_flags & OSC_DAP_FL_CANCELING)
 		match_flags |= LDLM_MATCH_UNREF;
 
+	if (dap_flags & OSC_DAP_FL_RIGHT)
+		match_flags |= LDLM_MATCH_RIGHT;
+
 	/*
 	 * It is fine to match any group lock since there could be only one
 	 * with a uniq gid and it conflicts with all other lock modes too
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 02/39] lustre: ptlrpc: fixes for RCU-related stalls
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 01/39] lustre: ldlm: page discard speedup James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 03/39] lustre: ldlm: Do not wait for lock replay sending if import dsconnected James Simmons
                   ` (36 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Andrew Perepechko, Lustre Development List

From: Andrew Perepechko <andrew.perepechko@hpe.com>

ptlrpc_expired_set() may need to process a lot
of requests, so the processing loop needs to
schedule from time to time to avoid RCU-related
stalls.
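
This is the usual pattern for long list walks in process context; a
generic sketch (illustrative names only; the real loop is the one in
ptlrpc_expired_set() shown in the diff):

  list_for_each_entry(item, &long_list, link) {
          handle_one_item(item);  /* may be slow */
          cond_resched();         /* yield so a long walk cannot trigger
                                   * RCU stall warnings or soft lockups
                                   */
  }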

HPE-bug-id: LUS-8939
WC-bug-id: https://jira.whamcloud.com/browse/LU-13822
Lustre-commit: 1bbd5b5f0ee042 ("LU-13822 ptlrpc: fixes for RCU-related stalls")
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-on: https://review.whamcloud.com/39514
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/client.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index 0e01ab33..2002c03 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -2249,6 +2249,11 @@ void ptlrpc_expired_set(struct ptlrpc_request_set *set)
 		 * ptlrpcd thread.
 		 */
 		ptlrpc_expire_one_request(req, 1);
+		/*
+		 * Loops require that we resched once in a while to avoid
+		 * RCU stalls and a few other problems.
+		 */
+		cond_resched();
 	}
 }
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 03/39] lustre: ldlm: Do not wait for lock replay sending if import dsconnected
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 01/39] lustre: ldlm: page discard speedup James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 02/39] lustre: ptlrpc: fixes for RCU-related stalls James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 04/39] lustre: ldlm: Do not hang if recovery restarted during lock replay James Simmons
                   ` (35 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

If the import is disconnected while we are preparing to send some
lock replays, the sending RPCs would get stuck on the sending list
and would keep the reconnected import from progressing from the
REPLAY state to the REPLAY_LOCKS state while waiting for the queued
replay RPCs to finish.

Mark the replay RPCs as no_delay to ensure they don't wait.

LU-13600 exacerbated this issue somewhat, but it certainly existed
before it as well.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14027
Lustre-commit: f06a4efe13faca ("LU-14027 ldlm: Do not wait for lock replay sending if import dsconnected")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40272
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_request.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index 74bcba2..a2e1969 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -2173,6 +2173,8 @@ static int replay_one_lock(struct obd_import *imp, struct ldlm_lock *lock)
 
 	/* We're part of recovery, so don't wait for it. */
 	req->rq_send_state = LUSTRE_IMP_REPLAY_LOCKS;
+	/* If the state changed while we were prepared, don't wait */
+	req->rq_no_delay = 1;
 
 	body = req_capsule_client_get(&req->rq_pill, &RMF_DLM_REQ);
 	ldlm_lock2desc(lock, &body->lock_desc);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 04/39] lustre: ldlm: Do not hang if recovery restarted during lock replay
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (2 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 03/39] lustre: ldlm: Do not wait for lock replay sending if import dsconnected James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 05/39] lnet: Correct handling of NETWORK_TIMEOUT status James Simmons
                   ` (34 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

LU-13600 introduced lock rate-limiting logic, but it did not take
into account that if there is a disconnection in the REPLAY_LOCKS
phase, the not-yet-sent locks get stuck in the sending queue, so the
lock replay thread hangs with imp_replay_inflight elevated above
zero.

The direct consequence is that the recovery state machine never
advances from the REPLAY to the REPLAY_LOCKS state while
imp_replay_inflight is non-zero.

Adjust __ldlm_replay_locks() to check if the import state changed
before attempting to send any more requests.

Add a testcase.

Fixes: 8cc7f22847 ("lustre: ptlrpc: limit rate of lock replays")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14027
Lustre-commit: 7ca495ec67f474 ("LU-14027 ldlm: Do not hang if recovery restarted during lock replay")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40238
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_request.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index a2e1969..86b10a7 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -2271,9 +2271,12 @@ int __ldlm_replay_locks(struct obd_import *imp, bool rate_limit)
 		lock = list_first_entry(&list, struct ldlm_lock,
 					l_pending_chain);
 		list_del_init(&lock->l_pending_chain);
-		if (rc) {
+		/* If we disconnected in the middle - cleanup and let
+		 * reconnection to happen again. LU-14027
+		 */
+		if (rc || (imp->imp_state != LUSTRE_IMP_REPLAY_LOCKS)) {
 			LDLM_LOCK_RELEASE(lock);
-			continue; /* or try to do the rest? */
+			continue;
 		}
 		rc = replay_one_lock(imp, lock);
 		LDLM_LOCK_RELEASE(lock);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 05/39] lnet: Correct handling of NETWORK_TIMEOUT status
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (3 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 04/39] lustre: ldlm: Do not hang if recovery restarted during lock replay James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 06/39] lnet: Introduce constant for net ID of LNET_NID_ANY James Simmons
                   ` (33 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

The original intent of the LNET_MSG_STATUS_NETWORK_TIMEOUT health
status was to handle cases where the LND was unsure whether the
failure was due to the local or remote NI. In this case, we'll want
to decrement both the local and remote NI health and allow recovery
to ascertain which interface is actually healthy.

HPE-bug-id: LUS-9342
WC-bug-id: https://jira.whamcloud.com/browse/LU-13751
Lustre-commit: ffd4523f2d50ef ("LU-13571 lnet: Correct handling of NETWORK_TIMEOUT status")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39898
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-msg.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index e84cf02..d888090 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -925,9 +925,14 @@
 
 	case LNET_MSG_STATUS_REMOTE_ERROR:
 	case LNET_MSG_STATUS_REMOTE_TIMEOUT:
+		if (handle_remote_health)
+			lnet_handle_remote_failure(lpni);
+		return -1;
 	case LNET_MSG_STATUS_NETWORK_TIMEOUT:
 		if (handle_remote_health)
 			lnet_handle_remote_failure(lpni);
+		if (handle_local_health)
+			lnet_handle_local_failure(ni);
 		return -1;
 	default:
 		LBUG();
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 06/39] lnet: Introduce constant for net ID of LNET_NID_ANY
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (4 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 05/39] lnet: Correct handling of NETWORK_TIMEOUT status James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 07/39] lustre: ldlm: Don't re-enqueue glimpse lock on read James Simmons
                   ` (32 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

This patch adds a new constant, LNET_NET_ANY, to represent the net
ID of the LNET_NID_ANY wildcard NID.

HPE-bug-id: LUS-9122
WC-bug-id: https://jira.whamcloud.com/browse/LU-13837
Lustre-commit: 1741e993c874ed ("LU-13837 lnet: Introduce constant for net ID of LNET_NID_ANY")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39544
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_lib.c            |  2 +-
 fs/lustre/ptlrpc/sec_config.c        | 10 +++++-----
 include/uapi/linux/lnet/lnet-types.h |  2 ++
 net/lnet/lnet/config.c               |  8 ++++----
 net/lnet/lnet/lib-move.c             |  3 +--
 net/lnet/lnet/nidstrings.c           |  2 +-
 net/lnet/lnet/peer.c                 |  2 +-
 net/lnet/lnet/router.c               |  6 +++---
 8 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c
index 713ca1c..2965395 100644
--- a/fs/lustre/ldlm/ldlm_lib.c
+++ b/fs/lustre/ldlm/ldlm_lib.c
@@ -487,7 +487,7 @@ int client_obd_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 	if (lustre_cfg_buf(lcfg, 4)) {
 		u32 refnet = libcfs_str2net(lustre_cfg_string(lcfg, 4));
 
-		if (refnet == LNET_NIDNET(LNET_NID_ANY)) {
+		if (refnet == LNET_NET_ANY) {
 			rc = -EINVAL;
 			CERROR("%s: bad mount option 'network=%s': rc = %d\n",
 			       obd->obd_name, lustre_cfg_string(lcfg, 4),
diff --git a/fs/lustre/ptlrpc/sec_config.c b/fs/lustre/ptlrpc/sec_config.c
index 9ced6c7..0891f2f 100644
--- a/fs/lustre/ptlrpc/sec_config.c
+++ b/fs/lustre/ptlrpc/sec_config.c
@@ -145,7 +145,7 @@ static void get_default_flavor(struct sptlrpc_flavor *sf)
 
 static void sptlrpc_rule_init(struct sptlrpc_rule *rule)
 {
-	rule->sr_netid = LNET_NIDNET(LNET_NID_ANY);
+	rule->sr_netid = LNET_NET_ANY;
 	rule->sr_from = LUSTRE_SP_ANY;
 	rule->sr_to = LUSTRE_SP_ANY;
 	rule->sr_padding = 0;
@@ -177,7 +177,7 @@ static int sptlrpc_parse_rule(char *param, struct sptlrpc_rule *rule)
 	/* 1.1 network */
 	if (strcmp(param, "default")) {
 		rule->sr_netid = libcfs_str2net(param);
-		if (rule->sr_netid == LNET_NIDNET(LNET_NID_ANY)) {
+		if (rule->sr_netid == LNET_NET_ANY) {
 			CERROR("invalid network name: %s\n", param);
 			return -EINVAL;
 		}
@@ -263,7 +263,7 @@ static inline int rule_spec_dir(struct sptlrpc_rule *rule)
 
 static inline int rule_spec_net(struct sptlrpc_rule *rule)
 {
-	return (rule->sr_netid != LNET_NIDNET(LNET_NID_ANY));
+	return (rule->sr_netid != LNET_NET_ANY);
 }
 
 static inline int rule_match_dir(struct sptlrpc_rule *r1,
@@ -384,8 +384,8 @@ static int sptlrpc_rule_set_choose(struct sptlrpc_rule_set *rset,
 	for (n = 0; n < rset->srs_nrule; n++) {
 		r = &rset->srs_rules[n];
 
-		if (LNET_NIDNET(nid) != LNET_NIDNET(LNET_NID_ANY) &&
-		    r->sr_netid != LNET_NIDNET(LNET_NID_ANY) &&
+		if (LNET_NIDNET(nid) != LNET_NET_ANY &&
+		    r->sr_netid != LNET_NET_ANY &&
 		    LNET_NIDNET(nid) != r->sr_netid)
 			continue;
 
diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h
index 70fab42..3324792 100644
--- a/include/uapi/linux/lnet/lnet-types.h
+++ b/include/uapi/linux/lnet/lnet-types.h
@@ -112,6 +112,8 @@ static inline __u32 LNET_MKNET(__u32 type, __u32 num)
 /** The lolnd NID (i.e. myself) */
 #define LNET_NID_LO_0 LNET_MKNID(LNET_MKNET(LOLND, 0), 0)
 
+#define LNET_NET_ANY LNET_NIDNET(LNET_NID_ANY)
+
 /* Packed version of lnet_process_id to transfer via network */
 struct lnet_process_id_packed {
 	/* node id / process id */
diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c
index 6ddd9d6..b078bc8 100644
--- a/net/lnet/lnet/config.c
+++ b/net/lnet/lnet/config.c
@@ -679,7 +679,7 @@ struct lnet_ni *
 		 * At this point the name is properly terminated.
 		 */
 		net_id = libcfs_str2net(name);
-		if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
+		if (net_id == LNET_NET_ANY) {
 			LCONSOLE_ERROR_MSG(0x113,
 					"Unrecognised network type\n");
 			str = name;
@@ -1169,7 +1169,7 @@ struct lnet_ni *
 
 			if (ntokens == 1) {
 				net = libcfs_str2net(ltb->ltb_text);
-				if (net == LNET_NIDNET(LNET_NID_ANY) ||
+				if (net == LNET_NET_ANY ||
 				    LNET_NETTYP(net) == LOLND)
 					goto token_error;
 			} else {
@@ -1197,7 +1197,7 @@ struct lnet_ni *
 
 	list_for_each_entry(ltb1, &nets, ltb_list) {
 		net = libcfs_str2net(ltb1->ltb_text);
-		LASSERT(net != LNET_NIDNET(LNET_NID_ANY));
+		LASSERT(net != LNET_NET_ANY);
 
 		list_for_each_entry(ltb2, &gateways, ltb_list) {
 			nid = libcfs_str2nid(ltb2->ltb_text);
@@ -1403,7 +1403,7 @@ struct lnet_ni *
 			*sep++ = 0;
 
 		net = lnet_netspec2net(tb->ltb_text);
-		if (net == LNET_NIDNET(LNET_NID_ANY)) {
+		if (net == LNET_NET_ANY) {
 			lnet_syntax("ip2nets", source, offset,
 				    strlen(tb->ltb_text));
 			return -EINVAL;
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 1c9fb41..4687acd 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1222,10 +1222,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		    struct lnet_peer *peer, u32 net_id)
 {
 	struct lnet_peer_net *peer_net;
-	u32 any_net = LNET_NIDNET(LNET_NID_ANY);
 
 	/* find the best_lpni on any local network */
-	if (net_id == any_net) {
+	if (net_id == LNET_NET_ANY) {
 		struct lnet_peer_ni *best_lpni = NULL;
 		struct lnet_peer_net *lpn;
 
diff --git a/net/lnet/lnet/nidstrings.c b/net/lnet/lnet/nidstrings.c
index fb8d3e2..f260092 100644
--- a/net/lnet/lnet/nidstrings.c
+++ b/net/lnet/lnet/nidstrings.c
@@ -884,7 +884,7 @@ int cfs_print_nidlist(char *buffer, int count, struct list_head *nidlist)
 	if (libcfs_str2net_internal(str, &net))
 		return net;
 
-	return LNET_NIDNET(LNET_NID_ANY);
+	return LNET_NET_ANY;
 }
 EXPORT_SYMBOL(libcfs_str2net);
 
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 3889310..70df37a 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -596,7 +596,7 @@ void lnet_peer_uninit(void)
 			gw_nid = lp->lpni_peer_net->lpn_peer->lp_primary_nid;
 
 			lnet_net_unlock(LNET_LOCK_EX);
-			lnet_del_route(LNET_NIDNET(LNET_NID_ANY), gw_nid);
+			lnet_del_route(LNET_NET_ANY, gw_nid);
 			lnet_net_lock(LNET_LOCK_EX);
 		}
 	}
diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index 1253e4c..e030b16 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -664,7 +664,7 @@ static void lnet_shuffle_seed(void)
 
 	if (gateway == LNET_NID_ANY ||
 	    gateway == LNET_NID_LO_0 ||
-	    net == LNET_NIDNET(LNET_NID_ANY) ||
+	    net == LNET_NET_ANY ||
 	    LNET_NETTYP(net) == LOLND ||
 	    LNET_NIDNET(gateway) == net ||
 	    (hops != LNET_UNDEFINED_HOPS && (hops < 1 || hops > 255)))
@@ -841,7 +841,7 @@ static void lnet_shuffle_seed(void)
 		lnet_peer_ni_decref_locked(lpni);
 	}
 
-	if (net != LNET_NIDNET(LNET_NID_ANY)) {
+	if (net != LNET_NET_ANY) {
 		rnet = lnet_find_rnet_locked(net);
 		if (!rnet) {
 			lnet_net_unlock(LNET_LOCK_EX);
@@ -898,7 +898,7 @@ static void lnet_shuffle_seed(void)
 void
 lnet_destroy_routes(void)
 {
-	lnet_del_route(LNET_NIDNET(LNET_NID_ANY), LNET_NID_ANY);
+	lnet_del_route(LNET_NET_ANY, LNET_NID_ANY);
 }
 
 int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg)
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 07/39] lustre: ldlm: Don't re-enqueue glimpse lock on read
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (5 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 06/39] lnet: Introduce constant for net ID of LNET_NID_ANY James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 08/39] lustre: osc: prevent overflow of o_dropped James Simmons
                   ` (31 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Andriy Skulysh, Lustre Development List

From: Andriy Skulysh <c17819@cray.com>

cl_glimpse_lock() doesn't match a lock with LDLM_FL_BL_AST set
even if this lock was acquired by the same thread earlier.

It only needs the size to check for a sparse file,
so let's add LDLM_FL_CBPENDING to the match flags.

 #1 [ffff9ba7326036f0] schedule at ffffffff87b67c49
 #2 [ffff9ba732603700] obd_get_request_slot at ffffffffc0dbe0a4 [obdclass]
 #3 [ffff9ba7326037b8] ldlm_cli_enqueue at ffffffffc0faedce [ptlrpc]
 #4 [ffff9ba732603878] mdc_enqueue_send at ffffffffc11b38a8 [mdc]
 #5 [ffff9ba732603938] mdc_lock_enqueue at ffffffffc11b3eb2 [mdc]
 #6 [ffff9ba7326039a8] cl_lock_enqueue at ffffffffc0dfee95 [obdclass]
 #7 [ffff9ba7326039e0] lov_lock_enqueue at ffffffffc10ef265 [lov]
 #8 [ffff9ba732603a20] cl_lock_enqueue at ffffffffc0dfee95 [obdclass]
 #9 [ffff9ba732603a58] cl_lock_request at ffffffffc0dff54b [obdclass]

HPE-bug-id: LUS-8690
WC-bug-id: https://jira.whamcloud.com/browse/LU-13987
Lustre-commit: 829a3a93d43e4d ("LU-13987 ldlm: Don't re-enqueue glimpse lock on read")
Reviewed-on: https://review.whamcloud.com/40044
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_osc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index e7bf392..e32723c 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -203,7 +203,7 @@ static inline u64 osc_enq2ldlm_flags(u32 enqflags)
 	if (enqflags & CEF_NONBLOCK)
 		result |= LDLM_FL_BLOCK_NOWAIT;
 	if (enqflags & CEF_GLIMPSE)
-		result |= LDLM_FL_HAS_INTENT;
+		result |= LDLM_FL_HAS_INTENT | LDLM_FL_CBPENDING;
 	if (enqflags & CEF_DISCARD_DATA)
 		result |= LDLM_FL_AST_DISCARD_DATA;
 	if (enqflags & CEF_PEEK)
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 08/39] lustre: osc: prevent overflow of o_dropped
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (6 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 07/39] lustre: ldlm: Don't re-enqueue glimpse lock on read James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 09/39] lustre: llite: fix client evicition with DIO James Simmons
                   ` (30 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Olaf Faaland <faaland1@llnl.gov>

In osc_announce_cached(), prevent o_dropped from overflowing. This
is necessary because o_dropped (AKA o_misc) is 32 bits, but
cl_lost_grant is 64 bits.

Add a CDEBUG() call so we can tell when this has happened.
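
For reference, the clamp could also be written with min_t(); this is
only a sketch, since the patch keeps the explicit branch so the CDEBUG
can be emitted when clamping actually occurs:

  oa->o_dropped = min_t(u64, cli->cl_lost_grant, INT_MAX);
  cli->cl_lost_grant -= oa->o_dropped;    /* remainder reported later */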

WC-bug-id: https://jira.whamcloud.com/browse/LU-14125
Lustre-commit: 82e9a11056a552 ("LU-14125 osc: prevent overflow of o_dropped")
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/40659
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_request.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index f225ccd..4a4b5ef 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -754,11 +754,21 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
 				    ~(PTLRPC_MAX_BRW_SIZE * 4UL));
 	}
 	oa->o_grant = cli->cl_avail_grant + cli->cl_reserved_grant;
-	oa->o_dropped = cli->cl_lost_grant;
-	cli->cl_lost_grant = 0;
+	/* o_dropped AKA o_misc is 32 bits, but cl_lost_grant is 64 bits */
+	if (cli->cl_lost_grant > INT_MAX) {
+		CDEBUG(D_CACHE,
+		       "%s: avoided o_dropped overflow: cl_lost_grant %lu\n",
+		       cli_name(cli), cli->cl_lost_grant);
+		oa->o_dropped = INT_MAX;
+	} else {
+		oa->o_dropped = cli->cl_lost_grant;
+	}
+	cli->cl_lost_grant -= oa->o_dropped;
 	spin_unlock(&cli->cl_loi_list_lock);
-	CDEBUG(D_CACHE, "dirty: %llu undirty: %u dropped %u grant: %llu\n",
-	       oa->o_dirty, oa->o_undirty, oa->o_dropped, oa->o_grant);
+	CDEBUG(D_CACHE,
+	       "%s: dirty: %llu undirty: %u dropped %u grant: %llu cl_lost_grant %lu\n",
+	       cli_name(cli), oa->o_dirty, oa->o_undirty, oa->o_dropped,
+	       oa->o_grant, cli->cl_lost_grant);
 }
 
 void osc_update_next_shrink(struct client_obd *cli)
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 09/39] lustre: llite: fix client evicition with DIO
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (7 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 08/39] lustre: osc: prevent overflow of o_dropped James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 10/39] lustre: Use vfree_atomic instead of vfree James Simmons
                   ` (29 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Wang Shilong, Lustre Development List

From: Wang Shilong <wshilong@ddn.com>

We set lockless IO at file open time if the O_DIRECT flag is
passed; however, the O_DIRECT flag can later be cleared by
fcntl(..., F_SETFL, ...).
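
A minimal userspace sketch of the trigger (the mount point and file
name are made up, and error checking is omitted):

  #define _GNU_SOURCE             /* for O_DIRECT */
  #include <fcntl.h>
  #include <unistd.h>

  int main(void)
  {
          /* hypothetical Lustre file; O_DIRECT at open time marks the
           * file descriptor for lockless direct IO
           */
          int fd = open("/mnt/lustre/f0",
                        O_WRONLY | O_CREAT | O_DIRECT, 0644);
          int flags = fcntl(fd, F_GETFL);

          fcntl(fd, F_SETFL, flags & ~O_DIRECT); /* O_DIRECT now cleared */
          write(fd, "x", 1);      /* buffered IO on a "lockless" fd */
          return close(fd);
  }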

We then end up in a case where buffered IO runs without the lock
held properly, and we hit a hang:

[<ffffffffc0d421ed>] osc_extent_wait+0x21d/0x7c0 [osc]
[<ffffffffc0d44897>] osc_cache_wait_range+0x2e7/0x940 [osc]
[<ffffffffc0d4585e>] osc_cache_writeback_range+0x96e/0xff0 [osc]
[<ffffffffc0d31c45>] osc_lock_flush+0x195/0x290 [osc]
[<ffffffffc0d31d7c>] osc_lock_lockless_cancel+0x3c/0xe0 [osc]
[<ffffffffc081f488>] cl_lock_cancel+0x78/0x160 [obdclass]
[<ffffffffc0cd8079>] lov_lock_cancel+0x99/0x190 [lov]
[<ffffffffc081f488>] cl_lock_cancel+0x78/0x160 [obdclass]
[<ffffffffc081f9a2>] cl_lock_release+0x52/0x140 [obdclass]
[<ffffffffc08238a9>] cl_io_unlock+0x139/0x290 [obdclass]
[<ffffffffc08242e8>] cl_io_loop+0xb8/0x200 [obdclass]
[<ffffffffc0e1d36b>] ll_file_io_generic+0x91b/0xdf0 [lustre]
[<ffffffffc0e1dd0c>] ll_file_aio_write+0x29c/0x6e0 [lustre]
[<ffffffffc0e1e250>] ll_file_write+0x100/0x1c0 [lustre]
[<ffffffffa984aa90>] vfs_write+0xc0/0x1f0
[<ffffffffa984b8af>] SyS_write+0x7f/0xf0
[<ffffffffa9d8eede>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff

The lock cancel times out on the server side and the client is
evicted.

Fix this problem by testing the O_DIRECT flag to decide whether we
can issue lockless IO.

Fixes: bf18998820 ("lustre: clio: turn on lockless for some kind of IO")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14072
Lustre-commit: f348437218d0b9 ("LU-14072 llite: fix client evicition with DIO")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/40389
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h           | 2 +-
 fs/lustre/llite/file.c                  | 9 +++------
 fs/lustre/llite/rw.c                    | 4 ++--
 fs/lustre/llite/rw26.c                  | 6 +++---
 fs/lustre/llite/vvp_io.c                | 6 +++---
 include/uapi/linux/lustre/lustre_user.h | 1 -
 6 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index e17385c0..d2cee34 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -1962,7 +1962,7 @@ struct cl_io {
 	/**
 	 * Ignore lockless and do normal locking for this io.
 	 */
-				ci_ignore_lockless:1,
+				ci_dio_lock:1,
 	/**
 	 * Set if we've tried all mirrors for this read IO, if it's not set,
 	 * the read IO will check to-be-read OSCs' status, and make fast-switch
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index f7f917b..2b0ffad 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -945,9 +945,6 @@ int ll_file_open(struct inode *inode, struct file *file)
 
 	mutex_unlock(&lli->lli_och_mutex);
 
-	/* lockless for direct IO so that it can do IO in parallel */
-	if (file->f_flags & O_DIRECT)
-		fd->fd_flags |= LL_FILE_LOCKLESS_IO;
 	fd = NULL;
 
 	/* Must do this outside lli_och_mutex lock to prevent deadlock where
@@ -1573,7 +1570,7 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot,
 	ssize_t result = 0;
 	int rc = 0;
 	unsigned int retried = 0;
-	unsigned int ignore_lockless = 0;
+	unsigned int dio_lock = 0;
 	bool is_aio = false;
 	struct cl_dio_aio *ci_aio = NULL;
 
@@ -1595,7 +1592,7 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot,
 	io = vvp_env_thread_io(env);
 	ll_io_init(io, file, iot == CIT_WRITE, args);
 	io->ci_aio = ci_aio;
-	io->ci_ignore_lockless = ignore_lockless;
+	io->ci_dio_lock = dio_lock;
 	io->ci_ndelay_tried = retried;
 
 	if (cl_io_rw_init(env, io, iot, *ppos, count) == 0) {
@@ -1675,7 +1672,7 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot,
 		       *ppos, count, result);
 		/* preserve the tried count for FLR */
 		retried = io->ci_ndelay_tried;
-		ignore_lockless = io->ci_ignore_lockless;
+		dio_lock = io->ci_dio_lock;
 		goto restart;
 	}
 
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 54f0b9a..da4a26d 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -1723,9 +1723,9 @@ int ll_readpage(struct file *file, struct page *vmpage)
 	 */
 	if (file->f_flags & O_DIRECT &&
 	    lcc && lcc->lcc_type == LCC_RW &&
-	    !io->ci_ignore_lockless) {
+	    !io->ci_dio_lock) {
 		unlock_page(vmpage);
-		io->ci_ignore_lockless = 1;
+		io->ci_dio_lock = 1;
 		io->ci_need_restart = 1;
 		return -ENOLCK;
 	}
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index 1736e9a..605a326 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -538,12 +538,12 @@ static int ll_write_begin(struct file *file, struct address_space *mapping,
 		}
 
 		/*
-		 * Direct read can fall back to buffered read, but DIO is done
+		 * Direct write can fall back to buffered read, but DIO is done
 		 * with lockless i/o, and buffered requires LDLM locking, so
 		 * in this case we must restart without lockless.
 		 */
-		if (!io->ci_ignore_lockless) {
-			io->ci_ignore_lockless = 1;
+		if (!io->ci_dio_lock) {
+			io->ci_dio_lock = 1;
 			io->ci_need_restart = 1;
 			result = -ENOLCK;
 			goto out;
diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index d6ca267..8dbe835 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -557,11 +557,11 @@ static int vvp_io_rw_lock(const struct lu_env *env, struct cl_io *io,
 	if (vio->vui_fd) {
 		/* Group lock held means no lockless any more */
 		if (vio->vui_fd->fd_flags & LL_FILE_GROUP_LOCKED)
-			io->ci_ignore_lockless = 1;
+			io->ci_dio_lock = 1;
 
 		if (ll_file_nolock(vio->vui_fd->fd_file) ||
-		    (vio->vui_fd->fd_flags & LL_FILE_LOCKLESS_IO &&
-		     !io->ci_ignore_lockless))
+		    (vio->vui_fd->fd_file->f_flags & O_DIRECT &&
+		     !io->ci_dio_lock))
 			ast_flags |= CEF_NEVER;
 	}
 
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index b0301e1..143b7d5 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -402,7 +402,6 @@ struct ll_ioc_lease_id {
 #define LL_FILE_GROUP_LOCKED	0x00000002
 #define LL_FILE_READAHEA	0x00000004
 #define LL_FILE_LOCKED_DIRECTIO 0x00000008 /* client-side locks with dio */
-#define LL_FILE_LOCKLESS_IO	0x00000010 /* server-side locks with cio */
 #define LL_FILE_FLOCK_WARNING	0x00000020 /* warned about disabled flock */
 
 #define LOV_USER_MAGIC_V1	0x0BD10BD0
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 10/39] lustre: Use vfree_atomic instead of vfree
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (8 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 09/39] lustre: llite: fix client evicition with DIO James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 11/39] lnet: lnd: Use NETWORK_TIMEOUT for txs on ibp_tx_queue James Simmons
                   ` (28 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

Since vfree() is unsafe to use in atomic context, we can use
vfree_atomic(), introduced in Linux 4.10 commit
bf22e37a641327e34681b7b6959d9646e3886770. To use it we have to
export vfree_atomic(). The biggest offenders are in the ptlrpc code,
so replace the kvfree() calls there with vfree_atomic(), falling
back to kfree() for buffers that are not vmalloc'ed.
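
The replacement pattern repeated at each call site could be captured
by a helper along these lines (hypothetical name; the patch open-codes
the check instead):

  #include <linux/mm.h>           /* is_vmalloc_addr() */
  #include <linux/slab.h>         /* kfree() */
  #include <linux/vmalloc.h>      /* vfree_atomic() */

  /* Free a buffer that may have come from kvmalloc() without ever
   * calling vfree() directly, so callers may be in atomic context.
   */
  static inline void kvfree_maybe_atomic(void *buf)
  {
          if (is_vmalloc_addr(buf))
                  vfree_atomic(buf);
          else
                  kfree(buf);
  }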

WC-bug-id: https://jira.whamcloud.com/browse/LU-12564
Lustre-commit: 7a9c0ca690eb00 ("LU-12564 libcfs: Use vfree_atomic instead of vfree")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40136
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/client.c    |  6 +++++-
 fs/lustre/ptlrpc/sec.c       | 21 +++++++++++++++++----
 fs/lustre/ptlrpc/sec_bulk.c  |  6 +++++-
 fs/lustre/ptlrpc/sec_null.c  | 24 +++++++++++++++++++-----
 fs/lustre/ptlrpc/sec_plain.c | 24 +++++++++++++++++++-----
 fs/lustre/ptlrpc/service.c   | 21 +++++++++++++++++----
 mm/vmalloc.c                 |  1 +
 7 files changed, 83 insertions(+), 20 deletions(-)

diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index 2002c03..4b8aa25 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -38,6 +38,7 @@
 #include <linux/libcfs/libcfs_cpu.h>
 #include <linux/delay.h>
 #include <linux/random.h>
+#include <linux/vmalloc.h>
 
 #include <obd_support.h>
 #include <obd_class.h>
@@ -529,7 +530,10 @@ void ptlrpc_free_rq_pool(struct ptlrpc_request_pool *pool)
 		list_del(&req->rq_list);
 		LASSERT(req->rq_reqbuf);
 		LASSERT(req->rq_reqbuf_len == pool->prp_rq_size);
-		kvfree(req->rq_reqbuf);
+		if (is_vmalloc_addr(req->rq_reqbuf))
+			vfree_atomic(req->rq_reqbuf);
+		else
+			kfree(req->rq_reqbuf);
 		ptlrpc_request_cache_free(req);
 	}
 	kfree(pool);
diff --git a/fs/lustre/ptlrpc/sec.c b/fs/lustre/ptlrpc/sec.c
index 44c15e6..43d4f76 100644
--- a/fs/lustre/ptlrpc/sec.c
+++ b/fs/lustre/ptlrpc/sec.c
@@ -42,6 +42,7 @@
 #include <linux/cred.h>
 #include <linux/key.h>
 #include <linux/sched/task.h>
+#include <linux/vmalloc.h>
 
 #include <obd.h>
 #include <obd_class.h>
@@ -474,7 +475,10 @@ int sptlrpc_req_ctx_switch(struct ptlrpc_request *req,
 			req->rq_flvr = old_flvr;
 		}
 
-		kvfree(reqmsg);
+		if (is_vmalloc_addr(reqmsg))
+			vfree_atomic(reqmsg);
+		else
+			kfree(reqmsg);
 	}
 	return rc;
 }
@@ -836,7 +840,10 @@ void sptlrpc_request_out_callback(struct ptlrpc_request *req)
 	if (req->rq_pool || !req->rq_reqbuf)
 		return;
 
-	kvfree(req->rq_reqbuf);
+	if (is_vmalloc_addr(req->rq_reqbuf))
+		vfree_atomic(req->rq_reqbuf);
+	else
+		kfree(req->rq_reqbuf);
 	req->rq_reqbuf = NULL;
 	req->rq_reqbuf_len = 0;
 }
@@ -1133,7 +1140,10 @@ int sptlrpc_cli_unwrap_early_reply(struct ptlrpc_request *req,
 err_ctx:
 	sptlrpc_cli_ctx_put(early_req->rq_cli_ctx, 1);
 err_buf:
-	kvfree(early_buf);
+	if (is_vmalloc_addr(early_buf))
+		vfree_atomic(early_buf);
+	else
+		kfree(early_buf);
 err_req:
 	ptlrpc_request_cache_free(early_req);
 	return rc;
@@ -1151,7 +1161,10 @@ void sptlrpc_cli_finish_early_reply(struct ptlrpc_request *early_req)
 	LASSERT(early_req->rq_repmsg);
 
 	sptlrpc_cli_ctx_put(early_req->rq_cli_ctx, 1);
-	kvfree(early_req->rq_repbuf);
+	if (is_vmalloc_addr(early_req->rq_repbuf))
+		vfree_atomic(early_req->rq_repbuf);
+	else
+		kfree(early_req->rq_repbuf);
 	ptlrpc_request_cache_free(early_req);
 }
 
diff --git a/fs/lustre/ptlrpc/sec_bulk.c b/fs/lustre/ptlrpc/sec_bulk.c
index 3c3ae8b..9548721 100644
--- a/fs/lustre/ptlrpc/sec_bulk.c
+++ b/fs/lustre/ptlrpc/sec_bulk.c
@@ -37,6 +37,7 @@
 
 #define DEBUG_SUBSYSTEM S_SEC
 
+#include <linux/vmalloc.h>
 #include <linux/libcfs/libcfs.h>
 
 #include <obd.h>
@@ -380,7 +381,10 @@ static inline void enc_pools_free(void)
 	LASSERT(page_pools.epp_max_pools);
 	LASSERT(page_pools.epp_pools);
 
-	kvfree(page_pools.epp_pools);
+	if (is_vmalloc_addr(page_pools.epp_pools))
+		vfree_atomic(page_pools.epp_pools);
+	else
+		kfree(page_pools.epp_pools);
 }
 
 static struct shrinker pools_shrinker = {
diff --git a/fs/lustre/ptlrpc/sec_null.c b/fs/lustre/ptlrpc/sec_null.c
index 97c4e19..3892d6e 100644
--- a/fs/lustre/ptlrpc/sec_null.c
+++ b/fs/lustre/ptlrpc/sec_null.c
@@ -37,6 +37,7 @@
 
 #define DEBUG_SUBSYSTEM S_SEC
 
+#include <linux/vmalloc.h>
 #include <obd_support.h>
 #include <obd_cksum.h>
 #include <obd_class.h>
@@ -180,7 +181,10 @@ void null_free_reqbuf(struct ptlrpc_sec *sec,
 			 "req %p: reqlen %d should smaller than buflen %d\n",
 			 req, req->rq_reqlen, req->rq_reqbuf_len);
 
-		kvfree(req->rq_reqbuf);
+		if (is_vmalloc_addr(req->rq_reqbuf))
+			vfree_atomic(req->rq_reqbuf);
+		else
+			kfree(req->rq_reqbuf);
 		req->rq_reqbuf = NULL;
 		req->rq_reqbuf_len = 0;
 	}
@@ -210,7 +214,10 @@ void null_free_repbuf(struct ptlrpc_sec *sec,
 {
 	LASSERT(req->rq_repbuf);
 
-	kvfree(req->rq_repbuf);
+	if (is_vmalloc_addr(req->rq_repbuf))
+		vfree_atomic(req->rq_repbuf);
+	else
+		kfree(req->rq_repbuf);
 	req->rq_repbuf = NULL;
 	req->rq_repbuf_len = 0;
 }
@@ -257,7 +264,10 @@ int null_enlarge_reqbuf(struct ptlrpc_sec *sec,
 			spin_lock(&req->rq_import->imp_lock);
 		memcpy(newbuf, req->rq_reqbuf, req->rq_reqlen);
 
-		kvfree(req->rq_reqbuf);
+		if (is_vmalloc_addr(req->rq_reqbuf))
+			vfree_atomic(req->rq_reqbuf);
+		else
+			kfree(req->rq_reqbuf);
 		req->rq_reqbuf = newbuf;
 		req->rq_reqmsg = newbuf;
 		req->rq_reqbuf_len = alloc_size;
@@ -337,8 +347,12 @@ void null_free_rs(struct ptlrpc_reply_state *rs)
 	LASSERT_ATOMIC_GT(&rs->rs_svc_ctx->sc_refcount, 1);
 	atomic_dec(&rs->rs_svc_ctx->sc_refcount);
 
-	if (!rs->rs_prealloc)
-		kvfree(rs);
+	if (!rs->rs_prealloc) {
+		if (is_vmalloc_addr(rs))
+			vfree_atomic(rs);
+		else
+			kfree(rs);
+	}
 }
 
 static
diff --git a/fs/lustre/ptlrpc/sec_plain.c b/fs/lustre/ptlrpc/sec_plain.c
index b487968..80831af 100644
--- a/fs/lustre/ptlrpc/sec_plain.c
+++ b/fs/lustre/ptlrpc/sec_plain.c
@@ -38,6 +38,7 @@
 #define DEBUG_SUBSYSTEM S_SEC
 
 #include <linux/highmem.h>
+#include <linux/vmalloc.h>
 #include <obd_support.h>
 #include <obd_cksum.h>
 #include <obd_class.h>
@@ -582,7 +583,10 @@ void plain_free_reqbuf(struct ptlrpc_sec *sec,
 		       struct ptlrpc_request *req)
 {
 	if (!req->rq_pool) {
-		kvfree(req->rq_reqbuf);
+		if (is_vmalloc_addr(req->rq_reqbuf))
+			vfree_atomic(req->rq_reqbuf);
+		else
+			kfree(req->rq_reqbuf);
 		req->rq_reqbuf = NULL;
 		req->rq_reqbuf_len = 0;
 	}
@@ -623,7 +627,10 @@ int plain_alloc_repbuf(struct ptlrpc_sec *sec,
 void plain_free_repbuf(struct ptlrpc_sec *sec,
 		       struct ptlrpc_request *req)
 {
-	kvfree(req->rq_repbuf);
+	if (is_vmalloc_addr(req->rq_repbuf))
+		vfree_atomic(req->rq_repbuf);
+	else
+		kfree(req->rq_repbuf);
 	req->rq_repbuf = NULL;
 	req->rq_repbuf_len = 0;
 }
@@ -678,7 +685,10 @@ int plain_enlarge_reqbuf(struct ptlrpc_sec *sec,
 
 		memcpy(newbuf, req->rq_reqbuf, req->rq_reqbuf_len);
 
-		kvfree(req->rq_reqbuf);
+		if (is_vmalloc_addr(req->rq_reqbuf))
+			vfree_atomic(req->rq_reqbuf);
+		else
+			kfree(req->rq_reqbuf);
 		req->rq_reqbuf = newbuf;
 		req->rq_reqbuf_len = newbuf_size;
 		req->rq_reqmsg = lustre_msg_buf(req->rq_reqbuf,
@@ -823,8 +833,12 @@ void plain_free_rs(struct ptlrpc_reply_state *rs)
 	LASSERT(atomic_read(&rs->rs_svc_ctx->sc_refcount) > 1);
 	atomic_dec(&rs->rs_svc_ctx->sc_refcount);
 
-	if (!rs->rs_prealloc)
-		kvfree(rs);
+	if (!rs->rs_prealloc) {
+		if (is_vmalloc_addr(rs))
+			vfree_atomic(rs);
+		else
+			kfree(rs);
+	}
 }
 
 static
diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c
index 5881e0a..b341877 100644
--- a/fs/lustre/ptlrpc/service.c
+++ b/fs/lustre/ptlrpc/service.c
@@ -36,6 +36,7 @@
 #include <linux/kthread.h>
 #include <linux/ratelimit.h>
 #include <linux/fs_struct.h>
+#include <linux/vmalloc.h>
 
 #include <obd_support.h>
 #include <obd_class.h>
@@ -118,7 +119,10 @@ static void ptlrpc_free_rqbd(struct ptlrpc_request_buffer_desc *rqbd)
 	svcpt->scp_nrqbds_total--;
 	spin_unlock(&svcpt->scp_lock);
 
-	kvfree(rqbd->rqbd_buffer);
+	if (is_vmalloc_addr(rqbd->rqbd_buffer))
+		vfree_atomic(rqbd->rqbd_buffer);
+	else
+		kfree(rqbd->rqbd_buffer);
 	kfree(rqbd);
 }
 
@@ -838,7 +842,10 @@ static void ptlrpc_server_drop_request(struct ptlrpc_request *req)
 			    test_req_buffer_pressure) {
 				/* like in ptlrpc_free_rqbd() */
 				svcpt->scp_nrqbds_total--;
-				kvfree(rqbd->rqbd_buffer);
+				if (is_vmalloc_addr(rqbd->rqbd_buffer))
+					vfree_atomic(rqbd->rqbd_buffer);
+				else
+					kfree(rqbd->rqbd_buffer);
 				kfree(rqbd);
 			} else {
 				list_add_tail(&rqbd->rqbd_list,
@@ -1197,7 +1204,10 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req)
 	class_export_put(reqcopy->rq_export);
 out:
 	sptlrpc_svc_ctx_decref(reqcopy);
-	kvfree(reqmsg);
+	if (is_vmalloc_addr(reqmsg))
+		vfree_atomic(reqmsg);
+	else
+		kfree(reqmsg);
 out_free:
 	ptlrpc_request_cache_free(reqcopy);
 	return rc;
@@ -2938,7 +2948,10 @@ static void ptlrpc_wait_replies(struct ptlrpc_service_part *svcpt)
 						      struct ptlrpc_reply_state,
 						      rs_list)) != NULL) {
 			list_del(&rs->rs_list);
-			kvfree(rs);
+			if (is_vmalloc_addr(rs))
+				vfree_atomic(rs);
+			else
+				kfree(rs);
 		}
 	}
 }
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9a8227a..9a27fbd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2355,6 +2355,7 @@ void vfree_atomic(const void *addr)
 		return;
 	__vfree_deferred(addr);
 }
+EXPORT_SYMBOL(vfree_atomic);
 
 static void __vfree(const void *addr)
 {
-- 
1.8.3.1

* [lustre-devel] [PATCH 11/39] lnet: lnd: Use NETWORK_TIMEOUT for txs on ibp_tx_queue
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (9 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 10/39] lustre: Use vfree_atomic instead of vfree James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 12/39] lnet: lnd: Use NETWORK_TIMEOUT for some conn failures James Simmons
                   ` (27 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

TXs on the ibp_tx_queue are waiting for a connection to be
established. Failure to establish a connection could be due to a
problem with either the local NI or the remote NI, and o2iblnd cannot
currently distinguish between these failures. As such, it should
return LNET_MSG_STATUS_NETWORK_TIMEOUT to LNet so that LNet will
decrement the health value of both the local NI and the remote NI and
future sends can take these health values into account.

HPE-bug-id: LUS-9342
WC-bug-id: https://jira.whamcloud.com/browse/LU-13571
Lustre-commit: 7af63191370fd2 ("LU-13571 lnd: Use NETWORK_TIMEOUT for txs on ibp_tx_queue")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39899
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 3d7026b..9766aa2 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -3308,7 +3308,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 	if (!list_empty(&timedout_txs))
 		kiblnd_txlist_done(&timedout_txs, -ETIMEDOUT,
-				   LNET_MSG_STATUS_LOCAL_TIMEOUT);
+				   LNET_MSG_STATUS_REMOTE_TIMEOUT);
 
 	/*
 	 * Handle timeout by closing the whole
-- 
1.8.3.1

* [lustre-devel] [PATCH 12/39] lnet: lnd: Use NETWORK_TIMEOUT for some conn failures
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (10 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 11/39] lnet: lnd: Use NETWORK_TIMEOUT for txs on ibp_tx_queue James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 13/39] lustre: llite: allow DIO with unaligned IO count James Simmons
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

For -EHOSTUNREACH and -ETIMEDOUT we cannot tell whether the
connection failure was due to a problem with the remote or local NI,
so we should return the LNET_MSG_STATUS_NETWORK_TIMEOUT to LNet in
these cases.

HPE-bug-id: LUS-9342
WC-bug-id: https://jira.whamcloud.com/browse/LU-13571
Lustre-commit: 12333c1fecc00e ("LU-13571 lnd: Use NETWORK_TIMEOUT for some conn failures")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39900
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 9766aa2..20d555f 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -2143,8 +2143,12 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	CNETERR("Deleting messages for %s: connection failed\n",
 		libcfs_nid2str(peer_ni->ibp_nid));
 
-	kiblnd_txlist_done(&zombies, error,
-			   LNET_MSG_STATUS_LOCAL_DROPPED);
+	if (error == -EHOSTUNREACH || error == -ETIMEDOUT)
+		kiblnd_txlist_done(&zombies, error,
+				   LNET_MSG_STATUS_NETWORK_TIMEOUT);
+	else
+		kiblnd_txlist_done(&zombies, error,
+				   LNET_MSG_STATUS_LOCAL_DROPPED);
 }
 
 static void
-- 
1.8.3.1

* [lustre-devel] [PATCH 13/39] lustre: llite: allow DIO with unaligned IO count
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (11 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 12/39] lnet: lnd: Use NETWORK_TIMEOUT for some conn failures James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 14/39] lustre: osc: skip 0 row for rpc_stats James Simmons
                   ` (25 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Wang Shilong, Lustre Development List

From: Wang Shilong <wshilong@ddn.com>

DIO only requires that the user buffer and the IO offset be
page aligned; it is fine for the IO count not to be page
aligned. Remove this unnecessary limit so that DIO can be used
on files whose size is not a multiple of PAGE_SIZE.
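
For illustration, a minimal user-space sketch of the kind of request
this change is meant to allow (the file path and sizes are made up,
and this program is not part of the patch): the buffer and the file
offset stay page aligned, only the byte count is not.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	void *buf = NULL;
	ssize_t rc;
	int fd;

	/* hypothetical file on a Lustre client mount */
	fd = open("/mnt/lustre/dio_test", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* the buffer start must still be page aligned */
	if (posix_memalign(&buf, 4096, 8192)) {
		close(fd);
		return 1;
	}

	/*
	 * offset 0 is page aligned, but the count (5000 bytes) is not;
	 * before this change ll_direct_IO() rejected such a read with
	 * -EINVAL.
	 */
	rc = pread(fd, buf, 5000, 0);
	printf("pread returned %zd\n", rc);

	free(buf);
	close(fd);
	return 0;
}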

WC-bug-id: https://jira.whamcloud.com/browse/LU-14043
Lustre-commit: 45c46c6effd827 ("LU-14043 llite: allow DIO with unaligned IO count")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/40392
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/rw26.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index 605a326..28c0a75 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -181,6 +181,35 @@ static ssize_t ll_get_user_pages(int rw, struct iov_iter *iter,
 	return result;
 }
 
+/*
+ * Lustre could relax a bit for alignment, io count is not
+ * necessary page alignment.
+ */
+static unsigned long ll_iov_iter_alignment(struct iov_iter *i)
+{
+	size_t orig_size = i->count;
+	size_t count = orig_size & ~PAGE_MASK;
+	unsigned long res;
+
+	if (!count)
+		return iov_iter_alignment(i);
+
+	if (orig_size > PAGE_SIZE) {
+		iov_iter_truncate(i, orig_size - count);
+		res = iov_iter_alignment(i);
+		iov_iter_reexpand(i, orig_size);
+
+		return res;
+	}
+
+	res = iov_iter_alignment(i);
+	/* start address is page aligned */
+	if ((res & ~PAGE_MASK) == orig_size)
+		return PAGE_SIZE;
+
+	return res;
+}
+
 /* direct IO pages */
 struct ll_dio_pages {
 	struct cl_dio_aio	*ldp_aio;
@@ -325,7 +354,7 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		return 0;
 
 	/* FIXME: io smaller than PAGE_SIZE is broken on ia64 ??? */
-	if ((file_offset & ~PAGE_MASK) || (count & ~PAGE_MASK))
+	if (file_offset & ~PAGE_MASK)
 		return -EINVAL;
 
 	CDEBUG(D_VFSTRACE,
@@ -335,7 +364,7 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	       MAX_DIO_SIZE >> PAGE_SHIFT);
 
 	/* Check that all user buffers are aligned as well */
-	if (iov_iter_alignment(iter) & ~PAGE_MASK)
+	if (ll_iov_iter_alignment(iter) & ~PAGE_MASK)
 		return -EINVAL;
 
 	lcc = ll_cl_find(file);
-- 
1.8.3.1

* [lustre-devel] [PATCH 14/39] lustre: osc: skip 0 row for rpc_stats
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (12 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 13/39] lustre: llite: allow DIO with unaligned IO count James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 15/39] lustre: quota: df should return projid-specific values James Simmons
                   ` (24 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Yang Sheng, Lustre Development List

From: Yang Sheng <ys@whamcloud.com>

Fix the rpc_stats statistics: the 0 row should not be printed,
since it makes no sense.
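
With the fix applied, the histograms can be checked as usual, e.g.
(assuming the standard lctl parameter names):

lctl get_param osc.*.rpc_stats mdc.*.rpc_stats

and the "rpcs in flight" table should now start at row 1 rather than
at a meaningless row 0.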

WC-bug-id: https://jira.whamcloud.com/browse/LU-14130
Lustre-commit: 596f74c122f5ed ("LU-14130 osc: skip 0 row for rpc_stats")
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40613
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/lproc_mdc.c | 2 +-
 fs/lustre/osc/lproc_osc.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c
index 662be42..ce03999 100644
--- a/fs/lustre/mdc/lproc_mdc.c
+++ b/fs/lustre/mdc/lproc_mdc.c
@@ -383,7 +383,7 @@ static int mdc_rpc_stats_seq_show(struct seq_file *seq, void *v)
 
 	read_cum = 0;
 	write_cum = 0;
-	for (i = 0; i < OBD_HIST_MAX; i++) {
+	for (i = 1; i < OBD_HIST_MAX; i++) {
 		unsigned long r = cli->cl_read_rpc_hist.oh_buckets[i];
 		unsigned long w = cli->cl_write_rpc_hist.oh_buckets[i];
 
diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c
index 7ea9530..89b55c3 100644
--- a/fs/lustre/osc/lproc_osc.c
+++ b/fs/lustre/osc/lproc_osc.c
@@ -808,7 +808,7 @@ static int osc_rpc_stats_seq_show(struct seq_file *seq, void *v)
 
 	read_cum = 0;
 	write_cum = 0;
-	for (i = 0; i < OBD_HIST_MAX; i++) {
+	for (i = 1; i < OBD_HIST_MAX; i++) {
 		unsigned long r = cli->cl_read_rpc_hist.oh_buckets[i];
 		unsigned long w = cli->cl_write_rpc_hist.oh_buckets[i];
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 15/39] lustre: quota: df should return projid-specific values
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (13 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 14/39] lustre: osc: skip 0 row for rpc_stats James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 16/39] lnet: discard the callback James Simmons
                   ` (23 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Wang Shilong, Lustre Development List

From: Wang Shilong <wshilong@ddn.com>

With local ext4 and XFS filesystems, it is possible to use
"df /path/to/directory" (statfs()) to return the current
project quota usage for that directory as "used", and
min(projid quota limit, free space) as "total".

statfs() is a natural interface for users/applications, since
it represents the used/maximum space for that subdirectory.
Otherwise, the user will get EDQUOT back when the project
quota runs out for that directory and applications will not
be able to figure out how much data they could write into
that directory.
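
As a rough, self-contained sketch of the block-limit arithmetic that
the new ll_statfs_project() helper below performs (all numbers here
are made up): quota block limits are kept in KiB while statfs reports
blocks of f_bsize bytes, so the limit is rescaled before it caps
f_blocks and f_bavail.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* illustrative values: 10 GiB project soft limit, 3 GiB used */
	uint64_t dqb_bsoftlimit = 10ULL << 20;	/* KiB, as in obd_dqblk */
	uint64_t dqb_curspace = 3ULL << 30;	/* bytes */
	uint64_t f_bsize = 4096;		/* statfs block size */
	uint64_t f_blocks = 500ULL << 20;	/* whole fs, in blocks */

	/* same scaling as ll_statfs_project(): KiB -> f_bsize blocks */
	uint64_t limit = dqb_bsoftlimit * 1024 / f_bsize;
	uint64_t curblock = (dqb_curspace + f_bsize - 1) / f_bsize;

	if (limit && f_blocks > limit) {
		f_blocks = limit;	/* 2621440 blocks == 10 GiB */
		printf("f_blocks=%llu f_bavail=%llu\n",
		       (unsigned long long)f_blocks,
		       (unsigned long long)(f_blocks > curblock ?
					    f_blocks - curblock : 0));
	}
	return 0;
}

With these numbers df would report a 10 GiB "filesystem" with 7 GiB
available, i.e. the project limit rather than the whole filesystem.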

WC-bug-id: https://jira.whamcloud.com/browse/LU-9555
Lustre-commit: e5c8f6670fbeea ("LU-9555 quota: df should return projid-specific values")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/36685
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c                   |  2 +-
 fs/lustre/llite/llite_internal.h        |  1 +
 fs/lustre/llite/llite_lib.c             | 49 +++++++++++++++++++++++++++++++++
 include/uapi/linux/lustre/lustre_user.h |  6 ++--
 4 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 6bc95d9..db620ce 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -1079,7 +1079,7 @@ static int check_owner(int type, int id)
 	return 0;
 }
 
-static int quotactl_ioctl(struct ll_sb_info *sbi, struct if_quotactl *qctl)
+int quotactl_ioctl(struct ll_sb_info *sbi, struct if_quotactl *qctl)
 {
 	int cmd = qctl->qc_cmd;
 	int type = qctl->qc_type;
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 9d988aac..bad974f 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -996,6 +996,7 @@ int ll_dir_read(struct inode *inode, u64 *ppos, struct md_op_data *op_data,
 struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data,
 			     u64 offset);
 void ll_release_page(struct inode *inode, struct page *page, bool remove);
+int quotactl_ioctl(struct ll_sb_info *sbi, struct if_quotactl *qctl);
 
 enum get_default_layout_type {
 	GET_DEFAULT_LAYOUT_ROOT = 1,
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index e4036af..34bd661 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -2137,6 +2137,53 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs,
 	return rc;
 }
 
+static int ll_statfs_project(struct inode *inode, struct kstatfs *sfs)
+{
+	struct if_quotactl qctl = {
+		.qc_cmd = LUSTRE_Q_GETQUOTA,
+		.qc_type = PRJQUOTA,
+		.qc_valid = QC_GENERAL,
+	};
+	u64 limit, curblock;
+	int ret;
+
+	qctl.qc_id = ll_i2info(inode)->lli_projid;
+	ret = quotactl_ioctl(ll_i2sbi(inode), &qctl);
+	if (ret) {
+		/* ignore errors if project ID does not have
+		 * a quota limit or feature unsupported.
+		 */
+		if (ret == -ESRCH || ret == -EOPNOTSUPP)
+			ret = 0;
+		return ret;
+	}
+
+	limit = ((qctl.qc_dqblk.dqb_bsoftlimit ?
+		 qctl.qc_dqblk.dqb_bsoftlimit :
+		 qctl.qc_dqblk.dqb_bhardlimit) * 1024) / sfs->f_bsize;
+	if (limit && sfs->f_blocks > limit) {
+		curblock = (qctl.qc_dqblk.dqb_curspace +
+			    sfs->f_bsize - 1) / sfs->f_bsize;
+		sfs->f_blocks = limit;
+		sfs->f_bavail =
+			(sfs->f_blocks > curblock) ?
+			(sfs->f_blocks - curblock) : 0;
+		sfs->f_bfree = sfs->f_bavail;
+	}
+
+	limit = qctl.qc_dqblk.dqb_isoftlimit ?
+		qctl.qc_dqblk.dqb_isoftlimit :
+		qctl.qc_dqblk.dqb_ihardlimit;
+	if (limit && sfs->f_files > limit) {
+		sfs->f_files = limit;
+		sfs->f_ffree = (sfs->f_files >
+			qctl.qc_dqblk.dqb_curinodes) ?
+			(sfs->f_files - qctl.qc_dqblk.dqb_curinodes) : 0;
+	}
+
+	return 0;
+}
+
 int ll_statfs(struct dentry *de, struct kstatfs *sfs)
 {
 	struct super_block *sb = de->d_sb;
@@ -2174,6 +2221,8 @@ int ll_statfs(struct dentry *de, struct kstatfs *sfs)
 	sfs->f_bavail = osfs.os_bavail;
 	sfs->f_fsid.val[0] = (u32)fsid;
 	sfs->f_fsid.val[1] = (u32)(fsid >> 32);
+	if (ll_i2info(de->d_inode)->lli_projid)
+		return ll_statfs_project(de->d_inode, sfs);
 
 	ll_stats_ops_tally(ll_s2sbi(sb), LPROC_LL_STATFS,
 			   ktime_us_delta(ktime_get(), kstart));
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 143b7d5..62c6952 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -1043,9 +1043,9 @@ struct obd_dqinfo {
 
 /* XXX: same as if_dqblk struct in kernel, plus one padding */
 struct obd_dqblk {
-	__u64 dqb_bhardlimit;
-	__u64 dqb_bsoftlimit;
-	__u64 dqb_curspace;
+	__u64 dqb_bhardlimit;	/* kbytes unit */
+	__u64 dqb_bsoftlimit;	/* kbytes unit */
+	__u64 dqb_curspace;	/* bytes unit */
 	__u64 dqb_ihardlimit;
 	__u64 dqb_isoftlimit;
 	__u64 dqb_curinodes;
-- 
1.8.3.1

* [lustre-devel] [PATCH 16/39] lnet: discard the callback
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (14 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 15/39] lustre: quota: df should return projid-specific values James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 17/39] lustre: llite: try to improve mmap performance James Simmons
                   ` (22 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Yang Sheng, Lustre Development List

From: Yang Sheng <ys@whamcloud.com>

Lustre needs a completion callback for the event that a request
has been sent, and then another callback when the reply arrives.
Sometimes the request completion callback may be lost for some
reason even though the reply has been received, and the system
will then wait forever (or at best until a long timeout). There
is no need to wait for the request completion in that case, so
provide a way to discard the callback.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13368
Lustre-commit: babf0232273467 ("LU-13368 lnet: discard the callback")
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38845
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_net.h      | 15 +++++++++-
 fs/lustre/ptlrpc/client.c           | 15 ++++++----
 fs/lustre/ptlrpc/niobuf.c           |  7 +++--
 include/linux/lnet/api.h            |  3 +-
 include/linux/lnet/lib-lnet.h       |  1 +
 include/linux/lnet/lib-types.h      |  1 +
 net/lnet/klnds/o2iblnd/o2iblnd.c    |  1 +
 net/lnet/klnds/o2iblnd/o2iblnd.h    |  4 +++
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 58 +++++++++++++++++++++++++++++++++++--
 net/lnet/lnet/lib-md.c              | 25 ++++++++++++++--
 10 files changed, 117 insertions(+), 13 deletions(-)

diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index 61be05c..f16c935 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -2225,8 +2225,10 @@ static inline int ptlrpc_status_ntoh(int n)
 	return req->rq_receiving_reply;
 }
 
+#define ptlrpc_cli_wait_unlink(req) __ptlrpc_cli_wait_unlink(req, NULL)
+
 static inline int
-ptlrpc_client_recv_or_unlink(struct ptlrpc_request *req)
+__ptlrpc_cli_wait_unlink(struct ptlrpc_request *req, bool *discard)
 {
 	int rc;
 
@@ -2239,6 +2241,17 @@ static inline int ptlrpc_status_ntoh(int n)
 		spin_unlock(&req->rq_lock);
 		return 1;
 	}
+
+	if (discard) {
+		*discard = false;
+		if (req->rq_reply_unlinked && req->rq_req_unlinked == 0) {
+			*discard = true;
+			spin_unlock(&req->rq_lock);
+			return 1; /* Should call again after LNetMDUnlink */
+		}
+	}
+
+
 	rc = !req->rq_req_unlinked || !req->rq_reply_unlinked ||
 	     req->rq_receiving_reply;
 	spin_unlock(&req->rq_lock);
diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index 4b8aa25..cec4da99 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -1783,7 +1783,7 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
 			 * not corrupt any data.
 			 */
 			if (req->rq_phase == RQ_PHASE_UNREG_RPC &&
-			    ptlrpc_client_recv_or_unlink(req))
+			    ptlrpc_cli_wait_unlink(req))
 				continue;
 			if (req->rq_phase == RQ_PHASE_UNREG_BULK &&
 			    ptlrpc_client_bulk_active(req))
@@ -1821,7 +1821,7 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
 			ptlrpc_expire_one_request(req, 1);
 
 			/* Check if we still need to wait for unlink. */
-			if (ptlrpc_client_recv_or_unlink(req) ||
+			if (ptlrpc_cli_wait_unlink(req) ||
 			    ptlrpc_client_bulk_active(req))
 				continue;
 			/* If there is no need to resend, fail it now. */
@@ -2599,6 +2599,8 @@ u64 ptlrpc_req_xid(struct ptlrpc_request *request)
  */
 static int ptlrpc_unregister_reply(struct ptlrpc_request *request, int async)
 {
+	bool discard = false;
+
 	/* Might sleep. */
 	LASSERT(!in_interrupt());
 
@@ -2609,13 +2611,16 @@ static int ptlrpc_unregister_reply(struct ptlrpc_request *request, int async)
 					     PTLRPC_REQ_LONG_UNLINK;
 
 	/* Nothing left to do. */
-	if (!ptlrpc_client_recv_or_unlink(request))
+	if (!__ptlrpc_cli_wait_unlink(request, &discard))
 		return 1;
 
 	LNetMDUnlink(request->rq_reply_md_h);
 
+	if (discard) /* Discard the request-out callback */
+		__LNetMDUnlink(request->rq_req_md_h, discard);
+
 	/* Let's check it once again. */
-	if (!ptlrpc_client_recv_or_unlink(request))
+	if (!ptlrpc_cli_wait_unlink(request))
 		return 1;
 
 	/* Move to "Unregistering" phase as reply was not unlinked yet. */
@@ -2636,7 +2641,7 @@ static int ptlrpc_unregister_reply(struct ptlrpc_request *request, int async)
 		 */
 		while (seconds > PTLRPC_REQ_LONG_UNLINK &&
 		       (wait_event_idle_timeout(*wq,
-						!ptlrpc_client_recv_or_unlink(request),
+						!ptlrpc_cli_wait_unlink(request),
 						HZ)) == 0)
 			seconds -= 1;
 		if (seconds > 0) {
diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c
index a1e6581..5ae7dd1 100644
--- a/fs/lustre/ptlrpc/niobuf.c
+++ b/fs/lustre/ptlrpc/niobuf.c
@@ -103,12 +103,15 @@ static int ptl_send_buf(struct lnet_handle_md *mdh, void *base, int len,
 	return 0;
 }
 
-static void mdunlink_iterate_helper(struct lnet_handle_md *bd_mds, int count)
+#define mdunlink_iterate_helper(mds, count) \
+		__mdunlink_iterate_helper(mds, count, false)
+static void __mdunlink_iterate_helper(struct lnet_handle_md *bd_mds,
+				      int count, bool discard)
 {
 	int i;
 
 	for (i = 0; i < count; i++)
-		LNetMDUnlink(bd_mds[i]);
+		__LNetMDUnlink(bd_mds[i], discard);
 }
 
 /**
diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h
index 064c92e..891c4a6 100644
--- a/include/linux/lnet/api.h
+++ b/include/linux/lnet/api.h
@@ -125,7 +125,8 @@ int LNetMDBind(const struct lnet_md *md_in,
 	       enum lnet_unlink unlink_in,
 	       struct lnet_handle_md *md_handle_out);
 
-int LNetMDUnlink(struct lnet_handle_md md_in);
+int __LNetMDUnlink(struct lnet_handle_md md_in, bool discard);
+#define LNetMDUnlink(handle) __LNetMDUnlink(handle, false)
 
 void lnet_assert_handler_unused(lnet_handler_t handler);
 /** @} lnet_md */
diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 6253c16..d349f06 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -625,6 +625,7 @@ void lnet_set_reply_msg_len(struct lnet_ni *ni, struct lnet_msg *msg,
 void lnet_detach_rsp_tracker(struct lnet_libmd *md, int cpt);
 void lnet_clean_zombie_rstqs(void);
 
+bool lnet_md_discarded(struct lnet_libmd *md);
 void lnet_finalize(struct lnet_msg *msg, int rc);
 bool lnet_send_error_simulation(struct lnet_msg *msg,
 				enum lnet_msg_hstatus *hstatus);
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index aaf2a46..7c9d7e2 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -222,6 +222,7 @@ struct lnet_libmd {
  * call.
  */
 #define LNET_MD_FLAG_HANDLING		BIT(3)
+#define LNET_MD_FLAG_DISCARD		BIT(4)
 
 struct lnet_test_peer {
 	/* info about peers we are trying to fail */
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index c6a077b..9c65524 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -2732,6 +2732,7 @@ static int kiblnd_base_startup(struct net *ns)
 
 	spin_lock_init(&kiblnd_data.kib_connd_lock);
 	INIT_LIST_HEAD(&kiblnd_data.kib_connd_conns);
+	INIT_LIST_HEAD(&kiblnd_data.kib_connd_waits);
 	INIT_LIST_HEAD(&kiblnd_data.kib_connd_zombies);
 	INIT_LIST_HEAD(&kiblnd_data.kib_reconn_list);
 	INIT_LIST_HEAD(&kiblnd_data.kib_reconn_wait);
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 2b8d5ff..1fc68e1 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -360,6 +360,8 @@ struct kib_data {
 	struct list_head	kib_reconn_list;
 	/* peers wait for reconnection */
 	struct list_head	kib_reconn_wait;
+	/* connections wait for completion */
+	struct list_head	kib_connd_waits;
 	/*
 	 * The second that peers are pulled out from @kib_reconn_wait
 	 * for reconnection.
@@ -567,6 +569,8 @@ struct kib_conn {
 	u16			ibc_queue_depth;
 	/* connections max frags */
 	u16			ibc_max_frags;
+	/* count of timeout txs waiting on cq */
+	u16			ibc_waits;
 	unsigned int		ibc_nrx:16;	/* receive buffers owned */
 	unsigned int		ibc_scheduled:1;/* scheduled for attention */
 	unsigned int		ibc_ready:1;	/* CQ callback fired */
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 20d555f..5cd367e5 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -2052,6 +2052,10 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		if (!tx->tx_sending) {
 			tx->tx_queued = 0;
 			list_move(&tx->tx_list, &zombies);
+		} else {
+			/* keep tx until cq destroy */
+			list_move(&tx->tx_list, &conn->ibc_zombie_txs);
+			conn->ibc_waits++;
 		}
 	}
 
@@ -2065,6 +2069,31 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	kiblnd_txlist_done(&zombies, -ECONNABORTED, LNET_MSG_STATUS_OK);
 }
 
+static int
+kiblnd_tx_may_discard(struct kib_conn *conn)
+{
+	int rc = 0;
+	struct kib_tx *nxt;
+	struct kib_tx *tx;
+
+	spin_lock(&conn->ibc_lock);
+
+	list_for_each_entry_safe(tx, nxt, &conn->ibc_zombie_txs, tx_list) {
+		if (tx->tx_sending > 0 && tx->tx_lntmsg[0] &&
+		    lnet_md_discarded(tx->tx_lntmsg[0]->msg_md)) {
+			tx->tx_sending--;
+			if (tx->tx_sending == 0) {
+				kiblnd_conn_decref(tx->tx_conn);
+				tx->tx_conn = NULL;
+				rc = 1;
+			}
+		}
+	}
+
+	spin_unlock(&conn->ibc_lock);
+	return rc;
+}
+
 static void
 kiblnd_finalise_conn(struct kib_conn *conn)
 {
@@ -3221,8 +3250,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		}
 
 		if (ktime_compare(ktime_get(), tx->tx_deadline) >= 0) {
-			CERROR("Timed out tx: %s, %lld seconds\n",
+			CERROR("Timed out tx: %s(WSQ:%d%d%d), %lld seconds\n",
 			       kiblnd_queue2str(conn, txs),
+			       tx->tx_waiting, tx->tx_sending, tx->tx_queued,
 			       kiblnd_timeout() +
 			       ktime_ms_delta(ktime_get(),
 					      tx->tx_deadline) / MSEC_PER_SEC);
@@ -3426,15 +3456,23 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		conn = list_first_entry_or_null(&kiblnd_data.kib_connd_conns,
 						struct kib_conn, ibc_list);
 		if (conn) {
+			int wait;
+
 			list_del(&conn->ibc_list);
 
 			spin_unlock_irqrestore(lock, flags);
 			dropped_lock = 1;
 
 			kiblnd_disconnect_conn(conn);
-			kiblnd_conn_decref(conn);
+			wait = conn->ibc_waits;
+			if (wait == 0) /* keep ref for connd_wait, see below */
+				kiblnd_conn_decref(conn);
 
 			spin_lock_irqsave(lock, flags);
+
+			if (wait)
+				list_add_tail(&conn->ibc_list,
+					      &kiblnd_data.kib_connd_waits);
 		}
 
 		while (reconn < KIB_RECONN_BREAK) {
@@ -3462,6 +3500,22 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			spin_lock_irqsave(lock, flags);
 		}
 
+		conn = list_first_entry_or_null(&kiblnd_data.kib_connd_conns,
+						struct kib_conn, ibc_list);
+		if (conn) {
+			list_del(&conn->ibc_list);
+			spin_unlock_irqrestore(lock, flags);
+
+			dropped_lock = kiblnd_tx_may_discard(conn);
+			if (dropped_lock)
+				kiblnd_conn_decref(conn);
+
+			spin_lock_irqsave(lock, flags);
+			if (dropped_lock == 0)
+				list_add_tail(&conn->ibc_list,
+					      &kiblnd_data.kib_connd_waits);
+		}
+
 		/* careful with the jiffy wrap... */
 		timeout = (int)(deadline - jiffies);
 		if (timeout <= 0) {
diff --git a/net/lnet/lnet/lib-md.c b/net/lnet/lnet/lib-md.c
index 203c794..b3f758c 100644
--- a/net/lnet/lnet/lib-md.c
+++ b/net/lnet/lnet/lib-md.c
@@ -465,7 +465,7 @@ void lnet_assert_handler_unused(lnet_handler_t handler)
  *		-ENOENT If @mdh does not point to a valid MD object.
  */
 int
-LNetMDUnlink(struct lnet_handle_md mdh)
+__LNetMDUnlink(struct lnet_handle_md mdh, bool discard)
 {
 	struct lnet_event ev;
 	struct lnet_libmd *md = NULL;
@@ -502,6 +502,9 @@ void lnet_assert_handler_unused(lnet_handler_t handler)
 		handler = md->md_handler;
 	}
 
+	if (discard)
+		md->md_flags |= LNET_MD_FLAG_DISCARD;
+
 	if (md->md_rspt_ptr)
 		lnet_detach_rsp_tracker(md, cpt);
 
@@ -514,4 +517,22 @@ void lnet_assert_handler_unused(lnet_handler_t handler)
 
 	return 0;
 }
-EXPORT_SYMBOL(LNetMDUnlink);
+EXPORT_SYMBOL(__LNetMDUnlink);
+
+bool
+lnet_md_discarded(struct lnet_libmd *md)
+{
+	bool rc;
+	int cpt;
+
+	if (!md)
+		return false;
+
+	cpt = lnet_cpt_of_cookie(md->md_lh.lh_cookie);
+	lnet_res_lock(cpt);
+	rc = md->md_flags & LNET_MD_FLAG_DISCARD;
+	lnet_res_unlock(cpt);
+
+	return rc;
+}
+EXPORT_SYMBOL(lnet_md_discarded);
-- 
1.8.3.1

* [lustre-devel] [PATCH 17/39] lustre: llite: try to improve mmap performance
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (15 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 16/39] lnet: discard the callback James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 18/39] lnet: Introduce lnet_recovery_limit parameter James Simmons
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Wang Shilong, Lustre Development List

From: Wang Shilong <wshilong@ddn.com>

We have observed slow mmap read performance for some
applications. The problem occurs when the access pattern is
neither sequential nor strided, but is still adjacent within a
small range before seeking to a random position.

So the pattern could be something like this:

[1M data] [hole] [0.5M data] [hole] [0.7M data] [1M data]

Every time an application reads mmap data, it may read not
only a single 4KB page, but also a cluster of nearby pages
within a range (e.g. 1MB) of the first page after a cache miss.

The readahead engine is modified to track the range size of
a cluster of mmap reads, so that after a seek and/or cache miss,
the range size is used to efficiently prefetch multiple pages
in a single RPC rather than many small RPCs.

Benchmark:
fio --name=randread --directory=/ai400/fio --rw=randread
--ioengine=mmap --bs=128K --numjobs=32 --filesize=200G
--filename=randread --time_based --status-interval=10s
--runtime=30s --allow_file_create=1 --group_reporting
--disable_lat=1 --disable_clat=1 --disable_slat=1
--disk_util=0 --aux-path=/tmp --randrepeat=0
--unique_filename=0 --fallocate=0

               |     master |    patched |  speedup
---------------+------------+------------+---------
page_fault_avg |    512usec |     52usec |    9.75x
page_fault_max |  37698usec |   6543usec |    5.76x
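
In addition to the readahead change benchmarked above, the prefetch
range is capped by a new per-mount tunable added in this patch,
read_ahead_range_kb (default 1024 KiB, minimum 16 pages, 0 disables
mmap range read). Assuming the usual llite parameter naming, it
should be visible and adjustable with something like:

lctl get_param llite.*.read_ahead_range_kb
lctl set_param llite.*.read_ahead_range_kb=2048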

WC-bug-id: https://jira.whamcloud.com/browse/LU-13669
Lustre-commit: 0c5ad4b6df5bf3 ("LU-13669 llite: try to improve mmap performance")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/38916
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_internal.h |  18 +++++
 fs/lustre/llite/llite_lib.c      |   1 +
 fs/lustre/llite/lproc_llite.c    |  47 +++++++++++++
 fs/lustre/llite/rw.c             | 142 +++++++++++++++++++++++++++++++++++----
 4 files changed, 196 insertions(+), 12 deletions(-)

diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index bad974f..797dfea 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -482,6 +482,12 @@ static inline struct pcc_inode *ll_i2pcci(struct inode *inode)
 /* default read-ahead full files smaller than limit on the second read */
 #define SBI_DEFAULT_READ_AHEAD_WHOLE_MAX	MiB_TO_PAGES(2UL)
 
+/* default range pages */
+#define SBI_DEFAULT_RA_RANGE_PAGES		MiB_TO_PAGES(1ULL)
+
+/* Min range pages */
+#define RA_MIN_MMAP_RANGE_PAGES			16UL
+
 enum ra_stat {
 	RA_STAT_HIT = 0,
 	RA_STAT_MISS,
@@ -498,6 +504,7 @@ enum ra_stat {
 	RA_STAT_FAILED_REACH_END,
 	RA_STAT_ASYNC,
 	RA_STAT_FAILED_FAST_READ,
+	RA_STAT_MMAP_RANGE_READ,
 	_NR_RA_STAT,
 };
 
@@ -505,6 +512,7 @@ struct ll_ra_info {
 	atomic_t	      ra_cur_pages;
 	unsigned long	     ra_max_pages;
 	unsigned long	     ra_max_pages_per_file;
+	unsigned long		ra_range_pages;
 	unsigned long	     ra_max_read_ahead_whole_pages;
 	struct workqueue_struct  *ll_readahead_wq;
 	/*
@@ -790,6 +798,16 @@ struct ll_readahead_state {
 	 */
 	pgoff_t		ras_window_start_idx;
 	pgoff_t		ras_window_pages;
+
+	/* Page index where min range read starts */
+	pgoff_t		ras_range_min_start_idx;
+	/* Page index where mmap range read ends */
+	pgoff_t		ras_range_max_end_idx;
+	/* number of mmap pages where last time detected */
+	pgoff_t		ras_last_range_pages;
+	/* number of mmap range requests */
+	pgoff_t		ras_range_requests;
+
 	/*
 	 * Optimal RPC size in pages.
 	 * It decides how many pages will be sent for each read-ahead.
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 34bd661..c560492 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -130,6 +130,7 @@ static struct ll_sb_info *ll_init_sbi(void)
 		    SBI_DEFAULT_READ_AHEAD_PER_FILE_MAX);
 	sbi->ll_ra_info.ra_async_pages_per_file_threshold =
 				sbi->ll_ra_info.ra_max_pages_per_file;
+	sbi->ll_ra_info.ra_range_pages = SBI_DEFAULT_RA_RANGE_PAGES;
 	sbi->ll_ra_info.ra_max_read_ahead_whole_pages = -1;
 	atomic_set(&sbi->ll_ra_info.ra_async_inflight, 0);
 
diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index 9b1c392..5d1e2f4 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -1173,6 +1173,51 @@ static ssize_t read_ahead_async_file_threshold_mb_show(struct kobject *kobj,
 }
 LUSTRE_RW_ATTR(read_ahead_async_file_threshold_mb);
 
+static ssize_t read_ahead_range_kb_show(struct kobject *kobj,
+					struct attribute *attr, char *buf)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%lu\n",
+			 sbi->ll_ra_info.ra_range_pages << (PAGE_SHIFT - 10));
+}
+
+static ssize_t
+read_ahead_range_kb_store(struct kobject *kobj,
+			  struct attribute *attr,
+			  const char *buffer, size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+	unsigned long pages_number;
+	unsigned long max_ra_per_file;
+	u64 val;
+	int rc;
+
+	rc = sysfs_memparse(buffer, count, &val, "KiB");
+	if (rc < 0)
+		return rc;
+
+	pages_number = val >> PAGE_SHIFT;
+	/* Disable mmap range read */
+	if (pages_number == 0)
+		goto out;
+
+	max_ra_per_file = sbi->ll_ra_info.ra_max_pages_per_file;
+	if (pages_number > max_ra_per_file ||
+	    pages_number < RA_MIN_MMAP_RANGE_PAGES)
+		return -ERANGE;
+
+out:
+	spin_lock(&sbi->ll_lock);
+	sbi->ll_ra_info.ra_range_pages = pages_number;
+	spin_unlock(&sbi->ll_lock);
+
+	return count;
+}
+LUSTRE_RW_ATTR(read_ahead_range_kb);
+
 static ssize_t fast_read_show(struct kobject *kobj,
 			      struct attribute *attr,
 			      char *buf)
@@ -1506,6 +1551,7 @@ struct ldebugfs_vars lprocfs_llite_obd_vars[] = {
 	&lustre_attr_max_read_ahead_mb.attr,
 	&lustre_attr_max_read_ahead_per_file_mb.attr,
 	&lustre_attr_max_read_ahead_whole_mb.attr,
+	&lustre_attr_read_ahead_range_kb.attr,
 	&lustre_attr_checksums.attr,
 	&lustre_attr_checksum_pages.attr,
 	&lustre_attr_stats_track_pid.attr,
@@ -1622,6 +1668,7 @@ void ll_stats_ops_tally(struct ll_sb_info *sbi, int op, long count)
 	[RA_STAT_FAILED_REACH_END]	= "failed to reach end",
 	[RA_STAT_ASYNC]			= "async readahead",
 	[RA_STAT_FAILED_FAST_READ]	= "failed to fast read",
+	[RA_STAT_MMAP_RANGE_READ]	= "mmap range read",
 };
 
 int ll_debugfs_register_super(struct super_block *sb, const char *name)
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index da4a26d..096e015 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -388,7 +388,7 @@ static bool ras_inside_ra_window(pgoff_t idx, struct ra_io_arg *ria)
 static unsigned long
 ll_read_ahead_pages(const struct lu_env *env, struct cl_io *io,
 		    struct cl_page_list *queue, struct ll_readahead_state *ras,
-		    struct ra_io_arg *ria, pgoff_t *ra_end)
+		    struct ra_io_arg *ria, pgoff_t *ra_end, pgoff_t skip_index)
 {
 	struct cl_read_ahead ra = { 0 };
 	pgoff_t page_idx;
@@ -402,6 +402,8 @@ static bool ras_inside_ra_window(pgoff_t idx, struct ra_io_arg *ria)
 	for (page_idx = ria->ria_start_idx;
 	     page_idx <= ria->ria_end_idx && ria->ria_reserved > 0;
 	     page_idx++) {
+		if (skip_index && page_idx == skip_index)
+			continue;
 		if (ras_inside_ra_window(page_idx, ria)) {
 			if (!ra.cra_end_idx || ra.cra_end_idx < page_idx) {
 				pgoff_t end_idx;
@@ -447,10 +449,12 @@ static bool ras_inside_ra_window(pgoff_t idx, struct ra_io_arg *ria)
 				if (ras->ras_rpc_pages != ra.cra_rpc_pages &&
 				    ra.cra_rpc_pages > 0)
 					ras->ras_rpc_pages = ra.cra_rpc_pages;
-				/* trim it to align with optimal RPC size */
-				end_idx = ras_align(ras, ria->ria_end_idx + 1);
-				if (end_idx > 0 && !ria->ria_eof)
-					ria->ria_end_idx = end_idx - 1;
+				if (!skip_index) {
+					/* trim it to align with optimal RPC size */
+					end_idx = ras_align(ras, ria->ria_end_idx + 1);
+					if (end_idx > 0 && !ria->ria_eof)
+						ria->ria_end_idx = end_idx - 1;
+				}
 				if (ria->ria_end_idx < ria->ria_end_idx_min)
 					ria->ria_end_idx = ria->ria_end_idx_min;
 			}
@@ -650,7 +654,7 @@ static void ll_readahead_handle_work(struct work_struct *wq)
 	cl_2queue_init(queue);
 
 	rc = ll_read_ahead_pages(env, io, &queue->c2_qin, ras, ria,
-				 &ra_end_idx);
+				 &ra_end_idx, 0);
 	if (ria->ria_reserved != 0)
 		ll_ra_count_put(ll_i2sbi(inode), ria->ria_reserved);
 	if (queue->c2_qin.pl_nr > 0) {
@@ -688,7 +692,7 @@ static void ll_readahead_handle_work(struct work_struct *wq)
 static int ll_readahead(const struct lu_env *env, struct cl_io *io,
 			struct cl_page_list *queue,
 			struct ll_readahead_state *ras, bool hit,
-			struct file *file)
+			struct file *file, pgoff_t skip_index)
 {
 	struct vvp_io *vio = vvp_env_io(env);
 	struct ll_thread_info *lti = ll_env_info(env);
@@ -731,6 +735,9 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io,
 	if (ras->ras_window_pages > 0)
 		end_idx = ras->ras_window_start_idx + ras->ras_window_pages - 1;
 
+	if (skip_index)
+		end_idx = start_idx + ras->ras_window_pages - 1;
+
 	/* Enlarge the RA window to encompass the full read */
 	if (vio->vui_ra_valid &&
 	    end_idx < vio->vui_ra_start_idx + vio->vui_ra_pages - 1)
@@ -783,6 +790,10 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io,
 			    ria->ria_start_idx;
 	}
 
+	/* don't over reserved for mmap range read */
+	if (skip_index)
+		pages_min = 0;
+
 	ria->ria_reserved = ll_ra_count_get(ll_i2sbi(inode), ria, pages,
 					    pages_min);
 	if (ria->ria_reserved < pages)
@@ -793,8 +804,8 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io,
 	       atomic_read(&ll_i2sbi(inode)->ll_ra_info.ra_cur_pages),
 	       ll_i2sbi(inode)->ll_ra_info.ra_max_pages);
 
-	ret = ll_read_ahead_pages(env, io, queue, ras, ria, &ra_end_idx);
-
+	ret = ll_read_ahead_pages(env, io, queue, ras, ria, &ra_end_idx,
+				  skip_index);
 	if (ria->ria_reserved)
 		ll_ra_count_put(ll_i2sbi(inode), ria->ria_reserved);
 
@@ -890,6 +901,10 @@ void ll_readahead_init(struct inode *inode, struct ll_readahead_state *ras)
 	ras_reset(ras, 0);
 	ras->ras_last_read_end_bytes = 0;
 	ras->ras_requests = 0;
+	ras->ras_range_min_start_idx = 0;
+	ras->ras_range_max_end_idx = 0;
+	ras->ras_range_requests = 0;
+	ras->ras_last_range_pages = 0;
 }
 
 /*
@@ -1033,6 +1048,73 @@ static inline bool is_loose_seq_read(struct ll_readahead_state *ras, loff_t pos)
 			     8UL << PAGE_SHIFT, 8UL << PAGE_SHIFT);
 }
 
+static inline bool is_loose_mmap_read(struct ll_sb_info *sbi,
+				      struct ll_readahead_state *ras,
+				      unsigned long pos)
+{
+	unsigned long range_pages = sbi->ll_ra_info.ra_range_pages;
+
+	return pos_in_window(pos, ras->ras_last_read_end_bytes,
+			     range_pages << PAGE_SHIFT,
+			     range_pages << PAGE_SHIFT);
+}
+
+/**
+ * We have observed slow mmap read performances for some
+ * applications. The problem is if access pattern is neither
+ * sequential nor stride, but could be still adjacent in a
+ * small range and then seek a random position.
+ *
+ * So the pattern could be something like this:
+ *
+ * [1M data] [hole] [0.5M data] [hole] [0.7M data] [1M data]
+ *
+ *
+ * Every time an application reads mmap data, it may not only
+ * read a single 4KB page, but also a cluster of nearby pages in
+ * a range (e.g. 1MB) of the first page after a cache miss.
+ *
+ * The readahead engine is modified to track the range size of
+ * a cluster of mmap reads, so that after a seek and/or cache miss,
+ * the range size is used to efficiently prefetch multiple pages
+ * in a single RPC rather than many small RPCs.
+ */
+static void ras_detect_cluster_range(struct ll_readahead_state *ras,
+				     struct ll_sb_info *sbi,
+				     unsigned long pos, unsigned long count)
+{
+	pgoff_t last_pages, pages;
+	pgoff_t end_idx = (pos + count - 1) >> PAGE_SHIFT;
+
+	last_pages = ras->ras_range_max_end_idx -
+		     ras->ras_range_min_start_idx + 1;
+	/* First time come here */
+	if (!ras->ras_range_max_end_idx)
+		goto out;
+
+	/* Random or Stride read */
+	if (!is_loose_mmap_read(sbi, ras, pos))
+		goto out;
+
+	ras->ras_range_requests++;
+	if (ras->ras_range_max_end_idx < end_idx)
+		ras->ras_range_max_end_idx = end_idx;
+
+	if (ras->ras_range_min_start_idx > (pos >> PAGE_SHIFT))
+		ras->ras_range_min_start_idx = pos >> PAGE_SHIFT;
+
+	/* Out of range, consider it as random or stride */
+	pages = ras->ras_range_max_end_idx -
+		ras->ras_range_min_start_idx + 1;
+	if (pages <= sbi->ll_ra_info.ra_range_pages)
+		return;
+out:
+	ras->ras_last_range_pages = last_pages;
+	ras->ras_range_requests = 0;
+	ras->ras_range_min_start_idx = pos >> PAGE_SHIFT;
+	ras->ras_range_max_end_idx = end_idx;
+}
+
 static void ras_detect_read_pattern(struct ll_readahead_state *ras,
 				    struct ll_sb_info *sbi,
 				    loff_t pos, size_t count, bool mmap)
@@ -1080,9 +1162,13 @@ static void ras_detect_read_pattern(struct ll_readahead_state *ras,
 
 	ras->ras_consecutive_bytes += count;
 	if (mmap) {
+		unsigned long ra_range_pages =
+				max_t(unsigned long, RA_MIN_MMAP_RANGE_PAGES,
+				      sbi->ll_ra_info.ra_range_pages);
 		pgoff_t idx = ras->ras_consecutive_bytes >> PAGE_SHIFT;
 
-		if ((idx >= 4 && (idx & 3UL) == 0) || stride_detect)
+		if ((idx >= ra_range_pages &&
+		     idx % ra_range_pages == 0) || stride_detect)
 			ras->ras_need_increase_window = true;
 	} else if ((ras->ras_consecutive_requests > 1 || stride_detect)) {
 		ras->ras_need_increase_window = true;
@@ -1190,10 +1276,36 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode,
 	if (ras->ras_no_miss_check)
 		goto out_unlock;
 
-	if (flags & LL_RAS_MMAP)
+	if (flags & LL_RAS_MMAP) {
+		unsigned long ra_pages;
+
+		ras_detect_cluster_range(ras, sbi, index << PAGE_SHIFT,
+					 PAGE_SIZE);
 		ras_detect_read_pattern(ras, sbi, (loff_t)index << PAGE_SHIFT,
 					PAGE_SIZE, true);
 
+		/* we did not detect anything but we could prefetch */
+		if (!ras->ras_need_increase_window &&
+		    ras->ras_window_pages <= sbi->ll_ra_info.ra_range_pages &&
+		    ras->ras_range_requests >= 2) {
+			if (!hit) {
+				ra_pages = max_t(unsigned long,
+						 RA_MIN_MMAP_RANGE_PAGES,
+						 ras->ras_last_range_pages);
+				if (index < ra_pages / 2)
+					index = 0;
+				else
+					index -= ra_pages / 2;
+				ras->ras_window_pages = ra_pages;
+				ll_ra_stats_inc_sbi(sbi,
+						    RA_STAT_MMAP_RANGE_READ);
+			} else {
+				ras->ras_window_pages = 0;
+			}
+			goto skip;
+		}
+	}
+
 	if (!hit && ras->ras_window_pages &&
 	    index < ras->ras_next_readahead_idx &&
 	    pos_in_window(index, ras->ras_window_start_idx, 0,
@@ -1231,6 +1343,8 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode,
 			goto out_unlock;
 		}
 	}
+
+skip:
 	ras_set_start(ras, index);
 
 	if (stride_io_mode(ras)) {
@@ -1500,8 +1614,12 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
 	io_end_index = cl_index(io->ci_obj, io->u.ci_rw.crw_pos +
 				io->u.ci_rw.crw_count - 1);
 	if (ll_readahead_enabled(sbi) && ras) {
+		pgoff_t skip_index = 0;
+
+		if (ras->ras_next_readahead_idx < vvp_index(vpg))
+			skip_index = vvp_index(vpg);
 		rc2 = ll_readahead(env, io, &queue->c2_qin, ras,
-				   uptodate, file);
+				   uptodate, file, skip_index);
 		CDEBUG(D_READA, DFID " %d pages read ahead at %lu\n",
 		       PFID(ll_inode2fid(inode)), rc2, vvp_index(vpg));
 	} else if (vvp_index(vpg) == io_start_index &&
-- 
1.8.3.1

* [lustre-devel] [PATCH 18/39] lnet: Introduce lnet_recovery_limit parameter
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (16 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 17/39] lustre: llite: try to improve mmap performance James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 19/39] lustre: mdc: avoid easize set to 0 James Simmons
                   ` (20 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

This parameter controls how long LNet will attempt to recover an
unhealthy interface.

Defaults to 0 to indicate indefinite recovery. This maintains the
current behavior.
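
For example, to cap recovery attempts at 60 seconds the parameter can
be set at module load time, or changed at runtime since it is
writable (0644); both values below are only illustrative:

options lnet lnet_recovery_limit=60     (e.g. in /etc/modprobe.d/lustre.conf)
echo 60 > /sys/module/lnet/parameters/lnet_recovery_limit

Setting it back to 0 restores today's indefinite-recovery behavior.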

HPE-bug-id: LUS-9109
WC-bug-id: https://jira.whamcloud.com/browse/LU-13569
Lustre-commit: a2e61838f8de89 ("LU-13569 lnet: Introduce lnet_recovery_limit parameter")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39716
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h | 1 +
 net/lnet/lnet/api-ni.c        | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index d349f06..927ca44 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -476,6 +476,7 @@ struct lnet_ni *
 extern unsigned int lnet_numa_range;
 extern unsigned int lnet_health_sensitivity;
 extern unsigned int lnet_recovery_interval;
+extern unsigned int lnet_recovery_limit;
 extern unsigned int lnet_peer_discovery_disabled;
 extern unsigned int lnet_drop_asym_route;
 extern unsigned int router_sensitivity_percentage;
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 03473bf..322b25d 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -112,6 +112,11 @@ static int recovery_interval_set(const char *val,
 MODULE_PARM_DESC(lnet_recovery_interval,
 		 "Interval to recover unhealthy interfaces in seconds");
 
+unsigned int lnet_recovery_limit;
+module_param(lnet_recovery_limit, uint, 0644);
+MODULE_PARM_DESC(lnet_recovery_limit,
+		 "How long to attempt recovery of unhealthy peer interfaces in seconds. Set to 0 to allow indefinite recovery");
+
 static int lnet_interfaces_max = LNET_INTERFACES_MAX_DEFAULT;
 static int intf_max_set(const char *val, const struct kernel_param *kp);
 module_param_call(lnet_interfaces_max, intf_max_set, param_get_int,
-- 
1.8.3.1

* [lustre-devel] [PATCH 19/39] lustre: mdc: avoid easize set to 0
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (17 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 18/39] lnet: Introduce lnet_recovery_limit parameter James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 20/39] lustre: lmv: optimize dir shard revalidate James Simmons
                   ` (19 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Yang Sheng, Lustre Development List

From: Yang Sheng <ys@whamcloud.com>

The cl_default_mds_easize value could be set to 0 in some cases,
so check it before packing the request and fall back to
cl_max_mds_easize when it is unset.

Fixes: 05fc96e25b55 ("lustre: osd: Set max ea size to XATTR_SIZE_MAX")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14155
Lustre-commit: ff35e27da4c76b ("LU-14155 mdc: avoid easize set to 0")
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40785
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/mdc_locks.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c
index a82e8ca..8bbb9e1 100644
--- a/fs/lustre/mdc/mdc_locks.c
+++ b/fs/lustre/mdc/mdc_locks.c
@@ -554,7 +554,10 @@ static int mdc_save_lovea(struct ptlrpc_request *req, void *data, u32 size)
 	lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT);
 	lit->opc = (u64)it->it_op;
 
-	easize = obd->u.cli.cl_default_mds_easize;
+	if (obd->u.cli.cl_default_mds_easize > 0)
+		easize = obd->u.cli.cl_default_mds_easize;
+	else
+		easize = obd->u.cli.cl_max_mds_easize;
 
 	/* pack the intended request */
 	mdc_getattr_pack(req, valid, it->it_flags, op_data, easize);
-- 
1.8.3.1

* [lustre-devel] [PATCH 20/39] lustre: lmv: optimize dir shard revalidate
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (18 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 19/39] lustre: mdc: avoid easize set to 0 James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 21/39] lustre: ldlm: osc_object_ast_clear() is called for mdc object on eviction James Simmons
                   ` (18 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

mdt_is_remote_object() checks whether the child is a directory shard
when the parent and child are on different MDTs, which requires
reading the LMV from disk and hurts striped directory stat
performance.

This can be optimized: the client can simply set the CROSS_REF flag
to do a cross-reference getattr, which avoids many of these checks.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14172
Lustre-commit: de47c7671f29b2 ("LU-14172 lmv: optimize dir shard revalidate")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40863
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd.h                |  2 +-
 fs/lustre/include/obd_class.h          |  3 +--
 fs/lustre/llite/file.c                 |  4 ++--
 fs/lustre/llite/llite_lib.c            |  4 ++--
 fs/lustre/lmv/lmv_intent.c             | 15 ++++++++-------
 fs/lustre/lmv/lmv_internal.h           |  1 -
 fs/lustre/lmv/lmv_obd.c                |  3 +--
 include/uapi/linux/lustre/lustre_idl.h |  7 +++++++
 8 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index a017997..de62005 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -1033,7 +1033,7 @@ struct md_ops {
 
 	int (*free_lustre_md)(struct obd_export *, struct lustre_md *);
 
-	int (*merge_attr)(struct obd_export *, const struct lu_fid *fid,
+	int (*merge_attr)(struct obd_export *exp,
 			  const struct lmv_stripe_md *lsm,
 			  struct cl_attr *attr, ldlm_blocking_callback);
 
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index 1ac9fcf..b441215 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -1460,7 +1460,6 @@ static inline int md_free_lustre_md(struct obd_export *exp,
 }
 
 static inline int md_merge_attr(struct obd_export *exp,
-				const struct lu_fid *fid,
 				const struct lmv_stripe_md *lsm,
 				struct cl_attr *attr,
 				ldlm_blocking_callback cb)
@@ -1471,7 +1470,7 @@ static inline int md_merge_attr(struct obd_export *exp,
 	if (rc)
 		return rc;
 
-	return MDP(exp->exp_obd, merge_attr)(exp, fid, lsm, attr, cb);
+	return MDP(exp->exp_obd, merge_attr)(exp, lsm, attr, cb);
 }
 
 static inline int md_setxattr(struct obd_export *exp, const struct lu_fid *fid,
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 2b0ffad..5d03fc3 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -4708,8 +4708,8 @@ static int ll_merge_md_attr(struct inode *inode)
 		return 0;
 
 	down_read(&lli->lli_lsm_sem);
-	rc = md_merge_attr(ll_i2mdexp(inode), &lli->lli_fid, lli->lli_lsm_md,
-			   &attr, ll_md_blocking_ast);
+	rc = md_merge_attr(ll_i2mdexp(inode), lli->lli_lsm_md, &attr,
+			   ll_md_blocking_ast);
 	up_read(&lli->lli_lsm_sem);
 	if (rc)
 		return rc;
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index c560492..570d51a 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -1521,8 +1521,8 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 	}
 
 	/* validate the lsm */
-	rc = md_merge_attr(ll_i2mdexp(inode), &lli->lli_fid, lli->lli_lsm_md,
-			   attr, ll_md_blocking_ast);
+	rc = md_merge_attr(ll_i2mdexp(inode), lli->lli_lsm_md, attr,
+			   ll_md_blocking_ast);
 	if (!rc) {
 		if (md->body->mbo_valid & OBD_MD_FLNLINK)
 			md->body->mbo_nlink = attr->cat_nlink;
diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c
index ad59b64..38b8c75 100644
--- a/fs/lustre/lmv/lmv_intent.c
+++ b/fs/lustre/lmv/lmv_intent.c
@@ -153,7 +153,6 @@ static int lmv_intent_remote(struct obd_export *exp, struct lookup_intent *it,
 }
 
 int lmv_revalidate_slaves(struct obd_export *exp,
-			  const struct lu_fid *pfid,
 			  const struct lmv_stripe_md *lsm,
 			  ldlm_blocking_callback cb_blocking,
 			  int extra_lock_flags)
@@ -197,11 +196,14 @@ int lmv_revalidate_slaves(struct obd_export *exp,
 		 * which is not needed here.
 		 */
 		memset(op_data, 0, sizeof(*op_data));
-		if (exp_connect_flags2(exp) & OBD_CONNECT2_GETATTR_PFID)
-			op_data->op_fid1 = *pfid;
-		else
-			op_data->op_fid1 = fid;
+		op_data->op_fid1 = fid;
 		op_data->op_fid2 = fid;
+		/* shard revalidate only needs to fetch attributes and UPDATE
+		 * lock, which is similar to the bottom half of remote object
+		 * getattr, set this flag so that MDT skips checking whether
+		 * it's remote object.
+		 */
+		op_data->op_bias = MDS_CROSS_REF;
 
 		tgt = lmv_tgt(lmv, lsm->lsm_md_oinfo[i].lmo_mds);
 		if (!tgt) {
@@ -495,8 +497,7 @@ static int lmv_intent_lookup(struct obd_export *exp,
 		 * during update_inode process (see ll_update_lsm_md)
 		 */
 		if (lmv_dir_striped(op_data->op_mea2)) {
-			rc = lmv_revalidate_slaves(exp, &op_data->op_fid2,
-						   op_data->op_mea2,
+			rc = lmv_revalidate_slaves(exp, op_data->op_mea2,
 						   cb_blocking,
 						   extra_lock_flags);
 			if (rc != 0)
diff --git a/fs/lustre/lmv/lmv_internal.h b/fs/lustre/lmv/lmv_internal.h
index 756fa27..e42b141 100644
--- a/fs/lustre/lmv/lmv_internal.h
+++ b/fs/lustre/lmv/lmv_internal.h
@@ -53,7 +53,6 @@ int lmv_fid_alloc(const struct lu_env *env, struct obd_export *exp,
 		  struct lu_fid *fid, struct md_op_data *op_data);
 
 int lmv_revalidate_slaves(struct obd_export *exp,
-			  const struct lu_fid *pfid,
 			  const struct lmv_stripe_md *lsm,
 			  ldlm_blocking_callback cb_blocking,
 			  int extra_lock_flags);
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index fa1dae5..d845118 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -3482,7 +3482,6 @@ static int lmv_quotactl(struct obd_device *unused, struct obd_export *exp,
 }
 
 static int lmv_merge_attr(struct obd_export *exp,
-			  const struct lu_fid *fid,
 			  const struct lmv_stripe_md *lsm,
 			  struct cl_attr *attr,
 			  ldlm_blocking_callback cb_blocking)
@@ -3492,7 +3491,7 @@ static int lmv_merge_attr(struct obd_export *exp,
 	if (!lmv_dir_striped(lsm))
 		return 0;
 
-	rc = lmv_revalidate_slaves(exp, fid, lsm, cb_blocking, 0);
+	rc = lmv_revalidate_slaves(exp, lsm, cb_blocking, 0);
 	if (rc < 0)
 		return rc;
 
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index f56b3c5..f953815 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -1705,6 +1705,13 @@ struct mdt_rec_setattr {
 
 enum mds_op_bias {
 /*	MDS_CHECK_SPLIT		= 1 << 0, obsolete before 2.3.58 */
+	/* used for remote object getattr/open by name: in the original
+	 * getattr/open request, MDT found the object against name is on another
+	 * MDT, then packed FID and LOOKUP lock in reply and returned -EREMOTE,
+	 * and client knew it's a remote object, then set this flag in
+	 * getattr/open request and sent to the corresponding MDT to finish
+	 * getattr/open, which fetched attributes and UPDATE lock/opened file.
+	 */
 	MDS_CROSS_REF		= 1 << 1,
 /*	MDS_VTX_BYPASS		= 1 << 2, obsolete since 2.3.54 */
 	MDS_PERM_BYPASS		= 1 << 3,
-- 
1.8.3.1

* [lustre-devel] [PATCH 21/39] lustre: ldlm: osc_object_ast_clear() is called for mdc object on eviction
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (19 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 20/39] lustre: lmv: optimize dir shard revalidate James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 22/39] lustre: uapi: fix compatibility for LL_IOC_MDC_GETINFO James Simmons
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Andriy Skulysh, Lustre Development List

From: Andriy Skulysh <c17819@cray.com>

Replace osc_object_prune() with cl_object_prune()

PID: 3477   TASK: ffff9360d82fa0e0  CPU: 0   COMMAND: "ll_imp_inval"
 #0 [ffff9360d5c5b990] machine_kexec at ffffffff86865704
 #1 [ffff9360d5c5b9f0] __crash_kexec at ffffffff869209a2
 #2 [ffff9360d5c5bac0] panic at ffffffff86f7294c
 #3 [ffff9360d5c5bb40] lbug_with_loc at ffffffffc04b78cb [libcfs]
 #4 [ffff9360d5c5bb60] osc_object_ast_clear at ffffffffc0956471 [osc]
 #5 [ffff9360d5c5bbc8] ldlm_resource_foreach at ffffffffc07e2fd6 [ptlrpc]
 #6 [ffff9360d5c5bc08] ldlm_resource_iterate at ffffffffc07e3266 [ptlrpc]
 #7 [ffff9360d5c5bc38] osc_object_prune at ffffffffc0956140 [osc]
 #8 [ffff9360d5c5bc58] osc_object_invalidate at ffffffffc0956e12 [osc]
 #9 [ffff9360d5c5bcd0] osc_ldlm_resource_invalidate at ffffffffc09477bf [osc]

HPE-bug-id: LUS-8399
WC-bug-id: https://jira.whamcloud.com/browse/LU-13994
Lustre-commit: 542d0059184060 ("LU-13994 ldlm: osc_object_ast_clear() is called for mdc object on eviction")
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/40052
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_object.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/osc/osc_object.c b/fs/lustre/osc/osc_object.c
index 9a0fc54..273098a 100644
--- a/fs/lustre/osc/osc_object.c
+++ b/fs/lustre/osc/osc_object.c
@@ -493,7 +493,7 @@ int osc_object_invalidate(const struct lu_env *env, struct osc_object *osc)
 	osc_lock_discard_pages(env, osc, 0, CL_PAGE_EOF, true);
 
 	/* Clear ast data of dlm lock. Do this after discarding all pages */
-	osc_object_prune(env, osc2cl(osc));
+	cl_object_prune(env, osc2cl(osc));
 
 	return 0;
 }
-- 
1.8.3.1

* [lustre-devel] [PATCH 22/39] lustre: uapi: fix compatibility for LL_IOC_MDC_GETINFO
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (20 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 21/39] lustre: ldlm: osc_object_ast_clear() is called for mdc object on eviction James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 23/39] lustre: llite: don't check layout info for page discard James Simmons
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Qian Yingjin <qian@ddn.com>

The landed patch "lustre: som: integrate LSOM with lfs find"
uses "LL_IOC_MDC_GETINFO_OLD", so while the IOCTL numbers/structs
are ABI compatible, they are not API compatible, and applications
relying on the header definition of LL_IOC_MDC_GETINFO are broken.

This patch defines versioned IOCTL numbers, LL_IOC_MDC_GETINFO_V1
and LL_IOC_MDC_GETINFO_V2, so the explicitly versioned constants
can be used everywhere in the in-tree code while LL_IOC_MDC_GETINFO
is declared in a compatible way, and external applications can
explicitly select the version they want.

This patch applies the same fix to IOC_MDC_GETFILEINFO.
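
Illustration only (not part of this patch): on a tree with these
definitions, an external application could explicitly request the
statx-based interface and fall back to the lstat-based one. The
buffer size and the fallback policy below are assumptions:

  #include <sys/ioctl.h>
  #include <linux/lustre/lustre_user.h>

  /* hypothetical helper: prefer the V2 (statx-based) reply, fall back
   * to the V1 (lstat-based) reply if the running client rejects V2;
   * buf must be large enough for the stat data plus the layout EA
   */
  static int get_mdc_info(int fd, void *buf)
  {
          if (ioctl(fd, LL_IOC_MDC_GETINFO_V2, buf) == 0)
                  return 0;
          return ioctl(fd, LL_IOC_MDC_GETINFO_V1, buf);
  }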

Fixes: 9b5e45e7275e ("lustre: som: integrate LSOM with lfs find")
WC-bug-id: https://jira.whamcloud.com/browse/LU-13826
Lustre-commit: 449c648793d2fc ("LU-13826 utils: fix compatibility for LL_IOC_MDC_GETINFO")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/40858
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c                   | 34 ++++++++++++++++-----------------
 include/uapi/linux/lustre/lustre_user.h | 14 ++++++--------
 2 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index db620ce..c42cff7 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -1634,10 +1634,10 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		return ll_obd_statfs(inode, (void __user *)arg);
 	case LL_IOC_LOV_GETSTRIPE:
 	case LL_IOC_LOV_GETSTRIPE_NEW:
-	case LL_IOC_MDC_GETINFO:
-	case LL_IOC_MDC_GETINFO_OLD:
-	case IOC_MDC_GETFILEINFO:
-	case IOC_MDC_GETFILEINFO_OLD:
+	case LL_IOC_MDC_GETINFO_V1:
+	case LL_IOC_MDC_GETINFO_V2:
+	case IOC_MDC_GETFILEINFO_V1:
+	case IOC_MDC_GETFILEINFO_V2:
 	case IOC_MDC_GETFILESTRIPE: {
 		struct ptlrpc_request *request = NULL;
 		struct ptlrpc_request *root_request = NULL;
@@ -1652,8 +1652,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		struct lu_fid __user *fidp = NULL;
 		int lmmsize;
 
-		if (cmd == IOC_MDC_GETFILEINFO_OLD ||
-		    cmd == IOC_MDC_GETFILEINFO ||
+		if (cmd == IOC_MDC_GETFILEINFO_V1 ||
+		    cmd == IOC_MDC_GETFILEINFO_V2 ||
 		    cmd == IOC_MDC_GETFILESTRIPE) {
 			filename = ll_getname((const char __user *)arg);
 			if (IS_ERR(filename))
@@ -1675,10 +1675,10 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 			goto out_req;
 		}
 
-		if (rc == -ENODATA && (cmd == IOC_MDC_GETFILEINFO ||
-				       cmd == LL_IOC_MDC_GETINFO ||
-				       cmd == IOC_MDC_GETFILEINFO_OLD ||
-				       cmd == LL_IOC_MDC_GETINFO_OLD)) {
+		if (rc == -ENODATA && (cmd == IOC_MDC_GETFILEINFO_V1 ||
+				       cmd == LL_IOC_MDC_GETINFO_V1 ||
+				       cmd == IOC_MDC_GETFILEINFO_V2 ||
+				       cmd == LL_IOC_MDC_GETINFO_V2)) {
 			lmmsize = 0;
 			rc = 0;
 		}
@@ -1690,8 +1690,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		    cmd == LL_IOC_LOV_GETSTRIPE ||
 		    cmd == LL_IOC_LOV_GETSTRIPE_NEW) {
 			lump = (struct lov_user_md __user *)arg;
-		} else if (cmd == IOC_MDC_GETFILEINFO_OLD ||
-			   cmd == LL_IOC_MDC_GETINFO_OLD){
+		} else if (cmd == IOC_MDC_GETFILEINFO_V1 ||
+			   cmd == LL_IOC_MDC_GETINFO_V1) {
 			struct lov_user_mds_data_v1 __user *lmdp;
 
 			lmdp = (struct lov_user_mds_data_v1 __user *)arg;
@@ -1724,8 +1724,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 			rc = -EOVERFLOW;
 		}
 
-		if (cmd == IOC_MDC_GETFILEINFO_OLD ||
-		    cmd == LL_IOC_MDC_GETINFO_OLD) {
+		if (cmd == IOC_MDC_GETFILEINFO_V1 ||
+		    cmd == LL_IOC_MDC_GETINFO_V1) {
 			lstat_t st = { 0 };
 
 			st.st_dev = inode->i_sb->s_dev;
@@ -1748,8 +1748,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 				rc = -EFAULT;
 				goto out_req;
 			}
-		} else if (cmd == IOC_MDC_GETFILEINFO ||
-			   cmd == LL_IOC_MDC_GETINFO) {
+		} else if (cmd == IOC_MDC_GETFILEINFO_V2 ||
+			   cmd == LL_IOC_MDC_GETINFO_V2) {
 			struct statx stx = { 0 };
 			u64 valid = body->mbo_valid;
 
@@ -1783,7 +1783,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 			 * However, this whould be better decided by the MDS
 			 * instead of the client.
 			 */
-			if (cmd == LL_IOC_MDC_GETINFO &&
+			if (cmd == LL_IOC_MDC_GETINFO_V2 &&
 			    ll_i2info(inode)->lli_lsm_md)
 				valid &= ~(OBD_MD_FLSIZE | OBD_MD_FLBLOCKS);
 
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 62c6952..835ffce 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -86,8 +86,6 @@
 #define fstatat_f	fstatat
 #endif
 
-#define HAVE_LOV_USER_MDS_DATA
-
 #define LUSTRE_EOF 0xffffffffffffffffULL
 
 /* for statfs() */
@@ -384,10 +382,12 @@ struct ll_ioc_lease_id {
 #define IOC_MDC_TYPE		'i'
 #define IOC_MDC_LOOKUP		_IOWR(IOC_MDC_TYPE, 20, struct obd_device *)
 #define IOC_MDC_GETFILESTRIPE	_IOWR(IOC_MDC_TYPE, 21, struct lov_user_md *)
-#define IOC_MDC_GETFILEINFO_OLD	_IOWR(IOC_MDC_TYPE, 22, struct lov_user_mds_data_v1 *)
-#define IOC_MDC_GETFILEINFO	_IOWR(IOC_MDC_TYPE, 22, struct lov_user_mds_data)
-#define LL_IOC_MDC_GETINFO_OLD	_IOWR(IOC_MDC_TYPE, 23, struct lov_user_mds_data_v1 *)
-#define LL_IOC_MDC_GETINFO	_IOWR(IOC_MDC_TYPE, 23, struct lov_user_mds_data)
+#define IOC_MDC_GETFILEINFO_V1	_IOWR(IOC_MDC_TYPE, 22, struct lov_user_mds_data_v1 *)
+#define IOC_MDC_GETFILEINFO_V2	_IOWR(IOC_MDC_TYPE, 22, struct lov_user_mds_data)
+#define LL_IOC_MDC_GETINFO_V1	_IOWR(IOC_MDC_TYPE, 23, struct lov_user_mds_data_v1 *)
+#define LL_IOC_MDC_GETINFO_V2	_IOWR(IOC_MDC_TYPE, 23, struct lov_user_mds_data)
+#define IOC_MDC_GETFILEINFO	IOC_MDC_GETFILEINFO_V1
+#define LL_IOC_MDC_GETINFO	LL_IOC_MDC_GETINFO_V1
 
 #define MAX_OBD_NAME 128 /* If this changes, a NEW ioctl must be added */
 
@@ -658,7 +658,6 @@ static inline __u32 lov_user_md_size(__u16 stripes, __u32 lmm_magic)
  * use this.  It is unsafe to #define those values in this header as it
  * is possible the application has already #included <sys/stat.h>.
  */
-#ifdef HAVE_LOV_USER_MDS_DATA
 #define lov_user_mds_data lov_user_mds_data_v2
 struct lov_user_mds_data_v1 {
 	lstat_t lmd_st;			/* MDS stat struct */
@@ -678,7 +677,6 @@ struct lov_user_mds_data_v3 {
 	lstat_t lmd_st;			/* MDS stat struct */
 	struct lov_user_md_v3 lmd_lmm;	/* LOV EA V3 user data */
 } __attribute__((packed));
-#endif
 
 struct lmv_user_mds_data {
 	struct lu_fid	lum_fid;
-- 
1.8.3.1

* [lustre-devel] [PATCH 23/39] lustre: llite: don't check layout info for page discard
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (21 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 22/39] lustre: uapi: fix compatibility for LL_IOC_MDC_GETINFO James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 24/39] lustre: update version to 2.13.57 James Simmons
                   ` (15 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Bobi Jam <bobijam@whamcloud.com>

The CIT_MISC + ci_ignore_layout combination indicates lock/page
manipulation from the OSC layer, which does not care about or access
LOV layout-related info.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14042
Lustre-commit: 5d1ffc65d5a97c ("LU-14042 llite: don't check layout info for page discard")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40267
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_io.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c
index 20fcde1..7f0e945 100644
--- a/fs/lustre/lov/lov_io.c
+++ b/fs/lustre/lov/lov_io.c
@@ -465,8 +465,6 @@ static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj,
 	io->ci_result = 0;
 	lio->lis_object = obj;
 
-	LASSERT(obj->lo_lsm);
-
 	switch (io->ci_type) {
 	case CIT_READ:
 	case CIT_WRITE:
@@ -555,6 +553,18 @@ static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj,
 	default:
 		LBUG();
 	}
+
+	/*
+	 * CIT_MISC + ci_ignore_layout can identify the I/O from the OSC layer,
+	 * it won't care/access lov layout related info.
+	 */
+	if (io->ci_ignore_layout && io->ci_type == CIT_MISC) {
+		result = 0;
+		goto out;
+	}
+
+	LASSERT(obj->lo_lsm);
+
 	result = lov_io_mirror_init(lio, obj, io);
 	if (result)
 		goto out;
-- 
1.8.3.1

* [lustre-devel] [PATCH 24/39] lustre: update version to 2.13.57
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (22 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 23/39] lustre: llite: don't check layout info for page discard James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 25/39] lnet: o2iblnd: retry qp creation with reduced queue depth James Simmons
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

New tag 2.13.57

Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h
index 8d2f2e8..2a6c050 100644
--- a/include/uapi/linux/lustre/lustre_ver.h
+++ b/include/uapi/linux/lustre/lustre_ver.h
@@ -3,9 +3,9 @@
 
 #define LUSTRE_MAJOR 2
 #define LUSTRE_MINOR 13
-#define LUSTRE_PATCH 56
+#define LUSTRE_PATCH 57
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.13.56"
+#define LUSTRE_VERSION_STRING "2.13.57"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
1.8.3.1

* [lustre-devel] [PATCH 25/39] lnet: o2iblnd: retry qp creation with reduced queue depth
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (23 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 24/39] lustre: update version to 2.13.57 James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 26/39] lustre: lov: fix SEEK_HOLE calcs at component end James Simmons
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

If the negotiated number of frags * queue depth is too large for
successful qp creation, reduce the queue depth in a loop until qp
creation succeeds or the queue depth drops below 2. Remember the
reduced queue depth and use it for later connections to the same
peer.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12901
Lustre-commit: 8a3ef5713cc4ae ("LU-12901 o2iblnd: retry qp creation with reduced queue depth")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40748
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c | 33 ++++++++++++++++++++++++++-------
 net/lnet/klnds/o2iblnd/o2iblnd.h |  2 ++
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 9c65524..fc515fc 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -336,6 +336,7 @@ int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer_ni **peerp,
 	peer_ni->ibp_last_alive = 0;
 	peer_ni->ibp_max_frags = IBLND_MAX_RDMA_FRAGS;
 	peer_ni->ibp_queue_depth = ni->ni_net->net_tunables.lct_peer_tx_credits;
+	peer_ni->ibp_queue_depth_mod = 0;	/* try to use the default */
 	atomic_set(&peer_ni->ibp_refcount, 1);  /* 1 ref for caller */
 
 	INIT_LIST_HEAD(&peer_ni->ibp_list);
@@ -795,13 +796,28 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni,
 	init_qp_attr.qp_type = IB_QPT_RC;
 	init_qp_attr.send_cq = cq;
 	init_qp_attr.recv_cq = cq;
-	/* kiblnd_send_wrs() can change the connection's queue depth if
-	 * the maximum work requests for the device is maxed out
-	 */
-	init_qp_attr.cap.max_send_wr = kiblnd_send_wrs(conn);
-	init_qp_attr.cap.max_recv_wr = IBLND_RECV_WRS(conn);
 
-	rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, &init_qp_attr);
+	if (peer_ni->ibp_queue_depth_mod &&
+	    peer_ni->ibp_queue_depth_mod < peer_ni->ibp_queue_depth) {
+		conn->ibc_queue_depth = peer_ni->ibp_queue_depth_mod;
+		CDEBUG(D_NET, "Use reduced queue depth %u (from %u)\n",
+		       peer_ni->ibp_queue_depth_mod,
+		       peer_ni->ibp_queue_depth);
+	}
+
+	do {
+		/* kiblnd_send_wrs() can change the connection's queue depth if
+		 * the maximum work requests for the device is maxed out
+		 */
+		init_qp_attr.cap.max_send_wr = kiblnd_send_wrs(conn);
+		init_qp_attr.cap.max_recv_wr = IBLND_RECV_WRS(conn);
+		rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd,
+				    &init_qp_attr);
+		if (rc != -ENOMEM || conn->ibc_queue_depth < 2)
+			break;
+		conn->ibc_queue_depth--;
+	} while (rc);
+
 	if (rc) {
 		CERROR("Can't create QP: %d, send_wr: %d, recv_wr: %d, send_sge: %d, recv_sge: %d\n",
 		       rc, init_qp_attr.cap.max_send_wr,
@@ -813,11 +829,14 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni,
 
 	conn->ibc_sched = sched;
 
-	if (conn->ibc_queue_depth != peer_ni->ibp_queue_depth)
+	if (!peer_ni->ibp_queue_depth_mod &&
+	    conn->ibc_queue_depth != peer_ni->ibp_queue_depth) {
 		CWARN("peer %s - queue depth reduced from %u to %u  to allow for qp creation\n",
 		      libcfs_nid2str(peer_ni->ibp_nid),
 		      peer_ni->ibp_queue_depth,
 		      conn->ibc_queue_depth);
+		peer_ni->ibp_queue_depth_mod = conn->ibc_queue_depth;
+	}
 
 	conn->ibc_rxs = kzalloc_cpt(IBLND_RX_MSGS(conn) *
 				    sizeof(*conn->ibc_rxs),
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 1fc68e1..424ca07 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -638,6 +638,8 @@ struct kib_peer_ni {
 	u16			ibp_max_frags;
 	/* max_peer_credits */
 	u16			ibp_queue_depth;
+	/* reduced value which allows conn to be created if max fails */
+	u16			ibp_queue_depth_mod;
 };
 
 extern struct kib_data kiblnd_data;
-- 
1.8.3.1

* [lustre-devel] [PATCH 26/39] lustre: lov: fix SEEK_HOLE calcs at component end
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (24 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 25/39] lnet: o2iblnd: retry qp creation with reduced queue depth James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 27/39] lustre: lov: instantiate components layout for fallocate James Simmons
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Mikhail Pershin, Lustre Development List

From: Mikhail Pershin <mpershin@whamcloud.com>

If data ends exactly at a component end, LOV assumed that there is
no hole in the file yet and that the next component will take care
of it. However, no next component may be initialized yet if the file
ends exactly at a component boundary, so an error is returned
instead of a hole offset.

This patch fixes that issue: if a component reports a hole offset
at its component end, that offset is saved and used as the result
when no other component reports a valid hole offset.
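
Illustration only (not part of this patch): the user-visible effect
on a PFL file whose data ends exactly at a component boundary; the
already-open file descriptor and the file layout are assumptions:

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <unistd.h>

  /* hypothetical check: with the fix, SEEK_HOLE at the end of the
   * last instantiated component reports the implicit hole at EOF
   * (i.e. the file size) instead of failing
   */
  static void report_first_hole(int fd)
  {
          off_t hole = lseek(fd, 0, SEEK_HOLE);

          if (hole < 0)
                  perror("lseek(SEEK_HOLE)");
          else
                  printf("first hole at %lld\n", (long long)hole);
  }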

WC-bug-id: https://jira.whamcloud.com/browse/LU-14143
Lustre-commit: dbb6b493ad9f98 ("LU-14143 lov: fix SEEK_HOLE calcs at component end")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40713
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_io.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c
index 7f0e945..ac88a55 100644
--- a/fs/lustre/lov/lov_io.c
+++ b/fs/lustre/lov/lov_io.c
@@ -1295,6 +1295,7 @@ static void lov_io_lseek_end(const struct lu_env *env,
 	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
 	struct lov_io_sub *sub;
 	loff_t offset = -ENXIO;
+	u64 hole_off = 0;
 	bool seek_hole = io->u.ci_lseek.ls_whence == SEEK_HOLE;
 
 	list_for_each_entry(sub, &lio->lis_active, sub_linkage) {
@@ -1302,6 +1303,7 @@ static void lov_io_lseek_end(const struct lu_env *env,
 		int index = lov_comp_entry(sub->sub_subio_index);
 		int stripe = lov_comp_stripe(sub->sub_subio_index);
 		loff_t sub_off, lov_off;
+		u64 comp_end = lsm->lsm_entries[index]->lsme_extent.e_end;
 
 		lov_io_end_wrapper(sub->sub_env, subio);
 
@@ -1347,10 +1349,22 @@ static void lov_io_lseek_end(const struct lu_env *env,
 		/* resulting offset can be out of component range if stripe
 		 * object is full and its file size was returned as virtual
 		 * hole start. Skip this result, the next component will give
-		 * us correct lseek result.
+		 * us correct lseek result but keep possible hole offset in
+		 * case there is no more components ahead
 		 */
-		if (lov_off >= lsm->lsm_entries[index]->lsme_extent.e_end)
+		if (lov_off >= comp_end) {
+			/* must be SEEK_HOLE case */
+			if (likely(seek_hole)) {
+				/* save comp end as potential hole offset */
+				hole_off = max_t(u64, comp_end, hole_off);
+			} else {
+				io->ci_result = -EINVAL;
+				CDEBUG(D_INFO,
+				       "off %lld >= comp_end %llu: rc = %d\n",
+				       lov_off, comp_end, io->ci_result);
+			}
 			continue;
+		}
 
 		CDEBUG(D_INFO, "SEEK_%s: %lld->%lld/%lld: rc = %d\n",
 		       seek_hole ? "HOLE" : "DATA",
@@ -1358,6 +1372,10 @@ static void lov_io_lseek_end(const struct lu_env *env,
 		       sub->sub_io.ci_result);
 		offset = min_t(u64, offset, lov_off);
 	}
+	/* no result but some component returns hole as component end */
+	if (seek_hole && offset == -ENXIO && hole_off > 0)
+		offset = hole_off;
+
 	io->u.ci_lseek.ls_result = offset;
 }
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 27/39] lustre: lov: instantiate components layout for fallocate
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (25 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 26/39] lustre: lov: fix SEEK_HOLE calcs at component end James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 28/39] lustre: dom: non-blocking enqueue for DOM locks James Simmons
                   ` (11 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Wang Shilong, Lustre Development List

From: Wang Shilong <wshilong@ddn.com>

fallocate() needs to send an intent lock to the MDS to instantiate
layouts such as PFL.
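
Illustration only (not part of this patch): a minimal user-space
trigger of this path; the target file (assumed to be a PFL file with
a not-yet-instantiated component covering this range) and the size
are assumptions:

  #define _GNU_SOURCE
  #include <fcntl.h>

  /* hypothetical sketch: preallocating past the instantiated
   * components makes the client send a layout intent to the MDS
   * before the fallocate proceeds
   */
  static int prealloc_1g(int fd)
  {
          return fallocate(fd, 0, 0, 1LL << 30);
  }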

WC-bug-id: https://jira.whamcloud.com/browse/LU-14186
Lustre-commit: 7e25e6c7d0a710 ("LU-14186 lov: instantiate components layout for fallocate")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/40885
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/vvp_io.c | 2 +-
 fs/lustre/lov/lov_io.c   | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index 8dbe835..b0b31c37 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -361,7 +361,7 @@ static void vvp_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 
 		io->ci_need_write_intent = 0;
 
-		LASSERT(io->ci_type == CIT_WRITE ||
+		LASSERT(io->ci_type == CIT_WRITE || cl_io_is_fallocate(io) ||
 			cl_io_is_trunc(io) || cl_io_is_mkwrite(io));
 
 		CDEBUG(D_VFSTRACE, DFID" write layout, type %u " DEXT "\n",
diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c
index ac88a55..d4a0c9d 100644
--- a/fs/lustre/lov/lov_io.c
+++ b/fs/lustre/lov/lov_io.c
@@ -571,6 +571,7 @@ static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj,
 
 	/* check if it needs to instantiate layout */
 	if (!(io->ci_type == CIT_WRITE || cl_io_is_mkwrite(io) ||
+	      cl_io_is_fallocate(io) ||
 	      (cl_io_is_trunc(io) && io->u.ci_setattr.sa_attr.lvb_size > 0))) {
 		result = 0;
 		goto out;
-- 
1.8.3.1

* [lustre-devel] [PATCH 28/39] lustre: dom: non-blocking enqueue for DOM locks
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (26 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 27/39] lustre: lov: instantiate components layout for fallocate James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 29/39] lustre: llite: fiemap set flags for encrypted files James Simmons
                   ` (10 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Mikhail Pershin, Lustre Development List

From: Mikhail Pershin <mpershin@whamcloud.com>

DOM lock enqueue waits for blocking locks on the MDT due to the
ATOMIC flag, so the MDT thread is blocked until the lock is granted.
When many clients attempt to write to a shared file, that may cause
server thread starvation and lock contention. Switch to non-atomic
lock enqueue for DOM locks.

- switch the IO lock to a non-intent enqueue, so it doesn't keep a
  server thread blocked for a long time
- on the client, take the LVB from l_lvb_data updated by the
  completion AST and update l_ost_lvb used by DoM
- make glimpse behave the same on MDT and OST: it uses one format
  with no intent buffer and returns data in the LVB buffer
- introduce a new connect flag 'dom_lvb' for compatibility reasons
- on the server, handle glimpse for both old and new clients by
  filling either the LVB reply buffer or the mdt_body buffer
- don't take an RPC slot for a DOM enqueue, the same as for EXTENT
  locks; update ldlm_cli_enqueue_fini() to accept ldlm_enqueue_info
  as a parameter
- check that no atomic local lock is issued with a mandatory DOM
  bit; trybits should be used instead

WC-bug-id: https://jira.whamcloud.com/browse/LU-10664
Lustre-commit: 3c75d2522786a2a ("LU-10664 dom: non-blocking enqueue for DOM locks")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36903
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_dlm.h         |   3 +-
 fs/lustre/include/lustre_export.h      |   5 ++
 fs/lustre/ldlm/ldlm_request.c          |  66 +++++++--------
 fs/lustre/llite/llite_lib.c            |   3 +-
 fs/lustre/mdc/mdc_dev.c                | 144 +++++++++++++++++++--------------
 fs/lustre/mdc/mdc_internal.h           |  10 +++
 fs/lustre/mdc/mdc_locks.c              |   9 ++-
 fs/lustre/obdclass/lprocfs_status.c    |   1 +
 fs/lustre/osc/osc_request.c            |  39 ++-------
 fs/lustre/ptlrpc/wiretest.c            |   2 +
 include/uapi/linux/lustre/lustre_idl.h |   1 +
 11 files changed, 147 insertions(+), 136 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index e4c95a2..8156f75 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -1341,8 +1341,7 @@ int ldlm_prep_elc_req(struct obd_export *exp,
 
 struct ptlrpc_request *ldlm_enqueue_pack(struct obd_export *exp, int lvb_len);
 int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
-			  enum ldlm_type type, u8 with_policy,
-			  enum ldlm_mode mode,
+			  struct ldlm_enqueue_info *einfo, u8 with_policy,
 			  u64 *flags, void *lvb, u32 lvb_len,
 			  const struct lustre_handle *lockh, int rc);
 int ldlm_cli_convert_req(struct ldlm_lock *lock, u32 *flags, u64 new_bits);
diff --git a/fs/lustre/include/lustre_export.h b/fs/lustre/include/lustre_export.h
index ed49a97..4cc88ef 100644
--- a/fs/lustre/include/lustre_export.h
+++ b/fs/lustre/include/lustre_export.h
@@ -290,6 +290,11 @@ static inline int exp_connect_lseek(struct obd_export *exp)
 	return !!(exp_connect_flags2(exp) & OBD_CONNECT2_LSEEK);
 }
 
+static inline int exp_connect_dom_lvb(struct obd_export *exp)
+{
+	return !!(exp_connect_flags2(exp) & OBD_CONNECT2_DOM_LVB);
+}
+
 enum {
 	/* archive_ids in array format */
 	KKUC_CT_DATA_ARRAY_MAGIC	= 0x092013cea,
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index 86b10a7..1c2ecf2 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -355,9 +355,12 @@ static void failed_lock_cleanup(struct ldlm_namespace *ns,
 	}
 }
 
-static bool ldlm_request_slot_needed(enum ldlm_type type)
+static bool ldlm_request_slot_needed(struct ldlm_enqueue_info *einfo)
 {
-	return type == LDLM_FLOCK || type == LDLM_IBITS;
+	/* exclude EXTENT locks and DOM-only IBITS locks because they
+	 * are asynchronous and don't wait on server being blocked.
+	 */
+	return einfo->ei_type == LDLM_FLOCK || einfo->ei_type == LDLM_IBITS;
 }
 
 /**
@@ -366,19 +369,19 @@ static bool ldlm_request_slot_needed(enum ldlm_type type)
  * Called after receiving reply from server.
  */
 int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
-			  enum ldlm_type type, u8 with_policy,
-			  enum ldlm_mode mode,
-			  u64 *flags, void *lvb, u32 lvb_len,
-			  const struct lustre_handle *lockh, int rc)
+			  struct ldlm_enqueue_info *einfo,
+			  u8 with_policy, u64 *ldlm_flags, void *lvb,
+			  u32 lvb_len, const struct lustre_handle *lockh,
+			  int rc)
 {
 	struct ldlm_namespace *ns = exp->exp_obd->obd_namespace;
 	const struct lu_env *env = NULL;
-	int is_replay = *flags & LDLM_FL_REPLAY;
+	int is_replay = *ldlm_flags & LDLM_FL_REPLAY;
 	struct ldlm_lock *lock;
 	struct ldlm_reply *reply;
 	int cleanup_phase = 1;
 
-	if (ldlm_request_slot_needed(type))
+	if (ldlm_request_slot_needed(einfo))
 		obd_put_request_slot(&req->rq_import->imp_obd->u.cli);
 
 	ptlrpc_put_mod_rpc_slot(req);
@@ -386,7 +389,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 	lock = ldlm_handle2lock(lockh);
 	/* ldlm_cli_enqueue is holding a reference on this lock. */
 	if (!lock) {
-		LASSERT(type == LDLM_FLOCK);
+		LASSERT(einfo->ei_type == LDLM_FLOCK);
 		return -ENOLCK;
 	}
 
@@ -443,20 +446,20 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 	lock_res_and_lock(lock);
 	lock->l_remote_handle = reply->lock_handle;
 
-	*flags = ldlm_flags_from_wire(reply->lock_flags);
+	*ldlm_flags = ldlm_flags_from_wire(reply->lock_flags);
 	lock->l_flags |= ldlm_flags_from_wire(reply->lock_flags &
 					      LDLM_FL_INHERIT_MASK);
 	unlock_res_and_lock(lock);
 
 	CDEBUG(D_INFO, "local: %p, remote cookie: %#llx, flags: 0x%llx\n",
-	       lock, reply->lock_handle.cookie, *flags);
+	       lock, reply->lock_handle.cookie, *ldlm_flags);
 
 	/*
 	 * If enqueue returned a blocked lock but the completion handler has
 	 * already run, then it fixed up the resource and we don't need to do it
 	 * again.
 	 */
-	if ((*flags) & LDLM_FL_LOCK_CHANGED) {
+	if ((*ldlm_flags) & LDLM_FL_LOCK_CHANGED) {
 		int newmode = reply->lock_desc.l_req_mode;
 
 		LASSERT(!is_replay);
@@ -490,12 +493,12 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 						     &lock->l_policy_data);
 		}
 
-		if (type != LDLM_PLAIN)
+		if (einfo->ei_type != LDLM_PLAIN)
 			LDLM_DEBUG(lock,
 				   "client-side enqueue, new policy data");
 	}
 
-	if ((*flags) & LDLM_FL_AST_SENT) {
+	if ((*ldlm_flags) & LDLM_FL_AST_SENT) {
 		lock_res_and_lock(lock);
 		ldlm_bl_desc2lock(&reply->lock_desc, lock);
 		lock->l_flags |= LDLM_FL_CBPENDING |  LDLM_FL_BL_AST;
@@ -526,9 +529,10 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 	}
 
 	if (!is_replay) {
-		rc = ldlm_lock_enqueue(env, ns, &lock, NULL, flags);
+		rc = ldlm_lock_enqueue(env, ns, &lock, NULL, ldlm_flags);
 		if (lock->l_completion_ast) {
-			int err = lock->l_completion_ast(lock, *flags, NULL);
+			int err = lock->l_completion_ast(lock, *ldlm_flags,
+							 NULL);
 
 			if (!rc)
 				rc = err;
@@ -548,7 +552,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 	LDLM_DEBUG(lock, "client-side enqueue END");
 cleanup:
 	if (cleanup_phase == 1 && rc)
-		failed_lock_cleanup(ns, lock, mode);
+		failed_lock_cleanup(ns, lock, einfo->ei_mode);
 	/* Put lock 2 times, the second reference is held by ldlm_cli_enqueue */
 	LDLM_LOCK_PUT(lock);
 	LDLM_LOCK_RELEASE(lock);
@@ -811,24 +815,15 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp,
 
 	/* extended LDLM opcodes in client stats */
 	if (exp->exp_obd->obd_svc_stats != NULL) {
-		bool glimpse = *flags & LDLM_FL_HAS_INTENT;
-
-		/* OST glimpse has no intent buffer */
-		if (req_capsule_has_field(&req->rq_pill, &RMF_LDLM_INTENT,
-					  RCL_CLIENT)) {
-			struct ldlm_intent *it;
-
-			it = req_capsule_client_get(&req->rq_pill,
-						    &RMF_LDLM_INTENT);
-			glimpse = (it && (it->opc == IT_GLIMPSE));
-		}
-
-		if (!glimpse)
-			ldlm_svc_get_eopc(body, exp->exp_obd->obd_svc_stats);
-		else
+		/* glimpse is intent with no intent buffer */
+		if (*flags & LDLM_FL_HAS_INTENT &&
+		    !req_capsule_has_field(&req->rq_pill, &RMF_LDLM_INTENT,
+					   RCL_CLIENT))
 			lprocfs_counter_incr(exp->exp_obd->obd_svc_stats,
 					     PTLRPC_LAST_CNTR +
 					     LDLM_GLIMPSE_ENQUEUE);
+		else
+			ldlm_svc_get_eopc(body, exp->exp_obd->obd_svc_stats);
 	}
 
 	/* It is important to obtain modify RPC slot first (if applicable), so
@@ -838,7 +833,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp,
 	if (einfo->ei_enq_slot)
 		ptlrpc_get_mod_rpc_slot(req);
 
-	if (ldlm_request_slot_needed(einfo->ei_type)) {
+	if (ldlm_request_slot_needed(einfo)) {
 		rc = obd_get_request_slot(&req->rq_import->imp_obd->u.cli);
 		if (rc) {
 			if (einfo->ei_enq_slot)
@@ -858,9 +853,8 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp,
 
 	rc = ptlrpc_queue_wait(req);
 
-	err = ldlm_cli_enqueue_fini(exp, req, einfo->ei_type, policy ? 1 : 0,
-				    einfo->ei_mode, flags, lvb, lvb_len,
-				    lockh, rc);
+	err = ldlm_cli_enqueue_fini(exp, req, einfo, policy ? 1 : 0, flags,
+				    lvb, lvb_len, lockh, rc);
 
 	/*
 	 * If ldlm_cli_enqueue_fini did not find the lock, we need to free
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 570d51a..3139669 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -265,7 +265,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 				   OBD_CONNECT2_ASYNC_DISCARD |
 				   OBD_CONNECT2_PCC |
 				   OBD_CONNECT2_CRUSH | OBD_CONNECT2_LSEEK |
-				   OBD_CONNECT2_GETATTR_PFID;
+				   OBD_CONNECT2_GETATTR_PFID |
+				   OBD_CONNECT2_DOM_LVB;
 
 	if (sbi->ll_flags & LL_SBI_LRU_RESIZE)
 		data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE;
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index 214fd31..e86e69d 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -294,20 +294,16 @@ void mdc_lock_lockless_cancel(const struct lu_env *env,
  * Helper for osc_dlm_blocking_ast() handling discrepancies between cl_lock
  * and ldlm_lock caches.
  */
-static int mdc_dlm_blocking_ast0(const struct lu_env *env,
-				 struct ldlm_lock *dlmlock,
-				 int flag)
+static int mdc_dlm_canceling(const struct lu_env *env,
+			     struct ldlm_lock *dlmlock)
 {
 	struct cl_object *obj = NULL;
 	int result = 0;
 	bool discard;
 	enum cl_lock_mode mode = CLM_READ;
 
-	LASSERT(flag == LDLM_CB_CANCELING);
-	LASSERT(dlmlock);
-
 	lock_res_and_lock(dlmlock);
-	if (dlmlock->l_granted_mode != dlmlock->l_req_mode) {
+	if (!ldlm_is_granted(dlmlock)) {
 		dlmlock->l_ast_data = NULL;
 		unlock_res_and_lock(dlmlock);
 		return 0;
@@ -349,11 +345,11 @@ static int mdc_dlm_blocking_ast0(const struct lu_env *env,
 }
 
 int mdc_ldlm_blocking_ast(struct ldlm_lock *dlmlock,
-			  struct ldlm_lock_desc *new, void *data, int flag)
+			  struct ldlm_lock_desc *new, void *data, int reason)
 {
 	int rc = 0;
 
-	switch (flag) {
+	switch (reason) {
 	case LDLM_CB_BLOCKING: {
 		struct lustre_handle lockh;
 
@@ -384,7 +380,7 @@ int mdc_ldlm_blocking_ast(struct ldlm_lock *dlmlock,
 			break;
 		}
 
-		rc = mdc_dlm_blocking_ast0(env, dlmlock, flag);
+		rc = mdc_dlm_canceling(env, dlmlock);
 		cl_env_put(env, &refcheck);
 		break;
 	}
@@ -430,6 +426,7 @@ void mdc_lock_lvb_update(const struct lu_env *env, struct osc_object *osc,
 			attr->cat_kms = size;
 			setkms = 1;
 		}
+		ldlm_lock_allow_match_locked(dlmlock);
 	}
 
 	/* The size should not be less than the kms */
@@ -479,7 +476,7 @@ static void mdc_lock_granted(const struct lu_env *env, struct osc_lock *oscl,
 
 	/* Lock must have been granted. */
 	lock_res_and_lock(dlmlock);
-	if (dlmlock->l_granted_mode == dlmlock->l_req_mode) {
+	if (ldlm_is_granted(dlmlock)) {
 		struct cl_lock_descr *descr = &oscl->ols_cl.cls_lock->cll_descr;
 
 		/* extend the lock extent, otherwise it will have problem when
@@ -505,7 +502,7 @@ static void mdc_lock_granted(const struct lu_env *env, struct osc_lock *oscl,
 
 /**
  * Lock upcall function that is executed either when a reply to ENQUEUE rpc is
- * received from a server, or after osc_enqueue_base() matched a local DLM
+ * received from a server, or after mdc_enqueue_base() matched a local DLM
  * lock.
  */
 static int mdc_lock_upcall(void *cookie, struct lustre_handle *lockh,
@@ -561,51 +558,64 @@ static int mdc_lock_upcall(void *cookie, struct lustre_handle *lockh,
 	return rc;
 }
 
+/* This is needed only for old servers (before 2.14) support */
 int mdc_fill_lvb(struct ptlrpc_request *req, struct ost_lvb *lvb)
 {
 	struct mdt_body *body;
 
+	/* get LVB data from mdt_body otherwise */
 	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 	if (!body)
 		return -EPROTO;
 
-	lvb->lvb_mtime = body->mbo_mtime;
-	lvb->lvb_atime = body->mbo_atime;
-	lvb->lvb_ctime = body->mbo_ctime;
-	lvb->lvb_blocks = body->mbo_dom_blocks;
-	lvb->lvb_size = body->mbo_dom_size;
+	if (!(body->mbo_valid & OBD_MD_DOM_SIZE))
+		return -EPROTO;
 
+	mdc_body2lvb(body, lvb);
 	return 0;
 }
 
-int mdc_enqueue_fini(struct ptlrpc_request *req, osc_enqueue_upcall_f upcall,
-		     void *cookie, struct lustre_handle *lockh,
-		     enum ldlm_mode mode, u64 *flags, int errcode)
+int mdc_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
+		     osc_enqueue_upcall_f upcall, void *cookie,
+		     struct lustre_handle *lockh, enum ldlm_mode mode,
+		     u64 *flags, int errcode)
 {
 	struct osc_lock *ols = cookie;
-	struct ldlm_lock *lock;
+	bool glimpse = *flags & LDLM_FL_HAS_INTENT;
 	int rc = 0;
 
-	/* The request was created before ldlm_cli_enqueue call. */
-	if (errcode == ELDLM_LOCK_ABORTED) {
+	/* needed only for glimpse from an old server (< 2.14) */
+	if (glimpse && !exp_connect_dom_lvb(exp))
+		rc = mdc_fill_lvb(req, &ols->ols_lvb);
+
+	if (glimpse && errcode == ELDLM_LOCK_ABORTED) {
 		struct ldlm_reply *rep;
 
 		rep = req_capsule_server_get(&req->rq_pill, &RMF_DLM_REP);
-		LASSERT(rep);
-
-		rep->lock_policy_res2 =
-			ptlrpc_status_ntoh(rep->lock_policy_res2);
-		if (rep->lock_policy_res2)
-			errcode = rep->lock_policy_res2;
-
-		rc = mdc_fill_lvb(req, &ols->ols_lvb);
+		if (likely(rep)) {
+			rep->lock_policy_res2 =
+				ptlrpc_status_ntoh(rep->lock_policy_res2);
+			if (rep->lock_policy_res2)
+				errcode = rep->lock_policy_res2;
+		} else {
+			rc = -EPROTO;
+		}
 		*flags |= LDLM_FL_LVB_READY;
 	} else if (errcode == ELDLM_OK) {
+		struct ldlm_lock *lock;
+
 		/* Callers have references, should be valid always */
 		lock = ldlm_handle2lock(lockh);
-		LASSERT(lock);
 
-		rc = mdc_fill_lvb(req, &lock->l_ost_lvb);
+		/* At this point ols_lvb must be filled with correct LVB either
+		 * by mdc_fill_lvb() above or by ldlm_cli_enqueue_fini().
+		 * DoM uses l_ost_lvb to store LVB data, so copy it here from
+		 * just updated ols_lvb.
+		 */
+		lock_res_and_lock(lock);
+		memcpy(&lock->l_ost_lvb, &ols->ols_lvb,
+		       sizeof(lock->l_ost_lvb));
+		unlock_res_and_lock(lock);
 		LDLM_LOCK_PUT(lock);
 		*flags |= LDLM_FL_LVB_READY;
 	}
@@ -629,6 +639,10 @@ int mdc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req,
 	struct ldlm_lock *lock;
 	struct lustre_handle *lockh = &aa->oa_lockh;
 	enum ldlm_mode mode = aa->oa_mode;
+	struct ldlm_enqueue_info einfo = {
+		.ei_type = aa->oa_type,
+		.ei_mode = mode,
+	};
 
 	LASSERT(!aa->oa_speculative);
 
@@ -643,7 +657,7 @@ int mdc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req,
 	/* Take an additional reference so that a blocking AST that
 	 * ldlm_cli_enqueue_fini() might post for a failed lock, is guaranteed
 	 * to arrive after an upcall has been executed by
-	 * osc_enqueue_fini().
+	 * mdc_enqueue_fini().
 	 */
 	ldlm_lock_addref(lockh, mode);
 
@@ -654,12 +668,12 @@ int mdc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req,
 	OBD_FAIL_TIMEOUT(OBD_FAIL_OSC_CP_ENQ_RACE, 1);
 
 	/* Complete obtaining the lock procedure. */
-	rc = ldlm_cli_enqueue_fini(aa->oa_exp, req, aa->oa_type, 1,
-				   aa->oa_mode, aa->oa_flags, NULL, 0,
-				   lockh, rc);
+	rc = ldlm_cli_enqueue_fini(aa->oa_exp, req, &einfo, 1, aa->oa_flags,
+				   aa->oa_lvb, aa->oa_lvb ?
+				   sizeof(*aa->oa_lvb) : 0, lockh, rc);
 	/* Complete mdc stuff. */
-	rc = mdc_enqueue_fini(req, aa->oa_upcall, aa->oa_cookie, lockh, mode,
-			      aa->oa_flags, rc);
+	rc = mdc_enqueue_fini(aa->oa_exp, req, aa->oa_upcall, aa->oa_cookie,
+			      lockh, mode, aa->oa_flags, rc);
 
 	OBD_FAIL_TIMEOUT(OBD_FAIL_OSC_CP_CANCEL_RACE, 10);
 
@@ -678,8 +692,7 @@ int mdc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req,
  */
 int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp,
 		     struct ldlm_res_id *res_id, u64 *flags,
-		     union ldlm_policy_data *policy,
-		     struct ost_lvb *lvb, int kms_valid,
+		     union ldlm_policy_data *policy, struct ost_lvb *lvb,
 		     osc_enqueue_upcall_f upcall, void *cookie,
 		     struct ldlm_enqueue_info *einfo, int async)
 {
@@ -692,17 +705,16 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp,
 	u64 match_flags = *flags;
 	LIST_HEAD(cancels);
 	int rc, count;
+	int lvb_size;
+	bool compat_glimpse = glimpse && !exp_connect_dom_lvb(exp);
 
 	mode = einfo->ei_mode;
 	if (einfo->ei_mode == LCK_PR)
 		mode |= LCK_PW;
 
+	match_flags |= LDLM_FL_LVB_READY;
 	if (glimpse)
 		match_flags |= LDLM_FL_BLOCK_GRANTED;
-	/* DOM locking uses LDLM_FL_KMS_IGNORE to mark locks wich have no valid
-	 * LVB information, e.g. canceled locks or locks of just pruned object,
-	 * such locks should be skipped.
-	 */
 	mode = ldlm_lock_match(obd->obd_namespace, match_flags, res_id,
 			       einfo->ei_type, policy, mode, &lockh);
 	if (mode) {
@@ -733,7 +745,9 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp,
 	if (*flags & (LDLM_FL_TEST_LOCK | LDLM_FL_MATCH_LOCK))
 		return -ENOLCK;
 
-	req = ptlrpc_request_alloc(class_exp2cliimp(exp), &RQF_LDLM_INTENT);
+	/* Glimpse is intent on old server */
+	req = ptlrpc_request_alloc(class_exp2cliimp(exp), compat_glimpse ?
+				   &RQF_LDLM_INTENT : &RQF_LDLM_ENQUEUE);
 	if (!req)
 		return -ENOMEM;
 
@@ -751,20 +765,27 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp,
 		return rc;
 	}
 
-	/* pack the intent */
-	lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT);
-	lit->opc = glimpse ? IT_GLIMPSE : IT_BRW;
-
-	req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER, 0);
-	req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, 0);
-	ptlrpc_request_set_replen(req);
+	if (compat_glimpse) {
+		/* pack the glimpse intent */
+		lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT);
+		lit->opc = IT_GLIMPSE;
+	}
 
 	/* users of mdc_enqueue() can pass this flag for ldlm_lock_match() */
 	*flags &= ~LDLM_FL_BLOCK_GRANTED;
-	/* All MDC IO locks are intents */
-	*flags |= LDLM_FL_HAS_INTENT;
-	rc = ldlm_cli_enqueue(exp, &req, einfo, res_id, policy, flags, NULL,
-			      0, LVB_T_NONE, &lockh, async);
+	if (compat_glimpse) {
+		req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER, 0);
+		req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, 0);
+		lvb_size = 0;
+	} else {
+		lvb_size = sizeof(*lvb);
+		req_capsule_set_size(&req->rq_pill, &RMF_DLM_LVB, RCL_SERVER,
+				     lvb_size);
+	}
+	ptlrpc_request_set_replen(req);
+
+	rc = ldlm_cli_enqueue(exp, &req, einfo, res_id, policy, flags, lvb,
+			      lvb_size, LVB_T_OST, &lockh, async);
 	if (async) {
 		if (!rc) {
 			struct osc_enqueue_args *aa;
@@ -778,7 +799,7 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp,
 			aa->oa_cookie = cookie;
 			aa->oa_speculative = false;
 			aa->oa_flags = flags;
-			aa->oa_lvb = lvb;
+			aa->oa_lvb = compat_glimpse ? NULL : lvb;
 
 			req->rq_interpret_reply = mdc_enqueue_interpret;
 			ptlrpcd_add_req(req);
@@ -788,7 +809,7 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp,
 		return rc;
 	}
 
-	rc = mdc_enqueue_fini(req, upcall, cookie, &lockh, einfo->ei_mode,
+	rc = mdc_enqueue_fini(exp, req, upcall, cookie, &lockh, einfo->ei_mode,
 			      flags, rc);
 	ptlrpc_req_finished(req);
 	return rc;
@@ -874,8 +895,7 @@ static int mdc_lock_enqueue(const struct lu_env *env,
 	mdc_lock_build_policy(env, lock, policy);
 	LASSERT(!oscl->ols_speculative);
 	result = mdc_enqueue_send(env, osc_export(osc), resname,
-				  &oscl->ols_flags, policy,
-				  &oscl->ols_lvb, osc->oo_oinfo->loi_kms_valid,
+				  &oscl->ols_flags, policy, &oscl->ols_lvb,
 				  upcall, cookie, &oscl->ols_einfo, async);
 	if (result == 0) {
 		if (osc_lock_is_lockless(oscl)) {
@@ -1429,7 +1449,7 @@ static int mdc_object_flush(const struct lu_env *env, struct cl_object *obj,
 	 * so init it here with given osc_object.
 	 */
 	mdc_set_dom_lock_data(lock, cl2osc(obj));
-	return mdc_dlm_blocking_ast0(env, lock, LDLM_CB_CANCELING);
+	return mdc_dlm_canceling(env, lock);
 }
 
 static const struct cl_object_operations mdc_ops = {
diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h
index 065cba5..91e8240 100644
--- a/fs/lustre/mdc/mdc_internal.h
+++ b/fs/lustre/mdc/mdc_internal.h
@@ -168,6 +168,16 @@ int mdc_unpack_acl(struct ptlrpc_request *req, struct lustre_md *md)
 }
 #endif
 
+static inline void mdc_body2lvb(struct mdt_body *body, struct ost_lvb *lvb)
+{
+	LASSERT(body->mbo_valid & OBD_MD_DOM_SIZE);
+	lvb->lvb_mtime = body->mbo_mtime;
+	lvb->lvb_atime = body->mbo_atime;
+	lvb->lvb_ctime = body->mbo_ctime;
+	lvb->lvb_blocks = body->mbo_dom_blocks;
+	lvb->lvb_size = body->mbo_dom_size;
+}
+
 static inline unsigned long hash_x_index(u64 hash, int hash64)
 {
 	if (BITS_PER_LONG == 32 && hash64)
diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c
index 8bbb9e1..dbf402a 100644
--- a/fs/lustre/mdc/mdc_locks.c
+++ b/fs/lustre/mdc/mdc_locks.c
@@ -872,7 +872,10 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 		LDLM_DEBUG(lock, "DoM lock is returned by: %s, size: %llu",
 			   ldlm_it2str(it->it_op), body->mbo_dom_size);
 
-		rc = mdc_fill_lvb(req, &lock->l_ost_lvb);
+		lock_res_and_lock(lock);
+		mdc_body2lvb(body, &lock->l_ost_lvb);
+		ldlm_lock_allow_match_locked(lock);
+		unlock_res_and_lock(lock);
 	}
 out_lock:
 	LDLM_LOCK_PUT(lock);
@@ -1368,8 +1371,8 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env,
 	if (OBD_FAIL_CHECK(OBD_FAIL_MDC_GETATTR_ENQUEUE))
 		rc = -ETIMEDOUT;
 
-	rc = ldlm_cli_enqueue_fini(exp, req, einfo->ei_type, 1, einfo->ei_mode,
-				   &flags, NULL, 0, lockh, rc);
+	rc = ldlm_cli_enqueue_fini(exp, req, einfo, 1, &flags, NULL, 0,
+				   lockh, rc);
 	if (rc < 0) {
 		CERROR("%s: ldlm_cli_enqueue_fini() failed: rc = %d\n",
 		       exp->exp_obd->obd_name, rc);
diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index 6ce0a5d..0ed1bd5 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -130,6 +130,7 @@
 	"fidmap",		/* 0x10000 */
 	"getattr_pfid",		/* 0x20000 */
 	"lseek",		/* 0x40000 */
+	"dom_lvb",		/* 0x80000 */
 	NULL
 };
 
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 4a4b5ef..a6a8cac 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -2684,6 +2684,10 @@ int osc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req,
 	struct ost_lvb *lvb = aa->oa_lvb;
 	u32 lvb_len = sizeof(*lvb);
 	u64 flags = 0;
+	struct ldlm_enqueue_info einfo = {
+		.ei_type = aa->oa_type,
+		.ei_mode = mode,
+	};
 
 	/* ldlm_cli_enqueue is holding a reference on the lock, so it must
 	 * be valid.
@@ -2712,9 +2716,8 @@ int osc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req,
 	}
 
 	/* Complete obtaining the lock procedure. */
-	rc = ldlm_cli_enqueue_fini(aa->oa_exp, req, aa->oa_type, 1,
-				   aa->oa_mode, aa->oa_flags, lvb, lvb_len,
-				   lockh, rc);
+	rc = ldlm_cli_enqueue_fini(aa->oa_exp, req, &einfo, 1, aa->oa_flags,
+				   lvb, lvb_len, lockh, rc);
 	/* Complete osc stuff. */
 	rc = osc_enqueue_fini(req, aa->oa_upcall, aa->oa_cookie, lockh, mode,
 			      aa->oa_flags, aa->oa_speculative, rc);
@@ -2821,22 +2824,6 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 
 	if (*flags & (LDLM_FL_TEST_LOCK | LDLM_FL_MATCH_LOCK))
 		return -ENOLCK;
-	if (intent) {
-		req = ptlrpc_request_alloc(class_exp2cliimp(exp),
-					   &RQF_LDLM_ENQUEUE_LVB);
-		if (!req)
-			return -ENOMEM;
-
-		rc = ldlm_prep_enqueue_req(exp, req, NULL, 0);
-		if (rc) {
-			ptlrpc_request_free(req);
-			return rc;
-		}
-
-		req_capsule_set_size(&req->rq_pill, &RMF_DLM_LVB, RCL_SERVER,
-				     sizeof(*lvb));
-		ptlrpc_request_set_replen(req);
-	}
 
 	/* users of osc_enqueue() can pass this flag for ldlm_lock_match() */
 	*flags &= ~LDLM_FL_BLOCK_GRANTED;
@@ -2869,16 +2856,12 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 
 			req->rq_interpret_reply = osc_enqueue_interpret;
 			ptlrpc_set_add_req(rqset, req);
-		} else if (intent) {
-			ptlrpc_req_finished(req);
 		}
 		return rc;
 	}
 
 	rc = osc_enqueue_fini(req, upcall, cookie, &lockh, einfo->ei_mode,
 			      flags, speculative, rc);
-	if (intent)
-		ptlrpc_req_finished(req);
 
 	return rc;
 }
@@ -2904,16 +2887,8 @@ int osc_match_base(const struct lu_env *env, struct obd_export *exp,
 	policy->l_extent.end |= ~PAGE_MASK;
 
 	/* Next, search for already existing extent locks that will cover us */
-	/* If we're trying to read, we also search for an existing PW lock.  The
-	 * VFS and page cache already protect us locally, so lots of readers/
-	 * writers can share a single PW lock.
-	 */
-	rc = mode;
-	if (mode == LCK_PR)
-		rc |= LCK_PW;
-
 	rc = ldlm_lock_match_with_skip(obd->obd_namespace, lflags, 0,
-				       res_id, type, policy, rc, lockh,
+				       res_id, type, policy, mode, lockh,
 				       match_flags);
 	if (!rc || lflags & LDLM_FL_TEST_LOCK)
 		return rc;
diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index c8b97fa..fedb914 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -1249,6 +1249,8 @@ void lustre_assert_wire_constants(void)
 		 OBD_CONNECT2_GETATTR_PFID);
 	LASSERTF(OBD_CONNECT2_LSEEK == 0x40000ULL, "found 0x%.16llxULL\n",
 		 OBD_CONNECT2_LSEEK);
+	LASSERTF(OBD_CONNECT2_DOM_LVB == 0x80000ULL, "found 0x%.16llxULL\n",
+		 OBD_CONNECT2_DOM_LVB);
 	LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n",
 		 (unsigned int)OBD_CKSUM_CRC32);
 	LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n",
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index f953815..449ac47 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -839,6 +839,7 @@ struct ptlrpc_body_v2 {
 #define OBD_CONNECT2_FIDMAP	      0x10000ULL /* FID map */
 #define OBD_CONNECT2_GETATTR_PFID     0x20000ULL /* pack parent FID in getattr */
 #define OBD_CONNECT2_LSEEK	      0x40000ULL /* SEEK_HOLE/DATA RPC */
+#define OBD_CONNECT2_DOM_LVB	      0x80000ULL /* pack DOM glimpse data in LVB */
 /* XXX README XXX:
  * Please DO NOT add flag values here before first ensuring that this same
  * flag value is not in use on some other branch.  Please clear any such
-- 
1.8.3.1


* [lustre-devel] [PATCH 29/39] lustre: llite: fiemap set flags for encrypted files
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (27 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 28/39] lustre: dom: non-blocking enqueue for DOM locks James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 30/39] lustre: ldlm: don't compute sumsq for pool stats James Simmons
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

The FIEMAP ioctl needs to set the
FIEMAP_EXTENT_DATA_ENCRYPTED|FIEMAP_EXTENT_ENCODED flags for all
extents of files encrypted by fscrypt.
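
A hypothetical userspace sketch (not part of this patch) of how an
application could observe these flags through FS_IOC_FIEMAP; error
handling is trimmed and the 32-extent buffer size is arbitrary:

	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <linux/fs.h>
	#include <linux/fiemap.h>

	int main(int argc, char **argv)
	{
		struct fiemap *fm;
		unsigned int i;
		int fd;

		if (argc != 2)
			return 1;

		fd = open(argv[1], O_RDONLY);
		if (fd < 0)
			return 1;

		/* room for up to 32 extents in a single call */
		fm = calloc(1, sizeof(*fm) + 32 * sizeof(struct fiemap_extent));
		if (!fm)
			return 1;
		fm->fm_length = FIEMAP_MAX_OFFSET;
		fm->fm_extent_count = 32;

		if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0)
			for (i = 0; i < fm->fm_mapped_extents; i++)
				printf("extent %u: encrypted=%d encoded=%d\n", i,
				       !!(fm->fm_extents[i].fe_flags &
					  FIEMAP_EXTENT_DATA_ENCRYPTED),
				       !!(fm->fm_extents[i].fe_flags &
					  FIEMAP_EXTENT_ENCODED));

		free(fm);
		close(fd);
		return 0;
	}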

WC-bug-id: https://jira.whamcloud.com/browse/LU-14149
Lustre-commit: 33322f3a24882d ("LU-14149 llite: fiemap set flags for encrypted files")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/40852
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 5d03fc3..a3a8d1a 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -4986,6 +4986,15 @@ static int ll_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 
 	rc = ll_do_fiemap(inode, fiemap, num_bytes);
 
+	if (IS_ENCRYPTED(inode)) {
+		int i;
+
+		for (i = 0; i < fiemap->fm_mapped_extents; i++)
+			fiemap->fm_extents[i].fe_flags |=
+				FIEMAP_EXTENT_DATA_ENCRYPTED |
+				FIEMAP_EXTENT_ENCODED;
+	}
+
 	fieinfo->fi_flags = fiemap->fm_flags;
 	fieinfo->fi_extents_mapped = fiemap->fm_mapped_extents;
 	if (extent_count > 0 &&
-- 
1.8.3.1


* [lustre-devel] [PATCH 30/39] lustre: ldlm: don't compute sumsq for pool stats
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (28 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 29/39] lustre: llite: fiemap set flags for encrypted files James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 31/39] lustre: lov: FIEMAP support for PFL and FLR file James Simmons
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Remove the calculation of sumsq from the LDLM pool stats, since
these stats are almost never used, while conversely the pools
are updated frequently.
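
For context, a conceptual sketch (not the actual lprocfs code) of what
the LPROCFS_CNTR_STDDEV flag implies: every counter update also has to
maintain a sum of squares, i.e. an extra 64-bit multiply and add per
event, which the frequently updated pool counters now avoid:

	#include <stdbool.h>
	#include <stdint.h>

	struct counter {
		uint64_t count;
		uint64_t sum;
		uint64_t sumsq;		/* only kept for STDDEV counters */
		uint64_t min, max;
	};

	static void counter_add(struct counter *c, uint64_t v, bool want_stddev)
	{
		c->count++;
		c->sum += v;
		if (v < c->min)
			c->min = v;
		if (v > c->max)
			c->max = v;
		if (want_stddev)
			c->sumsq += v * v;	/* the per-event cost this patch drops */
	}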

WC-bug-id: https://jira.whamcloud.com/browse/LU-9114
Lustre-commit: 966f6bb550be52e ("LU-9114 ldlm: don't compute sumsq for pool stats")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39435
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_pool.c | 33 +++++++++++----------------------
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c
index 9cee24b..2e4d16b 100644
--- a/fs/lustre/ldlm/ldlm_pool.c
+++ b/fs/lustre/ldlm/ldlm_pool.c
@@ -606,38 +606,27 @@ static int ldlm_pool_debugfs_init(struct ldlm_pool *pl)
 	}
 
 	lprocfs_counter_init(pl->pl_stats, LDLM_POOL_GRANTED_STAT,
-			     LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV,
-			     "granted", "locks");
+			     LPROCFS_CNTR_AVGMINMAX, "granted", "locks");
 	lprocfs_counter_init(pl->pl_stats, LDLM_POOL_GRANT_STAT,
-			     LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV,
-			     "grant", "locks");
+			     LPROCFS_CNTR_AVGMINMAX, "grant", "locks");
 	lprocfs_counter_init(pl->pl_stats, LDLM_POOL_CANCEL_STAT,
-			     LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV,
-			     "cancel", "locks");
+			     LPROCFS_CNTR_AVGMINMAX, "cancel", "locks");
 	lprocfs_counter_init(pl->pl_stats, LDLM_POOL_GRANT_RATE_STAT,
-			     LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV,
-			     "grant_rate", "locks/s");
+			     LPROCFS_CNTR_AVGMINMAX, "grant_rate", "locks/s");
 	lprocfs_counter_init(pl->pl_stats, LDLM_POOL_CANCEL_RATE_STAT,
-			     LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV,
-			     "cancel_rate", "locks/s");
+			     LPROCFS_CNTR_AVGMINMAX, "cancel_rate", "locks/s");
 	lprocfs_counter_init(pl->pl_stats, LDLM_POOL_GRANT_PLAN_STAT,
-			     LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV,
-			     "grant_plan", "locks/s");
+			     LPROCFS_CNTR_AVGMINMAX, "grant_plan", "locks/s");
 	lprocfs_counter_init(pl->pl_stats, LDLM_POOL_SLV_STAT,
-			     LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV,
-			     "slv", "slv");
+			     LPROCFS_CNTR_AVGMINMAX, "slv", "slv");
 	lprocfs_counter_init(pl->pl_stats, LDLM_POOL_SHRINK_REQTD_STAT,
-			     LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV,
-			     "shrink_request", "locks");
+			     LPROCFS_CNTR_AVGMINMAX, "shrink_request", "locks");
 	lprocfs_counter_init(pl->pl_stats, LDLM_POOL_SHRINK_FREED_STAT,
-			     LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV,
-			     "shrink_freed", "locks");
+			     LPROCFS_CNTR_AVGMINMAX, "shrink_freed", "locks");
 	lprocfs_counter_init(pl->pl_stats, LDLM_POOL_RECALC_STAT,
-			     LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV,
-			     "recalc_freed", "locks");
+			     LPROCFS_CNTR_AVGMINMAX, "recalc_freed", "locks");
 	lprocfs_counter_init(pl->pl_stats, LDLM_POOL_TIMING_STAT,
-			     LPROCFS_CNTR_AVGMINMAX | LPROCFS_CNTR_STDDEV,
-			     "recalc_timing", "sec");
+			     LPROCFS_CNTR_AVGMINMAX, "recalc_timing", "sec");
 	debugfs_create_file("stats", 0644, pl->pl_debugfs_entry, pl->pl_stats,
 			    &lprocfs_stats_seq_fops);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 31/39] lustre: lov: FIEMAP support for PFL and FLR file
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (29 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 30/39] lustre: ldlm: don't compute sumsq for pool stats James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 32/39] lustre: mdc: process changelogs_catalog from the oldest rec James Simmons
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Bobi Jam <bobijam@whamcloud.com>

* use the high 16 bits of fe_device to record the absolute stripe
  number (counted from the beginning of the layout) being processed,
  so that a continuation call can resume from the stripe it specifies;
  see the sketch below.
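
A minimal sketch using the helpers this patch adds to
include/uapi/linux/lustre/lustre_fiemap.h (the include path assumes
installed UAPI headers and the numbers are made up):

	#include <linux/lustre/lustre_fiemap.h>

	static void fe_device_example(void)
	{
		struct fiemap_extent fe = { 0 };

		/* low 16 bits: OST device index, high 16 bits: absolute stripe */
		set_fe_device_stripenr(&fe, 7, 300);
		/* fe.fe_device is now (300 << 16) | 7 == 0x012c0007 */
		/* get_fe_device(&fe) == 7, get_fe_stripenr(&fe) == 300 */
	}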

WC-bug-id: https://jira.whamcloud.com/browse/LU-11848
Lustre-commit: 409719608cf0f60 ("LU-11848 lov: FIEMAP support for PFL and FLR file")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40766
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_object.c                | 248 +++++++++++++++++++-----------
 fs/lustre/lov/lov_offset.c                |   8 +-
 fs/lustre/ptlrpc/wiretest.c               |   1 -
 include/uapi/linux/lustre/lustre_fiemap.h |  30 +++-
 4 files changed, 191 insertions(+), 96 deletions(-)

diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c
index 0762cc5..3fcd342 100644
--- a/fs/lustre/lov/lov_object.c
+++ b/fs/lustre/lov/lov_object.c
@@ -1487,21 +1487,34 @@ static int fiemap_calc_last_stripe(struct lov_stripe_md *lsm, int index,
 				   int start_stripe, int *stripe_count)
 {
 	struct lov_stripe_md_entry *lsme = lsm->lsm_entries[index];
+	int init_stripe;
 	int last_stripe;
-	u64 obd_start;
-	u64 obd_end;
 	int i, j;
 
+	init_stripe = lov_stripe_number(lsm, index, ext->e_start);
+
 	if (ext->e_end - ext->e_start >
 	    lsme->lsme_stripe_size * lsme->lsme_stripe_count) {
-		last_stripe = (start_stripe < 1 ? lsme->lsme_stripe_count - 1 :
-						  start_stripe - 1);
-		*stripe_count = lsme->lsme_stripe_count;
+		if (init_stripe == start_stripe) {
+			last_stripe = (start_stripe < 1) ?
+				lsme->lsme_stripe_count - 1 : start_stripe - 1;
+			*stripe_count = lsme->lsme_stripe_count;
+		} else if (init_stripe < start_stripe) {
+			last_stripe = (init_stripe < 1) ?
+				lsme->lsme_stripe_count - 1 : init_stripe - 1;
+			*stripe_count = lsme->lsme_stripe_count -
+					(start_stripe - init_stripe);
+		} else {
+			last_stripe = init_stripe - 1;
+			*stripe_count = init_stripe - start_stripe;
+		}
 	} else {
 		for (j = 0, i = start_stripe; j < lsme->lsme_stripe_count;
 		     i = (i + 1) % lsme->lsme_stripe_count, j++) {
-			if (lov_stripe_intersects(lsm, index, i, ext,
-						  &obd_start, &obd_end) == 0)
+			if (!lov_stripe_intersects(lsm, index, i, ext, NULL,
+						   NULL))
+				break;
+			if ((start_stripe != init_stripe) && (i == init_stripe))
 				break;
 		}
 		*stripe_count = j;
@@ -1524,13 +1537,14 @@ static int fiemap_calc_last_stripe(struct lov_stripe_md *lsm, int index,
 static void fiemap_prepare_and_copy_exts(struct fiemap *fiemap,
 					 struct fiemap_extent *lcl_fm_ext,
 					 int ost_index, unsigned int ext_count,
-					 int current_extent)
+					 int current_extent, int abs_stripeno)
 {
 	unsigned int ext;
 	char *to;
 
 	for (ext = 0; ext < ext_count; ext++) {
-		lcl_fm_ext[ext].fe_device = ost_index;
+		set_fe_device_stripenr(&lcl_fm_ext[ext], ost_index,
+				       abs_stripeno);
 		lcl_fm_ext[ext].fe_flags |= FIEMAP_EXTENT_NET;
 	}
 
@@ -1565,26 +1579,14 @@ static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 {
 	struct lov_stripe_md_entry *lsme = lsm->lsm_entries[index];
 	u64 local_end = fiemap->fm_extents[0].fe_logical;
-	u64 lun_start, lun_end;
+	u64 lun_end;
 	u64 fm_end_offset;
 	int stripe_no = -1;
-	int i;
 
 	if (!fiemap->fm_extent_count || !fiemap->fm_extents[0].fe_logical)
 		return 0;
 
-	/* Find out stripe_no from ost_index saved in the fe_device */
-	for (i = 0; i < lsme->lsme_stripe_count; i++) {
-		struct lov_oinfo *oinfo = lsme->lsme_oinfo[i];
-
-		if (lov_oinfo_is_dummy(oinfo))
-			continue;
-
-		if (oinfo->loi_ost_idx == fiemap->fm_extents[0].fe_device) {
-			stripe_no = i;
-			break;
-		}
-	}
+	stripe_no = *start_stripe;
 
 	if (stripe_no == -1)
 		return -EINVAL;
@@ -1593,11 +1595,9 @@ static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 	 * If we have finished mapping on previous device, shift logical
 	 * offset to start of next device
 	 */
-	if (lov_stripe_intersects(lsm, index, stripe_no, ext,
-				  &lun_start, &lun_end) != 0 &&
+	if (lov_stripe_intersects(lsm, index, stripe_no, ext, NULL, &lun_end) &&
 	    local_end < lun_end) {
 		fm_end_offset = local_end;
-		*start_stripe = stripe_no;
 	} else {
 		/* This is a special value to indicate that caller should
 		 * calculate offset in next stripe.
@@ -1611,16 +1611,16 @@ static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 
 struct fiemap_state {
 	struct fiemap		*fs_fm;
-	struct lu_extent	fs_ext;
+	struct lu_extent	fs_ext;		/* current entry extent */
 	u64			fs_length;
-	u64			fs_end_offset;
-	int			fs_cur_extent;
-	int			fs_cnt_need;
+	u64			fs_end_offset;	/* last iteration offset */
+	int			fs_cur_extent;	/* collected exts so far */
+	int			fs_cnt_need;	/* # of extents buf can hold */
 	int			fs_start_stripe;
 	int			fs_last_stripe;
-	bool			fs_device_done;
-	bool			fs_finish_stripe;
-	bool			fs_enough;
+	bool			fs_device_done;	/* enough for this OST */
+	bool			fs_finish_stripe; /* reached fs_last_stripe */
+	bool			fs_enough;	/* enough for this call */
 };
 
 static struct cl_object *lov_find_subobj(const struct lu_env *env,
@@ -1669,17 +1669,17 @@ static struct cl_object *lov_find_subobj(const struct lu_env *env,
 static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 			     struct lov_stripe_md *lsm, struct fiemap *fiemap,
 			     size_t *buflen, struct ll_fiemap_info_key *fmkey,
-			     int index, int stripeno, struct fiemap_state *fs)
+			     int index, int stripe_last, int stripeno,
+			     struct fiemap_state *fs)
 {
 	struct lov_stripe_md_entry *lsme = lsm->lsm_entries[index];
 	struct cl_object *subobj;
 	struct lov_obd *lov = lu2lov_dev(obj->co_lu.lo_dev)->ld_lov;
 	struct fiemap_extent *fm_ext = &fs->fs_fm->fm_extents[0];
-	u64 req_fm_len; /* Stores length of required mapping */
+	u64 req_fm_len; /* max requested extent coverage */
 	u64 len_mapped_single_call;
-	u64 lun_start;
-	u64 lun_end;
-	u64 obd_object_end;
+	u64 obd_start;
+	u64 obd_end;
 	unsigned int ext_count;
 	/* EOF for object */
 	bool ost_eof = false;
@@ -1691,24 +1691,24 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 	fs->fs_device_done = false;
 	/* Find out range of mapping on this stripe */
 	if ((lov_stripe_intersects(lsm, index, stripeno, &fs->fs_ext,
-				   &lun_start, &obd_object_end)) == 0)
+				   &obd_start, &obd_end)) == 0)
 		return 0;
 
 	if (lov_oinfo_is_dummy(lsme->lsme_oinfo[stripeno]))
 		return -EIO;
 
 	/* If this is a continuation FIEMAP call and we are on
-	 * starting stripe then lun_start needs to be set to
+	 * starting stripe then obd_start needs to be set to
 	 * end_offset
 	 */
 	if (fs->fs_end_offset != 0 && stripeno == fs->fs_start_stripe)
-		lun_start = fs->fs_end_offset;
+		obd_start = fs->fs_end_offset;
 
-	lun_end = lov_size_to_stripe(lsm, index, fs->fs_ext.e_end, stripeno);
-	if (lun_start == lun_end)
+	if (lov_size_to_stripe(lsm, index, fs->fs_ext.e_end, stripeno) ==
+	    obd_start)
 		return 0;
 
-	req_fm_len = obd_object_end - lun_start + 1;
+	req_fm_len = obd_end - obd_start + 1;
 	fs->fs_fm->fm_length = 0;
 	len_mapped_single_call = 0;
 
@@ -1729,7 +1729,7 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 						  fs->fs_cur_extent;
 		}
 
-		lun_start += len_mapped_single_call;
+		obd_start += len_mapped_single_call;
 		fs->fs_fm->fm_length = req_fm_len - len_mapped_single_call;
 		req_fm_len = fs->fs_fm->fm_length;
 		/**
@@ -1753,14 +1753,14 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 			fs->fs_fm->fm_flags |= FIEMAP_EXTENT_LAST;
 			fs->fs_fm->fm_mapped_extents = 1;
 
-			fm_ext[0].fe_logical = lun_start;
-			fm_ext[0].fe_length = obd_object_end - lun_start + 1;
+			fm_ext[0].fe_logical = obd_start;
+			fm_ext[0].fe_length = obd_end - obd_start + 1;
 			fm_ext[0].fe_flags |= FIEMAP_EXTENT_UNKNOWN;
 
 			goto inactive_tgt;
 		}
 
-		fs->fs_fm->fm_start = lun_start;
+		fs->fs_fm->fm_start = obd_start;
 		fs->fs_fm->fm_flags &= ~FIEMAP_FLAG_DEVICE_ORDER;
 		memcpy(&fmkey->lfik_fiemap, fs->fs_fm, sizeof(*fs->fs_fm));
 		*buflen = fiemap_count_to_size(fs->fs_fm->fm_extent_count);
@@ -1799,7 +1799,7 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 		/* prepare to copy retrived map extents */
 		len_mapped_single_call = fm_ext[ext_count - 1].fe_logical +
 					 fm_ext[ext_count - 1].fe_length -
-					 lun_start;
+					 obd_start;
 
 		/* Have we finished mapping on this device? */
 		if (req_fm_len <= len_mapped_single_call) {
@@ -1821,7 +1821,8 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 		}
 
 		fiemap_prepare_and_copy_exts(fiemap, fm_ext, ost_index,
-					     ext_count, fs->fs_cur_extent);
+					     ext_count, fs->fs_cur_extent,
+					     stripe_last + stripeno);
 		fs->fs_cur_extent += ext_count;
 
 		/* Ran out of available extents? */
@@ -1863,12 +1864,17 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 	loff_t whole_start;
 	loff_t whole_end;
 	int entry;
-	int start_entry;
+	int start_entry = -1;
 	int end_entry;
 	int cur_stripe = 0;
 	int stripe_count;
 	int rc = 0;
 	struct fiemap_state fs = { NULL };
+	struct lu_extent range;
+	int cur_ext;
+	int stripe_last;
+	int start_stripe = 0;
+	bool resume = false;
 
 	lsm = lov_lsm_addref(cl2lov(obj));
 	if (!lsm) {
@@ -1936,8 +1942,6 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 	 */
 	if (fiemap_count_to_size(fiemap->fm_extent_count) > *buflen)
 		fiemap->fm_extent_count = fiemap_size_to_count(*buflen);
-	if (!fiemap->fm_extent_count)
-		fs.fs_cnt_need = 0;
 
 	fs.fs_enough = false;
 	fs.fs_cur_extent = 0;
@@ -1951,73 +1955,142 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 		goto out_fm_local;
 	}
 	whole_end = (fiemap->fm_length == OBD_OBJECT_EOF) ?
-		     fmkey->lfik_oa.o_size :
-		     whole_start + fiemap->fm_length - 1;
+		     fmkey->lfik_oa.o_size + 1 :
+		     whole_start + fiemap->fm_length;
 	/**
 	 * If fiemap->fm_length != OBD_OBJECT_EOF but whole_end exceeds file
 	 * size
 	 */
-	if (whole_end > fmkey->lfik_oa.o_size)
-		whole_end = fmkey->lfik_oa.o_size;
+	if (whole_end > fmkey->lfik_oa.o_size + 1)
+		whole_end = fmkey->lfik_oa.o_size + 1;
 
-	start_entry = lov_lsm_entry(lsm, whole_start);
-	end_entry = lov_lsm_entry(lsm, whole_end);
-	if (end_entry == -1)
-		end_entry = lsm->lsm_entry_count - 1;
+	/**
+	 * the high 16 bits of fe_device remember which stripe the last
+	 * call arrived at; we continue from there in this call.
+	 */
+	if (fiemap->fm_extent_count && fiemap->fm_extents[0].fe_logical)
+		resume = true;
+	stripe_last = get_fe_stripenr(&fiemap->fm_extents[0]);
+	/**
+	 * stripe_last records the stripe number we processed in the last
+	 * call
+	 */
+	end_entry = lsm->lsm_entry_count - 1;
+	cur_stripe = 0;
+	for (entry = 0; entry <= end_entry; entry++) {
+		lsme = lsm->lsm_entries[entry];
+		if (cur_stripe + lsme->lsme_stripe_count >= stripe_last) {
+			start_entry = entry;
+			start_stripe = stripe_last - cur_stripe;
+			break;
+		}
+		cur_stripe += lsme->lsme_stripe_count;
+	}
 
-	if (start_entry == -1 || end_entry == -1) {
+	if (start_entry == -1) {
+		CERROR(DFID": FIEMAP does not init start entry, cur_stripe=%d, stripe_last=%d\n",
+		       PFID(lu_object_fid(&obj->co_lu)),
+		       cur_stripe, stripe_last);
 		rc = -EINVAL;
 		goto out_fm_local;
 	}
+	/**
+	 * @start_entry & @start_stripe record the position of fiemap
+	 * resumption. @stripe_last keeps recording the absolute position
+	 * we are processing. @resume indicates we should honor @start_stripe.
+	 */
+
+	range.e_start = whole_start;
+	range.e_end = whole_end;
 
-	/* TODO: rewrite it with lov_foreach_io_layout() */
 	for (entry = start_entry; entry <= end_entry; entry++) {
+		/* remember to update stripe_last accordingly */
 		lsme = lsm->lsm_entries[entry];
 
-		if (!lsme_inited(lsme))
-			break;
+		/* FLR could contain component holes between entries */
+		if (!lsme_inited(lsme)) {
+			stripe_last += lsme->lsme_stripe_count;
+			resume = false;
+			continue;
+		}
 
-		if (entry == start_entry)
-			fs.fs_ext.e_start = whole_start;
-		else
+		if (!lu_extent_is_overlapped(&range, &lsme->lsme_extent)) {
+			stripe_last += lsme->lsme_stripe_count;
+			resume = false;
+			continue;
+		}
+
+		/* prepare for a component entry iteration */
+		if (lsme->lsme_extent.e_start > whole_start)
 			fs.fs_ext.e_start = lsme->lsme_extent.e_start;
-		if (entry == end_entry)
+		else
+			fs.fs_ext.e_start = whole_start;
+		if (lsme->lsme_extent.e_end > whole_end)
 			fs.fs_ext.e_end = whole_end;
 		else
-			fs.fs_ext.e_end = lsme->lsme_extent.e_end - 1;
-		fs.fs_length = fs.fs_ext.e_end - fs.fs_ext.e_start + 1;
+			fs.fs_ext.e_end = lsme->lsme_extent.e_end;
 
 		/* Calculate start stripe, last stripe and length of mapping */
-		fs.fs_start_stripe = lov_stripe_number(lsm, entry,
-						       fs.fs_ext.e_start);
+		if (resume) {
+			fs.fs_start_stripe = start_stripe;
+			/* put stripe_last to the first stripe of the comp */
+			stripe_last -= start_stripe;
+			resume = false;
+		} else {
+			fs.fs_start_stripe = lov_stripe_number(lsm, entry,
+							       fs.fs_ext.e_start);
+		}
 		fs.fs_last_stripe = fiemap_calc_last_stripe(lsm, entry,
 							    &fs.fs_ext,
 							    fs.fs_start_stripe,
 							    &stripe_count);
-		fs.fs_end_offset = fiemap_calc_fm_end_offset(fiemap, lsm, entry,
-							     &fs.fs_ext,
-							     &fs.fs_start_stripe);
+		/**
+		 * A new mirror component is under process, reset
+		 * fs.fs_end_offset and then fiemap_for_stripe() starts from
+		 * the overlapping extent, otherwise starts from
+		 * fs.fs_end_offset.
+		 */
+		if (entry > start_entry && lsme->lsme_extent.e_start == 0) {
+			/* new mirror */
+			fs.fs_end_offset = 0;
+		} else {
+			fs.fs_end_offset = fiemap_calc_fm_end_offset(fiemap,
+								     lsm, entry,
+								     &fs.fs_ext,
+								     &fs.fs_start_stripe);
+		}
+
 		/* Check each stripe */
 		for (cur_stripe = fs.fs_start_stripe; stripe_count > 0;
 		     --stripe_count,
 		     cur_stripe = (cur_stripe + 1) % lsme->lsme_stripe_count) {
+			/* reset fs_finish_stripe */
+			fs.fs_finish_stripe = false;
 			rc = fiemap_for_stripe(env, obj, lsm, fiemap, buflen,
-					       fmkey, entry, cur_stripe, &fs);
+					       fmkey, entry, stripe_last,
+					       cur_stripe, &fs);
 			if (rc < 0)
 				goto out_fm_local;
-			if (fs.fs_enough)
+			if (fs.fs_enough) {
+				stripe_last += cur_stripe;
 				goto finish;
+			}
 			if (fs.fs_finish_stripe)
 				break;
 		} /* for each stripe */
-	} /* for covering layout component */
+		stripe_last += lsme->lsme_stripe_count;
+	} /* for covering layout component entry */
 
-	/*
-	 * We've traversed all components, set @entry to the last component
-	 * entry, it's for the last stripe check.
-	 */
-	entry--;
 finish:
+	if (fs.fs_cur_extent > 0)
+		cur_ext = fs.fs_cur_extent - 1;
+	else
+		cur_ext = 0;
+
+	/* done all the processing */
+	if (entry > end_entry)
+		fiemap->fm_extents[cur_ext].fe_flags |= FIEMAP_EXTENT_LAST;
+
 	/*
 	 * Indicate that we are returning device offsets unless file just has
 	 * single stripe
@@ -2030,13 +2103,6 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 	if (!fiemap->fm_extent_count)
 		goto skip_last_device_calc;
 
-	/*
-	 * Check if we have reached the last stripe and whether mapping for that
-	 * stripe is done.
-	 */
-	if ((cur_stripe == fs.fs_last_stripe) && fs.fs_device_done)
-		fiemap->fm_extents[fs.fs_cur_extent - 1].fe_flags |=
-							FIEMAP_EXTENT_LAST;
 skip_last_device_calc:
 	fiemap->fm_mapped_extents = fs.fs_cur_extent;
 out_fm_local:
diff --git a/fs/lustre/lov/lov_offset.c b/fs/lustre/lov/lov_offset.c
index b53ce43..ca763af 100644
--- a/fs/lustre/lov/lov_offset.c
+++ b/fs/lustre/lov/lov_offset.c
@@ -227,18 +227,24 @@ u64 lov_size_to_stripe(struct lov_stripe_md *lsm, int index, u64 file_size,
  * that is contained within the lov extent.  this returns true if the given
  * stripe does intersect with the lov extent.
  *
- * Closed interval [@obd_start, @obd_end] will be returned.
+ * Closed interval [@obd_start, @obd_end] will be returned if caller needs them.
  */
 int lov_stripe_intersects(struct lov_stripe_md *lsm, int index, int stripeno,
 			  struct lu_extent *ext, u64 *obd_start, u64 *obd_end)
 {
 	struct lov_stripe_md_entry *entry = lsm->lsm_entries[index];
 	int start_side, end_side;
+	u64 loc_start, loc_end;
 	u64 start, end;
 
 	if (!lu_extent_is_overlapped(ext, &entry->lsme_extent))
 		return 0;
 
+	if (!obd_start)
+		obd_start = &loc_start;
+	if (!obd_end)
+		obd_end = &loc_end;
+
 	start = max_t(u64, ext->e_start, entry->lsme_extent.e_start);
 	end = min_t(u64, ext->e_end, entry->lsme_extent.e_end);
 	if (end != OBD_OBJECT_EOF)
diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index fedb914..a500a87 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -4262,7 +4262,6 @@ void lustre_assert_wire_constants(void)
 	BUILD_BUG_ON(FIEMAP_EXTENT_UNWRITTEN != 0x00000800);
 	BUILD_BUG_ON(FIEMAP_EXTENT_MERGED != 0x00001000);
 	BUILD_BUG_ON(FIEMAP_EXTENT_SHARED != 0x00002000);
-	BUILD_BUG_ON(FIEMAP_EXTENT_NO_DIRECT != 0x40000000);
 	BUILD_BUG_ON(FIEMAP_EXTENT_NET != 0x80000000);
 
 #ifdef CONFIG_FS_POSIX_ACL
diff --git a/include/uapi/linux/lustre/lustre_fiemap.h b/include/uapi/linux/lustre/lustre_fiemap.h
index 4ae1850..f93e107 100644
--- a/include/uapi/linux/lustre/lustre_fiemap.h
+++ b/include/uapi/linux/lustre/lustre_fiemap.h
@@ -43,9 +43,35 @@
 #include <linux/fiemap.h>
 #include <linux/types.h>
 
-/* XXX: We use fiemap_extent::fe_reserved[0] */
+/**
+ * XXX: We use fiemap_extent::fe_reserved[0]; note that its high 16 bits
+ * are used to record the stripe number, counted from the very beginning,
+ * so that the fiemap call can be resumed.
+ */
 #define fe_device	fe_reserved[0]
 
+static inline int get_fe_device(struct fiemap_extent *fe)
+{
+	return fe->fe_device & 0xffff;
+}
+static inline void set_fe_device(struct fiemap_extent *fe, int devno)
+{
+	fe->fe_device = (fe->fe_device & 0xffff0000) | (devno & 0xffff);
+}
+static inline int get_fe_stripenr(struct fiemap_extent *fe)
+{
+	return fe->fe_device >> 16;
+}
+static inline void set_fe_stripenr(struct fiemap_extent *fe, int nr)
+{
+	fe->fe_device = (fe->fe_device & 0xffff) | (nr << 16);
+}
+static inline void set_fe_device_stripenr(struct fiemap_extent *fe, int devno,
+					  int nr)
+{
+	fe->fe_device = (nr << 16) | (devno & 0xffff);
+}
+
 static inline __kernel_size_t fiemap_count_to_size(__kernel_size_t extent_count)
 {
 	return sizeof(struct fiemap) + extent_count *
@@ -64,8 +90,6 @@ static inline unsigned int fiemap_size_to_count(__kernel_size_t array_size)
 #undef FIEMAP_FLAGS_COMPAT
 #endif
 
-/* Lustre specific flags - use a high bit, don't conflict with upstream flag */
-#define FIEMAP_EXTENT_NO_DIRECT	 0x40000000 /* Data mapping undefined */
 #define FIEMAP_EXTENT_NET	 0x80000000 /* Data stored remotely.
 					     * Sets NO_DIRECT flag
 					     */
-- 
1.8.3.1


* [lustre-devel] [PATCH 32/39] lustre: mdc: process changelogs_catalog from the oldest rec
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (30 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 31/39] lustre: lov: FIEMAP support for PFL and FLR file James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 33/39] lustre: ldlm: Use req_mode while lock cleanup James Simmons
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Etienne AUJAMES <eaujames@ddn.com>

chlg_load() uses LLOG_CAT_FIRST to process changelogs. With this
value, records in the catalog are always processed starting from
index 0 up to the newest record. So once the catalog has reached the
end of its indexes and new records are being saved at the beginning
of the catalog, llog_cat_process() will ignore the records at the end.

This patch changes the "startcat" value from LLOG_CAT_FIRST to 0 to
scan the catalog from the oldest record to the newest.
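
To illustrate the wrap-around issue, a conceptual sketch only (not the
llog API; struct rec and handle_record() are hypothetical): once a
circular catalog has wrapped, iteration has to start at the oldest
slot rather than slot 0, otherwise the slots between the newest record
and the end of the catalog are never visited:

	/* struct rec and handle_record() are hypothetical placeholders */
	static void process_catalog(const struct rec *recs, int size, int oldest)
	{
		int i;

		/* visit every live record, oldest first */
		for (i = 0; i < size; i++)
			handle_record(&recs[(oldest + i) % size]);
	}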

Fixes: d95486c4 (lustre: mdc: polling mode for changelog reader)
WC-bug-id: https://jira.whamcloud.com/browse/LU-14158
Lustre-commit: ad4c8633498848 ("LU-14158 mdc: process changelogs_catalog from the oldest rec")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-on: https://review.whamcloud.com/40786
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/mdc_changelog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/mdc/mdc_changelog.c b/fs/lustre/mdc/mdc_changelog.c
index 8531edb..f671f46 100644
--- a/fs/lustre/mdc/mdc_changelog.c
+++ b/fs/lustre/mdc/mdc_changelog.c
@@ -287,7 +287,7 @@ static int chlg_load(void *args)
 	struct llog_handle *llh = NULL;
 	int rc;
 
-	crs->crs_last_catidx = -1;
+	crs->crs_last_catidx = 0;
 	crs->crs_last_idx = 0;
 
 again:
-- 
1.8.3.1


* [lustre-devel] [PATCH 33/39] lustre: ldlm: Use req_mode while lock cleanup
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (31 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 32/39] lustre: mdc: process changelogs_catalog from the oldest rec James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 34/39] lnet: socklnd: announce deprecation of 'use_tcp_bonding' James Simmons
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Yang Sheng, Lustre Development List

From: Yang Sheng <ys@whamcloud.com>

For a local lock, the decref cannot be counted correctly using
granted_mode if the lock has not been granted.
 LustreError: (ldlm_lock.c:354:ldlm_lock_destroy_internal())
 ### lock still has references ns: ??
 lock: ffff88342aa07200/0x9b92ad3407bea22a
 lrc: 4/0,1 mode: --/PW res: ?? rrc=?? type: ???
 flags: 0x10106400000000 nid: local
 remote: 0x5248822d3123ac19 expref: -99
 pid: 14515 timeout: 0 lvb_type: 0
LustreError: (ldlm_lock.c:355:ldlm_lock_destroy_internal()) LBUG
Pid: 14562, comm: ll_imp_inval 3.10.0-693.21.1.el7.x86_64 #1 SMP
Call Trace:
[] save_stack_trace_tsk+0x22/0x40
[] libcfs_call_trace+0x8c/0xc0 [libcfs]
[] lbug_with_loc+0x4c/0xa0 [libcfs]
[] ldlm_lock_destroy_internal+0x269/0x2a0 [ptlrpc]
[] ldlm_lock_destroy_nolock+0x2b/0x110 [ptlrpc]
[] ldlm_flock_completion_ast+0x4f5/0x1080 [ptlrpc]
[] cleanup_resource+0x18e/0x370 [ptlrpc]
[] ldlm_resource_clean+0x53/0x60 [ptlrpc]
[] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[] ldlm_namespace_cleanup+0x30/0xc0 [ptlrpc]
[] mdc_import_event+0x1b6/0xa20 [mdc]
[] ptlrpc_invalidate_import+0x220/0x8f0 [ptlrpc]
[] ptlrpc_invalidate_import_thread+0x48/0x2b0 [ptlrpc]
[] kthread+0xd1/0xe0

WC-bug-id: https://jira.whamcloud.com/browse/LU-14082
Lustre-commit: a11c18cbab00d0 ("LU-14082 ldlm: Use req_mode while lock cleanup")
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40433
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_flock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/ldlm/ldlm_flock.c b/fs/lustre/ldlm/ldlm_flock.c
index 720362f..b4916cb15 100644
--- a/fs/lustre/ldlm/ldlm_flock.c
+++ b/fs/lustre/ldlm/ldlm_flock.c
@@ -414,7 +414,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
 		if (ldlm_is_test_lock(lock) || ldlm_is_flock_deadlock(lock))
 			mode = getlk->fl_type;
 		else
-			mode = lock->l_granted_mode;
+			mode = lock->l_req_mode;
 
 		if (ldlm_is_flock_deadlock(lock)) {
 			LDLM_DEBUG(lock,
-- 
1.8.3.1


* [lustre-devel] [PATCH 34/39] lnet: socklnd: announce deprecation of 'use_tcp_bonding'
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (32 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 33/39] lustre: ldlm: Use req_mode while lock cleanup James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 35/39] lnet: o2iblnd: remove FMR-pool support James Simmons
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

Add a warning to be printed if the 'use_tcp_bonding' option is used,
notifying the user that the feature is being deprecated. It is
suggested to use an MR configuration with dynamic discovery instead.

The Multi-Rail feature doesn't need to be explicitly enabled.
To use MR instead of TCP bonding, group the interfaces
on the same network using the lnetctl utility:

	lnetctl net add --net tcp --if eth2,eth3

or via the modprobe configuration file (/etc/modprobe.d/lnet.conf
or /etc/modprobe.d/lustre.conf):

	options lnet networks="tcp(eth2,eth3)"

and make sure dynamic discovery is enabled:

	lnetctl set discovery 1

MR will aggregate the throughput of all configured and available
networks/interfaces shared between peer nodes.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13641
Lustre-commit: 1a2bf911b97936 ("LU-13641 socklnd: announce deprecation of 'use_tcp_bonding'")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41088
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/api-ni.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 322b25d..c3bf444 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -70,7 +70,7 @@ struct lnet the_lnet = {
 static int use_tcp_bonding = false;
 module_param(use_tcp_bonding, int, 0444);
 MODULE_PARM_DESC(use_tcp_bonding,
-		 "Set to 1 to use socklnd bonding. 0 to use Multi-Rail");
+		 "use_tcp_bonding parameter has been deprecated");
 
 unsigned int lnet_numa_range;
 EXPORT_SYMBOL(lnet_numa_range);
@@ -2610,8 +2610,10 @@ void lnet_lib_exit(void)
 		goto err_empty_list;
 	}
 
-	/*
-	 * If LNet is being initialized via DLC it is possible
+	if (use_tcp_bonding)
+		CWARN("'use_tcp_bonding' option has been deprecated. See LU-13641\n");
+
+	/* If LNet is being initialized via DLC it is possible
 	 * that the user requests not to load module parameters (ones which
 	 * are supported by DLC) on initialization.  Therefore, make sure not
 	 * to load networks, routes and forwarding from module parameters
-- 
1.8.3.1


* [lustre-devel] [PATCH 35/39] lnet: o2iblnd: remove FMR-pool support.
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (33 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 34/39] lnet: socklnd: announce deprecation of 'use_tcp_bonding' James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:16 ` [lustre-devel] [PATCH 36/39] lustre: llite: return EOPNOTSUPP if fallocate is not supported James Simmons
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Linux 5.8 removes the FMR-pool API.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13783
Lustre-commit: 6fd5c8bef83aaf ("LU-13783 o2iblnd: make FMR-pool support optional.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/40287
Reviewed-by: Sergey Gorenko <sergeygo@nvidia.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c    | 268 +++++++++++-------------------------
 net/lnet/klnds/o2iblnd/o2iblnd.h    |   6 -
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c |  27 +---
 3 files changed, 81 insertions(+), 220 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index fc515fc..9147d17 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1313,27 +1313,23 @@ static void kiblnd_map_tx_pool(struct kib_tx_pool *tpo)
 
 static void kiblnd_destroy_fmr_pool(struct kib_fmr_pool *fpo)
 {
-	LASSERT(!fpo->fpo_map_count);
+	struct kib_fast_reg_descriptor *frd;
+	int i = 0;
 
-	if (!IS_ERR_OR_NULL(fpo->fmr.fpo_fmr_pool)) {
-		ib_destroy_fmr_pool(fpo->fmr.fpo_fmr_pool);
-	} else {
-		struct kib_fast_reg_descriptor *frd;
-		int i = 0;
+	LASSERT(!fpo->fpo_map_count);
 
-		while (!list_empty(&fpo->fast_reg.fpo_pool_list)) {
-			frd = list_first_entry(&fpo->fast_reg.fpo_pool_list,
-					       struct kib_fast_reg_descriptor,
-					       frd_list);
-			list_del(&frd->frd_list);
-			ib_dereg_mr(frd->frd_mr);
-			kfree(frd);
-			i++;
-		}
-		if (i < fpo->fast_reg.fpo_pool_size)
-			CERROR("FastReg pool still has %d regions registered\n",
-			       fpo->fast_reg.fpo_pool_size - i);
+	while (!list_empty(&fpo->fast_reg.fpo_pool_list)) {
+		frd = list_first_entry(&fpo->fast_reg.fpo_pool_list,
+				       struct kib_fast_reg_descriptor,
+				       frd_list);
+		list_del(&frd->frd_list);
+		ib_dereg_mr(frd->frd_mr);
+		kfree(frd);
+		i++;
 	}
+	if (i < fpo->fast_reg.fpo_pool_size)
+		CERROR("FastReg pool still has %d regions registered\n",
+		       fpo->fast_reg.fpo_pool_size - i);
 
 	if (fpo->fpo_hdev)
 		kiblnd_hdev_decref(fpo->fpo_hdev);
@@ -1370,34 +1366,6 @@ static void kiblnd_destroy_fmr_pool_list(struct list_head *head)
 	return max(IBLND_FMR_POOL_FLUSH, size);
 }
 
-static int kiblnd_alloc_fmr_pool(struct kib_fmr_poolset *fps, struct kib_fmr_pool *fpo)
-{
-	struct ib_fmr_pool_param param = {
-		.max_pages_per_fmr	= LNET_MAX_IOV,
-		.page_shift		= PAGE_SHIFT,
-		.access			= (IB_ACCESS_LOCAL_WRITE |
-					   IB_ACCESS_REMOTE_WRITE),
-		.pool_size		= fps->fps_pool_size,
-		.dirty_watermark	= fps->fps_flush_trigger,
-		.flush_function		= NULL,
-		.flush_arg		= NULL,
-		.cache			= !!fps->fps_cache
-	};
-	int rc = 0;
-
-	fpo->fmr.fpo_fmr_pool = ib_create_fmr_pool(fpo->fpo_hdev->ibh_pd,
-						   &param);
-	if (IS_ERR(fpo->fmr.fpo_fmr_pool)) {
-		rc = PTR_ERR(fpo->fmr.fpo_fmr_pool);
-		if (rc != -ENOSYS)
-			CERROR("Failed to create FMR pool: %d\n", rc);
-		else
-			CERROR("FMRs are not supported\n");
-	}
-
-	return rc;
-}
-
 static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps,
 				  struct kib_fmr_pool *fpo,
 				  enum kib_dev_caps dev_caps)
@@ -1481,10 +1449,7 @@ static int kiblnd_create_fmr_pool(struct kib_fmr_poolset *fps,
 	fpo->fpo_hdev = kiblnd_current_hdev(dev);
 	dev_attr = &fpo->fpo_hdev->ibh_ibdev->attrs;
 
-	if (dev->ibd_dev_caps & IBLND_DEV_CAPS_FMR_ENABLED)
-		rc = kiblnd_alloc_fmr_pool(fps, fpo);
-	else
-		rc = kiblnd_alloc_freg_pool(fps, fpo, dev->ibd_dev_caps);
+	rc = kiblnd_alloc_freg_pool(fps, fpo, dev->ibd_dev_caps);
 	if (rc)
 		goto out_fpo;
 
@@ -1568,61 +1533,25 @@ static int kiblnd_fmr_pool_is_idle(struct kib_fmr_pool *fpo, time64_t now)
 	return now >= fpo->fpo_deadline;
 }
 
-static int
-kiblnd_map_tx_pages(struct kib_tx *tx, struct kib_rdma_desc *rd)
-{
-	u64 *pages = tx->tx_pages;
-	struct kib_hca_dev *hdev;
-	int npages;
-	int size;
-	int i;
-
-	hdev = tx->tx_pool->tpo_hdev;
-
-	for (i = 0, npages = 0; i < rd->rd_nfrags; i++) {
-		for (size = 0; size <  rd->rd_frags[i].rf_nob;
-		     size += hdev->ibh_page_size) {
-			pages[npages++] = (rd->rd_frags[i].rf_addr &
-					   hdev->ibh_page_mask) + size;
-		}
-	}
-
-	return npages;
-}
-
 void kiblnd_fmr_pool_unmap(struct kib_fmr *fmr, int status)
 {
+	struct kib_fast_reg_descriptor *frd = fmr->fmr_frd;
 	LIST_HEAD(zombies);
 	struct kib_fmr_pool *fpo = fmr->fmr_pool;
 	struct kib_fmr_poolset *fps;
 	time64_t now = ktime_get_seconds();
 	struct kib_fmr_pool *tmp;
-	int rc;
 
 	if (!fpo)
 		return;
 
 	fps = fpo->fpo_owner;
-	if (!IS_ERR_OR_NULL(fpo->fmr.fpo_fmr_pool)) {
-		if (fmr->fmr_pfmr) {
-			ib_fmr_pool_unmap(fmr->fmr_pfmr);
-			fmr->fmr_pfmr = NULL;
-		}
-
-		if (status) {
-			rc = ib_flush_fmr_pool(fpo->fmr.fpo_fmr_pool);
-			LASSERT(!rc);
-		}
-	} else {
-		struct kib_fast_reg_descriptor *frd = fmr->fmr_frd;
-
-		if (frd) {
-			frd->frd_valid = false;
-			spin_lock(&fps->fps_lock);
-			list_add_tail(&frd->frd_list, &fpo->fast_reg.fpo_pool_list);
-			spin_unlock(&fps->fps_lock);
-			fmr->fmr_frd = NULL;
-		}
+	if (frd) {
+		frd->frd_valid = false;
+		spin_lock(&fps->fps_lock);
+		list_add_tail(&frd->frd_list, &fpo->fast_reg.fpo_pool_list);
+		spin_unlock(&fps->fps_lock);
+		fmr->fmr_frd = NULL;
 	}
 	fmr->fmr_pool = NULL;
 
@@ -1649,11 +1578,8 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx,
 			struct kib_rdma_desc *rd, u32 nob, u64 iov,
 			struct kib_fmr *fmr)
 {
-	u64 *pages = tx->tx_pages;
 	bool is_rx = (rd != tx->tx_rd);
-	bool tx_pages_mapped = false;
 	struct kib_fmr_pool *fpo;
-	int npages = 0;
 	u64 version;
 	int rc;
 
@@ -1664,96 +1590,65 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx,
 		fpo->fpo_deadline = ktime_get_seconds() + IBLND_POOL_DEADLINE;
 		fpo->fpo_map_count++;
 
-		if (!IS_ERR_OR_NULL(fpo->fmr.fpo_fmr_pool)) {
-			struct ib_pool_fmr *pfmr;
+		if (!list_empty(&fpo->fast_reg.fpo_pool_list)) {
+			struct kib_fast_reg_descriptor *frd;
+			struct ib_reg_wr *wr;
+			struct ib_mr *mr;
+			int n;
 
+			frd = list_first_entry(&fpo->fast_reg.fpo_pool_list,
+					       struct kib_fast_reg_descriptor,
+					       frd_list);
+			list_del(&frd->frd_list);
 			spin_unlock(&fps->fps_lock);
 
-			if (!tx_pages_mapped) {
-				npages = kiblnd_map_tx_pages(tx, rd);
-				tx_pages_mapped = 1;
-			}
+			mr = frd->frd_mr;
 
-			pfmr = ib_fmr_pool_map_phys(fpo->fmr.fpo_fmr_pool,
-						    pages, npages, iov);
-			if (likely(!IS_ERR(pfmr))) {
-				fmr->fmr_key = is_rx ? pfmr->fmr->rkey :
-						       pfmr->fmr->lkey;
-				fmr->fmr_frd = NULL;
-				fmr->fmr_pfmr = pfmr;
-				fmr->fmr_pool = fpo;
-				return 0;
+			if (!frd->frd_valid) {
+				u32 key = is_rx ? mr->rkey : mr->lkey;
+				struct ib_send_wr *inv_wr;
+
+				inv_wr = &frd->frd_inv_wr;
+				memset(inv_wr, 0, sizeof(*inv_wr));
+				inv_wr->opcode = IB_WR_LOCAL_INV;
+				inv_wr->wr_id = IBLND_WID_MR;
+				inv_wr->ex.invalidate_rkey = key;
+
+				/* Bump the key */
+				key = ib_inc_rkey(key);
+				ib_update_fast_reg_key(mr, key);
 			}
-			rc = PTR_ERR(pfmr);
-		} else {
-			if (!list_empty(&fpo->fast_reg.fpo_pool_list)) {
-				struct kib_fast_reg_descriptor *frd;
-				struct ib_reg_wr *wr;
-				struct ib_mr *mr;
-				int n;
-
-				frd = list_first_entry(&fpo->fast_reg.fpo_pool_list,
-						       struct kib_fast_reg_descriptor,
-						       frd_list);
-				list_del(&frd->frd_list);
-				spin_unlock(&fps->fps_lock);
-
-				mr = frd->frd_mr;
-
-				if (!frd->frd_valid) {
-					u32 key = is_rx ? mr->rkey : mr->lkey;
-					struct ib_send_wr *inv_wr;
-
-					inv_wr = &frd->frd_inv_wr;
-					memset(inv_wr, 0, sizeof(*inv_wr));
-					inv_wr->opcode = IB_WR_LOCAL_INV;
-					inv_wr->wr_id = IBLND_WID_MR;
-					inv_wr->ex.invalidate_rkey = key;
-
-					/* Bump the key */
-					key = ib_inc_rkey(key);
-					ib_update_fast_reg_key(mr, key);
-				}
-
-				n = ib_map_mr_sg(mr, tx->tx_frags,
-						 rd->rd_nfrags, NULL,
-						 PAGE_SIZE);
-				if (unlikely(n != rd->rd_nfrags)) {
-					CERROR("Failed to map mr %d/%d elements\n",
-					       n, rd->rd_nfrags);
-					return n < 0 ? n : -EINVAL;
-				}
-
-				/* Prepare FastReg WR */
-				wr = &frd->frd_fastreg_wr;
-				memset(wr, 0, sizeof(*wr));
-				wr->wr.opcode = IB_WR_REG_MR;
-				wr->wr.wr_id = IBLND_WID_MR;
-				wr->wr.num_sge = 0;
-				wr->wr.send_flags = 0;
-				wr->mr = mr;
-				wr->key = is_rx ? mr->rkey : mr->lkey;
-				wr->access = (IB_ACCESS_LOCAL_WRITE |
-					      IB_ACCESS_REMOTE_WRITE);
-
-				fmr->fmr_key = is_rx ? mr->rkey : mr->lkey;
-				fmr->fmr_frd = frd;
-				fmr->fmr_pfmr = NULL;
-				fmr->fmr_pool = fpo;
-				return 0;
+
+			n = ib_map_mr_sg(mr, tx->tx_frags,
+					 rd->rd_nfrags, NULL,
+					 PAGE_SIZE);
+			if (unlikely(n != rd->rd_nfrags)) {
+				CERROR("Failed to map mr %d/%d elements\n",
+				       n, rd->rd_nfrags);
+				return n < 0 ? n : -EINVAL;
 			}
-			spin_unlock(&fps->fps_lock);
-			rc = -EAGAIN;
-		}
 
-		spin_lock(&fps->fps_lock);
-		fpo->fpo_map_count--;
-		if (rc != -EAGAIN) {
-			spin_unlock(&fps->fps_lock);
-			return rc;
+			/* Prepare FastReg WR */
+			wr = &frd->frd_fastreg_wr;
+			memset(wr, 0, sizeof(*wr));
+			wr->wr.opcode = IB_WR_REG_MR;
+			wr->wr.wr_id = IBLND_WID_MR;
+			wr->wr.num_sge = 0;
+			wr->wr.send_flags = 0;
+			wr->mr = mr;
+			wr->key = is_rx ? mr->rkey : mr->lkey;
+			wr->access = (IB_ACCESS_LOCAL_WRITE |
+				      IB_ACCESS_REMOTE_WRITE);
+
+			fmr->fmr_key = is_rx ? mr->rkey : mr->lkey;
+			fmr->fmr_frd = frd;
+			fmr->fmr_pool = fpo;
+			return 0;
 		}
 
 		/* EAGAIN and ... */
+		rc = -EAGAIN;
+		fpo->fpo_map_count--;
 		if (version != fps->fps_version) {
 			spin_unlock(&fps->fps_lock);
 			goto again;
@@ -2353,32 +2248,25 @@ static int kiblnd_hdev_get_attr(struct kib_hca_dev *hdev)
 	hdev->ibh_page_size = 1 << PAGE_SHIFT;
 	hdev->ibh_page_mask = ~((u64)hdev->ibh_page_size - 1);
 
-	if (hdev->ibh_ibdev->ops.alloc_fmr &&
-	    hdev->ibh_ibdev->ops.dealloc_fmr &&
-	    hdev->ibh_ibdev->ops.map_phys_fmr &&
-	    hdev->ibh_ibdev->ops.unmap_fmr) {
-		LCONSOLE_INFO("Using FMR for registration\n");
-		hdev->ibh_dev->ibd_dev_caps |= IBLND_DEV_CAPS_FMR_ENABLED;
-	} else if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
+	hdev->ibh_mr_size = dev_attr->max_mr_size;
+	hdev->ibh_max_qp_wr = dev_attr->max_qp_wr;
+
+	if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
 		LCONSOLE_INFO("Using FastReg for registration\n");
 		hdev->ibh_dev->ibd_dev_caps |= IBLND_DEV_CAPS_FASTREG_ENABLED;
 		if (dev_attr->device_cap_flags & IB_DEVICE_SG_GAPS_REG)
 			hdev->ibh_dev->ibd_dev_caps |= IBLND_DEV_CAPS_FASTREG_GAPS_SUPPORT;
 	} else {
-		CERROR("IB device does not support FMRs nor FastRegs, can't register memory: %d\n",
+		CERROR("IB device does not support FastRegs, can't register memory: %d\n",
 		       -ENXIO);
 		return -ENXIO;
 	}
 
-	hdev->ibh_mr_size = dev_attr->max_mr_size;
-	hdev->ibh_max_qp_wr = dev_attr->max_qp_wr;
-
 	rc2 = kiblnd_port_get_attr(hdev);
 	if (rc2 != 0)
-		return rc2;
+		CERROR("Invalid mr size: %#llx\n", hdev->ibh_mr_size);
 
-	CERROR("Invalid mr size: %#llx\n", hdev->ibh_mr_size);
-	return -EINVAL;
+	return rc2;
 }
 
 void kiblnd_hdev_destroy(struct kib_hca_dev *hdev)
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 424ca07..12d220c 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -60,7 +60,6 @@
 #include <rdma/rdma_cm.h>
 #include <rdma/ib_cm.h>
 #include <rdma/ib_verbs.h>
-#include <rdma/ib_fmr_pool.h>
 
 #define DEBUG_SUBSYSTEM S_LND
 
@@ -146,7 +145,6 @@ struct kib_tunables {
 enum kib_dev_caps {
 	IBLND_DEV_CAPS_FASTREG_ENABLED		= BIT(0),
 	IBLND_DEV_CAPS_FASTREG_GAPS_SUPPORT	= BIT(1),
-	IBLND_DEV_CAPS_FMR_ENABLED		= BIT(2),
 };
 
 struct kib_dev {
@@ -281,9 +279,6 @@ struct kib_fmr_pool {
 	struct kib_hca_dev	*fpo_hdev;	/* device for this pool */
 	struct kib_fmr_poolset	*fpo_owner;	/* owner of this pool */
 	union {
-		struct {
-			struct ib_fmr_pool	*fpo_fmr_pool; /* IB FMR pool */
-		} fmr;
 		struct { /* For fast registration */
 			struct list_head	fpo_pool_list;
 			int			fpo_pool_size;
@@ -296,7 +291,6 @@ struct kib_fmr_pool {
 
 struct kib_fmr {
 	struct kib_fmr_pool		*fmr_pool;	/* pool of FMR */
-	struct ib_pool_fmr		*fmr_pfmr;	/* IB pool fmr */
 	struct kib_fast_reg_descriptor	*fmr_frd;
 	u32				 fmr_key;
 };
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 5cd367e5..c799453 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -575,23 +575,6 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 		return -EPROTONOSUPPORT;
 	}
 
-	/*
-	 * FMR does not support gaps but the tx has gaps then
-	 * we should make sure that the number of fragments we'll be sending
-	 * over fits within the number of fragments negotiated on the
-	 * connection, otherwise, we won't be able to RDMA the data.
-	 * We need to maintain the number of fragments negotiation on the
-	 * connection for backwards compatibility.
-	 */
-	if (tx->tx_gaps && (dev->ibd_dev_caps & IBLND_DEV_CAPS_FMR_ENABLED)) {
-		if (tx->tx_conn &&
-		    tx->tx_conn->ibc_max_frags <= rd->rd_nfrags) {
-			CERROR("TX number of frags (%d) is <= than connection number of frags (%d). Consider setting peer's map_on_demand to 256\n",
-			       tx->tx_nfrags, tx->tx_conn->ibc_max_frags);
-			return -EFBIG;
-		}
-	}
-
 	fps = net->ibn_fmr_ps[cpt];
 	rc = kiblnd_fmr_pool_map(fps, tx, rd, nob, 0, &tx->tx_fmr);
 	if (rc) {
@@ -606,14 +589,10 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 	 */
 	rd->rd_key = tx->tx_fmr.fmr_key;
 	/*
-	 * for FastReg or FMR with no gaps we can accumulate all
+	 * for FastReg with no gaps we can accumulate all
 	 * the fragments in one FastReg or FMR fragment.
 	 */
-	if (((dev->ibd_dev_caps & IBLND_DEV_CAPS_FMR_ENABLED) && !tx->tx_gaps) ||
-	    (dev->ibd_dev_caps & IBLND_DEV_CAPS_FASTREG_ENABLED)) {
-		/* FMR requires zero based address */
-		if (dev->ibd_dev_caps & IBLND_DEV_CAPS_FMR_ENABLED)
-			rd->rd_frags[0].rf_addr &= ~hdev->ibh_page_mask;
+	if (dev->ibd_dev_caps & IBLND_DEV_CAPS_FASTREG_ENABLED) {
 		rd->rd_frags[0].rf_nob = nob;
 		rd->rd_nfrags = 1;
 	} else {
@@ -633,7 +612,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 
 static void kiblnd_unmap_tx(struct kib_tx *tx)
 {
-	if (tx->tx_fmr.fmr_pfmr || tx->tx_fmr.fmr_frd)
+	if (tx->tx_fmr.fmr_frd)
 		kiblnd_fmr_pool_unmap(&tx->tx_fmr, tx->tx_status);
 
 	if (tx->tx_nfrags) {
-- 
1.8.3.1


* [lustre-devel] [PATCH 36/39] lustre: llite: return EOPNOTSUPP if fallocate is not supported
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (34 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 35/39] lnet: o2iblnd: remove FMR-pool support James Simmons
@ 2021-01-21 17:16 ` James Simmons
  2021-01-21 17:17 ` [lustre-devel] [PATCH 37/39] lnet: use an unbound cred in kiblnd_resolve_addr() James Simmons
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: "John L. Hammond" <jhammond@whamcloud.com>

In ll_fallocate(), if the server returns the NFSv3-specific error code
ENOTSUPP, replace it with EOPNOTSUPP to avoid confusing applications.
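
A hypothetical userspace sketch (not from this patch) of the pattern
being protected: applications test for EOPNOTSUPP, not the NFSv3
ENOTSUPP (524), before falling back to ftruncate():

	#define _GNU_SOURCE
	#include <errno.h>
	#include <fcntl.h>
	#include <unistd.h>

	/* extend fd to new_size, falling back when fallocate is unsupported */
	static int extend_file(int fd, off_t new_size)
	{
		if (fallocate(fd, 0, 0, new_size) == 0)
			return 0;

		if (errno == EOPNOTSUPP)
			return ftruncate(fd, new_size);

		return -1;
	}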

WC-bug-id: https://jira.whamcloud.com/browse/LU-14301
Lustre-commit: 71a9f5a466bfa4 ("LU-14301 llite: return EOPNOTSUPP if fallocate is not supported")
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41148
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index a3a8d1a..7c7ac01 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -4934,6 +4934,7 @@ int cl_falloc(struct inode *inode, int mode, loff_t offset, loff_t len)
 long ll_fallocate(struct file *filp, int mode, loff_t offset, loff_t len)
 {
 	struct inode *inode = filp->f_path.dentry->d_inode;
+	int rc;
 
 	/*
 	 * Encrypted inodes can't handle collapse range or zero range or insert
@@ -4955,7 +4956,17 @@ long ll_fallocate(struct file *filp, int mode, loff_t offset, loff_t len)
 
 	ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FALLOCATE, 1);
 
-	return cl_falloc(inode, mode, offset, len);
+	rc = cl_falloc(inode, mode, offset, len);
+	/*
+	 * ENOTSUPP (524) is an NFSv3 specific error code erroneously
+	 * used by Lustre in several places. Returning it here would
+	 * confuse applications that explicitly test for EOPNOTSUPP
+	 * (95) and fall back to ftruncate().
+	 */
+	if (rc == -ENOTSUPP)
+		rc = -EOPNOTSUPP;
+
+	return rc;
 }
 
 static int ll_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
-- 
1.8.3.1


* [lustre-devel] [PATCH 37/39] lnet: use an unbound cred in kiblnd_resolve_addr()
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (35 preceding siblings ...)
  2021-01-21 17:16 ` [lustre-devel] [PATCH 36/39] lustre: llite: return EOPNOTSUPP if fallocate is not supported James Simmons
@ 2021-01-21 17:17 ` James Simmons
  2021-01-21 17:17 ` [lustre-devel] [PATCH 38/39] lustre: lov: correctly set OST obj size James Simmons
  2021-01-21 17:17 ` [lustre-devel] [PATCH 39/39] lustre: cksum: add lprocfs checksum support in MDC/MDT James Simmons
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:17 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: "John L. Hammond" <jhammond@whamcloud.com>

In kiblnd_resolve_addr() call prepare_kernel_cred(NULL) rather than
prepare_creds() to get a cred with unbound capabilities.
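
For context, a minimal sketch of the cred-override flow around the
resolve; the surrounding override_creds()/revert_creds() idiom is
assumed, and only the prepare_kernel_cred(NULL) call is from this
patch:

  const struct cred *old_creds = NULL;
  struct cred *new_creds = NULL;

  if (!capable(CAP_NET_BIND_SERVICE)) {
  	/* a kernel cred not tied to any task carries full capabilities */
  	new_creds = prepare_kernel_cred(NULL);
  	if (!new_creds)
  		return -ENOMEM;
  	old_creds = override_creds(new_creds);
  }

  /* ... bind to a privileged port and resolve the address ... */

  if (new_creds) {
  	revert_creds(old_creds);
  	put_cred(new_creds);
  }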

Fixes: 5fc342b471a ("lnet: o2ib: raise bind cap before resolving address")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14296
Lustre-commit: 30b356a28b5094 ("LU-14296 lnet: use an unbound cred in kiblnd_resolve_addr()")
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41137
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index c799453..e29cb4b 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1207,8 +1207,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	unsigned short port;
 	int rc;
 
-	LASSERT(capable(CAP_NET_BIND_SERVICE));
-
 	/* allow the port to be reused */
 	rc = rdma_set_reuseaddr(cmid, 1);
 	if (rc) {
@@ -1234,7 +1232,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		}
 	}
 
-	CERROR("Failed to bind to a free privileged port\n");
+	CERROR("cannot bind to a free privileged port: rc = %d\n", rc);
+
 	return rc;
 }
 
@@ -1249,7 +1248,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	int rc;
 
 	if (!capable(CAP_NET_BIND_SERVICE)) {
-		new_creds = prepare_creds();
+		new_creds = prepare_kernel_cred(NULL);
 		if (!new_creds)
 			return -ENOMEM;
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [lustre-devel] [PATCH 38/39] lustre: lov: correctly set OST obj size
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (36 preceding siblings ...)
  2021-01-21 17:17 ` [lustre-devel] [PATCH 37/39] lnet: use an unbound cred in kiblnd_resolve_addr() James Simmons
@ 2021-01-21 17:17 ` James Simmons
  2021-01-21 17:17 ` [lustre-devel] [PATCH 39/39] lustre: cksum: add lprocfs checksum support in MDC/MDT James Simmons
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:17 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Bobi Jam <bobijam@whamcloud.com>

When extending a PFL file to a size that falls on a stripe boundary
within a component, the truncate does not set the size of the OST
object in the prior stripe.

This patch records the prior stripe in
lov_layout_raid0::lo_trunc_stripeno, adds that stripe to the truncate
IO, and enqueues the lock covering it.
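
A simplified, userspace-style sketch of the boundary test being added
(the real code uses lov_do_div64(), stripe_width() and the LSM entry
fields; trunc_needs_prev_stripe() is a hypothetical helper):

  #include <stdbool.h>
  #include <stdint.h>

  static bool trunc_needs_prev_stripe(uint64_t trunc_start,
  				      uint64_t comp_start,
  				      uint64_t stripe_size,
  				      unsigned int stripe_count,
  				      unsigned int stripe)
  {
  	uint64_t width = stripe_size * stripe_count;	/* one stripe row */

  	/* truncate starts before this component: prior stripe involved */
  	if (trunc_start < comp_start)
  		return true;

  	/* otherwise only if the truncate point sits exactly at the
  	 * beginning of the current stripe */
  	return (trunc_start % width) == (uint64_t)stripe * stripe_size;
  }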

WC-bug-id: https://jira.whamcloud.com/browse/LU-14128
Lustre-commit: 98015004516cad ("LU-14128 lov: correctly set OST obj size")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40581
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_cl_internal.h |  5 +++
 fs/lustre/lov/lov_internal.h    |  1 +
 fs/lustre/lov/lov_io.c          | 97 +++++++++++++++++++++++++++++++++++------
 fs/lustre/lov/lov_lock.c        | 31 +++++++++----
 fs/lustre/lov/lov_object.c      |  1 +
 fs/lustre/lov/lov_offset.c      |  2 +-
 6 files changed, 114 insertions(+), 23 deletions(-)

diff --git a/fs/lustre/lov/lov_cl_internal.h b/fs/lustre/lov/lov_cl_internal.h
index 7128224..f86176a 100644
--- a/fs/lustre/lov/lov_cl_internal.h
+++ b/fs/lustre/lov/lov_cl_internal.h
@@ -176,6 +176,11 @@ struct lov_comp_layout_entry_ops {
 struct lov_layout_raid0 {
 	unsigned int		lo_nr;
 	/**
+	 * record the stripe number just before the truncate position, used
+	 * for setting the OST object size on truncate. LU-14128.
+	 */
+	int			lo_trunc_stripeno;
+	/**
 	 * When this is true, lov_object::lo_attr contains
 	 * valid up to date attributes for a top-level
 	 * object. This field is reset to 0 when attributes of
diff --git a/fs/lustre/lov/lov_internal.h b/fs/lustre/lov/lov_internal.h
index 202e4b5..5d726fd 100644
--- a/fs/lustre/lov/lov_internal.h
+++ b/fs/lustre/lov/lov_internal.h
@@ -264,6 +264,7 @@ int lov_merge_lvb_kms(struct lov_stripe_md *lsm, int index,
 		      struct ost_lvb *lvb, u64 *kms_place);
 
 /* lov_offset.c */
+u64 stripe_width(struct lov_stripe_md *lsm, unsigned int index);
 u64 lov_stripe_size(struct lov_stripe_md *lsm, int index, u64 ost_size,
 		    int stripeno);
 int lov_stripe_offset(struct lov_stripe_md *lsm, int index, u64 lov_off,
diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c
index d4a0c9d..daceab0 100644
--- a/fs/lustre/lov/lov_io.c
+++ b/fs/lustre/lov/lov_io.c
@@ -752,6 +752,24 @@ static u64 lov_offset_mod(u64 val, int delta)
 	return val;
 }
 
+static int lov_io_add_sub(const struct lu_env *env, struct lov_io *lio,
+			  struct lov_io_sub *sub, u64 start, u64 end)
+{
+	int rc;
+
+	end = lov_offset_mod(end, 1);
+	lov_io_sub_inherit(sub, lio, start, end);
+	rc = cl_io_iter_init(sub->sub_env, &sub->sub_io);
+	if (rc != 0) {
+		cl_io_iter_fini(sub->sub_env, &sub->sub_io);
+		return rc;
+	}
+
+	list_add_tail(&sub->sub_linkage, &lio->lis_active);
+
+	return rc;
+}
+
 static int lov_io_iter_init(const struct lu_env *env,
 			    const struct cl_io_slice *ios)
 {
@@ -768,10 +786,13 @@ static int lov_io_iter_init(const struct lu_env *env,
 	lov_foreach_io_layout(index, lio, &ext) {
 		struct lov_layout_entry *le = lov_entry(lio->lis_object, index);
 		struct lov_layout_raid0 *r0 = &le->lle_raid0;
+		bool tested_trunc_stripe = false;
 		int stripe;
 		u64 start;
 		u64 end;
 
+		r0->lo_trunc_stripeno = -1;
+
 		CDEBUG(D_VFSTRACE, "component[%d] flags %#x\n",
 		       index, lsm->lsm_entries[index]->lsme_flags);
 		if (!lsm_entry_inited(lsm, index)) {
@@ -801,28 +822,76 @@ static int lov_io_iter_init(const struct lu_env *env,
 				continue;
 			}
 
-			end = lov_offset_mod(end, 1);
+			if (cl_io_is_trunc(ios->cis_io) &&
+			    !tested_trunc_stripe) {
+				int prev;
+				u64 tr_start;
+
+				prev = (stripe == 0) ? r0->lo_nr - 1 :
+						       stripe - 1;
+				/**
+				 * Only involve the previous stripe if the
+				 * truncate in this component lands at the
+				 * beginning of this stripe.
+				 */
+				tested_trunc_stripe = true;
+				if (ext.e_start <
+				    lsm->lsm_entries[index]->lsme_extent.e_start) {
+					/* need previous stripe involvement */
+					r0->lo_trunc_stripeno = prev;
+				} else {
+					tr_start = ext.e_start;
+					tr_start = lov_do_div64(tr_start,
+								stripe_width(lsm, index));
+					/* tr_start %= stripe_swidth */
+					if (tr_start == stripe * lsm->lsm_entries[index]->lsme_stripe_size)
+						r0->lo_trunc_stripeno = prev;
+				}
+			}
+
+			/* if the last stripe is the trunc stripeno */
+			if (r0->lo_trunc_stripeno == stripe)
+				r0->lo_trunc_stripeno = -1;
+
 			sub = lov_sub_get(env, lio,
 					  lov_comp_index(index, stripe));
-			if (IS_ERR(sub)) {
-				rc = PTR_ERR(sub);
-				break;
-			}
+			if (IS_ERR(sub))
+				return PTR_ERR(sub);
 
-			lov_io_sub_inherit(sub, lio, start, end);
-			rc = cl_io_iter_init(sub->sub_env, &sub->sub_io);
-			if (rc) {
-				cl_io_iter_fini(sub->sub_env, &sub->sub_io);
+			rc = lov_io_add_sub(env, lio, sub, start, end);
+			if (rc)
 				break;
+		}
+		if (rc != 0)
+			break;
+
+		if (r0->lo_trunc_stripeno != -1) {
+			stripe = r0->lo_trunc_stripeno;
+			if (unlikely(!r0->lo_sub[stripe])) {
+				r0->lo_trunc_stripeno = -1;
+				continue;
 			}
+			sub = lov_sub_get(env, lio,
+					  lov_comp_index(index, stripe));
+			if (IS_ERR(sub))
+				return PTR_ERR(sub);
 
-			CDEBUG(D_VFSTRACE, "shrink: %d [%llu, %llu)\n",
-			       stripe, start, end);
+			/**
+			 * the prev sub could already be used by another
+			 * truncate, so skip it. LU-14128 happens when an
+			 * expanding truncate plus a read gets a wrong kms.
+			 */
+			if (!list_empty(&sub->sub_linkage)) {
+				r0->lo_trunc_stripeno = -1;
+				continue;
+			}
 
-			list_add_tail(&sub->sub_linkage, &lio->lis_active);
+			(void)lov_stripe_intersects(lsm, index, stripe, &ext,
+						    &start, &end);
+			rc = lov_io_add_sub(env, lio, sub, start, end);
+			if (rc != 0)
+				break;
 		}
-		if (rc)
-			break;
 	}
 	return rc;
 }
diff --git a/fs/lustre/lov/lov_lock.c b/fs/lustre/lov/lov_lock.c
index 7dae13f..c79f728 100644
--- a/fs/lustre/lov/lov_lock.c
+++ b/fs/lustre/lov/lov_lock.c
@@ -111,6 +111,7 @@ static int lov_sublock_init(const struct lu_env *env,
  * through already created sub-locks (possibly shared with other top-locks).
  */
 static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
+					  const struct cl_io *io,
 					  const struct cl_object *obj,
 					  struct cl_lock *lock)
 {
@@ -135,10 +136,14 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 		struct lov_layout_raid0 *r0 = lov_r0(lov, index);
 
 		for (i = 0; i < r0->lo_nr; i++) {
-			if (likely(r0->lo_sub[i]) && /* spare layout */
-			    lov_stripe_intersects(lov->lo_lsm, index, i,
-						  &ext, &start, &end))
-				nr++;
+			if (likely(r0->lo_sub[i])) { /* spare layout */
+				if (lov_stripe_intersects(lov->lo_lsm, index, i,
+							  &ext, &start, &end))
+					nr++;
+				else if (cl_io_is_trunc(io) &&
+					 r0->lo_trunc_stripeno == i)
+					nr++;
+			}
 		}
 	}
 	/**
@@ -160,12 +165,22 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 		for (i = 0; i < r0->lo_nr; ++i) {
 			struct lov_lock_sub *lls = &lovlck->lls_sub[nr];
 			struct cl_lock_descr *descr = &lls->sub_lock.cll_descr;
+			bool intersect = false;
 
-			if (unlikely(!r0->lo_sub[i]) ||
-			    !lov_stripe_intersects(lov->lo_lsm, index, i,
-						   &ext, &start, &end))
+			if (unlikely(!r0->lo_sub[i]))
 				continue;
 
+			intersect = lov_stripe_intersects(lov->lo_lsm, index, i,
+							  &ext, &start, &end);
+			if (intersect)
+				goto init_sublock;
+
+			if (cl_io_is_trunc(io) && i == r0->lo_trunc_stripeno)
+				goto init_sublock;
+
+			continue;
+
+init_sublock:
 			LASSERT(!descr->cld_obj);
 			descr->cld_obj = lovsub2cl(r0->lo_sub[i]);
 			descr->cld_start = cl_index(descr->cld_obj, start);
@@ -308,7 +323,7 @@ int lov_lock_init_composite(const struct lu_env *env, struct cl_object *obj,
 	struct lov_lock *lck;
 	int result = 0;
 
-	lck = lov_lock_sub_init(env, obj, lock);
+	lck = lov_lock_sub_init(env, io, obj, lock);
 	if (!IS_ERR(lck))
 		cl_lock_slice_add(lock, &lck->lls_cl, obj, &lov_lock_ops);
 	else
diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c
index 3fcd342..d9729c8 100644
--- a/fs/lustre/lov/lov_object.c
+++ b/fs/lustre/lov/lov_object.c
@@ -215,6 +215,7 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 
 	spin_lock_init(&r0->lo_sub_lock);
 	r0->lo_nr = lse->lsme_stripe_count;
+	r0->lo_trunc_stripeno = -1;
 
 	flags = memalloc_nofs_save();
 	r0->lo_sub = kvmalloc_array(r0->lo_nr, sizeof(r0->lo_sub[0]),
diff --git a/fs/lustre/lov/lov_offset.c b/fs/lustre/lov/lov_offset.c
index ca763af..2493331 100644
--- a/fs/lustre/lov/lov_offset.c
+++ b/fs/lustre/lov/lov_offset.c
@@ -37,7 +37,7 @@
 
 #include "lov_internal.h"
 
-static u64 stripe_width(struct lov_stripe_md *lsm, unsigned int index)
+u64 stripe_width(struct lov_stripe_md *lsm, unsigned int index)
 {
 	struct lov_stripe_md_entry *entry = lsm->lsm_entries[index];
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [lustre-devel] [PATCH 39/39] lustre: cksum: add lprocfs checksum support in MDC/MDT
  2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
                   ` (37 preceding siblings ...)
  2021-01-21 17:17 ` [lustre-devel] [PATCH 38/39] lustre: lov: correctly set OST obj size James Simmons
@ 2021-01-21 17:17 ` James Simmons
  38 siblings, 0 replies; 40+ messages in thread
From: James Simmons @ 2021-01-21 17:17 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Mikhail Pershin, Lustre Development List

From: Mikhail Pershin <mpershin@whamcloud.com>

Add the missing support for checksum parameters in MDC and MDT.
Handle T10-PI parameters in MDT similarly to OFD, move all the
functionality to target code, and unify its usage in both targets.
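
The active algorithm is reported bracketed among the supported ones,
e.g. "crc32 [adler] crc32c". A hypothetical userspace helper (not part
of this patch) could extract it like this:

  #include <stdio.h>
  #include <string.h>

  static int active_cksum(const char *line, char *out, size_t outlen)
  {
  	const char *l = strchr(line, '[');
  	const char *r = l ? strchr(l, ']') : NULL;
  	size_t n;

  	if (!l || !r)
  		return -1;

  	n = (size_t)(r - l) - 1;
  	if (n >= outlen)
  		return -1;

  	memcpy(out, l + 1, n);
  	out[n] = '\0';
  	return 0;
  }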

WC-bug-id: https://jira.whamcloud.com/browse/LU-14194
Lustre-commit: 18d61a910bcc76 ("LU-14194 cksum: add lprocfs checksum support in MDC/MDT")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40971
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/lproc_mdc.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/lustre/osc/lproc_osc.c |  10 ++--
 2 files changed, 131 insertions(+), 5 deletions(-)

diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c
index ce03999..3a2c37a2 100644
--- a/fs/lustre/mdc/lproc_mdc.c
+++ b/fs/lustre/mdc/lproc_mdc.c
@@ -34,6 +34,7 @@
 
 #include <linux/vfs.h>
 #include <obd_class.h>
+#include <obd_cksum.h>
 #include <lprocfs_status.h>
 #include <lustre_osc.h>
 #include <cl_object.h>
@@ -87,6 +88,127 @@ static ssize_t mdc_max_dirty_mb_seq_write(struct file *file,
 }
 LDEBUGFS_SEQ_FOPS(mdc_max_dirty_mb);
 
+DECLARE_CKSUM_NAME;
+
+static int mdc_checksum_type_seq_show(struct seq_file *m, void *v)
+{
+	struct obd_device *obd = m->private;
+	int i;
+
+	if (!obd)
+		return 0;
+
+	for (i = 0; i < ARRAY_SIZE(cksum_name); i++) {
+		if ((BIT(i) & obd->u.cli.cl_supp_cksum_types) == 0)
+			continue;
+		if (obd->u.cli.cl_cksum_type == BIT(i))
+			seq_printf(m, "[%s] ", cksum_name[i]);
+		else
+			seq_printf(m, "%s ", cksum_name[i]);
+	}
+	seq_puts(m, "\n");
+
+	return 0;
+}
+
+static ssize_t mdc_checksum_type_seq_write(struct file *file,
+					   const char __user *buffer,
+					   size_t count, loff_t *off)
+{
+	struct seq_file *m = file->private_data;
+	struct obd_device *obd = m->private;
+	char kernbuf[10];
+	int rc = -EINVAL;
+	int i;
+
+	if (!obd)
+		return 0;
+
+	if (count > sizeof(kernbuf) - 1)
+		return -EINVAL;
+	if (copy_from_user(kernbuf, buffer, count))
+		return -EFAULT;
+
+	if (count > 0 && kernbuf[count - 1] == '\n')
+		kernbuf[count - 1] = '\0';
+	else
+		kernbuf[count] = '\0';
+
+	for (i = 0; i < ARRAY_SIZE(cksum_name); i++) {
+		if (strcmp(kernbuf, cksum_name[i]) == 0) {
+			obd->u.cli.cl_preferred_cksum_type = BIT(i);
+			if (obd->u.cli.cl_supp_cksum_types & BIT(i)) {
+				obd->u.cli.cl_cksum_type = BIT(i);
+				rc = count;
+			} else {
+				rc = -ENOTSUPP;
+			}
+			break;
+		}
+	}
+
+	return rc;
+}
+LDEBUGFS_SEQ_FOPS(mdc_checksum_type);
+
+static ssize_t checksums_show(struct kobject *kobj,
+			      struct attribute *attr, char *buf)
+{
+	struct obd_device *obd = container_of(kobj, struct obd_device,
+					      obd_kset.kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%d\n", !!obd->u.cli.cl_checksum);
+}
+
+static ssize_t checksums_store(struct kobject *kobj,
+			       struct attribute *attr,
+			       const char *buffer,
+			       size_t count)
+{
+	struct obd_device *obd = container_of(kobj, struct obd_device,
+					      obd_kset.kobj);
+	bool val;
+	int rc;
+
+	rc = kstrtobool(buffer, &val);
+	if (rc)
+		return rc;
+
+	obd->u.cli.cl_checksum = val;
+
+	return count;
+}
+LUSTRE_RW_ATTR(checksums);
+
+static ssize_t checksum_dump_show(struct kobject *kobj,
+				  struct attribute *attr, char *buf)
+{
+	struct obd_device *obd = container_of(kobj, struct obd_device,
+					      obd_kset.kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "%d\n", !!obd->u.cli.cl_checksum_dump);
+}
+
+static ssize_t checksum_dump_store(struct kobject *kobj,
+				   struct attribute *attr,
+				   const char *buffer,
+				   size_t count)
+{
+	struct obd_device *obd = container_of(kobj, struct obd_device,
+					      obd_kset.kobj);
+	bool val;
+	int rc;
+
+	rc = kstrtobool(buffer, &val);
+	if (rc)
+		return rc;
+
+	obd->u.cli.cl_checksum_dump = val;
+
+	return count;
+}
+LUSTRE_RW_ATTR(checksum_dump);
+
 static int mdc_cached_mb_seq_show(struct seq_file *m, void *v)
 {
 	struct obd_device *obd = m->private;
@@ -503,6 +625,8 @@ static ssize_t mdc_dom_min_repsize_seq_write(struct file *file,
 	  .fops	=	&mdc_max_dirty_mb_fops		},
 	{ .name	=	"mdc_cached_mb",
 	  .fops	=	&mdc_cached_mb_fops		},
+	{ .name =	"checksum_type",
+	  .fops	=	&mdc_checksum_type_fops		},
 	{ .name	=	"timeouts",
 	  .fops	=	&mdc_timeouts_fops		},
 	{ .name	=	"contention_seconds",
@@ -526,6 +650,8 @@ static ssize_t mdc_dom_min_repsize_seq_write(struct file *file,
 
 static struct attribute *mdc_attrs[] = {
 	&lustre_attr_active.attr,
+	&lustre_attr_checksums.attr,
+	&lustre_attr_checksum_dump.attr,
 	&lustre_attr_max_rpcs_in_flight.attr,
 	&lustre_attr_max_mod_rpcs_in_flight.attr,
 	&lustre_attr_max_pages_per_rpc.attr,
diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c
index 89b55c3..e64176e 100644
--- a/fs/lustre/osc/lproc_osc.c
+++ b/fs/lustre/osc/lproc_osc.c
@@ -358,7 +358,7 @@ static ssize_t checksums_show(struct kobject *kobj,
 	struct obd_device *obd = container_of(kobj, struct obd_device,
 					      obd_kset.kobj);
 
-	return sprintf(buf, "%d\n", obd->u.cli.cl_checksum ? 1 : 0);
+	return scnprintf(buf, PAGE_SIZE, "%d\n", !!obd->u.cli.cl_checksum);
 }
 
 static ssize_t checksums_store(struct kobject *kobj,
@@ -381,10 +381,11 @@ static ssize_t checksums_store(struct kobject *kobj,
 }
 LUSTRE_RW_ATTR(checksums);
 
+DECLARE_CKSUM_NAME;
+
 static int osc_checksum_type_seq_show(struct seq_file *m, void *v)
 {
 	struct obd_device *obd = m->private;
-	DECLARE_CKSUM_NAME;
 	int i;
 
 	if (!obd)
@@ -408,10 +409,9 @@ static ssize_t osc_checksum_type_seq_write(struct file *file,
 {
 	struct seq_file *m = file->private_data;
 	struct obd_device *obd = m->private;
-	DECLARE_CKSUM_NAME;
 	char kernbuf[10];
-	int i;
 	int rc = -EINVAL;
+	int i;
 
 	if (!obd)
 		return 0;
@@ -479,7 +479,7 @@ static ssize_t checksum_dump_show(struct kobject *kobj,
 	struct obd_device *obd = container_of(kobj, struct obd_device,
 					      obd_kset.kobj);
 
-	return sprintf(buf, "%d\n", obd->u.cli.cl_checksum_dump ? 1 : 0);
+	return scnprintf(buf, PAGE_SIZE, "%d\n", !!obd->u.cli.cl_checksum_dump);
 }
 
 static ssize_t checksum_dump_store(struct kobject *kobj,
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-01-21 17:19 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-21 17:16 [lustre-devel] [PATCH 00/39] lustre: update to latest OpenSFS version as of Jan 21 2021 James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 01/39] lustre: ldlm: page discard speedup James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 02/39] lustre: ptlrpc: fixes for RCU-related stalls James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 03/39] lustre: ldlm: Do not wait for lock replay sending if import dsconnected James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 04/39] lustre: ldlm: Do not hang if recovery restarted during lock replay James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 05/39] lnet: Correct handling of NETWORK_TIMEOUT status James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 06/39] lnet: Introduce constant for net ID of LNET_NID_ANY James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 07/39] lustre: ldlm: Don't re-enqueue glimpse lock on read James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 08/39] lustre: osc: prevent overflow of o_dropped James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 09/39] lustre: llite: fix client evicition with DIO James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 10/39] lustre: Use vfree_atomic instead of vfree James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 11/39] lnet: lnd: Use NETWORK_TIMEOUT for txs on ibp_tx_queue James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 12/39] lnet: lnd: Use NETWORK_TIMEOUT for some conn failures James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 13/39] lustre: llite: allow DIO with unaligned IO count James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 14/39] lustre: osc: skip 0 row for rpc_stats James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 15/39] lustre: quota: df should return projid-specific values James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 16/39] lnet: discard the callback James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 17/39] lustre: llite: try to improve mmap performance James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 18/39] lnet: Introduce lnet_recovery_limit parameter James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 19/39] lustre: mdc: avoid easize set to 0 James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 20/39] lustre: lmv: optimize dir shard revalidate James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 21/39] lustre: ldlm: osc_object_ast_clear() is called for mdc object on eviction James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 22/39] lustre: uapi: fix compatibility for LL_IOC_MDC_GETINFO James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 23/39] lustre: llite: don't check layout info for page discard James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 24/39] lustre: update version to 2.13.57 James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 25/39] lnet: o2iblnd: retry qp creation with reduced queue depth James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 26/39] lustre: lov: fix SEEK_HOLE calcs at component end James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 27/39] lustre: lov: instantiate components layout for fallocate James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 28/39] lustre: dom: non-blocking enqueue for DOM locks James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 29/39] lustre: llite: fiemap set flags for encrypted files James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 30/39] lustre: ldlm: don't compute sumsq for pool stats James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 31/39] lustre: lov: FIEMAP support for PFL and FLR file James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 32/39] lustre: mdc: process changelogs_catalog from the oldest rec James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 33/39] lustre: ldlm: Use req_mode while lock cleanup James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 34/39] lnet: socklnd: announce deprecation of 'use_tcp_bonding' James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 35/39] lnet: o2iblnd: remove FMR-pool support James Simmons
2021-01-21 17:16 ` [lustre-devel] [PATCH 36/39] lustre: llite: return EOPNOTSUPP if fallocate is not supported James Simmons
2021-01-21 17:17 ` [lustre-devel] [PATCH 37/39] lnet: use an unbound cred in kiblnd_resolve_addr() James Simmons
2021-01-21 17:17 ` [lustre-devel] [PATCH 38/39] lustre: lov: correctly set OST obj size James Simmons
2021-01-21 17:17 ` [lustre-devel] [PATCH 39/39] lustre: cksum: add lprocfs checksum support in MDC/MDT James Simmons

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).