* [lustre-devel] [PATCH 00/24] lustre: update to OpenSFS Jan 13, 2022
From: James Simmons @ 2022-01-14  1:37 UTC
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Update the native Linux Lustre client with a backport of the OpenSFS
work as of Jan 13, 2022.

Alexander Zarochentsev (2):
  lustre: ptlrpc: make rq_replied flag always correct
  lustre: mgc: do not ignore target registration failure

Alexey Lyashkov (2):
  lustre: osc: don't have extra gpu call
  lnet: o2iblnd: cleanup

Andreas Dilger (1):
  lustre: mdc: GET(X)ATTR to READPAGE portal

Chris Horn (1):
  lnet: Skip router discovery on send path

Etienne AUJAMES (1):
  lnet: libcfs: set x->ls_len to 0 when x->ls_str is NULL

James Simmons (1):
  lustre: llite: make foreign symlinks aware of mount namespaces

Lai Siyao (4):
  lustre: lmv: set default LMV for "lfs mkdir -c 1"
  lustre: lmv: improve MDT QOS space balance
  lustre: llite: access striped directory with missing stripe
  lustre: llite: revalidate dentry if LOOKUP lock fetched

Lei Feng (1):
  lustre: uapi: set default max-inherit to 3

Li Dongyang (1):
  lustre: llite: add trusted.projid virtual xattr

Patrick Farrell (8):
  lustre: lov: Cache stripe offset calculation
  lnet: libcfs: Remove D_TTY
  lustre: llite: Add D_IOTRACE
  lustre: llite: Add start_idx debug
  lustre: llite: Switch pcc to lookup_one_len
  lustre: llite: Simplify cda_no_aio_complete use
  lustre: osc: Always set aio in anchor
  lustre: llite: Implement lower/upper aio

Serguei Smirnov (2):
  lnet: o2iblnd: treat cmid->device == NULL as an error
  lnet: socklnd: decrement connection counters on close

 fs/lustre/include/cl_object.h           |   7 +-
 fs/lustre/include/obd.h                 |   9 ++-
 fs/lustre/include/obd_class.h           |   7 +-
 fs/lustre/llite/dcache.c                |  19 ++++-
 fs/lustre/llite/dir.c                   |  45 ++++++++---
 fs/lustre/llite/file.c                  |  16 +++-
 fs/lustre/llite/llite_foreign_symlink.c |   8 +-
 fs/lustre/llite/llite_internal.h        |  14 +++-
 fs/lustre/llite/llite_lib.c             |   7 +-
 fs/lustre/llite/llite_mmap.c            |  13 +++-
 fs/lustre/llite/llite_nfs.c             |   2 +-
 fs/lustre/llite/namei.c                 |   6 +-
 fs/lustre/llite/pcc.c                   | 119 +++++++++++++++++++---------
 fs/lustre/llite/rw.c                    |  31 +++++---
 fs/lustre/llite/rw26.c                  |  34 ++++++--
 fs/lustre/llite/statahead.c             |   6 +-
 fs/lustre/llite/xattr.c                 |  15 ++++
 fs/lustre/llite/xattr_cache.c           |  15 ++++
 fs/lustre/lmv/lmv_intent.c              |   4 +-
 fs/lustre/lmv/lmv_obd.c                 | 133 ++++++++++++++++++--------------
 fs/lustre/lov/lov_cl_internal.h         |   9 ++-
 fs/lustre/lov/lov_io.c                  |   6 ++
 fs/lustre/lov/lov_page.c                |  57 ++++++++++----
 fs/lustre/mdc/mdc_request.c             |  14 +++-
 fs/lustre/mgc/mgc_request.c             |   5 +-
 fs/lustre/obdclass/cl_io.c              |  56 ++++++++++----
 fs/lustre/osc/osc_request.c             |   2 +-
 fs/lustre/ptlrpc/ptlrpc_internal.h      |   1 +
 include/uapi/linux/lnet/libcfs_debug.h  |   4 +-
 include/uapi/linux/lustre/lustre_idl.h  |   1 +
 include/uapi/linux/lustre/lustre_user.h |  16 ++--
 net/lnet/klnds/o2iblnd/o2iblnd.c        |  15 ++--
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c     |  43 ++++-------
 net/lnet/klnds/socklnd/socklnd.c        |  69 ++++++++++++++---
 net/lnet/libcfs/libcfs_string.c         |   1 +
 net/lnet/libcfs/tracefile.c             |  51 +-----------
 net/lnet/lnet/lib-move.c                |  23 ++++--
 37 files changed, 582 insertions(+), 301 deletions(-)

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 01/24] lustre: osc: don't have extra gpu call
From: James Simmons @ 2022-01-14  1:37 UTC
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexey Lyashkov, Lustre Development List

From: Alexey Lyashkov <alexey.lyashkov@hpe.com>

The osc does not need to call into the GPU code to check whether a
page is a GPU page; this is already recorded in oap_brw_flags
(OBD_BRW_RDMA_ONLY).
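
As a minimal sketch of the idea (the demo_* names are hypothetical
stand-ins, not Lustre types), the decision becomes a plain bit test on
flags recorded when the page was queued, with no call into the GPU
driver per request:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; plays the role of OBD_BRW_RDMA_ONLY. */
#define DEMO_BRW_RDMA_ONLY 0x1u

/* Hypothetical stand-in for the per-page async descriptor (oap). */
struct demo_oap {
	unsigned int brw_flags;
};

/* Hypothetical subset of the bulk RPC settings being prepared. */
struct demo_brw_opts {
	bool enable_checksum;
	int short_io_size;
};

/* Decide checksum/short-io settings from the cached page flags only. */
static void demo_prep_brw(const struct demo_oap *first_page,
			  struct demo_brw_opts *opts)
{
	assert(first_page && opts);
	if (first_page->brw_flags & DEMO_BRW_RDMA_ONLY) {
		/* RDMA-only (GPU) pages: no checksum, no short i/o */
		opts->enable_checksum = false;
		opts->short_io_size = 0;
	}
}
```

Here struct demo_oap plays the role of the page's oap returned by
brw_page2oap() in the hunk below.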

WC-bug-id: https://jira.whamcloud.com/browse/LU-15189
Lustre-commit: a75f1a90611038ea0 ("LU-15189 osc: don't have extra nvidia call")
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/45481
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_request.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 59dc625..14863dc 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1584,7 +1584,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 		}
 	}
 
-	if (lnet_is_rdma_only_page(pga[0]->pg)) {
+	if (brw_page2oap(pga[0])->oap_brw_flags & OBD_BRW_RDMA_ONLY) {
 		enable_checksum = false;
 		short_io_size = 0;
 	}
-- 
1.8.3.1

* [lustre-devel] [PATCH 02/24] lustre: llite: add trusted.projid virtual xattr
From: James Simmons @ 2022-01-14  1:37 UTC
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Li Dongyang, Lustre Development List

From: Li Dongyang <dongyangli@ddn.com>

Add a trusted.projid virtual xattr in ldiskfs to export the
current project id, intended for ldiskfs-level MDT backup.

When the project id is EXT4_DEF_PROJID/0,
the virtual xattr is hidden from listxattr(2).

It is also hidden on the Lustre client when the parent has the
project inherit flag and the same project ID. This stops mv from
setting the virtual xattr on the destination to the project id of
the source, which could differ from the destination's.
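
The client-side hiding rule can be sketched as a small predicate. This
is an illustration only: struct demo_inode and its fields are
hypothetical stand-ins for lli_projid and the LLIF_PROJECT_INHERIT
bit, and the EXT4_DEF_PROJID case handled on the ldiskfs side is not
modelled:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-inode state (models lli_projid / lli_flags). */
struct demo_inode {
	unsigned int projid;
	bool project_inherit;	/* models LLIF_PROJECT_INHERIT */
};

/* Return true if listxattr should hide @name for @inode under @dir. */
static bool demo_hide_projid(const char *name,
			     const struct demo_inode *inode,
			     const struct demo_inode *dir)
{
	assert(name && inode && dir);
	if (strcmp(name, "trusted.projid") != 0)
		return false;
	/* Hide only when the parent would hand out the same id anyway. */
	return dir->project_inherit && inode->projid == dir->projid;
}
```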

getxattr(2) on trusted.projid will report the current project id,
setxattr(2) will change the current project id, and
removexattr(2) will set the project id back to EXT4_DEF_PROJID/0.

Both get|setxattr(2) will work even when the virtual xattr is
hidden.
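
Reading the value follows the usual getxattr(2) size convention: a
size of 0 returns the length needed, and a buffer that is too small
yields -ERANGE. A minimal sketch, assuming a hypothetical helper name
(demo_projid_xattr_get) and mirroring the logic of the xattr_cache.c
hunk below:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Report @projid as the value of a virtual xattr.
 * size == 0 only asks for the required length; a too-small buffer
 * yields -ERANGE. As with getxattr(2), no trailing NUL is copied.
 */
static int demo_projid_xattr_get(unsigned int projid, char *buffer,
				 size_t size)
{
	char projid_str[11];	/* up to 10 decimal digits plus NUL */
	int rc;

	rc = snprintf(projid_str, sizeof(projid_str), "%u", projid);
	assert(rc > 0 && (size_t)rc < sizeof(projid_str));
	if (size != 0) {
		if ((size_t)rc <= size)
			memcpy(buffer, projid_str, (size_t)rc);
		else
			rc = -ERANGE;
	}
	return rc;
}
```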

Invalidate the client xattr cache for the inode when changing its
project id, so that the next getxattr(2) of the virtual xattr
returns the new value.

Add test cases to verify the virtual projid xattr, and that an
MDT backup/restore using tar now preserves the project id.

Change mds_backup_restore in the test framework to use
tar with the --xattrs --xattrs-include='trusted.*' options.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12056
Lustre-commit: 665383d3a1f4d1dc7 ("LU-12056 ldiskfs: add trusted.projid virtual xattr")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/45679
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/xattr.c                | 15 +++++++++++++++
 fs/lustre/llite/xattr_cache.c          | 15 +++++++++++++++
 include/uapi/linux/lustre/lustre_idl.h |  1 +
 3 files changed, 31 insertions(+)

diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c
index 6aea651..ce9585a 100644
--- a/fs/lustre/llite/xattr.c
+++ b/fs/lustre/llite/xattr.c
@@ -613,6 +613,7 @@ static int ll_xattr_get(const struct xattr_handler *handler,
 
 ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size)
 {
+	struct inode *dir = d_inode(dentry->d_parent);
 	struct inode *inode = d_inode(dentry);
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	ktime_t kstart = ktime_get();
@@ -656,6 +657,20 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size)
 				hide_xattr = true;
 		}
 
+		/* Hide virtual project id xattr from the list when
+		 * parent has the inherit flag and the same project id,
+		 * so project id won't be messed up by copying the xattrs
+		 * when mv to a tree with different project id.
+		 */
+		if (get_xattr_type(xattr_name)->flags == XATTR_TRUSTED_T &&
+		    strcmp(xattr_name, XATTR_NAME_PROJID) == 0) {
+			if (ll_i2info(inode)->lli_projid ==
+					ll_i2info(dir)->lli_projid &&
+			    test_bit(LLIF_PROJECT_INHERIT,
+				     &ll_i2info(dir)->lli_flags))
+				hide_xattr = true;
+		}
+
 		len = strnlen(xattr_name, rem - 1) + 1;
 		rem -= len;
 		if (!xattr_type_filter(sbi, hide_xattr ? NULL :
diff --git a/fs/lustre/llite/xattr_cache.c b/fs/lustre/llite/xattr_cache.c
index 7c1f5b7..723cc39 100644
--- a/fs/lustre/llite/xattr_cache.c
+++ b/fs/lustre/llite/xattr_cache.c
@@ -563,6 +563,21 @@ int ll_xattr_cache_get(struct inode *inode, const char *name, char *buffer,
 				else
 					rc = -ERANGE;
 			}
+		/* Return the project id when the virtual project id xattr
+		 * is explicitly asked.
+		 */
+		} else if (strcmp(name, XATTR_NAME_PROJID) == 0) {
+			/* 10 chars to hold u32 in decimal, plus ending \0 */
+			char projid[11];
+
+			rc = snprintf(projid, sizeof(projid),
+				      "%u", lli->lli_projid);
+			if (size != 0) {
+				if (rc <= size)
+					memcpy(buffer, projid, rc);
+				else
+					rc = -ERANGE;
+			}
 		}
 	} else if (valid & OBD_MD_FLXATTRLS) {
 		rc = ll_xattr_cache_list(&lli->lli_xattrs,
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index 35d3ed2..78e20a7 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -1078,6 +1078,7 @@ struct lov_mds_md_v1 {		/* LOV EA mds/wire data (little-endian) */
 #define XATTR_NAME_SOM		"trusted.som"
 #define XATTR_NAME_HSM		"trusted.hsm"
 #define XATTR_NAME_LFSCK_NAMESPACE "trusted.lfsck_namespace"
+#define XATTR_NAME_PROJID	"trusted.projid"
 
 #define LL_XATTR_NAME_ENCRYPTION_CONTEXT XATTR_SECURITY_PREFIX"c"
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 03/24] lnet: o2iblnd: cleanup
From: James Simmons @ 2022-01-14  1:37 UTC
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexey Lyashkov, Lustre Development List

From: Alexey Lyashkov <alexey.lyashkov@hpe.com>

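Simplify kiblnd_send() by avoiding code duplication: pick up an
idle tx first, before switching on the message type. Export
lnet_msgtyp2str() so the common error message can name the type.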
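
The resulting shape, one allocation and one failure path ahead of the
per-type switch, is sketched below with a hypothetical fixed-size pool.
demo_get_idle_tx() stands in for kiblnd_get_idle_tx(); the message
types and error codes are illustrative only:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum demo_msg_type { DEMO_GET, DEMO_PUT, DEMO_REPLY };

struct demo_tx {
	int in_use;
	enum demo_msg_type type;
};

#define DEMO_POOL_SIZE 2
static struct demo_tx demo_pool[DEMO_POOL_SIZE];

/* Hypothetical idle-descriptor allocator. */
static struct demo_tx *demo_get_idle_tx(void)
{
	for (size_t i = 0; i < DEMO_POOL_SIZE; i++) {
		if (!demo_pool[i].in_use) {
			demo_pool[i].in_use = 1;
			return &demo_pool[i];
		}
	}
	return NULL;
}

/* Allocate once, fail once, then do only type-specific setup. */
static int demo_send(enum demo_msg_type type)
{
	struct demo_tx *tx = demo_get_idle_tx();

	if (!tx)
		return -ENOMEM;
	assert(tx->in_use);

	switch (type) {
	case DEMO_GET:
	case DEMO_PUT:
	case DEMO_REPLY:
		/* type-specific RDMA/immediate setup would go here */
		tx->type = type;
		return 0;
	default:
		/* the real code LBUG()s; this sketch releases and fails */
		tx->in_use = 0;
		return -EINVAL;
	}
}
```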

HPE-bug-id: LUS-1796
WC-bug-id: https://jira.whamcloud.com/browse/LU-14008
Lustre-commit: 3916b9d7226ebb21c ("LU-14008 o2iblnd: cleanup")
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/40260
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 43 ++++++++++++++-----------------------
 net/lnet/lnet/lib-move.c            |  1 +
 2 files changed, 17 insertions(+), 27 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index db13f41..7560fe1 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1543,6 +1543,15 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 	iov_iter_advance(&from, payload_offset);
 
+	tx = kiblnd_get_idle_tx(ni, target.nid);
+	if (!tx) {
+		CERROR("Can't allocate %s txd for %s\n",
+		       lnet_msgtyp2str(type),
+		       libcfs_nid2str(target.nid));
+		return -ENOMEM;
+	}
+	ibmsg = tx->tx_msg;
+
 	switch (type) {
 	default:
 		LBUG();
@@ -1561,14 +1570,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		if (nob <= IBLND_MSG_SIZE && !lntmsg->msg_rdma_force)
 			break;		/* send IMMEDIATE */
 
-		tx = kiblnd_get_idle_tx(ni, target.nid);
-		if (!tx) {
-			CERROR("Can't allocate txd for GET to %s\n",
-			       libcfs_nid2str(target.nid));
-			return -ENOMEM;
-		}
-
-		ibmsg = tx->tx_msg;
 		rd = &ibmsg->ibm_u.get.ibgm_rd;
 		rc = kiblnd_setup_rd_kiov(ni, tx, rd,
 					  payload_niov, payload_kiov,
@@ -1595,7 +1596,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			return -EIO;
 		}
 
-		tx->tx_lntmsg[0] = lntmsg;	/* finalise lntmsg[0,1] on completion */
+		/* finalise lntmsg[0,1] on completion */
+		tx->tx_lntmsg[0] = lntmsg;
 		tx->tx_waiting = 1;		/* waiting for GET_DONE */
 		kiblnd_launch_tx(ni, tx, target.nid);
 		return 0;
@@ -1607,14 +1609,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		if (nob <= IBLND_MSG_SIZE && !lntmsg->msg_rdma_force)
 			break;			/* send IMMEDIATE */
 
-		tx = kiblnd_get_idle_tx(ni, target.nid);
-		if (!tx) {
-			CERROR("Can't allocate %s txd for %s\n",
-			       type == LNET_MSG_PUT ? "PUT" : "REPLY",
-			       libcfs_nid2str(target.nid));
-			return -ENOMEM;
-		}
-
 		rc = kiblnd_setup_rd_kiov(ni, tx, tx->tx_rd,
 					  payload_niov, payload_kiov,
 					  payload_offset, payload_nob);
@@ -1625,12 +1619,12 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			return -EIO;
 		}
 
-		ibmsg = tx->tx_msg;
 		ibmsg->ibm_u.putreq.ibprm_hdr = *hdr;
 		ibmsg->ibm_u.putreq.ibprm_cookie = tx->tx_cookie;
 		kiblnd_init_tx_msg(ni, tx, IBLND_MSG_PUT_REQ, sizeof(struct kib_putreq_msg));
 
-		tx->tx_lntmsg[0] = lntmsg;	/* finalise lntmsg on completion */
+		/* finalise lntmsg on completion */
+		tx->tx_lntmsg[0] = lntmsg;
 		tx->tx_waiting = 1;		/* waiting for PUT_{ACK,NAK} */
 		kiblnd_launch_tx(ni, tx, target.nid);
 		return 0;
@@ -1641,13 +1635,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	LASSERT(offsetof(struct kib_msg, ibm_u.immediate.ibim_payload[payload_nob])
 		 <= IBLND_MSG_SIZE);
 
-	tx = kiblnd_get_idle_tx(ni, target.nid);
-	if (!tx) {
-		CERROR("Can't send %d to %s: tx descs exhausted\n",
-		       type, libcfs_nid2str(target.nid));
-		return -ENOMEM;
-	}
-
 	ibmsg = tx->tx_msg;
 	ibmsg->ibm_u.immediate.ibim_hdr = *hdr;
 
@@ -1661,7 +1648,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	nob = offsetof(struct kib_immediate_msg, ibim_payload[payload_nob]);
 	kiblnd_init_tx_msg(ni, tx, IBLND_MSG_IMMEDIATE, nob);
 
-	tx->tx_lntmsg[0] = lntmsg;		/* finalise lntmsg on completion */
+	/* finalise lntmsg on completion */
+	tx->tx_lntmsg[0] = lntmsg;
+
 	kiblnd_launch_tx(ni, tx, target.nid);
 	return 0;
 }
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index caffa30..133397e 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -4215,6 +4215,7 @@ void lnet_monitor_thr_stop(void)
 		return "<UNKNOWN>";
 	}
 }
+EXPORT_SYMBOL(lnet_msgtyp2str);
 
 int
 lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid4,
-- 
1.8.3.1

* [lustre-devel] [PATCH 04/24] lustre: ptlrpc: make rq_replied flag always correct
From: James Simmons @ 2022-01-14  1:37 UTC
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexander Zarochentsev, Lustre Development List

From: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>

The rq_replied flag is cleared only in ptl_rpc_send(), so its
state may be incorrect for RPCs that timed out without ever
having been sent. Clear it in ptlrpc_cli_req_init() as well.
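
A minimal sketch of why clearing the flag at init time matters, using
a hypothetical struct that copies only the relevant bit-field names
from ptlrpc_request:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of request state; names mirror ptlrpc_request. */
struct demo_request {
	unsigned int rq_receiving_reply:1,
		     rq_req_unlinked:1,
		     rq_reply_unlinked:1,
		     rq_replied:1;
};

static void demo_req_init(struct demo_request *req)
{
	assert(req);
	req->rq_receiving_reply = 0;
	req->rq_req_unlinked = 1;
	req->rq_reply_unlinked = 1;
	req->rq_replied = 0;	/* cleared at init, not only at send time */
}

/* A request that times out before being sent must not look replied. */
static int demo_unsent_request_looks_replied(void)
{
	struct demo_request req;

	memset(&req, 0xff, sizeof(req));	/* stale memory from reuse */
	demo_req_init(&req);
	return req.rq_replied;
}
```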

HPE-bug-id: LUS-8752
WC-bug-id: https://jira.whamcloud.com/browse/LU-15112
Lustre-commit: 94f3f1b511609fa19 ("LU-15112 ptlrpc: make rq_replied flag always correct")
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45871
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/ptlrpc_internal.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/lustre/ptlrpc/ptlrpc_internal.h b/fs/lustre/ptlrpc/ptlrpc_internal.h
index d6edfde..d902cfe 100644
--- a/fs/lustre/ptlrpc/ptlrpc_internal.h
+++ b/fs/lustre/ptlrpc/ptlrpc_internal.h
@@ -336,6 +336,7 @@ static inline void ptlrpc_cli_req_init(struct ptlrpc_request *req)
 	req->rq_receiving_reply = 0;
 	req->rq_req_unlinked = 1;
 	req->rq_reply_unlinked = 1;
+	req->rq_replied = 0;
 
 	INIT_LIST_HEAD(&cr->cr_set_chain);
 	INIT_LIST_HEAD(&cr->cr_ctx_chain);
-- 
1.8.3.1

* [lustre-devel] [PATCH 05/24] lustre: mgc: do not ignore target registration failure
From: James Simmons @ 2022-01-14  1:37 UTC
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexander Zarochentsev, Lustre Development List

From: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>

A serious target registration failure with the LDD_F_ERROR flag
set is ignored by the target, which makes it possible to register
a new target with an index that is already in use.
The writeconf flag should be encoded in the fs label regardless of
the "first_time" flag; otherwise the target cannot be registered
after an initial registration failure.
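
The client-side hunk below only copies the reply when it is present.
The sketch illustrates that pattern together with refusing to continue
when the reply carries an error flag; the demo_* names, DEMO_F_ERROR
(standing in for LDD_F_ERROR) and the -EINVAL return are assumptions
for illustration, not what the MGS or target code actually does:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_F_ERROR 0x1u	/* hypothetical stand-in for LDD_F_ERROR */

/* Hypothetical subset of the target info exchanged with the MGS. */
struct demo_target_info {
	unsigned int flags;
	int stripe_index;
};

/*
 * Copy the reply into @mti only if the RPC succeeded and a reply
 * buffer exists, then surface an error flag instead of ignoring it.
 */
static int demo_target_register_done(int rc,
				     const struct demo_target_info *rep,
				     struct demo_target_info *mti)
{
	assert(mti);
	if (rc)
		return rc;
	if (rep)
		memcpy(mti, rep, sizeof(*rep));
	if (mti->flags & DEMO_F_ERROR)
		return -EINVAL;
	return 0;
}
```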

HPE-bug-id: LUS-8752
WC-bug-id: https://jira.whamcloud.com/browse/LU-15112
Lustre-commit: cefabee52586f443b ("LU-15112 mgc: do not ignore target registration failure")
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45259
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mgc/mgc_request.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c
index 3955d1f..62bf0ea 100644
--- a/fs/lustre/mgc/mgc_request.c
+++ b/fs/lustre/mgc/mgc_request.c
@@ -937,7 +937,10 @@ static int mgc_target_register(struct obd_export *exp,
 	if (!rc) {
 		rep_mti = req_capsule_server_get(&req->rq_pill,
 						 &RMF_MGS_TARGET_INFO);
-		memcpy(mti, rep_mti, sizeof(*rep_mti));
+		if (rep_mti)
+			memcpy(mti, rep_mti, sizeof(*rep_mti));
+	}
+	if (!rc) {
 		CDEBUG(D_MGC, "register %s got index = %d\n",
 		       mti->mti_svname, mti->mti_stripe_index);
 	}
-- 
1.8.3.1

* [lustre-devel] [PATCH 06/24] lustre: llite: make foreign symlinks aware of mount namespaces
From: James Simmons @ 2022-01-14  1:37 UTC
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Currently the foreign symlink code checks the mount namespace only
by comparing the client mount with the current process's root
mount. This doesn't cover all cases: Linux supports limiting which
mounts are visible to a process with mount namespaces. Record the
mount namespace at mount time (ll_mnt_ns) and compare it with the
current process's mount namespace instead.
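
From userspace the same question (are two processes in the same mount
namespace?) can be answered by comparing the device and inode numbers
of their /proc/<pid>/ns/mnt links, as described in namespaces(7). The
sketch below is only that userspace analogue; the kernel patch instead
compares current->nsproxy->mnt_ns with the ll_mnt_ns saved at mount
time, and same_mnt_ns_as_self() is a hypothetical helper:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return 1 if @pid shares the caller's mount namespace, 0 if not,
 * or -1 if either /proc/.../ns/mnt entry cannot be stat'ed.
 */
static int same_mnt_ns_as_self(pid_t pid)
{
	char path[64];
	struct stat self_st, pid_st;

	assert(pid > 0);
	if (stat("/proc/self/ns/mnt", &self_st) != 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/%ld/ns/mnt", (long)pid);
	if (stat(path, &pid_st) != 0)
		return -1;
	/* Same namespace <=> same nsfs device and inode number. */
	return self_st.st_dev == pid_st.st_dev &&
	       self_st.st_ino == pid_st.st_ino;
}
```

For example, same_mnt_ns_as_self(getpid()) is expected to return 1 on
a Linux system with /proc mounted.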

WC-bug-id: https://jira.whamcloud.com/browse/LU-10824
Lustre-commit: 942b4e118677af587 ("LU-10824 llite: make foreign symlinks aware of mount namespaces")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/45609
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 fs/lustre/llite/llite_foreign_symlink.c | 8 ++++----
 fs/lustre/llite/llite_internal.h        | 1 +
 fs/lustre/llite/llite_lib.c             | 1 +
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/llite/llite_foreign_symlink.c b/fs/lustre/llite/llite_foreign_symlink.c
index bfade93..64bc5db 100644
--- a/fs/lustre/llite/llite_foreign_symlink.c
+++ b/fs/lustre/llite/llite_foreign_symlink.c
@@ -367,15 +367,15 @@ static struct dentry *ll_foreign_dir_lookup(struct inode *parent,
 
 static bool has_same_mount_namespace(struct ll_sb_info *sbi)
 {
-	int rc;
+	bool same;
 
-	rc = (sbi->ll_mnt.mnt == current->fs->root.mnt);
-	if (!rc)
+	same = (sbi->ll_mnt_ns == current->nsproxy->mnt_ns);
+	if (!same)
 		LCONSOLE_WARN("%s: client mount %s and '%s.%d' not in same mnt-namespace\n",
 			      sbi->ll_fsname, sbi->ll_kset.kobj.name,
 			      current->comm, current->pid);
 
-	return rc;
+	return same;
 }
 
 ssize_t foreign_symlink_enable_show(struct kobject *kobj,
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 54fd8d4..a2abec6 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -672,6 +672,7 @@ struct ll_sb_info {
 	struct obd_device	*ll_dt_obd;
 	struct dentry		*ll_debugfs_entry;
 	struct lu_fid		ll_root_fid; /* root object fid */
+	struct mnt_namespace	*ll_mnt_ns;
 
 	DECLARE_BITMAP(ll_flags, LL_SBI_NUM_FLAGS); /* enum ll_sbi_flags */
 	unsigned int		ll_xattr_cache_enabled:1,
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 87cdc36..f8ecdcba 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -445,6 +445,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 	sbi->ll_namelen = osfs->os_namelen;
 	sbi->ll_mnt.mnt = current->fs->root.mnt;
+	sbi->ll_mnt_ns = current->nsproxy->mnt_ns;
 
 	if (test_bit(LL_SBI_USER_XATTR, sbi->ll_flags) &&
 	    !(data->ocd_connect_flags & OBD_CONNECT_XATTR)) {
-- 
1.8.3.1

* [lustre-devel] [PATCH 07/24] lustre: lov: Cache stripe offset calculation
From: James Simmons @ 2022-01-14  1:37 UTC
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Patrick Farrell, Lustre Development List

From: Patrick Farrell <farr0186@gmail.com>

Calculating the page offset relative to the stripe (and related
values) in a file is surprisingly expensive.  Because direct i/o
has already been split into stripes by the cl_io code,
calculating the stripe for each page is unnecessary.

For direct i/o, we cache most of the values requiring calculation.

This significantly speeds up AIO/DIO page submission,
improving performance by a bit over 10%.
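
Why the cache is safe can be sketched for a single RAID0 component.
struct demo_layout and the demo_* helpers are hypothetical; the real
lov_stripe_number()/lov_stripe_offset() also handle composite
layouts. While a direct i/o stays within one stripe chunk, each later
page's stripe offset is the cached one plus the distance moved:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single RAID0 component. */
struct demo_layout {
	uint64_t stripe_size;
	unsigned int stripe_count;
};

static unsigned int demo_stripe_number(const struct demo_layout *l,
				       uint64_t off)
{
	return (unsigned int)((off / l->stripe_size) % l->stripe_count);
}

/* Offset of file offset @off inside its stripe object. */
static uint64_t demo_stripe_offset(const struct demo_layout *l,
				   uint64_t off)
{
	uint64_t row = l->stripe_size * l->stripe_count;

	return (off / row) * l->stripe_size + off % l->stripe_size;
}

/* Per-i/o cache; valid only while the i/o stays in one stripe chunk. */
struct demo_cache {
	bool valid;
	unsigned int stripe;
	uint64_t off;
	uint64_t suboff;
};

static uint64_t demo_page_suboff(struct demo_cache *c,
				 const struct demo_layout *l,
				 uint64_t off, unsigned int *stripe)
{
	if (c->valid) {
		/* Offsets only move forward within an i/o. */
		assert(off >= c->off);
		*stripe = c->stripe;
		return c->suboff + (off - c->off);
	}
	c->stripe = *stripe = demo_stripe_number(l, off);
	c->off = off;
	c->suboff = demo_stripe_offset(l, off);
	c->valid = true;
	return c->suboff;
}
```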

Also remove lps_layout_gen, which isn't doing anything
useful.  This suggests the possibility of removing
lov_page, but that's for another patch.

This patch reduces i/o time in ms/GiB by:
Write: 17 ms/GiB
Read: 22 ms/GiB

Totals:
Write: 119 ms/GiB
Read: 121 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        7531 MiB/s
read         7179 MiB/s

Plus this patch:
write        8637 MiB/s
read         8488 MiB/s
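
The ms/GiB figures follow from the MiB/s results: moving 1 GiB
(1024 MiB) at R MiB/s takes 1024 * 1000 / R ms. A small check of the
arithmetic:

```c
#include <assert.h>

/* Milliseconds needed to move 1 GiB (1024 MiB) at @mib_per_s MiB/s. */
static double ms_per_gib(double mib_per_s)
{
	assert(mib_per_s > 0.0);
	return 1024.0 * 1000.0 / mib_per_s;
}

/* Round a non-negative value to the nearest integer. */
static long round_ms(double ms)
{
	assert(ms >= 0.0);
	return (long)(ms + 0.5);
}
```

For example, 7531 MiB/s is about 136 ms/GiB and 8637 MiB/s about
119 ms/GiB, a difference of about 17 ms/GiB; for reads, 7179 and
8488 MiB/s give about 143 and 121 ms/GiB, a difference of about
22 ms/GiB.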

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: 14db1faa0fbe813fe ("LU-13799 lov: Cache stripe offset calculation")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-on: https://review.whamcloud.com/39445
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_cl_internal.h |  9 +++++--
 fs/lustre/lov/lov_io.c          |  6 +++++
 fs/lustre/lov/lov_page.c        | 57 +++++++++++++++++++++++++++++++----------
 3 files changed, 57 insertions(+), 15 deletions(-)

diff --git a/fs/lustre/lov/lov_cl_internal.h b/fs/lustre/lov/lov_cl_internal.h
index d48e2df3..42fd10a 100644
--- a/fs/lustre/lov/lov_cl_internal.h
+++ b/fs/lustre/lov/lov_cl_internal.h
@@ -453,8 +453,6 @@ struct lov_lock {
 
 struct lov_page {
 	struct cl_page_slice	lps_cl;
-	/* the layout gen when this page was created */
-	u32			lps_layout_gen;
 };
 
 /*
@@ -524,6 +522,7 @@ struct lov_io_sub {
 /**
  * IO state private for LOV.
  */
+#define LIS_CACHE_ENTRY_NONE	-ENOENT
 struct lov_io {
 	/** super-class */
 	struct cl_io_slice	lis_cl;
@@ -590,6 +589,12 @@ struct lov_io {
 	 * All sub-io's created in this lov_io.
 	 */
 	struct list_head	lis_subios;
+	/* Cached results from stripe & offset calculations for page init */
+	int			lis_cached_entry;
+	int			lis_cached_stripe;
+	loff_t			lis_cached_off;
+	loff_t			lis_cached_suboff;
+	struct lov_io_sub	*lis_cached_sub;
 };
 
 struct lov_session {
diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c
index 8df13ee..904bafd 100644
--- a/fs/lustre/lov/lov_io.c
+++ b/fs/lustre/lov/lov_io.c
@@ -467,6 +467,7 @@ static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj,
 
 	io->ci_result = 0;
 	lio->lis_object = obj;
+	lio->lis_cached_entry = LIS_CACHE_ENTRY_NONE;
 
 	switch (io->ci_type) {
 	case CIT_READ:
@@ -1053,6 +1054,11 @@ static void lov_io_end(const struct lu_env *env, const struct cl_io_slice *ios)
 {
 	int rc;
 
+	/* Before ending each i/o, we must set lis_cached_entry to tell the
+	 * next i/o not to use stale cached lis information.
+	 */
+	cl2lov_io(env, ios)->lis_cached_entry = LIS_CACHE_ENTRY_NONE;
+
 	rc = lov_io_call(env, cl2lov_io(env, ios), lov_io_end_wrapper);
 	LASSERT(rc == 0);
 }
diff --git a/fs/lustre/lov/lov_page.c b/fs/lustre/lov/lov_page.c
index fdc415b..16bd7cd 100644
--- a/fs/lustre/lov/lov_page.c
+++ b/fs/lustre/lov/lov_page.c
@@ -56,8 +56,7 @@ static int lov_comp_page_print(const struct lu_env *env,
 	struct lov_page *lp = cl2lov_page(slice);
 
 	return (*printer)(env, cookie,
-			  LUSTRE_LOV_NAME "-page@%p, gen: %u\n",
-			  lp, lp->lps_layout_gen);
+			  LUSTRE_LOV_NAME"-page@%p\n", lp);
 }
 
 static const struct cl_page_operations lov_comp_page_ops = {
@@ -74,33 +73,65 @@ int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj,
 	struct cl_object *o;
 	struct lov_io_sub *sub;
 	struct lov_page *lpg = cl_object_page_slice(obj, page);
+	bool stripe_cached = false;
 	u64 offset;
 	u64 suboff;
-	int stripe;
 	int entry;
+	int stripe;
 	int rc;
 
+	/* Direct i/o (CPT_TRANSIENT) is split strictly to stripes, so we can
+	 * cache the stripe information.  Buffered i/o is differently
+	 * organized, and stripe calculation isn't a significant cost for
+	 * buffered i/o, so we only cache this for direct i/o.
+	 */
+	stripe_cached = lio->lis_cached_entry != LIS_CACHE_ENTRY_NONE &&
+			page->cp_type == CPT_TRANSIENT;
+
 	offset = cl_offset(obj, index);
-	entry = lov_io_layout_at(lio, offset);
+
+	if (stripe_cached) {
+		entry = lio->lis_cached_entry;
+		stripe = lio->lis_cached_stripe;
+		/* Offset can never go backwards in an i/o, so this is valid */
+		suboff = lio->lis_cached_suboff + offset - lio->lis_cached_off;
+	} else {
+		entry = lov_io_layout_at(lio, offset);
+
+		stripe = lov_stripe_number(loo->lo_lsm, entry, offset);
+		rc = lov_stripe_offset(loo->lo_lsm, entry, offset, stripe,
+				       &suboff);
+		LASSERT(rc == 0);
+		lio->lis_cached_entry = entry;
+		lio->lis_cached_stripe = stripe;
+		lio->lis_cached_off = offset;
+		lio->lis_cached_suboff = suboff;
+	}
+
 	if (entry < 0 || !lsm_entry_inited(loo->lo_lsm, entry)) {
 		/* non-existing layout component */
 		lov_page_init_empty(env, obj, page, index);
 		return 0;
 	}
 
-	r0 = lov_r0(loo, entry);
-	stripe = lov_stripe_number(loo->lo_lsm, entry, offset);
-	LASSERT(stripe < r0->lo_nr);
-	rc = lov_stripe_offset(loo->lo_lsm, entry, offset, stripe, &suboff);
-	LASSERT(rc == 0);
+	CDEBUG(D_PAGE, "offset %llu, entry %d, stripe %d, suboff %llu\n",
+	       offset, entry, stripe, suboff);
 
 	page->cp_lov_index = lov_comp_index(entry, stripe);
-	lpg->lps_layout_gen = loo->lo_lsm->lsm_layout_gen;
 	cl_page_slice_add(page, &lpg->lps_cl, obj, &lov_comp_page_ops);
 
-	sub = lov_sub_get(env, lio, page->cp_lov_index);
-	if (IS_ERR(sub))
-		return PTR_ERR(sub);
+	if (!stripe_cached) {
+		sub = lov_sub_get(env, lio, page->cp_lov_index);
+		if (IS_ERR(sub))
+			return PTR_ERR(sub);
+	} else {
+		sub = lio->lis_cached_sub;
+	}
+
+	lio->lis_cached_sub = sub;
+
+	r0 = lov_r0(loo, entry);
+	LASSERT(stripe < r0->lo_nr);
 
 	subobj = lovsub2cl(r0->lo_sub[stripe]);
 	cl_object_for_each(o, subobj) {
-- 
1.8.3.1

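Below is a minimal sketch of the direct-I/O stripe cache from the lov_page_init_composite() hunk above. It assumes a single layout component with a fixed stripe size and stripe count. The names struct stripe_cache, map_offset() and resolve_page() are hypothetical stand-ins for the lis_cached_* fields, lov_stripe_number() and lov_stripe_offset(). It shows why "lis_cached_suboff + offset - lis_cached_off" is only valid when offsets never go backwards and every page stays inside the cached stripe chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the lis_cached_* fields of struct lov_io. */
struct stripe_cache {
	bool valid;
	int stripe;
	uint64_t off;		/* file offset of the page that filled the cache */
	uint64_t suboff;	/* matching offset inside the stripe object */
};

/* Toy RAID-0 mapping (assumption: one component, fixed size and count). */
static void map_offset(uint64_t off, uint64_t ssize, int scount,
		       int *stripe, uint64_t *suboff)
{
	uint64_t chunk = off / ssize;

	*stripe = (int)(chunk % scount);
	*suboff = (chunk / scount) * ssize + off % ssize;
}

/*
 * Resolve a page offset, reusing the cached mapping for direct I/O.
 * Assumes each direct-I/O sub-request stays inside one stripe chunk and
 * that offsets only move forward, as the patch comment states.
 */
static void resolve_page(struct stripe_cache *c, bool direct_io, uint64_t off,
			 uint64_t ssize, int scount,
			 int *stripe, uint64_t *suboff)
{
	if (direct_io && c->valid) {
		assert(off >= c->off);	/* offsets never go backwards */
		*stripe = c->stripe;
		*suboff = c->suboff + (off - c->off);
		return;
	}
	map_offset(off, ssize, scount, stripe, suboff);
	c->valid = true;
	c->stripe = *stripe;
	c->off = off;
	c->suboff = *suboff;
}
```

For example, with a 1 MiB stripe size and 4 stripes, a page at 1 MiB + 4 KiB resolves to stripe 1 at sub-offset 4096, whether it comes from the cache or from a full recomputation.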
_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 08/24] lnet: o2iblnd: treat cmid->device == NULL as an error
  2022-01-14  1:37 [lustre-devel] [PATCH 00/24] lustre: update to OpenSFS Jan 13, 2022 James Simmons
                   ` (6 preceding siblings ...)
  2022-01-14  1:37 ` [lustre-devel] [PATCH 07/24] lustre: lov: Cache stripe offset calculation James Simmons
@ 2022-01-14  1:37 ` James Simmons
  2022-01-14  1:37 ` [lustre-devel] [PATCH 09/24] lustre: lmv: set default LMV for "lfs mkdir -c 1" James Simmons
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

Even if rdma_bind_addr() succeeds, kiblnd_dev_failover() should
treat cmid->device == NULL as an error, so that it later avoids
calling kiblnd_set_ni_fatal_on() with a possibly NULL dev->ibd_hdev.
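The resulting control flow can be modeled as follows. This is a hypothetical sketch, not the driver code. should_clear_fatal() and its parameters are invented for illustration. It assumes the error branch is entered on "rc || !cmid->device", and that the exit path only clears the fatal state when rc == 0.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: returns true when the failover path would call
 * kiblnd_set_ni_fatal_on(..., 0).  bind_rc models the rdma_bind_addr()
 * result, have_device models cmid->device != NULL, and link_up models
 * kiblnd_get_link_status(netdev) == 1.
 */
static bool should_clear_fatal(int bind_rc, bool have_device, bool link_up)
{
	bool set_fatal = true;
	int rc = bind_rc;

	if (rc || !have_device) {
		/* bind "succeeded" but no device: leave the fatal state alone */
		if (!rc && !have_device)
			set_fatal = false;
		goto out;
	}
	/* remaining failover steps are assumed to succeed in this model */
out:
	if (rc)
		return false;	/* failed failover: fatal state untouched */
	return set_fatal && link_up;
}
```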

Fixes: 5e07562bc3 ("lnet: o2iblnd: clear fatal error on successful failover")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15018
Lustre-commit: abd0ce62e96523193 ("LU-15018 o2iblnd: treat cmid->device == NULL as an error")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44981
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 7d28acd..76f5e7f 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -2365,6 +2365,7 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns)
 	struct kib_net *net;
 	struct sockaddr_in addr;
 	struct net_device *netdev;
+	bool set_fatal = true;
 	unsigned long flags;
 	int rc = 0;
 	int i;
@@ -2416,6 +2417,8 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns)
 		CERROR("Failed to bind %s:%pI4h to device(%p): %d\n",
 		       dev->ibd_ifname, &dev->ibd_ifip,
 		       cmid->device, rc);
+		if (!rc && !cmid->device)
+			set_fatal = false;
 		rdma_destroy_id(cmid);
 		goto out;
 	}
@@ -2490,11 +2493,13 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns)
 	} else {
 		dev->ibd_failed_failover = 0;
 
-		rcu_read_lock();
-		netdev = dev_get_by_name_rcu(ns, dev->ibd_ifname);
-		if (netdev && (kiblnd_get_link_status(netdev) == 1))
-			kiblnd_set_ni_fatal_on(dev->ibd_hdev, 0);
-		rcu_read_unlock();
+		if (set_fatal) {
+			rcu_read_lock();
+			netdev = dev_get_by_name_rcu(ns, dev->ibd_ifname);
+			if (netdev && (kiblnd_get_link_status(netdev) == 1))
+				kiblnd_set_ni_fatal_on(dev->ibd_hdev, 0);
+			rcu_read_unlock();
+		}
 	}
 
 	return rc;
-- 
1.8.3.1

* [lustre-devel] [PATCH 09/24] lustre: lmv: set default LMV for "lfs mkdir -c 1"
  2022-01-14  1:37 [lustre-devel] [PATCH 00/24] lustre: update to OpenSFS Jan 13, 2022 James Simmons
                   ` (7 preceding siblings ...)
  2022-01-14  1:37 ` [lustre-devel] [PATCH 08/24] lnet: o2iblnd: treat cmid->device == NULL as an error James Simmons
@ 2022-01-14  1:37 ` James Simmons
  2022-01-14  1:37 ` [lustre-devel] [PATCH 10/24] lnet: socklnd: decrement connection counters on close James Simmons
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

With the introduction of the filesystem-wide default LMV, directories
are created on MDTs according to space usage. However, if a directory
is created with "lfs mkdir -c 1 ...", its subdirectories should be kept
on the same MDT. To achieve this, set a default LMV on such
directories. NB: users who don't want this behavior need to create the
directory with "lfs mkdir -c 1 --max-inherit=0 ...".

The policy for choosing the MDT in mkdir is as follows:
1. is "lfs mkdir -i N" given? mkdir on MDT N.
2. is "lfs mkdir -i -1" given? mkdir by space usage.
3. is a starting MDT specified in the default LMV? mkdir on that MDT.
4. is the default LMV space balanced? mkdir by space usage.
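A simplified sketch of that ordering follows. It uses hypothetical names (struct mkdir_req, choose_mdt(), and MDT_ANY standing in for LMV_OFFSET_DEFAULT). It also leaves out the striped-parent and MF_QOS_MKDIR checks that the real lmv_create() helpers perform.

```c
#include <assert.h>
#include <stdbool.h>

#define MDT_ANY (-1)	/* stands in for LMV_OFFSET_DEFAULT */

enum mkdir_choice { MKDIR_ON_MDT, MKDIR_BY_SPACE, MKDIR_KEEP_TGT };

/* Hypothetical summary of the inputs the policy looks at. */
struct mkdir_req {
	bool user_set;		/* "lfs mkdir -i ..." was given */
	int user_mdt;		/* N, or MDT_ANY for "-i -1" */
	bool has_default_lmv;	/* a default LMV applies to this mkdir */
	int default_mdt;	/* starting MDT in default LMV, or MDT_ANY */
};

static enum mkdir_choice choose_mdt(const struct mkdir_req *r, int *mdt)
{
	if (r->user_set && r->user_mdt != MDT_ANY) {		/* rule 1 */
		*mdt = r->user_mdt;
		return MKDIR_ON_MDT;
	}
	if (r->user_set)					/* rule 2 */
		return MKDIR_BY_SPACE;
	if (r->has_default_lmv && r->default_mdt != MDT_ANY) {	/* rule 3 */
		*mdt = r->default_mdt;
		return MKDIR_ON_MDT;
	}
	if (r->has_default_lmv)					/* rule 4 */
		return MKDIR_BY_SPACE;
	return MKDIR_KEEP_TGT;	/* no rule applies: keep located target */
}
```

Falling through all four rules keeps the target that lmv_create() located before it applies the policy.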

WC-bug-id: https://jira.whamcloud.com/browse/LU-14560
Lustre-commit: bc2d7f065af6b4f9a ("LU-13560 lod: set default LMV for "lfs mkdir -c 1")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45290
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lmv/lmv_obd.c | 104 +++++++++++++++++++++++++++---------------------
 1 file changed, 58 insertions(+), 46 deletions(-)

diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index c87f37f..55816a1 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1770,46 +1770,39 @@ int lmv_old_layout_lookup(struct lmv_obd *lmv, struct md_op_data *op_data)
 	return rc;
 }
 
+/* mkdir by QoS upon 'lfs mkdir -i -1'.
+ *
+ * NB, mkdir by QoS only if parent is not striped, this is to avoid remote
+ * directories under striped directory.
+ */
 static inline bool lmv_op_user_qos_mkdir(const struct md_op_data *op_data)
 {
 	const struct lmv_user_md *lum = op_data->op_data;
 
+	if (op_data->op_code != LUSTRE_OPC_MKDIR)
+		return false;
+
+	if (lmv_dir_striped(op_data->op_mea1))
+		return false;
+
 	return (op_data->op_cli_flags & CLI_SET_MEA) && lum &&
 	       le32_to_cpu(lum->lum_magic) == LMV_USER_MAGIC &&
 	       le32_to_cpu(lum->lum_stripe_offset) == LMV_OFFSET_DEFAULT;
 }
 
+/* mkdir by QoS if either ROOT or parent default LMV is space balanced. */
 static inline bool lmv_op_default_qos_mkdir(const struct md_op_data *op_data)
 {
 	const struct lmv_stripe_md *lsm = op_data->op_default_mea1;
 
-	return (op_data->op_flags & MF_QOS_MKDIR) ||
-	       (lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT);
-}
-
-/* mkdir by QoS in three cases:
- * 1. ROOT default LMV is space balanced.
- * 2. 'lfs mkdir -i -1'
- * 3. parent default LMV master_mdt_index is -1
- *
- * NB, mkdir by QoS only if parent is not striped, this is to avoid remote
- * directories under striped directory.
- */
-static inline bool lmv_op_qos_mkdir(const struct md_op_data *op_data)
-{
 	if (op_data->op_code != LUSTRE_OPC_MKDIR)
 		return false;
 
 	if (lmv_dir_striped(op_data->op_mea1))
 		return false;
 
-	if (lmv_op_user_qos_mkdir(op_data))
-		return true;
-
-	if (lmv_op_default_qos_mkdir(op_data))
-		return true;
-
-	return false;
+	return (op_data->op_flags & MF_QOS_MKDIR) ||
+	       (lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT);
 }
 
 /* if parent default LMV is space balanced, and
@@ -1853,6 +1846,38 @@ static inline bool lmv_op_user_specific_mkdir(const struct md_op_data *op_data)
 			LMV_OFFSET_DEFAULT;
 }
 
+/* locate MDT by space usage */
+static struct lu_tgt_desc *lmv_locate_tgt_by_space(struct lmv_obd *lmv,
+						   struct md_op_data *op_data,
+						   struct lmv_tgt_desc *tgt)
+{
+	struct lmv_tgt_desc *tmp = tgt;
+
+	tgt = lmv_locate_tgt_qos(lmv, op_data->op_mds, op_data->op_dir_depth);
+	if (tgt == ERR_PTR(-EAGAIN)) {
+		if (ltd_qos_is_balanced(&lmv->lmv_mdt_descs) &&
+		    !lmv_op_default_rr_mkdir(op_data) &&
+		    !lmv_op_user_qos_mkdir(op_data))
+			/* if not necessary, don't create remote directory. */
+			tgt = tmp;
+		else
+			tgt = lmv_locate_tgt_rr(lmv);
+	}
+
+	/*
+	 * only update statfs after QoS mkdir, this means the cached statfs may
+	 * be stale, and current mkdir may not follow QoS accurately, but it's
+	 * not serious, and avoids periodic statfs when client doesn't mkdir by
+	 * QoS.
+	 */
+	if (!IS_ERR(tgt)) {
+		op_data->op_mds = tgt->ltd_index;
+		lmv_statfs_check_update(lmv2obd_dev(lmv), tgt);
+	}
+
+	return tgt;
+}
+
 int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 		const void *data, size_t datalen, umode_t mode, uid_t uid,
 		gid_t gid, kernel_cap_t cap_effective, u64 rdev,
@@ -1886,6 +1911,12 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
 
+	/* the order to apply policy in mkdir:
+	 * 1. is "lfs mkdir -i N"? mkdir on MDT N.
+	 * 2. is "lfs mkdir -i -1"? mkdir by space usage.
+	 * 3. is starting MDT specified in default LMV? mkdir on MDT N.
+	 * 4. is default LMV space balanced? mkdir by space usage.
+	 */
 	if (lmv_op_user_specific_mkdir(op_data)) {
 		struct lmv_user_md *lum = op_data->op_data;
 
@@ -1893,39 +1924,20 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 		tgt = lmv_tgt(lmv, op_data->op_mds);
 		if (!tgt)
 			return -ENODEV;
+	} else if (lmv_op_user_qos_mkdir(op_data)) {
+		tgt = lmv_locate_tgt_by_space(lmv, op_data, tgt);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
 	} else if (lmv_op_default_specific_mkdir(op_data)) {
 		op_data->op_mds =
 			op_data->op_default_mea1->lsm_md_master_mdt_index;
 		tgt = lmv_tgt(lmv, op_data->op_mds);
 		if (!tgt)
 			return -ENODEV;
-	} else if (lmv_op_qos_mkdir(op_data)) {
-		struct lmv_tgt_desc *tmp = tgt;
-
-		tgt = lmv_locate_tgt_qos(lmv, op_data->op_mds,
-					 op_data->op_dir_depth);
-		if (tgt == ERR_PTR(-EAGAIN)) {
-			if (ltd_qos_is_balanced(&lmv->lmv_mdt_descs) &&
-			    !lmv_op_default_rr_mkdir(op_data) &&
-			    !lmv_op_user_qos_mkdir(op_data))
-				/* if it's not necessary, don't create remote
-				 * directory.
-				 */
-				tgt = tmp;
-			else
-				tgt = lmv_locate_tgt_rr(lmv);
-		}
+	} else if (lmv_op_default_qos_mkdir(op_data)) {
+		tgt = lmv_locate_tgt_by_space(lmv, op_data, tgt);
 		if (IS_ERR(tgt))
 			return PTR_ERR(tgt);
-
-		op_data->op_mds = tgt->ltd_index;
-		/*
-		 * only update statfs after QoS mkdir, this means the cached
-		 * statfs may be stale, and current mkdir may not follow QoS
-		 * accurately, but it's not serious, and avoids periodic statfs
-		 * when client doesn't mkdir by QoS.
-		 */
-		lmv_statfs_check_update(obd, tgt);
 	}
 
 retry:
-- 
1.8.3.1

* [lustre-devel] [PATCH 10/24] lnet: socklnd: decrement connection counters on close
  2022-01-14  1:37 [lustre-devel] [PATCH 00/24] lustre: update to OpenSFS Jan 13, 2022 James Simmons
                   ` (8 preceding siblings ...)
  2022-01-14  1:37 ` [lustre-devel] [PATCH 09/24] lustre: lmv: set default LMV for "lfs mkdir -c 1" James Simmons
@ 2022-01-14  1:37 ` James Simmons
  2022-01-14  1:37 ` [lustre-devel] [PATCH 11/24] lustre: lmv: improve MDT QOS space balance James Simmons
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

To gracefully handle a potential race with delayed connection
creation, decrement the per-type connection counters as connections
are closed.
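Below is a minimal model of the per-type bookkeeping. It uses hypothetical names (struct conn_cb, conn_added(), conn_closed()) and omits the SOCKLND_CONN_ANY case. For a bulk type, the "connected" bit is set only once max_conns connections exist, and it is cleared as soon as one of them closes. The control bit is cleared only when the last control connection goes away (there are two on loopback).

```c
#include <assert.h>

enum conn_type { CONN_CONTROL, CONN_BULK_IN, CONN_BULK_OUT, CONN_TYPES };

/* Hypothetical per-peer bookkeeping, modeled on struct ksock_conn_cb. */
struct conn_cb {
	int count[CONN_TYPES];	/* live connections per type */
	int max_conns;		/* target number of conns per bulk type */
	unsigned int connected;	/* bit set when a type is fully connected */
};

static void conn_added(struct conn_cb *cb, enum conn_type t)
{
	cb->count[t]++;
	if (t == CONN_CONTROL || cb->count[t] == cb->max_conns)
		cb->connected |= 1u << t;
}

static void conn_closed(struct conn_cb *cb, enum conn_type t)
{
	assert(cb->count[t] > 0);
	cb->count[t]--;
	/* control: clear only when no control connection is left */
	if ((t == CONN_CONTROL) ? (cb->count[t] == 0)
				: (cb->count[t] < cb->max_conns))
		cb->connected &= ~(1u << t);
}
```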

Fixes: 511ace4a ("lnet: socklnd: add conns_per_peer parameter")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15137
Lustre-commit: 7e26413aa85fdc931 ("LU-15137 socklnd: decrement connection counters on close")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45422
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/socklnd/socklnd.c | 69 ++++++++++++++++++++++++++++++++++------
 1 file changed, 60 insertions(+), 9 deletions(-)

diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index b014aa8..6d1f85c 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -422,7 +422,9 @@ struct ksock_peer_ni *
 	switch (type) {
 	case SOCKLND_CONN_CONTROL:
 		conn_cb->ksnr_ctrl_conn_count++;
-		/* there's a single control connection per peer */
+		/* there's a single control connection per peer,
+		 * two in case of loopback
+		 */
 		conn_cb->ksnr_connected |= BIT(type);
 		break;
 	case SOCKLND_CONN_BULK_IN:
@@ -449,6 +451,45 @@ struct ksock_peer_ni *
 }
 
 static void
+ksocknal_decr_conn_count(struct ksock_conn_cb *conn_cb,
+			 int type)
+{
+	conn_cb->ksnr_conn_count--;
+
+	/* check if all connections of the given type got created */
+	switch (type) {
+	case SOCKLND_CONN_CONTROL:
+		conn_cb->ksnr_ctrl_conn_count--;
+		/* there's a single control connection per peer,
+		 * two in case of loopback
+		 */
+		if (conn_cb->ksnr_ctrl_conn_count == 0)
+			conn_cb->ksnr_connected &= ~BIT(type);
+		break;
+	case SOCKLND_CONN_BULK_IN:
+		conn_cb->ksnr_blki_conn_count--;
+		if (conn_cb->ksnr_blki_conn_count < conn_cb->ksnr_max_conns)
+			conn_cb->ksnr_connected &= ~BIT(type);
+		break;
+	case SOCKLND_CONN_BULK_OUT:
+		conn_cb->ksnr_blko_conn_count--;
+		if (conn_cb->ksnr_blko_conn_count < conn_cb->ksnr_max_conns)
+			conn_cb->ksnr_connected &= ~BIT(type);
+		break;
+	case SOCKLND_CONN_ANY:
+		if (conn_cb->ksnr_conn_count < conn_cb->ksnr_max_conns)
+			conn_cb->ksnr_connected &= ~BIT(type);
+		break;
+	default:
+		LBUG();
+		break;
+	}
+
+	CDEBUG(D_NET, "Del conn type %d, ksnr_connected %x ksnr_max_conns %d\n",
+	       type, conn_cb->ksnr_connected, conn_cb->ksnr_max_conns);
+}
+
+static void
 ksocknal_associate_cb_conn_locked(struct ksock_conn_cb *conn_cb,
 				  struct ksock_conn *conn)
 {
@@ -1249,6 +1290,8 @@ struct ksock_peer_ni *
 	struct ksock_peer_ni *peer_ni = conn->ksnc_peer;
 	struct ksock_conn_cb *conn_cb;
 	struct ksock_conn *conn2;
+	int conn_count;
+	int duplicate_count = 0;
 
 	LASSERT(!peer_ni->ksnp_error);
 	LASSERT(!conn->ksnc_closing);
@@ -1262,21 +1305,29 @@ struct ksock_peer_ni *
 		/* dissociate conn from cb... */
 		LASSERT(!conn_cb->ksnr_deleted);
 
+		conn_count = ksocknal_get_conn_count_by_type(conn_cb,
+							     conn->ksnc_type);
 		/* connected bit is set only if all connections
 		 * of the given type got created
 		 */
-		if (ksocknal_get_conn_count_by_type(conn_cb, conn->ksnc_type) ==
-		    conn_cb->ksnr_max_conns)
+		if (conn_count == conn_cb->ksnr_max_conns)
 			LASSERT((conn_cb->ksnr_connected &
 				BIT(conn->ksnc_type)) != 0);
 
-		list_for_each_entry(conn2, &peer_ni->ksnp_conns, ksnc_list) {
-			if (conn2->ksnc_conn_cb == conn_cb &&
-			    conn2->ksnc_type == conn->ksnc_type)
-				goto conn2_found;
+		if (conn_count == 1) {
+			list_for_each_entry(conn2, &peer_ni->ksnp_conns,
+					    ksnc_list) {
+				if (conn2->ksnc_conn_cb == conn_cb &&
+				    conn2->ksnc_type == conn->ksnc_type)
+					duplicate_count += 1;
+			}
+			if (duplicate_count > 0)
+				CERROR("Found %d duplicate conns type %d\n",
+				       duplicate_count,
+				       conn->ksnc_type);
 		}
-		conn_cb->ksnr_connected &= ~BIT(conn->ksnc_type);
-conn2_found:
+		ksocknal_decr_conn_count(conn_cb, conn->ksnc_type);
+
 		conn->ksnc_conn_cb = NULL;
 
 		/* drop conn's ref on route */
-- 
1.8.3.1

* [lustre-devel] [PATCH 11/24] lustre: lmv: improve MDT QOS space balance
  2022-01-14  1:37 [lustre-devel] [PATCH 00/24] lustre: update to OpenSFS Jan 13, 2022 James Simmons
                   ` (9 preceding siblings ...)
  2022-01-14  1:37 ` [lustre-devel] [PATCH 10/24] lnet: socklnd: decrement connection counters on close James Simmons
@ 2022-01-14  1:37 ` James Simmons
  2022-01-14  1:37 ` [lustre-devel] [PATCH 12/24] lustre: llite: access striped directory with missing stripe James Simmons
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

When MDTs are not balanced, the QOS code tries to keep subdirectory
creation local to the parent MDT for directories deep in the tree, to
avoid creating too many remote directories. However, the existing
weighting, which stays on the parent MDT until it falls to 50% of the
other MDTs, is too aggressive and causes mkdirs to get "stuck" on the
same MDT.

* remove "lq_threshold_rr" from the above calculation, because the
  check in ltd_qos_is_usable() already handles it; use only
  "dir_depth".
* change the factor to "16 / (dir_depth + 10)", so that mkdir is less
  likely to stick to the parent MDT at the top levels and more likely
  to stay on the parent MDT at the lower levels:
  depth=0 -> 160%, depth=4 -> 114%, depth=6 -> 100%,
  depth=8 -> 88%, depth=12 -> 72%
* rename lli_depth to lli_dir_depth to make its usage clearer.
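The new threshold can be checked with a small sketch. It assumes total_usable counts the usable MDTs, so that total_avail / total_usable is the average available space. The current MDT is kept when its available space is at least the returned value. The function names are hypothetical; the formula is the one in the lmv_locate_tgt_qos() hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum available space the parent MDT needs to keep the new dir local. */
static uint64_t qos_stay_threshold(uint64_t total_avail,
				   uint64_t total_usable,
				   unsigned int dir_depth)
{
	assert(total_usable > 0);
	return total_avail * 16 / (total_usable * (dir_depth + 10));
}

/* The same factor as an integer percentage of the average (truncated). */
static unsigned int qos_stay_percent(unsigned int dir_depth)
{
	return 1600 / (dir_depth + 10);
}
```

The truncated percentages reproduce the table in the commit message. For example, at depth 6 the threshold equals the average (100%).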

WC-bug-id: https://jira.whamcloud.com/browse/LU-15216
Lustre-commit: 38c4c538f53fb5f0c ("LU-15216 lmv: improve MDT QOS space balance")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45544
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c            | 2 +-
 fs/lustre/llite/llite_internal.h | 2 +-
 fs/lustre/llite/llite_lib.c      | 6 +++---
 fs/lustre/llite/namei.c          | 6 +++---
 fs/lustre/lmv/lmv_obd.c          | 7 ++++---
 5 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index f3f1ce7..43cd3cc 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -480,7 +480,7 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump,
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
-	op_data->op_dir_depth = ll_i2info(parent)->lli_depth;
+	op_data->op_dir_depth = ll_i2info(parent)->lli_dir_depth;
 
 	if (ll_sbi_has_encrypt(sbi) &&
 	    (IS_ENCRYPTED(parent) ||
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index a2abec6..0398b5f 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -184,7 +184,7 @@ struct ll_inode_info {
 			 */
 			pid_t				lli_opendir_pid;
 			/* directory depth to ROOT */
-			unsigned short			lli_depth;
+			unsigned short			lli_dir_depth;
 			/* stat will try to access statahead entries or start
 			 * statahead if this flag is set, and this flag will be
 			 * set upon dir open, and cleared when dir is closed,
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index f8ecdcba..e3e871d 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -2609,9 +2609,9 @@ void ll_update_dir_depth(struct inode *dir, struct inode *inode)
 		return;
 
 	lli = ll_i2info(inode);
-	lli->lli_depth = ll_i2info(dir)->lli_depth + 1;
-	CDEBUG(D_INODE, DFID" depth %hu\n", PFID(&lli->lli_fid),
-	       lli->lli_depth);
+	lli->lli_dir_depth = ll_i2info(dir)->lli_dir_depth + 1;
+	CDEBUG(D_INODE, DFID" depth %hu\n",
+	       PFID(&lli->lli_fid), lli->lli_dir_depth);
 }
 
 void ll_truncate_inode_pages_final(struct inode *inode)
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index d46a30f..0683614 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -1493,7 +1493,7 @@ static void ll_qos_mkdir_prep(struct md_op_data *op_data, struct inode *dir)
 	struct ll_inode_info *lli = ll_i2info(dir);
 	struct lmv_stripe_md *lsm;
 
-	op_data->op_dir_depth = lli->lli_depth;
+	op_data->op_dir_depth = lli->lli_dir_depth;
 
 	/* parent directory is striped */
 	if (unlikely(lli->lli_lsm_md))
@@ -1522,11 +1522,11 @@ static void ll_qos_mkdir_prep(struct md_op_data *op_data, struct inode *dir)
 
 	if (lsm->lsm_md_max_inherit != LMV_INHERIT_NONE &&
 	    (lsm->lsm_md_max_inherit == LMV_INHERIT_UNLIMITED ||
-	     lsm->lsm_md_max_inherit >= lli->lli_depth)) {
+	     lsm->lsm_md_max_inherit >= lli->lli_dir_depth)) {
 		op_data->op_flags |= MF_QOS_MKDIR;
 		if (lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE &&
 		    (lsm->lsm_md_max_inherit_rr == LMV_INHERIT_RR_UNLIMITED ||
-		     lsm->lsm_md_max_inherit_rr >= lli->lli_depth))
+		     lsm->lsm_md_max_inherit_rr >= lli->lli_dir_depth))
 			op_data->op_flags |= MF_RR_MKDIR;
 		CDEBUG(D_INODE, DFID" requests qos mkdir %#x\n",
 		       PFID(&lli->lli_fid), op_data->op_flags);
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 55816a1..3e050b7 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1471,10 +1471,11 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 mdt,
 
 	/* if current MDT has above-average space, within range of the QOS
 	 * threshold, stay on the same MDT to avoid creating needless remote
-	 * MDT directories. It's more likely for low level directories.
+	 * MDT directories. It's more likely for low level directories
+	 * "16 / (dir_depth + 10)" is the factor to make it more unlikely for
+	 * top level directories, while more likely for low levels.
 	 */
-	rand = total_avail * (256 - lmv->lmv_qos.lq_threshold_rr) /
-	       (total_usable * 256 * (1 + dir_depth / 4));
+	rand = total_avail * 16 / (total_usable * (dir_depth + 10));
 	if (cur && cur->ltd_qos.ltq_avail >= rand) {
 		tgt = cur;
 		goto unlock;
-- 
1.8.3.1

* [lustre-devel] [PATCH 12/24] lustre: llite: access striped directory with missing stripe
  2022-01-14  1:37 [lustre-devel] [PATCH 00/24] lustre: update to OpenSFS Jan 13, 2022 James Simmons
                   ` (10 preceding siblings ...)
  2022-01-14  1:37 ` [lustre-devel] [PATCH 11/24] lustre: lmv: improve MDT QOS space balance James Simmons
@ 2022-01-14  1:37 ` James Simmons
  2022-01-14  1:37 ` [lustre-devel] [PATCH 13/24] lnet: libcfs: Remove D_TTY James Simmons
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

This patch allows accessing a striped directory with missing stripes:
* lmv_revalidate_slaves() skips the error if a stripe returns
  -ESHUTDOWN.
* add ll_dir_flush(), which returns the error found while reading
  stripe directory pages, so 'ls' can list the dirents on the other
  stripes and still return an error at the end.
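Below is a hypothetical sketch of the error-reporting pattern; struct dir_file, note_partial_readdir() and dir_flush() are invented names. Only the first stripe error is kept. ->flush() reports it once and then clears it, so a later close sees 0.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-open-file state, modeled on struct ll_file_data. */
struct dir_file {
	int partial_readdir_rc;	/* first stripe error (e.g. -ESHUTDOWN) or 0 */
};

/* Called after each readdir pass; keeps only the first error seen. */
static void note_partial_readdir(struct dir_file *f, int stripe_rc)
{
	if (!f->partial_readdir_rc)
		f->partial_readdir_rc = stripe_rc;
}

/* Mirrors ->flush(): report the saved error once, then clear it. */
static int dir_flush(struct dir_file *f)
{
	int rc = f->partial_readdir_rc;

	f->partial_readdir_rc = 0;
	return rc;
}
```

->flush() runs on every close() of the descriptor. As a result, the saved error becomes the return value of close(2) after 'ls' has listed the stripes that were readable.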

WC-bug-id: https://jira.whamcloud.com/browse/LU-9206
Lustre-commit: c0fa6f7a10d1162f8 ("LU-9206 llite: access striped directory with missing stripe")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45631
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd.h          |  9 ++++++---
 fs/lustre/include/obd_class.h    |  7 +++----
 fs/lustre/llite/dir.c            | 43 ++++++++++++++++++++++++++++++----------
 fs/lustre/llite/llite_internal.h |  8 ++++++--
 fs/lustre/llite/llite_nfs.c      |  2 +-
 fs/lustre/llite/statahead.c      |  6 +++---
 fs/lustre/lmv/lmv_intent.c       |  4 ++--
 fs/lustre/lmv/lmv_obd.c          | 22 ++++++++++----------
 fs/lustre/mdc/mdc_request.c      |  7 +++----
 9 files changed, 69 insertions(+), 39 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index f6b9d16..ecee321 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -826,10 +826,12 @@ struct md_op_data {
 	u32			op_archive_id;
 };
 
-struct md_callback {
-	int (*md_blocking_ast)(struct ldlm_lock *lock,
+struct md_readdir_info {
+	int (*mr_blocking_ast)(struct ldlm_lock *lock,
 			       struct ldlm_lock_desc *desc,
 			       void *data, int flag);
+	/* if striped directory is partially read, the result is stored here */
+	int mr_partial_readdir_rc;
 };
 
 struct md_enqueue_info;
@@ -1028,8 +1030,9 @@ struct md_ops {
 	int (*fsync)(struct obd_export *, const struct lu_fid *,
 		     struct ptlrpc_request **);
 	int (*read_page)(struct obd_export *, struct md_op_data *,
-			 struct md_callback *cb_op, u64 hash_offset,
+			 struct md_readdir_info *mrinfo, u64 hash_offset,
 			 struct page **ppage);
+
 	int (*unlink)(struct obd_export *, struct md_op_data *,
 		      struct ptlrpc_request **);
 
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index f2a3d2b..b69331d 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -1399,9 +1399,8 @@ static inline int md_file_resync(struct obd_export *exp,
 
 static inline int md_read_page(struct obd_export *exp,
 			       struct md_op_data *op_data,
-			       struct md_callback *cb_op,
-			       u64 hash_offset,
-			       struct page **ppage)
+			       struct md_readdir_info *mrinfo,
+			       u64 hash_offset, struct page **ppage)
 {
 	int rc;
 
@@ -1412,7 +1411,7 @@ static inline int md_read_page(struct obd_export *exp,
 	lprocfs_counter_incr(exp->exp_obd->obd_md_stats,
 			     LPROC_MD_READ_PAGE);
 
-	return MDP(exp->exp_obd, read_page)(exp, op_data, cb_op, hash_offset,
+	return MDP(exp->exp_obd, read_page)(exp, op_data, mrinfo, hash_offset,
 					    ppage);
 }
 
diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 43cd3cc..b4870d9 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -140,17 +140,21 @@
  *
  */
 struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data,
-			     u64 offset)
+			     u64 offset, int *partial_readdir_rc)
 {
-	struct md_callback cb_op;
+	struct md_readdir_info mrinfo = {
+		.mr_blocking_ast = ll_md_blocking_ast
+	};
 	struct page *page;
 	int rc;
 
-	cb_op.md_blocking_ast = ll_md_blocking_ast;
-	rc = md_read_page(ll_i2mdexp(dir), op_data, &cb_op, offset, &page);
+	rc = md_read_page(ll_i2mdexp(dir), op_data, &mrinfo, offset, &page);
 	if (rc)
 		return ERR_PTR(rc);
 
+	if (partial_readdir_rc && mrinfo.mr_partial_readdir_rc)
+		*partial_readdir_rc = mrinfo.mr_partial_readdir_rc;
+
 	return page;
 }
 
@@ -177,7 +181,7 @@ void ll_release_page(struct inode *inode, struct page *page, bool remove)
 }
 
 int ll_dir_read(struct inode *inode, u64 *ppos, struct md_op_data *op_data,
-		struct dir_context *ctx)
+		struct dir_context *ctx, int *partial_readdir_rc)
 {
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	u64 pos = *ppos;
@@ -194,7 +198,7 @@ int ll_dir_read(struct inode *inode, u64 *ppos, struct md_op_data *op_data,
 			return rc;
 	}
 
-	page = ll_get_dir_page(inode, op_data, pos);
+	page = ll_get_dir_page(inode, op_data, pos, partial_readdir_rc);
 
 	while (rc == 0 && !done) {
 		struct lu_dirpage *dp;
@@ -285,7 +289,8 @@ int ll_dir_read(struct inode *inode, u64 *ppos, struct md_op_data *op_data,
 					le32_to_cpu(dp->ldp_flags) &
 					LDF_COLLIDE);
 			next = pos;
-			page = ll_get_dir_page(inode, op_data, pos);
+			page = ll_get_dir_page(inode, op_data, pos,
+					       partial_readdir_rc);
 		}
 	}
 
@@ -305,8 +310,13 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx)
 	struct md_op_data *op_data;
 	struct lu_fid pfid = { 0 };
 	ktime_t kstart = ktime_get();
+	/* result of possible partial readdir */
+	int partial_readdir_rc = 0;
 	int rc;
 
+	LASSERT(lfd);
+	pos = lfd->lfd_pos;
+
 	CDEBUG(D_VFSTRACE,
 	       "VFS Op:inode=" DFID "(%p) pos/size %lu/%llu 32bit_api %d\n",
 	       PFID(ll_inode2fid(inode)), inode, (unsigned long)pos,
@@ -369,10 +379,11 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx)
 	op_data->op_fid3 = pfid;
 
 	ctx->pos = pos;
-	rc = ll_dir_read(inode, &pos, op_data, ctx);
+	rc = ll_dir_read(inode, &pos, op_data, ctx, &partial_readdir_rc);
 	pos = ctx->pos;
-	if (lfd)
-		lfd->lfd_pos = pos;
+	lfd->lfd_pos = pos;
+	if (!lfd->fd_partial_readdir_rc)
+		lfd->fd_partial_readdir_rc = partial_readdir_rc;
 
 	if (pos == MDS_DIR_END_OFF) {
 		if (api32)
@@ -2294,6 +2305,17 @@ static int ll_dir_release(struct inode *inode, struct file *file)
 	return ll_file_release(inode, file);
 }
 
+/* notify error if partially read striped directory */
+static int ll_dir_flush(struct file *file, fl_owner_t id)
+{
+	struct ll_file_data *lfd = file->private_data;
+	int rc = lfd->fd_partial_readdir_rc;
+
+	lfd->fd_partial_readdir_rc = 0;
+
+	return rc;
+}
+
 const struct file_operations ll_dir_operations = {
 	.llseek			= ll_dir_seek,
 	.open			= ll_dir_open,
@@ -2302,4 +2324,5 @@ static int ll_dir_release(struct inode *inode, struct file *file)
 	.iterate_shared		= ll_readdir,
 	.unlocked_ioctl		= ll_dir_ioctl,
 	.fsync			= ll_fsync,
+	.flush			= ll_dir_flush,
 };
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 0398b5f..54f0218 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -920,6 +920,10 @@ struct ll_file_data {
 	 */
 	u32 fd_layout_version;
 	struct pcc_file fd_pcc_file;
+	/* striped directory may read partially if some stripe inaccessible,
+	 * -errno is saved here, and will return to user in close().
+	 */
+	int fd_partial_readdir_rc;
 };
 
 void llite_tunables_unregister(void);
@@ -1043,11 +1047,11 @@ enum {
 extern const struct file_operations ll_dir_operations;
 extern const struct inode_operations ll_dir_inode_operations;
 int ll_dir_read(struct inode *inode, u64 *ppos, struct md_op_data *op_data,
-		struct dir_context *ctx);
+		struct dir_context *ctx, int *partial_readdir_rc);
 int ll_get_mdt_idx(struct inode *inode);
 int ll_get_mdt_idx_by_fid(struct ll_sb_info *sbi, const struct lu_fid *fid);
 struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data,
-			     u64 offset);
+			     u64 offset, int *partial_readdir_rc);
 void ll_release_page(struct inode *inode, struct page *page, bool remove);
 int quotactl_ioctl(struct super_block *sb, struct if_quotactl *qctl);
 
diff --git a/fs/lustre/llite/llite_nfs.c b/fs/lustre/llite/llite_nfs.c
index 07fcad6..3c4c9ef 100644
--- a/fs/lustre/llite/llite_nfs.c
+++ b/fs/lustre/llite/llite_nfs.c
@@ -233,7 +233,7 @@ static int ll_get_name(struct dentry *dentry, char *name,
 	}
 
 	inode_lock(dir);
-	rc = ll_dir_read(dir, &pos, op_data, &lgd.ctx);
+	rc = ll_dir_read(dir, &pos, op_data, &lgd.ctx, NULL);
 	inode_unlock(dir);
 	ll_finish_md_op_data(op_data);
 	if (!rc && !lgd.lgd_found)
diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c
index afb668e..c781e49 100644
--- a/fs/lustre/llite/statahead.c
+++ b/fs/lustre/llite/statahead.c
@@ -1041,7 +1041,7 @@ static int ll_statahead_thread(void *arg)
 		}
 
 		sai->sai_in_readpage = 1;
-		page = ll_get_dir_page(dir, op_data, pos);
+		page = ll_get_dir_page(dir, op_data, pos, NULL);
 		ll_unlock_md_op_lsm(op_data);
 		sai->sai_in_readpage = 0;
 		if (IS_ERR(page)) {
@@ -1325,7 +1325,7 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry)
 	/**
 	 * FIXME choose the start offset of the readdir
 	 */
-	page = ll_get_dir_page(dir, op_data, pos);
+	page = ll_get_dir_page(dir, op_data, pos, NULL);
 
 	while (1) {
 		struct lu_dirpage *dp;
@@ -1429,7 +1429,7 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry)
 			ll_release_page(dir, page,
 					le32_to_cpu(dp->ldp_flags) &
 					LDF_COLLIDE);
-			page = ll_get_dir_page(dir, op_data, pos);
+			page = ll_get_dir_page(dir, op_data, pos, NULL);
 		}
 	}
 out:
diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c
index 906ca16..2322b6a 100644
--- a/fs/lustre/lmv/lmv_intent.c
+++ b/fs/lustre/lmv/lmv_intent.c
@@ -222,8 +222,8 @@ int lmv_revalidate_slaves(struct obd_export *exp,
 
 		rc = md_intent_lock(tgt->ltd_exp, op_data, &it, &req,
 				    cb_blocking, extra_lock_flags);
-		if (rc == -ENOENT) {
-			/* skip stripe is not exists */
+		if (rc == -ENOENT || rc == -ESHUTDOWN) {
+			/* skip stripe that doesn't exist or is inaccessible */
 			rc = 0;
 			continue;
 		}
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 3e050b7..5fd00d3 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -2574,7 +2574,7 @@ struct stripe_dirent {
 struct lmv_dir_ctxt {
 	struct lmv_obd		*ldc_lmv;
 	struct md_op_data	*ldc_op_data;
-	struct md_callback	*ldc_cb_op;
+	struct md_readdir_info  *ldc_mrinfo;
 	u64			 ldc_hash;
 	int			 ldc_count;
 	struct stripe_dirent	 ldc_stripes[0];
@@ -2675,7 +2675,7 @@ static struct lu_dirent *stripe_dirent_load(struct lmv_dir_ctxt *ctxt,
 		op_data->op_fid2 = oinfo->lmo_fid;
 		op_data->op_data = oinfo->lmo_root;
 
-		rc = md_read_page(tgt->ltd_exp, op_data, ctxt->ldc_cb_op, hash,
+		rc = md_read_page(tgt->ltd_exp, op_data, ctxt->ldc_mrinfo, hash,
 				  &stripe->sd_page);
 
 		op_data->op_fid1 = fid;
@@ -2696,6 +2696,7 @@ static struct lu_dirent *stripe_dirent_load(struct lmv_dir_ctxt *ctxt,
 		LASSERT(!ent);
 		/* treat error as eof, so dir can be partially accessed */
 		stripe->sd_eof = true;
+		ctxt->ldc_mrinfo->mr_partial_readdir_rc = rc;
 		LCONSOLE_WARN("dir " DFID " stripe %d readdir failed: %d, directory is partially accessed!\n",
 			      PFID(&ctxt->ldc_op_data->op_fid1), stripe_index,
 			      rc);
@@ -2793,7 +2794,8 @@ static struct lu_dirent *lmv_dirent_next(struct lmv_dir_ctxt *ctxt)
  *
  * @exp:	obd export refer to LMV
  * @op_data:	hold those MD parameters of read_entry
- * @cb_op:	ldlm callback being used in enqueue in mdc_read_entry
+ * @mrinfo:	ldlm callback being used in enqueue in mdc_read_entry,
+ *		and partial readdir results will be stored in it.
  * @offset:	the entry being read
  * @ppage:	the page holding the entry. Note: because the entry
  *		will be accessed in upper layer, so we need hold the
@@ -2805,8 +2807,8 @@ static struct lu_dirent *lmv_dirent_next(struct lmv_dir_ctxt *ctxt)
  */
 static int lmv_striped_read_page(struct obd_export *exp,
 				 struct md_op_data *op_data,
-				 struct md_callback *cb_op,
-				 u64 offset, struct page **ppage)
+				 struct md_readdir_info *mrinfo, u64 offset,
+				 struct page **ppage)
 {
 	struct page *page = NULL;
 	struct lu_dirpage *dp;
@@ -2848,7 +2850,7 @@ static int lmv_striped_read_page(struct obd_export *exp,
 	}
 	ctxt->ldc_lmv = &exp->exp_obd->u.lmv;
 	ctxt->ldc_op_data = op_data;
-	ctxt->ldc_cb_op = cb_op;
+	ctxt->ldc_mrinfo = mrinfo;
 	ctxt->ldc_hash = offset;
 	ctxt->ldc_count = stripe_count;
 
@@ -2925,7 +2927,7 @@ static int lmv_striped_read_page(struct obd_export *exp,
 }
 
 static int lmv_read_page(struct obd_export *exp, struct md_op_data *op_data,
-			 struct md_callback *cb_op, u64 offset,
+			 struct md_readdir_info *mrinfo, u64 offset,
 			 struct page **ppage)
 {
 	struct obd_device *obd = exp->exp_obd;
@@ -2936,15 +2938,15 @@ static int lmv_read_page(struct obd_export *exp, struct md_op_data *op_data,
 		return -ENODATA;
 
 	if (unlikely(lmv_dir_striped(op_data->op_mea1))) {
-		return lmv_striped_read_page(exp, op_data, cb_op,
-					     offset, ppage);
+		return lmv_striped_read_page(exp, op_data, mrinfo, offset,
+					     ppage);
 	}
 
 	tgt = lmv_fid2tgt(lmv, &op_data->op_fid1);
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
 
-	return md_read_page(tgt->ltd_exp, op_data, cb_op, offset, ppage);
+	return md_read_page(tgt->ltd_exp, op_data, mrinfo, offset, ppage);
 }
 
 /**
diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index 9788bd3..3284c01 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -1294,7 +1294,6 @@ struct readpage_param {
 	u64			rp_off;
 	int			rp_hash64;
 	struct obd_export	*rp_exp;
-	struct md_callback	*rp_cb;
 };
 
 /**
@@ -1410,7 +1409,7 @@ static int mdc_read_page_remote(void *data, struct page *page0)
  * @exp:		MDC export
  * @op_data:		client MD stack parameters, transferring parameters
  *			between different layers on client MD stack.
- * @cb_op:		callback required for ldlm lock enqueue during
+ * @mrinfo:		callback required for ldlm lock enqueue during
  *			read page
  * @hash_offset:	the hash offset of the page to be read
  * @ppage		the page to be read
@@ -1419,7 +1418,7 @@ static int mdc_read_page_remote(void *data, struct page *page0)
  *			errno(<0) get the page failed
  */
 static int mdc_read_page(struct obd_export *exp, struct md_op_data *op_data,
-			 struct md_callback *cb_op, u64 hash_offset,
+			 struct md_readdir_info *mrinfo, u64 hash_offset,
 			 struct page **ppage)
 {
 	struct lookup_intent it = { .it_op = IT_READDIR };
@@ -1440,7 +1439,7 @@ static int mdc_read_page(struct obd_export *exp, struct md_op_data *op_data,
 	mapping = dir->i_mapping;
 
 	rc = mdc_intent_lock(exp, op_data, &it, &enq_req,
-			     cb_op->md_blocking_ast, 0);
+			     mrinfo->mr_blocking_ast, 0);
 	if (enq_req)
 		ptlrpc_req_finished(enq_req);
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [lustre-devel] [PATCH 13/24] lnet: libcfs: Remove D_TTY
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

The D_TTY flag is almost entirely unused and certainly not
needed.  Remove it so we have a spare flag to use for
iotrace.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15137
Lustre-commit: f9fe2977d184fbc8e ("LU-15317 libcfs: Remove D_TTY")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45751
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c                 |  2 +-
 include/uapi/linux/lnet/libcfs_debug.h |  1 -
 net/lnet/libcfs/tracefile.c            | 51 +---------------------------------
 3 files changed, 2 insertions(+), 52 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 30e99c0..05f2f1a 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -4883,7 +4883,7 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum,
 	 */
 	if (!(fd->fd_flags & LL_FILE_FLOCK_WARNING)) {
 		fd->fd_flags |= LL_FILE_FLOCK_WARNING;
-		CDEBUG_LIMIT(D_TTY | D_CONSOLE,
+		CDEBUG_LIMIT(D_CONSOLE,
 			     "flock disabled, mount with '-o [local]flock' to enable\r\n");
 	}
 	return -EINVAL;
diff --git a/include/uapi/linux/lnet/libcfs_debug.h b/include/uapi/linux/lnet/libcfs_debug.h
index 6b64f0e..4cb6594 100644
--- a/include/uapi/linux/lnet/libcfs_debug.h
+++ b/include/uapi/linux/lnet/libcfs_debug.h
@@ -106,7 +106,6 @@ struct ptldebug_header {
 #define D_TRACE		0x00000001 /* ENTRY/EXIT markers */
 #define D_INODE		0x00000002
 #define D_SUPER		0x00000004
-#define D_TTY		0x00000008 /* notification printed to TTY */
 #define D_MALLOC	0x00000010 /* print malloc, free information */
 #define D_CACHE		0x00000020 /* cache-related items */
 #define D_INFO		0x00000040 /* general information */
diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c
index b27732a..948eaaa 100644
--- a/net/lnet/libcfs/tracefile.c
+++ b/net/lnet/libcfs/tracefile.c
@@ -44,7 +44,6 @@
 #include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/poll.h>
-#include <linux/tty.h>
 #include <linux/uaccess.h>
 #include "tracefile.h"
 
@@ -352,41 +351,6 @@ static void cfs_set_ptldebug_header(struct ptldebug_header *header,
 	header->ph_extern_pid = 0;
 }
 
-/**
- * tty_write_msg - write a message to a certain tty, not just the console.
- * @tty: the destination tty_struct
- * @msg: the message to write
- *
- * tty_write_message is not exported, so write a same function for it
- *
- */
-static void tty_write_msg(struct tty_struct *tty, const char *msg)
-{
-	mutex_lock(&tty->atomic_write_lock);
-	tty_lock(tty);
-	if (tty->ops->write && tty->count > 0)
-		tty->ops->write(tty, msg, strlen(msg));
-	tty_unlock(tty);
-	mutex_unlock(&tty->atomic_write_lock);
-	wake_up_interruptible_poll(&tty->write_wait, POLLOUT);
-}
-
-static void cfs_tty_write_message(const char *prefix, int mask, const char *msg)
-{
-	struct tty_struct *tty;
-
-	tty = get_current_tty();
-	if (!tty)
-		return;
-
-	tty_write_msg(tty, prefix);
-	if ((mask & D_EMERG) || (mask & D_ERROR))
-		tty_write_msg(tty, "Error");
-	tty_write_msg(tty, ": ");
-	tty_write_msg(tty, msg);
-	tty_kref_put(tty);
-}
-
 static void cfs_vprint_to_console(struct ptldebug_header *hdr, int mask,
 				  struct va_format *vaf, const char *file,
 				  const char *fn)
@@ -421,10 +385,6 @@ static void cfs_vprint_to_console(struct ptldebug_header *hdr, int mask,
 		else if (mask & (D_CONSOLE | libcfs_printk))
 			pr_info("%s: %pV", prefix, vaf);
 	}
-
-	if (mask & D_TTY)
-		/* tty_write_msg doesn't handle formatting */
-		cfs_tty_write_message(prefix, mask, vaf->fmt);
 }
 
 static void cfs_print_to_console(struct ptldebug_header *hdr, int mask,
@@ -534,14 +494,6 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata,
 	if (*(string_buf + needed - 1) != '\n') {
 		pr_info("Lustre: format at %s:%d:%s doesn't end in newline\n",
 			file, msgdata->msg_line, msgdata->msg_fn);
-	} else if (mask & D_TTY) {
-		/* TTY needs '\r\n' to move carriage to leftmost position */
-		if (needed < 2 || *(string_buf + needed - 2) != '\r')
-			pr_info("Lustre: format at %s:%d:%s doesn't end in '\\r\\n'\n",
-				file, msgdata->msg_line, msgdata->msg_fn);
-		if (strnchr(string_buf, needed, '%'))
-			pr_info("Lustre: format at %s:%d:%s mustn't contain %%\n",
-				file, msgdata->msg_line, msgdata->msg_fn);
 	}
 
 	header.ph_len = known_size + needed;
@@ -627,8 +579,7 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata,
 	}
 
 	if (cdls && cdls->cdls_count) {
-		/* Do not allow print this to TTY */
-		cfs_print_to_console(&header, mask & ~D_TTY, file,
+		cfs_print_to_console(&header, mask, file,
 				     msgdata->msg_fn,
 				     "Skipped %d previous similar message%s\n",
 				     cdls->cdls_count,
-- 
1.8.3.1

* [lustre-devel] [PATCH 14/24] lustre: llite: Add D_IOTRACE
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

When looking into performance problems, it's very important
to be able to trace the I/O patterns from userspace into
Lustre, and also to understand the basics of how Lustre
handles that I/O (readahead, RPC generation).

This is best done with a dedicated debug flag: no
userspace tool can provide all this information, and
existing debug flags collect a huge number of unrelated
pieces of, well, debug information.

The goal is for customers to be able to quickly gather log
files of a reasonable size which contain the necessary
information and which can easily be interpreted by
engineering.  This is not possible if the information is
spread out across a number of heavyweight debug flags.

This is a first pass at adding the flag and the debug
messages required to track basic data I/O.  One significant
omission in this first patch is RPC generation; I have not
yet decided how best to trace that.  It will be added in a
future patch.
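
A minimal sketch of why a dedicated bit helps, assuming the usual
libcfs rule that a message is emitted when any bit of its mask is
enabled in the debug mask: a call tagged D_READA|D_IOTRACE is then
captured by enabling "iotrace" alone. The bit values and the leading
names of LIBCFS_DEBUG_MASKS_NAMES are copied from the libcfs_debug.h
hunk below; debug_bit_name() and debug_enabled() are hypothetical
helpers, not libcfs functions.

```c
#include <assert.h>
#include <stddef.h>

#define D_TRACE   0x00000001 /* ENTRY/EXIT markers */
#define D_INODE   0x00000002
#define D_SUPER   0x00000004
#define D_IOTRACE 0x00000008 /* reuses the bit freed by removing D_TTY */
#define D_MALLOC  0x00000010

/* Leading entries of LIBCFS_DEBUG_MASKS_NAMES: entry i names bit (1 << i). */
static const char *const debug_mask_names[] = {
	"trace", "inode", "super", "iotrace", "malloc",
};

#define NR_NAMES (sizeof(debug_mask_names) / sizeof(debug_mask_names[0]))

/* Hypothetical helper: map a single-bit mask to its name, or NULL. */
static const char *debug_bit_name(unsigned int bit)
{
	size_t i;

	assert(bit != 0 && (bit & (bit - 1)) == 0); /* exactly one bit set */
	for (i = 0; i < NR_NAMES; i++)
		if (bit == (1U << i))
			return debug_mask_names[i];
	return NULL;
}

/* Hypothetical helper: would a message tagged @msg_mask be logged
 * when @enabled is the current debug mask? (any-bit-matches rule)
 */
static int debug_enabled(unsigned int enabled, unsigned int msg_mask)
{
	return (enabled & msg_mask) != 0;
}
```

For example, debug_enabled(D_IOTRACE, D_MALLOC | D_IOTRACE) is true,
while debug_enabled(D_IOTRACE, D_MALLOC) is false.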

WC-bug-id: https://jira.whamcloud.com/browse/LU-15137
Lustre-commit: 40d286e11138fc67f ("LU-15317 llite: Add D_IOTRACE")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45752
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c                 | 10 ++++++++++
 fs/lustre/llite/llite_internal.h       |  1 +
 fs/lustre/llite/llite_mmap.c           | 13 ++++++++++---
 fs/lustre/llite/rw.c                   | 10 ++++++++--
 include/uapi/linux/lnet/libcfs_debug.h |  3 ++-
 5 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 05f2f1a..dec0109 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -1954,6 +1954,11 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	ktime_t kstart = ktime_get();
 	bool cached;
 
+	CDEBUG(D_VFSTRACE|D_IOTRACE, "file %s:"DFID", ppos: %lld, count: %zu\n",
+	       file_dentry(file)->d_name.name,
+	       PFID(ll_inode2fid(file_inode(file))), iocb->ki_pos,
+	       iov_iter_count(to));
+
 	if (!iov_iter_count(to))
 		return 0;
 
@@ -2075,6 +2080,11 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	ktime_t kstart = ktime_get();
 	int result;
 
+	CDEBUG(D_VFSTRACE|D_IOTRACE, "file %s:"DFID", ppos: %lld, count: %zu\n",
+	       file_dentry(file)->d_name.name,
+	       PFID(ll_inode2fid(file_inode(file))), iocb->ki_pos,
+	       iov_iter_count(from));
+
 	if (!iov_iter_count(from)) {
 		rc_normal = 0;
 		goto out;
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 54f0218..8c7361a 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -885,6 +885,7 @@ struct ll_readahead_work {
 	struct file			*lrw_file;
 	pgoff_t				 lrw_start_idx;
 	pgoff_t				 lrw_end_idx;
+	pid_t				 lrw_user_pid;
 
 	/* async worker to handler read */
 	struct work_struct		 lrw_readahead_work;
diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c
index 0009c5f..d87a68d 100644
--- a/fs/lustre/llite/llite_mmap.c
+++ b/fs/lustre/llite/llite_mmap.c
@@ -377,9 +377,10 @@ static vm_fault_t ll_fault(struct vm_fault *vmf)
 	if (cached)
 		goto out;
 
-	CDEBUG(D_MMAP, DFID": vma=%p start=%#lx end=%#lx vm_flags=%#lx\n",
+	CDEBUG(D_MMAP|D_IOTRACE,
+	       DFID": vma=%p start=%#lx end=%#lx vm_flags=%#lx idx=%lu\n",
 	       PFID(&ll_i2info(file_inode(vma->vm_file))->lli_fid),
-	       vma, vma->vm_start, vma->vm_end, vma->vm_flags);
+	       vma, vma->vm_start, vma->vm_end, vma->vm_flags, vmf->pgoff);
 
 	/* Only SIGKILL and SIGTERM are allowed for fault/nopage/mkwrite
 	 * so that it can be killed by admin but not cause segfault by
@@ -440,8 +441,14 @@ static vm_fault_t ll_page_mkwrite(struct vm_fault *vmf)
 	bool retry;
 	bool cached;
 	int err;
-	vm_fault_t ret;
 	ktime_t kstart = ktime_get();
+	vm_fault_t ret;
+
+	CDEBUG(D_MMAP|D_IOTRACE,
+	       DFID": vma=%p start=%#lx end=%#lx vm_flags=%#lx idx=%lu\n",
+	       PFID(&ll_i2info(file_inode(vma->vm_file))->lli_fid),
+	       vma, vma->vm_start, vma->vm_end, vma->vm_flags,
+	       vmf->page->index);
 
 	ret = pcc_page_mkwrite(vma, vmf, &cached);
 	if (cached)
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index c9f29ef..9f6e140 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -595,6 +595,11 @@ static void ll_readahead_handle_work(struct work_struct *wq)
 	inode = file_inode(file);
 	sbi = ll_i2sbi(inode);
 
+	CDEBUG(D_READA|D_IOTRACE,
+	       "%s: async ra from %lu to %lu triggered by user pid %d\n",
+	       file_dentry(file)->d_name.name, work->lrw_start_idx,
+	       work->lrw_end_idx, work->lrw_user_pid);
+
 	env = cl_env_alloc(&refcheck, LCT_NOREF);
 	if (IS_ERR(env)) {
 		rc = PTR_ERR(env);
@@ -1301,7 +1306,7 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode,
 	spin_lock(&ras->ras_lock);
 
 	if (!hit)
-		CDEBUG(D_READA, DFID " pages at %lu miss.\n",
+		CDEBUG(D_READA|D_IOTRACE, DFID " pages at %lu miss.\n",
 		       PFID(ll_inode2fid(inode)), index);
 	ll_ra_stats_inc_sbi(sbi, hit ? RA_STAT_HIT : RA_STAT_MISS);
 
@@ -1670,7 +1675,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
 			skip_index = vvp_index(vpg);
 		rc2 = ll_readahead(env, io, &queue->c2_qin, ras,
 				   uptodate, file, skip_index);
-		CDEBUG(D_READA, DFID " %d pages read ahead at %lu\n",
+		CDEBUG(D_READA|D_IOTRACE, DFID " %d pages read ahead at %lu\n",
 		       PFID(ll_inode2fid(inode)), rc2, vvp_index(vpg));
 	} else if (vvp_index(vpg) == io_start_index &&
 		   io_end_index - io_start_index > 0) {
@@ -1770,6 +1775,7 @@ static int kickoff_async_readahead(struct file *file, unsigned long pages)
 		lrw->lrw_file = get_file(file);
 		lrw->lrw_start_idx = start_idx;
 		lrw->lrw_end_idx = end_idx;
+		lrw->lrw_user_pid = current->pid;
 		spin_lock(&ras->ras_lock);
 		ras->ras_next_readahead_idx = end_idx + 1;
 		ras->ras_async_last_readpage_idx = start_idx;
diff --git a/include/uapi/linux/lnet/libcfs_debug.h b/include/uapi/linux/lnet/libcfs_debug.h
index 4cb6594..bbd9f25 100644
--- a/include/uapi/linux/lnet/libcfs_debug.h
+++ b/include/uapi/linux/lnet/libcfs_debug.h
@@ -106,6 +106,7 @@ struct ptldebug_header {
 #define D_TRACE		0x00000001 /* ENTRY/EXIT markers */
 #define D_INODE		0x00000002
 #define D_SUPER		0x00000004
+#define D_IOTRACE	0x00000008 /* simple, low overhead io tracing */
 #define D_MALLOC	0x00000010 /* print malloc, free information */
 #define D_CACHE		0x00000020 /* cache-related items */
 #define D_INFO		0x00000040 /* general information */
@@ -136,7 +137,7 @@ struct ptldebug_header {
 #define D_LAYOUT	0x80000000
 
 #define LIBCFS_DEBUG_MASKS_NAMES {					\
-	"trace", "inode", "super", "tty", "malloc", "cache", "info",	\
+	"trace", "inode", "super", "iotrace", "malloc", "cache", "info",\
 	"ioctl", "neterror", "net", "warning", "buffs", "other",	\
 	"dentry", "nettrace", "page", "dlmtrace", "error", "emerg",	\
 	"ha", "rpctrace", "vfstrace", "reada", "mmap", "config",	\
-- 
1.8.3.1

* [lustre-devel] [PATCH 15/24] lustre: llite: Add start_idx debug
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

When readahead is triggered, current readahead debug
prints the page the user requested which triggered
readahead and the number of pages read by readahead.

However, readahead does not necessarily start reading from
the user requested page, so it's important to also print
the page where readahead starts.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15069
Lustre-commit: e13ed446337273a04 ("LU-15069 llite: Add start_idx debug")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45674
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/rw.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 9f6e140..b8cffde 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -713,12 +713,13 @@ static void ll_readahead_handle_work(struct work_struct *wq)
 static int ll_readahead(const struct lu_env *env, struct cl_io *io,
 			struct cl_page_list *queue,
 			struct ll_readahead_state *ras, bool hit,
-			struct file *file, pgoff_t skip_index)
+			struct file *file, pgoff_t skip_index,
+			pgoff_t *start_idx)
 {
 	struct vvp_io *vio = vvp_env_io(env);
 	struct ll_thread_info *lti = ll_env_info(env);
 	unsigned long pages, pages_min = 0;
-	pgoff_t ra_end_idx = 0, start_idx = 0, end_idx = 0;
+	pgoff_t ra_end_idx = 0, end_idx = 0;
 	struct inode *inode;
 	struct ra_io_arg *ria = &lti->lti_ria;
 	struct cl_object *clob;
@@ -761,16 +762,16 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io,
 	 * so that stride read ahead can work correctly.
 	 */
 	if (stride_io_mode(ras))
-		start_idx = max_t(pgoff_t, ras->ras_next_readahead_idx,
+		*start_idx = max_t(pgoff_t, ras->ras_next_readahead_idx,
 				  ras->ras_stride_offset >> PAGE_SHIFT);
 	else
-		start_idx = ras->ras_next_readahead_idx;
+		*start_idx = ras->ras_next_readahead_idx;
 
 	if (ras->ras_window_pages > 0)
 		end_idx = ras->ras_window_start_idx + ras->ras_window_pages - 1;
 
 	if (skip_index)
-		end_idx = start_idx + ras->ras_window_pages - 1;
+		end_idx = *start_idx + ras->ras_window_pages - 1;
 
 	/* Enlarge the RA window to encompass the full read */
 	if (vio->vui_ra_valid &&
@@ -787,7 +788,7 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io,
 			ria->ria_eof = true;
 		}
 	}
-	ria->ria_start_idx = start_idx;
+	ria->ria_start_idx = *start_idx;
 	ria->ria_end_idx = end_idx;
 	/* If stride I/O mode is detected, get stride window*/
 	if (stride_io_mode(ras)) {
@@ -1627,6 +1628,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
 	struct cl_2queue *queue = &io->ci_queue;
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	struct cl_sync_io *anchor = NULL;
+	pgoff_t ra_start_index = 0;
 	pgoff_t io_start_index;
 	pgoff_t io_end_index;
 	int rc = 0, rc2 = 0;
@@ -1674,9 +1676,12 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
 		if (ras->ras_next_readahead_idx < vvp_index(vpg))
 			skip_index = vvp_index(vpg);
 		rc2 = ll_readahead(env, io, &queue->c2_qin, ras,
-				   uptodate, file, skip_index);
-		CDEBUG(D_READA|D_IOTRACE, DFID " %d pages read ahead at %lu\n",
-		       PFID(ll_inode2fid(inode)), rc2, vvp_index(vpg));
+				   uptodate, file, skip_index,
+				   &ra_start_index);
+		CDEBUG(D_READA|D_IOTRACE,
+		       DFID " %d pages read ahead at %lu, triggered by user read at %lu\n",
+		       PFID(ll_inode2fid(inode)), rc2, ra_start_index,
+		       vvp_index(vpg));
 	} else if (vvp_index(vpg) == io_start_index &&
 		   io_end_index - io_start_index > 0) {
 		rc2 = ll_readpages(env, io, &queue->c2_qin, io_start_index + 1,
-- 
1.8.3.1

* [lustre-devel] [PATCH 16/24] lnet: Skip router discovery on send path
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

When the router checker is enabled, routes are regularly marked as out
of date w.r.t. discovery. This can cause upper level messages to be
delayed while the router undergoes discovery. We can avoid delaying
messages by relying on the router checker to initiate discovery of
routers. If we happen to send a message to a router before it has
been discovered then the worst case scenario is that the route is
actually down or we end up utilizing a subset of a multi-rail router's
interfaces. Both situations can be remedied by utilizing the
check_routers_before_use parameter.

Change the logic in lnet_handle_find_routed_path() so that we only
initiate discovery if the alive_router_check_interval is <= 0 (i.e.
router checker pings are disabled).
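
A simplified model of the new decision, assuming that gateway
discovery only delays a message when the gateway is not yet up to
date. route_send_action() and its arguments are hypothetical stand-ins
for the logic in lnet_handle_find_routed_path(), the
alive_router_check_interval parameter and the peer's discovery state.

```c
#include <assert.h>
#include <stdbool.h>

enum send_action {
	SEND_NOW,		/* message goes out immediately */
	DELAY_FOR_DISCOVERY,	/* message waits for gateway discovery */
};

/* Hypothetical model: only when router checker pings are disabled
 * (interval <= 0) does the send path start discovery itself and
 * possibly hold the message; otherwise the checker thread keeps the
 * gateway up to date in the background.
 */
static enum send_action route_send_action(int check_interval,
					  bool gw_discovered)
{
	if (check_interval <= 0 && !gw_discovered)
		return DELAY_FOR_DISCOVERY;
	return SEND_NOW;
}
```

With the checker enabled (e.g. an interval of 60), an undiscovered
gateway no longer delays the message; the cost is possibly using only
a subset of a multi-rail router's interfaces until the checker runs.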

WC-bug-id: https://jira.whamcloud.com/browse/LU-15275
Lustre-commit: c8e74c395d5634dbb ("LU-15275 lnet: Skip router discovery on send path")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/45684
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 133397e..8d4fd4d 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -2104,13 +2104,23 @@ struct lnet_ni *
 		LASSERT(gw == gwni->lpni_peer_net->lpn_peer);
 	}
 
-	/* Discover this gateway if it hasn't already been discovered.
-	 * This means we might delay the message until discovery has
-	 * completed
+	/* If the router checker is not active then discover the gateway here.
+	 * This ensures we are able to take advantage of multi-rail routing, but
+	 * if the router checker is active then we do not unnecessarily delay
+	 * messages while the gateway is being checked by the dedicated monitor
+	 * thread.
+	 *
+	 * NB: We're only checking the alive_router_check_interval here, rather
+	 * than calling lnet_router_checker_active(), because the other
+	 * conditions that are checked by that function are either
+	 * irrelevant (the_lnet.ln_routing) or must be true (list of routers
+	 * is not empty)
 	 */
-	rc = lnet_initiate_peer_discovery(gwni, sd->sd_msg, sd->sd_cpt);
-	if (rc)
-		return rc;
+	if (alive_router_check_interval <= 0) {
+		rc = lnet_initiate_peer_discovery(gwni, sd->sd_msg, sd->sd_cpt);
+		if (rc)
+			return rc;
+	}
 
 	if (!sd->sd_best_ni) {
 		lpn = gwni->lpni_peer_net;
-- 
1.8.3.1

* [lustre-devel] [PATCH 17/24] lustre: mdc: GET(X)ATTR to READPAGE portal
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Send the MDS_GETATTR and MDS_GETXATTR RPCs to the
MDS_READPAGE_PORTAL instead of the default portal to avoid
deadlocks with other MDS_REINT RPCs that may block all of
the MDS service threads on that portal.

This deadlock occurs with MDS_GETXATTR when selinux is
enabled, because getxattr becomes part of lookup, so it
takes a reference on a lock used for lookup.  However, all
of the MDS service threads on the default portal can be
consumed by threads waiting for that lock, resulting in
a deadlock when the getxattr can't be processed.
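
A minimal sketch of the resulting portal choice, loosely mirroring
the two mdc_request.c hunks below: read-only getattr/getxattr requests
are sent to the readpage portal, so they cannot be starved by
modifying requests that occupy every thread on the default request
portal. Every SKETCH_ name is hypothetical, and no real portal numbers
are assumed.

```c
#include <assert.h>

enum sketch_portal {
	SKETCH_MDS_REQUEST_PORTAL,	/* default portal, shared with REINT */
	SKETCH_MDS_READPAGE_PORTAL,	/* separate pool of service threads */
};

enum sketch_opc {
	SKETCH_MDS_GETATTR,
	SKETCH_MDS_GETXATTR,
	SKETCH_MDS_READPAGE,
	SKETCH_MDS_REINT,		/* modifying requests, incl. setxattr */
};

/* Hypothetical mapping: read-only lookups that a blocked REINT may be
 * waiting on must not queue behind that REINT on the same portal.
 */
static enum sketch_portal pick_portal(enum sketch_opc opc)
{
	switch (opc) {
	case SKETCH_MDS_GETATTR:
	case SKETCH_MDS_GETXATTR:
	case SKETCH_MDS_READPAGE:
		return SKETCH_MDS_READPAGE_PORTAL;
	default:
		return SKETCH_MDS_REQUEST_PORTAL;
	}
}
```
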

WC-bug-id: https://jira.whamcloud.com/browse/LU-15245
Lustre-commit: 5552eba1451d47ce1 ("LU-15245 mdc: GET(X)ATTR to READPAGE portal")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45593
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/mdc_request.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index 3284c01..1064d9f 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -224,6 +224,9 @@ static int mdc_getattr(struct obd_export *exp, struct md_op_data *op_data,
 		return rc;
 	}
 
+	/* LU-15245: avoid deadlock with modifying RPCs on MDS_REQUEST_PORTAL */
+	req->rq_request_portal = MDS_READPAGE_PORTAL;
+
 again:
 	mdc_pack_body(&req->rq_pill, &op_data->op_fid1, op_data->op_valid,
 		      op_data->op_mode, -1, 0);
@@ -402,6 +405,10 @@ static int mdc_xattr_common(struct obd_export *exp,
 	} else {
 		mdc_pack_body(&req->rq_pill, fid, valid, output_size,
 			      suppgid, flags);
+		/* Avoid deadlock with modifying RPCs on MDS_REQUEST_PORTAL.
+		 * See LU-15245.
+		 */
+		req->rq_request_portal = MDS_READPAGE_PORTAL;
 	}
 
 	if (xattr_name) {
-- 
1.8.3.1

* [lustre-devel] [PATCH 18/24] lnet: libcfs: set x->ls_len to 0 when x->ls_str is NULL
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Etienne AUJAMES <eaujames@ddn.com>

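cfs_gettok() sets next->ls_str to NULL if no delimiter is found,
but it does not update next->ls_len to 0. This leaves an
inconsistent string descriptor that can lead to a NULL pointer
dereference (LU-15130 hit this in nrs_tbf_id_parse).

This patch fixes cfs_gettok() to also set "next->ls_len = 0;" when
no delimiter is found.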

WC-bug-id: https://jira.whamcloud.com/browse/LU-15130
Lustre-commit: cec864b7938f1138d ("LU-15130 nrs: null pointer dereference in nrs_tbf_id_parse")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-on: https://review.whamcloud.com/45291
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/libcfs/libcfs_string.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/lnet/libcfs/libcfs_string.c b/net/lnet/libcfs/libcfs_string.c
index 4259f8b8..0563c42 100644
--- a/net/lnet/libcfs/libcfs_string.c
+++ b/net/lnet/libcfs/libcfs_string.c
@@ -154,6 +154,7 @@ int cfs_str2mask(const char *str, const char *(*bit2str)(int bit),
 		/* there is no the delimeter in the string */
 		end = next->ls_str + next->ls_len;
 		next->ls_str = NULL;
+		next->ls_len = 0;
 	} else {
 		next->ls_str = end + 1;
 		next->ls_len -= (end - res->ls_str + 1);
-- 
1.8.3.1


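To see why the one-line fix matters, here is a minimal userspace sketch of the tokenizer logic in the hunk above. `struct lstr` and `gettok()` are hypothetical stand-ins for `struct cfs_lstr` and `cfs_gettok()`; the real function also skips leading and trailing whitespace, which is omitted here. After the last token is consumed, both fields of `next` are reset, so a caller that tests only `ls_len` cannot walk into a NULL `ls_str`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for libcfs's struct cfs_lstr: a length-bounded
 * view into a larger buffer that need not be NUL terminated.
 */
struct lstr {
	const char *ls_str;
	int ls_len;
};

/*
 * Split the next token off @next at @delim and store it in @res.
 * Returns 1 if a token was produced, 0 if @next is exhausted.
 * Modeled on the cfs_gettok() hunk above, minus whitespace trimming.
 */
static int gettok(struct lstr *next, char delim, struct lstr *res)
{
	const char *end;

	if (!next->ls_str)
		return 0;

	res->ls_str = next->ls_str;
	end = memchr(next->ls_str, delim, (size_t)next->ls_len);
	if (!end) {
		/* no delimiter: the remainder is the last token */
		end = next->ls_str + next->ls_len;
		next->ls_str = NULL;
		next->ls_len = 0;	/* the fix: keep both fields consistent */
	} else {
		/* skip past the delimiter for the next call */
		next->ls_str = end + 1;
		next->ls_len -= (int)(end - res->ls_str + 1);
	}
	res->ls_len = (int)(end - res->ls_str);
	return 1;
}
```

For example, splitting `"a,bc"` on `','` yields `a` and then `bc`, after which `next` is `{ NULL, 0 }` and a third call returns 0.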

* [lustre-devel] [PATCH 19/24] lustre: uapi: set default max-inherit to 3
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lei Feng, Lustre Development List

From: Lei Feng <flei@whamcloud.com>

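Change the default max-inherit for striped directories from 0
(unlimited) to 3, so that the default stripe policy of a directory
is not inherited without limit, which can unexpectedly reduce
performance. LMV_INHERIT_DEFAULT is split into
LMV_INHERIT_DEFAULT_PLAIN (still unlimited, for 0 or 1 stripes)
and LMV_INHERIT_DEFAULT_STRIPED (3).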

WC-bug-id: https://jira.whamcloud.com/browse/LU-15314
Lustre-commit: 956b4b1e0d9f18c6f ("LU-15314 utils: set default max-inherit to 3")
Signed-off-by: Lei Feng <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45874
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_user.h | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 1e66930..3b53a5b 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -838,23 +838,25 @@ enum lmv_type {
  */
 enum {
 	/* for historical reason, 0 means unlimited inheritance */
-	LMV_INHERIT_UNLIMITED	= 0,
-	/* unlimited lum_max_inherit by default */
-	LMV_INHERIT_DEFAULT	= 0,
+	LMV_INHERIT_UNLIMITED		= 0,
+	/* unlimited lum_max_inherit by default for plain stripe (0 or 1) */
+	LMV_INHERIT_DEFAULT_PLAIN	= LMV_INHERIT_UNLIMITED,
 	/* not inherit any more */
-	LMV_INHERIT_END		= 1,
+	LMV_INHERIT_END			= 1,
+	/* for multiple stripes, the default lum_max_inherit is 3 */
+	LMV_INHERIT_DEFAULT_STRIPED	= 3,
 	/* max inherit depth */
-	LMV_INHERIT_MAX		= 250,
+	LMV_INHERIT_MAX			= 250,
 	/* [251, 254] are reserved */
 	/* not set, or when inherit depth goes beyond end,  */
-	LMV_INHERIT_NONE	= 255,
+	LMV_INHERIT_NONE		= 255,
 };
 
 enum {
 	/* not set, or when inherit_rr depth goes beyond end,  */
 	LMV_INHERIT_RR_NONE		= 0,
 	/* disable lum_max_inherit_rr by default */
-	LMV_INHERIT_RR_DEFAULT		= 0,
+	LMV_INHERIT_RR_DEFAULT		= LMV_INHERIT_RR_NONE,
 	/* not inherit any more */
 	LMV_INHERIT_RR_END		= 1,
 	/* default inherit_rr of ROOT */
-- 
1.8.3.1


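A minimal sketch of how the split defaults might be applied, using the enum values from the hunk above. `default_max_inherit()` and `inherit_next()` are hypothetical helpers, not Lustre functions, and the rule that each inheriting directory level consumes one unit of depth until LMV_INHERIT_END is reached is an assumption drawn from the enum comments, not from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the lustre_user.h hunk above. */
enum {
	LMV_INHERIT_UNLIMITED		= 0,
	LMV_INHERIT_DEFAULT_PLAIN	= LMV_INHERIT_UNLIMITED,
	LMV_INHERIT_END			= 1,
	LMV_INHERIT_DEFAULT_STRIPED	= 3,
	LMV_INHERIT_MAX			= 250,
	LMV_INHERIT_NONE		= 255,
};

/* Hypothetical: default max-inherit for a new default directory layout,
 * per the enum comments (0 or 1 stripes: unlimited; more stripes: 3).
 */
static uint8_t default_max_inherit(int stripe_count)
{
	return stripe_count > 1 ? LMV_INHERIT_DEFAULT_STRIPED
				: LMV_INHERIT_DEFAULT_PLAIN;
}

/* Hypothetical: max-inherit handed to a new subdirectory, assuming each
 * inheriting level consumes one unit until LMV_INHERIT_END is reached.
 */
static uint8_t inherit_next(uint8_t inherit)
{
	if (inherit == LMV_INHERIT_UNLIMITED)
		return LMV_INHERIT_UNLIMITED;	/* historical "0 = unlimited" */
	if (inherit == LMV_INHERIT_END || inherit > LMV_INHERIT_MAX)
		return LMV_INHERIT_NONE;	/* END, reserved or NONE: stop */
	return inherit - 1;
}
```

Under these assumptions a striped default starts at 3 and follows the chain 3, 2, 1 (END), NONE down the tree, whereas the old default of 0 would be copied to every new subdirectory.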

* [lustre-devel] [PATCH 20/24] lustre: llite: Switch pcc to lookup_one_len
From: James Simmons @ 2022-01-14  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

Using kern_path() to look up files in the PCC cache means the
lookup is subject to user namespaces, so the PCC volume must be
mapped into a container or the cached files cannot be found.

One solution is to switch to lookup_one_len(), which is what
the code that *creates* PCC files already uses.  This walks
the path manually, one component at a time, starting from the
dataset root dentry, which avoids the namespace issues.

This is appropriate because PCC is kernel functionality: the
user should not be able to access the volume directly, but it
should be accessible as a cache.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15170
Lustre-commit: f3be560031cc7022a ("LU-15170 llite: Switch pcc to lookup_one_len")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45436
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/pcc.c | 119 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 84 insertions(+), 35 deletions(-)

diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c
index 85114b8..8bdf681e 100644
--- a/fs/lustre/llite/pcc.c
+++ b/fs/lustre/llite/pcc.c
@@ -1085,7 +1085,7 @@ void pcc_inode_free(struct inode *inode)
  * reduce overhead:
  * (fid->f_oid >> 16 & oxFFFF)/FID
  */
-#define MAX_PCC_DATABASE_PATH (6 * 5 + FID_NOBRACE_LEN + 1)
+#define PCC_DATASET_MAX_PATH (6 * 5 + FID_NOBRACE_LEN + 1)
 static int pcc_fid2dataset_path(char *buf, int sz, struct lu_fid *fid)
 {
 	return scnprintf(buf, sz, "%04x/%04x/%04x/%04x/%04x/%04x/"
@@ -1160,21 +1160,6 @@ static int pcc_get_layout_info(struct inode *inode, struct cl_layout *clt)
 	return rc < 0 ? rc : 0;
 }
 
-static int pcc_fid2dataset_fullpath(char *buf, int sz, struct lu_fid *fid,
-				    struct pcc_dataset *dataset)
-{
-	return scnprintf(buf, sz, "%s/%04x/%04x/%04x/%04x/%04x/%04x/"
-			 DFID_NOBRACE,
-			 dataset->pccd_pathname,
-			 (fid)->f_oid       & 0xFFFF,
-			 (fid)->f_oid >> 16 & 0xFFFF,
-			 (unsigned int)((fid)->f_seq       & 0xFFFF),
-			 (unsigned int)((fid)->f_seq >> 16 & 0xFFFF),
-			 (unsigned int)((fid)->f_seq >> 32 & 0xFFFF),
-			 (unsigned int)((fid)->f_seq >> 48 & 0xFFFF),
-			 PFID(fid));
-}
-
 /* Must be called with pcci->pcci_lock held */
 static void pcc_inode_attach_init(struct pcc_dataset *dataset,
 				  struct pcc_inode *pcci,
@@ -1221,6 +1206,72 @@ static inline bool pcc_inode_has_layout(struct pcc_inode *pcci)
 	return pcci->pcci_layout_gen != CL_LAYOUT_GEN_NONE;
 }
 
+static struct dentry *pcc_lookup(struct dentry *base, char *pathname)
+{
+	char *ptr = NULL, *component;
+	struct dentry *parent;
+	struct dentry *child = ERR_PTR(-ENOENT);
+
+	ptr = pathname;
+
+	/* move past any initial '/' to the start of the first path component*/
+	while (*ptr == '/')
+		ptr++;
+
+	/* store the start of the first path component */
+	component = ptr;
+
+	parent = dget(base);
+	while (ptr) {
+		/* find the start of the next component - if we don't find it,
+		 * the current component is the last component
+		 */
+		ptr = strchr(ptr, '/');
+		/* put a NUL char in place of the '/' before the next component
+		 * so we can treat this component as a string; note the full
+		 * path string is NUL terminated so this is not needed for the
+		 * last component
+		 */
+		if (ptr)
+			*ptr = '\0';
+
+		/* look up the current component */
+		inode_lock(parent->d_inode);
+		child = lookup_one_len(component, parent, strlen(component));
+		inode_unlock(parent->d_inode);
+
+		/* repair the path string: put '/' back in place of the NUL */
+		if (ptr)
+			*ptr = '/';
+
+		dput(parent);
+
+		if (IS_ERR_OR_NULL(child))
+			break;
+
+		/* we may find a cached negative dentry */
+		if (!d_is_positive(child)) {
+			dput(child);
+			child = NULL;
+			break;
+		}
+
+		/* descend into the next level of the path */
+		parent = child;
+
+		/* move the pointer past the '/' to the next component */
+		if (ptr)
+			ptr++;
+		component = ptr;
+	}
+
+	/* NULL child means we didn't find anything */
+	if (!child)
+		child = ERR_PTR(-ENOENT);
+
+	return child;
+}
+
 static int pcc_try_dataset_attach(struct inode *inode, u32 gen,
 				  enum lu_pcc_type type,
 				  struct pcc_dataset *dataset,
@@ -1229,9 +1280,8 @@ static int pcc_try_dataset_attach(struct inode *inode, u32 gen,
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct pcc_inode *pcci = lli->lli_pcc_inode;
 	const struct cred *old_cred;
-	struct dentry *pcc_dentry;
-	struct path path;
-	char *pathname;
+	struct dentry *pcc_dentry = NULL;
+	char pathname[PCC_DATASET_MAX_PATH];
 	u32 pcc_gen;
 	int rc;
 
@@ -1239,27 +1289,27 @@ static int pcc_try_dataset_attach(struct inode *inode, u32 gen,
 	    !(dataset->pccd_flags & PCC_DATASET_RWPCC))
 		return 0;
 
-	pathname = kzalloc(PATH_MAX, GFP_KERNEL);
-	if (!pathname)
-		return -ENOMEM;
-
-	pcc_fid2dataset_fullpath(pathname, PATH_MAX, &lli->lli_fid, dataset);
+	rc = pcc_fid2dataset_path(pathname, PCC_DATASET_MAX_PATH,
+				  &lli->lli_fid);
 
 	old_cred = override_creds(pcc_super_cred(inode->i_sb));
-	rc = kern_path(pathname, LOOKUP_FOLLOW, &path);
-	if (rc) {
+	pcc_dentry = pcc_lookup(dataset->pccd_path.dentry, pathname);
+	if (IS_ERR(pcc_dentry)) {
+		rc = PTR_ERR(pcc_dentry);
+		CDEBUG(D_CACHE, "%s: path lookup error on "DFID":%s: rc = %d\n",
+		       ll_i2sbi(inode)->ll_fsname, PFID(&lli->lli_fid),
+		       pathname, rc);
 		/* ignore this error */
 		rc = 0;
 		goto out;
 	}
 
-	pcc_dentry = path.dentry;
 	rc = __vfs_getxattr(pcc_dentry, pcc_dentry->d_inode, pcc_xattr_layout,
 			    &pcc_gen, sizeof(pcc_gen));
 	if (rc < 0) {
 		/* ignore this error */
 		rc = 0;
-		goto out_put_path;
+		goto out_put_pcc_dentry;
 	}
 
 	rc = 0;
@@ -1271,7 +1321,7 @@ static int pcc_try_dataset_attach(struct inode *inode, u32 gen,
 			pcci = kmem_cache_zalloc(pcc_inode_slab, GFP_NOFS);
 			if (!pcci) {
 				rc = -ENOMEM;
-				goto out_put_path;
+				goto out_put_pcc_dentry;
 			}
 
 			pcc_inode_init(pcci, lli);
@@ -1294,11 +1344,10 @@ static int pcc_try_dataset_attach(struct inode *inode, u32 gen,
 		pcc_layout_gen_set(pcci, gen);
 		*cached = true;
 	}
-out_put_path:
-	path_put(&path);
+out_put_pcc_dentry:
+	dput(pcc_dentry);
 out:
 	revert_creds(old_cred);
-	kfree(pathname);
 	return rc;
 }
 
@@ -2072,11 +2121,11 @@ static int __pcc_inode_create(struct pcc_dataset *dataset,
 	struct dentry *child;
 	int rc = 0;
 
-	path = kzalloc(MAX_PCC_DATABASE_PATH, GFP_NOFS);
+	path = kzalloc(PCC_DATASET_MAX_PATH, GFP_NOFS);
 	if (!path)
 		return -ENOMEM;
 
-	pcc_fid2dataset_path(path, MAX_PCC_DATABASE_PATH, fid);
+	pcc_fid2dataset_path(path, PCC_DATASET_MAX_PATH, fid);
 
 	base = pcc_mkdir_p(dataset->pccd_path.dentry, path, 0);
 	if (IS_ERR(base)) {
@@ -2084,7 +2133,7 @@ static int __pcc_inode_create(struct pcc_dataset *dataset,
 		goto out;
 	}
 
-	snprintf(path, MAX_PCC_DATABASE_PATH, DFID_NOBRACE, PFID(fid));
+	snprintf(path, PCC_DATASET_MAX_PATH, DFID_NOBRACE, PFID(fid));
 	child = pcc_create(base, path, 0);
 	if (IS_ERR(child)) {
 		rc = PTR_ERR(child);
-- 
1.8.3.1


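The userspace sketch below illustrates the two pieces this patch relies on: the relative dataset path built from a FID (the same layout as the removed pcc_fid2dataset_fullpath(), minus the dataset prefix) and the in-place, component-by-component walk done by pcc_lookup(). `struct demo_fid`, `fid2path()`, `walk_components()` and `record_component()` are hypothetical; the FID suffix format `%#llx:0x%x:0x%x` is assumed to match DFID_NOBRACE, and the callback stands in for the lookup_one_len() call made under the parent's inode lock.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of struct lu_fid. */
struct demo_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Six 4-hex-digit levels from f_oid/f_seq, then the FID itself.
 * "%#llx:0x%x:0x%x" is assumed to match DFID_NOBRACE.
 */
static int fid2path(char *buf, size_t sz, const struct demo_fid *fid)
{
	return snprintf(buf, sz,
			"%04x/%04x/%04x/%04x/%04x/%04x/%#llx:0x%x:0x%x",
			(unsigned int)(fid->f_oid & 0xFFFF),
			(unsigned int)((fid->f_oid >> 16) & 0xFFFF),
			(unsigned int)(fid->f_seq & 0xFFFF),
			(unsigned int)((fid->f_seq >> 16) & 0xFFFF),
			(unsigned int)((fid->f_seq >> 32) & 0xFFFF),
			(unsigned int)((fid->f_seq >> 48) & 0xFFFF),
			(unsigned long long)fid->f_seq,
			(unsigned int)fid->f_oid, (unsigned int)fid->f_ver);
}

/* Per-component callback; a non-zero return stops the walk. */
typedef int (*component_fn)(const char *name, void *arg);

/* Walk @pathname one component at a time, as pcc_lookup() does:
 * temporarily NUL-terminate each component, then restore the '/'.
 */
static int walk_components(char *pathname, component_fn fn, void *arg)
{
	char *ptr = pathname;
	char *component;
	int rc = 0;

	/* skip leading '/' characters */
	while (*ptr == '/')
		ptr++;
	component = ptr;

	while (ptr) {
		/* find the end of this component, if it is not the last */
		ptr = strchr(ptr, '/');
		if (ptr)
			*ptr = '\0';	/* terminate the component */

		rc = fn(component, arg);	/* lookup_one_len() in pcc_lookup() */

		if (ptr)
			*ptr = '/';	/* restore the caller's path */
		if (rc)
			break;
		if (ptr)
			ptr++;
		component = ptr;
	}
	return rc;
}

/* Demo callback: count components and remember the last one. */
struct walk_stats {
	int count;
	char last[64];
};

static int record_component(const char *name, void *arg)
{
	struct walk_stats *st = arg;

	st->count++;
	snprintf(st->last, sizeof(st->last), "%s", name);
	return 0;
}
```

Because the path is relative to the dataset root and consists of a fixed prefix (six `%04x/` levels, 30 characters) plus one FID, it fits in PCC_DATASET_MAX_PATH, which is why the patch can replace the PATH_MAX kzalloc() with a stack buffer.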

* [lustre-devel] [PATCH 21/24] lustre: llite: revalidate dentry if LOOKUP lock fetched
From: James Simmons @ 2022-01-14  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

Once ll_inode_revalidate() fetches a LOOKUP lock, it should
revalidate the dentry so that a subsequent lookup can find it in
the dcache.

It should also update lli_dir_depth.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15200
Lustre-commit: 92fadf9cc1d06b21b ("LU-15200 llite: revalidate dentry if LOOKUP lock fetched")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45599
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dcache.c         | 19 +++++++++++++++++--
 fs/lustre/llite/file.c           |  2 +-
 fs/lustre/llite/llite_internal.h |  2 +-
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/llite/dcache.c b/fs/lustre/llite/dcache.c
index a074a2c..d9fb0cd 100644
--- a/fs/lustre/llite/dcache.c
+++ b/fs/lustre/llite/dcache.c
@@ -200,15 +200,30 @@ void ll_prune_aliases(struct inode *inode)
 
 int ll_revalidate_it_finish(struct ptlrpc_request *request,
 			    struct lookup_intent *it,
-			    struct inode *inode)
+			    struct dentry *de)
 {
+	struct inode *inode = d_inode(de);
+	u64 bits = 0;
+	int rc;
+
 	if (!request)
 		return 0;
 
 	if (it_disposition(it, DISP_LOOKUP_NEG))
 		return -ENOENT;
 
-	return ll_prep_inode(&inode, &request->rq_pill, NULL, it);
+	rc = ll_prep_inode(&inode, &request->rq_pill, NULL, it);
+	if (rc)
+		return rc;
+
+	ll_set_lock_data(ll_i2sbi(inode)->ll_md_exp, inode, it,
+			 &bits);
+	if (bits & MDS_INODELOCK_LOOKUP) {
+		ll_update_dir_depth(de->d_parent->d_inode, inode);
+		d_lustre_revalidate(de);
+	}
+
+	return rc;
 }
 
 void ll_lookup_finish_locks(struct lookup_intent *it, struct inode *inode)
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index dec0109..d9b1457 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -5033,7 +5033,7 @@ static int ll_inode_revalidate(struct dentry *dentry, enum ldlm_intent_flags op)
 		goto out;
 	}
 
-	rc = ll_revalidate_it_finish(req, &oit, inode);
+	rc = ll_revalidate_it_finish(req, &oit, dentry);
 	if (rc != 0) {
 		ll_intent_release(&oit);
 		goto out;
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 8c7361a..dd338f2 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1177,7 +1177,7 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size,
 void ll_prune_aliases(struct inode *inode);
 void ll_lookup_finish_locks(struct lookup_intent *it, struct inode *inode);
 int ll_revalidate_it_finish(struct ptlrpc_request *request,
-			    struct lookup_intent *it, struct inode *inode);
+			    struct lookup_intent *it, struct dentry *de);
 
 /* llite/llite_lib.c */
 extern const struct super_operations lustre_super_operations;
-- 
1.8.3.1

* [lustre-devel] [PATCH 22/24] lustre: llite: Simplify cda_no_aio_complete use
From: James Simmons @ 2022-01-14  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

It is better to handle AIO and DIO the same way as much as
possible, limiting the differences to setup.

In this spirit, move the check for DIO (is_sync_kiocb()) from
the cleanup function (cl_aio_end()) to the setup function
(cl_aio_alloc()), and have cleanup check only
cda_no_aio_complete.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: b60bd21ec5d5f34ed ("LU-13799 llite: Simplify cda_no_aio_complete use")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44154
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdclass/cl_io.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index f33a5f38..675116d 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -1135,8 +1135,7 @@ static void cl_aio_end(const struct lu_env *env, struct cl_sync_io *anchor)
 		cl_page_put(env, page);
 	}
 
-	if (!is_sync_kiocb(aio->cda_iocb) && !aio->cda_no_aio_complete &&
-	    aio->cda_iocb->ki_complete)
+	if (!aio->cda_no_aio_complete)
 		aio->cda_iocb->ki_complete(aio->cda_iocb,
 					   ret ?: aio->cda_bytes, 0);
 }
@@ -1156,7 +1155,10 @@ struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj)
 				       cl_aio_end);
 		cl_page_list_init(&aio->cda_pages);
 		aio->cda_iocb = iocb;
-		aio->cda_no_aio_complete = 0;
+		if (is_sync_kiocb(iocb))
+			aio->cda_no_aio_complete = 1;
+		else
+			aio->cda_no_aio_complete = 0;
 		cl_object_get(obj);
 		aio->cda_obj = obj;
 	}
-- 
1.8.3.1

* [lustre-devel] [PATCH 23/24] lustre: osc: Always set aio in anchor
From: James Simmons @ 2022-01-14  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

We currently do not set csi_aio for DIO and use this to
control when we free the aio struct.  (For AIO, we must
free it in cl_sync_io_note, but for other users, we have to
wait until after cl_sync_io_wait has been called.)

The lack of csi_aio causes trouble for the implementation
of the next patch, so instead always set it, and decide
whether cl_sync_io_note frees the aio struct by checking
whether this is DIO (is_sync_kiocb()).

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: eadccb33ac4bbe54a ("LU-13799 osc: Always set aio in anchor")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44153
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdclass/cl_io.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index 675116d..b72f5db 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -1150,9 +1150,7 @@ struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj)
 		 * Hold one ref so that it won't be released until
 		 * every pages is added.
 		 */
-		cl_sync_io_init_notify(&aio->cda_sync, 1,
-				       is_sync_kiocb(iocb) ? NULL : aio,
-				       cl_aio_end);
+		cl_sync_io_init_notify(&aio->cda_sync, 1, aio, cl_aio_end);
 		cl_page_list_init(&aio->cda_pages);
 		aio->cda_iocb = iocb;
 		if (is_sync_kiocb(iocb))
@@ -1203,16 +1201,20 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor,
 		wake_up_locked(&anchor->csi_waitq);
 		if (end_io)
 			end_io(env, anchor);
-		if (anchor->csi_aio)
-			aio = anchor->csi_aio;
+
+		aio = anchor->csi_aio;
 
 		spin_unlock(&anchor->csi_waitq.lock);
 
 		/**
-		 * If anchor->csi_aio is set, we are responsible for freeing
-		 * memory here rather than when cl_sync_io_wait() completes.
+		 * For AIO (!is_sync_kiocb), we are responsible for freeing
+		 * memory here.  This is because we are the last user of this
+		 * aio struct, whereas in other cases, we will call
+		 * cl_sync_io_wait to wait after this, and so the memory is
+		 * freed after that call.
 		 */
-		cl_aio_free(env, aio);
+		if (aio && !is_sync_kiocb(aio->cda_iocb))
+			cl_aio_free(env, aio);
 	}
 }
 EXPORT_SYMBOL(cl_sync_io_note);
-- 
1.8.3.1


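A single-threaded sketch of the ownership rule described above, under simplified assumptions: `struct demo_aio` is a hypothetical stand-in for `struct cl_dio_aio` plus its anchor's `csi_sync_nr`, and `demo_note()` models only the freeing decision in cl_sync_io_note(). Locking, wakeups and the end_io callback are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cl_dio_aio and its anchor. */
struct demo_aio {
	bool sync_kiocb;	/* models is_sync_kiocb(aio->cda_iocb) */
	int nr;			/* models anchor->csi_sync_nr */
};

static int demo_frees;		/* demo bookkeeping only */

static struct demo_aio *demo_aio_alloc(bool sync_kiocb)
{
	struct demo_aio *aio = calloc(1, sizeof(*aio));

	if (aio) {
		aio->sync_kiocb = sync_kiocb;
		aio->nr = 1;	/* initial reference, as in cl_aio_alloc() */
	}
	return aio;
}

static void demo_aio_free(struct demo_aio *aio)
{
	demo_frees++;
	free(aio);
}

/*
 * Models the freeing decision in cl_sync_io_note() after this patch:
 * the aio pointer is always available, and the I/O type decides who
 * frees it. Returns true if this call freed @aio.
 */
static bool demo_note(struct demo_aio *aio)
{
	if (--aio->nr > 0)
		return false;
	if (!aio->sync_kiocb) {
		/* true AIO: the last noter is the last user */
		demo_aio_free(aio);
		return true;
	}
	/* DIO: the waiter in cl_sync_io_wait() still needs it */
	return false;
}
```

In this model an AIO with one page in flight is freed by the second reference drop, while a DIO whose count reaches zero is left for its waiter to free.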

* [lustre-devel] [PATCH 24/24] lustre: llite: Implement lower/upper aio
From: James Simmons @ 2022-01-14  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

This patch creates a lower-level aio struct for each set of
pages submitted, and attaches it to the llite-level aio.

This means that completion of I/O (in the sense of
successful RPC/page completion) is associated with the
lower-level aio struct, and the higher-level aio waits for
the completion of these lower-level structs.  Previously,
all pages were associated with the upper-level (and only)
aio struct.

This patch is a reorganization/cleanup, which is necessary
for the next patch, which moves the release of pages to
aio_end.  The justification for that change (correctness and
performance) will be provided in that patch.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: 46ff76137160b66f1 ("LU-13799 llite: Implement lower/upper aio")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44209
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h |  7 +++++--
 fs/lustre/llite/file.c        |  2 +-
 fs/lustre/llite/rw26.c        | 34 +++++++++++++++++++++++++--------
 fs/lustre/obdclass/cl_io.c    | 44 +++++++++++++++++++++++++++++++++----------
 4 files changed, 66 insertions(+), 21 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 1746c4e..9815b19 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -2592,7 +2592,8 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor,
 		     int ioret);
 int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor,
 			    long timeout, int ioret);
-struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj);
+struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj,
+				struct cl_dio_aio *ll_aio);
 void cl_aio_free(const struct lu_env *env, struct cl_dio_aio *aio);
 
 static inline void cl_sync_io_init(struct cl_sync_io *anchor, int nr)
@@ -2626,7 +2627,9 @@ struct cl_dio_aio {
 	struct cl_object	*cda_obj;
 	struct kiocb		*cda_iocb;
 	ssize_t			cda_bytes;
-	unsigned int		cda_no_aio_complete:1;
+	struct cl_dio_aio	*cda_ll_aio;
+	unsigned int		cda_no_aio_complete:1,
+				cda_no_aio_free:1;
 };
 
 /** @} cl_sync_io */
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index d9b1457..6b95133 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -1684,7 +1684,7 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot,
 			is_parallel_dio = false;
 
 		ci_aio = cl_aio_alloc(args->u.normal.via_iocb,
-				      ll_i2info(inode)->lli_clob);
+				      ll_i2info(inode)->lli_clob, NULL);
 		if (!ci_aio) {
 			rc = -ENOMEM;
 			goto out;
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index 4c2ab38..16cccfa 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -330,7 +330,8 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	struct cl_io *io;
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
-	struct cl_dio_aio *aio;
+	struct cl_dio_aio *ll_aio;
+	struct cl_dio_aio *ldp_aio;
 	size_t count = iov_iter_count(iter);
 	ssize_t tot_bytes = 0, result = 0;
 	loff_t file_offset = iocb->ki_pos;
@@ -365,12 +366,12 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	io = lcc->lcc_io;
 	LASSERT(io);
 
-	aio = io->ci_aio;
-	LASSERT(aio);
-	LASSERT(aio->cda_iocb == iocb);
+	ll_aio = io->ci_aio;
+	LASSERT(ll_aio);
+	LASSERT(ll_aio->cda_iocb == iocb);
 
 	while (iov_iter_count(iter)) {
-		struct ll_dio_pages pvec = { .ldp_aio = aio };
+		struct ll_dio_pages pvec = {};
 		struct page **pages;
 
 		count = min_t(size_t, iov_iter_count(iter), MAX_DIO_SIZE);
@@ -382,10 +383,23 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 				count = i_size_read(inode) - file_offset;
 		}
 
+		/* this aio is freed on completion from cl_sync_io_note, so we
+		 * do not need to directly free the memory here
+		 */
+		ldp_aio = cl_aio_alloc(iocb, ll_i2info(inode)->lli_clob,
+				       ll_aio);
+		if (!ldp_aio) {
+			result = -ENOMEM;
+			goto out;
+		}
+		pvec.ldp_aio = ldp_aio;
+
 		result = ll_get_user_pages(rw, iter, &pages,
 					   &pvec.ldp_count, count);
-		if (unlikely(result <= 0))
+		if (unlikely(result <= 0)) {
+			cl_sync_io_note(env, &ldp_aio->cda_sync, result);
 			goto out;
+		}
 
 		count = result;
 		pvec.ldp_file_offset = file_offset;
@@ -393,6 +407,10 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 		result = ll_direct_rw_pages(env, io, count,
 					    rw, inode, &pvec);
+		/* We've submitted pages and can now remove the extra
+		 * reference for that
+		 */
+		cl_sync_io_note(env, &ldp_aio->cda_sync, result);
 		ll_free_user_pages(pages, pvec.ldp_count);
 
 		if (unlikely(result < 0))
@@ -404,7 +422,7 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	}
 
 out:
-	aio->cda_bytes += tot_bytes;
+	ll_aio->cda_bytes += tot_bytes;
 
 	if (rw == WRITE)
 		vio->u.readwrite.vui_written += tot_bytes;
@@ -424,7 +442,7 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		ssize_t rc2;
 
 		/* Wait here rather than doing async submission */
-		rc2 = cl_sync_io_wait_recycle(env, &aio->cda_sync, 0, 0);
+		rc2 = cl_sync_io_wait_recycle(env, &ll_aio->cda_sync, 0, 0);
 		if (result == 0 && rc2)
 			result = rc2;
 
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index b72f5db..038ab4c 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -1138,9 +1138,13 @@ static void cl_aio_end(const struct lu_env *env, struct cl_sync_io *anchor)
 	if (!aio->cda_no_aio_complete)
 		aio->cda_iocb->ki_complete(aio->cda_iocb,
 					   ret ?: aio->cda_bytes, 0);
+
+	if (aio->cda_ll_aio)
+		cl_sync_io_note(env, &aio->cda_ll_aio->cda_sync, ret);
 }
 
-struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj)
+struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj,
+				struct cl_dio_aio *ll_aio)
 {
 	struct cl_dio_aio *aio;
 
@@ -1153,12 +1157,30 @@ struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj)
 		cl_sync_io_init_notify(&aio->cda_sync, 1, aio, cl_aio_end);
 		cl_page_list_init(&aio->cda_pages);
 		aio->cda_iocb = iocb;
-		if (is_sync_kiocb(iocb))
+		if (is_sync_kiocb(iocb) || ll_aio)
 			aio->cda_no_aio_complete = 1;
 		else
 			aio->cda_no_aio_complete = 0;
+		/* in the case of a lower level aio struct (ll_aio is set), or
+		 * true AIO (!is_sync_kiocb()), the memory is freed by
+		 * the daemons calling cl_sync_io_note, because they are the
+		 * last users of the aio struct
+		 *
+		 * in other cases, the last user is cl_sync_io_wait, and in
+		 * that case, the caller frees the aio struct after that call
+		 * completes
+		 */
+		if (ll_aio || !is_sync_kiocb(iocb))
+			aio->cda_no_aio_free = 0;
+		else
+			aio->cda_no_aio_free = 1;
+
 		cl_object_get(obj);
 		aio->cda_obj = obj;
+		aio->cda_ll_aio = ll_aio;
+
+		if (ll_aio)
+			atomic_add(1,  &ll_aio->cda_sync.csi_sync_nr);
 	}
 	return aio;
 }
@@ -1206,14 +1228,7 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor,
 
 		spin_unlock(&anchor->csi_waitq.lock);
 
-		/**
-		 * For AIO (!is_sync_kiocb), we are responsible for freeing
-		 * memory here.  This is because we are the last user of this
-		 * aio struct, whereas in other cases, we will call
-		 * cl_sync_io_wait to wait after this, and so the memory is
-		 * freed after that call.
-		 */
-		if (aio && !is_sync_kiocb(aio->cda_iocb))
+		if (aio && !aio->cda_no_aio_free)
 			cl_aio_free(env, aio);
 	}
 }
@@ -1223,8 +1238,15 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor,
 int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor,
 			    long timeout, int ioret)
 {
+	bool no_aio_free = anchor->csi_aio->cda_no_aio_free;
 	int rc = 0;
 
+	/* for true AIO, the daemons running cl_sync_io_note would normally
+	 * free the aio struct, but if we're waiting on it, we need them to not
+	 * do that.  This ensures the aio is not freed when we drop the
+	 * reference count to zero in cl_sync_io_note below
+	 */
+	anchor->csi_aio->cda_no_aio_free = 1;
 	/*
 	 * @anchor was inited as 1 to prevent end_io to be
 	 * called before we add all pages for IO, so drop
@@ -1244,6 +1266,8 @@ int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor,
 	 */
 	atomic_add(1, &anchor->csi_sync_nr);
 
+	anchor->csi_aio->cda_no_aio_free = no_aio_free;
+
 	return rc;
 }
 EXPORT_SYMBOL(cl_sync_io_wait_recycle);
-- 
1.8.3.1
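
The hunks above set one ownership rule for the aio struct. If it is a lower-level aio (ll_aio is set) or true AIO (!is_sync_kiocb()), whoever drops the last reference in cl_sync_io_note frees it. Otherwise the waiter frees it after cl_sync_io_wait returns. cl_sync_io_wait_recycle temporarily forces cda_no_aio_free so that the struct survives its count reaching zero, then restores the saved value.

The sketch below models that rule in userspace. It is an illustration under stated assumptions, not the Lustre code. Every sketch_* name is hypothetical. C11 <stdatomic.h> stands in for the kernel's atomic_t. The model is single-threaded, so the "wait" step only checks that the count has already reached zero and does not sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the real Lustre definitions. */
struct sketch_aio;

struct sketch_sync_io {
	atomic_int nr;                /* plays the role of csi_sync_nr */
	struct sketch_aio *aio;       /* plays the role of csi_aio */
};

struct sketch_aio {
	struct sketch_sync_io anchor; /* plays the role of cda_sync */
	bool no_aio_free;             /* plays the role of cda_no_aio_free */
};

static int sketch_freed;          /* counts frees so the rule is observable */

struct sketch_aio *sketch_aio_alloc(bool has_ll_aio, bool is_sync)
{
	struct sketch_aio *aio = calloc(1, sizeof(*aio));

	if (!aio)
		return NULL;
	atomic_init(&aio->anchor.nr, 1);  /* initial reference held by the submitter */
	aio->anchor.aio = aio;
	/* Lower-level or true AIO: the last note() frees; otherwise the waiter does. */
	if (has_ll_aio || !is_sync)
		aio->no_aio_free = false;
	else
		aio->no_aio_free = true;
	return aio;
}

/* Drop one reference; whoever drops the last one frees unless told not to. */
void sketch_note(struct sketch_sync_io *anchor)
{
	if (atomic_fetch_sub(&anchor->nr, 1) == 1) {
		struct sketch_aio *aio = anchor->aio;

		if (aio && !aio->no_aio_free) {
			sketch_freed++;
			free(aio);
		}
	}
}

/* Mirrors the save / force / restore pattern of cl_sync_io_wait_recycle. */
int sketch_wait_recycle(struct sketch_sync_io *anchor)
{
	bool saved = anchor->aio->no_aio_free;
	int rc;

	anchor->aio->no_aio_free = true;   /* keep the struct alive at zero */
	sketch_note(anchor);               /* drop the initial reference */
	/* The real code sleeps until nr reaches 0; the sketch assumes it already has. */
	rc = atomic_load(&anchor->nr) == 0 ? 0 : -1;
	atomic_fetch_add(&anchor->nr, 1);  /* re-arm the anchor for reuse */
	anchor->aio->no_aio_free = saved;  /* hand ownership back as before */
	return rc;
}
```

The counter-based assertions make the ownership transfer easy to check. A true-AIO struct survives sketch_wait_recycle and is freed only by a later final sketch_note. A synchronous struct is never freed by sketch_note and must be released by its caller.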

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


Thread overview: 25+ messages
2022-01-14  1:37 [lustre-devel] [PATCH 00/24] lustre: update to OpenSFS Jan 13, 2022 James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 01/24] lustre: osc: don't have extra gpu call James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 02/24] lustre: llite: add trusted.projid virtual xattr James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 03/24] lnet: o2iblnd: cleanup James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 04/24] lustre: ptlrpc: make rq_replied flag always correct James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 05/24] lustre: mgc: do not ignore target registration failure James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 06/24] lustre: llite: make foreign symlinks aware of mount namespaces James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 07/24] lustre: lov: Cache stripe offset calculation James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 08/24] lnet: o2iblnd: treat cmid->device == NULL as an error James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 09/24] lustre: lmv: set default LMV for "lfs mkdir -c 1" James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 10/24] lnet: socklnd: decrement connection counters on close James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 11/24] lustre: lmv: improve MDT QOS space balance James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 12/24] lustre: llite: access striped directory with missing stripe James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 13/24] lnet: libcfs: Remove D_TTY James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 14/24] lustre: llite: Add D_IOTRACE James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 15/24] lustre: llite: Add start_idx debug James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 16/24] lnet: Skip router discovery on send path James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 17/24] lustre: mdc: GET(X)ATTR to READPAGE portal James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 18/24] lnet: libcfs: set x->ls_len to 0 when x->ls_str is NULL James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 19/24] lustre: uapi: set default max-inherit to 3 James Simmons
2022-01-14  1:37 ` [lustre-devel] [PATCH 20/24] lustre: llite: Switch pcc to lookup_one_len James Simmons
2022-01-14  1:38 ` [lustre-devel] [PATCH 21/24] lustre: llite: revalidate dentry if LOOKUP lock fetched James Simmons
2022-01-14  1:38 ` [lustre-devel] [PATCH 22/24] lustre: llite: Simplify cda_no_aio_complete use James Simmons
2022-01-14  1:38 ` [lustre-devel] [PATCH 23/24] lustre: osc: Always set aio in anchor James Simmons
2022-01-14  1:38 ` [lustre-devel] [PATCH 24/24] lustre: llite: Implement lower/upper aio James Simmons
