* [lustre-devel] [PATCH 00/13] lustre: port OpenSFS updates Dec 29, 2021
From: James Simmons @ 2021-12-29 14:51 UTC
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Port the latest OpenSFS work to the Linux client as of Dec 29, 2021.

Alexander Boyko (1):
  lustre: mdc: add client tunable to disable LSOM update

Alexey Lyashkov (1):
  lustre: ptlrpc: use a cached value

Arshad Hussain (1):
  lustre: quota: fallocate send UID/GID for quota

Chris Horn (2):
  lnet: Revert "lnet: Lock primary NID logic"
  lnet: Race on discovery queue

James Simmons (1):
  lnet: o2iblnd: convert ibp_refcount to a kref

Lai Siyao (1):
  lustre: dne: dir migration in non-recursive mode

Oleg Drokin (1):
  lustre: update version to 2.14.56

Qian Yingjin (1):
  lustre: llite: set ra_pages of backing_dev_info with 0

Sebastien Buisson (3):
  lustre: sec: filename encryption - digest support
  lustre: sec: no encryption key migrate/extend/resync/split
  lustre: sec: fix handling of encrypted file with long name

Serguei Smirnov (1):
  lnet: socklnd: expect two control connections maximum

 fs/lustre/include/cl_object.h           |   2 +
 fs/lustre/include/lustre_net.h          |   2 +-
 fs/lustre/include/obd.h                 |   4 +-
 fs/lustre/llite/crypto.c                | 175 +++++++++++++++++++++++++++-----
 fs/lustre/llite/dir.c                   |  20 +++-
 fs/lustre/llite/file.c                  |  66 ++++++++----
 fs/lustre/llite/llite_internal.h        |  25 ++++-
 fs/lustre/llite/llite_lib.c             | 127 +++++++++++++++++++++--
 fs/lustre/llite/namei.c                 |  83 ++++++++-------
 fs/lustre/llite/rw26.c                  |   2 +-
 fs/lustre/llite/statahead.c             |   8 +-
 fs/lustre/llite/vvp_io.c                |   3 -
 fs/lustre/llite/xattr.c                 |   4 +-
 fs/lustre/lmv/lmv_obd.c                 |   5 +
 fs/lustre/lov/lov_io.c                  |   4 +
 fs/lustre/mdc/lproc_mdc.c               |  29 ++++++
 fs/lustre/mdc/mdc_lib.c                 |   2 +
 fs/lustre/mdc/mdc_locks.c               |   8 +-
 fs/lustre/mdc/mdc_request.c             |  13 ++-
 fs/lustre/osc/osc_io.c                  |   8 +-
 fs/lustre/osc/osc_request.c             |  42 ++++++--
 fs/lustre/ptlrpc/pack_generic.c         |   8 +-
 fs/lustre/ptlrpc/ptlrpc_internal.h      |   1 +
 fs/lustre/ptlrpc/ptlrpc_module.c        |   1 +
 fs/lustre/ptlrpc/sec_null.c             |   4 +-
 fs/lustre/ptlrpc/sec_plain.c            |   2 +-
 fs/lustre/ptlrpc/wiretest.c             |   6 ++
 include/uapi/linux/lustre/lustre_idl.h  |  16 ++-
 include/uapi/linux/lustre/lustre_user.h |   7 +-
 include/uapi/linux/lustre/lustre_ver.h  |   4 +-
 net/lnet/klnds/o2iblnd/o2iblnd.c        |  11 +-
 net/lnet/klnds/o2iblnd/o2iblnd.h        |  35 ++++---
 net/lnet/klnds/socklnd/socklnd.h        |   2 +-
 net/lnet/lnet/peer.c                    | 114 +++++++--------------
 34 files changed, 607 insertions(+), 236 deletions(-)

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 01/13] lustre: sec: filename encryption - digest support
From: James Simmons @ 2021-12-29 14:51 UTC
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

A number of operations are allowed on encrypted files without the key:
- read file metadata (stat);
- list directories;
- remove files and directories.
In order to present valid names to users, cipher text names are base64
encoded if they are short. Otherwise we compute a digested form of the
cipher text, made of the FID (16 bytes) followed by the second-to-last
cipher block (16 bytes), and we base64 encode this digested form for
presentation to the user.
These transformations are carried out in the specific overlay
functions, which now need to know the FID of the file.

As the digested form does not contain the whole cipher text name,
the server side needs to perform requests such as lookup and getattr
as operations by FID. It also relies on the content of the LinkEA to
verify the digested form received from the client side.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13717
Lustre-commit: ed4a625d88567a249 ("LU-13717 sec: filename encryption - digest support")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/43392
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
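
Note for readers: the digested form described in the commit message can
be sketched in userspace C as below. This is only an illustration under
stated assumptions, not the patch's code. The names demo_fid,
demo_digest, demo_excerpt_offset(), demo_build_digest() and
demo_parse_digest() are hypothetical stand-ins for struct lu_fid,
struct ll_digest_filename and the logic in ll_fname_disk_to_usr() and
ll_setup_filename(). The all-zero FID test is a simplified substitute
for fid_is_sane(), and base64 encoding and the '_' prefix are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK_SIZE		16	/* mirrors FS_CRYPTO_BLOCK_SIZE */
#define DEMO_MAX_UNDIGESTED	32	/* mirrors LLCRYPT_FNAME_MAX_UNDIGESTED_SIZE */

/* Hypothetical stand-in for struct lu_fid: 8 + 4 + 4 = 16 bytes. */
struct demo_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Hypothetical stand-in for struct ll_digest_filename: 32 bytes. */
struct demo_digest {
	struct demo_fid fid;
	unsigned char excerpt[DEMO_BLOCK_SIZE];
};

/* Offset of the second-to-last cipher block, following the arithmetic
 * of the LLCRYPT_FNAME_DIGEST() macro: round_down(len - 16 - 1, 16).
 */
static size_t demo_excerpt_offset(size_t len)
{
	assert(len > DEMO_MAX_UNDIGESTED);
	return ((len - DEMO_BLOCK_SIZE - 1) / DEMO_BLOCK_SIZE) *
	       DEMO_BLOCK_SIZE;
}

/* Readdir side: returns 1 and fills *out when the cipher text name is
 * long enough to be digested, 0 when it is short and would be encoded
 * as-is.
 */
static int demo_build_digest(const unsigned char *cipher, size_t len,
			     const struct demo_fid *fid,
			     struct demo_digest *out)
{
	if (len <= DEMO_MAX_UNDIGESTED)
		return 0;
	out->fid = *fid;
	memcpy(out->excerpt, cipher + demo_excerpt_offset(len),
	       DEMO_BLOCK_SIZE);
	return 1;
}

/* Lookup side: recover FID and excerpt from a decoded digest buffer.
 * Returns 0 on success, -1 if the buffer is too short or the FID is
 * all zero (simplified stand-in for the fid_is_sane() check).
 */
static int demo_parse_digest(const unsigned char *buf, size_t len,
			     struct demo_digest *out)
{
	if (len < sizeof(*out))
		return -1;
	memcpy(out, buf, sizeof(*out));
	if (out->fid.f_seq == 0 && out->fid.f_oid == 0 &&
	    out->fid.f_ver == 0)
		return -1;
	return 0;
}
```

For a 48-byte cipher text name the excerpt starts at offset 16, i.e. it
is the middle of three 16-byte blocks. Because the resulting structure
is exactly 32 bytes, it does not exceed the undigested-size limit, which
matches the header comment stating that llcrypt does not digest it again.
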
 fs/lustre/llite/crypto.c                | 130 +++++++++++++++++++++++++++-----
 fs/lustre/llite/dir.c                   |   2 +-
 fs/lustre/llite/llite_internal.h        |  15 +++-
 fs/lustre/llite/llite_lib.c             |  11 ++-
 fs/lustre/llite/namei.c                 |  19 +++--
 fs/lustre/llite/statahead.c             |   8 +-
 fs/lustre/mdc/mdc_lib.c                 |   2 +
 fs/lustre/mdc/mdc_locks.c               |   4 +-
 fs/lustre/mdc/mdc_request.c             |   9 +++
 include/uapi/linux/lustre/lustre_idl.h  |  14 ++--
 include/uapi/linux/lustre/lustre_user.h |   3 +-
 11 files changed, 180 insertions(+), 37 deletions(-)

diff --git a/fs/lustre/llite/crypto.c b/fs/lustre/llite/crypto.c
index 0388e360..7bc6e01 100644
--- a/fs/lustre/llite/crypto.c
+++ b/fs/lustre/llite/crypto.c
@@ -178,19 +178,70 @@ static bool ll_empty_dir(struct inode *inode)
  *	->lookup() or we're finding the dir_entry for deletion; 0 if we cannot
  *	proceed without the key because we're going to create the dir_entry.
  * @fname: the filename information to be filled in
+ * @fid: fid retrieved from user-provided filename
  *
  * This overlay function is necessary to properly encode @fname after
  * encryption, as it will be sent over the wire.
+ * This overlay function is also necessary to handle the case of operations
+ * carried out without the key. Normally llcrypt makes use of digested names in
+ * that case. Having a digested name works for local file systems that can call
+ * llcrypt_match_name(), but Lustre server side is not aware of encryption.
+ * So for keyless @lookup operations on long names, for Lustre we choose to
+ * present to users the encoded struct ll_digest_filename, instead of a digested
+ * name. FID and name hash can then easily be extracted and put into the
+ * requests sent to servers.
  */
 int ll_setup_filename(struct inode *dir, const struct qstr *iname,
-		      int lookup, struct fscrypt_name *fname)
+		      int lookup, struct fscrypt_name *fname,
+		      struct lu_fid *fid)
 {
+	int digested = 0;
+	struct qstr dname;
 	int rc;
 
-	rc = fscrypt_setup_filename(dir, iname, lookup, fname);
+	if (fid) {
+		fid->f_seq = 0;
+		fid->f_oid = 0;
+		fid->f_ver = 0;
+	}
+
+	if (fid && IS_ENCRYPTED(dir) && !fscrypt_has_encryption_key(dir) &&
+	    iname->name[0] == '_')
+		digested = 1;
+
+	dname.name = iname->name + digested;
+	dname.len = iname->len - digested;
+
+	if (fid) {
+		fid->f_seq = 0;
+		fid->f_oid = 0;
+		fid->f_ver = 0;
+	}
+	rc = fscrypt_setup_filename(dir, &dname, lookup, fname);
 	if (rc)
 		return rc;
 
+	if (digested) {
+		/* Without the key, for long names user should have struct
+		 * ll_digest_filename representation of the dentry instead of
+		 * the name. So make sure it is valid, return fid and put
+		 * excerpt of cipher text name in disk_name.
+		 */
+		struct ll_digest_filename *digest;
+
+		if (fname->crypto_buf.len < sizeof(struct ll_digest_filename)) {
+			rc = -EINVAL;
+			goto out_free;
+		}
+		digest = (struct ll_digest_filename *)fname->crypto_buf.name;
+		*fid = digest->ldf_fid;
+		if (!fid_is_sane(fid)) {
+			rc = -EINVAL;
+			goto out_free;
+		}
+		fname->disk_name.name = digest->ldf_excerpt;
+		fname->disk_name.len = LLCRYPT_FNAME_DIGEST_SIZE;
+	}
 	if (IS_ENCRYPTED(dir) &&
 	    !name_is_dot_or_dotdot(fname->disk_name.name,
 				   fname->disk_name.len)) {
@@ -224,6 +275,11 @@ int ll_setup_filename(struct inode *dir, const struct qstr *iname,
 	return rc;
 }
 
+#define LLCRYPT_FNAME_DIGEST(name, len) \
+	((name) + round_down((len) - FS_CRYPTO_BLOCK_SIZE - 1, \
+			     FS_CRYPTO_BLOCK_SIZE))
+#define LLCRYPT_FNAME_MAX_UNDIGESTED_SIZE	32
+
 /**
  * ll_fname_disk_to_usr() - overlay to fscrypt_fname_disk_to_usr
  * @inode: the inode to convert name
@@ -231,40 +287,76 @@ int ll_setup_filename(struct inode *dir, const struct qstr *iname,
  * @minor_hash: minor hash for inode
  * @iname: the user-provided filename needing conversion
  * @oname: the filename information to be filled in
+ * @fid: the user-provided fid for filename
  *
  * The caller must have allocated sufficient memory for the @oname string.
  *
  * This overlay function is necessary to properly decode @iname before
  * decryption, as it comes from the wire.
+ * This overlay function is also necessary to handle the case of operations
+ * carried out without the key. Normally llcrypt makes use of digested names in
+ * that case. Having a digested name works for local file systems that can call
+ * llcrypt_match_name(), but Lustre server side is not aware of encryption.
+ * So for keyless @lookup operations on long names, for Lustre we choose to
+ * present to users the encoded struct ll_digest_filename, instead of a digested
+ * name. FID and name hash can then easily be extracted and put into the
+ * requests sent to servers.
  */
 int ll_fname_disk_to_usr(struct inode *inode,
 			 u32 hash, u32 minor_hash,
-			 struct fscrypt_str *iname, struct fscrypt_str *oname)
+			 struct fscrypt_str *iname, struct fscrypt_str *oname,
+			 struct lu_fid *fid)
 {
 	struct fscrypt_str lltr = FSTR_INIT(iname->name, iname->len);
+	struct ll_digest_filename digest;
+	int digested = 0;
 	char *buf = NULL;
 	int rc;
 
-	if (IS_ENCRYPTED(inode) &&
-	    !name_is_dot_or_dotdot(lltr.name, lltr.len) &&
-	    strnchr(lltr.name, lltr.len, '=')) {
-		/* Only proceed to critical decode if
-		 * iname contains espace char '='.
-		 */
-		int len = lltr.len;
-
-		buf = kmalloc(len, GFP_NOFS);
-		if (!buf)
-			return -ENOMEM;
-
-		len = critical_decode(lltr.name, len, buf);
-		lltr.name = buf;
-		lltr.len = len;
+	if (IS_ENCRYPTED(inode)) {
+		if (!name_is_dot_or_dotdot(lltr.name, lltr.len) &&
+		    strnchr(lltr.name, lltr.len, '=')) {
+			/* Only proceed to critical decode if
+			 * iname contains escape char '='.
+			 */
+			int len = lltr.len;
+
+			buf = kmalloc(len, GFP_NOFS);
+			if (!buf)
+				return -ENOMEM;
+
+			len = critical_decode(lltr.name, len, buf);
+			lltr.name = buf;
+			lltr.len = len;
+		}
+		if (lltr.len > LLCRYPT_FNAME_MAX_UNDIGESTED_SIZE &&
+		    !fscrypt_has_encryption_key(inode)) {
+			digested = 1;
+			/* Without the key for long names, set the dentry name
+			 * to the representing struct ll_digest_filename. It
+			 * will be encoded by llcrypt for display, and will
+			 * enable further lookup requests.
+			 */
+			if (!fid)
+				return -EINVAL;
+			digest.ldf_fid = *fid;
+			memcpy(digest.ldf_excerpt,
+			       LLCRYPT_FNAME_DIGEST(lltr.name, lltr.len),
+			       LLCRYPT_FNAME_DIGEST_SIZE);
+
+			lltr.name = (char *)&digest;
+			lltr.len = sizeof(digest);
+
+			oname->name[0] = '_';
+			oname->name = oname->name + 1;
+			oname->len--;
+		}
 	}
-
 	rc = fscrypt_fname_disk_to_usr(inode, hash, minor_hash, &lltr, oname);
 
 	kfree(buf);
+	oname->name = oname->name - digested;
+	oname->len = oname->len + digested;
 
 	return rc;
 }
diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index ee49c90..23d3fba 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -250,7 +250,7 @@ int ll_dir_read(struct inode *inode, u64 *ppos, struct md_op_data *op_data,
 					= FSTR_INIT(ent->lde_name, namelen);
 
 				rc = ll_fname_disk_to_usr(inode, 0, 0, &de_name,
-							  &lltr);
+							  &lltr, &fid);
 				de_name = lltr;
 				lltr.len = save_len;
 				if (rc) {
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 01672b8..6e212c9 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1705,11 +1705,22 @@ static inline struct pcc_super *ll_info2pccs(struct ll_inode_info *lli)
 
 /* crypto.c */
 #ifdef CONFIG_FS_ENCRYPTION
+/* The digested form is made of a FID (16 bytes) followed by the second-to-last
+ * ciphertext block (16 bytes), so a total length of 32 bytes.
+ * That way, llcrypt does not compute a digested form of this digest.
+ */
+struct ll_digest_filename {
+	struct lu_fid ldf_fid;
+	char ldf_excerpt[LLCRYPT_FNAME_DIGEST_SIZE];
+};
+
 int ll_setup_filename(struct inode *dir, const struct qstr *iname,
-		      int lookup, struct fscrypt_name *fname);
+		      int lookup, struct fscrypt_name *fname,
+		      struct lu_fid *fid);
 int ll_fname_disk_to_usr(struct inode *inode,
 			 u32 hash, u32 minor_hash,
-			 struct fscrypt_str *iname, struct fscrypt_str *oname);
+			 struct fscrypt_str *iname, struct fscrypt_str *oname,
+			 struct lu_fid *fid);
 int ll_revalidate_d_crypto(struct dentry *dentry, unsigned int flags);
 #else
 int ll_setup_filename(struct inode *dir, const struct qstr *iname,
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index dddbe7a..7f168a2 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -3067,6 +3067,8 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 	} else if (name && namelen) {
 		struct qstr dname = QSTR_INIT(name, namelen);
 		struct inode *dir;
+		struct lu_fid *pfid = NULL;
+		struct lu_fid fid;
 		int lookup;
 
 		if (!S_ISDIR(i1->i_mode) && i2 && S_ISDIR(i2->i_mode)) {
@@ -3077,11 +3079,18 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 			dir = i1;
 			lookup = (int)(opc == LUSTRE_OPC_ANY);
 		}
-		rc = ll_setup_filename(dir, &dname, lookup, &fname);
+		if (opc == LUSTRE_OPC_ANY && lookup)
+			pfid = &fid;
+		rc = ll_setup_filename(dir, &dname, lookup, &fname, pfid);
 		if (rc) {
 			ll_finish_md_op_data(op_data);
 			return ERR_PTR(rc);
 		}
+		if (pfid && !fid_is_zero(pfid)) {
+			if (i2 == NULL)
+				op_data->op_fid2 = fid;
+			op_data->op_bias = MDS_FID_OP;
+		}
 		if (fname.disk_name.name &&
 		    fname.disk_name.name != (unsigned char *)name)
 			/* op_data->op_name must be freed after use */
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index a0192da..5fff54d 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -814,6 +814,7 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry,
 	char secctx_name[XATTR_NAME_MAX + 1];
 	struct fscrypt_name fname;
 	struct inode *inode;
+	struct lu_fid fid;
 	u32 opc;
 	int rc;
 
@@ -856,7 +857,7 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry,
 	 * not exported function) and call it from ll_revalidate_dentry(), to
 	 * ensure we do not cache stale dentries after a key has been added.
 	 */
-	rc = ll_setup_filename(parent, &dentry->d_name, 1, &fname);
+	rc = ll_setup_filename(parent, &dentry->d_name, 1, &fname, &fid);
 	if ((!rc || rc == -ENOENT) && fname.is_ciphertext_name) {
 		spin_lock(&dentry->d_lock);
 		dentry->d_flags |= DCACHE_ENCRYPTED_NAME;
@@ -874,6 +875,12 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry,
 		return ERR_CAST(op_data);
 		goto out;
 	}
+	if (!fid_is_zero(&fid)) {
+		op_data->op_fid2 = fid;
+		op_data->op_bias = MDS_FID_OP;
+		if (it->it_op & IT_OPEN)
+			it->it_flags |= MDS_OPEN_BY_FID;
+	}
 
 	/* enforce umask if acl disabled or MDS doesn't support umask */
 	if (!IS_POSIXACL(parent) || !exp_connect_umask(ll_i2mdexp(parent)))
@@ -1856,7 +1863,8 @@ static int ll_unlink(struct inode *dir, struct dentry *dchild)
 	    ll_i2info(dchild->d_inode)->lli_clob &&
 	    dirty_cnt(dchild->d_inode))
 		op_data->op_cli_flags |= CLI_DIRTY_DATA;
-	op_data->op_fid2 = op_data->op_fid3;
+	if (fid_is_zero(&op_data->op_fid2))
+		op_data->op_fid2 = op_data->op_fid3;
 	rc = md_unlink(ll_i2sbi(dir)->ll_md_exp, op_data, &request);
 	ll_finish_md_op_data(op_data);
 	if (rc)
@@ -1926,7 +1934,8 @@ static int ll_rmdir(struct inode *dir, struct dentry *dchild)
 	if (dchild->d_inode)
 		op_data->op_fid3 = *ll_inode2fid(dchild->d_inode);
 
-	op_data->op_fid2 = op_data->op_fid3;
+	if (fid_is_zero(&op_data->op_fid2))
+		op_data->op_fid2 = op_data->op_fid3;
 	rc = md_unlink(ll_i2sbi(dir)->ll_md_exp, op_data, &request);
 	ll_finish_md_op_data(op_data);
 	if (rc == 0) {
@@ -2068,10 +2077,10 @@ static int ll_rename(struct inode *src, struct dentry *src_dchild,
 	if (tgt_dchild->d_inode)
 		op_data->op_fid4 = *ll_inode2fid(tgt_dchild->d_inode);
 
-	err = ll_setup_filename(src, &src_dchild->d_name, 1, &foldname);
+	err = ll_setup_filename(src, &src_dchild->d_name, 1, &foldname, NULL);
 	if (err)
 		return err;
-	err = ll_setup_filename(tgt, &tgt_dchild->d_name, 1, &fnewname);
+	err = ll_setup_filename(tgt, &tgt_dchild->d_name, 1, &fnewname, NULL);
 	if (err) {
 		fscrypt_free_filename(&foldname);
 		return err;
diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c
index 39ffb9d..afb668e 100644
--- a/fs/lustre/llite/statahead.c
+++ b/fs/lustre/llite/statahead.c
@@ -1141,14 +1141,16 @@ static int ll_statahead_thread(void *arg)
 			if (IS_ENCRYPTED(dir)) {
 				struct fscrypt_str de_name =
 					FSTR_INIT(ent->lde_name, namelen);
+				struct lu_fid fid;
 
 				rc = fscrypt_fname_alloc_buffer(dir, NAME_MAX,
 								&lltr);
 				if (rc < 0)
 					continue;
 
+				fid_le_to_cpu(&fid, &ent->lde_fid);
 				if (ll_fname_disk_to_usr(dir, 0, 0, &de_name,
-							 &lltr)) {
+							 &lltr, &fid)) {
 					fscrypt_fname_free_buffer(&lltr);
 					continue;
 				}
@@ -1391,9 +1393,11 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry)
 			if (IS_ENCRYPTED(dir)) {
 				struct fscrypt_str de_name =
 					FSTR_INIT(ent->lde_name, namelen);
+				struct lu_fid fid;
 
+				fid_le_to_cpu(&fid, &ent->lde_fid);
 				if (ll_fname_disk_to_usr(dir, 0, 0, &de_name,
-							  &lltr))
+							 &lltr, &fid))
 					continue;
 				name = lltr.name;
 				namelen = lltr.len;
diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c
index d07ef81..51080a1 100644
--- a/fs/lustre/mdc/mdc_lib.c
+++ b/fs/lustre/mdc/mdc_lib.c
@@ -621,6 +621,8 @@ void mdc_getattr_pack(struct req_capsule *pill, u64 valid, u32 flags,
 	b->mbo_valid = valid;
 	if (op_data->op_bias & MDS_CROSS_REF)
 		b->mbo_valid |= OBD_MD_FLCROSSREF;
+	if (op_data->op_bias & MDS_FID_OP)
+		b->mbo_valid |= OBD_MD_NAMEHASH;
 	b->mbo_eadatasize = ea_size;
 	b->mbo_flags = flags;
 	__mdc_pack_body(b, op_data->op_suppgids[0]);
diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c
index 2c344d7..aba94d1 100644
--- a/fs/lustre/mdc/mdc_locks.c
+++ b/fs/lustre/mdc/mdc_locks.c
@@ -1320,8 +1320,10 @@ int mdc_intent_lock(struct obd_export *exp, struct md_op_data *op_data,
 		it->it_flags);
 
 	lockh.cookie = 0;
+	/* MDS_FID_OP is not a revalidate case */
 	if (fid_is_sane(&op_data->op_fid2) &&
-	    (it->it_op & (IT_LOOKUP | IT_GETATTR | IT_READDIR))) {
+	    (it->it_op & (IT_LOOKUP | IT_GETATTR | IT_READDIR)) &&
+	    !(op_data->op_bias & MDS_FID_OP)) {
 		/* We could just return 1 immediately, but since we should only
 		 * be called in revalidate_it if we already have a lock, let's
 		 * verify that.
diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index 626f493..818c542 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -287,6 +287,15 @@ static int mdc_getattr_name(struct obd_export *exp, struct md_op_data *op_data,
 			     op_data->op_mode);
 	req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, acl_bufsize);
 	ptlrpc_request_set_replen(req);
+	if (op_data->op_bias & MDS_FID_OP) {
+		struct mdt_body *b = req_capsule_client_get(&req->rq_pill,
+							    &RMF_MDT_BODY);
+
+		if (b) {
+			b->mbo_valid |= OBD_MD_NAMEHASH;
+			b->mbo_fid2 = op_data->op_fid2;
+		}
+	}
 
 	rc = mdc_getattr_common(exp, req);
 	if (rc) {
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index ec25140..debd0c1 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -1197,11 +1197,14 @@ static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic)
 #define OBD_MD_DEFAULT_MEA	(0x0040000000000000ULL) /* default MEA */
 #define OBD_MD_FLOSTLAYOUT	(0x0080000000000000ULL)	/* contain ost_layout */
 #define OBD_MD_FLPROJID		(0x0100000000000000ULL) /* project ID */
-#define OBD_MD_SECCTX        (0x0200000000000000ULL) /* embed security xattr */
-#define OBD_MD_FLLAZYSIZE    (0x0400000000000000ULL) /* Lazy size */
-#define OBD_MD_FLLAZYBLOCKS  (0x0800000000000000ULL) /* Lazy blocks */
+#define OBD_MD_SECCTX		(0x0200000000000000ULL) /* embed security xattr */
+#define OBD_MD_FLLAZYSIZE	(0x0400000000000000ULL) /* Lazy size */
+#define OBD_MD_FLLAZYBLOCKS	(0x0800000000000000ULL) /* Lazy blocks */
 #define OBD_MD_FLBTIME		(0x1000000000000000ULL) /* birth time */
-#define OBD_MD_ENCCTX	     (0x2000000000000000ULL) /* embed encryption ctx */
+#define OBD_MD_ENCCTX		(0x2000000000000000ULL) /* embed encryption ctx */
+#define OBD_MD_NAMEHASH		(0x4000000000000000ULL)	/* use hash instead of name
+							 * in case of encryption
+							 */
 
 #define OBD_MD_FLALLQUOTA (OBD_MD_FLUSRQUOTA | \
 			   OBD_MD_FLGRPQUOTA | \
@@ -1705,7 +1708,8 @@ enum mds_op_bias {
 	MDS_PCC_ATTACH		= 1 << 19,
 	MDS_CLOSE_UPDATE_TIMES	= 1 << 20,
 	/* setstripe create only, don't restripe if target exists */
-	 MDS_SETSTRIPE_CREATE	= 1 << 21,
+	MDS_SETSTRIPE_CREATE	= 1 << 21,
+	MDS_FID_OP		= 1 << 22,
 };
 
 #define MDS_CLOSE_INTENT (MDS_HSM_RELEASE | MDS_CLOSE_LAYOUT_SWAP |         \
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 5c4dadf..291e8e0 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -1221,12 +1221,13 @@ enum la_valid {
 #define MDS_OPEN_PCC      010000000000000ULL /* PCC: auto RW-PCC cache attach
 					      * for newly created file
 					      */
+#define MDS_OP_WITH_FID	  020000000000000ULL /* operation carried out by FID */
 
 #define MDS_OPEN_FL_INTERNAL (MDS_OPEN_HAS_EA | MDS_OPEN_HAS_OBJS |	\
 			      MDS_OPEN_OWNEROVERRIDE | MDS_OPEN_LOCK |	\
 			      MDS_OPEN_BY_FID | MDS_OPEN_LEASE |	\
 			      MDS_OPEN_RELEASE | MDS_OPEN_RESYNC |	\
-			      MDS_OPEN_PCC)
+			      MDS_OPEN_PCC | MDS_OP_WITH_FID)
 
 /********* Changelogs **********/
 /** Changelog record types */
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/13] lnet: Revert "lnet: Lock primary NID logic"
From: James Simmons @ 2021-12-29 14:51 UTC
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

The reverted patch breaks client mounts under certain LNet
configurations.

This reverts commit f2f168e3daf12850f40f991d74e04eb283c2376f.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15169
Lustre-commit: f2f168e3daf12850f ("LU-15169 Revert "LU-14668 lnet: Lock primary NID logic"")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/45386
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 67 +++++++++++++---------------------------------------
 1 file changed, 16 insertions(+), 51 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index a9f33c0..cca458f 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -535,15 +535,6 @@ static void lnet_peer_cancel_discovery(struct lnet_peer *lp)
 		}
 	}
 
-	/* If we're asked to lock down the primary NID we shouldn't be
-	 * deleting it
-	 */
-	if (lp->lp_state & LNET_PEER_LOCK_PRIMARY &&
-	    nid_same(&primary_nid, &nid)) {
-		rc = -EPERM;
-		goto out;
-	}
-
 	lpni = lnet_peer_ni_find_locked(&nid);
 	if (!lpni) {
 		rc = -ENOENT;
@@ -1448,18 +1439,13 @@ struct lnet_peer_ni *
 	 * down then this discovery can introduce long delays into the mount
 	 * process, so skip it if it isn't necessary.
 	 */
-	if (!lnet_peer_discovery_disabled && !lnet_peer_is_uptodate(lp)) {
+	while (!lnet_peer_discovery_disabled && !lnet_peer_is_uptodate(lp)) {
 		spin_lock(&lp->lp_lock);
 		/* force a full discovery cycle */
-		lp->lp_state |= LNET_PEER_FORCE_PING | LNET_PEER_FORCE_PUSH |
-				LNET_PEER_LOCK_PRIMARY;
+		lp->lp_state |= LNET_PEER_FORCE_PING | LNET_PEER_FORCE_PUSH;
 		spin_unlock(&lp->lp_lock);
 
-		/* start discovery in the background. Messages to that
-		 * peer will not go through until the discovery is
-		 * complete
-		 */
-		rc = lnet_discover_peer_locked(lpni, cpt, false);
+		rc = lnet_discover_peer_locked(lpni, cpt, true);
 		if (rc)
 			goto out_decref;
 		/* The lpni (or lp) for this NID may have changed and our ref is
@@ -1473,6 +1459,14 @@ struct lnet_peer_ni *
 			goto out_unlock;
 		}
 		lp = lpni->lpni_peer_net->lpn_peer;
+
+		/* If we find that the peer has discovery disabled then we will
+		 * not modify whatever primary NID is currently set for this
+		 * peer. Thus, we can break out of this loop even if the peer
+		 * is not fully up to date.
+		 */
+		if (lnet_is_discovery_disabled(lp))
+			break;
 	}
 	primary_nid = lnet_nid_to_nid4(&lp->lp_primary_nid);
 out_decref:
@@ -1579,8 +1573,6 @@ struct lnet_peer_net *
 			lnet_peer_clr_non_mr_pref_nids(lp);
 		}
 	}
-	if (flags & LNET_PEER_LOCK_PRIMARY)
-		lp->lp_state |= LNET_PEER_LOCK_PRIMARY;
 	spin_unlock(&lp->lp_lock);
 
 	lp->lp_nnis++;
@@ -1742,27 +1734,9 @@ struct lnet_peer_net *
 		}
 		/* If this is the primary NID, destroy the peer. */
 		if (lnet_peer_ni_is_primary(lpni)) {
-			struct lnet_peer *lp2 =
+			struct lnet_peer *rtr_lp =
 				lpni->lpni_peer_net->lpn_peer;
-			int rtr_refcount = lp2->lp_rtr_refcount;
-
-			/* If the new peer that this NID belongs to is
-			 * a primary NID for another peer which we're
-			 * suppose to preserve the Primary for then we
-			 * don't want to mess with it. But the
-			 * configuration is wrong at this point, so we
-			 * should flag both of these peers as in a bad
-			 * state
-			 */
-			if (lp2->lp_state & LNET_PEER_LOCK_PRIMARY) {
-				spin_lock(&lp->lp_lock);
-				lp->lp_state |= LNET_PEER_BAD_CONFIG;
-				spin_unlock(&lp->lp_lock);
-				spin_lock(&lp2->lp_lock);
-				lp2->lp_state |= LNET_PEER_BAD_CONFIG;
-				spin_unlock(&lp2->lp_lock);
-				goto out_free_lpni;
-			}
+			int rtr_refcount = rtr_lp->lp_rtr_refcount;
 
 			/* if we're trying to delete a router it means
 			 * we're moving this peer NI to a new peer so must
@@ -1770,9 +1744,9 @@ struct lnet_peer_net *
 			 */
 			if (rtr_refcount > 0) {
 				flags |= LNET_PEER_RTR_NI_FORCE_DEL;
-				lnet_rtr_transfer_to_peer(lp2, lp);
+				lnet_rtr_transfer_to_peer(rtr_lp, lp);
 			}
-			lnet_peer_del(lp2);
+			lnet_peer_del(lpni->lpni_peer_net->lpn_peer);
 			lnet_peer_ni_decref_locked(lpni);
 			lpni = lnet_peer_ni_alloc(&nid);
 			if (!lpni) {
@@ -1830,8 +1804,7 @@ struct lnet_peer_net *
 	if (lnet_nid_to_nid4(&lp->lp_primary_nid) == nid)
 		goto out;
 
-	if (!(lp->lp_state & LNET_PEER_LOCK_PRIMARY))
-		lnet_nid4_to_nid(nid, &lp->lp_primary_nid);
+	lnet_nid4_to_nid(nid, &lp->lp_primary_nid);
 
 	rc = lnet_peer_add_nid(lp, nid, flags);
 	if (rc) {
@@ -1839,14 +1812,6 @@ struct lnet_peer_net *
 		goto out;
 	}
 out:
-	/* if this is a configured peer or the primary for that peer has
-	 * been locked, then we don't want to flag this scenario as
-	 * a failure
-	 */
-	if (lp->lp_state & LNET_PEER_CONFIGURED ||
-	    lp->lp_state & LNET_PEER_LOCK_PRIMARY)
-		return 0;
-
 	CDEBUG(D_NET, "peer %s NID %s: %d\n",
 	       libcfs_nidstr(&old), libcfs_nid2str(nid), rc);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 03/13] lustre: quota: fallocate send UID/GID for quota
From: James Simmons @ 2021-12-29 14:51 UTC
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Arshad Hussain, Lustre Development List

From: Arshad Hussain <arshad.hussain@aeoncomputing.com>

Calling fallocate() on a newly created file did not account quota
usage properly because the OST object did not have a UID/GID
assigned yet. Update the fallocate code in the OSC to always send
the file UID/GID/PROJID to the OST so that the object ownership
can be updated before space is allocated.

Fixes: d748d2ffa1bc ("lustre: fallocate: Implement fallocate preallocate operation")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15167
Lustre-commit: 789038c97ae107287 ("LU-15167 quota: fallocate send UID/GID for quota")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/45475
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
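
Note for readers: the idea of the fix, sending ownership along with the
fallocate range so the OST can account quota, can be sketched in
userspace C as below. This is a minimal illustration and not the patch's
code. struct demo_obdo, demo_pack_fallocate() and the DEMO_MD_* bits are
hypothetical; the bit values are made up and are not the real OBD_MD_*
values. Only UID and GID are shown, as in the osc_io.c hunk of this
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits only; NOT the real OBD_MD_* values. */
#define DEMO_MD_FLSIZE		(1ULL << 0)
#define DEMO_MD_FLBLOCKS	(1ULL << 1)
#define DEMO_MD_FLUID		(1ULL << 2)
#define DEMO_MD_FLGID		(1ULL << 3)

/* Hypothetical, heavily reduced stand-in for struct obdo. */
struct demo_obdo {
	uint64_t o_valid;	/* which fields below carry valid data */
	uint64_t o_size;	/* fallocate: start offset of the range */
	uint64_t o_blocks;	/* fallocate: end offset of the range */
	uint32_t o_uid;
	uint32_t o_gid;
};

/* Pack a fallocate request: the range goes into o_size/o_blocks and
 * the owner IDs are always sent, with the matching valid bits set so
 * the receiver knows it may use them.
 */
static void demo_pack_fallocate(struct demo_obdo *oa, uint64_t start,
				uint64_t end, uint32_t uid, uint32_t gid)
{
	assert(end >= start);
	oa->o_size = start;
	oa->o_blocks = end;
	oa->o_uid = uid;
	oa->o_gid = gid;
	oa->o_valid |= DEMO_MD_FLSIZE | DEMO_MD_FLBLOCKS |
		       DEMO_MD_FLUID | DEMO_MD_FLGID;
}
```

Without the UID/GID valid bits, a receiver following this convention
would have to ignore o_uid and o_gid, which corresponds to the situation
the commit message describes for a newly created file.
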
 fs/lustre/include/cl_object.h | 2 ++
 fs/lustre/llite/file.c        | 8 ++++++++
 fs/lustre/lov/lov_io.c        | 4 ++++
 fs/lustre/osc/osc_io.c        | 8 +++++++-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index a65240b..1746c4e 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -1877,6 +1877,8 @@ struct cl_io {
 			int			sa_falloc_mode;
 			loff_t			sa_falloc_offset;
 			loff_t			sa_falloc_end;
+			uid_t			sa_falloc_uid;
+			gid_t			sa_falloc_gid;
 		} ci_setattr;
 		struct cl_data_version_io {
 			u64			dv_data_version;
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 898db80..20571c9 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -5244,6 +5244,14 @@ int cl_falloc(struct file *file, struct inode *inode, int mode, loff_t offset,
 	io->u.ci_setattr.sa_falloc_offset = offset;
 	io->u.ci_setattr.sa_falloc_end = offset + len;
 	io->u.ci_setattr.sa_subtype = CL_SETATTR_FALLOCATE;
+
+	CDEBUG(D_INODE, "UID %u GID %u\n",
+	       from_kuid(&init_user_ns, inode->i_uid),
+	       from_kgid(&init_user_ns, inode->i_gid));
+
+	io->u.ci_setattr.sa_falloc_uid = from_kuid(&init_user_ns, inode->i_uid);
+	io->u.ci_setattr.sa_falloc_gid = from_kgid(&init_user_ns, inode->i_gid);
+
 	if (io->u.ci_setattr.sa_falloc_end > size) {
 		loff_t newsize = io->u.ci_setattr.sa_falloc_end;
 
diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c
index d5f895f..8df13ee 100644
--- a/fs/lustre/lov/lov_io.c
+++ b/fs/lustre/lov/lov_io.c
@@ -680,6 +680,10 @@ static void lov_io_sub_inherit(struct lov_io_sub *sub, struct lov_io *lio,
 		if (cl_io_is_fallocate(io)) {
 			io->u.ci_setattr.sa_falloc_offset = start;
 			io->u.ci_setattr.sa_falloc_end = end;
+			io->u.ci_setattr.sa_falloc_uid =
+				parent->u.ci_setattr.sa_falloc_uid;
+			io->u.ci_setattr.sa_falloc_gid =
+				parent->u.ci_setattr.sa_falloc_gid;
 		}
 		if (cl_io_is_trunc(io)) {
 			loff_t new_size = parent->u.ci_setattr.sa_attr.lvb_size;
diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c
index b867985..b84022b 100644
--- a/fs/lustre/osc/osc_io.c
+++ b/fs/lustre/osc/osc_io.c
@@ -669,7 +669,13 @@ static int osc_io_setattr_start(const struct lu_env *env,
 
 			oa->o_size = io->u.ci_setattr.sa_falloc_offset;
 			oa->o_blocks = io->u.ci_setattr.sa_falloc_end;
-			oa->o_valid |= OBD_MD_FLSIZE | OBD_MD_FLBLOCKS;
+			oa->o_uid = io->u.ci_setattr.sa_falloc_uid;
+			oa->o_gid = io->u.ci_setattr.sa_falloc_gid;
+			oa->o_valid |= OBD_MD_FLSIZE | OBD_MD_FLBLOCKS |
+				OBD_MD_FLUID | OBD_MD_FLGID;
+
+			CDEBUG(D_INODE, "size %llu blocks %llu uid %u gid %u\n",
+			       oa->o_size, oa->o_blocks, oa->o_uid, oa->o_gid);
 			result = osc_fallocate_base(osc_export(cl2osc(obj)),
 						    oa, osc_async_upcall,
 						    cbargs, falloc_mode);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 04/13] lustre: mdc: add client tunable to disable LSOM update
  2021-12-29 14:51 [lustre-devel] [PATCH 00/13] lustre: port OpenSFS updates Dec 29, 2021 James Simmons
                   ` (2 preceding siblings ...)
  2021-12-29 14:51 ` [lustre-devel] [PATCH 03/13] lustre: quota: fallocate send UID/GID for quota James Simmons
@ 2021-12-29 14:51 ` James Simmons
  2021-12-29 14:51 ` [lustre-devel] [PATCH 05/13] lustre: dne: dir migration in non-recursive mode James Simmons
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-12-29 14:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexander Boyko, Lustre Development List

From: Alexander Boyko <alexander.boyko@hpe.com>

It seems that mdt_lsom_update() has a serious scaling issue with a
single shared file because it takes an MDT-level mutex for every close
request. This patch adds an mdc_lsom parameter to the MDC; depending on
its state, the client either sends LSOM updates to the MDT or does not.
LSOM updates are on by default.

lctl set_param mdc.*.mdc_lsom=[on|off]

For configurations where LSOM is not used, the patch reduces the
MDT load average under workloads in which many threads
open/read/close a single file.
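
A minimal user-space sketch of the gating described above, assuming
hypothetical sk_* names and illustrative bit values (the real
OP_XVALID_* values and the client_obd layout are not reproduced here):
on close, the lazy size/blocks bits are dropped when either the tunable
is off or the server did not negotiate LSOM support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bits only; NOT the real OP_XVALID_* values. */
#define SK_XVALID_LAZYSIZE	0x1u
#define SK_XVALID_LAZYBLOCKS	0x2u
#define SK_XVALID_OTHER		0x4u

/* Hypothetical stand-in for the client state consulted at close time. */
struct sk_client {
	bool lsom_update;	/* models the mdc_lsom tunable, on by default */
	bool server_has_lsom;	/* models the negotiated LSOM connect flag */
};

/* Return the xvalid bits that would actually accompany the close. */
static uint32_t sk_close_xvalid(const struct sk_client *cli, uint32_t xvalid)
{
	assert(cli);
	if (!cli->lsom_update || !cli->server_has_lsom)
		xvalid &= ~(SK_XVALID_LAZYSIZE | SK_XVALID_LAZYBLOCKS);
	return xvalid;
}
```

Unrelated bits pass through untouched, so turning the tunable off only
suppresses the LSOM part of the close request.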

HPE-bug-id: LUS-10604
WC-bug-id: https://jira.whamcloud.com/browse/LU-15252
Lustre-commit: 19172ed37851fdd57 ("LU-15252 mdc: add client tunable to disable LSOM update")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/45619
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd.h     |  3 ++-
 fs/lustre/mdc/lproc_mdc.c   | 29 +++++++++++++++++++++++++++++
 fs/lustre/mdc/mdc_request.c |  4 +++-
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index 58a5803..3aa5b37 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -208,7 +208,8 @@ struct client_obd {
 	/* checksumming for data sent over the network */
 	unsigned int		 cl_checksum:1,	/* 0 = disabled, 1 = enabled */
 				 cl_checksum_dump:1, /* same */
-				 cl_ocd_grant_param:1;
+				 cl_ocd_grant_param:1,
+				 cl_lsom_update:1; /* send LSOM updates */
 	/* supported checksum types that are worked out at connect time */
 	enum lustre_sec_part     cl_sp_me;
 	enum lustre_sec_part     cl_sp_to;
diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c
index fe93ccd..3de6533 100644
--- a/fs/lustre/mdc/lproc_mdc.c
+++ b/fs/lustre/mdc/lproc_mdc.c
@@ -566,6 +566,33 @@ static ssize_t mdc_dom_min_repsize_seq_write(struct file *file,
 }
 LDEBUGFS_SEQ_FOPS(mdc_dom_min_repsize);
 
+static int mdc_lsom_seq_show(struct seq_file *m, void *v)
+{
+	struct obd_device *dev = m->private;
+
+	seq_printf(m, "%s\n", dev->u.cli.cl_lsom_update ? "On" : "Off");
+
+	return 0;
+}
+
+static ssize_t mdc_lsom_seq_write(struct file *file,
+				  const char __user *buffer,
+				  size_t count, loff_t *off)
+{
+	struct obd_device *dev;
+	bool val;
+	int rc;
+
+	dev =  ((struct seq_file *)file->private_data)->private;
+	rc = kstrtobool_from_user(buffer, count, &val);
+	if (rc)
+		return rc;
+
+	dev->u.cli.cl_lsom_update = val;
+	return count;
+}
+LDEBUGFS_SEQ_FOPS(mdc_lsom);
+
 LDEBUGFS_SEQ_FOPS_RO_TYPE(mdc, connect_flags);
 LDEBUGFS_SEQ_FOPS_RO_TYPE(mdc, server_uuid);
 LDEBUGFS_SEQ_FOPS_RO_TYPE(mdc, timeouts);
@@ -601,6 +628,8 @@ static ssize_t mdc_dom_min_repsize_seq_write(struct file *file,
 	  .fops	=	&mdc_stats_fops			},
 	{ .name	=	"mdc_dom_min_repsize",
 	  .fops	=	&mdc_dom_min_repsize_fops	},
+	{ .name =	"mdc_lsom",
+	  .fops =	&mdc_lsom_fops			},
 	{ NULL }
 };
 
diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index 818c542..9788bd3 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -952,7 +952,8 @@ static int mdc_close(struct obd_export *exp, struct md_op_data *op_data,
 	req->rq_request_portal = MDS_READPAGE_PORTAL;
 	ptlrpc_at_set_req_timeout(req);
 
-	if (!(exp_connect_flags2(exp) & OBD_CONNECT2_LSOM))
+	if (!obd->u.cli.cl_lsom_update ||
+	    !(exp_connect_flags2(exp) & OBD_CONNECT2_LSOM))
 		op_data->op_xvalid &= ~(OP_XVALID_LAZYSIZE |
 					OP_XVALID_LAZYBLOCKS);
 
@@ -2842,6 +2843,7 @@ int mdc_setup(struct obd_device *obd, struct lustre_cfg *cfg)
 		goto err_osc_cleanup;
 
 	obd->u.cli.cl_dom_min_inline_repsize = MDC_DOM_DEF_INLINE_REPSIZE;
+	obd->u.cli.cl_lsom_update = true;
 
 	ns_register_cancel(obd->obd_namespace, mdc_cancel_weight);
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 05/13] lustre: dne: dir migration in non-recursive mode
  2021-12-29 14:51 [lustre-devel] [PATCH 00/13] lustre: port OpenSFS updates Dec 29, 2021 James Simmons
                   ` (3 preceding siblings ...)
  2021-12-29 14:51 ` [lustre-devel] [PATCH 04/13] lustre: mdc: add client tunable to disable LSOM update James Simmons
@ 2021-12-29 14:51 ` James Simmons
  2021-12-29 14:51 ` [lustre-devel] [PATCH 06/13] lustre: update version to 2.14.56 James Simmons
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-12-29 14:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

Add a "-d|--directory" option for LL_IOC_MIGRATE to migrate only the
specified directory, similar to "ls -d".
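
The two checks this patch adds can be modelled in user space roughly as
follows. This is a sketch under assumptions: the sk_* names are
hypothetical, the boolean parameters stand in for the S_ISDIR() and
lmv_dir_layout_changing() tests, and the two FIDs stand in for the pair
the patch compares (op_fid4 and op_fid2). Only the bit value 1 << 23 is
taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_MIGRATE_NSONLY	(1u << 23)	/* same bit as in the patch */

/* Hypothetical stand-in for struct lu_fid. */
struct sk_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

static bool sk_fid_eq(const struct sk_fid *a, const struct sk_fid *b)
{
	assert(a && b);
	return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Client side: request "dirent only" just for subdirs of a changing parent. */
static uint32_t sk_migrate_bias(bool child_is_dir, bool parent_changing,
				uint32_t flags)
{
	uint32_t bias = 0;

	if (child_is_dir && (flags & SK_MIGRATE_NSONLY) && parent_changing)
		bias |= SK_MIGRATE_NSONLY;
	return bias;
}

/* Target selection: nothing to do if the parent is unchanged. */
static int sk_nsonly_check(const struct sk_fid *fid_a,
			   const struct sk_fid *fid_b, uint32_t bias)
{
	if (sk_fid_eq(fid_a, fid_b) && (bias & SK_MIGRATE_NSONLY))
		return -EALREADY;
	return 0;
}
```

In this model a regular file never gets the bias bit, and a
namespace-only request against an unchanged parent is reported as
already done rather than as a failure of the migration itself.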

WC-bug-id: https://jira.whamcloud.com/browse/LU-14975
Lustre-commit: 5604a6d270b8be13a ("LU-14975 dne: dir migration in non-recursive mode")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44802
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c                  | 5 ++++-
 fs/lustre/llite/file.c                 | 7 ++++++-
 fs/lustre/llite/llite_internal.h       | 2 +-
 fs/lustre/lmv/lmv_obd.c                | 5 +++++
 fs/lustre/ptlrpc/wiretest.c            | 6 ++++++
 include/uapi/linux/lustre/lustre_idl.h | 2 ++
 6 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 23d3fba..40e83e7 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -2102,6 +2102,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		struct lmv_user_md *lum;
 		char *filename;
 		int namelen = 0;
+		u32 flags;
 		int len;
 		int rc;
 
@@ -2117,6 +2118,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 		filename = data->ioc_inlbuf1;
 		namelen = data->ioc_inllen1;
+		flags = data->ioc_type;
+
 		if (namelen < 1 || namelen != strlen(filename) + 1) {
 			CDEBUG(D_INFO, "IOC_MDC_LOOKUP missing filename\n");
 			rc = -EINVAL;
@@ -2132,7 +2135,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 			goto migrate_free;
 		}
 
-		rc = ll_migrate(inode, file, lum, filename);
+		rc = ll_migrate(inode, file, lum, filename, flags);
 migrate_free:
 		kvfree(data);
 
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 20571c9..0dd1bae 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -4682,7 +4682,7 @@ int ll_get_fid_by_name(struct inode *parent, const char *name,
 }
 
 int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum,
-	       const char *name)
+	       const char *name, u32 flags)
 {
 	struct ptlrpc_request *request = NULL;
 	struct obd_client_handle *och = NULL;
@@ -4779,6 +4779,11 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum,
 	op_data->op_data = lum;
 	op_data->op_data_size = lumlen;
 
+	/* migrate dirent only for subdirs if MDS_MIGRATE_NSONLY set */
+	if (S_ISDIR(child_inode->i_mode) && (flags & MDS_MIGRATE_NSONLY) &&
+	    lmv_dir_layout_changing(ll_i2info(parent)->lli_lsm_md))
+		op_data->op_bias |= MDS_MIGRATE_NSONLY;
+
 again:
 	if (S_ISREG(child_inode->i_mode)) {
 		och = ll_lease_open(child_inode, NULL, FMODE_WRITE, 0);
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 6e212c9..12d47e8 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1130,7 +1130,7 @@ static inline int ll_inode_flags_to_xflags(int inode_flags)
 }
 
 int ll_migrate(struct inode *parent, struct file *file,
-	       struct lmv_user_md *lum, const char *name);
+	       struct lmv_user_md *lum, const char *name, u32 flags);
 int ll_get_fid_by_name(struct inode *parent, const char *name,
 		       int namelen, struct lu_fid *fid, struct inode **inode);
 int ll_inode_permission(struct inode *inode, int mask);
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index b31f943..c87f37f 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -2227,6 +2227,11 @@ static int lmv_migrate(struct obd_export *exp, struct md_op_data *op_data,
 			tp_tgt = lmv_tgt(lmv, oinfo->lmo_mds);
 			if (!tp_tgt)
 				return -ENODEV;
+
+			/* parent unchanged and update namespace only */
+			if (lu_fid_eq(&op_data->op_fid4, &op_data->op_fid2) &&
+			    op_data->op_bias & MDS_MIGRATE_NSONLY)
+				return -EALREADY;
 		}
 	} else {
 		sp_tgt = parent_tgt;
diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index a381af4..687a54d 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -2119,6 +2119,12 @@ void lustre_assert_wire_constants(void)
 		(unsigned int)MDS_PCC_ATTACH);
 	LASSERTF(MDS_CLOSE_UPDATE_TIMES == 0x00100000UL, "found 0x%.8xUL\n",
 		(unsigned int)MDS_CLOSE_UPDATE_TIMES);
+	LASSERTF(MDS_SETSTRIPE_CREATE == 0x00200000UL, "found 0x%.8xUL\n",
+		(unsigned int)MDS_SETSTRIPE_CREATE);
+	LASSERTF(MDS_FID_OP == 0x00400000UL, "found 0x%.8xUL\n",
+		(unsigned int)MDS_FID_OP);
+	LASSERTF(MDS_MIGRATE_NSONLY == 0x00800000UL, "found 0x%.8xUL\n",
+		(unsigned int)MDS_MIGRATE_NSONLY);
 
 	/* Checks for struct mdt_body */
 	LASSERTF((int)sizeof(struct mdt_body) == 216, "found %lld\n",
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index debd0c1..35d3ed2 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -1710,6 +1710,8 @@ enum mds_op_bias {
 	/* setstripe create only, don't restripe if target exists */
 	MDS_SETSTRIPE_CREATE	= 1 << 21,
 	MDS_FID_OP		= 1 << 22,
+	/* migrate dirent only */
+	MDS_MIGRATE_NSONLY	= 1 << 23,
 };
 
 #define MDS_CLOSE_INTENT (MDS_HSM_RELEASE | MDS_CLOSE_LAYOUT_SWAP |         \
-- 
1.8.3.1

* [lustre-devel] [PATCH 06/13] lustre: update version to 2.14.56
  2021-12-29 14:51 [lustre-devel] [PATCH 00/13] lustre: port OpenSFS updates Dec 29, 2021 James Simmons
                   ` (4 preceding siblings ...)
  2021-12-29 14:51 ` [lustre-devel] [PATCH 05/13] lustre: dne: dir migration in non-recursive mode James Simmons
@ 2021-12-29 14:51 ` James Simmons
  2021-12-29 14:51 ` [lustre-devel] [PATCH 07/13] lustre: sec: no encryption key migrate/extend/resync/split James Simmons
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-12-29 14:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

New tag 2.14.56

Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h
index d4ca95e..947a829 100644
--- a/include/uapi/linux/lustre/lustre_ver.h
+++ b/include/uapi/linux/lustre/lustre_ver.h
@@ -3,9 +3,9 @@
 
 #define LUSTRE_MAJOR 2
 #define LUSTRE_MINOR 14
-#define LUSTRE_PATCH 55
+#define LUSTRE_PATCH 56
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.14.55"
+#define LUSTRE_VERSION_STRING "2.14.56"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
1.8.3.1

* [lustre-devel] [PATCH 07/13] lustre: sec: no encryption key migrate/extend/resync/split
  2021-12-29 14:51 [lustre-devel] [PATCH 00/13] lustre: port OpenSFS updates Dec 29, 2021 James Simmons
                   ` (5 preceding siblings ...)
  2021-12-29 14:51 ` [lustre-devel] [PATCH 06/13] lustre: update version to 2.14.56 James Simmons
@ 2021-12-29 14:51 ` James Simmons
  2021-12-29 14:51 ` [lustre-devel] [PATCH 08/13] lustre: sec: fix handling of encrypted file with long name James Simmons
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-12-29 14:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

Allow some layout operations on encrypted files, even when the
encryption key is not available:
- lfs migrate
- lfs mirror extend
- lfs mirror resync
- lfs mirror verify
- lfs mirror split
We allow these access patterns to applications that know what they are
doing and open the file with the specific O_FILE_ENC flag together
with O_DIRECT.
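
The open-time decision described above can be sketched in user space as
below. The sk_* names and all numeric values are illustrative
assumptions, not the real O_FILE_ENC, O_DIRECT or ENOKEY definitions;
base_rc models the result of the regular encrypted-file open.

```c
#include <assert.h>

/* Illustrative values only; the real flag and errno values differ. */
#define SK_O_DIRECT	0x1u
#define SK_O_FILE_ENC	0x6u	/* assumed multi-bit, so compare to the mask */
#define SK_ENOKEY	126

/*
 * base_rc: 0 or a negative errno from the normal encrypted open.
 * f_flags: open flags of the file being set up.
 */
static int sk_open_encrypt(int base_rc, unsigned int f_flags)
{
	assert(base_rc <= 0);

	/* Any result other than "no key" is passed through unchanged. */
	if (base_rc != -SK_ENOKEY)
		return base_rc;

	/* Keyless open is allowed only with both flags present. */
	if ((f_flags & SK_O_FILE_ENC) == SK_O_FILE_ENC &&
	    (f_flags & SK_O_DIRECT))
		return 0;

	return base_rc;
}
```

Requiring direct I/O in addition to the dedicated flag matches the
rationale given in the patch: ciphertext should not linger in the page
cache once the I/O has finished.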

WC-bug-id: https://jira.whamcloud.com/browse/LU-14677
Lustre-commit: fdbf2ffd41fa56607 ("LU-14677 sec: no encryption key migrate/extend/resync/split")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/44024
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd.h                 |   1 -
 fs/lustre/llite/crypto.c                |  55 +++++++++++++---
 fs/lustre/llite/dir.c                   |  13 +++-
 fs/lustre/llite/file.c                  |  49 +++++++++-----
 fs/lustre/llite/llite_internal.h        |  10 ++-
 fs/lustre/llite/llite_lib.c             | 109 ++++++++++++++++++++++++++++++--
 fs/lustre/llite/namei.c                 |  64 +++++++++----------
 fs/lustre/llite/rw26.c                  |   2 +-
 fs/lustre/llite/xattr.c                 |   4 +-
 fs/lustre/osc/osc_request.c             |  42 +++++++++---
 include/uapi/linux/lustre/lustre_user.h |   4 ++
 11 files changed, 273 insertions(+), 80 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index 3aa5b37..f6b9d16 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -734,7 +734,6 @@ enum md_op_code {
 	LUSTRE_OPC_ANY,
 	LUSTRE_OPC_LOOKUP,
 	LUSTRE_OPC_OPEN,
-	LUSTRE_OPC_MIGR,
 };
 
 /**
diff --git a/fs/lustre/llite/crypto.c b/fs/lustre/llite/crypto.c
index 7bc6e01..6a12b6c 100644
--- a/fs/lustre/llite/crypto.c
+++ b/fs/lustre/llite/crypto.c
@@ -41,7 +41,7 @@ static int ll_get_context(struct inode *inode, void *ctx, size_t len)
 		return PTR_ERR(env);
 
 	/* Set lcc_getencctx=1 to allow this thread to read
-	 * LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr, as requested by llcrypt.
+	 * LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr, as requested by fscrypt.
 	 */
 	ll_cl_add(inode, env, NULL, LCC_RW);
 	ll_env_info(env)->lti_io_ctx.lcc_getencctx = 1;
@@ -129,7 +129,33 @@ static int ll_set_context(struct inode *inode, const void *ctx, size_t len,
 	return ll_set_encflags(inode, (void *)ctx, len, false);
 }
 
-#define llcrypto_free_ctx	kfree
+/**
+ * ll_file_open_encrypt() - overlay to fscrypt_file_open
+ * @inode: the inode being opened
+ * @filp: the struct file being set up
+ *
+ * This overlay function is necessary to handle encrypted file open without
+ * the key. We allow this access pattern to applications that know what they
+ * are doing, by using the specific flag O_FILE_ENC.
+ * This flag is only compatible with O_DIRECT IOs, to make sure ciphertext
+ * data is wiped from page cache once IOs are finished.
+ */
+int ll_file_open_encrypt(struct inode *inode, struct file *filp)
+{
+	int rc;
+
+	rc = fscrypt_file_open(inode, filp);
+	if (likely(rc != -ENOKEY))
+		return rc;
+
+	if (rc == -ENOKEY &&
+	    (filp->f_flags & O_FILE_ENC) == O_FILE_ENC &&
+	    filp->f_flags & O_DIRECT)
+		/* allow file open with O_FILE_ENC flag when we have O_DIRECT */
+		rc = 0;
+
+	return rc;
+}
 
 bool ll_sbi_has_test_dummy_encryption(struct ll_sb_info *sbi)
 {
@@ -183,9 +209,9 @@ static bool ll_empty_dir(struct inode *inode)
  * This overlay function is necessary to properly encode @fname after
  * encryption, as it will be sent over the wire.
  * This overlay function is also necessary to handle the case of operations
- * carried out without the key. Normally llcrypt makes use of digested names in
+ * carried out without the key. Normally fscrypt makes use of digested names in
  * that case. Having a digested name works for local file systems that can call
- * llcrypt_match_name(), but Lustre server side is not aware of encryption.
+ * fscrypt_match_name(), but Lustre server side is not aware of encryption.
  * So for keyless @lookup operations on long names, for Lustre we choose to
  * present to users the encoded struct ll_digest_filename, instead of a digested
  * name. FID and name hash can then easily be extracted and put into the
@@ -218,6 +244,17 @@ int ll_setup_filename(struct inode *dir, const struct qstr *iname,
 		fid->f_ver = 0;
 	}
 	rc = fscrypt_setup_filename(dir, &dname, lookup, fname);
+	if (rc == -ENOENT && lookup &&
+	    !fscrypt_has_encryption_key(dir) &&
+	    unlikely(filename_is_volatile(iname->name, iname->len, NULL))) {
+		/* For purpose of migration or mirroring without enc key, we
+		 * allow lookup of volatile file without enc context.
+		 */
+		memset(fname, 0, sizeof(struct fscrypt_name));
+		fname->disk_name.name = (unsigned char *)iname->name;
+		fname->disk_name.len = iname->len;
+		rc = 0;
+	}
 	if (rc)
 		return rc;
 
@@ -294,9 +331,9 @@ int ll_setup_filename(struct inode *dir, const struct qstr *iname,
  * This overlay function is necessary to properly decode @iname before
  * decryption, as it comes from the wire.
  * This overlay function is also necessary to handle the case of operations
- * carried out without the key. Normally llcrypt makes use of digested names in
+ * carried out without the key. Normally fscrypt makes use of digested names in
  * that case. Having a digested name works for local file systems that can call
- * llcrypt_match_name(), but Lustre server side is not aware of encryption.
+ * fscrypt_match_name(), but Lustre server side is not aware of encryption.
  * So for keyless @lookup operations on long names, for Lustre we choose to
  * present to users the encoded struct ll_digest_filename, instead of a digested
  * name. FID and name hash can then easily be extracted and put into the
@@ -334,7 +371,7 @@ int ll_fname_disk_to_usr(struct inode *inode,
 			digested = 1;
 			/* Without the key for long names, set the dentry name
 			 * to the representing struct ll_digest_filename. It
-			 * will be encoded by llcrypt for display, and will
+			 * will be encoded by fscrypt for display, and will
 			 * enable further lookup requests.
 			 */
 			if (!fid)
@@ -373,7 +410,7 @@ int ll_revalidate_d_crypto(struct dentry *dentry, unsigned int flags)
 	int valid;
 
 	/*
-	 * Plaintext names are always valid, since llcrypt doesn't support
+	 * Plaintext names are always valid, since fscrypt doesn't support
 	 * reverting to ciphertext names without evicting the directory's inode
 	 * -- which implies eviction of the dentries in the directory.
 	 */
@@ -383,7 +420,7 @@ int ll_revalidate_d_crypto(struct dentry *dentry, unsigned int flags)
 	/*
 	 * Ciphertext name; valid if the directory's key is still unavailable.
 	 *
-	 * Although llcrypt forbids rename() on ciphertext names, we still must
+	 * Although fscrypt forbids rename() on ciphertext names, we still must
 	 * use dget_parent() here rather than use ->d_parent directly.  That's
 	 * because a corrupted fs image may contain directory hard links, which
 	 * the VFS handles by moving the directory's dentry tree in the dcache
diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 40e83e7..f3f1ce7 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -1805,7 +1805,12 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 			st.st_uid = body->mbo_uid;
 			st.st_gid = body->mbo_gid;
 			st.st_rdev = body->mbo_rdev;
-			st.st_size = body->mbo_size;
+			if (fscrypt_require_key(inode) == -ENOKEY)
+				st.st_size = round_up(st.st_size,
+						      LUSTRE_ENCRYPTION_UNIT_SIZE);
+			else
+				st.st_size = body->mbo_size;
+
 			st.st_blksize = PAGE_SIZE;
 			st.st_blocks = body->mbo_blocks;
 			st.st_atime = body->mbo_atime;
@@ -1829,7 +1834,11 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 			stx.stx_mode = body->mbo_mode;
 			stx.stx_ino = cl_fid_build_ino(&body->mbo_fid1,
 						       api32);
-			stx.stx_size = body->mbo_size;
+			if (fscrypt_require_key(inode) == -ENOKEY)
+				stx.stx_size = round_up(stx.stx_size,
+						   LUSTRE_ENCRYPTION_UNIT_SIZE);
+			else
+				stx.stx_size = body->mbo_size;
 			stx.stx_blocks = body->mbo_blocks;
 			stx.stx_atime.tv_sec = body->mbo_atime;
 			stx.stx_ctime.tv_sec = body->mbo_ctime;
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 0dd1bae..eafb936 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -104,7 +104,16 @@ static void ll_prepare_close(struct inode *inode, struct md_op_data *op_data,
 	op_data->op_attr.ia_atime = inode->i_atime;
 	op_data->op_attr.ia_mtime = inode->i_mtime;
 	op_data->op_attr.ia_ctime = inode->i_ctime;
-	op_data->op_attr.ia_size = i_size_read(inode);
+	/* In case of encrypted file without the key, visible size was rounded
+	 * up to next LUSTRE_ENCRYPTION_UNIT_SIZE, and clear text size was
+	 * stored into lli_lazysize in ll_merge_attr(), so set proper file size
+	 * now that we are closing.
+	 */
+	if (fscrypt_require_key(inode) == -ENOKEY &&
+	    ll_i2info(inode)->lli_attr_valid & OBD_MD_FLLAZYSIZE)
+		op_data->op_attr.ia_size = ll_i2info(inode)->lli_lazysize;
+	else
+		op_data->op_attr.ia_size = i_size_read(inode);
 	op_data->op_attr.ia_valid |= (ATTR_MODE | ATTR_ATIME | ATTR_ATIME_SET |
 				      ATTR_MTIME | ATTR_MTIME_SET |
 				      ATTR_CTIME);
@@ -796,6 +805,7 @@ int ll_file_open(struct inode *inode, struct file *file)
 	struct lookup_intent *it, oit = { .it_op = IT_OPEN,
 					  .it_flags = file->f_flags };
 	struct obd_client_handle **och_p = NULL;
+	struct dentry *de = file_dentry(file);
 	u64 *och_usecount = NULL;
 	struct ll_file_data *fd;
 	ktime_t kstart = ktime_get();
@@ -808,9 +818,12 @@ int ll_file_open(struct inode *inode, struct file *file)
 	file->private_data = NULL; /* prevent ll_local_open assertion */
 
 	if (S_ISREG(inode->i_mode)) {
-		rc = fscrypt_file_open(inode, file);
-		if (rc)
+		rc = ll_file_open_encrypt(inode, file);
+		if (rc) {
+			if (it && it->it_disposition)
+				ll_release_openhandle(d_inode(de), it);
 			goto out_nofiledata;
+		}
 	}
 
 	fd = ll_file_data_get();
@@ -1475,6 +1488,16 @@ int ll_merge_attr(const struct lu_env *env, struct inode *inode)
 	CDEBUG(D_VFSTRACE, DFID " updating i_size %llu\n",
 	       PFID(&lli->lli_fid), attr->cat_size);
 
+	if (fscrypt_require_key(inode) == -ENOKEY) {
+		/* Without the key, round up encrypted file size to next
+		 * LUSTRE_ENCRYPTION_UNIT_SIZE. Clear text size is put in
+		 * lli_lazysize for proper file size setting at close time.
+		 */
+		lli->lli_attr_valid |= OBD_MD_FLLAZYSIZE;
+		lli->lli_lazysize = attr->cat_size;
+		attr->cat_size = round_up(attr->cat_size,
+					  LUSTRE_ENCRYPTION_UNIT_SIZE);
+	}
 	i_size_write(inode, attr->cat_size);
 
 	inode->i_blocks = attr->cat_blocks;
@@ -4344,6 +4367,12 @@ loff_t ll_lseek(struct file *file, loff_t offset, int whence)
 
 	cl_env_put(env, &refcheck);
 
+	/* Without the key, SEEK_HOLE return value has to be
+	 * rounded up to next LUSTRE_ENCRYPTION_UNIT_SIZE.
+	 */
+	if (fscrypt_require_key(inode) == -ENOKEY && whence == SEEK_HOLE)
+		retval = round_up(retval, LUSTRE_ENCRYPTION_UNIT_SIZE);
+
 	return retval;
 }
 
@@ -4746,20 +4775,8 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum,
 		goto out_iput;
 	}
 
-	if (IS_ENCRYPTED(child_inode)) {
-		rc = fscrypt_get_encryption_info(child_inode);
-		if (rc)
-			goto out_iput;
-		if (!fscrypt_has_encryption_key(child_inode)) {
-			CDEBUG(D_SEC, "no enc key for "DFID"\n",
-			       PFID(ll_inode2fid(child_inode)));
-			rc = -ENOKEY;
-			goto out_iput;
-		}
-	}
-
 	op_data = ll_prep_md_op_data(NULL, parent, NULL, name, namelen,
-				     child_inode->i_mode, LUSTRE_OPC_MIGR, NULL);
+				     child_inode->i_mode, LUSTRE_OPC_ANY, NULL);
 	if (IS_ERR(op_data)) {
 		rc = PTR_ERR(op_data);
 		goto out_iput;
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 12d47e8..54fd8d4 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1184,6 +1184,8 @@ int ll_revalidate_it_finish(struct ptlrpc_request *request,
 struct inode *ll_inode_from_resource_lock(struct ldlm_lock *lock);
 void ll_dir_clear_lsm_md(struct inode *inode);
 void ll_clear_inode(struct inode *inode);
+int volatile_ref_file(const char *volatile_name, int volatile_len,
+		      struct file **ref_file);
 int ll_setattr_raw(struct dentry *dentry, struct iattr *attr,
 		   enum op_xvalid xvalid, bool hsm_import);
 int ll_setattr(struct dentry *de, struct iattr *attr);
@@ -1707,7 +1709,7 @@ static inline struct pcc_super *ll_info2pccs(struct ll_inode_info *lli)
 #ifdef CONFIG_FS_ENCRYPTION
 /* The digested form is made of a FID (16 bytes) followed by the second-to-last
  * ciphertext block (16 bytes), so a total length of 32 bytes.
- * That way, llcrypt does not compute a digested form of this digest.
+ * That way, fscrypt does not compute a digested form of this digest.
  */
 struct ll_digest_filename {
 	struct lu_fid ldf_fid;
@@ -1722,6 +1724,7 @@ int ll_fname_disk_to_usr(struct inode *inode,
 			 struct fscrypt_str *iname, struct fscrypt_str *oname,
 			 struct lu_fid *fid);
 int ll_revalidate_d_crypto(struct dentry *dentry, unsigned int flags);
+int ll_file_open_encrypt(struct inode *inode, struct file *filp);
 #else
 int ll_setup_filename(struct inode *dir, const struct qstr *iname,
 		      int lookup, struct fscrypt_name *fname)
@@ -1740,6 +1743,11 @@ int ll_revalidate_d_crypto(struct dentry *dentry, unsigned int flags)
 {
 	return 1;
 }
+
+int ll_file_open_encrypt(struct inode *inode, struct file *filp)
+{
+	return 0;
+}
 #endif
 
 extern const struct fscrypt_operations lustre_cryptops;
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 7f168a2..c9be5af 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -40,6 +40,7 @@
 #include <linux/module.h>
 #include <linux/random.h>
 #include <linux/statfs.h>
+#include <linux/file.h>
 #include <linux/types.h>
 #include <linux/mm.h>
 #include <linux/delay.h>
@@ -1863,7 +1864,7 @@ int ll_io_zero_page(struct inode *inode, pgoff_t index, pgoff_t offset,
 		 */
 		SetPagePrivate2(vmpage);
 		rc = ll_io_read_page(env, io, clpage, NULL);
-		if (!PagePrivate2(vmpage))
+		if (!PagePrivate2(vmpage)) {
 			/* PagePrivate2 was cleared in osc_brw_fini_request()
 			 * meaning we read an empty page. In this case, in order
 			 * to avoid allocating unnecessary block in truncated
@@ -1872,6 +1873,7 @@ int ll_io_zero_page(struct inode *inode, pgoff_t index, pgoff_t offset,
 			 */
 			rc = 0;
 			goto clpfini;
+		}
 		ClearPagePrivate2(vmpage);
 		if (rc)
 			goto clpfini;
@@ -1925,6 +1927,44 @@ int ll_io_zero_page(struct inode *inode, pgoff_t index, pgoff_t offset,
 	return rc;
 }
 
+/**
+ * Get reference file from volatile file name.
+ * Volatile file name may look like:
+ * <parent>/LUSTRE_VOLATILE_HDR:<mdt_index>:<random>:fd=<fd>
+ * where fd is opened descriptor of reference file.
+ *
+ * \param[in] volatile_name	volatile file name
+ * \param[in] volatile_len	volatile file name length
+ * \param[out] ref_file		pointer to struct file of reference file
+ *
+ * \retval 0		on success
+ * \retval negative	errno on failure
+ */
+int volatile_ref_file(const char *volatile_name, int volatile_len,
+		      struct file **ref_file)
+{
+	char *p, *q, *fd_str;
+	int fd, rc;
+
+	p = strnstr(volatile_name, ":fd=", volatile_len);
+	if (!p || strlen(p + 4) == 0)
+		return -EINVAL;
+
+	q = strchrnul(p + 4, ':');
+	fd_str = kstrndup(p + 4, q - p - 4, GFP_NOFS);
+	if (!fd_str)
+		return -ENOMEM;
+	rc = kstrtouint(fd_str, 10, &fd);
+	kfree(fd_str);
+	if (rc)
+		return -EINVAL;
+
+	*ref_file = fget(fd);
+	if (!(*ref_file))
+		return -EINVAL;
+	return 0;
+}
+
 /* If this inode has objects allocated to it (lsm != NULL), then the OST
  * object(s) determine the file size and mtime.  Otherwise, the MDS will
  * keep these values until such a time that objects are allocated for it.
@@ -2090,6 +2130,58 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr,
 					if (rc)
 						goto out;
 				}
+				/* If encrypted volatile file without the key,
+				 * we need to fetch size from reference file,
+				 * and set it on OST objects. This happens when
+				 * migrating or extending an encrypted file
+				 * without the key.
+				 */
+				if (filename_is_volatile(dentry->d_name.name,
+							 dentry->d_name.len,
+							 NULL) &&
+				    fscrypt_require_key(inode) == -ENOKEY) {
+					struct file *ref_file;
+					struct inode *ref_inode;
+					struct ll_inode_info *ref_lli;
+					struct cl_object *ref_obj;
+					struct cl_attr ref_attr = { 0 };
+					struct lu_env *env;
+					u16 refcheck;
+
+					rc = volatile_ref_file(
+						dentry->d_name.name,
+						dentry->d_name.len,
+						&ref_file);
+					if (rc)
+						goto out;
+
+					ref_inode = file_inode(ref_file);
+					if (!ref_inode) {
+						fput(ref_file);
+						rc = -EINVAL;
+						goto out;
+					}
+
+					env = cl_env_get(&refcheck);
+					if (IS_ERR(env)) {
+						rc = PTR_ERR(env);
+						goto out;
+					}
+
+					ref_lli = ll_i2info(ref_inode);
+					ref_obj = ref_lli->lli_clob;
+					cl_object_attr_lock(ref_obj);
+					rc = cl_object_attr_get(env, ref_obj,
+								&ref_attr);
+					cl_object_attr_unlock(ref_obj);
+					cl_env_put(env, &refcheck);
+					fput(ref_file);
+					if (rc)
+						goto out;
+
+					attr->ia_valid |= ATTR_SIZE;
+					attr->ia_size = ref_attr.cat_size;
+				}
 			}
 			rc = cl_setattr_ost(ll_i2info(inode)->lli_clob,
 					    attr, xvalid, flags);
@@ -2462,7 +2554,15 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 
 	LASSERT(fid_seq(&lli->lli_fid) != 0);
 
-	lli->lli_attr_valid = body->mbo_valid;
+	/* In case of encrypted file without the key, please do not lose
+	 * clear text size stored into lli_lazysize in ll_merge_attr(),
+	 * we will need it in ll_prepare_close().
+	 */
+	if (lli->lli_attr_valid & OBD_MD_FLLAZYSIZE && lli->lli_lazysize &&
+	    fscrypt_require_key(inode) == -ENOKEY)
+		lli->lli_attr_valid = body->mbo_valid | OBD_MD_FLLAZYSIZE;
+	else
+		lli->lli_attr_valid = body->mbo_valid;
 	if (body->mbo_valid & OBD_MD_FLSIZE) {
 		i_size_write(inode, body->mbo_size);
 
@@ -3097,11 +3197,10 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 			op_data->op_flags |= MF_OPNAME_KMALLOCED;
 	}
 
-	/* In fact LUSTRE_OPC_LOOKUP, LUSTRE_OPC_OPEN, LUSTRE_OPC_MIGR
+	/* In fact LUSTRE_OPC_LOOKUP, LUSTRE_OPC_OPEN
 	 * are LUSTRE_OPC_ANY
 	 */
-	if (opc == LUSTRE_OPC_LOOKUP || opc == LUSTRE_OPC_OPEN ||
-	    opc == LUSTRE_OPC_MIGR)
+	if (opc == LUSTRE_OPC_LOOKUP || opc == LUSTRE_OPC_OPEN)
 		op_data->op_code = LUSTRE_OPC_ANY;
 	else
 		op_data->op_code = opc;
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 5fff54d..d46a30f 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -49,7 +49,7 @@
 static int ll_create_it(struct inode *dir, struct dentry *dentry,
 			struct lookup_intent *it,
 			void *secctx, u32 secctxlen, bool encrypt,
-			void *encctx, u32 encctxlen);
+			void *encctx, u32 encctxlen, unsigned int open_flags);
 
 /* called from iget5_locked->find_inode() under inode_hash_lock spinlock */
 static int ll_test_inode(struct inode *inode, void *opaque)
@@ -908,44 +908,21 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry,
 			*secctxlen = 0;
 	}
 	if (it->it_op & IT_CREAT && encrypt) {
-		/* Volatile file name may look like:
-		 * <parent>/LUSTRE_VOLATILE_HDR:<mdt_index>:<random>:fd=<fd>
-		 * where fd is opened descriptor of reference file.
-		 */
 		if (unlikely(filename_is_volatile(dentry->d_name.name,
 						  dentry->d_name.len, NULL))) {
+			/* get encryption context from reference file */
 			int ctx_size = LLCRYPT_ENC_CTX_SIZE;
 			struct lustre_sb_info *lsi;
 			struct file *ref_file;
 			struct inode *ref_inode;
-			char *p, *q, *fd_str;
 			void *ctx;
-			int fd;
 
-			p = strnstr(dentry->d_name.name, ":fd=",
-				    dentry->d_name.len);
-			if (!p || strlen(p + 4) == 0) {
-				retval = ERR_PTR(-EINVAL);
-				goto out;
-			}
-
-			q = strchrnul(p + 4, ':');
-			fd_str = kstrndup(p + 4, q - p - 4, GFP_NOFS);
-			if (!fd_str) {
-				retval = ERR_PTR(-ENOMEM);
-				goto out;
-			}
-			rc = kstrtouint(fd_str, 10, &fd);
-			kfree(fd_str);
+			rc = volatile_ref_file(dentry->d_name.name,
+					       dentry->d_name.len,
+					       &ref_file);
 			if (rc) {
-				rc = -EINVAL;
-				goto inherit;
-			}
-
-			ref_file = fget(fd);
-			if (!ref_file) {
-				rc = -EINVAL;
-				goto inherit;
+				retval = ERR_PTR(rc);
+				goto out;
 			}
 
 			ref_inode = file_inode(ref_file);
@@ -1254,7 +1231,14 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 		if (rc)
 			goto out_release;
 		if (open_flags & O_CREAT) {
-			if (!fscrypt_has_encryption_key(dir)) {
+			/* For migration or mirroring without enc key, we still
+			 * need to be able to create a volatile file.
+			 */
+			if (!fscrypt_has_encryption_key(dir) &&
+			    (!filename_is_volatile(dentry->d_name.name,
+						   dentry->d_name.len, NULL) ||
+			    (open_flags & O_FILE_ENC) != O_FILE_ENC ||
+			    !(open_flags & O_DIRECT))) {
 				rc = -ENOKEY;
 				goto out_release;
 			}
@@ -1287,7 +1271,8 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 		if (it_disposition(it, DISP_OPEN_CREATE)) {
 			/* Dentry instantiated in ll_create_it. */
 			rc = ll_create_it(dir, dentry, it, secctx, secctxlen,
-					  encrypt, encctx, encctxlen);
+					  encrypt, encctx, encctxlen,
+					  open_flags);
 			security_release_secctx(secctx, secctxlen);
 			kfree(encctx);
 			if (rc) {
@@ -1414,7 +1399,7 @@ static struct inode *ll_create_node(struct inode *dir, struct lookup_intent *it)
 static int ll_create_it(struct inode *dir, struct dentry *dentry,
 			struct lookup_intent *it,
 			void *secctx, u32 secctxlen, bool encrypt,
-			void *encctx, u32 encctxlen)
+			void *encctx, u32 encctxlen, unsigned int open_flags)
 {
 	struct inode *inode;
 	u64 bits = 0;
@@ -1449,7 +1434,18 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry,
 	d_instantiate(dentry, inode);
 
 	if (encrypt) {
-		rc = ll_set_encflags(inode, encctx, encctxlen, true);
+		bool preload = true;
+
+		/* For migration or mirroring without enc key, we
+		 * create a volatile file without enc context.
+		 */
+		if (!fscrypt_has_encryption_key(dir) &&
+		    filename_is_volatile(dentry->d_name.name,
+					 dentry->d_name.len, NULL) &&
+		    (open_flags & O_FILE_ENC) == O_FILE_ENC &&
+		    open_flags & O_DIRECT)
+			preload = false;
+		rc = ll_set_encflags(inode, encctx, encctxlen, preload);
 		if (rc)
 			return rc;
 	}
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index 0a271b9..4c2ab38 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -257,7 +257,7 @@ struct ll_dio_pages {
 		if (inode && IS_ENCRYPTED(inode)) {
 			/* In case of Direct IO on encrypted file, we need to
 			 * add a reference to the inode on the cl_page.
-			 * This info is required by llcrypt to proceed
+			 * This info is required by fscrypt to proceed
 			 * to encryption/decryption.
 			 * This is safe because we know these pages are private
 			 * to the thread doing the Direct IO.
diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c
index b67b822..6aea651 100644
--- a/fs/lustre/llite/xattr.c
+++ b/fs/lustre/llite/xattr.c
@@ -365,7 +365,7 @@ int ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer,
 	int rc;
 
 	/* Getting LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr is only allowed
-	 * when it comes from ll_get_context(), ie when llcrypt needs to
+	 * when it comes from ll_get_context(), ie when fscrypt needs to
 	 * know the encryption context.
 	 * Otherwise, any direct reading of this xattr returns -EPERM.
 	 */
@@ -646,7 +646,7 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size)
 
 		/* Listing xattrs should not expose
 		 * LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr, unless it comes
-		 * from llcrypt.
+		 * from fscrypt.
 		 */
 		if (get_xattr_type(xattr_name)->flags == XATTR_SECURITY_T &&
 		    !strcmp(xattr_name, LL_XATTR_NAME_ENCRYPTION_CONTEXT)) {
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index e065eab..59dc625 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1450,7 +1450,8 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 	if (!req)
 		return -ENOMEM;
 
-	if (opc == OST_WRITE && inode && IS_ENCRYPTED(inode)) {
+	if (opc == OST_WRITE && inode && IS_ENCRYPTED(inode) &&
+	    fscrypt_has_encryption_key(inode)) {
 		for (i = 0; i < page_count; i++) {
 			struct brw_page *pg = pga[i];
 			struct page *data_page = NULL;
@@ -1461,9 +1462,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 			pgoff_t index_orig;
 
 retry_encrypt:
-			if (nunits & ~LUSTRE_ENCRYPTION_MASK)
-				nunits = (nunits & LUSTRE_ENCRYPTION_MASK) +
-					  LUSTRE_ENCRYPTION_UNIT_SIZE;
+			nunits = round_up(nunits, LUSTRE_ENCRYPTION_UNIT_SIZE);
 			/* The page can already be locked when we arrive here.
 			 * This is possible when cl_page_assume/vvp_page_assume
 			 * is stuck on wait_on_page_writeback with page lock
@@ -1521,14 +1520,38 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 			pg->bp_off_diff = pg->off & ~PAGE_MASK;
 			pg->off = pg->off & PAGE_MASK;
 		}
-	} else if (opc == OST_READ && inode && IS_ENCRYPTED(inode)) {
+	} else if (opc == OST_WRITE && inode && IS_ENCRYPTED(inode)) {
+		struct osc_async_page *oap = brw_page2oap(pga[0]);
+		struct cl_page *clpage = oap2cl_page(oap);
+		struct cl_object *clobj = clpage->cp_obj;
+		struct cl_attr attr = { 0 };
+		struct lu_env *env;
+		u16 refcheck;
+
+		env = cl_env_get(&refcheck);
+		if (IS_ERR(env)) {
+			rc = PTR_ERR(env);
+			ptlrpc_request_free(req);
+			return rc;
+		}
+
+		cl_object_attr_lock(clobj);
+		rc = cl_object_attr_get(env, clobj, &attr);
+		cl_object_attr_unlock(clobj);
+		cl_env_put(env, &refcheck);
+		if (rc != 0) {
+			ptlrpc_request_free(req);
+			return rc;
+		}
+		if (attr.cat_size)
+			oa->o_size = attr.cat_size;
+	} else if (opc == OST_READ && inode && IS_ENCRYPTED(inode) &&
+		   fscrypt_has_encryption_key(inode)) {
 		for (i = 0; i < page_count; i++) {
 			struct brw_page *pg = pga[i];
 			u32 nunits = (pg->off & ~PAGE_MASK) + pg->count;
 
-			if (nunits & ~LUSTRE_ENCRYPTION_MASK)
-				nunits = (nunits & LUSTRE_ENCRYPTION_MASK) +
-					  LUSTRE_ENCRYPTION_UNIT_SIZE;
+			nunits = round_up(nunits, LUSTRE_ENCRYPTION_UNIT_SIZE);
 			/* count/off are forced to cover the whole encryption
 			 * unit size so that all encrypted data is stored on the
 			 * OST, so adjust bp_{count,off}_diff for the size of
@@ -1554,7 +1577,8 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 
 	for (i = 0; i < page_count; i++) {
 		short_io_size += pga[i]->count;
-		if (!inode || !IS_ENCRYPTED(inode)) {
+		if (!inode || !IS_ENCRYPTED(inode) ||
+		    !fscrypt_has_encryption_key(inode)) {
 			pga[i]->bp_count_diff = 0;
 			pga[i]->bp_off_diff = 0;
 		}
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 291e8e0..1e66930 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -399,6 +399,10 @@ struct ll_ioc_lease_id {
  * devices and are safe for use on new files (See LU-812, LU-4209).
  */
 #define O_LOV_DELAY_CREATE	(O_NOCTTY | FASYNC)
+/* O_FILE_ENC principle is similar to O_LOV_DELAY_CREATE above,
+ * for access to encrypted files without the encryption key.
+ */
+#define O_FILE_ENC		(O_NOCTTY | O_NDELAY)
 
 #define LL_FILE_IGNORE_LOCK	0x00000001
 #define LL_FILE_GROUP_LOCKED	0x00000002
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 08/13] lustre: sec: fix handling of encrypted file with long name
  2021-12-29 14:51 [lustre-devel] [PATCH 00/13] lustre: port OpenSFS updates Dec 29, 2021 James Simmons
                   ` (6 preceding siblings ...)
  2021-12-29 14:51 ` [lustre-devel] [PATCH 07/13] lustre: sec: no encryption key migrate/extend/resync/split James Simmons
@ 2021-12-29 14:51 ` James Simmons
  2021-12-29 14:51 ` [lustre-devel] [PATCH 09/13] lnet: socklnd: expect two control connections maximum James Simmons
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-12-29 14:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

The ciphertext representation of the name of an encrypted file or
directory can be up to 256 bytes of binary data, if the cleartext
name is up to NAME_MAX. But then this ciphertext is encoded via
critical_encode() before being sent to servers. Once encoded, the
length can exceed NAME_MAX because of the escaped critical
characters.
So make sure ll_prep_md_op_data() accepts these over-long encoded names
when it is called for lookup or create of an encrypted file or
directory. In all other cases, the 'name' taken as input is the plain
text version, so it must conform to the NAME_MAX limit.
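
As a rough illustration of why an escaped name can outgrow NAME_MAX, here
is a minimal userspace sketch. The escaping scheme (toy_is_critical,
toy_encoded_len) is a hypothetical stand-in and not the real
critical_encode(); only the shape of the length check in
toy_name_too_long() follows the hunk in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical escaping rule, NOT the real critical_encode(): each
 * "critical" byte is written as '=' followed by two hex digits.
 */
static bool toy_is_critical(unsigned char c)
{
	return c == '\0' || c == '/' || c == '=';
}

/* Length of src[0..len) once escaped, without producing any output. */
size_t toy_encoded_len(const unsigned char *src, size_t len)
{
	size_t i, out = 0;

	assert(src != NULL);
	for (i = 0; i < len; i++)
		out += toy_is_critical(src[i]) ? 3 : 1;
	return out;
}

/* Same shape as the patched condition: the limit is skipped only for
 * lookup or create under an encrypted parent.
 */
bool toy_name_too_long(bool parent_encrypted, bool lookup_or_create,
		       size_t namelen, size_t max_namelen)
{
	return (!parent_encrypted || !lookup_or_create) &&
	       namelen > max_namelen;
}
```

With this toy rule, a 255-byte binary name containing 51 critical bytes
encodes to 357 bytes, which is rejected for a plain-text parent but let
through for lookup or create under an encrypted one.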

When carrying out operations on an encrypted file with a long name, we
manipulate a digested form whose hash needs to be matched against the
content of the LinkEA. The name found in the LinkEA is not NUL
terminated, so the comparison must not assume a terminating NUL.

Fixes: e4c377fefc ("lustre: sec: filename encryption")
Fixes: 860818695d ("lustre: sec: filename encryption - digest support")
WC-bug-id: https://jira.whamcloud.com/browse/LU-13717
Lustre-commit: 75414af6bf310244d ("LU-13717 sec: fix handling of encrypted file with long name")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/45163
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_lib.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index c9be5af..11a545a3 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -3110,7 +3110,9 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 		if (namelen)
 			return ERR_PTR(-EINVAL);
 	} else {
-		if (namelen > ll_i2sbi(i1)->ll_namelen)
+		if ((!IS_ENCRYPTED(i1) ||
+		     (opc != LUSTRE_OPC_LOOKUP && opc != LUSTRE_OPC_CREATE)) &&
+		    namelen > ll_i2sbi(i1)->ll_namelen)
 			return ERR_PTR(-ENAMETOOLONG);
 
 		/* "/" is not valid name, but it's allowed */
-- 
1.8.3.1

* [lustre-devel] [PATCH 09/13] lnet: socklnd: expect two control connections maximum
  2021-12-29 14:51 [lustre-devel] [PATCH 00/13] lustre: port OpenSFS updates Dec 29, 2021 James Simmons
                   ` (7 preceding siblings ...)
  2021-12-29 14:51 ` [lustre-devel] [PATCH 08/13] lustre: sec: fix handling of encrypted file with long name James Simmons
@ 2021-12-29 14:51 ` James Simmons
  2021-12-29 14:51 ` [lustre-devel] [PATCH 10/13] lustre: ptlrpc: use a cached value James Simmons
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-12-29 14:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

When a node connects to itself, e.g. by pinging its own NID, two
control-type connections are established, versus just one when
connecting to an external peer.
Widen the control connection counter so that it can hold that count.
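
A minimal userspace sketch of the overflow, assuming stand-in struct
names (toy_conn_cb_old and toy_conn_cb_new are hypothetical and only
mirror the old and new bitfield widths):

```c
#include <assert.h>

/* Hypothetical stand-ins for the old (1-bit) and new (2-bit) counters. */
struct toy_conn_cb_old { unsigned int ctrl_conn_count:1; };
struct toy_conn_cb_new { unsigned int ctrl_conn_count:2; };

/* Count n connections in a 1-bit field; the stored value wraps modulo 2. */
unsigned int toy_count_old(unsigned int n)
{
	struct toy_conn_cb_old cb = { 0 };
	unsigned int i;

	for (i = 0; i < n; i++)
		cb.ctrl_conn_count++;
	return cb.ctrl_conn_count;
}

/* Count n connections in a 2-bit field, which can hold 0..3. */
unsigned int toy_count_new(unsigned int n)
{
	struct toy_conn_cb_new cb = { 0 };
	unsigned int i;

	assert(n <= 3);
	for (i = 0; i < n; i++)
		cb.ctrl_conn_count++;
	return cb.ctrl_conn_count;
}
```

With a 1-bit field the second control connection wraps the counter back
to 0, while the 2-bit field reports 2.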

Fixes: 511ace4a ("lnet: socklnd: add conns_per_peer parameter")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15137
Lustre-commit: ee9a03d8308c5918a ("LU-15137 socklnd: expect two control connections maximum")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45461
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/socklnd/socklnd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h
index fe1bc7d..4607ef7 100644
--- a/net/lnet/klnds/socklnd/socklnd.h
+++ b/net/lnet/klnds/socklnd/socklnd.h
@@ -397,7 +397,7 @@ struct ksock_conn_cb {
 							 * type
 							 */
 	unsigned int		ksnr_deleted:1;		/* been removed from peer_ni? */
-	unsigned int            ksnr_ctrl_conn_count:1; /* # conns by type */
+	unsigned int            ksnr_ctrl_conn_count:2; /* # conns by type */
 	unsigned int		ksnr_blki_conn_count:8;
 	unsigned int		ksnr_blko_conn_count:8;
 	int			ksnr_conn_count;	/* total # conns for
-- 
1.8.3.1

* [lustre-devel] [PATCH 10/13] lustre: ptlrpc: use a cached value
  2021-12-29 14:51 [lustre-devel] [PATCH 00/13] lustre: port OpenSFS updates Dec 29, 2021 James Simmons
                   ` (8 preceding siblings ...)
  2021-12-29 14:51 ` [lustre-devel] [PATCH 09/13] lnet: socklnd: expect two control connections maximum James Simmons
@ 2021-12-29 14:51 ` James Simmons
  2021-12-29 14:51 ` [lustre-devel] [PATCH 11/13] lnet: Race on discovery queue James Simmons
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-12-29 14:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexey Lyashkov, Lustre Development List

From: Alexey Lyashkov <alexey.lyashkov@hpe.com>

Don't recalculate the early reply size on every call; use a cached
value instead, as the size does not change after startup.
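
The pattern can be sketched in userspace as follows. The sizes and the
toy_msg_size() formula are assumptions made up for the example; only the
"compute once at init, then read a variable" structure reflects the
patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_HDR_LEN	32u	/* hypothetical header size */
#define TOY_BODY_LEN	184u	/* hypothetical body size */

/* Hypothetical size computation: header plus body rounded up to 8 bytes. */
uint32_t toy_msg_size(uint32_t body_len)
{
	return TOY_HDR_LEN + ((body_len + 7u) & ~7u);
}

/* Cached result; only meaningful after toy_early_size_init() has run. */
uint32_t toy_early_size;

/* Called once at module init, before any reader of toy_early_size. */
void toy_early_size_init(void)
{
	toy_early_size = toy_msg_size(TOY_BODY_LEN);
	assert(toy_early_size != 0);
}
```

The trade-off is an ordering requirement: every reader must run after
the init call, which is presumably why the patch places
lustre_msg_early_size_init() early in ptlrpc_init().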

WC-bug-id: https://jira.whamcloud.com/browse/LU-15279
Lustre-commit: d6a3b0529d7da440a ("LU-15279 ptlrpc: use a cached value")
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/45661
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_net.h     | 2 +-
 fs/lustre/mdc/mdc_locks.c          | 4 ++--
 fs/lustre/ptlrpc/pack_generic.c    | 8 +++++---
 fs/lustre/ptlrpc/ptlrpc_internal.h | 1 +
 fs/lustre/ptlrpc/ptlrpc_module.c   | 1 +
 fs/lustre/ptlrpc/sec_null.c        | 4 ++--
 fs/lustre/ptlrpc/sec_plain.c       | 2 +-
 7 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index 78df59b..cf1bb7f 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -2010,7 +2010,7 @@ int lustre_shrink_msg(struct lustre_msg *msg, int segment,
 u32 lustre_msg_size(u32 magic, int count, u32 *lengths);
 u32 lustre_msg_size_v2(int count, u32 *lengths);
 u32 lustre_packed_msg_size(struct lustre_msg *msg);
-u32 lustre_msg_early_size(void);
+extern u32 lustre_msg_early_size;
 void *lustre_msg_buf_v2(struct lustre_msg_v2 *m, u32 n, u32 min_size);
 void *lustre_msg_buf(struct lustre_msg *m, u32 n, u32 minlen);
 u32 lustre_msg_buflen(struct lustre_msg *m, u32 n);
diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c
index aba94d1..b86d1b9 100644
--- a/fs/lustre/mdc/mdc_locks.c
+++ b/fs/lustre/mdc/mdc_locks.c
@@ -397,7 +397,7 @@ static int mdc_save_lovea(struct ptlrpc_request *req, void *data, u32 size)
 
 	/* Get real repbuf allocated size as rounded up power of 2 */
 	repsize = size_roundup_power2(req->rq_replen +
-				      lustre_msg_early_size());
+				      lustre_msg_early_size);
 	/* Estimate free space for DoM files in repbuf */
 	repsize_estimate = repsize - (req->rq_replen -
 			   mdt_md_capsule_size +
@@ -415,7 +415,7 @@ static int mdc_save_lovea(struct ptlrpc_request *req, void *data, u32 size)
 		CDEBUG(D_INFO, "Increase repbuf by %d bytes, total: %d\n",
 		       repsize, req->rq_replen);
 		repsize = size_roundup_power2(req->rq_replen +
-					      lustre_msg_early_size());
+					      lustre_msg_early_size);
 	}
 	/* The only way to report real allocated repbuf size to the server
 	 * is the lm_repsize but it must be set prior buffer allocation itself
diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c
index 23b36de..b41f51d 100644
--- a/fs/lustre/ptlrpc/pack_generic.c
+++ b/fs/lustre/ptlrpc/pack_generic.c
@@ -72,14 +72,16 @@ u32 lustre_msg_hdr_size(u32 magic, u32 count)
 	}
 }
 
+u32 lustre_msg_early_size;
+EXPORT_SYMBOL(lustre_msg_early_size);
+
 /* early reply size */
-u32 lustre_msg_early_size(void)
+void lustre_msg_early_size_init(void)
 {
 	u32 pblen = sizeof(struct ptlrpc_body);
 
-	return lustre_msg_size(LUSTRE_MSG_MAGIC_V2, 1, &pblen);
+	lustre_msg_early_size = lustre_msg_size(LUSTRE_MSG_MAGIC_V2, 1, &pblen);
 }
-EXPORT_SYMBOL(lustre_msg_early_size);
 
 u32 lustre_msg_size_v2(int count, u32 *lengths)
 {
diff --git a/fs/lustre/ptlrpc/ptlrpc_internal.h b/fs/lustre/ptlrpc/ptlrpc_internal.h
index f1f414c..d6edfde 100644
--- a/fs/lustre/ptlrpc/ptlrpc_internal.h
+++ b/fs/lustre/ptlrpc/ptlrpc_internal.h
@@ -244,6 +244,7 @@ void ptlrpc_fill_bulk_md(struct lnet_md *md, struct ptlrpc_bulk_desc *desc,
 struct ptlrpc_reply_state *
 lustre_get_emerg_rs(struct ptlrpc_service_part *svcpt);
 void lustre_put_emerg_rs(struct ptlrpc_reply_state *rs);
+void lustre_msg_early_size_init(void); /* just for init */
 
 /* pinger.c */
 int ptlrpc_start_pinger(void);
diff --git a/fs/lustre/ptlrpc/ptlrpc_module.c b/fs/lustre/ptlrpc/ptlrpc_module.c
index 8379bc4..7e29a91 100644
--- a/fs/lustre/ptlrpc/ptlrpc_module.c
+++ b/fs/lustre/ptlrpc/ptlrpc_module.c
@@ -85,6 +85,7 @@ static int __init ptlrpc_init(void)
 	mutex_init(&pinger_mutex);
 	mutex_init(&ptlrpcd_mutex);
 	ptlrpc_init_xid();
+	lustre_msg_early_size_init();
 
 	rc = libcfs_setup();
 	if (rc)
diff --git a/fs/lustre/ptlrpc/sec_null.c b/fs/lustre/ptlrpc/sec_null.c
index cf8f24b..a7241bd 100644
--- a/fs/lustre/ptlrpc/sec_null.c
+++ b/fs/lustre/ptlrpc/sec_null.c
@@ -195,7 +195,7 @@ int null_alloc_repbuf(struct ptlrpc_sec *sec,
 		      int msgsize)
 {
 	/* add space for early replied */
-	msgsize += lustre_msg_early_size();
+	msgsize += lustre_msg_early_size;
 
 	msgsize = size_roundup_power2(msgsize);
 
@@ -367,7 +367,7 @@ int null_authorize(struct ptlrpc_request *req)
 
 	if (likely(req->rq_packed_final)) {
 		if (lustre_msghdr_get_flags(req->rq_reqmsg) & MSGHDR_AT_SUPPORT)
-			req->rq_reply_off = lustre_msg_early_size();
+			req->rq_reply_off = lustre_msg_early_size;
 	} else {
 		u32 cksum;
 
diff --git a/fs/lustre/ptlrpc/sec_plain.c b/fs/lustre/ptlrpc/sec_plain.c
index 0d1c591..d546722 100644
--- a/fs/lustre/ptlrpc/sec_plain.c
+++ b/fs/lustre/ptlrpc/sec_plain.c
@@ -996,7 +996,7 @@ int sptlrpc_plain_init(void)
 	u32 buflens[PLAIN_PACK_SEGMENTS] = { 0, };
 	int rc;
 
-	buflens[PLAIN_PACK_MSG_OFF] = lustre_msg_early_size();
+	buflens[PLAIN_PACK_MSG_OFF] = lustre_msg_early_size;
 	plain_at_offset = lustre_msg_size_v2(PLAIN_PACK_SEGMENTS, buflens);
 
 	rc = sptlrpc_register_policy(&plain_policy);
-- 
1.8.3.1

* [lustre-devel] [PATCH 11/13] lnet: Race on discovery queue
  2021-12-29 14:51 [lustre-devel] [PATCH 00/13] lustre: port OpenSFS updates Dec 29, 2021 James Simmons
                   ` (9 preceding siblings ...)
  2021-12-29 14:51 ` [lustre-devel] [PATCH 10/13] lustre: ptlrpc: use a cached value James Simmons
@ 2021-12-29 14:51 ` James Simmons
  2021-12-29 14:51 ` [lustre-devel] [PATCH 12/13] lnet: o2iblnd: convert ibp_refcount to a kref James Simmons
  2021-12-29 14:51 ` [lustre-devel] [PATCH 13/13] lustre: llite: set ra_pages of backing_dev_info with 0 James Simmons
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-12-29 14:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

If the discovery thread clears the LNET_PEER_DISCOVERING bit then a
race window opens when the discovery thread drops the
lnet_peer.lp_lock spinlock and closes when the discovery thread
acquires the lnet_net_lock. If another thread queues the peer for
discovery during this window then the LNET_PEER_DISCOVERING bit is
added back to the peer state, but since the peer is already on the
lnet.ln_dc_working queue, it does not get added to the
lnet.ln_dc_request queue.

When the discovery thread acquires the lnet_net_lock/EX, it sees that
the LNET_PEER_DISCOVERING bit has not been cleared, so it does not
call lnet_peer_discovery_complete() which is responsible for sending
messages on the peer's discovery pending queue.

At this point, the peer is stuck on the lnet.ln_dc_working queue, and
messages may continue to accumulate on the peer's
lnet_peer.lp_dc_pendq.

Fix the issue by re-working the main discovery thread loop so that we
do not release the lnet_peer.lp_lock until after we've determined
whether we need to call lnet_peer_discovery_complete().
This ensures that the lnet_peer is correctly removed from the
discovery work queue and any messages on the peer's
lnet_peer.lp_dc_pendq are sent or finalized.
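
The reworked decision can be modelled without threads as a pure function
of the return code and a state snapshot taken under the peer lock. The
constants and names below are hypothetical (the real bit and return-code
values are not taken from the source); the branch order follows the
patched loop.

```c
#include <assert.h>

#define TOY_PEER_DISCOVERING	0x1u	/* hypothetical bit value */
#define TOY_REDISCOVER_PEER	1	/* hypothetical return code */

enum toy_action { TOY_REQUEUE, TOY_COMPLETE, TOY_KEEP_WORKING };

/* Decide what to do from a state value read while the lock is held. */
enum toy_action toy_next_action(int rc, unsigned int state)
{
	if (rc == TOY_REDISCOVER_PEER)
		return TOY_REQUEUE;
	if (rc || !(state & TOY_PEER_DISCOVERING))
		return TOY_COMPLETE;
	return TOY_KEEP_WORKING;
}

/* Replay the window: the bit is clear under the lock, then another
 * thread sets it again before the old code re-read the state.
 */
void toy_race_demo(void)
{
	unsigned int state_locked = 0;
	unsigned int state_after_unlock = TOY_PEER_DISCOVERING;

	/* Old ordering: state re-read after unlock, completion skipped. */
	assert(toy_next_action(0, state_after_unlock) == TOY_KEEP_WORKING);
	/* New ordering: decision made from the locked snapshot. */
	assert(toy_next_action(0, state_locked) == TOY_COMPLETE);
}
```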

It is also possible for the lnet_peer.lp_dc_error to be cleared
during the aforementioned window, as well as during the time when
lnet_peer_discovery_complete() is processing the contents of the
lnet_peer.lp_dc_pendq. This could prevent messages on the
lnet_peer.lp_dc_pendq from being correctly finalized. To fix this
issue, the responsibilities of lnet_peer_discovery_error() were
incorporated into lnet_peer_discovery_complete().
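
A small single-threaded sketch of why passing the error by value helps.
The struct and function names are hypothetical, and the clear_at
parameter merely simulates another thread clearing the shared field
while the pending queue is being walked.

```c
#include <assert.h>
#include <stddef.h>

struct toy_peer { int dc_error; };

/* Old shape: the shared field is re-read for every pending message. */
size_t toy_finalize_old(struct toy_peer *lp, size_t nmsgs, size_t clear_at)
{
	size_t i, failed = 0;

	assert(lp != NULL);
	for (i = 0; i < nmsgs; i++) {
		if (i == clear_at)
			lp->dc_error = 0;	/* simulated concurrent clear */
		if (lp->dc_error)
			failed++;
	}
	return failed;
}

/* New shape: the error is a by-value argument, immune to later clears. */
size_t toy_finalize_new(struct toy_peer *lp, int dc_error, size_t nmsgs,
			size_t clear_at)
{
	size_t i, failed = 0;

	assert(lp != NULL);
	for (i = 0; i < nmsgs; i++) {
		if (i == clear_at)
			lp->dc_error = 0;	/* simulated concurrent clear */
		if (dc_error)
			failed++;
	}
	return failed;
}
```

In the old shape, messages handled after the clear are no longer
finalized with the error; in the new shape all of them are.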

HPE-bug-id: LUS-10615
WC-bug-id: https://jira.whamcloud.com/browse/LU-15234
Lustre-commit: 852a4b264a984979d ("LU-15234 lnet: Race on discovery queue")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/45670
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 47 ++++++++++++++++++++---------------------------
 1 file changed, 20 insertions(+), 27 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index cca458f..057a1db 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2262,7 +2262,7 @@ static int lnet_peer_queue_for_discovery(struct lnet_peer *lp)
  * Discovery of a peer is complete. Wake all waiters on the peer.
  * Call with lnet_net_lock/EX held.
  */
-static void lnet_peer_discovery_complete(struct lnet_peer *lp)
+static void lnet_peer_discovery_complete(struct lnet_peer *lp, int dc_error)
 {
 	struct lnet_msg *msg, *tmp;
 	int rc = 0;
@@ -2273,6 +2273,11 @@ static void lnet_peer_discovery_complete(struct lnet_peer *lp)
 
 	list_del_init(&lp->lp_dc_list);
 	spin_lock(&lp->lp_lock);
+	if (dc_error) {
+		lp->lp_dc_error = dc_error;
+		lp->lp_state &= ~LNET_PEER_DISCOVERING;
+		lp->lp_state |= LNET_PEER_REDISCOVER;
+	}
 	list_splice_init(&lp->lp_dc_pendq, &pending_msgs);
 	spin_unlock(&lp->lp_lock);
 	wake_up(&lp->lp_dc_waitq);
@@ -2285,8 +2290,8 @@ static void lnet_peer_discovery_complete(struct lnet_peer *lp)
 	/* iterate through all pending messages and send them again */
 	list_for_each_entry_safe(msg, tmp, &pending_msgs, msg_list) {
 		list_del_init(&msg->msg_list);
-		if (lp->lp_dc_error) {
-			lnet_finalize(msg, lp->lp_dc_error);
+		if (dc_error) {
+			lnet_finalize(msg, dc_error);
 			continue;
 		}
 
@@ -3619,22 +3624,6 @@ static int lnet_peer_send_push(struct lnet_peer *lp)
 }
 
 /*
- * An unrecoverable error was encountered during discovery.
- * Set error status in peer and abort discovery.
- */
-static void lnet_peer_discovery_error(struct lnet_peer *lp, int error)
-{
-	CDEBUG(D_NET, "Discovery error %s: %d\n",
-	       libcfs_nidstr(&lp->lp_primary_nid), error);
-
-	spin_lock(&lp->lp_lock);
-	lp->lp_dc_error = error;
-	lp->lp_state &= ~LNET_PEER_DISCOVERING;
-	lp->lp_state |= LNET_PEER_REDISCOVER;
-	spin_unlock(&lp->lp_lock);
-}
-
-/*
  * Wait for work to be queued or some other change that must be
  * attended to. Returns non-zero if the discovery thread should shut
  * down.
@@ -3810,17 +3799,22 @@ static int lnet_peer_discovery(void *arg)
 			CDEBUG(D_NET, "peer %s(%p) state %#x rc %d\n",
 			       libcfs_nidstr(&lp->lp_primary_nid), lp,
 			       lp->lp_state, rc);
-			spin_unlock(&lp->lp_lock);
 
-			lnet_net_lock(LNET_LOCK_EX);
 			if (rc == LNET_REDISCOVER_PEER) {
+				spin_unlock(&lp->lp_lock);
+				lnet_net_lock(LNET_LOCK_EX);
 				list_move(&lp->lp_dc_list,
 					  &the_lnet.ln_dc_request);
-			} else if (rc) {
-				lnet_peer_discovery_error(lp, rc);
+			} else if (rc ||
+				   !(lp->lp_state & LNET_PEER_DISCOVERING)) {
+				spin_unlock(&lp->lp_lock);
+				lnet_net_lock(LNET_LOCK_EX);
+				lnet_peer_discovery_complete(lp, rc);
+			} else {
+				spin_unlock(&lp->lp_lock);
+				lnet_net_lock(LNET_LOCK_EX);
 			}
-			if (!(lp->lp_state & LNET_PEER_DISCOVERING))
-				lnet_peer_discovery_complete(lp);
+
 			if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING)
 				break;
 		}
@@ -3857,8 +3851,7 @@ static int lnet_peer_discovery(void *arg)
 	while (!list_empty(&the_lnet.ln_dc_request)) {
 		lp = list_first_entry(&the_lnet.ln_dc_request,
 				      struct lnet_peer, lp_dc_list);
-		lnet_peer_discovery_error(lp, -ESHUTDOWN);
-		lnet_peer_discovery_complete(lp);
+		lnet_peer_discovery_complete(lp, -ESHUTDOWN);
 	}
 	lnet_net_unlock(LNET_LOCK_EX);
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 12/13] lnet: o2iblnd: convert ibp_refcount to a kref
  2021-12-29 14:51 [lustre-devel] [PATCH 00/13] lustre: port OpenSFS updates Dec 29, 2021 James Simmons
                   ` (10 preceding siblings ...)
  2021-12-29 14:51 ` [lustre-devel] [PATCH 11/13] lnet: Race on discovery queue James Simmons
@ 2021-12-29 14:51 ` James Simmons
  2021-12-29 14:51 ` [lustre-devel] [PATCH 13/13] lustre: llite: set ra_pages of backing_dev_info with 0 James Simmons
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-12-29 14:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

This refcount is used exactly like a kref, so change it to one.
kref uses refcount_t, which warns on increment-from-zero and
similar problems (when enabled by a CONFIG option), so we don't
need the LASSERT calls.
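
For readers unfamiliar with the idiom, a simplified userspace model of
the kref pattern is sketched below. All toy_* names are hypothetical,
the counter is not atomic, and the asserts only stand in for the
warnings refcount_t can emit; the real API is kref_init(), kref_get()
and kref_put() with a release callback.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, non-atomic model of a kref. */
struct toy_kref { unsigned int count; };

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Like kref_init(): the creator holds the first reference. */
void toy_kref_init(struct toy_kref *k)
{
	k->count = 1;
}

void toy_kref_get(struct toy_kref *k)
{
	assert(k->count > 0);	/* increment-from-zero is a bug */
	k->count++;
}

/* Drop a reference; call release and return 1 when it was the last. */
int toy_kref_put(struct toy_kref *k, void (*release)(struct toy_kref *))
{
	assert(k->count > 0);
	if (--k->count == 0) {
		release(k);
		return 1;
	}
	return 0;
}

struct toy_peer { int nid; int destroyed; struct toy_kref ref; };

/* Release callback: recover the containing object from the kref. */
void toy_peer_destroy(struct toy_kref *k)
{
	struct toy_peer *p = toy_container_of(k, struct toy_peer, ref);

	p->destroyed = 1;
}
```

Because the release callback only runs when the count reaches zero, a
separate assertion that the count is zero inside the destructor becomes
redundant.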

WC-bug-id: https://jira.whamcloud.com/browse/LU-12678
Lustre-commit: 2968a40a163aa1b0f ("LU-12678 o2iblnd: convert ibp_refcount to a kref")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/45685
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c | 11 ++++++-----
 net/lnet/klnds/o2iblnd/o2iblnd.h | 35 +++++++++++++++++------------------
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 9cdc12a..7d28acd 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -337,7 +337,7 @@ int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer_ni **peerp,
 	peer_ni->ibp_max_frags = IBLND_MAX_RDMA_FRAGS;
 	peer_ni->ibp_queue_depth = ni->ni_net->net_tunables.lct_peer_tx_credits;
 	peer_ni->ibp_queue_depth_mod = 0;	/* try to use the default */
-	atomic_set(&peer_ni->ibp_refcount, 1);  /* 1 ref for caller */
+	kref_init(&peer_ni->ibp_kref);
 
 	INIT_HLIST_NODE(&peer_ni->ibp_list);
 	INIT_LIST_HEAD(&peer_ni->ibp_conns);
@@ -357,12 +357,13 @@ int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer_ni **peerp,
 	return 0;
 }
 
-void kiblnd_destroy_peer(struct kib_peer_ni *peer_ni)
+void kiblnd_destroy_peer(struct kref *kref)
 {
+	struct kib_peer_ni *peer_ni = container_of(kref, struct kib_peer_ni,
+						   ibp_kref);
 	struct kib_net *net = peer_ni->ibp_ni->ni_data;
 
 	LASSERT(net);
-	LASSERT(!atomic_read(&peer_ni->ibp_refcount));
 	LASSERT(!kiblnd_peer_active(peer_ni));
 	LASSERT(kiblnd_peer_idle(peer_ni));
 	LASSERT(list_empty(&peer_ni->ibp_tx_queue));
@@ -403,7 +404,7 @@ struct kib_peer_ni *kiblnd_find_peer_locked(struct lnet_ni *ni, lnet_nid_t nid)
 
 		CDEBUG(D_NET, "got peer_ni [%p] -> %s (%d) version: %x\n",
 		       peer_ni, libcfs_nid2str(nid),
-		       atomic_read(&peer_ni->ibp_refcount),
+		       kref_read(&peer_ni->ibp_kref),
 		       peer_ni->ibp_version);
 		return peer_ni;
 	}
@@ -439,7 +440,7 @@ static int kiblnd_get_peer_info(struct lnet_ni *ni, int index,
 			continue;
 
 		*nidp = peer_ni->ibp_nid;
-		*count = atomic_read(&peer_ni->ibp_refcount);
+		*count = kref_read(&peer_ni->ibp_kref);
 
 		read_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags);
 		return 0;
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 21f8981..4fb651e 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -499,7 +499,7 @@ struct kib_peer_ni {
 	/* when (in seconds) I was last alive */
 	time64_t		ibp_last_alive;
 	/* # users */
-	atomic_t		ibp_refcount;
+	struct kref		ibp_kref;
 	/* version of peer_ni */
 	u16			ibp_version;
 	/* current passive connection attempts */
@@ -607,23 +607,23 @@ static inline int kiblnd_timeout(void)
 	}								\
 } while (0)
 
-#define kiblnd_peer_addref(peer_ni)					\
-do {									\
-	CDEBUG(D_NET, "peer_ni[%p] -> %s (%d)++\n",			\
-	       (peer_ni), libcfs_nid2str((peer_ni)->ibp_nid),		\
-	       atomic_read(&(peer_ni)->ibp_refcount));			\
-	atomic_inc(&(peer_ni)->ibp_refcount);				\
-} while (0)
+void kiblnd_destroy_peer(struct kref *kref);
 
-#define kiblnd_peer_decref(peer_ni)					\
-do {									\
-	CDEBUG(D_NET, "peer_ni[%p] -> %s (%d)--\n",			\
-	       (peer_ni), libcfs_nid2str((peer_ni)->ibp_nid),		\
-	       atomic_read(&(peer_ni)->ibp_refcount));			\
-	LASSERT_ATOMIC_POS(&(peer_ni)->ibp_refcount);			\
-	if (atomic_dec_and_test(&(peer_ni)->ibp_refcount))		\
-		kiblnd_destroy_peer(peer_ni);				\
-} while (0)
+static inline void kiblnd_peer_addref(struct kib_peer_ni *peer_ni)
+{
+	CDEBUG(D_NET, "peer_ni[%p] -> %s (%d)++\n",
+	       peer_ni, libcfs_nid2str(peer_ni->ibp_nid),
+	       kref_read(&peer_ni->ibp_kref));
+	kref_get(&(peer_ni)->ibp_kref);
+}
+
+static inline void kiblnd_peer_decref(struct kib_peer_ni *peer_ni)
+{
+	CDEBUG(D_NET, "peer_ni[%p] -> %s (%d)--\n",
+	       peer_ni, libcfs_nid2str(peer_ni->ibp_nid),
+	       kref_read(&peer_ni->ibp_kref));
+	kref_put(&peer_ni->ibp_kref, kiblnd_destroy_peer);
+}
 
 static inline bool
 kiblnd_peer_connecting(struct kib_peer_ni *peer_ni)
@@ -929,7 +929,6 @@ int kiblnd_cm_callback(struct rdma_cm_id *cmid,
 int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns);
 int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer_ni **peerp,
 		       lnet_nid_t nid);
-void kiblnd_destroy_peer(struct kib_peer_ni *peer_ni);
 bool kiblnd_reconnect_peer(struct kib_peer_ni *peer_ni);
 void kiblnd_destroy_dev(struct kib_dev *dev);
 void kiblnd_unlink_peer_locked(struct kib_peer_ni *peer_ni);
-- 
1.8.3.1

* [lustre-devel] [PATCH 13/13] lustre: llite: set ra_pages of backing_dev_info with 0
  2021-12-29 14:51 [lustre-devel] [PATCH 00/13] lustre: port OpenSFS updates Dec 29, 2021 James Simmons
                   ` (11 preceding siblings ...)
  2021-12-29 14:51 ` [lustre-devel] [PATCH 12/13] lnet: o2iblnd: convert ibp_refcount to a kref James Simmons
@ 2021-12-29 14:51 ` James Simmons
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-12-29 14:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Qian Yingjin <qian@ddn.com>

The latest kernels set the initial @ra_pages of
backing_dev_info to VM_READAHEAD_PAGES:

struct backing_dev_info *bdi_alloc(int node_id)
{
        ...
        bdi->ra_pages = VM_READAHEAD_PAGES;
        bdi->io_pages = VM_READAHEAD_PAGES;
        ...
}

As a result, @ra_pages of the file readahead state is set from
@bdi->ra_pages, which takes readahead out of Lustre's control
and wrongly triggers the readahead logic in the Linux kernel. This
results in the failure of sanity test 101j.

In this patch, we force @ra_pages of backing_dev_info to 0
after setting up the backing device info. This disables
kernel readahead for the super block.

This patch also cleans up the unnecessary setting of @ra_pages in
llite "file.c" and "vvp_io.c".
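
The inheritance described above can be modeled in user space. This
is a hedged sketch, not kernel code: the _model types, the helper
names, and the value chosen for the default window are all
assumptions made for illustration. It only shows that a per-file
readahead state copied from the backing device at open time is zero
for every file once the device-level value is zeroed, so per-file
resets are no longer needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical default window, in pages; the real value may differ. */
#define VM_READAHEAD_PAGES_MODEL 32UL

/* Stand-ins for backing_dev_info and the per-file readahead state. */
struct bdi_model {
	unsigned long ra_pages;
};

struct file_ra_model {
	unsigned long ra_pages;
};

/* Models bdi_alloc(): a new device starts with the default window. */
static void bdi_model_init(struct bdi_model *bdi)
{
	assert(bdi != NULL);
	bdi->ra_pages = VM_READAHEAD_PAGES_MODEL;
}

/* Models opening a file: the file state inherits the device window. */
static void file_ra_model_init(struct file_ra_model *ra,
			       const struct bdi_model *bdi)
{
	assert(ra != NULL && bdi != NULL);
	ra->ra_pages = bdi->ra_pages;
}

/* A zero window means generic kernel readahead is effectively off. */
static int kernel_readahead_enabled(const struct file_ra_model *ra)
{
	return ra->ra_pages != 0;
}
```

In this model, zeroing the device value once at mount time, as the
patch does with sb->s_bdi->ra_pages, covers every file opened
afterwards.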

WC-bug-id: https://jira.whamcloud.com/browse/LU-15244
Lustre-commit: 878561880d2aba038 ("LU-15244 llite: set ra_pages of backing_dev_info with 0")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/45712
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c      | 2 --
 fs/lustre/llite/llite_lib.c | 3 +++
 fs/lustre/llite/vvp_io.c    | 3 ---
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index eafb936..30e99c0 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -757,8 +757,6 @@ static int ll_local_open(struct file *file, struct lookup_intent *it,
 	file->private_data = fd;
 	ll_readahead_init(inode, &fd->fd_ras);
 	fd->fd_omode = it->it_flags & (FMODE_READ | FMODE_WRITE | FMODE_EXEC);
-	/* turn off the kernel's read-ahead */
-	file->f_ra.ra_pages = 0;
 
 	return 0;
 }
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 11a545a3..87cdc36 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -1203,6 +1203,9 @@ int ll_fill_super(struct super_block *sb)
 	if (err)
 		goto out_free;
 
+	/* disable kernel readahead */
+	sb->s_bdi->ra_pages = 0;
+
 	/* Call ll_debugsfs_register_super() before lustre_process_log()
 	 * so that "llite.*.*" params can be processed correctly.
 	 */
diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index d8951ac..40047f8 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -834,9 +834,6 @@ static int vvp_io_read_start(const struct lu_env *env,
 			 "Read ino %lu, %zu bytes, offset %lld, size %llu\n",
 			 inode->i_ino, cnt, pos, i_size_read(inode));
 
-	/* turn off the kernel's read-ahead */
-	vio->vui_fd->fd_file->f_ra.ra_pages = 0;
-
 	/* initialize read-ahead window once per syscall */
 	if (!vio->vui_ra_valid) {
 		vio->vui_ra_valid = true;
-- 
1.8.3.1

Thread overview: 14+ messages
2021-12-29 14:51 [lustre-devel] [PATCH 00/13] lustre: port OpenSFS updates Dec 29, 2021 James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 01/13] lustre: sec: filename encryption - digest support James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 02/13] lnet: Revert "lnet: Lock primary NID logic" James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 03/13] lustre: quota: fallocate send UID/GID for quota James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 04/13] lustre: mdc: add client tunable to disable LSOM update James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 05/13] lustre: dne: dir migration in non-recursive mode James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 06/13] lustre: update version to 2.14.56 James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 07/13] lustre: sec: no encryption key migrate/extend/resync/split James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 08/13] lustre: sec: fix handling of encrypted file with long name James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 09/13] lnet: socklnd: expect two control connections maximum James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 10/13] lustre: ptlrpc: use a cached value James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 11/13] lnet: Race on discovery queue James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 12/13] lnet: o2iblnd: convert ibp_refcount to a kref James Simmons
2021-12-29 14:51 ` [lustre-devel] [PATCH 13/13] lustre: llite: set ra_pages of backing_dev_info with 0 James Simmons
