lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020
@ 2020-11-16  0:59 James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 01/28] llite: remove splice_read handling for PCC James Simmons
                   ` (27 more replies)
  0 siblings, 28 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

Backport of various patches from the OpenSFS tree. Fixes for mmap
and fscrypto issues that affected newer kernels are included. This
work has been validated against:

sanity-lnet.sh
sanity.sh
sanity-hsm.sh
sanity-sec.sh
sanity-pcc.sh

Alexander Boyko (2):
  lustre: ptlrpc: remove unused code at pinger
  lustre: ptlrpc: decrease time between reconnection

Amir Shehata (1):
  lnet: o2iblnd: Don't retry indefinitely

Andriy Skulysh (2):
  lustre: llite: ASSERTION( last_oap_count > 0 ) failed
  lustre: ldlm: BL AST vs failed lock enqueue race

Aurelien Degremont (2):
  lustre: ptlrpc: throttle RPC resend if network error
  lustre: ptlrpc: don't log connection 'restored' inappropriately

Brian Behlendorf (1):
  lnet: o2iblnd: 'Timed out tx' error message

Hongchao Zhang (1):
  lustre: lov: doesn't check lov_refcount

James Simmons (1):
  llite: remove splice_read handling for PCC

John L. Hammond (1):
  lnet: o2ib: raise bind cap before resolving address

Lai Siyao (3):
  lustre: llite: rmdir releases inode on client
  lustre: mdc: remote object support getattr from cache
  lustre: llite: pass name in getattr by FID

Mikhail Pershin (2):
  lustre: ptlrpc: introduce OST_SEEK RPC
  lustre: clio: SEEK_HOLE/SEEK_DATA on client side

Mr NeilBrown (2):
  lustre: llite: disable statahead_agl for sanity test_56ra
  lustre: seq_file .next functions must update *pos

NeilBrown (1):
  lustre: use memalloc_nofs_save() for GFP_NOFS kvmalloc allocations.

Oleg Drokin (2):
  lustre: ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait
  lustre: llite: Avoid eternel retry loops with MAP_POPULATE

Sebastien Buisson (6):
  lustre: gss: update sequence in case of target disconnect
  lustre: sec: O_DIRECT for encrypted file
  lustre: sec: restrict fallocate on encrypted files
  lustre: sec: encryption with different client PAGE_SIZE
  lustre: sec: require enc key in case of O_CREAT only
  lustre: sec: fix O_DIRECT and encrypted files

Vitaly Fertman (1):
  lustre: ldlm: group locks for DOM IBIT lock

 .../client_side_encryption/access_semantics.txt    |   3 -
 fs/lustre/include/cl_object.h                      |  10 +
 fs/lustre/include/lustre_export.h                  |   5 +
 fs/lustre/include/lustre_net.h                     |   5 -
 fs/lustre/include/lustre_osc.h                     |   4 +
 fs/lustre/include/lustre_req_layout.h              |   1 +
 fs/lustre/include/obd.h                            |   1 +
 fs/lustre/ldlm/ldlm_inodebits.c                    |   2 +
 fs/lustre/ldlm/ldlm_lock.c                         |  12 +-
 fs/lustre/ldlm/ldlm_lockd.c                        |   4 +-
 fs/lustre/ldlm/ldlm_request.c                      |   5 +-
 fs/lustre/llite/dir.c                              |   1 -
 fs/lustre/llite/file.c                             | 115 ++++++++++--
 fs/lustre/llite/llite_internal.h                   |   1 -
 fs/lustre/llite/llite_lib.c                        |  19 +-
 fs/lustre/llite/llite_mmap.c                       |  10 +-
 fs/lustre/llite/namei.c                            |  24 ++-
 fs/lustre/llite/pcc.c                              |  33 +---
 fs/lustre/llite/pcc.h                              |   5 -
 fs/lustre/llite/rw26.c                             |  27 ++-
 fs/lustre/llite/statahead.c                        |  31 ++--
 fs/lustre/llite/super25.c                          |  11 ++
 fs/lustre/llite/vvp_io.c                           |  53 +++++-
 fs/lustre/lmv/lmv_intent.c                         |  22 ++-
 fs/lustre/lmv/lmv_obd.c                            |   8 +-
 fs/lustre/lov/lov_io.c                             |  99 +++++++++-
 fs/lustre/lov/lov_obd.c                            |   3 +-
 fs/lustre/lov/lov_object.c                         |  13 +-
 fs/lustre/lov/lov_pool.c                           |   2 +-
 fs/lustre/mdc/mdc_dev.c                            |  14 +-
 fs/lustre/mdc/mdc_locks.c                          |   1 -
 fs/lustre/obdclass/cl_io.c                         |  12 ++
 fs/lustre/obdclass/lprocfs_status.c                |   1 +
 fs/lustre/obdecho/echo_client.c                    |   7 +-
 fs/lustre/osc/osc_io.c                             | 143 ++++++++++++++-
 fs/lustre/osc/osc_request.c                        |  95 +++++++---
 fs/lustre/ptlrpc/client.c                          |  20 ++
 fs/lustre/ptlrpc/events.c                          |   5 +
 fs/lustre/ptlrpc/import.c                          |  52 +++++-
 fs/lustre/ptlrpc/layout.c                          |   5 +
 fs/lustre/ptlrpc/lproc_ptlrpc.c                    |   4 +-
 fs/lustre/ptlrpc/niobuf.c                          |   2 -
 fs/lustre/ptlrpc/pinger.c                          | 202 +++++----------------
 fs/lustre/ptlrpc/sec.c                             |   4 +-
 fs/lustre/ptlrpc/sec_null.c                        |   8 -
 fs/lustre/ptlrpc/wiretest.c                        |  14 +-
 include/uapi/linux/lustre/lustre_idl.h             |   3 +
 net/lnet/klnds/o2iblnd/o2iblnd.h                   |   2 +
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c                |  48 ++++-
 net/lnet/klnds/o2iblnd/o2iblnd_modparams.c         |   2 +-
 50 files changed, 834 insertions(+), 339 deletions(-)

-- 
1.8.3.1


* [lustre-devel] [PATCH 01/28] llite: remove splice_read handling for PCC
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 02/28] lustre: llite: disable statahead_agl for sanity test_56ra James Simmons
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

For older kernels with PCC we handled splice_read ourselves, but
this is no longer needed since generic_file_splice_read() is just
a wrapper around the read_iter() operation. We can safely remove
pcc_file_splice_read(). Use the read_iter() / write_iter()
functions of pccf_file instead of those of the default file of the
passed-in kiocb.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13745
Lustre-commit: 1635dc9de0bc1d ("LU-13745 llite: switch generic_file_splice_read() to use of ->read_iter()")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/39272
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 fs/lustre/llite/pcc.c | 33 ++-------------------------------
 fs/lustre/llite/pcc.h |  5 -----
 2 files changed, 2 insertions(+), 36 deletions(-)

diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c
index 5a4bb33..28ca9cb 100644
--- a/fs/lustre/llite/pcc.c
+++ b/fs/lustre/llite/pcc.c
@@ -1592,7 +1592,7 @@ ssize_t pcc_file_read_iter(struct kiocb *iocb,
 	 * filp->f_ops->read_iter uses ->aio_read hook directly
 	 * to add support for ext4-dax.
 	 */
-	result = file->f_op->read_iter(iocb, iter);
+	result = iocb->ki_filp->f_op->read_iter(iocb, iter);
 	iocb->ki_filp = file;
 
 	pcc_io_fini(inode);
@@ -1633,7 +1633,7 @@ ssize_t pcc_file_write_iter(struct kiocb *iocb,
 	 * the normal vfs interface to the local PCC file system,
 	 * the inode lock is not needed.
 	 */
-	result = file->f_op->write_iter(iocb, iter);
+	result = iocb->ki_filp->f_op->write_iter(iocb, iter);
 	iocb->ki_filp = file;
 out:
 	pcc_io_fini(inode);
@@ -1734,35 +1734,6 @@ int pcc_inode_getattr(struct inode *inode, u32 request_mask,
 	return rc;
 }
 
-ssize_t pcc_file_splice_read(struct file *in_file, loff_t *ppos,
-			     struct pipe_inode_info *pipe,
-			     size_t count, unsigned int flags,
-			     bool *cached)
-{
-	struct inode *inode = file_inode(in_file);
-	struct ll_file_data *fd = in_file->private_data;
-	struct file *pcc_file = fd->fd_pcc_file.pccf_file;
-	ssize_t result;
-
-	*cached = false;
-	if (!pcc_file)
-		return 0;
-
-	if (!file_inode(pcc_file)->i_fop->splice_read)
-		return -ENOTSUPP;
-
-	pcc_io_init(inode, PIT_SPLICE_READ, cached);
-	if (!*cached)
-		return 0;
-
-	result = file_inode(pcc_file)->i_fop->splice_read(pcc_file,
-							  ppos, pipe, count,
-							  flags);
-
-	pcc_io_fini(inode);
-	return result;
-}
-
 int pcc_fsync(struct file *file, loff_t start, loff_t end,
 	      int datasync, bool *cached)
 {
diff --git a/fs/lustre/llite/pcc.h b/fs/lustre/llite/pcc.h
index b13f9da8..d3aae9b 100644
--- a/fs/lustre/llite/pcc.h
+++ b/fs/lustre/llite/pcc.h
@@ -184,8 +184,6 @@ enum pcc_io_type {
 	PIT_FAULT,
 	/* fsync system call handling */
 	PIT_FSYNC,
-	/* splice_read system call */
-	PIT_SPLICE_READ,
 	/* open system call */
 	PIT_OPEN,
 };
@@ -241,9 +239,6 @@ ssize_t pcc_file_write_iter(struct kiocb *iocb, struct iov_iter *iter,
 int pcc_inode_getattr(struct inode *inode, u32 request_mask,
 		      unsigned int flags, bool *cached);
 int pcc_inode_setattr(struct inode *inode, struct iattr *attr, bool *cached);
-ssize_t pcc_file_splice_read(struct file *in_file, loff_t *ppos,
-			     struct pipe_inode_info *pipe, size_t count,
-			     unsigned int flags, bool *cached);
 int pcc_fsync(struct file *file, loff_t start, loff_t end,
 	      int datasync, bool *cached);
 int pcc_file_mmap(struct file *file, struct vm_area_struct *vma, bool *cached);
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/28] lustre: llite: disable statahead_agl for sanity test_56ra
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 01/28] llite: remove splice_read handling for PCC James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 03/28] lustre: seq_file .next functions must update *pos James Simmons
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Mr NeilBrown <neilb@suse.de>

The sanity test_56ra can fail because statahead_agl can cause an extra
glimpse request.

If a stat() system call is made after an AGL glimpse request is sent,
but before the reply has been received, the code handling the stat
cannot see that glimpse request and so will send another.  This
elevates the number of requests counted.

There is a parameter (statahead_agl) which makes it easy to disable
the AGL, but it isn't implemented properly.  Specifically, inodes can
still be added to the sai_agls list when AGL is disabled.  They will
never be removed, which causes an assertion to fail.

To clean this up, remove the sai_agl_valid flag, and use a test on
sai_agl_task being non-NULL instead.  Also check agl_should_run()
while locked against ->sai_agl_task changing, and before adding
anything to lli_agl_list.

We don't need the 'added' variable.  It is perfectly OK to wake up the
sai_agl_task *before* adding to the list as long as that is all done
under the lock.  The task will wait for the lock before checking the
list, so it won't see it being empty.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13017
Lustre-commit: 3e04c4f0757c22 ("LU-13017 tests: disable statahead_agl for sanity test_56ra")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39667
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
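Note: the re-check described in the commit message above can be
pictured with a small userspace model. This is only a sketch under
stated assumptions: model_sai, model_child and model_agl_add() are
hypothetical stand-ins rather than Lustre code, and the locking that
the real ll_agl_add() relies on is left out, so only the decision
logic is shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; not the real Lustre structures. */
struct model_sai {
	bool agl_task_running;	/* models sai->sai_agl_task != NULL */
	int  nr_queued;		/* models the length of sai->sai_agls */
};

struct model_child {
	int agl_index;		/* models lli_agl_index; 0 means "not queued" */
};

/*
 * Models the decision made in ll_agl_add() after the patch.  In the
 * kernel the re-check and the list insertion both happen with the
 * parent's lli_agl_lock held; this single-threaded model omits locking.
 * Returns true if the child was queued.
 */
bool model_agl_add(struct model_sai *sai, struct model_child *child,
		   int index)
{
	if (child->agl_index != 0)
		return false;		/* already queued */

	child->agl_index = index;
	if (sai->agl_task_running) {
		sai->nr_queued++;	/* stands in for igrab() + list_add_tail() */
		return true;
	}

	/* AGL disabled or stopped: undo, so nothing lingers on the list */
	child->agl_index = 0;
	return false;
}
```

With the AGL task absent, the model queues nothing and resets the
index, which is the property that keeps the list empty when
statahead_agl is turned off.
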
 fs/lustre/llite/llite_internal.h |  1 -
 fs/lustre/llite/statahead.c      | 31 +++++++++++++------------------
 2 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 0bd6795..9d988aac 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1398,7 +1398,6 @@ struct ll_statahead_info {
 	unsigned int	    sai_ls_all:1,   /* "ls -al", do stat-ahead for
 					     * hidden entries
 					     */
-				sai_agl_valid:1,/* AGL is valid for the dir */
 				sai_in_readpage:1;/* statahead in readdir() */
 	wait_queue_head_t	sai_waitq;      /* stat-ahead wait queue */
 	struct task_struct     *sai_task;       /* stat-ahead thread */
diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c
index 895e496..a7d3a43 100644
--- a/fs/lustre/llite/statahead.c
+++ b/fs/lustre/llite/statahead.c
@@ -129,7 +129,7 @@ static inline int sa_hash(int val)
 static inline int agl_should_run(struct ll_statahead_info *sai,
 				 struct inode *inode)
 {
-	return (inode && S_ISREG(inode->i_mode) && sai->sai_agl_valid);
+	return inode && S_ISREG(inode->i_mode) && sai->sai_agl_task;
 }
 
 /* statahead window is full */
@@ -424,7 +424,6 @@ static void ll_agl_add(struct ll_statahead_info *sai,
 {
 	struct ll_inode_info *child = ll_i2info(inode);
 	struct ll_inode_info *parent = ll_i2info(sai->sai_dentry->d_inode);
-	int added = 0;
 
 	spin_lock(&child->lli_agl_lock);
 	if (child->lli_agl_index == 0) {
@@ -433,18 +432,20 @@ static void ll_agl_add(struct ll_statahead_info *sai,
 
 		LASSERT(list_empty(&child->lli_agl_list));
 
-		igrab(inode);
 		spin_lock(&parent->lli_agl_lock);
-		if (list_empty(&sai->sai_agls))
-			added = 1;
-		list_add_tail(&child->lli_agl_list, &sai->sai_agls);
+		/* Re-check under the lock */
+		if (agl_should_run(sai, inode)) {
+			if (list_empty(&sai->sai_agls))
+				wake_up_process(sai->sai_agl_task);
+			igrab(inode);
+			list_add_tail(&child->lli_agl_list, &sai->sai_agls);
+		} else {
+			child->lli_agl_index = 0;
+		}
 		spin_unlock(&parent->lli_agl_lock);
 	} else {
 		spin_unlock(&child->lli_agl_lock);
 	}
-
-	if (added > 0)
-		wake_up_process(sai->sai_agl_task);
 }
 
 /* allocate sai */
@@ -936,7 +937,6 @@ static void ll_stop_agl(struct ll_statahead_info *sai)
 
 	sai->sai_agl_task = NULL;
 	spin_lock(&plli->lli_agl_lock);
-	sai->sai_agl_valid = 0;
 	while ((clli = list_first_entry_or_null(&sai->sai_agls,
 						struct ll_inode_info,
 						lli_agl_list)) != NULL) {
@@ -967,16 +967,11 @@ static void ll_start_agl(struct dentry *parent, struct ll_statahead_info *sai)
 				      plli->lli_opendir_pid);
 	if (IS_ERR(task)) {
 		CERROR("can't start ll_agl thread, rc: %ld\n", PTR_ERR(task));
-		sai->sai_agl_valid = 0;
 		return;
 	}
 
 	sai->sai_agl_task = task;
-	LASSERT(sai->sai_agl_valid == 1);
 	atomic_inc(&ll_i2sbi(d_inode(parent))->ll_agl_total);
-	spin_lock(&plli->lli_agl_lock);
-	sai->sai_agl_valid = 1;
-	spin_unlock(&plli->lli_agl_lock);
 	/* Get an extra reference that the thread holds */
 	ll_sai_get(d_inode(parent));
 
@@ -1569,7 +1564,6 @@ static int start_statahead_thread(struct inode *dir, struct dentry *dentry,
 	}
 
 	sai->sai_ls_all = (first == LS_FIRST_DOT_DE);
-	sai->sai_agl_valid = agl;
 
 	/*
 	 * if current lli_opendir_key was deauthorized, or dir re-opened by
@@ -1643,10 +1637,11 @@ static inline bool ll_statahead_started(struct inode *dir, bool agl)
 
 	spin_lock(&lli->lli_sa_lock);
 	sai = lli->lli_sai;
-	if (sai && sai->sai_agl_valid != agl)
+	if (sai && (sai->sai_agl_task != NULL) != agl)
 		CDEBUG(D_READA,
 		       "%s: Statahead AGL hint changed from %d to %d\n",
-		       ll_i2sbi(dir)->ll_fsname, sai->sai_agl_valid, agl);
+		       ll_i2sbi(dir)->ll_fsname,
+		       sai->sai_agl_task != NULL, agl);
 	spin_unlock(&lli->lli_sa_lock);
 
 	return !!sai;
-- 
1.8.3.1


* [lustre-devel] [PATCH 03/28] lustre: seq_file .next functions must update *pos
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 01/28] llite: remove splice_read handling for PCC James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 02/28] lustre: llite: disable statahead_agl for sanity test_56ra James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 04/28] lustre: llite: ASSERTION( last_oap_count > 0 ) failed James Simmons
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Mr NeilBrown <neilb@suse.de>

A seq_file .next function must update *pos on EOF to a value which
will cause a subsequent ->start to also return EOF.
If it doesn't, the last record of the file can be returned
twice to a 'read()'.  Also, the seq_file code will generate
a warning.

This patch fixes various ->next functions to always update
*pos.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13985
Lustre-commit: 817d6c11659963 ("LU-13985 lustre: seq_file .next functions must update *pos")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/40035
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
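Note: the double-record problem described in the commit message above
can be reproduced with a simplified userspace model. This is a sketch,
not the kernel's seq_file implementation: model_start(), next_buggy(),
next_fixed() and count_emitted() are hypothetical names, and the "two
read() calls" are modelled as two passes that each restart from the
saved position.

```c
#include <assert.h>
#include <stddef.h>

#define NRECORDS 3

/* ->start analogue: return the record at *pos, or NULL at EOF */
const int *model_start(const int *recs, long *pos)
{
	return (*pos < NRECORDS) ? &recs[*pos] : NULL;
}

/* Buggy ->next analogue: returns NULL at EOF without advancing *pos */
const int *next_buggy(const int *recs, long *pos)
{
	if (*pos + 1 >= NRECORDS)
		return NULL;
	(*pos)++;
	return &recs[*pos];
}

/* Fixed ->next analogue: always advances *pos, even when hitting EOF */
const int *next_fixed(const int *recs, long *pos)
{
	(*pos)++;
	if (*pos >= NRECORDS)
		return NULL;
	return &recs[*pos];
}

/* Count the records emitted over two passes that share one position */
int count_emitted(const int *(*next)(const int *, long *))
{
	static const int recs[NRECORDS] = { 10, 20, 30 };
	long pos = 0;
	int emitted = 0;
	int pass;

	for (pass = 0; pass < 2; pass++) {
		const int *v = model_start(recs, &pos);

		while (v) {
			emitted++;
			v = next(recs, &pos);
		}
	}
	return emitted;
}
```

In this model the buggy variant leaves the position on the last record,
so the second pass emits it again; the fixed variant moves the position
past the end, so the second pass sees EOF immediately.
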
 fs/lustre/lov/lov_pool.c        | 2 +-
 fs/lustre/ptlrpc/lproc_ptlrpc.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/lov/lov_pool.c b/fs/lustre/lov/lov_pool.c
index 9e937ac..f8f14f9 100644
--- a/fs/lustre/lov/lov_pool.c
+++ b/fs/lustre/lov/lov_pool.c
@@ -111,6 +111,7 @@ static void *pool_proc_next(struct seq_file *s, void *v, loff_t *pos)
 
 	LASSERTF(iter->magic == POOL_IT_MAGIC, "%08X\n", iter->magic);
 
+	(*pos)++;
 	/* test if end of file */
 	if (*pos >= pool_tgt_count(iter->pool))
 		return NULL;
@@ -122,7 +123,6 @@ static void *pool_proc_next(struct seq_file *s, void *v, loff_t *pos)
 		iter->idx = prev_idx; /* we stay on the last entry */
 		return NULL;
 	}
-	(*pos)++;
 	/* return != NULL to continue */
 	return iter;
 }
diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c
index 4d2ae14..7276f81 100644
--- a/fs/lustre/ptlrpc/lproc_ptlrpc.c
+++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c
@@ -932,6 +932,7 @@ struct ptlrpc_srh_iterator {
 	}
 
 	kfree(srhi);
+	++*pos;
 	return NULL;
 }
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 04/28] lustre: llite: ASSERTION( last_oap_count > 0 ) failed
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (2 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 03/28] lustre: seq_file .next functions must update *pos James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 05/28] lnet: o2ib: raise bind cap before resolving address James Simmons
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Andriy Skulysh <c17819@cray.com>

Punch uses o_blocks to send the end of a region, so on error
that value can be mistaken for a real block count.

Update the block count only on success.

HPE-bug-id: LUS-7407
WC-bug-id: https://jira.whamcloud.com/browse/LU-13992
Lustre-commit: a56fefc535677b ("LU-13992 llite: ASSERTION( last_oap_count > 0 ) failed")
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-on: https://review.whamcloud.com/40050
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
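Note: the rule "update the block count only on success" from the commit
message above can be sketched in isolation. The types and the function
model_setattr_end() below are hypothetical stand-ins for illustration
and are not the osc code; the field comments name the real fields they
are assumed to mirror.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_reply {
	bool     has_blocks;	/* models oa->o_valid & OBD_MD_FLBLOCKS */
	uint64_t blocks;	/* models oa->o_blocks */
};

struct model_attr {
	uint64_t cached_blocks;	/* models attr->cat_blocks */
};

/*
 * Only trust the block count carried in the reply when the operation
 * succeeded; on error the field may still hold the end of the punched
 * region rather than a block count.
 */
void model_setattr_end(struct model_attr *attr, int result,
		       const struct model_reply *reply)
{
	if (result == 0 && reply->has_blocks)
		attr->cached_blocks = reply->blocks;
}
```
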
 fs/lustre/osc/osc_io.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c
index 7ec059a..6121f39 100644
--- a/fs/lustre/osc/osc_io.c
+++ b/fs/lustre/osc/osc_io.c
@@ -691,14 +691,16 @@ void osc_io_setattr_end(const struct lu_env *env,
 	if (cl_io_is_trunc(io)) {
 		u64 size = io->u.ci_setattr.sa_attr.lvb_size;
 
-		cl_object_attr_lock(obj);
-		if (oa->o_valid & OBD_MD_FLBLOCKS) {
-			attr->cat_blocks = oa->o_blocks;
-			cl_valid |= CAT_BLOCKS;
-		}
+		if (result == 0) {
+			cl_object_attr_lock(obj);
+			if (oa->o_valid & OBD_MD_FLBLOCKS) {
+				attr->cat_blocks = oa->o_blocks;
+				cl_valid |= CAT_BLOCKS;
+			}
 
-		cl_object_attr_update(env, obj, attr, cl_valid);
-		cl_object_attr_unlock(obj);
+			cl_object_attr_update(env, obj, attr, cl_valid);
+			cl_object_attr_unlock(obj);
+		}
 		osc_trunc_check(env, io, oio, size);
 		osc_cache_truncate_end(env, oio->oi_trunc);
 		oio->oi_trunc = NULL;
-- 
1.8.3.1


* [lustre-devel] [PATCH 05/28] lnet: o2ib: raise bind cap before resolving address
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (3 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 04/28] lustre: llite: ASSERTION( last_oap_count > 0 ) failed James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 06/28] lustre: use memalloc_nofs_save() for GFP_NOFS kvmalloc allocations James Simmons
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: "John L. Hammond" <jhammond@whamcloud.com>

In kiblnd_resolve_addr(), ensure that the current task has
CAP_NET_BIND_SERVICE before calling rdma_resolve_addr() with a
protected source port.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14006
Lustre-commit: 1e4bd16acfa26a ("LU-14006 o2ib: raise bind cap before resolving address")
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40127
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 38 +++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index ba2f46f..b642162 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1219,14 +1219,17 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	spin_unlock(&conn->ibc_lock);
 }
 
-static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
-			       struct sockaddr_in *srcaddr,
-			       struct sockaddr_in *dstaddr,
-			       int timeout_ms)
+static int
+kiblnd_resolve_addr_cap(struct rdma_cm_id *cmid,
+			struct sockaddr_in *srcaddr,
+			struct sockaddr_in *dstaddr,
+			int timeout_ms)
 {
 	unsigned short port;
 	int rc;
 
+	LASSERT(capable(CAP_NET_BIND_SERVICE));
+
 	/* allow the port to be reused */
 	rc = rdma_set_reuseaddr(cmid, 1);
 	if (rc) {
@@ -1256,6 +1259,33 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
 	return rc;
 }
 
+static int
+kiblnd_resolve_addr(struct rdma_cm_id *cmid,
+		    struct sockaddr_in *srcaddr,
+		    struct sockaddr_in *dstaddr,
+		    int timeout_ms)
+{
+	const struct cred *old_creds = NULL;
+	struct cred *new_creds;
+	int rc;
+
+	if (!capable(CAP_NET_BIND_SERVICE)) {
+		new_creds = prepare_creds();
+		if (!new_creds)
+			return -ENOMEM;
+
+		cap_raise(new_creds->cap_effective, CAP_NET_BIND_SERVICE);
+		old_creds = override_creds(new_creds);
+	}
+
+	rc = kiblnd_resolve_addr_cap(cmid, srcaddr, dstaddr, timeout_ms);
+
+	if (old_creds)
+		revert_creds(old_creds);
+
+	return rc;
+}
+
 static void
 kiblnd_connect_peer(struct kib_peer_ni *peer_ni)
 {
-- 
1.8.3.1


* [lustre-devel] [PATCH 06/28] lustre: use memalloc_nofs_save() for GFP_NOFS kvmalloc allocations.
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (4 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 05/28] lnet: o2ib: raise bind cap before resolving address James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 07/28] lnet: o2iblnd: Don't retry indefinitely James Simmons
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.de>

The allocation of lo_sub should be GFP_NOFS as it can happen in support
of write-out, and should allow vmalloc as the array can be as much as
2000 pointers (16K).
So change it to kvmalloc_array() and use memalloc_nofs_save() as
GFP_NOFS doesn't work in kvmalloc_array().

The allocation in echo_client passes GFP_NOFS to kvmalloc_array()
which causes it to map directly to kmalloc_array().
So use memalloc_nofs_save() there too.

Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.de>
---
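Note: the save/restore pairing used in this patch can be pictured with
a userspace model. This is a sketch under assumptions: model_task_flags
stands in for the task flags and MODEL_PF_NOFS for the NOFS bit, and
the model_* functions only imitate how memalloc_nofs_save() and
memalloc_nofs_restore() are understood to behave. It shows why the
value returned by the save call is passed back to restore: nested
scopes then unwind correctly.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PF_NOFS 0x1u	/* hypothetical stand-in for the NOFS task flag */

static unsigned int model_task_flags;	/* stand-in for the task's flags */

/* Set the NOFS bit and return its previous state */
unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Put the NOFS bit back to the state captured by model_nofs_save() */
void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/* True while allocations made by this "task" are inside a NOFS scope */
bool model_in_nofs_scope(void)
{
	return (model_task_flags & MODEL_PF_NOFS) != 0;
}
```

Restoring an inner scope leaves the outer scope active, because the
inner save recorded that the bit was already set.
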
 fs/lustre/lov/lov_object.c      | 13 +++++++------
 fs/lustre/obdecho/echo_client.c |  7 ++++++-
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c
index 7285276..0762cc5 100644
--- a/fs/lustre/lov/lov_object.c
+++ b/fs/lustre/lov/lov_object.c
@@ -38,6 +38,7 @@
 
 #define DEBUG_SUBSYSTEM S_LOV
 
+#include <linux/sched/mm.h>
 #include "lov_cl_internal.h"
 
 static inline struct lov_device *lov_object_dev(struct lov_object *obj)
@@ -207,6 +208,7 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 	struct cl_object_conf *subconf = &lti->lti_stripe_conf;
 	struct lu_fid *ofid = &lti->lti_fid;
 	struct cl_object *stripe;
+	unsigned int flags;
 	int result;
 	int psz, sz;
 	int i;
@@ -214,8 +216,10 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 	spin_lock_init(&r0->lo_sub_lock);
 	r0->lo_nr = lse->lsme_stripe_count;
 
-	r0->lo_sub = kcalloc(r0->lo_nr, sizeof(r0->lo_sub[0]),
-			     GFP_KERNEL);
+	flags = memalloc_nofs_save();
+	r0->lo_sub = kvmalloc_array(r0->lo_nr, sizeof(r0->lo_sub[0]),
+				    GFP_KERNEL | __GFP_ZERO);
+	memalloc_nofs_restore(flags);
 	if (!r0->lo_sub)
 		return -ENOMEM;
 
@@ -335,10 +339,7 @@ static void lov_fini_raid0(const struct lu_env *env,
 {
 	struct lov_layout_raid0 *r0 = &lle->lle_raid0;
 
-	if (r0->lo_sub) {
-		kvfree(r0->lo_sub);
-		r0->lo_sub = NULL;
-	}
+	kvfree(r0->lo_sub);
 }
 
 static int lov_print_raid0(const struct lu_env *env, void *cookie,
diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c
index a52e0362..3bee0c2 100644
--- a/fs/lustre/obdecho/echo_client.c
+++ b/fs/lustre/obdecho/echo_client.c
@@ -34,6 +34,7 @@
 #define DEBUG_SUBSYSTEM S_ECHO
 
 #include <linux/highmem.h>
+#include <linux/sched/mm.h>
 #include <obd.h>
 #include <obd_support.h>
 #include <obd_class.h>
@@ -1369,6 +1370,7 @@ static int echo_client_prep_commit(const struct lu_env *env,
 	struct niobuf_remote rnb;
 	u64 off;
 	u64 npages, tot_pages;
+	unsigned int flags;
 	int i, ret = 0, brw_flags = 0;
 
 	if (count <= 0 || (count & (~PAGE_MASK)) != 0)
@@ -1377,8 +1379,11 @@ static int echo_client_prep_commit(const struct lu_env *env,
 	npages = batch >> PAGE_SHIFT;
 	tot_pages = count >> PAGE_SHIFT;
 
+	flags = memalloc_nofs_save();
 	lnb = kvmalloc_array(npages, sizeof(*lnb),
-			     GFP_NOFS | __GFP_ZERO);
+			     GFP_KERNEL | __GFP_ZERO);
+	memalloc_nofs_restore(flags);
+
 	if (!lnb) {
 		ret = -ENOMEM;
 		goto out;
-- 
1.8.3.1


* [lustre-devel] [PATCH 07/28] lnet: o2iblnd: Don't retry indefinitely
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (5 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 06/28] lustre: use memalloc_nofs_save() for GFP_NOFS kvmalloc allocations James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 08/28] lustre: llite: rmdir releases inode on client James Simmons
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Amir Shehata <ashehata@whamcloud.com>

If a peer is down, don't retry indefinitely. Use the retry_count
parameter to restrict the number of retries, after which the
connection fails and the error is propagated up.

This prevents long timeouts when mounting a file system with
nodes which might have their NIDs configured in the FS, but the
nodes have been taken offline.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13972
Lustre-commit: 7c8ad11ef08f0f ("LU-13972 o2iblnd: Don't retry indefinitely")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39981
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
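Note: the bounded-retry behaviour described in the commit message above
can be modelled in userspace. This is a sketch, not the o2iblnd code:
model_peer and the model_* functions are hypothetical, and only the
counter logic visible in the diff below is mirrored (check the limit
first, count a "no listener" rejection, reset on a successful
connection).

```c
#include <assert.h>
#include <stdbool.h>

struct model_peer {
	unsigned int retries;	/* models peer_ni->ibp_retries */
};

/*
 * Handle one "no listener" rejection.  Returns true if another
 * reconnect attempt should be made, false once the limit is exceeded.
 */
bool model_reject_no_listener(struct model_peer *peer,
			      unsigned int retry_count)
{
	if (peer->retries > retry_count)
		return false;	/* retry count exceeded: give up */

	peer->retries++;
	return true;
}

/* A successful connection resets the counter */
void model_connected(struct model_peer *peer)
{
	peer->retries = 0;
}

/* Count how many rejections are tolerated before giving up */
unsigned int model_attempts_until_failure(unsigned int retry_count)
{
	struct model_peer peer = { 0 };
	unsigned int attempts = 0;

	while (model_reject_no_listener(&peer, retry_count))
		attempts++;
	return attempts;
}
```

Because the limit is tested with a strict greater-than before the
counter is incremented, this model tolerates retry_count + 1 rejections
before it stops.
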
 net/lnet/klnds/o2iblnd/o2iblnd.h           | 2 ++
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c        | 9 +++++++++
 net/lnet/klnds/o2iblnd/o2iblnd_modparams.c | 2 +-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 9a2fb42..9b2f043 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -626,6 +626,8 @@ struct kib_peer_ni {
 	unsigned char		ibp_races;
 	/* # consecutive reconnection attempts to this peer_ni */
 	unsigned int		ibp_reconnected;
+	/* number of total active retries */
+	unsigned int		ibp_retries;
 	/* errno on closing this peer_ni */
 	int			ibp_error;
 	/* max map_on_demand */
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index b642162..9de733d 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -2181,6 +2181,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	/* connection established */
 	write_lock_irqsave(&kiblnd_data.kib_global_lock, flags);
 
+	/* reset retry count */
+	peer_ni->ibp_retries = 0;
+
 	conn->ibc_last_send = ktime_get();
 	kiblnd_set_conn_state(conn, IBLND_CONN_ESTABLISHED);
 	kiblnd_peer_alive(peer_ni);
@@ -2631,6 +2634,11 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		goto out;
 	}
 
+	if (peer_ni->ibp_retries > *kiblnd_tunables.kib_retry_count) {
+		reason = "retry count exceeded due to no listener";
+		goto out;
+	}
+
 	switch (why) {
 	default:
 		reason = "Unknown";
@@ -2688,6 +2696,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		break;
 
 	case IBLND_REJECT_INVALID_SRV_ID:
+		peer_ni->ibp_retries++;
 		reason = "invalid service id";
 		break;
 	}
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/net/lnet/klnds/o2iblnd/o2iblnd_modparams.c
index 73ad22d..029c9fb 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_modparams.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_modparams.c
@@ -93,7 +93,7 @@
 
 static int retry_count = 5;
 module_param(retry_count, int, 0644);
-MODULE_PARM_DESC(retry_count, "Retransmissions when no ACK received");
+MODULE_PARM_DESC(retry_count, "Number of times to retry connection operations");
 
 static int rnr_retry_count = 6;
 module_param(rnr_retry_count, int, 0644);
-- 
1.8.3.1


* [lustre-devel] [PATCH 08/28] lustre: llite: rmdir releases inode on client
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (6 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 07/28] lnet: o2iblnd: Don't retry indefinitely James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 09/28] lustre: gss: update sequence in case of target disconnect James Simmons
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Lai Siyao <lai.siyao@whamcloud.com>

As with file unlink, rmdir should release the inode on the client. To
achieve this, ll_rmdir() updates the inode's i_nlink after rmdir, so
that the last iput() releases the inode.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13983
Lustre-commit: 4a4794364eb05f ("LU-13983 llite: rmdir releases inode on client")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40011
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/namei.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index f9c10d0..da6b729 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -1659,12 +1659,23 @@ static int ll_rmdir(struct inode *dir, struct dentry *dchild)
 	rc = md_unlink(ll_i2sbi(dir)->ll_md_exp, op_data, &request);
 	ll_finish_md_op_data(op_data);
 	if (rc == 0) {
+		struct mdt_body *body;
+
 		ll_update_times(request, dir);
 		ll_stats_ops_tally(ll_i2sbi(dir), LPROC_LL_RMDIR,
 				   ktime_us_delta(ktime_get(), kstart));
+		/*
+		 * The server puts attributes in on the last unlink, use them
+		 * to update the link count so the inode can be freed
+		 * immediately.
+		 */
+		body = req_capsule_server_get(&request->rq_pill, &RMF_MDT_BODY);
+		if (body->mbo_valid & OBD_MD_FLNLINK)
+			set_nlink(dchild->d_inode, body->mbo_nlink);
 	}
 
 	ptlrpc_req_finished(request);
+
 	return rc;
 }
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 09/28] lustre: gss: update sequence in case of target disconnect
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (7 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 08/28] lustre: llite: rmdir releases inode on client James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 10/28] lustre: lov: doesn't check lov_refcount James Simmons
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Sebastien Buisson <sbuisson@ddn.com>

Client to OST connections can go idle, leading to target disconnect.
In this event, maintaining the correct sequence number ensures that GSS
does not erroneously consider requests as replays.
The sequence is normally updated on export destroy, but this can occur
too late, i.e. after a new target connect request has been processed.
So explicitly update the sec context at disconnect time.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13498
Lustre-commit: 1275857c178fdf ("LU-13498 gss: update sequence in case of target disconnect")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/40122
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/sec.c      | 4 ++--
 fs/lustre/ptlrpc/sec_null.c | 8 --------
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/lustre/ptlrpc/sec.c b/fs/lustre/ptlrpc/sec.c
index ca8a646..44c15e6 100644
--- a/fs/lustre/ptlrpc/sec.c
+++ b/fs/lustre/ptlrpc/sec.c
@@ -626,8 +626,8 @@ int sptlrpc_req_refresh_ctx(struct ptlrpc_request *req, long timeout)
 		return 0;
 
 	if (unlikely(test_bit(PTLRPC_CTX_NEW_BIT, &ctx->cc_flags))) {
-		LASSERT(ctx->cc_ops->refresh);
-		ctx->cc_ops->refresh(ctx);
+		if (ctx->cc_ops->refresh)
+			ctx->cc_ops->refresh(ctx);
 	}
 	LASSERT(test_bit(PTLRPC_CTX_NEW_BIT, &ctx->cc_flags) == 0);
 
diff --git a/fs/lustre/ptlrpc/sec_null.c b/fs/lustre/ptlrpc/sec_null.c
index 14058bf..97c4e19 100644
--- a/fs/lustre/ptlrpc/sec_null.c
+++ b/fs/lustre/ptlrpc/sec_null.c
@@ -66,13 +66,6 @@ enum lustre_sec_part null_decode_sec_part(struct lustre_msg *msg)
 	return (msg->lm_secflvr >> 24) & 0xFF;
 }
 
-static int null_ctx_refresh(struct ptlrpc_cli_ctx *ctx)
-{
-	/* should never reach here */
-	LBUG();
-	return 0;
-}
-
 static
 int null_ctx_sign(struct ptlrpc_cli_ctx *ctx, struct ptlrpc_request *req)
 {
@@ -374,7 +367,6 @@ int null_authorize(struct ptlrpc_request *req)
 }
 
 static struct ptlrpc_ctx_ops null_ctx_ops = {
-	.refresh		= null_ctx_refresh,
 	.sign			= null_ctx_sign,
 	.verify			= null_ctx_verify,
 };
-- 
1.8.3.1

* [lustre-devel] [PATCH 10/28] lustre: lov: doesn't check lov_refcount
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (8 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 09/28] lustre: gss: update sequence in case of target disconnect James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 11/28] lustre: ptlrpc: remove unused code at pinger James Simmons
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Hongchao Zhang <hongchao@whamcloud.com>

In lov_cleanup, the check of each OSC is protected by
lov_tgt_getrefs, which increments "lov_refcount", so
"lov_refcount" shouldn't be checked inside that section
because it is always larger than 0 there.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13719
Lustre-commit: 6ae92a7f1bd94c ("LU-13719 lov: doesn't check lov_refcount")
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39702
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_obd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c
index d88d325..c8654bd 100644
--- a/fs/lustre/lov/lov_obd.c
+++ b/fs/lustre/lov/lov_obd.c
@@ -820,8 +820,7 @@ static int lov_cleanup(struct obd_device *obd)
 				continue;
 
 			/* Inactive targets may never have connected */
-			if (lov->lov_tgts[i]->ltd_active ||
-			    atomic_read(&lov->lov_refcount))
+			if (lov->lov_tgts[i]->ltd_active)
 			    /* We should never get here - these
 			     * should have been removed in the
 			     * disconnect.
-- 
1.8.3.1

* [lustre-devel] [PATCH 11/28] lustre: ptlrpc: remove unused code at pinger
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (9 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 10/28] lustre: lov: doesn't check lov_refcount James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 12/28] lustre: mdc: remote object support getattr from cache James Simmons
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Alexander Boyko <alexander.boyko@hpe.com>

The timeout_list was previously used for grant shrinking,
but right now is dead code.

HPE-bug-id: LUS-8520
Fixes: abc88e83673c ("lustre: osc: depart grant shrinking from pinger")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14031
Lustre-commit: f022663059414 ("LU-14031 ptlrpc: remove unused code at pinger")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/40243
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_net.h |   5 --
 fs/lustre/ptlrpc/pinger.c      | 137 -----------------------------------------
 2 files changed, 142 deletions(-)

diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index 1e7fe03..61be05c 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -2369,11 +2369,6 @@ enum timeout_event {
 typedef int (*timeout_cb_t)(struct timeout_item *, void *);
 int ptlrpc_pinger_add_import(struct obd_import *imp);
 int ptlrpc_pinger_del_import(struct obd_import *imp);
-int ptlrpc_add_timeout_client(time64_t time, enum timeout_event event,
-			      timeout_cb_t cb, void *data,
-			      struct list_head *obd_list);
-int ptlrpc_del_timeout_client(struct list_head *obd_list,
-			      enum timeout_event event);
 struct ptlrpc_request *ptlrpc_prep_ping(struct obd_import *imp);
 int ptlrpc_obd_ping(struct obd_device *obd);
 void ptlrpc_pinger_ir_up(void);
diff --git a/fs/lustre/ptlrpc/pinger.c b/fs/lustre/ptlrpc/pinger.c
index 9f57c61..e23ba3c 100644
--- a/fs/lustre/ptlrpc/pinger.c
+++ b/fs/lustre/ptlrpc/pinger.c
@@ -51,7 +51,6 @@
 
 struct mutex pinger_mutex;
 static LIST_HEAD(pinger_imports);
-static LIST_HEAD(timeout_list);
 
 struct ptlrpc_request *
 ptlrpc_prep_ping(struct obd_import *imp)
@@ -162,20 +161,8 @@ static inline time64_t ptlrpc_next_reconnect(struct obd_import *imp)
 
 static time64_t pinger_check_timeout(time64_t time)
 {
-	struct timeout_item *item;
 	time64_t timeout = PING_INTERVAL;
 
-	/* This list is sorted in increasing timeout order */
-	mutex_lock(&pinger_mutex);
-	list_for_each_entry(item, &timeout_list, ti_chain) {
-		time64_t ti_timeout = item->ti_timeout;
-
-		if (timeout > ti_timeout)
-			timeout = ti_timeout;
-		break;
-	}
-	mutex_unlock(&pinger_mutex);
-
 	return time + timeout - ktime_get_seconds();
 }
 
@@ -259,16 +246,12 @@ static void ptlrpc_pinger_process_import(struct obd_import *imp,
 static void ptlrpc_pinger_main(struct work_struct *ws)
 {
 	time64_t this_ping, time_after_ping, time_to_next_wake;
-	struct timeout_item *item;
 	struct obd_import *imp;
 
 	do {
 		this_ping = ktime_get_seconds();
 
 		mutex_lock(&pinger_mutex);
-		list_for_each_entry(item, &timeout_list, ti_chain) {
-			item->ti_cb(item, item->ti_cb_data);
-		}
 		list_for_each_entry(imp, &pinger_imports, imp_pinger_chain) {
 			ptlrpc_pinger_process_import(imp, this_ping);
 			/* obd_timeout might have changed */
@@ -323,15 +306,12 @@ int ptlrpc_start_pinger(void)
 	return 0;
 }
 
-static int ptlrpc_pinger_remove_timeouts(void);
-
 int ptlrpc_stop_pinger(void)
 {
 #ifdef CONFIG_LUSTRE_FS_PINGER
 	if (!pinger_wq)
 		return -EALREADY;
 
-	ptlrpc_pinger_remove_timeouts();
 	cancel_delayed_work_sync(&ping_work);
 	destroy_workqueue(pinger_wq);
 	pinger_wq = NULL;
@@ -398,123 +378,6 @@ int ptlrpc_pinger_del_import(struct obd_import *imp)
 }
 EXPORT_SYMBOL(ptlrpc_pinger_del_import);
 
-/**
- * Register a timeout callback to the pinger list, and the callback will
- * be called when timeout happens.
- */
-static struct timeout_item *ptlrpc_new_timeout(time64_t time,
-					       enum timeout_event event,
-					       timeout_cb_t cb, void *data)
-{
-	struct timeout_item *ti;
-
-	ti = kzalloc(sizeof(*ti), GFP_NOFS);
-	if (!ti)
-		return NULL;
-
-	INIT_LIST_HEAD(&ti->ti_obd_list);
-	INIT_LIST_HEAD(&ti->ti_chain);
-	ti->ti_timeout = time;
-	ti->ti_event = event;
-	ti->ti_cb = cb;
-	ti->ti_cb_data = data;
-
-	return ti;
-}
-
-/**
- * Register timeout event on the pinger thread.
- * Note: the timeout list is an sorted list with increased timeout value.
- */
-static struct timeout_item*
-ptlrpc_pinger_register_timeout(time64_t time, enum timeout_event event,
-			       timeout_cb_t cb, void *data)
-{
-	struct timeout_item *item, *tmp;
-
-	LASSERT(mutex_is_locked(&pinger_mutex));
-
-	list_for_each_entry(item, &timeout_list, ti_chain)
-		if (item->ti_event == event)
-			goto out;
-
-	item = ptlrpc_new_timeout(time, event, cb, data);
-	if (item) {
-		list_for_each_entry_reverse(tmp, &timeout_list, ti_chain) {
-			if (tmp->ti_timeout < time) {
-				list_add(&item->ti_chain, &tmp->ti_chain);
-				goto out;
-			}
-		}
-		list_add(&item->ti_chain, &timeout_list);
-	}
-out:
-	return item;
-}
-
-/* Add a client_obd to the timeout event list, when timeout(@time)
- * happens, the callback(@cb) will be called.
- */
-int ptlrpc_add_timeout_client(time64_t time, enum timeout_event event,
-			      timeout_cb_t cb, void *data,
-			      struct list_head *obd_list)
-{
-	struct timeout_item *ti;
-
-	mutex_lock(&pinger_mutex);
-	ti = ptlrpc_pinger_register_timeout(time, event, cb, data);
-	if (!ti) {
-		mutex_unlock(&pinger_mutex);
-		return -EINVAL;
-	}
-	list_add(obd_list, &ti->ti_obd_list);
-	mutex_unlock(&pinger_mutex);
-	return 0;
-}
-EXPORT_SYMBOL(ptlrpc_add_timeout_client);
-
-int ptlrpc_del_timeout_client(struct list_head *obd_list,
-			      enum timeout_event event)
-{
-	struct timeout_item *ti = NULL, *item;
-
-	if (list_empty(obd_list))
-		return 0;
-	mutex_lock(&pinger_mutex);
-	list_del_init(obd_list);
-	/**
-	 * If there are no obd attached to the timeout event
-	 * list, remove this timeout event from the pinger
-	 */
-	list_for_each_entry(item, &timeout_list, ti_chain) {
-		if (item->ti_event == event) {
-			ti = item;
-			break;
-		}
-	}
-	if (list_empty(&ti->ti_obd_list)) {
-		list_del(&ti->ti_chain);
-		kfree(ti);
-	}
-	mutex_unlock(&pinger_mutex);
-	return 0;
-}
-EXPORT_SYMBOL(ptlrpc_del_timeout_client);
-
-static int ptlrpc_pinger_remove_timeouts(void)
-{
-	struct timeout_item *item, *tmp;
-
-	mutex_lock(&pinger_mutex);
-	list_for_each_entry_safe(item, tmp, &timeout_list, ti_chain) {
-		LASSERT(list_empty(&item->ti_obd_list));
-		list_del(&item->ti_chain);
-		kfree(item);
-	}
-	mutex_unlock(&pinger_mutex);
-	return 0;
-}
-
 void ptlrpc_pinger_wake_up(void)
 {
 #ifdef CONFIG_LUSTRE_FS_PINGER
-- 
1.8.3.1

* [lustre-devel] [PATCH 12/28] lustre: mdc: remote object support getattr from cache
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (10 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 11/28] lustre: ptlrpc: remove unused code at pinger James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 13/28] lustre: llite: pass name in getattr by FID James Simmons
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Lai Siyao <lai.siyao@whamcloud.com>

For historical reasons, IT_GETATTR lock revalidation matches the
LOOKUP|UPDATE|PERM lock bits, because for MDS < 2.4 permission is
protected by the LOOKUP lock. This prevents a remote object from
matching the cached lock, because its LOOKUP and UPDATE locks are
fetched separately. Stop matching the LOOKUP bit for IT_GETATTR.
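
As a rough illustration of why the combined match fails, here is a
minimal userspace sketch of the "all requested bits must be held" test.
The IBIT_* values and ibits_cover() are hypothetical names for this
example only, and it assumes the cached lock on the remote object
carries UPDATE|PERM while LOOKUP is held by a different lock:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only (assumption), not taken from lustre_idl.h */
#define IBIT_LOOKUP	0x01ULL
#define IBIT_UPDATE	0x02ULL
#define IBIT_PERM	0x10ULL

/* A cached lock matches only if it holds every requested bit. */
static bool ibits_cover(uint64_t held, uint64_t wanted)
{
	return (held & wanted) == wanted;
}
```

With these assumptions, ibits_cover(IBIT_UPDATE | IBIT_PERM,
IBIT_LOOKUP | IBIT_UPDATE | IBIT_PERM) is false, while requesting only
IBIT_UPDATE | IBIT_PERM matches the same cached lock.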

Add sanity 803b, and rename 803 to 803a.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13437
Lustre-commit: 72a1ca996e3a35 ("LU-13437 mdc: remote object support getattr from cache")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40218
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/mdc_locks.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c
index 72ee070..a82e8ca 100644
--- a/fs/lustre/mdc/mdc_locks.c
+++ b/fs/lustre/mdc/mdc_locks.c
@@ -1235,7 +1235,6 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
 			 * by LOOKUP lock, so it needs to match all bits here.
 			 */
 			policy.l_inodebits.bits = MDS_INODELOCK_UPDATE |
-						  MDS_INODELOCK_LOOKUP |
 						  MDS_INODELOCK_PERM;
 			break;
 		case IT_READDIR:
-- 
1.8.3.1

* [lustre-devel] [PATCH 13/28] lustre: llite: pass name in getattr by FID
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (11 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 12/28] lustre: mdc: remote object support getattr from cache James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 14/28] lnet: o2iblnd: 'Timed out tx' error message James Simmons
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Lai Siyao <lai.siyao@whamcloud.com>

Now that the parent FID is packed in the getattr_by_FID request
(see https://review.whamcloud.com/39290), llite should also pass in
the name, so that lmv can replace fid1 with the stripe FID. Otherwise
the MDS may treat files under a striped directory as remote objects.

Note that the name is not packed in the request: if it were, the MDS
would getattr by name instead of by FID.

Fixes: 3a3a9eeaa ("lustre: llite: pack parent FID in getattr")
WC-bug-id: https://jira.whamcloud.com/browse/LU-13437
Lustre-commit: 90ebab5833007d ("LU-13437 llite: pass name in getattr by FID")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40219
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd.h     |  1 +
 fs/lustre/llite/file.c      | 15 +++++++++++----
 fs/lustre/llite/llite_lib.c |  4 +++-
 fs/lustre/lmv/lmv_intent.c  | 22 +++++++++++++++++++---
 fs/lustre/lmv/lmv_obd.c     |  8 ++++++--
 5 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index 39e3d51..a017997 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -704,6 +704,7 @@ enum md_op_flags {
 	MF_MDC_CANCEL_FID3	= BIT(2),
 	MF_MDC_CANCEL_FID4	= BIT(3),
 	MF_GET_MDT_IDX		= BIT(4),
+	MF_GETATTR_BY_FID	= BIT(5),
 };
 
 enum md_cli_flags {
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 0a9a689..3ba9152 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -4587,22 +4587,29 @@ static int ll_inode_revalidate(struct dentry *dentry, enum ldlm_intent_flags op)
 	};
 	struct ptlrpc_request *req = NULL;
 	struct md_op_data *op_data;
+	const char *name = NULL;
+	size_t namelen = 0;
 	int rc = 0;
 
 	CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p),name=%pd\n",
 	       PFID(ll_inode2fid(inode)), inode, dentry);
 
-	if (exp_connect_flags2(exp) & OBD_CONNECT2_GETATTR_PFID)
+	if (exp_connect_flags2(exp) & OBD_CONNECT2_GETATTR_PFID) {
 		parent = dentry->d_parent->d_inode;
-	else
+		name = dentry->d_name.name;
+		namelen = dentry->d_name.len;
+	} else {
 		parent = inode;
+	}
 
-	/* Call getattr by fid, so do not provide name at all. */
-	op_data = ll_prep_md_op_data(NULL, parent, inode, NULL, 0, 0,
+	op_data = ll_prep_md_op_data(NULL, parent, inode, name, namelen, 0,
 				     LUSTRE_OPC_ANY, NULL);
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
+	/* Call getattr by fid */
+	if (exp_connect_flags2(exp) & OBD_CONNECT2_GETATTR_PFID)
+		op_data->op_flags = MF_GETATTR_BY_FID;
 	rc = md_intent_lock(exp, op_data, &oit, &req, &ll_md_blocking_ast, 0);
 	ll_finish_md_op_data(op_data);
 	if (rc < 0) {
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 8ef2437..d94c6ca 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -2811,7 +2811,9 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 		if (namelen > ll_i2sbi(i1)->ll_namelen)
 			return ERR_PTR(-ENAMETOOLONG);
 
-		if (!lu_name_is_valid_2(name, namelen))
+		/* "/" is not valid name, but it's allowed */
+		if (!lu_name_is_valid_2(name, namelen) &&
+		    strncmp("/", name, namelen) != 0)
 			return ERR_PTR(-EINVAL);
 	}
 
diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c
index 3960b93..ad59b64 100644
--- a/fs/lustre/lmv/lmv_intent.c
+++ b/fs/lustre/lmv/lmv_intent.c
@@ -448,13 +448,29 @@ static int lmv_intent_lookup(struct obd_export *exp,
 	}
 
 retry:
-	if (op_data->op_name) {
+	if (op_data->op_flags & MF_GETATTR_BY_FID) {
+		/* getattr by FID, replace fid1 with stripe FID */
+		LASSERT(op_data->op_name);
+		tgt = lmv_locate_tgt(lmv, op_data);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+
+		/* name is used to locate stripe target, clear it here
+		 * to avoid packing name in request, so that MDS knows
+		 * it's getattr by FID.
+		 */
+		op_data->op_name = NULL;
+		op_data->op_namelen = 0;
+
+		/* getattr request is sent to MDT where fid2 inode is */
+		tgt = lmv_fid2tgt(lmv, &op_data->op_fid2);
+	} else if (op_data->op_name) {
+		/* getattr by name */
 		tgt = lmv_locate_tgt(lmv, op_data);
 		if (!fid_is_sane(&op_data->op_fid2))
 			fid_zero(&op_data->op_fid2);
-	} else if (fid_is_sane(&op_data->op_fid2)) {
-		tgt = lmv_fid2tgt(lmv, &op_data->op_fid2);
 	} else {
+		/* old way to getattr by FID, parent FID not packed */
 		tgt = lmv_fid2tgt(lmv, &op_data->op_fid1);
 	}
 	if (IS_ERR(tgt))
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 5a75c69..fa1dae5 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1513,6 +1513,9 @@ static struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt)
 	return ERR_PTR(-ENODEV);
 }
 
+/* locate MDT by file name, for striped directory, the file name hash decides
+ * which stripe its dirent is stored.
+ */
 static struct lmv_tgt_desc *
 lmv_locate_tgt_by_name(struct lmv_obd *lmv, struct lmv_stripe_md *lsm,
 		       const char *name, int namelen, struct lu_fid *fid,
@@ -1564,8 +1567,9 @@ static struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt)
  * For plain direcotry, it just locate the MDT of op_data->op_fid1.
  *
  * @lmv:	LMV device
- * @op_data:	client MD stack parameters, name, namelen
- *		mds_num etc.
+ * @op_data:	client MD stack parameters, name, namelen etc,
+ *		op_mds and op_fid1 will be updated if op_mea1
+ *		indicates fid1 represents a striped directory.
  *
  * Returns:	pointer to the lmv_tgt_desc if succeed.
  *		ERR_PTR(errno) if failed.
-- 
1.8.3.1

* [lustre-devel] [PATCH 14/28] lnet: o2iblnd: 'Timed out tx' error message
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (12 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 13/28] lustre: llite: pass name in getattr by FID James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 15/28] lustre: ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait James Simmons
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Brian Behlendorf <behlendorf1@llnl.gov>

Trivial fix to report the total RDMA time outstanding rather
than the number of seconds past the deadline.
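
A small sketch of the arithmetic, assuming the deadline was set to the
queue time plus the timeout. The function and parameter names are
hypothetical and only mirror the expression in the hunk below:

```c
#include <stdint.h>

#define MS_PER_SEC 1000

/* Total seconds a tx has been outstanding: the configured timeout plus
 * the whole seconds elapsed since the deadline passed.
 */
static int64_t tx_outstanding_sec(int64_t timeout_sec, int64_t now_ms,
				  int64_t deadline_ms)
{
	return timeout_sec + (now_ms - deadline_ms) / MS_PER_SEC;
}
```

For example, with a 50 second timeout and a check made 3500 ms after
the deadline, the old message would report 3 seconds and the new one 53.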

WC-bug-id: https://jira.whamcloud.com/browse/LU-1742
Lustre-commit: 2be289a2b1f12b ("LU-1742 o2iblnd: 'Timed out tx' error message")
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/3622
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 9de733d..3d7026b 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -3219,6 +3219,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		if (ktime_compare(ktime_get(), tx->tx_deadline) >= 0) {
 			CERROR("Timed out tx: %s, %lld seconds\n",
 			       kiblnd_queue2str(conn, txs),
+			       kiblnd_timeout() +
 			       ktime_ms_delta(ktime_get(),
 					      tx->tx_deadline) / MSEC_PER_SEC);
 			return 1;
-- 
1.8.3.1

* [lustre-devel] [PATCH 15/28] lustre: ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (13 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 14/28] lnet: o2iblnd: 'Timed out tx' error message James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 16/28] lustre: ldlm: group locks for DOM IBIT lock James Simmons
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Oleg Drokin <green@whamcloud.com>

In ldlm_handle_cp_callback() the while loop is clearly supposed
to be limited by the "to" value of 1 second, but is not.
This seems to have been broken by the Solaris porting in HEAD
all the way back in 2008.
Restore the assignment to "to" so the loop cannot hang indefinitely.
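
The effect of the missing assignment can be sketched in userspace.
fake_schedule_timeout() is a hypothetical stand-in that, like
schedule_timeout_interruptible(), returns the time remaining; the loop
only terminates on its own if that return value is stored back:

```c
/* Hypothetical stand-in: "sleeps" for at most slice ticks out of
 * timeout and returns the ticks still remaining (0 once expired).
 */
static long fake_schedule_timeout(long timeout, long slice)
{
	return timeout > slice ? timeout - slice : 0;
}

/* Wait until the condition holds or the budget "to" is used up.
 * granted_after is the iteration at which the condition becomes true;
 * a negative value means it never does. Returns the iteration count.
 */
static int wait_bounded(long to, long slice, int granted_after)
{
	int iterations = 0;

	while (to > 0) {
		/* Without this assignment "to" never shrinks. */
		to = fake_schedule_timeout(to, slice);
		iterations++;
		if (granted_after >= 0 && iterations >= granted_after)
			break;
	}
	return iterations;
}
```

With a budget of 100 ticks consumed 30 at a time and a condition that
never becomes true, the loop exits after four iterations instead of
spinning forever.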

Fixes: adde80ff ("Land b_head_libcfs onto OpenSFS tree HEAD")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14069
Lustre-commit: 5da99051e58b9e ("LU-14069 ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40375
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_lockd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c
index 4a91a7f..1ae65b829 100644
--- a/fs/lustre/ldlm/ldlm_lockd.c
+++ b/fs/lustre/ldlm/ldlm_lockd.c
@@ -223,7 +223,7 @@ static int ldlm_handle_cp_callback(struct ptlrpc_request *req,
 		ldlm_callback_reply(req, 0);
 
 		while (to > 0) {
-			schedule_timeout_interruptible(to);
+			to = schedule_timeout_interruptible(to);
 			if (ldlm_is_granted(lock) ||
 			    ldlm_is_destroyed(lock))
 				break;
-- 
1.8.3.1

* [lustre-devel] [PATCH 16/28] lustre: ldlm: group locks for DOM IBIT lock
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (14 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 15/28] lustre: ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 17/28] lustre: ptlrpc: decrease time between reconnection James Simmons
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Vitaly Fertman <c17818@cray.com>

A group lock is supposed to be taken for operations such as layout
swap (used by e.g. HSM), and must be taken for DOM locks as well.
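
The group ID comparison added to lock_matches() below can be
paraphrased as the following sketch. GID_ANY and gid_matches() are
hypothetical names, and the wildcard value is an assumption made for
this example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Wildcard group ID; the all-ones value is an assumption here. */
#define GID_ANY ((uint64_t)-1)

/* For group-mode matches, a specific requested gid must equal the
 * lock's gid; the wildcard matches any group. Other modes ignore gid.
 */
static bool gid_matches(bool group_mode, uint64_t lock_gid,
			uint64_t wanted_gid)
{
	if (!group_mode || wanted_gid == GID_ANY)
		return true;
	return lock_gid == wanted_gid;
}
```

This is why mdc_dlmlock_at_pgoff() in the patch sets li_gid to
LDLM_GID_ANY: a page-offset lookup should find a covering lock
regardless of which group took it.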

HPE-bug-id: LUS-8987
WC-bug-id: https://jira.whamcloud.com/browse/LU-13645
Lustre-commit: 06740440363424 ("LU-13645 ldlm: group locks for DOM IBIT lock")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://review.whamcloud.com/39406
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_inodebits.c        |  2 ++
 fs/lustre/ldlm/ldlm_lock.c             | 12 ++++++++++--
 fs/lustre/mdc/mdc_dev.c                | 10 ++++++++--
 fs/lustre/ptlrpc/wiretest.c            |  6 +++++-
 include/uapi/linux/lustre/lustre_idl.h |  1 +
 5 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_inodebits.c b/fs/lustre/ldlm/ldlm_inodebits.c
index 2288eb5..a367ff1 100644
--- a/fs/lustre/ldlm/ldlm_inodebits.c
+++ b/fs/lustre/ldlm/ldlm_inodebits.c
@@ -59,6 +59,7 @@ void ldlm_ibits_policy_wire_to_local(const union ldlm_wire_policy_data *wpolicy,
 				     union ldlm_policy_data *lpolicy)
 {
 	lpolicy->l_inodebits.bits = wpolicy->l_inodebits.bits;
+	lpolicy->l_inodebits.li_gid = wpolicy->l_inodebits.li_gid;
 }
 
 void ldlm_ibits_policy_local_to_wire(const union ldlm_policy_data *lpolicy,
@@ -66,6 +67,7 @@ void ldlm_ibits_policy_local_to_wire(const union ldlm_policy_data *lpolicy,
 {
 	memset(wpolicy, 0, sizeof(*wpolicy));
 	wpolicy->l_inodebits.bits = lpolicy->l_inodebits.bits;
+	wpolicy->l_inodebits.li_gid = lpolicy->l_inodebits.li_gid;
 }
 
 /**
diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index 0dbd4f3..56f1550 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -1111,6 +1111,12 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata)
 		     data->lmd_policy->l_inodebits.bits) !=
 		    data->lmd_policy->l_inodebits.bits)
 			return false;
+
+		if (unlikely(match == LCK_GROUP) &&
+		    data->lmd_policy->l_inodebits.li_gid != LDLM_GID_ANY &&
+		    lpol->l_inodebits.li_gid !=
+		    data->lmd_policy->l_inodebits.li_gid)
+			return false;
 		break;
 	default:
 		break;
@@ -1903,7 +1909,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 	switch (resource->lr_type) {
 	case LDLM_EXTENT:
 		libcfs_debug_msg(msgdata,
-				 "%pV ns: %s lock: %p/%#llx lrc: %d/%d,%d mode: %s/%s res: " DLDLMRES " rrc: %d type: %s [%llu->%llu] (req %llu->%llu) flags: %#llx nid: %s remote: %#llx expref: %d pid: %u timeout: %lld lvb_type: %d\n",
+				 "%pV ns: %s lock: %p/%#llx lrc: %d/%d,%d mode: %s/%s res: " DLDLMRES " rrc: %d type: %s [%llu->%llu] (req %llu->%llu) gid %llu flags: %#llx nid: %s remote: %#llx expref: %d pid: %u timeout: %lld lvb_type: %d\n",
 				 &vaf,
 				 ldlm_lock_to_ns_name(lock), lock,
 				 lock->l_handle.h_cookie,
@@ -1918,6 +1924,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 				 lock->l_policy_data.l_extent.end,
 				 lock->l_req_extent.start,
 				 lock->l_req_extent.end,
+				 lock->l_req_extent.gid,
 				 lock->l_flags, nid,
 				 lock->l_remote_handle.cookie,
 				 exp ? refcount_read(&exp->exp_handle.h_ref) : -99,
@@ -1949,7 +1956,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 
 	case LDLM_IBITS:
 		libcfs_debug_msg(msgdata,
-				 "%pV ns: %s lock: %p/%#llx lrc: %d/%d,%d mode: %s/%s res: " DLDLMRES " bits %#llx rrc: %d type: %s flags: %#llx nid: %s remote: %#llx expref: %d pid: %u timeout: %lld lvb_type: %d\n",
+				 "%pV ns: %s lock: %p/%#llx lrc: %d/%d,%d mode: %s/%s res: " DLDLMRES " bits %#llx rrc: %d type: %s gid %llu flags: %#llx nid: %s remote: %#llx expref: %d pid: %u timeout: %lld lvb_type: %d\n",
 				 &vaf,
 				 ldlm_lock_to_ns_name(lock),
 				 lock, lock->l_handle.h_cookie,
@@ -1961,6 +1968,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 				 lock->l_policy_data.l_inodebits.bits,
 				 atomic_read(&resource->lr_refcount),
 				 ldlm_typename[resource->lr_type],
+				 lock->l_policy_data.l_inodebits.li_gid,
 				 lock->l_flags, nid,
 				 lock->l_remote_handle.cookie,
 				 exp ? refcount_read(&exp->exp_handle.h_ref) : -99,
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index 329371b..90b60f5 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -40,10 +40,13 @@
 #include "mdc_internal.h"
 
 static void mdc_lock_build_policy(const struct lu_env *env,
+				  const struct cl_lock *lock,
 				  union ldlm_policy_data *policy)
 {
 	memset(policy, 0, sizeof(*policy));
 	policy->l_inodebits.bits = MDS_INODELOCK_DOM;
+	if (lock)
+		policy->l_inodebits.li_gid = lock->cll_descr.cld_gid;
 }
 
 int mdc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data)
@@ -144,7 +147,8 @@ struct ldlm_lock *mdc_dlmlock_at_pgoff(const struct lu_env *env,
 	enum ldlm_match_flags match_flags = 0;
 
 	fid_build_reg_res_name(lu_object_fid(osc2lu(obj)), resname);
-	mdc_lock_build_policy(env, policy);
+	mdc_lock_build_policy(env, NULL, policy);
+	policy->l_inodebits.li_gid = LDLM_GID_ANY;
 
 	flags = LDLM_FL_BLOCK_GRANTED | LDLM_FL_CBPENDING;
 	if (dap_flags & OSC_DAP_FL_TEST_LOCK)
@@ -867,7 +871,7 @@ static int mdc_lock_enqueue(const struct lu_env *env,
 	 * osc_lock.
 	 */
 	fid_build_reg_res_name(lu_object_fid(osc2lu(osc)), resname);
-	mdc_lock_build_policy(env, policy);
+	mdc_lock_build_policy(env, lock, policy);
 	LASSERT(!oscl->ols_speculative);
 	result = mdc_enqueue_send(env, osc_export(osc), resname,
 				  &oscl->ols_flags, policy,
@@ -931,6 +935,8 @@ int mdc_lock_init(const struct lu_env *env, struct cl_object *obj,
 
 	ols->ols_flags = flags;
 	ols->ols_speculative = !!(enqflags & CEF_SPECULATIVE);
+	if (lock->cll_descr.cld_mode == CLM_GROUP)
+		ols->ols_flags |= LDLM_FL_ATOMIC_CB;
 
 	if (ols->ols_flags & LDLM_FL_HAS_INTENT) {
 		ols->ols_flags |= LDLM_FL_BLOCK_GRANTED;
diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index 556aaff..9b1caf4 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -3249,12 +3249,16 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)sizeof(((struct ldlm_extent *)0)->gid));
 
 	/* Checks for struct ldlm_inodebits */
-	LASSERTF((int)sizeof(struct ldlm_inodebits) == 16, "found %lld\n",
+	LASSERTF((int)sizeof(struct ldlm_inodebits) == 24, "found %lld\n",
 		 (long long)(int)sizeof(struct ldlm_inodebits));
 	LASSERTF((int)offsetof(struct ldlm_inodebits, bits) == 0, "found %lld\n",
 		 (long long)(int)offsetof(struct ldlm_inodebits, bits));
 	LASSERTF((int)sizeof(((struct ldlm_inodebits *)0)->bits) == 8, "found %lld\n",
 		 (long long)(int)sizeof(((struct ldlm_inodebits *)0)->bits));
+	LASSERTF((int)offsetof(struct ldlm_inodebits, li_gid) == 16, "found %lld\n",
+		 (long long)(int)offsetof(struct ldlm_inodebits, li_gid));
+	LASSERTF((int)sizeof(((struct ldlm_inodebits *)0)->li_gid) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct ldlm_inodebits *)0)->li_gid));
 
 	/* Checks for struct ldlm_flock_wire */
 	LASSERTF((int)sizeof(struct ldlm_flock_wire) == 32, "found %lld\n",
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index fda56d8..34b2367 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -2206,6 +2206,7 @@ static inline bool ldlm_extent_equal(const struct ldlm_extent *ex1,
 struct ldlm_inodebits {
 	__u64 bits;
 	__u64 cancel_bits; /* for lock convert */
+	__u64 li_gid;
 };
 
 struct ldlm_flock_wire {
-- 
1.8.3.1
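
The wiretest.c hunk above pins the new wire layout of struct
ldlm_inodebits (size 24, li_gid at offset 16). A minimal userspace
sketch of the same checks follows. The names ldlm_inodebits_sketch and
inodebits_layout_ok are hypothetical, and uint64_t stands in for __u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the wire struct; uint64_t stands in for __u64. */
struct ldlm_inodebits_sketch {
	uint64_t bits;
	uint64_t cancel_bits;	/* for lock convert */
	uint64_t li_gid;	/* field added by the patch */
};

/* Compile-time versions of the size and offset checks in wiretest.c. */
static_assert(sizeof(struct ldlm_inodebits_sketch) == 24, "size changed");
static_assert(offsetof(struct ldlm_inodebits_sketch, li_gid) == 16,
	      "li_gid moved");

/* Hypothetical helper: returns 1 when the layout matches the wire format. */
static int inodebits_layout_ok(void)
{
	return sizeof(struct ldlm_inodebits_sketch) == 24 &&
	       offsetof(struct ldlm_inodebits_sketch, bits) == 0 &&
	       offsetof(struct ldlm_inodebits_sketch, li_gid) == 16 &&
	       sizeof(((struct ldlm_inodebits_sketch *)0)->li_gid) == 8;
}
```

Appending the field after cancel_bits keeps the offsets of the existing
members unchanged, which is what the unchanged bits assertions rely on.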

* [lustre-devel] [PATCH 17/28] lustre: ptlrpc: decrease time between reconnection
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (15 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 16/28] lustre: ldlm: group locks for DOM IBIT lock James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 18/28] lustre: ptlrpc: throttle RPC resend if network error James Simmons
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Alexander Boyko <alexander.boyko@hpe.com>

When a connection times out or gets an error reply from a server,
the next attempt happens after PING_INTERVAL, which is equal to
obd_timeout/4. When the first reconnection fails, the second goes to
the failover pair, and the third goes back to the original server.
That leaves only 3 reconnections before the server evicts the client
based on the blocking AST timeout. Sometimes the first attempt fails
and the last one is a bit late, so the client is evicted. It is better
to retry with a timeout equal to the connection request deadline,
which would increase the number of attempts by a factor of 5 for a
large obd_timeout. For example,
    obd_timeout=200
     - [ 1597902357, CONNECTING ]
     - [ 1597902357, FULL ]
     - [ 1597902422, DISCONN ]
     - [ 1597902422, CONNECTING ]
     - [ 1597902433, DISCONN ]
     - [ 1597902473, CONNECTING ]
     - [ 1597902473, DISCONN ] <- ENODEV from a failover pair
     - [ 1597902523, CONNECTING ]
     - [ 1597902539, DISCONN ]

The patch adds logic to wake up the pinger for a connection request
that failed with ETIMEDOUT or ENODEV. It adds imp_next_ping processing
to the ptlrpc_pinger_main() time_to_next_wake calculation, and fixes
the setting of the imp_next_ping value.

HPE-bug-id: LUS-8520
WC-bug-id: https://jira.whamcloud.com/browse/LU-14031
Lustre-commit: de8ed5f19f0413 ("LU-14031 ptlrpc: decrease time between reconnection")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/40244
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/events.c |  5 ++++
 fs/lustre/ptlrpc/import.c | 36 ++++++++++++++++++++++-
 fs/lustre/ptlrpc/niobuf.c |  2 --
 fs/lustre/ptlrpc/pinger.c | 73 ++++++++++++++++++++++++++++++-----------------
 4 files changed, 87 insertions(+), 29 deletions(-)

diff --git a/fs/lustre/ptlrpc/events.c b/fs/lustre/ptlrpc/events.c
index 0943612..fe33600 100644
--- a/fs/lustre/ptlrpc/events.c
+++ b/fs/lustre/ptlrpc/events.c
@@ -59,6 +59,11 @@ void request_out_callback(struct lnet_event *ev)
 
 	DEBUG_REQ(D_NET, req, "type %d, status %d", ev->type, ev->status);
 
+	/* Do not update imp_next_ping for connection request */
+	if (lustre_msg_get_opc(req->rq_reqmsg) !=
+	    req->rq_import->imp_connect_op)
+		ptlrpc_pinger_sending_on_import(req->rq_import);
+
 	sptlrpc_request_out_callback(req);
 
 	spin_lock(&req->rq_lock);
diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c
index 4e573cd..21ce593 100644
--- a/fs/lustre/ptlrpc/import.c
+++ b/fs/lustre/ptlrpc/import.c
@@ -1037,7 +1037,6 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
 		 */
 		imp->imp_force_reconnect = ptlrpc_busy_reconnect(rc);
 		spin_unlock(&imp->imp_lock);
-		ptlrpc_maybe_ping_import_soon(imp);
 		goto out;
 	}
 
@@ -1303,6 +1302,8 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
 
 	if (rc) {
 		bool inact = false;
+		time64_t now = ktime_get_seconds();
+		time64_t next_connect;
 
 		import_set_state_nolock(imp, LUSTRE_IMP_DISCON);
 		if (rc == -EACCES) {
@@ -1344,7 +1345,28 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
 				import_set_state_nolock(imp, LUSTRE_IMP_CLOSED);
 				inact = true;
 			}
+		} else if (rc == -ENODEV || rc == -ETIMEDOUT) {
+			/* ENODEV means there is no service, force reconnection
+			 * to a pair if attempt happen ptlrpc_next_reconnect
+			 * before now. ETIMEDOUT could be set during network
+			 * error and do not guarantee request deadline happened.
+			 */
+			struct obd_import_conn *conn;
+			time64_t reconnect_time;
+
+			/* Same as ptlrpc_next_reconnect, but in past */
+			reconnect_time = now - INITIAL_CONNECT_TIMEOUT;
+			list_for_each_entry(conn, &imp->imp_conn_list,
+					    oic_item) {
+				if (conn->oic_last_attempt <= reconnect_time) {
+					imp->imp_force_verify = 1;
+					break;
+				}
+			}
 		}
+
+		next_connect = imp->imp_conn_current->oic_last_attempt +
+			       (request->rq_deadline - request->rq_sent);
 		spin_unlock(&imp->imp_lock);
 
 		if (inact)
@@ -1353,6 +1375,18 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
 		if (rc == -EPROTO)
 			return rc;
 
+		/* adjust imp_next_ping to request deadline + 1 and reschedule
+		 * a pinger if import lost processing during CONNECTING or far
+		 * away from request deadline. It could happen when connection
+		 * was initiated outside of pinger, like
+		 * ptlrpc_set_import_discon().
+		 */
+		if (!imp->imp_force_verify && (imp->imp_next_ping <= now ||
+		    imp->imp_next_ping > next_connect)) {
+			imp->imp_next_ping = max(now, next_connect) + 1;
+			ptlrpc_pinger_wake_up();
+		}
+
 		ptlrpc_maybe_ping_import_soon(imp);
 
 		CDEBUG(D_HA, "recovery of %s on %s failed (%d)\n",
diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c
index 924b9c4..a1e6581 100644
--- a/fs/lustre/ptlrpc/niobuf.c
+++ b/fs/lustre/ptlrpc/niobuf.c
@@ -701,8 +701,6 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 	request->rq_deadline = request->rq_sent + request->rq_timeout +
 			       ptlrpc_at_get_net_latency(request);
 
-	ptlrpc_pinger_sending_on_import(imp);
-
 	DEBUG_REQ(D_INFO, request, "send flags=%x",
 		  lustre_msg_get_flags(request->rq_reqmsg));
 	rc = ptl_send_buf(&request->rq_req_md_h,
diff --git a/fs/lustre/ptlrpc/pinger.c b/fs/lustre/ptlrpc/pinger.c
index e23ba3c..178153c 100644
--- a/fs/lustre/ptlrpc/pinger.c
+++ b/fs/lustre/ptlrpc/pinger.c
@@ -108,6 +108,21 @@ static bool ptlrpc_check_import_is_idle(struct obd_import *imp)
 	return true;
 }
 
+static void ptlrpc_update_next_ping(struct obd_import *imp, int soon)
+{
+#ifdef CONFIG_LUSTRE_FS_PINGER
+	time64_t time = soon ? PING_INTERVAL_SHORT : PING_INTERVAL;
+
+	if (imp->imp_state == LUSTRE_IMP_DISCON) {
+		time64_t dtime = max_t(time64_t, CONNECTION_SWITCH_MIN,
+				  AT_OFF ? 0 :
+				  at_get(&imp->imp_at.iat_net_latency));
+		time = min(time, dtime);
+	}
+	imp->imp_next_ping = ktime_get_seconds() + time;
+#endif
+}
+
 static int ptlrpc_ping(struct obd_import *imp)
 {
 	struct ptlrpc_request *req;
@@ -125,26 +140,17 @@ static int ptlrpc_ping(struct obd_import *imp)
 
 	DEBUG_REQ(D_INFO, req, "pinging %s->%s",
 		  imp->imp_obd->obd_uuid.uuid, obd2cli_tgt(imp->imp_obd));
+	/* Updating imp_next_ping early, it allows pinger_check_timeout to
+	 * see an actual time for next awake. request_out_callback update
+	 * happens at another thread, and ptlrpc_pinger_main may sleep
+	 * already.
+	 */
+	ptlrpc_update_next_ping(imp, 0);
 	ptlrpcd_add_req(req);
 
 	return 0;
 }
 
-static void ptlrpc_update_next_ping(struct obd_import *imp, int soon)
-{
-#ifdef CONFIG_LUSTRE_FS_PINGER
-	time64_t time = soon ? PING_INTERVAL_SHORT : PING_INTERVAL;
-
-	if (imp->imp_state == LUSTRE_IMP_DISCON) {
-		time64_t dtime = max_t(time64_t, CONNECTION_SWITCH_MIN,
-				  AT_OFF ? 0 :
-				  at_get(&imp->imp_at.iat_net_latency));
-		time = min(time, dtime);
-	}
-	imp->imp_next_ping = ktime_get_seconds() + time;
-#endif
-}
-
 static inline int imp_is_deactive(struct obd_import *imp)
 {
 	return (imp->imp_deactive ||
@@ -153,17 +159,32 @@ static inline int imp_is_deactive(struct obd_import *imp)
 
 static inline time64_t ptlrpc_next_reconnect(struct obd_import *imp)
 {
-	if (imp->imp_server_timeout)
-		return ktime_get_seconds() + (obd_timeout >> 1);
-	else
-		return ktime_get_seconds() + obd_timeout;
+	return ktime_get_seconds() + INITIAL_CONNECT_TIMEOUT;
 }
 
-static time64_t pinger_check_timeout(time64_t time)
+static timeout_t pinger_check_timeout(time64_t time)
 {
-	time64_t timeout = PING_INTERVAL;
+	timeout_t timeout = PING_INTERVAL;
+	timeout_t next_timeout;
+	time64_t now;
+	struct list_head *iter;
+	struct obd_import *imp;
+
+	mutex_lock(&pinger_mutex);
+	now = ktime_get_seconds();
+	/* Process imports to find a nearest next ping */
+	list_for_each(iter, &pinger_imports) {
+		imp = list_entry(iter, struct obd_import, imp_pinger_chain);
+		if (!imp->imp_pingable || imp->imp_next_ping < now)
+			continue;
+		next_timeout = imp->imp_next_ping - now;
+		/* make sure imp_next_ping in the future from time */
+		if (next_timeout > (now - time) && timeout > next_timeout)
+			timeout = next_timeout;
+	}
+	mutex_unlock(&pinger_mutex);
 
-	return time + timeout - ktime_get_seconds();
+	return timeout - (now - time);
 }
 
 static bool ir_up;
@@ -245,7 +266,8 @@ static void ptlrpc_pinger_process_import(struct obd_import *imp,
 
 static void ptlrpc_pinger_main(struct work_struct *ws)
 {
-	time64_t this_ping, time_after_ping, time_to_next_wake;
+	time64_t this_ping, time_after_ping;
+	timeout_t time_to_next_wake;
 	struct obd_import *imp;
 
 	do {
@@ -276,9 +298,8 @@ static void ptlrpc_pinger_main(struct work_struct *ws)
 		 * we will SKIP the next ping at next_ping, and the
 		 * ping will get sent 2 timeouts from now!  Beware.
 		 */
-		CDEBUG(D_INFO, "next wakeup in %lld (%lld)\n",
-		       time_to_next_wake,
-		       this_ping + PING_INTERVAL);
+		CDEBUG(D_INFO, "next wakeup in %d (%lld)\n",
+		       time_to_next_wake, this_ping + PING_INTERVAL);
 	} while (time_to_next_wake <= 0);
 
 	queue_delayed_work(pinger_wq, &ping_work,
-- 
1.8.3.1
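
The reworked pinger_check_timeout() in the patch above picks the
nearest imp_next_ping among pingable imports instead of always
sleeping for PING_INTERVAL. A minimal userspace sketch of that
calculation follows. It assumes a plain array in place of the
pinger_imports list and omits pinger_mutex; import_sketch and
next_wake_sketch are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of struct obd_import used here. */
struct import_sketch {
	int	pingable;	/* mirrors imp_pingable */
	int64_t	next_ping;	/* mirrors imp_next_ping, absolute seconds */
};

/*
 * Return how long to sleep, in seconds, given the time the current ping
 * round started (this_ping) and the current time (now).
 */
static int32_t next_wake_sketch(const struct import_sketch *imps, size_t n,
				int64_t this_ping, int64_t now,
				int32_t ping_interval)
{
	int32_t timeout = ping_interval;
	size_t i;

	assert(now >= this_ping);

	for (i = 0; i < n; i++) {
		int32_t next_timeout;

		/* Skip imports that are not pinged or are already due. */
		if (!imps[i].pingable || imps[i].next_ping < now)
			continue;
		next_timeout = (int32_t)(imps[i].next_ping - now);
		/* Keep the smallest delay that still lies past the round. */
		if (next_timeout > (now - this_ping) && timeout > next_timeout)
			timeout = next_timeout;
	}
	/* Subtract the time already spent in this round. */
	return timeout - (int32_t)(now - this_ping);
}
```

As in the patch, the time already spent in the round is subtracted from
the chosen delay, so a result of zero or less means the caller should
start another round immediately.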

* [lustre-devel] [PATCH 18/28] lustre: ptlrpc: throttle RPC resend if network error
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (16 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 17/28] lustre: ptlrpc: decrease time between reconnection James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 19/28] lustre: ldlm: BL AST vs failed lock enqueue race James Simmons
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Aurelien Degremont <degremoa@amazon.com>

When sending a callback AST to a non-responding client, the server
retries endlessly until the client is eventually evicted. When using
ksocklnd, it will retry after each AST timeout, until the socket is
eventually closed, after sock_timeout sec, where the retry will fail
immediately, returning -110, as no socket could be established.

The thread will spin on retrying and failing, until eventual client
eviction. This will cause high thread CPU usage and possible resource
denial.

To work around that, this patch avoids retrying the callback resend if:
 - the request is flagged with network error and timeout
 - last try was less than 1 sec ago

In the worst case, the retry will happen after a timeout based on
req->rq_deadline. If there is nothing else to handle, the thread will
sleep during that time, removing the CPU overhead.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13984
Lustre-commit: 4103527c1c9b38 ("LU-13984 ptlrpc: throttle RPC resend if network error")
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/40020
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/client.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index c9d9fe9..0e01ab33 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -1900,6 +1900,26 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
 					goto interpret;
 				}
 
+				/* don't resend too fast in case of network
+				 * errors.
+				 */
+				if (ktime_get_real_seconds() < (req->rq_sent + 1)
+				    && req->rq_net_err && req->rq_timedout) {
+					DEBUG_REQ(D_INFO, req,
+						  "throttle request");
+					/* Don't try to resend RPC right away
+					 * as it is likely it will fail again
+					 * and ptlrpc_check_set() will be
+					 * called again, keeping this thread
+					 * busy. Instead, wait for the next
+					 * timeout. Flag it as resend to
+					 * ensure we don't wait to long.
+					 */
+					req->rq_resend = 1;
+					spin_unlock(&imp->imp_lock);
+					continue;
+				}
+
 				list_move_tail(&req->rq_list,
 					       &imp->imp_sending_list);
 
-- 
1.8.3.1
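
The throttle condition added to ptlrpc_check_set() above can be read
as a small predicate. A minimal userspace sketch follows, assuming the
current time is passed in rather than read from the clock. req_sketch
and throttle_resend_sketch are hypothetical names, and the locking
around the request is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the ptlrpc_request fields the check reads. */
struct req_sketch {
	int64_t	sent;		/* mirrors rq_sent, seconds */
	bool	net_err;	/* mirrors rq_net_err */
	bool	timedout;	/* mirrors rq_timedout */
	bool	resend;		/* mirrors rq_resend */
};

/*
 * Return true when the resend should be postponed: the last send was
 * less than one second ago and the request failed with a network error
 * and a timeout. The request is flagged for resend so it is picked up
 * again on the next pass.
 */
static bool throttle_resend_sketch(struct req_sketch *req, int64_t now)
{
	assert(req);

	if (now < req->sent + 1 && req->net_err && req->timedout) {
		req->resend = true;
		return true;
	}
	return false;
}
```

A request sent at second 100 is throttled while the clock still reads
100 and becomes eligible again once it reads 101.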

* [lustre-devel] [PATCH 19/28] lustre: ldlm: BL AST vs failed lock enqueue race
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (17 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 18/28] lustre: ptlrpc: throttle RPC resend if network error James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 20/28] lustre: ptlrpc: don't log connection 'restored' inappropriately James Simmons
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Andriy Skulysh <c17819@cray.com>

failed_lock_cleanup() marks the lock with LDLM_FL_LOCAL_ONLY,
so the cancel request isn't sent.

Mark a failed lock with LDLM_FL_LOCAL_ONLY only if a BL AST
wasn't received. Add the server's lock handle to the BL AST RPC
so the client is able to cancel the lock even if the enqueue
fails.

HPE-bug-id: LUS-8493, LUS-8830
WC-bug-id: https://jira.whamcloud.com/browse/LU-13989
Lustre-commit: c1be044913dde3 ("LU-13989 ldlm: BL AST vs failed lock enqueue race")
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/40046
Reviewed-by: Vladimir Saveliev <c17830@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_lockd.c   | 2 ++
 fs/lustre/ldlm/ldlm_request.c | 5 ++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c
index 1ae65b829..6f498cc 100644
--- a/fs/lustre/ldlm/ldlm_lockd.c
+++ b/fs/lustre/ldlm/ldlm_lockd.c
@@ -698,6 +698,8 @@ static int ldlm_callback_handler(struct ptlrpc_request *req)
 		ldlm_lock_remove_from_lru(lock);
 		ldlm_set_bl_ast(lock);
 	}
+	if (lock->l_remote_handle.cookie == 0)
+		lock->l_remote_handle = dlm_req->lock_handle[1];
 	unlock_res_and_lock(lock);
 
 	/*
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index dd897ec..74bcba2 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -317,8 +317,11 @@ static void failed_lock_cleanup(struct ldlm_namespace *ns,
 		 * bl_ast and -EINVAL reply is sent to server anyways.
 		 * b=17645
 		 */
-		lock->l_flags |= LDLM_FL_LOCAL_ONLY | LDLM_FL_FAILED |
+		lock->l_flags |= LDLM_FL_FAILED |
 				 LDLM_FL_ATOMIC_CB | LDLM_FL_CBPENDING;
+		if (!(ldlm_is_bl_ast(lock) &&
+		      lock->l_remote_handle.cookie != 0))
+			lock->l_flags |= LDLM_FL_LOCAL_ONLY;
 		need_cancel = 1;
 	}
 	unlock_res_and_lock(lock);
-- 
1.8.3.1
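
The flag decision that the patch above makes in failed_lock_cleanup()
can be sketched as a pure function. The SK_FL_* bit values below are
hypothetical placeholders, not the real LDLM_FL_* values, and
failed_lock_flags_sketch is an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits for illustration only; not the real LDLM_FL_* values. */
enum {
	SK_FL_LOCAL_ONLY = 1 << 0,
	SK_FL_FAILED	 = 1 << 1,
	SK_FL_ATOMIC_CB	 = 1 << 2,
	SK_FL_CBPENDING	 = 1 << 3,
};

/*
 * Compute the flags set on a failed lock. LOCAL_ONLY, which suppresses
 * the cancel RPC, is skipped only when a BL AST was received and it
 * delivered a non-zero server lock handle cookie.
 */
static uint32_t failed_lock_flags_sketch(bool bl_ast_seen,
					 uint64_t remote_cookie)
{
	uint32_t flags = SK_FL_FAILED | SK_FL_ATOMIC_CB | SK_FL_CBPENDING;

	if (!(bl_ast_seen && remote_cookie != 0))
		flags |= SK_FL_LOCAL_ONLY;
	return flags;
}
```

Only the combination of a received BL AST and a known server handle
leaves LOCAL_ONLY clear, which is the case where a cancel can actually
be addressed to the server.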

* [lustre-devel] [PATCH 20/28] lustre: ptlrpc: don't log connection 'restored' inappropriately
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (18 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 19/28] lustre: ldlm: BL AST vs failed lock enqueue race James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 21/28] lustre: llite: Avoid eternel retry loops with MAP_POPULATE James Simmons
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Aurelien Degremont <degremoa@amazon.com>

Reverse imports maintain a target->client connection which
does not support recovery, as clients don't run recovery.
At every connection, the reverse import state goes from
NEW to RECOVER to FULL, which triggers a `Connection restored`
log message even if this is the first connection from
this client.

Suppress this log message for reverse imports to avoid
this misleading logging.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14057
Lustre-commit: 2135f46b816223 ("LU-14057 ptlrpc: don't log connection 'restored' inappropriately")
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/40331
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/import.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c
index 21ce593..35c4f83 100644
--- a/fs/lustre/ptlrpc/import.c
+++ b/fs/lustre/ptlrpc/import.c
@@ -1590,12 +1590,16 @@ int ptlrpc_import_recovery_state_machine(struct obd_import *imp)
 			goto out;
 		ptlrpc_activate_import(imp, true);
 
-		CDEBUG_LIMIT(imp->imp_was_idle ?
-				imp->imp_idle_debug : D_CONSOLE,
-			     "%s: Connection restored to %s (at %s)\n",
-			     imp->imp_obd->obd_name,
-			     obd_uuid2str(&conn->c_remote_uuid),
-			     obd_import_nid2str(imp));
+		/* Reverse import are flagged with dlm_fake == 1.
+		 * They do not do recovery and connection are not "restored".
+		 */
+		if (!imp->imp_dlm_fake)
+			CDEBUG_LIMIT(imp->imp_was_idle ?
+				     imp->imp_idle_debug : D_CONSOLE,
+				     "%s: Connection restored to %s (at %s)\n",
+				     imp->imp_obd->obd_name,
+				     obd_uuid2str(&conn->c_remote_uuid),
+				     obd_import_nid2str(imp));
 		spin_lock(&imp->imp_lock);
 		imp->imp_was_idle = 0;
 		spin_unlock(&imp->imp_lock);
-- 
1.8.3.1

* [lustre-devel] [PATCH 21/28] lustre: llite: Avoid eternel retry loops with MAP_POPULATE
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (19 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 20/28] lustre: ptlrpc: don't log connection 'restored' inappropriately James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 22/28] lustre: ptlrpc: introduce OST_SEEK RPC James Simmons
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Oleg Drokin <green@whamcloud.com>

Kernels 5.4+ have an infinite retry loop with the MAP_POPULATE mmap
option. Use FAULT_FLAG_RETRY_NOWAIT to instruct filemap_fault() not
to drop the mmap_sem, so that if the call fails we can use the slow
path and prevent the loop from forming.
(Idea by Neil Brown)

WC-bug-id: https://jira.whamcloud.com/browse/LU-13182
Lustre-commit: bb50c62c6f4cdd ("LU-13182 llite: Avoid eternel retry loops with MAP_POPULATE")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40221
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_mmap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c
index f77b8f9..f0be7ba 100644
--- a/fs/lustre/llite/llite_mmap.c
+++ b/fs/lustre/llite/llite_mmap.c
@@ -288,18 +288,24 @@ static vm_fault_t __ll_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 	if (ll_sbi_has_fast_read(ll_i2sbi(file_inode(vma->vm_file)))) {
 		/* do fast fault */
+		bool has_retry = vmf->flags & FAULT_FLAG_RETRY_NOWAIT;
+
+		/* To avoid loops, instruct downstream to not drop mmap_sem */
+		vmf->flags |= FAULT_FLAG_RETRY_NOWAIT;
 		ll_cl_add(vma->vm_file, env, NULL, LCC_MMAP);
 		fault_ret = filemap_fault(vmf);
 		ll_cl_remove(vma->vm_file, env);
+		if (has_retry)
+			vmf->flags &= ~FAULT_FLAG_RETRY_NOWAIT;
 
 		/*
 		 * - If there is no error, then the page was found in cache and
 		 *   uptodate;
 		 * - If VM_FAULT_RETRY is set, the page existed but failed to
-		 *   lock. It will return to kernel and retry;
+		 *   lock. We will try slow path to avoid loops.
 		 * - Otherwise, it should try normal fault under DLM lock.
 		 */
-		if ((fault_ret & VM_FAULT_RETRY) ||
+		if (!(fault_ret & VM_FAULT_RETRY) &&
 		    !(fault_ret & VM_FAULT_ERROR)) {
 			result = 0;
 			goto out;
-- 
1.8.3.1
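
The last hunk above changes when __ll_fault() treats the fast fault as
finished. A minimal sketch comparing the old and new conditions
follows. SK_FAULT_ERROR and SK_FAULT_RETRY are hypothetical single-bit
placeholders (the kernel's VM_FAULT_ERROR is a mask of several bits),
and both function names are illustrative.

```c
#include <stdbool.h>

/* Placeholder bits for illustration only; not the kernel's values. */
enum {
	SK_FAULT_ERROR = 1 << 0,
	SK_FAULT_RETRY = 1 << 1,
};

/* Before the patch: a RETRY result also ended the fault handling. */
static bool fast_fault_done_old(unsigned int ret)
{
	return (ret & SK_FAULT_RETRY) || !(ret & SK_FAULT_ERROR);
}

/* After the patch: only a result with neither RETRY nor ERROR is final. */
static bool fast_fault_done_new(unsigned int ret)
{
	return !(ret & SK_FAULT_RETRY) && !(ret & SK_FAULT_ERROR);
}
```

The two functions differ only for a RETRY result: the old condition
returned to the kernel, while the new one falls through to the slow
path under the DLM lock.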

* [lustre-devel] [PATCH 22/28] lustre: ptlrpc: introduce OST_SEEK RPC
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (20 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 21/28] lustre: llite: Avoid eternel retry loops with MAP_POPULATE James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 23/28] lustre: clio: SEEK_HOLE/SEEK_DATA on client side James Simmons
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Mikhail Pershin <mpershin@whamcloud.com>

Introduce a new OST_SEEK RPC to support SEEK_HOLE/SEEK_DATA.

The patch adds the RPC layout, a unified handler and a connect flag
for compatibility.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10810
Lustre-commit: 6d5fe29066af5f ("LU-10810 ptlrpc: introduce OST_SEEK RPC")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39707
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_req_layout.h  | 1 +
 fs/lustre/obdclass/lprocfs_status.c    | 1 +
 fs/lustre/ptlrpc/layout.c              | 5 +++++
 fs/lustre/ptlrpc/lproc_ptlrpc.c        | 3 ++-
 fs/lustre/ptlrpc/wiretest.c            | 6 +++++-
 include/uapi/linux/lustre/lustre_idl.h | 2 ++
 6 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h
index 54041b0..434bb08 100644
--- a/fs/lustre/include/lustre_req_layout.h
+++ b/fs/lustre/include/lustre_req_layout.h
@@ -199,6 +199,7 @@ void req_capsule_shrink(struct req_capsule *pill,
 extern struct req_format RQF_OST_SET_INFO_LAST_FID;
 extern struct req_format RQF_OST_GET_INFO_FIEMAP;
 extern struct req_format RQF_OST_LADVISE;
+extern struct req_format RQF_OST_SEEK;
 
 /* LDLM req_format */
 extern struct req_format RQF_LDLM_ENQUEUE;
diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index a3c5657..6ce0a5d 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -129,6 +129,7 @@
 	"client_encryption",	/* 0x8000 */
 	"fidmap",		/* 0x10000 */
 	"getattr_pfid",		/* 0x20000 */
+	"lseek",		/* 0x40000 */
 	NULL
 };
 
diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c
index 4e37bbb..11c0d50 100644
--- a/fs/lustre/ptlrpc/layout.c
+++ b/fs/lustre/ptlrpc/layout.c
@@ -777,6 +777,7 @@
 	&RQF_OST_SET_INFO_LAST_FID,
 	&RQF_OST_GET_INFO_FIEMAP,
 	&RQF_OST_LADVISE,
+	&RQF_OST_SEEK,
 	&RQF_LDLM_ENQUEUE,
 	&RQF_LDLM_ENQUEUE_LVB,
 	&RQF_LDLM_CONVERT,
@@ -1611,6 +1612,10 @@ struct req_format RQF_OST_FALLOCATE =
 	DEFINE_REQ_FMT0("OST_FALLOCATE", ost_body_capa, ost_body_only);
 EXPORT_SYMBOL(RQF_OST_FALLOCATE);
 
+struct req_format RQF_OST_SEEK =
+	DEFINE_REQ_FMT0("OST_SEEK", ost_body_only, ost_body_only);
+EXPORT_SYMBOL(RQF_OST_SEEK);
+
 struct req_format RQF_OST_SYNC =
 	DEFINE_REQ_FMT0("OST_SYNC", ost_body_capa, ost_body_only);
 EXPORT_SYMBOL(RQF_OST_SYNC);
diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c
index 7276f81..26ca55e 100644
--- a/fs/lustre/ptlrpc/lproc_ptlrpc.c
+++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c
@@ -67,7 +67,8 @@
 	{ OST_QUOTACTL,				"ost_quotactl" },
 	{ OST_QUOTA_ADJUST_QUNIT,		"ost_quota_adjust_qunit" },
 	{ OST_LADVISE,				"ost_ladvise" },
-	{ OST_FALLOCATE,			"ost_fallocate"},
+	{ OST_FALLOCATE,			"ost_fallocate" },
+	{ OST_SEEK,				"ost_seek" },
 	{ MDS_GETATTR,				"mds_getattr" },
 	{ MDS_GETATTR_NAME,			"mds_getattr_lock" },
 	{ MDS_CLOSE,				"mds_close" },
diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index 9b1caf4..ba19b78 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -108,7 +108,9 @@ void lustre_assert_wire_constants(void)
 		 (long long)OST_LADVISE);
 	LASSERTF(OST_FALLOCATE == 22, "found %lld\n",
 		 (long long)OST_FALLOCATE);
-	LASSERTF(OST_LAST_OPC == 23, "found %lld\n",
+	LASSERTF(OST_SEEK == 23, "found %lld\n",
+		 (long long)OST_SEEK);
+	LASSERTF(OST_LAST_OPC == 24, "found %lld\n",
 		 (long long)OST_LAST_OPC);
 	LASSERTF(OBD_OBJECT_EOF == 0xffffffffffffffffULL, "found 0x%.16llxULL\n",
 		 OBD_OBJECT_EOF);
@@ -1245,6 +1247,8 @@ void lustre_assert_wire_constants(void)
 		 OBD_CONNECT2_FIDMAP);
 	LASSERTF(OBD_CONNECT2_GETATTR_PFID == 0x20000ULL, "found 0x%.16llxULL\n",
 		 OBD_CONNECT2_GETATTR_PFID);
+	LASSERTF(OBD_CONNECT2_LSEEK == 0x40000ULL, "found 0x%.16llxULL\n",
+		 OBD_CONNECT2_LSEEK);
 	LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n",
 		 (unsigned int)OBD_CKSUM_CRC32);
 	LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n",
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index 34b2367..f56b3c5 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -838,6 +838,7 @@ struct ptlrpc_body_v2 {
 #define OBD_CONNECT2_ENCRYPT	       0x8000ULL /* client-to-disk encrypt */
 #define OBD_CONNECT2_FIDMAP	      0x10000ULL /* FID map */
 #define OBD_CONNECT2_GETATTR_PFID     0x20000ULL /* pack parent FID in getattr */
+#define OBD_CONNECT2_LSEEK	      0x40000ULL /* SEEK_HOLE/DATA RPC */
 /* XXX README XXX:
  * Please DO NOT add flag values here before first ensuring that this same
  * flag value is not in use on some other branch.  Please clear any such
@@ -972,6 +973,7 @@ enum ost_cmd {
 	OST_QUOTA_ADJUST_QUNIT = 20, /* not used since 2.4 */
 	OST_LADVISE	= 21,
 	OST_FALLOCATE	= 22,
+	OST_SEEK	= 23,
 	OST_LAST_OPC /* must be < 33 to avoid MDS_GETATTR */
 };
 #define OST_FIRST_OPC  OST_REPLY
-- 
1.8.3.1
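
The opcode and connect flag that the patch above adds can be exercised
in a small userspace sketch. The numeric values are taken from the
lustre_idl.h hunk. The SK_ names and server_supports_lseek_sketch() are
hypothetical, and where the real code reads the negotiated flags from
is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as OBD_CONNECT2_LSEEK in the patch. */
#define SK_CONNECT2_LSEEK 0x40000ULL

/* Tail of the opcode enum, with the values from the patch. */
enum ost_cmd_sketch {
	SK_OST_FALLOCATE = 22,
	SK_OST_SEEK	 = 23,
	SK_OST_LAST_OPC	/* one past the last opcode, now 24 */
};

/* Return true when the negotiated flags advertise the new RPC. */
static bool server_supports_lseek_sketch(uint64_t connect_flags2)
{
	return (connect_flags2 & SK_CONNECT2_LSEEK) != 0;
}
```

Because OST_LAST_OPC carries no explicit value, adding OST_SEEK = 23
moves it to 24, which is why the wiretest.c assertion changes too.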

* [lustre-devel] [PATCH 23/28] lustre: clio: SEEK_HOLE/SEEK_DATA on client side
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (21 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 22/28] lustre: ptlrpc: introduce OST_SEEK RPC James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 24/28] lustre: sec: O_DIRECT for encrypted file James Simmons
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Mikhail Pershin <mpershin@whamcloud.com>

The patch introduces basic support for the lseek SEEK_HOLE/SEEK_DATA
parameters in the Lustre client.

- introduce new IO type CIT_LSEEK in CLIO stack
- LOV splits the request across all stripes involved and merges
  the results back.
- OSC sends OST LSEEK RPC asynchronously
- if the target doesn't support the LSEEK RPC then OSC assumes the
  whole related object is data with a virtual hole at the end
- lseek restores released files, assuming it is done prior to
  the file copying.
- tool is added to request needed lseek on file

WC-bug-id: https://jira.whamcloud.com/browse/LU-10810
Lustre-commit: cda353e6efae50 ("LU-10810 clio: SEEK_HOLE/SEEK_DATA on client side")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39708
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h     |  10 +++
 fs/lustre/include/lustre_export.h |   5 ++
 fs/lustre/include/lustre_osc.h    |   4 ++
 fs/lustre/llite/file.c            |  61 ++++++++++++++++--
 fs/lustre/llite/llite_lib.c       |   4 +-
 fs/lustre/llite/vvp_io.c          |  53 +++++++++++++++-
 fs/lustre/lov/lov_io.c            |  99 ++++++++++++++++++++++++++++-
 fs/lustre/mdc/mdc_dev.c           |   4 ++
 fs/lustre/obdclass/cl_io.c        |   1 +
 fs/lustre/osc/osc_io.c            | 127 ++++++++++++++++++++++++++++++++++++++
 10 files changed, 358 insertions(+), 10 deletions(-)
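
For readers who want to exercise the new code path from user space, the
following is a minimal sketch of walking a file's data extents with
lseek(2). It is not the tool mentioned in the commit message; the helper
name count_data_extents() is hypothetical, and the fallback values for
SEEK_DATA/SEEK_HOLE assume Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, used only if the libc headers did not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper: walk the file with SEEK_DATA/SEEK_HOLE, print each
 * data extent and return the number of extents found (or -1 on error).
 * The total number of data bytes is stored in *data_bytes.
 */
static int count_data_extents(int fd, off_t *data_bytes)
{
	off_t pos = 0;
	int extents = 0;

	assert(data_bytes != NULL);
	*data_bytes = 0;

	for (;;) {
		off_t data = lseek(fd, pos, SEEK_DATA);
		off_t hole;

		/* ENXIO means there is no more data at or after pos */
		if (data < 0)
			return errno == ENXIO ? extents : -1;

		/* every file has a virtual hole at EOF, so this succeeds */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;

		printf("data: [%lld, %lld)\n", (long long)data,
		       (long long)hole);
		*data_bytes += hole - data;
		extents++;
		pos = hole;
	}
}
```

Per the commit message, a target without OST_SEEK support makes OSC treat
the whole object as data, so on such a setup a sparse file would likely
be reported as fully populated up to the object size.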

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 56200d2..e17385c0 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -1415,6 +1415,11 @@ enum cl_io_type {
 	 * To give advice about access of a file
 	 */
 	CIT_LADVISE,
+	/**
+	 * SEEK_HOLE/SEEK_DATA handling to search holes or data
+	 * across all file objects
+	 */
+	CIT_LSEEK,
 	CIT_OP_NR
 };
 
@@ -1892,6 +1897,11 @@ struct cl_io {
 			enum lu_ladvise_type	li_advice;
 			u64			li_flags;
 		} ci_ladvise;
+		struct cl_lseek_io {
+			loff_t			ls_start;
+			loff_t			ls_result;
+			int			ls_whence;
+		} ci_lseek;
 	} u;
 	struct cl_2queue	ci_queue;
 	size_t			ci_nob;
diff --git a/fs/lustre/include/lustre_export.h b/fs/lustre/include/lustre_export.h
index b5fdf8a..ed49a97 100644
--- a/fs/lustre/include/lustre_export.h
+++ b/fs/lustre/include/lustre_export.h
@@ -285,6 +285,11 @@ static inline int exp_connect_encrypt(struct obd_export *exp)
 	return !!(exp_connect_flags2(exp) & OBD_CONNECT2_ENCRYPT);
 }
 
+static inline int exp_connect_lseek(struct obd_export *exp)
+{
+	return !!(exp_connect_flags2(exp) & OBD_CONNECT2_LSEEK);
+}
+
 enum {
 	/* archive_ids in array format */
 	KKUC_CT_DATA_ARRAY_MAGIC	= 0x092013cea,
diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index 24cfec8..ef5237b 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -704,6 +704,10 @@ int osc_fsync_ost(const struct lu_env *env, struct osc_object *obj,
 void osc_io_fsync_end(const struct lu_env *env,
 		      const struct cl_io_slice *slice);
 void osc_read_ahead_release(const struct lu_env *env, void *cbdata);
+int osc_io_lseek_start(const struct lu_env *env,
+		       const struct cl_io_slice *slice);
+void osc_io_lseek_end(const struct lu_env *env,
+		      const struct cl_io_slice *slice);
 
 /* osc_lock.c */
 void osc_lock_to_lockless(const struct lu_env *env, struct osc_lock *ols,
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 3ba9152..4a3c534 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -3984,26 +3984,75 @@ static int ll_heat_set(struct inode *inode, enum lu_heat_flag flags)
 	}
 }
 
+loff_t ll_lseek(struct inode *inode, loff_t offset, int whence)
+{
+	struct lu_env *env;
+	struct cl_io *io;
+	struct cl_lseek_io *lsio;
+	u16 refcheck;
+	int rc;
+	loff_t retval;
+
+	env = cl_env_get(&refcheck);
+	if (IS_ERR(env))
+		return PTR_ERR(env);
+
+	io = vvp_env_thread_io(env);
+	io->ci_obj = ll_i2info(inode)->lli_clob;
+
+	lsio = &io->u.ci_lseek;
+	lsio->ls_start = offset;
+	lsio->ls_whence = whence;
+	lsio->ls_result = -ENXIO;
+
+	do {
+		rc = cl_io_init(env, io, CIT_LSEEK, io->ci_obj);
+		if (!rc)
+			rc = cl_io_loop(env, io);
+		else
+			rc = io->ci_result;
+		retval = rc ? : lsio->ls_result;
+		cl_io_fini(env, io);
+	} while (unlikely(io->ci_need_restart));
+
+	cl_env_put(env, &refcheck);
+
+	return retval;
+}
+
 static loff_t ll_file_seek(struct file *file, loff_t offset, int origin)
 {
 	struct inode *inode = file_inode(file);
-	loff_t retval, eof = 0;
+	loff_t retval = offset, eof = 0;
 	ktime_t kstart = ktime_get();
 
-	retval = offset + ((origin == SEEK_END) ? i_size_read(inode) :
-			   (origin == SEEK_CUR) ? file->f_pos : 0);
 	CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p), to=%llu=%#llx(%d)\n",
 	       PFID(ll_inode2fid(inode)), inode, retval, retval, origin);
 
-	if (origin == SEEK_END || origin == SEEK_HOLE || origin == SEEK_DATA) {
+	if (origin == SEEK_END) {
 		retval = ll_glimpse_size(inode);
 		if (retval != 0)
 			return retval;
 		eof = i_size_read(inode);
 	}
 
-	retval = generic_file_llseek_size(file, offset, origin,
-					  ll_file_maxbytes(inode), eof);
+	if (origin == SEEK_HOLE || origin == SEEK_DATA) {
+		if (offset < 0)
+			return -ENXIO;
+
+		/* flush local cache first if any */
+		cl_sync_file_range(inode, offset, OBD_OBJECT_EOF,
+				   CL_FSYNC_LOCAL, 0);
+
+		retval = ll_lseek(inode, offset, origin);
+		if (retval < 0)
+			return retval;
+
+		retval = vfs_setpos(file, retval, ll_file_maxbytes(inode));
+	} else {
+		retval = generic_file_llseek_size(file, offset, origin,
+						  ll_file_maxbytes(inode), eof);
+	}
 	if (retval >= 0)
 		ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_LLSEEK,
 				   ktime_us_delta(ktime_get(), kstart));
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index d94c6ca..a4042b8 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -263,7 +263,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 				   OBD_CONNECT2_LSOM |
 				   OBD_CONNECT2_ASYNC_DISCARD |
 				   OBD_CONNECT2_PCC |
-				   OBD_CONNECT2_CRUSH |
+				   OBD_CONNECT2_CRUSH | OBD_CONNECT2_LSEEK |
 				   OBD_CONNECT2_GETATTR_PFID;
 
 	if (sbi->ll_flags & LL_SBI_LRU_RESIZE)
@@ -473,7 +473,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 				  OBD_CONNECT_FLAGS2 | OBD_CONNECT_GRANT_SHRINK;
 
 	data->ocd_connect_flags2 = OBD_CONNECT2_LOCKAHEAD |
-				   OBD_CONNECT2_INC_XID;
+				   OBD_CONNECT2_INC_XID | OBD_CONNECT2_LSEEK;
 
 	if (!OBD_FAIL_CHECK(OBD_FAIL_OSC_CONNECT_GRANT_PARAM))
 		data->ocd_connect_flags |= OBD_CONNECT_GRANT_PARAM;
diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index 3a2e1cc..d6ca267 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -1531,6 +1531,51 @@ static int vvp_io_read_ahead(const struct lu_env *env,
 	return result;
 }
 
+static int vvp_io_lseek_lock(const struct lu_env *env,
+			     const struct cl_io_slice *ios)
+{
+	struct cl_io *io = ios->cis_io;
+	u64 lock_start = io->u.ci_lseek.ls_start;
+	u64 lock_end = OBD_OBJECT_EOF;
+	u32 enqflags = CEF_MUST; /* always take client lock */
+
+	return vvp_io_one_lock(env, io, enqflags, CLM_READ,
+			       lock_start, lock_end);
+}
+
+static int vvp_io_lseek_start(const struct lu_env *env,
+			      const struct cl_io_slice *ios)
+{
+	struct cl_io *io = ios->cis_io;
+	struct inode *inode = vvp_object_inode(io->ci_obj);
+	u64 start = io->u.ci_lseek.ls_start;
+
+	inode_lock(inode);
+	inode_dio_wait(inode);
+
+	/* At the moment we have DLM lock so just update inode
+	 * to know the file size.
+	 */
+	ll_merge_attr(env, inode);
+	if (start >= i_size_read(inode)) {
+		io->u.ci_lseek.ls_result = -ENXIO;
+		return -ENXIO;
+	}
+	return 0;
+}
+
+static void vvp_io_lseek_end(const struct lu_env *env,
+			     const struct cl_io_slice *ios)
+{
+	struct cl_io *io = ios->cis_io;
+	struct inode *inode = vvp_object_inode(io->ci_obj);
+
+	if (io->u.ci_lseek.ls_result > i_size_read(inode))
+		io->u.ci_lseek.ls_result = -ENXIO;
+
+	inode_unlock(inode);
+}
+
 static const struct cl_io_operations vvp_io_ops = {
 	.op = {
 		[CIT_READ] = {
@@ -1576,7 +1621,13 @@ static int vvp_io_read_ahead(const struct lu_env *env,
 		},
 		[CIT_LADVISE] = {
 			.cio_fini	= vvp_io_fini
-		}
+		},
+		[CIT_LSEEK] = {
+			.cio_fini	= vvp_io_fini,
+			.cio_lock	= vvp_io_lseek_lock,
+			.cio_start	= vvp_io_lseek_start,
+			.cio_end	= vvp_io_lseek_end,
+		},
 	},
 	.cio_read_ahead	= vvp_io_read_ahead,
 };
diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c
index af79d20..20fcde1 100644
--- a/fs/lustre/lov/lov_io.c
+++ b/fs/lustre/lov/lov_io.c
@@ -529,6 +529,12 @@ static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj,
 		break;
 	}
 
+	case CIT_LSEEK: {
+		lio->lis_pos = io->u.ci_lseek.ls_start;
+		lio->lis_endpos = OBD_OBJECT_EOF;
+		break;
+	}
+
 	case CIT_GLIMPSE:
 		lio->lis_pos = 0;
 		lio->lis_endpos = OBD_OBJECT_EOF;
@@ -715,6 +721,12 @@ static void lov_io_sub_inherit(struct lov_io_sub *sub, struct lov_io *lio,
 		io->u.ci_ladvise.li_flags = parent->u.ci_ladvise.li_flags;
 		break;
 	}
+	case CIT_LSEEK: {
+		io->u.ci_lseek.ls_start = start;
+		io->u.ci_lseek.ls_whence = parent->u.ci_lseek.ls_whence;
+		io->u.ci_lseek.ls_result = parent->u.ci_lseek.ls_result;
+		break;
+	}
 	case CIT_GLIMPSE:
 	case CIT_MISC:
 	default:
@@ -1265,6 +1277,80 @@ static void lov_io_fsync_end(const struct lu_env *env,
 	}
 }
 
+static void lov_io_lseek_end(const struct lu_env *env,
+			     const struct cl_io_slice *ios)
+{
+	struct lov_io *lio = cl2lov_io(env, ios);
+	struct cl_io *io = lio->lis_cl.cis_io;
+	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
+	struct lov_io_sub *sub;
+	loff_t offset = -ENXIO;
+	bool seek_hole = io->u.ci_lseek.ls_whence == SEEK_HOLE;
+
+	list_for_each_entry(sub, &lio->lis_active, sub_linkage) {
+		struct cl_io *subio = &sub->sub_io;
+		int index = lov_comp_entry(sub->sub_subio_index);
+		int stripe = lov_comp_stripe(sub->sub_subio_index);
+		loff_t sub_off, lov_off;
+
+		lov_io_end_wrapper(sub->sub_env, subio);
+
+		if (io->ci_result == 0)
+			io->ci_result = sub->sub_io.ci_result;
+
+		if (io->ci_result)
+			continue;
+
+		CDEBUG(D_INFO, DFID ": entry %x stripe %u: SEEK_%s from %lld\n",
+		       PFID(lu_object_fid(lov2lu(lio->lis_object))),
+		       index, stripe, seek_hole ? "HOLE" : "DATA",
+		       subio->u.ci_lseek.ls_start);
+
+		/* first subio with positive result is what we need */
+		sub_off = subio->u.ci_lseek.ls_result;
+		/* Expected error, offset is out of stripe file size */
+		if (sub_off == -ENXIO)
+			continue;
+		/* Any other errors are not expected with ci_result == 0 */
+		if (sub_off < 0) {
+			CDEBUG(D_INFO, "unexpected error: rc = %lld\n",
+			       sub_off);
+			io->ci_result = sub_off;
+			continue;
+		}
+		lov_off = lov_stripe_size(lsm, index, sub_off + 1, stripe) - 1;
+		if (lov_off < 0) {
+			/* the only way to get negative lov_off here is too big
+			 * result. Return -EOVERFLOW then.
+			 */
+			io->ci_result = -EOVERFLOW;
+			CDEBUG(D_INFO, "offset %llu is too big: rc = %d\n",
+			       (u64)lov_off, io->ci_result);
+			continue;
+		}
+		if (lov_off < io->u.ci_lseek.ls_start) {
+			io->ci_result = -EINVAL;
+			CDEBUG(D_INFO, "offset %lld < start %lld: rc = %d\n",
+			       sub_off, io->u.ci_lseek.ls_start, io->ci_result);
+			continue;
+		}
+		/* resulting offset can be out of component range if stripe
+		 * object is full and its file size was returned as virtual
+		 * hole start. Skip this result, the next component will give
+		 * us correct lseek result.
+		 */
+		if (lov_off >= lsm->lsm_entries[index]->lsme_extent.e_end)
+			continue;
+
+		CDEBUG(D_INFO, "SEEK_%s: %lld->%lld/%lld: rc = %d\n",
+		       seek_hole ? "HOLE" : "DATA",
+		       subio->u.ci_lseek.ls_start, sub_off, lov_off,
+		       sub->sub_io.ci_result);
+		offset = min_t(u64, offset, lov_off);
+	}
+	io->u.ci_lseek.ls_result = offset;
+}
+
 static const struct cl_io_operations lov_io_ops = {
 	.op = {
 		[CIT_READ] = {
@@ -1330,8 +1416,17 @@ static void lov_io_fsync_end(const struct lu_env *env,
 			.cio_start	= lov_io_start,
 			.cio_end	= lov_io_end
 		},
+		[CIT_LSEEK] = {
+			.cio_fini	= lov_io_fini,
+			.cio_iter_init	= lov_io_iter_init,
+			.cio_iter_fini	= lov_io_iter_fini,
+			.cio_lock	= lov_io_lock,
+			.cio_unlock	= lov_io_unlock,
+			.cio_start	= lov_io_start,
+			.cio_end	= lov_io_lseek_end
+		},
 		[CIT_GLIMPSE] = {
-			.cio_fini      = lov_io_fini,
+			.cio_fini	= lov_io_fini,
 		},
 		[CIT_MISC] = {
 			.cio_fini	= lov_io_fini
@@ -1459,6 +1554,7 @@ int lov_io_init_empty(const struct lu_env *env, struct cl_object *obj,
 		break;
 	case CIT_FSYNC:
 	case CIT_LADVISE:
+	case CIT_LSEEK:
 	case CIT_SETATTR:
 	case CIT_DATA_VERSION:
 		result = 1;
@@ -1522,6 +1618,7 @@ int lov_io_init_released(const struct lu_env *env, struct cl_object *obj,
 	case CIT_READ:
 	case CIT_WRITE:
 	case CIT_FAULT:
+	case CIT_LSEEK:
 		io->ci_restore_needed = 1;
 		result = -ENODATA;
 		break;
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index 90b60f5..214fd31 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -1297,6 +1297,10 @@ static void mdc_io_data_version_end(const struct lu_env *env,
 			.cio_start	= mdc_io_fsync_start,
 			.cio_end	= osc_io_fsync_end,
 		},
+		[CIT_LSEEK] = {
+			.cio_start	= osc_io_lseek_start,
+			.cio_end	= osc_io_lseek_end,
+		},
 	},
 	.cio_read_ahead		= mdc_io_read_ahead,
 	.cio_submit		= osc_io_submit,
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index aa3cb17..c57a3766 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -127,6 +127,7 @@ void cl_io_fini(const struct lu_env *env, struct cl_io *io)
 	case CIT_GLIMPSE:
 		break;
 	case CIT_LADVISE:
+	case CIT_LSEEK:
 		break;
 	default:
 		LBUG();
diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c
index 6121f39..a0537b8 100644
--- a/fs/lustre/osc/osc_io.c
+++ b/fs/lustre/osc/osc_io.c
@@ -1042,6 +1042,128 @@ void osc_io_end(const struct lu_env *env, const struct cl_io_slice *slice)
 }
 EXPORT_SYMBOL(osc_io_end);
 
+struct osc_lseek_args {
+	struct osc_io *lsa_oio;
+};
+
+static int osc_lseek_interpret(const struct lu_env *env,
+			       struct ptlrpc_request *req,
+			       void *arg, int rc)
+{
+	struct ost_body *reply;
+	struct osc_lseek_args *lsa = arg;
+	struct osc_io *oio = lsa->lsa_oio;
+	struct cl_io *io = oio->oi_cl.cis_io;
+	struct cl_lseek_io *lsio = &io->u.ci_lseek;
+
+	if (rc != 0)
+		goto out;
+
+	reply = req_capsule_server_get(&req->rq_pill, &RMF_OST_BODY);
+	if (!reply) {
+		rc = -EPROTO;
+		goto out;
+	}
+
+	lsio->ls_result = reply->oa.o_size;
+out:
+	osc_async_upcall(&oio->oi_cbarg, rc);
+	return rc;
+}
+
+int osc_io_lseek_start(const struct lu_env *env,
+		       const struct cl_io_slice *slice)
+{
+	struct cl_io *io = slice->cis_io;
+	struct osc_io *oio = cl2osc_io(env, slice);
+	struct cl_object *obj = slice->cis_obj;
+	struct lov_oinfo *loi = cl2osc(obj)->oo_oinfo;
+	struct cl_lseek_io *lsio = &io->u.ci_lseek;
+	struct obdo *oa = &oio->oi_oa;
+	struct osc_async_cbargs *cbargs = &oio->oi_cbarg;
+	struct obd_export *exp = osc_export(cl2osc(obj));
+	struct ptlrpc_request *req;
+	struct ost_body *body;
+	struct osc_lseek_args *lsa;
+	int rc = 0;
+
+	/* No negative values at this point */
+	LASSERT(lsio->ls_start >= 0);
+	LASSERT(lsio->ls_whence == SEEK_HOLE || lsio->ls_whence == SEEK_DATA);
+
+	/* with IO lock taken we have object size in LVB and can check
+	 * boundaries prior to sending LSEEK RPC
+	 */
+	if (lsio->ls_start >= loi->loi_lvb.lvb_size) {
+		/* consider area beyond end of object as hole */
+		if (lsio->ls_whence == SEEK_HOLE)
+			lsio->ls_result = lsio->ls_start;
+		else
+			lsio->ls_result = -ENXIO;
+		return 0;
+	}
+
+	/* if LSEEK RPC is not supported by server, consider whole stripe
+	 * object is data with hole after end of object
+	 */
+	if (!exp_connect_lseek(exp)) {
+		if (lsio->ls_whence == SEEK_HOLE)
+			lsio->ls_result = loi->loi_lvb.lvb_size;
+		else
+			lsio->ls_result = lsio->ls_start;
+		return 0;
+	}
+
+	memset(oa, 0, sizeof(*oa));
+	oa->o_oi = loi->loi_oi;
+	oa->o_valid = OBD_MD_FLID | OBD_MD_FLGROUP;
+	oa->o_size = lsio->ls_start;
+	oa->o_mode = lsio->ls_whence;
+	if (oio->oi_lockless) {
+		oa->o_flags = OBD_FL_SRVLOCK;
+		oa->o_valid |= OBD_MD_FLFLAGS;
+	}
+
+	init_completion(&cbargs->opc_sync);
+	req = ptlrpc_request_alloc(class_exp2cliimp(exp), &RQF_OST_SEEK);
+	if (!req)
+		return -ENOMEM;
+
+	rc = ptlrpc_request_pack(req, LUSTRE_OST_VERSION, OST_SEEK);
+	if (rc < 0) {
+		ptlrpc_request_free(req);
+		return rc;
+	}
+
+	body = req_capsule_client_get(&req->rq_pill, &RMF_OST_BODY);
+	lustre_set_wire_obdo(&req->rq_import->imp_connect_data, &body->oa, oa);
+	ptlrpc_request_set_replen(req);
+	req->rq_interpret_reply = osc_lseek_interpret;
+	lsa = ptlrpc_req_async_args(lsa, req);
+	lsa->lsa_oio = oio;
+
+	ptlrpcd_add_req(req);
+	cbargs->opc_rpc_sent = 1;
+
+	return 0;
+}
+EXPORT_SYMBOL(osc_io_lseek_start);
+
+void osc_io_lseek_end(const struct lu_env *env,
+		      const struct cl_io_slice *slice)
+{
+	struct osc_io *oio = cl2osc_io(env, slice);
+	struct osc_async_cbargs *cbargs = &oio->oi_cbarg;
+	int rc = 0;
+
+	if (cbargs->opc_rpc_sent) {
+		wait_for_completion(&cbargs->opc_sync);
+		rc = cbargs->opc_rc;
+	}
+	slice->cis_io->ci_result = rc;
+}
+EXPORT_SYMBOL(osc_io_lseek_end);
+
 static const struct cl_io_operations osc_io_ops = {
 	.op = {
 		[CIT_READ] = {
@@ -1084,6 +1206,11 @@ void osc_io_end(const struct lu_env *env, const struct cl_io_slice *slice)
 			.cio_end	= osc_io_ladvise_end,
 			.cio_fini	= osc_io_fini
 		},
+		[CIT_LSEEK] = {
+			.cio_start	= osc_io_lseek_start,
+			.cio_end	= osc_io_lseek_end,
+			.cio_fini	= osc_io_fini
+		},
 		[CIT_MISC] = {
 			.cio_fini	= osc_io_fini
 		}
-- 
1.8.3.1
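
The two decisions this patch makes before and after the RPC can be
modelled in plain C. The sketch below is a simplified user-space model,
not the kernel code: it assumes a single plain RAID0 component, all
sketch_* names are hypothetical, and unlike lov_io_lseek_end() it stops
at the first unexpected error and ignores the component-extent check.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Linux values, used only if the libc headers did not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/* Hypothetical, simplified RAID0 layout (not struct lov_stripe_md). */
struct sketch_layout {
	uint64_t stripe_size;
	uint32_t stripe_count;
};

/*
 * Model of the checks in osc_io_lseek_start(): returns 0 when the answer
 * is known locally (stored in *result), 1 when an RPC would be sent.
 */
static int sketch_osc_local(int64_t start, int64_t obj_size, int whence,
			    int server_has_lseek, int64_t *result)
{
	assert(start >= 0);
	assert(whence == SEEK_HOLE || whence == SEEK_DATA);

	if (start >= obj_size) {
		/* area beyond the end of the object is a hole */
		*result = whence == SEEK_HOLE ? start : -ENXIO;
		return 0;
	}
	if (!server_has_lseek) {
		/* whole object is data, virtual hole at its end */
		*result = whence == SEEK_HOLE ? obj_size : start;
		return 0;
	}
	return 1;
}

/* Map an offset inside stripe object 'stripe' to a file offset. */
static int64_t sketch_obj_to_file(const struct sketch_layout *l,
				  uint32_t stripe, int64_t obj_off)
{
	uint64_t chunk = (uint64_t)obj_off / l->stripe_size;
	uint64_t within = (uint64_t)obj_off % l->stripe_size;

	return (int64_t)(chunk * l->stripe_size * l->stripe_count +
			 (uint64_t)stripe * l->stripe_size + within);
}

/* Merge per-stripe results: the smallest valid file offset wins. */
static int64_t sketch_lov_merge(const struct sketch_layout *l,
				const int64_t *sub, uint32_t n)
{
	int64_t best = -ENXIO;
	uint32_t i;

	for (i = 0; i < n; i++) {
		int64_t off;

		if (sub[i] == -ENXIO)	/* nothing found in this stripe */
			continue;
		if (sub[i] < 0)		/* unexpected error, give up */
			return sub[i];
		off = sketch_obj_to_file(l, i, sub[i]);
		if (best < 0 || off < best)
			best = off;
	}
	return best;
}
```

The offset mapping is meant to match what the patch computes with
lov_stripe_size(lsm, index, sub_off + 1, stripe) - 1 for a simple
striped layout; composite layouts would need the per-component handling
that the real code has.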


* [lustre-devel] [PATCH 24/28] lustre: sec: O_DIRECT for encrypted file
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (22 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 23/28] lustre: clio: SEEK_HOLE/SEEK_DATA on client side James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 25/28] lustre: sec: restrict fallocate on encrypted files James Simmons
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Sebastien Buisson <sbuisson@ddn.com>

Add O_DIRECT support for encrypted files.
By default, fscrypt does not support O_DIRECT because it needs
pagecache pages to proceed.
With Lustre, we can make use of pages being used for sending RPCs.
They can be twisted so that they have a proper mapping and index,
suitable for encryption/decryption.

One of the benefits of O_DIRECT support for encrypted files is that
we get support for mirroring at the same time.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12275
Lustre-commit: 728036f25635a ("LU-12275 sec: O_DIRECT for encrypted file")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/38967
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../client_side_encryption/access_semantics.txt    |  3 --
 fs/lustre/llite/dir.c                              |  1 -
 fs/lustre/llite/llite_lib.c                        | 11 ++++++-
 fs/lustre/llite/rw26.c                             | 27 +++++++++++++---
 fs/lustre/llite/super25.c                          | 11 +++++++
 fs/lustre/obdclass/cl_io.c                         | 11 +++++++
 fs/lustre/osc/osc_request.c                        | 37 ++++++++++++++++++----
 fs/lustre/ptlrpc/wiretest.c                        |  2 ++
 8 files changed, 87 insertions(+), 16 deletions(-)
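
The read-side bookkeeping added to osc_brw_prep_request() can be
illustrated with a small user-space model. This is only a sketch: the
struct and the sketch_* names are hypothetical stand-ins for the few
struct brw_page fields involved, and a 4KB page size is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed client page size */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical stand-in for the struct brw_page fields used here. */
struct sketch_brw_page {
	uint64_t off;
	uint32_t count;
	uint32_t bp_count_diff;
	uint32_t bp_off_diff;
};

/*
 * Grow the transfer to cover the whole page, since a partial encrypted
 * page cannot be decrypted, and remember how much was added.
 */
static void sketch_expand_to_page(struct sketch_brw_page *pg)
{
	assert(pg->count <= SKETCH_PAGE_SIZE);

	pg->bp_count_diff = SKETCH_PAGE_SIZE - pg->count;
	pg->count = SKETCH_PAGE_SIZE;
	pg->bp_off_diff = pg->off & ~SKETCH_PAGE_MASK;
	pg->off &= SKETCH_PAGE_MASK;
}

/* Undo the expansion, mirroring osc_release_bounce_pages(). */
static void sketch_restore(struct sketch_brw_page *pg)
{
	pg->count -= pg->bp_count_diff;
	pg->off += pg->bp_off_diff;
}
```

Keeping the two differences alongside the page lets the release path
restore the caller's original offset and length without recomputing them.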

diff --git a/Documentation/lustre/client_side_encryption/access_semantics.txt b/Documentation/lustre/client_side_encryption/access_semantics.txt
index fe2c28d..7ed0bc7 100644
--- a/Documentation/lustre/client_side_encryption/access_semantics.txt
+++ b/Documentation/lustre/client_side_encryption/access_semantics.txt
@@ -42,9 +42,6 @@ astute users may notice some differences in behavior:
   may be used to overwrite the source files but isn't guaranteed to be
   effective on all filesystems and storage devices.
 
-- Direct I/O is not supported on encrypted files.  Attempts to use
-  direct I/O on such files will fall back to buffered I/O.
-
 - The fallocate operations FALLOC_FL_COLLAPSE_RANGE,
   FALLOC_FL_INSERT_RANGE, and FALLOC_FL_ZERO_RANGE are not supported
   on encrypted files and will fail with EOPNOTSUPP.
diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 262aea0..6bc95d9 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -481,7 +481,6 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump,
 			goto out_op_data;
 	}
 
-
 	op_data->op_cli_flags |= CLI_SET_MEA;
 	err = md_create(sbi->ll_md_exp, op_data, lump, len, mode,
 			from_kuid(&init_user_ns, current_fsuid()),
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index a4042b8..e4036af 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -1759,6 +1759,7 @@ int ll_io_zero_page(struct inode *inode, pgoff_t index, pgoff_t offset,
 			 * file, we must not zero and write as below. Subsequent
 			 * server-side truncate will handle things correctly.
 			 */
+			rc = 0;
 			goto clpfini;
 		ClearPagePrivate2(vmpage);
 		if (rc)
@@ -1960,7 +1961,15 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr,
 			    attr->ia_valid & ATTR_SIZE) {
 				xvalid |= OP_XVALID_FLAGS;
 				flags = LUSTRE_ENCRYPT_FL;
-				if (attr->ia_size & ~PAGE_MASK) {
+				/* Call to ll_io_zero_page is not necessary if
+				 * truncating on PAGE_SIZE boundary, because
+				 * whole pages will be wiped.
+				 * In case of Direct IO, all we need is to set
+				 * new size.
+				 */
+				if (attr->ia_size & ~PAGE_MASK &&
+				    !(attr->ia_valid & ATTR_FILE &&
+				      attr->ia_file->f_flags & O_DIRECT)) {
 					pgoff_t offset;
 
 					offset = attr->ia_size & (PAGE_SIZE - 1);
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index a4ae211..1736e9a 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -207,6 +207,7 @@ struct ll_dio_pages {
 	int io_pages = 0;
 	size_t page_size = cl_page_size(obj);
 	int i;
+	pgoff_t index = offset >> PAGE_SHIFT;
 	ssize_t rc = 0;
 
 	cl_2queue_init(queue);
@@ -226,6 +227,28 @@ struct ll_dio_pages {
 		}
 
 		page->cp_sync_io = anchor;
+		if (inode && IS_ENCRYPTED(inode)) {
+			struct page *vmpage = cl_page_vmpage(page);
+
+			/* In case of Direct IO on encrypted file, we need to
+			 * set the correct page index, and add a reference to
+			 * the mapping. This is required by llcrypt to proceed
+			 * to encryption/decryption, because each block is
+			 * encrypted independently, and each block's IV is set
+			 * to the logical block number within the file.
+			 * This is safe because we know these pages are private
+			 * to the thread doing the Direct IO, and despite
+			 * setting a mapping on the pages, cached lookups will
+			 * not find them.
+			 * Set PageChecked to detect special case of Direct IO
+			 * in osc_brw_fini_request().
+			 * Reference to the mapping and PageChecked flag are
+			 * removed in cl_aio_end().
+			 */
+			vmpage->index = index++;
+			vmpage->mapping = inode->i_mapping;
+			SetPageChecked(vmpage);
+		}
 		cl_page_list_add(&queue->c2_qin, page);
 		/*
 		 * Set page clip to tell transfer formation engine
@@ -297,10 +320,6 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	int rw = iov_iter_rw(iter);
 	struct vvp_io *vio;
 
-	/* if file is encrypted, return 0 so that we fall back to buffered IO */
-	if (IS_ENCRYPTED(inode))
-		return 0;
-
 	/* Check EOF by ourselves */
 	if (rw == READ && file_offset >= i_size_read(inode))
 		return 0;
diff --git a/fs/lustre/llite/super25.c b/fs/lustre/llite/super25.c
index 8eb3fc3..d02c8cf 100644
--- a/fs/lustre/llite/super25.c
+++ b/fs/lustre/llite/super25.c
@@ -72,10 +72,21 @@ static void ll_destroy_inode(struct inode *inode)
 	call_rcu(&inode->i_rcu, ll_inode_destroy_callback);
 }
 
+static int ll_drop_inode(struct inode *inode)
+{
+	int drop = generic_drop_inode(inode);
+
+	if (!drop)
+		drop = llcrypt_drop_inode(inode);
+
+	return drop;
+}
+
 /* exported operations */
 struct super_operations lustre_super_operations = {
 	.alloc_inode		= ll_alloc_inode,
 	.destroy_inode		= ll_destroy_inode,
+	.drop_inode		= ll_drop_inode,
 	.evict_inode		= ll_delete_inode,
 	.put_super		= ll_put_super,
 	.statfs			= ll_statfs,
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index c57a3766..37b0828 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -1081,8 +1081,19 @@ static void cl_aio_end(const struct lu_env *env, struct cl_sync_io *anchor)
 	/* release pages */
 	while (aio->cda_pages.pl_nr > 0) {
 		struct cl_page *page = cl_page_list_first(&aio->cda_pages);
+		struct page *vmpage = cl_page_vmpage(page);
+		struct inode *inode = vmpage ? page2inode(vmpage) : NULL;
 
 		cl_page_get(page);
+		/* We end up here in case of Direct IO only. For encrypted file,
+		 * mapping was set on pages in ll_direct_rw_pages(), so it has
+		 * to be cleared now before page cleanup.
+		 * PageChecked flag was also set there, so we clean up here.
+		 */
+		if (inode && IS_ENCRYPTED(inode)) {
+			vmpage->mapping = NULL;
+			ClearPageChecked(vmpage);
+		}
 		cl_page_list_del(env, &aio->cda_pages, page);
 		cl_page_delete(env, page);
 		cl_page_put(env, page);
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 8a8a624..bf9ce44 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1369,13 +1369,9 @@ static inline void osc_release_bounce_pages(struct brw_page **pga,
 	int i;
 
 	for (i = 0; i < page_count; i++) {
-		if (pga[i]->pg->mapping)
+		if (!pga[i]->pg->mapping)
 			/* bounce pages are unmapped */
-			continue;
-		if (pga[i]->flag & OBD_BRW_SYNC)
-			/* sync transfer cannot have encrypted pages */
-			continue;
-		llcrypt_finalize_bounce_page(&pga[i]->pg);
+			llcrypt_finalize_bounce_page(&pga[i]->pg);
 		pga[i]->count -= pga[i]->bp_count_diff;
 		pga[i]->off += pga[i]->bp_off_diff;
 	}
@@ -1470,6 +1466,19 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 			pg->bp_off_diff = pg->off & ~PAGE_MASK;
 			pg->off = pg->off & PAGE_MASK;
 		}
+	} else if (opc == OST_READ && inode && IS_ENCRYPTED(inode)) {
+		for (i = 0; i < page_count; i++) {
+			struct brw_page *pg = pga[i];
+
+			/* count/off are forced to cover the whole page so that
+			 * all encrypted data is stored on the OST, so adjust
+			 * bp_{count,off}_diff for the size of the clear text.
+			 */
+			pg->bp_count_diff = PAGE_SIZE - pg->count;
+			pg->count = PAGE_SIZE;
+			pg->bp_off_diff = pg->off & ~PAGE_MASK;
+			pg->off = pg->off & PAGE_MASK;
+		}
 	}
 
 	for (niocount = i = 1; i < page_count; i++) {
@@ -1483,8 +1492,13 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 	req_capsule_set_size(pill, &RMF_NIOBUF_REMOTE, RCL_CLIENT,
 			     niocount * sizeof(*niobuf));
 
-	for (i = 0; i < page_count; i++)
+	for (i = 0; i < page_count; i++) {
 		short_io_size += pga[i]->count;
+		if (!inode || !IS_ENCRYPTED(inode)) {
+			pga[i]->bp_count_diff = 0;
+			pga[i]->bp_off_diff = 0;
+		}
+	}
 
 	/* Check if read/write is small enough to be a short io. */
 	if (short_io_size > cli->cl_max_short_io_bytes || niocount > 1 ||
@@ -2093,8 +2107,17 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 				continue;
 			}
 
+			/* The page is already locked when we arrive here,
+			 * except when we deal with a twisted page for
+			 * specific Direct IO support, in which case
+			 * PageChecked flag is set on page.
+			 */
+			if (PageChecked(pg->pg))
+				lock_page(pg->pg);
 			rc = llcrypt_decrypt_pagecache_blocks(pg->pg,
 							      PAGE_SIZE, 0);
+			if (PageChecked(pg->pg))
+				unlock_page(pg->pg);
 			if (rc)
 				goto out;
 		}
diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index ba19b78..c8b97fa 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -2307,6 +2307,8 @@ void lustre_assert_wire_constants(void)
 		 LUSTRE_TOPDIR_FL);
 	LASSERTF(LUSTRE_INLINE_DATA_FL == 0x10000000, "found 0x%.8x\n",
 		 LUSTRE_INLINE_DATA_FL);
+	LASSERTF(LUSTRE_ENCRYPT_FL == 0x00800000UL, "found 0x%.8x\n",
+		 LUSTRE_ENCRYPT_FL);
 	LASSERTF(MDS_INODELOCK_LOOKUP == 0x00000001UL, "found 0x%.8x\n",
 		 MDS_INODELOCK_LOOKUP);
 	LASSERTF(MDS_INODELOCK_UPDATE == 0x00000002UL, "found 0x%.8x\n",
-- 
1.8.3.1


* [lustre-devel] [PATCH 25/28] lustre: sec: restrict fallocate on encrypted files
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (23 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 24/28] lustre: sec: O_DIRECT for encrypted file James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  0:59 ` [lustre-devel] [PATCH 26/28] lustre: sec: encryption with different client PAGE_SIZE James Simmons
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Sebastien Buisson <sbuisson@ddn.com>

For now, ll_fallocate only supports standard preallocation.
In any case, encrypted inodes can't handle collapse range, zero range
or insert range, since we would need to re-encrypt blocks with a
different IV or XTS tweak (both are based on the logical block number).
So make sure we return -EOPNOTSUPP in this case, as ext4 does.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12275
Lustre-commit: a7870fb9568bf ("LU-12275 sec: restrict fallocate on encrypted files")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/39220
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
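
The new check is a plain mode-mask test. A minimal user-space sketch,
assuming the FALLOC_FL_* constants from the Linux UAPI header; the
function name sketch_fallocate_mode_check() is hypothetical and only
models the encrypted-inode branch, not the rest of ll_fallocate():

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <linux/falloc.h>

/*
 * Return -EOPNOTSUPP when an encrypted file is asked for an operation
 * that would move blocks to a different logical block number, 0 otherwise.
 */
static int sketch_fallocate_mode_check(bool encrypted, int mode)
{
	const int range_ops = FALLOC_FL_COLLAPSE_RANGE |
			      FALLOC_FL_INSERT_RANGE |
			      FALLOC_FL_ZERO_RANGE;

	if (encrypted && (mode & range_ops))
		return -EOPNOTSUPP;

	return 0;
}
```

In the patched function this check comes first, ahead of the existing
test that only accepts mode == 0.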

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 4a3c534..02cc2d6 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -4927,6 +4927,17 @@ long ll_fallocate(struct file *filp, int mode, loff_t offset, loff_t len)
 	struct inode *inode = filp->f_path.dentry->d_inode;
 
 	/*
+	 * Encrypted inodes can't handle collapse range or zero range or insert
+	 * range since we would need to re-encrypt blocks with a different IV or
+	 * XTS tweak (which are based on the logical block number).
+	 * Similar to what ext4 does.
+	 */
+	if (IS_ENCRYPTED(inode) &&
+	    (mode & (FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_INSERT_RANGE |
+		     FALLOC_FL_ZERO_RANGE)))
+		return -EOPNOTSUPP;
+
+	/*
 	 * Only mode == 0 (which is standard prealloc) is supported now.
 	 * Punch is not supported yet.
 	 */
-- 
1.8.3.1


* [lustre-devel] [PATCH 26/28] lustre: sec: encryption with different client PAGE_SIZE
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (24 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 25/28] lustre: sec: restrict fallocate on encrypted files James Simmons
@ 2020-11-16  0:59 ` James Simmons
  2020-11-16  1:00 ` [lustre-devel] [PATCH 27/28] lustre: sec: require enc key in case of O_CREAT only James Simmons
  2020-11-16  1:00 ` [lustre-devel] [PATCH 28/28] lustre: sec: fix O_DIRECT and encrypted files James Simmons
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  0:59 UTC (permalink / raw)
  To: lustre-devel

From: Sebastien Buisson <sbuisson@ddn.com>

In order to properly handle encryption/decryption on clients that have
a PAGE_SIZE != LUSTRE_ENCRYPTION_UNIT_SIZE (typically aarch64/ppc64),
a few adjustments are necessary:
- when encrypting, do not proceed with PAGE_SIZE as encryption length.
  Instead, round up to a multiple of LUSTRE_ENCRYPTION_UNIT_SIZE.
  On aarch64/ppc64, this avoids encrypting far beyond
  LUSTRE_ENCRYPTION_UNIT_SIZE when the page is not full.
- when decrypting, do not proceed with PAGE_SIZE as decryption length.
  Instead, do LUSTRE_ENCRYPTION_UNIT_SIZE length at a time. It enables
  proper detection of 'all 0s' sent by servers for content beyond file
  size.
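
The rounding described in the first item can be sketched in plain C.
This is an illustration under stated assumptions, not the patch itself:
SK_UNIT_SIZE is assumed to be 4 KiB (the value of
LUSTRE_ENCRYPTION_UNIT_SIZE is not given in this message), and
sk_round_up_to_unit() is a hypothetical helper that follows the same
mask arithmetic as the nunits computation in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the encryption unit is 4 KiB. */
#define SK_UNIT_SIZE	4096u
#define SK_UNIT_MASK	(~(SK_UNIT_SIZE - 1u))

/* Hypothetical helper: given the offset of the data within its page and
 * the byte count, return the length to encrypt, rounded up to a whole
 * number of encryption units.
 */
uint32_t sk_round_up_to_unit(uint32_t off_in_page, uint32_t count)
{
	uint32_t len = off_in_page + count;

	/* Any bits below the unit boundary mean a partial last unit. */
	if (len & ~SK_UNIT_MASK)
		len = (len & SK_UNIT_MASK) + SK_UNIT_SIZE;

	return len;
}
```

With these assumptions, 100 bytes of data at the start of a 64 KiB page
give an encryption length of 4096 bytes rather than the 65536 bytes that
using PAGE_SIZE would produce.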

Regarding tests, add sanity-sec test_53 to exercise encryption from
clients with different PAGE_SIZE.
The trick to achieve this with AT is to expect the client to have 64KB
PAGE_SIZE, and the servers to have 4KB PAGE_SIZE, and then mount a
client from the MDS node.
This also means code running on server side needs to have client
encryption support enabled, so CentOS/RHEL 8 at least.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12275
Lustre-commit: ac5fcdce025b4 ("LU-12275 sec: encryption with different client PAGE_SIZE")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/39315
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c      | 28 +++++++++++-----
 fs/lustre/osc/osc_request.c | 79 +++++++++++++++++++++++++++------------------
 2 files changed, 68 insertions(+), 39 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 02cc2d6..f7f917b 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -444,16 +444,28 @@ static inline int ll_dom_readpage(void *data, struct page *page)
 	kunmap_atomic(kaddr);
 
 	if (inode && IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode)) {
-		if (!llcrypt_has_encryption_key(inode))
+		if (!llcrypt_has_encryption_key(inode)) {
 			CDEBUG(D_SEC, "no enc key for " DFID "\n",
 			       PFID(ll_inode2fid(inode)));
-		/* decrypt only if page is not empty */
-		else if (memcmp(page_address(page),
-				page_address(ZERO_PAGE(0)),
-				PAGE_SIZE) != 0)
-			rc = llcrypt_decrypt_pagecache_blocks(page,
-							      PAGE_SIZE,
-							      0);
+		} else {
+			unsigned int offs = 0;
+
+			while (offs < PAGE_SIZE) {
+				/* decrypt only if page is not empty */
+				if (memcmp(page_address(page) + offs,
+					   page_address(ZERO_PAGE(0)),
+					   LUSTRE_ENCRYPTION_UNIT_SIZE) == 0)
+					break;
+
+				rc = llcrypt_decrypt_pagecache_blocks(page,
+								      LUSTRE_ENCRYPTION_UNIT_SIZE,
+								      offs);
+				if (rc)
+					break;
+
+				offs += LUSTRE_ENCRYPTION_UNIT_SIZE;
+			}
+		}
 	}
 	unlock_page(page);
 
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index bf9ce44..746b695 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1421,8 +1421,12 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 			struct page *data_page = NULL;
 			bool retried = false;
 			bool lockedbymyself;
+			u32 nunits = (pg->off & ~PAGE_MASK) + pg->count;
 
 retry_encrypt:
+			if (nunits & ~LUSTRE_ENCRYPTION_MASK)
+				nunits = (nunits & LUSTRE_ENCRYPTION_MASK) +
+					  LUSTRE_ENCRYPTION_UNIT_SIZE;
 			/* The page can already be locked when we arrive here.
 			 * This is possible when cl_page_assume/vvp_page_assume
 			 * is stuck on wait_on_page_writeback with page lock
@@ -1435,7 +1439,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 			lockedbymyself = trylock_page(pg->pg);
 			data_page =
 				llcrypt_encrypt_pagecache_blocks(pg->pg,
-								 PAGE_SIZE, 0,
+								 nunits, 0,
 								 GFP_NOFS);
 			if (lockedbymyself)
 				unlock_page(pg->pg);
@@ -1458,24 +1462,29 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 					     oap->oap_obj_off +
 					     oap->oap_page_off;
 			}
-			/* len is forced to PAGE_SIZE, and poff to 0
+			/* len is forced to nunits, and relative offset to 0
 			 * so store the old, clear text info
 			 */
-			pg->bp_count_diff = PAGE_SIZE - pg->count;
-			pg->count = PAGE_SIZE;
+			pg->bp_count_diff = nunits - pg->count;
+			pg->count = nunits;
 			pg->bp_off_diff = pg->off & ~PAGE_MASK;
 			pg->off = pg->off & PAGE_MASK;
 		}
 	} else if (opc == OST_READ && inode && IS_ENCRYPTED(inode)) {
 		for (i = 0; i < page_count; i++) {
 			struct brw_page *pg = pga[i];
-
-			/* count/off are forced to cover the whole page so that
-			 * all encrypted data is stored on the OST, so adjust
-			 * bp_{count,off}_diff for the size of the clear text.
+			u32 nunits = (pg->off & ~PAGE_MASK) + pg->count;
+
+			if (nunits & ~LUSTRE_ENCRYPTION_MASK)
+				nunits = (nunits & LUSTRE_ENCRYPTION_MASK) +
+					  LUSTRE_ENCRYPTION_UNIT_SIZE;
+			/* count/off are forced to cover the whole encryption
+			 * unit size so that all encrypted data is stored on the
+			 * OST, so adjust bp_{count,off}_diff for the size of
+			 * the clear text.
 			 */
-			pg->bp_count_diff = PAGE_SIZE - pg->count;
-			pg->count = PAGE_SIZE;
+			pg->bp_count_diff = nunits - pg->count;
+			pg->count = nunits;
 			pg->bp_off_diff = pg->off & ~PAGE_MASK;
 			pg->off = pg->off & PAGE_MASK;
 		}
@@ -2096,30 +2105,38 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 		}
 		for (idx = 0; idx < aa->aa_page_count; idx++) {
 			struct brw_page *pg = aa->aa_ppga[idx];
+			unsigned int offs = 0;
+
+			while (offs < PAGE_SIZE) {
+				/* do not decrypt if page is all 0s */
+				if (memchr_inv(page_address(pg->pg) + offs, 0,
+					       LUSTRE_ENCRYPTION_UNIT_SIZE) == NULL) {
+					/* if page is empty forward info to
+					 * upper layers (ll_io_zero_page) by
+					 * clearing PagePrivate2
+					 */
+					if (!offs)
+						ClearPagePrivate2(pg->pg);
+					break;
+				}
 
-			/* do not decrypt if page is all 0s */
-			if (memchr_inv(page_address(pg->pg), 0,
-				       PAGE_SIZE) == NULL) {
-				/* if page is empty forward info to upper layers
-				 * (ll_io_zero_page) by clearing PagePrivate2
+				/* The page is already locked when we arrive
+				 * here, except when we deal with a twisted
+				 * page for specific Direct IO support, in
+				 * which case PageChecked flag is set on page.
 				 */
-				ClearPagePrivate2(pg->pg);
-				continue;
+				if (PageChecked(pg->pg))
+					lock_page(pg->pg);
+				rc = llcrypt_decrypt_pagecache_blocks(pg->pg,
+								      LUSTRE_ENCRYPTION_UNIT_SIZE,
+								      offs);
+				if (PageChecked(pg->pg))
+					unlock_page(pg->pg);
+				if (rc)
+					goto out;
+
+				offs += LUSTRE_ENCRYPTION_UNIT_SIZE;
 			}
-
-			/* The page is already locked when we arrive here,
-			 * except when we deal with a twisted page for
-			 * specific Direct IO support, in which case
-			 * PageChecked flag is set on page.
-			 */
-			if (PageChecked(pg->pg))
-				lock_page(pg->pg);
-			rc = llcrypt_decrypt_pagecache_blocks(pg->pg,
-							      PAGE_SIZE, 0);
-			if (PageChecked(pg->pg))
-				unlock_page(pg->pg);
-			if (rc)
-				goto out;
 		}
 	}
 
-- 
1.8.3.1
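
As a rough userspace illustration of the per-unit decryption loop used
in the diff above: the sketch below walks a page one encryption unit at
a time and stops at the first all-zero unit. Everything prefixed with
sk_ or SK_ is hypothetical. In particular, sk_decrypt_unit() is a
placeholder XOR and not the llcrypt cipher, sk_unit_is_zero() stands in
for the kernel's memchr_inv() test, and the 4 KiB unit and 64 KiB page
sizes are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_UNIT_SIZE	4096u	/* assumed encryption unit size */
#define SK_PAGE_SIZE	65536u	/* assumed 64 KiB client page */

/* Userspace stand-in for "memchr_inv(buf, 0, len) == NULL". */
int sk_unit_is_zero(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i])
			return 0;
	return 1;
}

/* Placeholder "decryption": XOR with a constant. Illustration only. */
void sk_decrypt_unit(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= 0x5a;
}

/* Decrypt a page unit by unit, stopping at the first all-zero unit.
 * Returns the number of units that were decrypted.
 */
size_t sk_decrypt_page(unsigned char *page)
{
	size_t offs = 0;

	while (offs < SK_PAGE_SIZE) {
		if (sk_unit_is_zero(page + offs, SK_UNIT_SIZE))
			break;
		sk_decrypt_unit(page + offs, SK_UNIT_SIZE);
		offs += SK_UNIT_SIZE;
	}

	return offs / SK_UNIT_SIZE;
}
```

Note that each call operates at the current offset, so the zero-filled
tail of a partially filled 64 KiB page is left untouched instead of
being run through the cipher.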

* [lustre-devel] [PATCH 27/28] lustre: sec: require enc key in case of O_CREAT only
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (25 preceding siblings ...)
  2020-11-16  0:59 ` [lustre-devel] [PATCH 26/28] lustre: sec: encryption with different client PAGE_SIZE James Simmons
@ 2020-11-16  1:00 ` James Simmons
  2020-11-16  1:00 ` [lustre-devel] [PATCH 28/28] lustre: sec: fix O_DIRECT and encrypted files James Simmons
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  1:00 UTC (permalink / raw)
  To: lustre-devel

From: Sebastien Buisson <sbuisson@ddn.com>

In ll_atomic_open(), do not return -ENOKEY when trying to open
either a directory or a file without the encryption key, unless
O_CREAT flag is specified.
Indeed, listing directory content is allowed even without the key.
And in the case of a regular file, ll_file_open() already checks for the
presence of an encryption key.
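
The resulting decision can be sketched as a small standalone function.
This is only a model of the logic described above, assuming hypothetical
names: sk_atomic_open_key_check() is not a Lustre function, and its
boolean arguments stand in for IS_ENCRYPTED(dir) and
llcrypt_has_encryption_key(dir).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

#ifndef ENOKEY
#define ENOKEY 126	/* Linux value, for platforms that lack it */
#endif

/* Hypothetical model of the check: a missing key is fatal only when
 * the open may create a file in an encrypted directory. *encrypt
 * reports whether the new file would need an encryption context.
 */
int sk_atomic_open_key_check(bool dir_encrypted, bool have_key,
			     int open_flags, bool *encrypt)
{
	*encrypt = false;

	if (!dir_encrypted)
		return 0;

	if (open_flags & O_CREAT) {
		if (!have_key)
			return -ENOKEY;
		*encrypt = true;
	}

	return 0;
}
```

In this model, opening an encrypted directory read-only without the key
succeeds, while the same open with O_CREAT fails with -ENOKEY.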

Improve sanity-sec test_54 to verify this is working properly.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13975
Lustre-commit: f6daee15b2c8ec ("LU-13975 sec: require enc key in case of O_CREAT only")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/39983
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/namei.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index da6b729..b24f097 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -1113,18 +1113,19 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 	it->it_flags &= ~MDS_OPEN_FL_INTERNAL;
 
 	if (ll_sbi_has_encrypt(ll_i2sbi(dir)) && IS_ENCRYPTED(dir)) {
-		/* we know that we are going to create a regular file because
+		/* in case of create, this is going to be a regular file because
 		 * we set S_IFREG bit on it->it_create_mode above
 		 */
 		rc = llcrypt_get_encryption_info(dir);
 		if (rc)
 			goto out_release;
-		if (!llcrypt_has_encryption_key(dir)) {
-			rc = -ENOKEY;
-			goto out_release;
+		if (open_flags & O_CREAT) {
+			if (!llcrypt_has_encryption_key(dir)) {
+				rc = -ENOKEY;
+				goto out_release;
+			}
+			encrypt = true;
 		}
-		encrypt = true;
-		rc = 0;
 	}
 
 	OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_CREATE_FILE_PAUSE2, cfs_fail_val);
-- 
1.8.3.1

* [lustre-devel] [PATCH 28/28] lustre: sec: fix O_DIRECT and encrypted files
  2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
                   ` (26 preceding siblings ...)
  2020-11-16  1:00 ` [lustre-devel] [PATCH 27/28] lustre: sec: require enc key in case of O_CREAT only James Simmons
@ 2020-11-16  1:00 ` James Simmons
  27 siblings, 0 replies; 29+ messages in thread
From: James Simmons @ 2020-11-16  1:00 UTC (permalink / raw)
  To: lustre-devel

From: Sebastien Buisson <sbuisson@ddn.com>

Sometimes, we can end up in a situation where
osc_release_bounce_pages() mistakenly considers pages to be fscrypt
bounce pages and tries to free them.
Fix the way we identify bounce pages by always setting the PageChecked
flag on them.
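
To make the difference between the two identification schemes concrete,
here is a toy model in userspace C. It is a sketch under assumptions:
struct sk_page, SK_PG_CHECKED and the sk_* helpers are hypothetical and
only mimic the "no mapping" test that is removed and the PageChecked
test that replaces it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_PG_CHECKED	(1u << 0)	/* stands in for PageChecked */

/* Toy page descriptor carrying only what the sketch needs. */
struct sk_page {
	unsigned int flags;
	const void *mapping;
};

/* Old heuristic: any page without a mapping is taken for a bounce page. */
bool sk_is_bounce_old(const struct sk_page *pg)
{
	return pg->mapping == NULL;
}

/* New rule: only pages explicitly marked at allocation time qualify. */
bool sk_is_bounce_new(const struct sk_page *pg)
{
	return (pg->flags & SK_PG_CHECKED) != 0;
}

/* Called right after the bounce page is allocated by the encrypt step. */
void sk_mark_bounce(struct sk_page *pg)
{
	pg->flags |= SK_PG_CHECKED;
}
```

A page that merely lacks a mapping is misclassified by the old
heuristic in this model but not by the flag test, which is the kind of
confusion the fix is meant to avoid.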

Fixes: feca6b62a6 ("lustre: sec: O_DIRECT for encrypted file")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14045
Lustre-commit: e07d0516dcde4b ("LU-14045 sec: fix O_DIRECT and encrypted files")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/40295
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_request.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 746b695..f225ccd 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1369,8 +1369,11 @@ static inline void osc_release_bounce_pages(struct brw_page **pga,
 	int i;
 
 	for (i = 0; i < page_count; i++) {
-		if (!pga[i]->pg->mapping)
-			/* bounce pages are unmapped */
+		/* Bounce pages allocated by a call to
+		 * llcrypt_encrypt_pagecache_blocks() in osc_brw_prep_request()
+		 * are identified thanks to the PageChecked flag.
+		 */
+		if (PageChecked(pga[i]->pg))
 			llcrypt_finalize_bounce_page(&pga[i]->pg);
 		pga[i]->count -= pga[i]->bp_count_diff;
 		pga[i]->off += pga[i]->bp_off_diff;
@@ -1453,6 +1456,10 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 				ptlrpc_request_free(req);
 				return rc;
 			}
+			/* Set PageChecked flag on bounce page for
+			 * disambiguation in osc_release_bounce_pages().
+			 */
+			SetPageChecked(data_page);
 			pg->pg = data_page;
 			/* there should be no gap in the middle of page array */
 			if (i == page_count - 1) {
-- 
1.8.3.1

Thread overview: 29+ messages
2020-11-16  0:59 [lustre-devel] [PATCH 00/28] OpenSFS backport for Nov 15 2020 James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 01/28] llite: remove splice_read handling for PCC James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 02/28] lustre: llite: disable statahead_agl for sanity test_56ra James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 03/28] lustre: seq_file .next functions must update *pos James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 04/28] lustre: llite: ASSERTION( last_oap_count > 0 ) failed James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 05/28] lnet: o2ib: raise bind cap before resolving address James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 06/28] lustre: use memalloc_nofs_save() for GFP_NOFS kvmalloc allocations James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 07/28] lnet: o2iblnd: Don't retry indefinitely James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 08/28] lustre: llite: rmdir releases inode on client James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 09/28] lustre: gss: update sequence in case of target disconnect James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 10/28] lustre: lov: doesn't check lov_refcount James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 11/28] lustre: ptlrpc: remove unused code at pinger James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 12/28] lustre: mdc: remote object support getattr from cache James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 13/28] lustre: llite: pass name in getattr by FID James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 14/28] lnet: o2iblnd: 'Timed out tx' error message James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 15/28] lustre: ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 16/28] lustre: ldlm: group locks for DOM IBIT lock James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 17/28] lustre: ptlrpc: decrease time between reconnection James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 18/28] lustre: ptlrpc: throttle RPC resend if network error James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 19/28] lustre: ldlm: BL AST vs failed lock enqueue race James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 20/28] lustre: ptlrpc: don't log connection 'restored' inappropriately James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 21/28] lustre: llite: Avoid eternel retry loops with MAP_POPULATE James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 22/28] lustre: ptlrpc: introduce OST_SEEK RPC James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 23/28] lustre: clio: SEEK_HOLE/SEEK_DATA on client side James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 24/28] lustre: sec: O_DIRECT for encrypted file James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 25/28] lustre: sec: restrict fallocate on encrypted files James Simmons
2020-11-16  0:59 ` [lustre-devel] [PATCH 26/28] lustre: sec: encryption with different client PAGE_SIZE James Simmons
2020-11-16  1:00 ` [lustre-devel] [PATCH 27/28] lustre: sec: require enc key in case of O_CREAT only James Simmons
2020-11-16  1:00 ` [lustre-devel] [PATCH 28/28] lustre: sec: fix O_DIRECT and encrypted files James Simmons