All of lore.kernel.org
 help / color / mirror / Atom feed
From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
	Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 14/40] lustre: llite: check read page past requested
Date: Sun,  9 Apr 2023 08:12:54 -0400	[thread overview]
Message-ID: <1681042400-15491-15-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org>

From: Qian Yingjin <qian@ddn.com>

Due to a kernel bug introduced in 5.12 in commit:
cbd59c48ae2bcadc4a7599c29cf32fd3f9b78251
("mm/filemap: use head pages in generic_file_buffered_read")
if the page immediately after the current read is in cache,
the kernel will try to read it.

This attempts to read a page past the end of requested
read from userspace, and so has not been safely locked by
Lustre.

For a page after the end of the current read, check whether
it is under the protection of a DLM lock. If so, we take a
reference on the DLM lock until the page read has finished
and then release the reference.  If the page is not covered
by a DLM lock, then we are racing with the page being
removed from Lustre.  In that case, we return
AOP_TRUNCATED_PAGE, which makes the kernel release its
reference on the page and retry the page read.  This allows
the page to be removed from cache, so the kernel will not
find it and incorrectly attempt to read it again.

NB: Earlier versions of this description refer to stripe
boundaries, but the locking issue can occur whether or
not the page is on a stripe boundary, because dlmlocks
can cover part of a stripe.  (This is rare, but is
allowed.)

WC-bug-id: https://jira.whamcloud.com/browse/LU-16412
Lustre-commit: 2f8f38effac3a9519 ("LU-16412 llite: check read page past requested")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49723
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_internal.h |  2 ++
 fs/lustre/llite/rw.c             | 58 +++++++++++++++++++++++++++++++++++++---
 fs/lustre/llite/vvp_io.c         | 10 +++++--
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 2223dbb..970b144 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1371,6 +1371,8 @@ struct ll_cl_context {
 	struct cl_io		*lcc_io;
 	struct cl_page		*lcc_page;
 	enum lcc_type		 lcc_type;
+	struct kiocb		*lcc_iocb;
+	struct iov_iter		*lcc_iter;
 };
 
 struct ll_thread_info {
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index dea2af1..d285ae1 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -1858,11 +1858,14 @@ int ll_readpage(struct file *file, struct page *vmpage)
 {
 	struct inode *inode = file_inode(file);
 	struct cl_object *clob = ll_i2info(inode)->lli_clob;
-	struct ll_cl_context *lcc;
+	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	const struct lu_env *env = NULL;
+	struct cl_read_ahead ra = { 0 };
+	struct ll_cl_context *lcc;
 	struct cl_io *io = NULL;
+	struct iov_iter *iter;
 	struct cl_page *page;
-	struct ll_sb_info *sbi = ll_i2sbi(inode);
+	struct kiocb *iocb;
 	int result;
 
 	if (OBD_FAIL_PRECHECK(OBD_FAIL_LLITE_READPAGE_PAUSE)) {
@@ -1911,6 +1914,8 @@ int ll_readpage(struct file *file, struct page *vmpage)
 		struct ll_readahead_state *ras = &fd->fd_ras;
 		struct lu_env *local_env = NULL;
 
+		CDEBUG(D_VFSTRACE, "fast read pgno: %ld\n", vmpage->index);
+
 		result = -ENODATA;
 
 		/*
@@ -1968,6 +1973,47 @@ int ll_readpage(struct file *file, struct page *vmpage)
 		return result;
 	}
 
+	if (lcc && lcc->lcc_type != LCC_MMAP) {
+		iocb = lcc->lcc_iocb;
+		iter = lcc->lcc_iter;
+
+		CDEBUG(D_VFSTRACE, "pgno:%ld, cnt:%ld, pos:%lld\n",
+		       vmpage->index, iter->count, iocb->ki_pos);
+
+		/*
+		 * This handles a kernel bug introduced in kernel 5.12:
+		 * comment: cbd59c48ae2bcadc4a7599c29cf32fd3f9b78251
+		 * ("mm/filemap: use head pages in generic_file_buffered_read")
+		 *
+		 * See above in this function for a full description of the
+		 * bug.  Briefly, the kernel will try to read 1 more page than
+		 * was actually requested *if that page is already in cache*.
+		 *
+		 * Because this page is beyond the boundary of the requested
+		 * read, Lustre does not lock it as part of the read.  This
+		 * means we must check if there is a valid dlmlock on this
+		 * page and reference it before we attempt to read in the
+		 * page.  If there is not a valid dlmlock, then we are racing
+		 * with dlmlock cancellation and the page is being removed
+		 * from the cache.
+		 *
+		 * That means we should return AOP_TRUNCATED_PAGE, which will
+		 * cause the kernel to retry the read, which should allow the
+		 * page to be removed from cache as the lock is cancelled.
+		 *
+		 * This should never occur except in kernels with the bug
+		 * mentioned above.
+		 */
+		if (cl_offset(clob, vmpage->index) >= iter->count + iocb->ki_pos) {
+			result = cl_io_read_ahead(env, io, vmpage->index, &ra);
+			if (result < 0 || vmpage->index > ra.cra_end_idx) {
+				cl_read_ahead_release(env, &ra);
+				unlock_page(vmpage);
+				return AOP_TRUNCATED_PAGE;
+			}
+		}
+	}
+
 	/**
 	 * Direct read can fall back to buffered read, but DIO is done
 	 * with lockless i/o, and buffered requires LDLM locking, so in
@@ -1979,7 +2025,8 @@ int ll_readpage(struct file *file, struct page *vmpage)
 		unlock_page(vmpage);
 		io->ci_dio_lock = 1;
 		io->ci_need_restart = 1;
-		return -ENOLCK;
+		result = -ENOLCK;
+		goto out;
 	}
 
 	page = cl_page_find(env, clob, vmpage->index, vmpage, CPT_CACHEABLE);
@@ -1999,5 +2046,10 @@ int ll_readpage(struct file *file, struct page *vmpage)
 		unlock_page(vmpage);
 		result = PTR_ERR(page);
 	}
+
+out:
+	if (ra.cra_release)
+		cl_read_ahead_release(env, &ra);
+
 	return result;
 }
diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index eacb35b..2da74a2 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -806,6 +806,7 @@ static int vvp_io_read_start(const struct lu_env *env,
 	loff_t pos = io->u.ci_rd.rd.crw_pos;
 	size_t cnt = io->u.ci_rd.rd.crw_count;
 	size_t tot = vio->vui_tot_count;
+	struct ll_cl_context *lcc;
 	int exceed = 0;
 	int result;
 	struct iov_iter iter;
@@ -868,9 +869,14 @@ static int vvp_io_read_start(const struct lu_env *env,
 	file_accessed(file);
 	LASSERT(vio->vui_iocb->ki_pos == pos);
 	iter = *vio->vui_iter;
-	result = generic_file_read_iter(vio->vui_iocb, &iter);
-	goto out;
 
+	lcc = ll_cl_find(inode);
+	lcc->lcc_iter = &iter;
+	lcc->lcc_iocb = vio->vui_iocb;
+	CDEBUG(D_VFSTRACE, "cnt:%ld,iocb pos:%lld\n", lcc->lcc_iter->count,
+	       lcc->lcc_iocb->ki_pos);
+
+	result = generic_file_read_iter(vio->vui_iocb, &iter);
 out:
 	if (result >= 0) {
 		if (result < cnt)
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

  parent reply	other threads:[~2023-04-09 12:26 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-09 12:12 [lustre-devel] [PATCH 00/40] lustre: backport OpenSFS changes from March XX, 2023 James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 01/40] lustre: protocol: basic batching processing framework James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 02/40] lustre: lov: fiemap improperly handles fm_extent_count=0 James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 03/40] lustre: llite: SIGBUS is possible on a race with page reclaim James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 04/40] lustre: osc: page fault in osc_release_bounce_pages() James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 05/40] lustre: readahead: add stats for read-ahead page count James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 06/40] lustre: quota: enforce project quota for root James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 07/40] lustre: ldlm: send the cancel RPC asap James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 08/40] lustre: enc: align Base64 encoding with RFC 4648 base64url James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 09/40] lustre: quota: fix insane grant quota James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 10/40] lustre: llite: check truncated page in ->readpage() James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 11/40] lnet: o2iblnd: Fix key mismatch issue James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 12/40] lustre: sec: fid2path for encrypted files James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 13/40] lustre: sec: Lustre/HSM on enc file with enc key James Simmons
2023-04-09 12:12 ` James Simmons [this message]
2023-04-09 12:12 ` [lustre-devel] [PATCH 15/40] lustre: llite: fix relatime support James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 16/40] lustre: ptlrpc: clarify AT error message James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 17/40] lustre: update version to 2.15.54 James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 18/40] lustre: tgt: skip free inodes in OST weights James Simmons
2023-04-09 12:12 ` [lustre-devel] [PATCH 19/40] lustre: fileset: check fileset for operations by fid James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 20/40] lustre: clio: Remove cl_page_size() James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 21/40] lustre: fid: clean up OBIF_MAX_OID and IDIF_MAX_OID James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 22/40] lustre: llog: fix processing of a wrapped catalog James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 23/40] lustre: llite: replace lld_nfs_dentry flag with opencache handling James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 24/40] lustre: llite: match lock in corresponding namespace James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 25/40] lnet: libcfs: remove unused hash code James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 26/40] lustre: client: -o network needs add_conn processing James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 27/40] lnet: Lock primary NID logic James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 28/40] lnet: Peers added via kernel API should be permanent James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 29/40] lnet: don't delete peer created by Lustre James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 30/40] lnet: memory leak in copy_ioc_udsp_descr James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 31/40] lnet: remove crash with UDSP James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 32/40] lustre: ptlrpc: fix clang build errors James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 33/40] lustre: ldlm: remove client_import_find_conn() James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 34/40] lnet: add 'force' option to lnetctl peer del James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 35/40] lustre: ldlm: BL_AST lock cancel still can be batched James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 36/40] lnet: lnet_parse_route uses wrong loop var James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 37/40] lustre: tgt: add qos debug James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 38/40] lustre: enc: file names encryption when using secure boot James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 39/40] lustre: uapi: add DMV_IMP_INHERIT connect flag James Simmons
2023-04-09 12:13 ` [lustre-devel] [PATCH 40/40] lustre: llite: dir layout inheritance fixes James Simmons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1681042400-15491-15-git-send-email-jsimmons@infradead.org \
    --to=jsimmons@infradead.org \
    --cc=adilger@whamcloud.com \
    --cc=green@whamcloud.com \
    --cc=lustre-devel@lists.lustre.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.