Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH 0/3] Rework the gfs2 read and page fault locking
@ 2019-11-22 23:53 Andreas Gruenbacher
  2019-11-22 23:53 ` [RFC PATCH 1/3] fs: Add IOCB_CACHED flag for generic_file_read_iter Andreas Gruenbacher
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Andreas Gruenbacher @ 2019-11-22 23:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Whitehouse, Konstantin Khlebnikov, Kirill A. Shutemov,
	linux-mm, Andrew Morton, linux-kernel, linux-fsdevel,
	Alexander Viro, Johannes Weiner, cluster-devel, Ronnie Sahlberg,
	Steve French, Bob Peterson, Andreas Gruenbacher

Hello,

this patch series moves the glock lock taking in gfs2 from the
->readpage and ->readpages inode operations to the ->read_iter file and
->fault vm operations.  To achieve that, we add flags to the
generic_file_read_iter and filemap_fault generic helpers.

This proposal was triggered by the following discussion:

https://lore.kernel.org/linux-fsdevel/157225677483.3442.4227193290486305330.stgit@buzz/

In that thread, Linus argued that filesystems should make sure the inode
size is sufficiently up-to-date before calling the generic helpers, and
that filesystems can do it themselves if they want more than that.
That's surely doable.  However, implementing those operations properly
at the filesystem level quickly becomes complicated when it gets to
things like readahead.  In addition, those slightly modified copies of
those helpers would surely diverge from their originals over time, and
maintaining them properly would become hard.  So I hope the relatively
small changes to make the original helpers slightly more flexible will
be acceptable instead.

With the IOCB_CACHED flag added by one of the patches in this series,
the code that Konstantin's initial patch adds to
generic_file_buffered_read could be made conditional on the IOCB_CACHED
flag being cleared.  That way, it won't misfire on filesystems that
allow a stale inode size.  (I'm not sure if any filesystems other than
gfs2 are actually affected.)

Some additional explanation:

The cache consistency model of filesystems like gfs2 is such that if
pages are found in an inode's address space, those pages as well as the
inode size are up to date and can be used without taking any filesystem
locks.  If a page is not cached, filesystem locks must be taken before
the page can be read; this will also bring the inode size up to date.

Thus far, gfs2 has taken the filesystem locks inside the ->readpage and
->readpages address space operations.  A better approach seems to be to
take those locks earlier, in the ->read_iter file and ->fault vm
operations.  This would also avoid a lock inversion in ->readpages.

We obviously want to avoid taking the filesystem locks unnecessarily
when the pages we are looking for are already cached; otherwise, we
would cripple performance.  So we need to check if those pages are
present first.  That's actually exactly what the generic_file_read_iter
and filemap_fault helpers do already anyway, except that they will call
into ->readpage and ->readpages when they find pages missing.  Instead
of that, we'd like those helpers to return with an error code that
allows us to retry the operation after taking the filesystem locks.

Thanks,
Andreas

Andreas Gruenbacher (3):
  fs: Add IOCB_CACHED flag for generic_file_read_iter
  fs: Add FAULT_FLAG_CACHED flag for filemap_fault
  gfs2: Rework read and page fault locking

 fs/gfs2/aops.c     | 36 +++++--------------------
 fs/gfs2/file.c     | 66 ++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/fs.h |  1 +
 include/linux/mm.h |  4 ++-
 mm/filemap.c       | 60 ++++++++++++++++++++++++++++++-----------
 5 files changed, 119 insertions(+), 48 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH 1/3] fs: Add IOCB_CACHED flag for generic_file_read_iter
  2019-11-22 23:53 [RFC PATCH 0/3] Rework the gfs2 read and page fault locking Andreas Gruenbacher
@ 2019-11-22 23:53 ` Andreas Gruenbacher
  2019-11-22 23:53 ` [RFC PATCH 2/3] fs: Add FAULT_FLAG_CACHED flag for filemap_fault Andreas Gruenbacher
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Andreas Gruenbacher @ 2019-11-22 23:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Whitehouse, Konstantin Khlebnikov, Kirill A. Shutemov,
	linux-mm, Andrew Morton, linux-kernel, linux-fsdevel,
	Alexander Viro, Johannes Weiner, cluster-devel, Ronnie Sahlberg,
	Steve French, Bob Peterson, Andreas Gruenbacher

Add an IOCB_CACHED flag which indicates to generic_file_read_iter that
it should only look at the page cache, without triggering any filesystem
I/O for the actual request or for readahead.  When filesystem I/O would
be triggered, an error code should be returned instead.

This allows the caller to perform a tentative read out of the page
cache, and to retry the read after taking the necessary steps when the
requested pages are not cached.

When readahead would be triggered, we return -ECANCELED instead of
-EAGAIN.  This allows to distinguish attempted readheads from attempted
reads (with IOCB_NOWAIT).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 include/linux/fs.h |  1 +
 mm/filemap.c       | 17 ++++++++++++++---
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index e0d909d35763..4ca5e2885452 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -314,6 +314,7 @@ enum rw_hint {
 #define IOCB_SYNC		(1 << 5)
 #define IOCB_WRITE		(1 << 6)
 #define IOCB_NOWAIT		(1 << 7)
+#define IOCB_CACHED		(1 << 8)
 
 struct kiocb {
 	struct file		*ki_filp;
diff --git a/mm/filemap.c b/mm/filemap.c
index 85b7d087eb45..024ff0b5fcb6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2046,7 +2046,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 
 		page = find_get_page(mapping, index);
 		if (!page) {
-			if (iocb->ki_flags & IOCB_NOWAIT)
+			if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_CACHED))
 				goto would_block;
 			page_cache_sync_readahead(mapping,
 					ra, filp,
@@ -2056,12 +2056,16 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 				goto no_cached_page;
 		}
 		if (PageReadahead(page)) {
+			if (iocb->ki_flags & IOCB_CACHED) {
+				error = -ECANCELED;
+				goto out;
+			}
 			page_cache_async_readahead(mapping,
 					ra, filp, page,
 					index, last_index - index);
 		}
 		if (!PageUptodate(page)) {
-			if (iocb->ki_flags & IOCB_NOWAIT) {
+			if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_CACHED)) {
 				put_page(page);
 				goto would_block;
 			}
@@ -2266,6 +2270,13 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
  *
  * This is the "read_iter()" routine for all filesystems
  * that can use the page cache directly.
+ *
+ * In the IOCB_NOWAIT flag in iocb->ki_flags indicates that -EAGAIN should be
+ * returned if completing the request would require I/O; this does not prevent
+ * readahead.  The IOCB_CACHED flag indicates that -EAGAIN should be returned
+ * as under the IOCB_NOWAIT flag, and that -ECANCELED should be returned when
+ * readhead would be triggered.
+ *
  * Return:
  * * number of bytes copied, even for partial reads
  * * negative error code if nothing was read
@@ -2286,7 +2297,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 		loff_t size;
 
 		size = i_size_read(inode);
-		if (iocb->ki_flags & IOCB_NOWAIT) {
+		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_CACHED)) {
 			if (filemap_range_has_page(mapping, iocb->ki_pos,
 						   iocb->ki_pos + count - 1))
 				return -EAGAIN;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH 2/3] fs: Add FAULT_FLAG_CACHED flag for filemap_fault
  2019-11-22 23:53 [RFC PATCH 0/3] Rework the gfs2 read and page fault locking Andreas Gruenbacher
  2019-11-22 23:53 ` [RFC PATCH 1/3] fs: Add IOCB_CACHED flag for generic_file_read_iter Andreas Gruenbacher
@ 2019-11-22 23:53 ` Andreas Gruenbacher
  2019-11-22 23:53 ` [RFC PATCH 3/3] gfs2: Rework read and page fault locking Andreas Gruenbacher
  2019-11-25  9:17 ` [RFC PATCH 0/3] Rework the gfs2 " Kirill A. Shutemov
  3 siblings, 0 replies; 7+ messages in thread
From: Andreas Gruenbacher @ 2019-11-22 23:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Whitehouse, Konstantin Khlebnikov, Kirill A. Shutemov,
	linux-mm, Andrew Morton, linux-kernel, linux-fsdevel,
	Alexander Viro, Johannes Weiner, cluster-devel, Ronnie Sahlberg,
	Steve French, Bob Peterson, Andreas Gruenbacher

Add a FAULT_FLAG_CACHED flag which indicates to filemap_fault that it
should only look at the page cache, without triggering filesystem I/O
for the actual request or for readahead.  When filesystem I/O would be
triggered, VM_FAULT_RETRY should be returned instead.

This allows the caller to tentatively satisfy a minor page fault out of
the page cache, and to retry the operation after taking the necessary
steps when that isn't possible.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 include/linux/mm.h |  4 +++-
 mm/filemap.c       | 43 ++++++++++++++++++++++++++++++-------------
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a2adf95b3f9c..b3317e4b2607 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -392,6 +392,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_USER		0x40	/* The fault originated in userspace */
 #define FAULT_FLAG_REMOTE	0x80	/* faulting for non current tsk/mm */
 #define FAULT_FLAG_INSTRUCTION  0x100	/* The fault was during an instruction fetch */
+#define FAULT_FLAG_CACHED		0x200	/* Only look at the page cache */
 
 #define FAULT_FLAG_TRACE \
 	{ FAULT_FLAG_WRITE,		"WRITE" }, \
@@ -402,7 +403,8 @@ extern pgprot_t protection_map[16];
 	{ FAULT_FLAG_TRIED,		"TRIED" }, \
 	{ FAULT_FLAG_USER,		"USER" }, \
 	{ FAULT_FLAG_REMOTE,		"REMOTE" }, \
-	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }
+	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }, \
+	{ FAULT_FLAG_CACHED,		"CACHED" }
 
 /*
  * vm_fault is filled by the the pagefault handler and passed to the vma's
diff --git a/mm/filemap.c b/mm/filemap.c
index 024ff0b5fcb6..2297fad3b03a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2383,7 +2383,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 	 * the mmap_sem still held. That's how FAULT_FLAG_RETRY_NOWAIT
 	 * is supposed to work. We have way too many special cases..
 	 */
-	if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
+	if (vmf->flags & (FAULT_FLAG_RETRY_NOWAIT | FAULT_FLAG_CACHED))
 		return 0;
 
 	*fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
@@ -2460,26 +2460,28 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
  * so we want to possibly extend the readahead further.  We return the file that
  * was pinned if we have to drop the mmap_sem in order to do IO.
  */
-static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
-					    struct page *page)
+static vm_fault_t do_async_mmap_readahead(struct vm_fault *vmf,
+					  struct page *page,
+					  struct file **fpin)
 {
 	struct file *file = vmf->vma->vm_file;
 	struct file_ra_state *ra = &file->f_ra;
 	struct address_space *mapping = file->f_mapping;
-	struct file *fpin = NULL;
 	pgoff_t offset = vmf->pgoff;
 
 	/* If we don't want any read-ahead, don't bother */
 	if (vmf->vma->vm_flags & VM_RAND_READ)
-		return fpin;
+		return 0;
 	if (ra->mmap_miss > 0)
 		ra->mmap_miss--;
 	if (PageReadahead(page)) {
-		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
+		if (vmf->flags & FAULT_FLAG_CACHED)
+			return VM_FAULT_RETRY;
+		*fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
 		page_cache_async_readahead(mapping, ra, file,
 					   page, offset, ra->ra_pages);
 	}
-	return fpin;
+	return 0;
 }
 
 /**
@@ -2495,8 +2497,11 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
  *
  * vma->vm_mm->mmap_sem must be held on entry.
  *
- * If our return value has VM_FAULT_RETRY set, it's because the mmap_sem
- * may be dropped before doing I/O or by lock_page_maybe_drop_mmap().
+ * This function may drop the mmap_sem before doing I/O or waiting for a page
+ * lock; this is indicated by the VM_FAULT_RETRY flag in our return value.
+ * Setting FAULT_FLAG_CACHED or FAULT_FLAG_RETRY_NOWAIT in vmf->flags will
+ * prevent dropping the mmap_sem; in that case, VM_FAULT_RETRY indicates that
+ * the mmap_sem would have been dropped.
  *
  * If our return value does not have VM_FAULT_RETRY set, the mmap_sem
  * has not been released.
@@ -2518,9 +2523,15 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	struct page *page;
 	vm_fault_t ret = 0;
 
-	max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
-	if (unlikely(offset >= max_off))
-		return VM_FAULT_SIGBUS;
+	/*
+	 * FAULT_FLAG_CACHED indicates that the inode size is only guaranteed
+	 * to be valid when the page we are looking for is in the page cache.
+	 */
+	if (!(vmf->flags & FAULT_FLAG_CACHED)) {
+		max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+		if (unlikely(offset >= max_off))
+			return VM_FAULT_SIGBUS;
+	}
 
 	/*
 	 * Do we have something in the page cache already?
@@ -2531,8 +2542,14 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		 * We found the page, so try async readahead before
 		 * waiting for the lock.
 		 */
-		fpin = do_async_mmap_readahead(vmf, page);
+		ret = do_async_mmap_readahead(vmf, page, &fpin);
+		if (ret) {
+			put_page(page);
+			return ret;
+		}
 	} else if (!page) {
+		if (vmf->flags & FAULT_FLAG_CACHED)
+			goto out_retry;
 		/* No page in the page cache at all */
 		count_vm_event(PGMAJFAULT);
 		count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH 3/3] gfs2: Rework read and page fault locking
  2019-11-22 23:53 [RFC PATCH 0/3] Rework the gfs2 read and page fault locking Andreas Gruenbacher
  2019-11-22 23:53 ` [RFC PATCH 1/3] fs: Add IOCB_CACHED flag for generic_file_read_iter Andreas Gruenbacher
  2019-11-22 23:53 ` [RFC PATCH 2/3] fs: Add FAULT_FLAG_CACHED flag for filemap_fault Andreas Gruenbacher
@ 2019-11-22 23:53 ` Andreas Gruenbacher
  2019-11-25  9:15   ` Kirill A. Shutemov
  2019-11-25  9:17 ` [RFC PATCH 0/3] Rework the gfs2 " Kirill A. Shutemov
  3 siblings, 1 reply; 7+ messages in thread
From: Andreas Gruenbacher @ 2019-11-22 23:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Whitehouse, Konstantin Khlebnikov, Kirill A. Shutemov,
	linux-mm, Andrew Morton, linux-kernel, linux-fsdevel,
	Alexander Viro, Johannes Weiner, cluster-devel, Ronnie Sahlberg,
	Steve French, Bob Peterson, Andreas Gruenbacher

Move the glock lock taking code from the ->readpage and ->readpages
address space operations to the ->read_iter file and ->fault vm
operations.

To avoid taking the lock even when an operation can be satisfied out of
the page cache, try the operation without taking the lock first.  When
that fails, take the lock and repeat the opeation.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/aops.c | 36 ++++++---------------------
 fs/gfs2/file.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 71 insertions(+), 31 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index b9fe975d7625..4aa6c952eb90 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -515,26 +515,10 @@ static int __gfs2_readpage(void *file, struct page *page)
 
 static int gfs2_readpage(struct file *file, struct page *page)
 {
-	struct address_space *mapping = page->mapping;
-	struct gfs2_inode *ip = GFS2_I(mapping->host);
-	struct gfs2_holder gh;
 	int error;
 
-	unlock_page(page);
-	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
-	error = gfs2_glock_nq(&gh);
-	if (unlikely(error))
-		goto out;
-	error = AOP_TRUNCATED_PAGE;
-	lock_page(page);
-	if (page->mapping == mapping && !PageUptodate(page))
-		error = __gfs2_readpage(file, page);
-	else
-		unlock_page(page);
-	gfs2_glock_dq(&gh);
-out:
-	gfs2_holder_uninit(&gh);
-	if (error && error != AOP_TRUNCATED_PAGE)
+	error = __gfs2_readpage(file, page);
+	if (error)
 		lock_page(page);
 	return error;
 }
@@ -602,18 +586,12 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
 	struct inode *inode = mapping->host;
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct gfs2_holder gh;
-	int ret;
+	int ret = 0;
 
-	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
-	ret = gfs2_glock_nq(&gh);
-	if (unlikely(ret))
-		goto out_uninit;
-	if (!gfs2_is_stuffed(ip))
-		ret = mpage_readpages(mapping, pages, nr_pages, gfs2_block_map);
-	gfs2_glock_dq(&gh);
-out_uninit:
-	gfs2_holder_uninit(&gh);
+	if (gfs2_is_stuffed(ip))
+		goto out;
+	ret = mpage_readpages(mapping, pages, nr_pages, gfs2_block_map);
+out:
 	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
 		ret = -EIO;
 	return ret;
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 997b326247e2..207d39996353 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -524,8 +524,34 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
 	return block_page_mkwrite_return(ret);
 }
 
+static vm_fault_t gfs2_fault(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_holder gh;
+	vm_fault_t ret;
+	int err;
+
+	vmf->flags |= FAULT_FLAG_CACHED;
+	ret = filemap_fault(vmf);
+	vmf->flags &= ~FAULT_FLAG_CACHED;
+	if (!(ret & VM_FAULT_RETRY))
+		return ret;
+	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
+	err = gfs2_glock_nq(&gh);
+	if (err) {
+		ret = block_page_mkwrite_return(err);
+		goto out_uninit;
+	}
+	ret = filemap_fault(vmf);
+	gfs2_glock_dq(&gh);
+out_uninit:
+	gfs2_holder_uninit(&gh);
+	return ret;
+}
+
 static const struct vm_operations_struct gfs2_vm_ops = {
-	.fault = filemap_fault,
+	.fault = gfs2_fault,
 	.map_pages = filemap_map_pages,
 	.page_mkwrite = gfs2_page_mkwrite,
 };
@@ -778,15 +804,51 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 
 static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
+	struct gfs2_inode *ip;
+	struct gfs2_holder gh;
+	size_t written = 0;
 	ssize_t ret;
 
+	gfs2_holder_mark_uninitialized(&gh);
 	if (iocb->ki_flags & IOCB_DIRECT) {
 		ret = gfs2_file_direct_read(iocb, to);
 		if (likely(ret != -ENOTBLK))
 			return ret;
 		iocb->ki_flags &= ~IOCB_DIRECT;
 	}
-	return generic_file_read_iter(iocb, to);
+	iocb->ki_flags |= IOCB_CACHED;
+	ret = generic_file_read_iter(iocb, to);
+	iocb->ki_flags &= ~IOCB_CACHED;
+	if (ret >= 0) {
+		if (!iov_iter_count(to))
+			return ret;
+		written = ret;
+	} else {
+		switch(ret) {
+		case -EAGAIN:
+			if (iocb->ki_flags & IOCB_NOWAIT)
+				return ret;
+			break;
+		case -ECANCELED:
+			break;
+		default:
+			return ret;
+		}
+	}
+	ip = GFS2_I(iocb->ki_filp->f_mapping->host);
+	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
+	ret = gfs2_glock_nq(&gh);
+	if (ret)
+		goto out_uninit;
+	ret = generic_file_read_iter(iocb, to);
+	if (ret > 0)
+		written += ret;
+	if (gfs2_holder_initialized(&gh))
+		gfs2_glock_dq(&gh);
+out_uninit:
+	if (gfs2_holder_initialized(&gh))
+		gfs2_holder_uninit(&gh);
+	return written ? written : ret;
 }
 
 /**
-- 
2.20.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 3/3] gfs2: Rework read and page fault locking
  2019-11-22 23:53 ` [RFC PATCH 3/3] gfs2: Rework read and page fault locking Andreas Gruenbacher
@ 2019-11-25  9:15   ` Kirill A. Shutemov
  2019-11-26  9:08     ` Andreas Grünbacher
  0 siblings, 1 reply; 7+ messages in thread
From: Kirill A. Shutemov @ 2019-11-25  9:15 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Linus Torvalds, Steven Whitehouse, Konstantin Khlebnikov,
	linux-mm, Andrew Morton, linux-kernel, linux-fsdevel,
	Alexander Viro, Johannes Weiner, cluster-devel, Ronnie Sahlberg,
	Steve French, Bob Peterson

On Sat, Nov 23, 2019 at 12:53:24AM +0100, Andreas Gruenbacher wrote:
> @@ -778,15 +804,51 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
>  
>  static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  {
> +	struct gfs2_inode *ip;
> +	struct gfs2_holder gh;
> +	size_t written = 0;

'written' in a read routine?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 0/3] Rework the gfs2 read and page fault locking
  2019-11-22 23:53 [RFC PATCH 0/3] Rework the gfs2 read and page fault locking Andreas Gruenbacher
                   ` (2 preceding siblings ...)
  2019-11-22 23:53 ` [RFC PATCH 3/3] gfs2: Rework read and page fault locking Andreas Gruenbacher
@ 2019-11-25  9:17 ` Kirill A. Shutemov
  3 siblings, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2019-11-25  9:17 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Linus Torvalds, Steven Whitehouse, Konstantin Khlebnikov,
	linux-mm, Andrew Morton, linux-kernel, linux-fsdevel,
	Alexander Viro, Johannes Weiner, cluster-devel, Ronnie Sahlberg,
	Steve French, Bob Peterson

On Sat, Nov 23, 2019 at 12:53:21AM +0100, Andreas Gruenbacher wrote:
> Hello,
> 
> this patch series moves the glock lock taking in gfs2 from the
> ->readpage and ->readpages inode operations to the ->read_iter file and
> ->fault vm operations.  To achieve that, we add flags to the
> generic_file_read_iter and filemap_fault generic helpers.
> 
> This proposal was triggered by the following discussion:
> 
> https://lore.kernel.org/linux-fsdevel/157225677483.3442.4227193290486305330.stgit@buzz/
> 
> In that thread, Linus argued that filesystems should make sure the inode
> size is sufficiently up-to-date before calling the generic helpers, and
> that filesystems can do it themselves if they want more than that.
> That's surely doable.  However, implementing those operations properly
> at the filesystem level quickly becomes complicated when it gets to
> things like readahead.  In addition, those slightly modified copies of
> those helpers would surely diverge from their originals over time, and
> maintaining them properly would become hard.  So I hope the relatively
> small changes to make the original helpers slightly more flexible will
> be acceptable instead.
> 
> With the IOCB_CACHED flag added by one of the patches in this series,
> the code that Konstantin's initial patch adds to
> generic_file_buffered_read could be made conditional on the IOCB_CACHED
> flag being cleared.  That way, it won't misfire on filesystems that
> allow a stale inode size.  (I'm not sure if any filesystems other than
> gfs2 are actually affected.)
> 
> Some additional explanation:
> 
> The cache consistency model of filesystems like gfs2 is such that if
> pages are found in an inode's address space, those pages as well as the
> inode size are up to date and can be used without taking any filesystem
> locks.  If a page is not cached, filesystem locks must be taken before
> the page can be read; this will also bring the inode size up to date.
> 
> Thus far, gfs2 has taken the filesystem locks inside the ->readpage and
> ->readpages address space operations.  A better approach seems to be to
> take those locks earlier, in the ->read_iter file and ->fault vm
> operations.  This would also avoid a lock inversion in ->readpages.
> 
> We obviously want to avoid taking the filesystem locks unnecessarily
> when the pages we are looking for are already cached; otherwise, we
> would cripple performance.  So we need to check if those pages are
> present first.  That's actually exactly what the generic_file_read_iter
> and filemap_fault helpers do already anyway, except that they will call
> into ->readpage and ->readpages when they find pages missing.  Instead
> of that, we'd like those helpers to return with an error code that
> allows us to retry the operation after taking the filesystem locks.

Do you see IOCB_CACHED/FAULT_FLAG_CACHED semantics being usable for
anyting beyond gfs2?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 3/3] gfs2: Rework read and page fault locking
  2019-11-25  9:15   ` Kirill A. Shutemov
@ 2019-11-26  9:08     ` Andreas Grünbacher
  0 siblings, 0 replies; 7+ messages in thread
From: Andreas Grünbacher @ 2019-11-26  9:08 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andreas Gruenbacher, Linus Torvalds, Steven Whitehouse,
	Konstantin Khlebnikov, Linux-MM, Andrew Morton,
	Linux Kernel Mailing List, Linux FS-devel Mailing List,
	Alexander Viro, Johannes Weiner, cluster-devel, Ronnie Sahlberg,
	Steve French, Bob Peterson

Am Mo., 25. Nov. 2019 um 10:16 Uhr schrieb Kirill A. Shutemov
<kirill@shutemov.name>:
> On Sat, Nov 23, 2019 at 12:53:24AM +0100, Andreas Gruenbacher wrote:
> > @@ -778,15 +804,51 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
> >
> >  static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >  {
> > +     struct gfs2_inode *ip;
> > +     struct gfs2_holder gh;
> > +     size_t written = 0;
>
> 'written' in a read routine?

It's a bit weird, but it's the same as in generic_file_buffered_read.

> --
>  Kirill A. Shutemov

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-22 23:53 [RFC PATCH 0/3] Rework the gfs2 read and page fault locking Andreas Gruenbacher
2019-11-22 23:53 ` [RFC PATCH 1/3] fs: Add IOCB_CACHED flag for generic_file_read_iter Andreas Gruenbacher
2019-11-22 23:53 ` [RFC PATCH 2/3] fs: Add FAULT_FLAG_CACHED flag for filemap_fault Andreas Gruenbacher
2019-11-22 23:53 ` [RFC PATCH 3/3] gfs2: Rework read and page fault locking Andreas Gruenbacher
2019-11-25  9:15   ` Kirill A. Shutemov
2019-11-26  9:08     ` Andreas Grünbacher
2019-11-25  9:17 ` [RFC PATCH 0/3] Rework the gfs2 " Kirill A. Shutemov

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git