All of lore.kernel.org
 help / color / mirror / Atom feed
* [Cluster-devel] [PATCH 0/7] gfs2 fixes
@ 2022-05-13 20:48 Andreas Gruenbacher
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 1/7] gfs2: Fix filesystem block deallocation for short writes Andreas Gruenbacher
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2022-05-13 20:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hello,

here's a bunch of fixes and related cleanups for the filesystem
corruption we've sometimes seen since commit 00bfe02f4796 ("gfs2: Fix
mmap + page fault deadlocks for buffered I/O").

Thanks,
Andreas

Andreas Gruenbacher (7):
  gfs2: Fix filesystem block deallocation for short writes
  gfs2: Variable rename
  gfs2: Clean up use of fault_in_iov_iter_{read,write}able
  gfs2: Pull return value test out of should_fault_in_pages
  gfs2: Align read and write chunks to the page cache
  gfs2: buffered write prefaulting
  gfs2: Stop using glock holder auto-demotion for now

 fs/gfs2/bmap.c |  11 ++--
 fs/gfs2/file.c | 139 ++++++++++++++++++++++---------------------------
 2 files changed, 68 insertions(+), 82 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Cluster-devel] [PATCH 1/7] gfs2: Fix filesystem block deallocation for short writes
  2022-05-13 20:48 [Cluster-devel] [PATCH 0/7] gfs2 fixes Andreas Gruenbacher
@ 2022-05-13 20:48 ` Andreas Gruenbacher
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 2/7] gfs2: Variable rename Andreas Gruenbacher
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2022-05-13 20:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When a write cannot be carried out in full, gfs2_iomap_end() releases
blocks that have been allocated for this write but haven't been used.

To compute the end of the allocation, gfs2_iomap_end() incorrectly
rounded the end of the attempted write down to the next block boundary
to arrive at the end of the allocation.  It would have to round up, but
the end of the allocation is also available as iomap->offset +
iomap->length, so just use that instead.

In addition, use round_up() for computing the start of the unused range.

Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 39080b2d6cf8..b6697333bb2b 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1153,13 +1153,12 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 
 	if (length != written && (iomap->flags & IOMAP_F_NEW)) {
 		/* Deallocate blocks that were just allocated. */
-		loff_t blockmask = i_blocksize(inode) - 1;
-		loff_t end = (pos + length) & ~blockmask;
+		loff_t hstart = round_up(pos + written, i_blocksize(inode));
+		loff_t hend = iomap->offset + iomap->length;
 
-		pos = (pos + written + blockmask) & ~blockmask;
-		if (pos < end) {
-			truncate_pagecache_range(inode, pos, end - 1);
-			punch_hole(ip, pos, end - pos);
+		if (hstart < hend) {
+			truncate_pagecache_range(inode, hstart, hend - 1);
+			punch_hole(ip, hstart, hend - hstart);
 		}
 	}
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [PATCH 2/7] gfs2: Variable rename
  2022-05-13 20:48 [Cluster-devel] [PATCH 0/7] gfs2 fixes Andreas Gruenbacher
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 1/7] gfs2: Fix filesystem block deallocation for short writes Andreas Gruenbacher
@ 2022-05-13 20:48 ` Andreas Gruenbacher
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 3/7] gfs2: Clean up use of fault_in_iov_iter_{read, write}able Andreas Gruenbacher
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2022-05-13 20:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Instead of counting the number of bytes read from the filesystem,
functions gfs2_file_direct_read and gfs2_file_read_iter count the number
of bytes written into the user buffer.  Conversely, functions
gfs2_file_direct_write and gfs2_file_buffered_write count the number of
bytes read from the user buffer.  This is nothing but confusing, so
change the read functions to count how many bytes they have read, and
the write functions to count how many bytes they have written.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 48f01323c37c..4d36c01727ad 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -807,7 +807,7 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 	struct file *file = iocb->ki_filp;
 	struct gfs2_inode *ip = GFS2_I(file->f_mapping->host);
 	size_t prev_count = 0, window_size = 0;
-	size_t written = 0;
+	size_t read = 0;
 	ssize_t ret;
 
 	/*
@@ -839,11 +839,11 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 	pagefault_disable();
 	to->nofault = true;
 	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL,
-			   IOMAP_DIO_PARTIAL, written);
+			   IOMAP_DIO_PARTIAL, read);
 	to->nofault = false;
 	pagefault_enable();
 	if (ret > 0)
-		written = ret;
+		read = ret;
 
 	if (should_fault_in_pages(ret, to, &prev_count, &window_size)) {
 		size_t leftover;
@@ -863,7 +863,7 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 	gfs2_holder_uninit(gh);
 	if (ret < 0)
 		return ret;
-	return written;
+	return read;
 }
 
 static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
@@ -873,7 +873,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 	struct inode *inode = file->f_mapping->host;
 	struct gfs2_inode *ip = GFS2_I(inode);
 	size_t prev_count = 0, window_size = 0;
-	size_t read = 0;
+	size_t written = 0;
 	ssize_t ret;
 
 	/*
@@ -906,13 +906,13 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 
 	from->nofault = true;
 	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL,
-			   IOMAP_DIO_PARTIAL, read);
+			   IOMAP_DIO_PARTIAL, written);
 	from->nofault = false;
 
 	if (ret == -ENOTBLK)
 		ret = 0;
 	if (ret > 0)
-		read = ret;
+		written = ret;
 
 	if (should_fault_in_pages(ret, from, &prev_count, &window_size)) {
 		size_t leftover;
@@ -933,7 +933,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 	gfs2_holder_uninit(gh);
 	if (ret < 0)
 		return ret;
-	return read;
+	return written;
 }
 
 static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
@@ -941,7 +941,7 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	struct gfs2_inode *ip;
 	struct gfs2_holder gh;
 	size_t prev_count = 0, window_size = 0;
-	size_t written = 0;
+	size_t read = 0;
 	ssize_t ret;
 
 	/*
@@ -962,7 +962,7 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (ret >= 0) {
 		if (!iov_iter_count(to))
 			return ret;
-		written = ret;
+		read = ret;
 	} else if (ret != -EFAULT) {
 		if (ret != -EAGAIN)
 			return ret;
@@ -980,7 +980,7 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	ret = generic_file_read_iter(iocb, to);
 	pagefault_enable();
 	if (ret > 0)
-		written += ret;
+		read += ret;
 
 	if (should_fault_in_pages(ret, to, &prev_count, &window_size)) {
 		size_t leftover;
@@ -998,7 +998,7 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		gfs2_glock_dq(&gh);
 out_uninit:
 	gfs2_holder_uninit(&gh);
-	return written ? written : ret;
+	return read ? read : ret;
 }
 
 static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
@@ -1012,7 +1012,7 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 	struct gfs2_holder *statfs_gh = NULL;
 	size_t prev_count = 0, window_size = 0;
 	size_t orig_count = iov_iter_count(from);
-	size_t read = 0;
+	size_t written = 0;
 	ssize_t ret;
 
 	/*
@@ -1050,13 +1050,13 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 	current->backing_dev_info = NULL;
 	if (ret > 0) {
 		iocb->ki_pos += ret;
-		read += ret;
+		written += ret;
 	}
 
 	if (inode == sdp->sd_rindex)
 		gfs2_glock_dq_uninit(statfs_gh);
 
-	from->count = orig_count - read;
+	from->count = orig_count - written;
 	if (should_fault_in_pages(ret, from, &prev_count, &window_size)) {
 		size_t leftover;
 
@@ -1077,8 +1077,8 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 	gfs2_holder_uninit(gh);
 	if (statfs_gh)
 		kfree(statfs_gh);
-	from->count = orig_count - read;
-	return read ? read : ret;
+	from->count = orig_count - written;
+	return written ? written : ret;
 }
 
 /**
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [PATCH 3/7] gfs2: Clean up use of fault_in_iov_iter_{read, write}able
  2022-05-13 20:48 [Cluster-devel] [PATCH 0/7] gfs2 fixes Andreas Gruenbacher
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 1/7] gfs2: Fix filesystem block deallocation for short writes Andreas Gruenbacher
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 2/7] gfs2: Variable rename Andreas Gruenbacher
@ 2022-05-13 20:48 ` Andreas Gruenbacher
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 4/7] gfs2: Pull return value test out of should_fault_in_pages Andreas Gruenbacher
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2022-05-13 20:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

No need to store the return value of the fault_in functions in separate
variables.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 4d36c01727ad..acc0c1d41564 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -846,12 +846,10 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 		read = ret;
 
 	if (should_fault_in_pages(ret, to, &prev_count, &window_size)) {
-		size_t leftover;
-
 		gfs2_holder_allow_demote(gh);
-		leftover = fault_in_iov_iter_writeable(to, window_size);
+		window_size -= fault_in_iov_iter_writeable(to, window_size);
 		gfs2_holder_disallow_demote(gh);
-		if (leftover != window_size) {
+		if (window_size) {
 			if (gfs2_holder_queued(gh))
 				goto retry_under_glock;
 			goto retry;
@@ -915,12 +913,10 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 		written = ret;
 
 	if (should_fault_in_pages(ret, from, &prev_count, &window_size)) {
-		size_t leftover;
-
 		gfs2_holder_allow_demote(gh);
-		leftover = fault_in_iov_iter_readable(from, window_size);
+		window_size -= fault_in_iov_iter_readable(from, window_size);
 		gfs2_holder_disallow_demote(gh);
-		if (leftover != window_size) {
+		if (window_size) {
 			if (gfs2_holder_queued(gh))
 				goto retry_under_glock;
 			goto retry;
@@ -983,12 +979,10 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		read += ret;
 
 	if (should_fault_in_pages(ret, to, &prev_count, &window_size)) {
-		size_t leftover;
-
 		gfs2_holder_allow_demote(&gh);
-		leftover = fault_in_iov_iter_writeable(to, window_size);
+		window_size -= fault_in_iov_iter_writeable(to, window_size);
 		gfs2_holder_disallow_demote(&gh);
-		if (leftover != window_size) {
+		if (window_size) {
 			if (gfs2_holder_queued(&gh))
 				goto retry_under_glock;
 			goto retry;
@@ -1058,13 +1052,11 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 
 	from->count = orig_count - written;
 	if (should_fault_in_pages(ret, from, &prev_count, &window_size)) {
-		size_t leftover;
-
 		gfs2_holder_allow_demote(gh);
-		leftover = fault_in_iov_iter_readable(from, window_size);
+		window_size -= fault_in_iov_iter_readable(from, window_size);
 		gfs2_holder_disallow_demote(gh);
-		if (leftover != window_size) {
-			from->count = min(from->count, window_size - leftover);
+		if (window_size) {
+			from->count = min(from->count, window_size);
 			if (gfs2_holder_queued(gh))
 				goto retry_under_glock;
 			goto retry;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [PATCH 4/7] gfs2: Pull return value test out of should_fault_in_pages
  2022-05-13 20:48 [Cluster-devel] [PATCH 0/7] gfs2 fixes Andreas Gruenbacher
                   ` (2 preceding siblings ...)
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 3/7] gfs2: Clean up use of fault_in_iov_iter_{read, write}able Andreas Gruenbacher
@ 2022-05-13 20:48 ` Andreas Gruenbacher
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 5/7] gfs2: Align read and write chunks to the page cache Andreas Gruenbacher
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2022-05-13 20:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Pull the return value test of the previous read or write operation out
of should_fault_in_pages().  In a following patch, we'll fault in pages
before the I/O and there will be no return value to check.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index acc0c1d41564..ea87bef7314d 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -770,7 +770,7 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 	return ret ? ret : ret1;
 }
 
-static inline bool should_fault_in_pages(ssize_t ret, struct iov_iter *i,
+static inline bool should_fault_in_pages(struct iov_iter *i,
 					 size_t *prev_count,
 					 size_t *window_size)
 {
@@ -779,8 +779,6 @@ static inline bool should_fault_in_pages(ssize_t ret, struct iov_iter *i,
 
 	if (likely(!count))
 		return false;
-	if (ret <= 0 && ret != -EFAULT)
-		return false;
 	if (!iter_is_iovec(i))
 		return false;
 
@@ -842,10 +840,12 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 			   IOMAP_DIO_PARTIAL, read);
 	to->nofault = false;
 	pagefault_enable();
+	if (ret <= 0 && ret != -EFAULT)
+		goto out_unlock;
 	if (ret > 0)
 		read = ret;
 
-	if (should_fault_in_pages(ret, to, &prev_count, &window_size)) {
+	if (should_fault_in_pages(to, &prev_count, &window_size)) {
 		gfs2_holder_allow_demote(gh);
 		window_size -= fault_in_iov_iter_writeable(to, window_size);
 		gfs2_holder_disallow_demote(gh);
@@ -855,6 +855,7 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 			goto retry;
 		}
 	}
+out_unlock:
 	if (gfs2_holder_queued(gh))
 		gfs2_glock_dq(gh);
 out_uninit:
@@ -899,20 +900,23 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 		goto out_uninit;
 	/* Silently fall back to buffered I/O when writing beyond EOF */
 	if (iocb->ki_pos + iov_iter_count(from) > i_size_read(&ip->i_inode))
-		goto out;
+		goto out_unlock;
 retry_under_glock:
 
 	from->nofault = true;
 	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL,
 			   IOMAP_DIO_PARTIAL, written);
 	from->nofault = false;
-
-	if (ret == -ENOTBLK)
-		ret = 0;
+	if (ret <= 0) {
+		if (ret == -ENOTBLK)
+			ret = 0;
+		if (ret != -EFAULT)
+			goto out_unlock;
+	}
 	if (ret > 0)
 		written = ret;
 
-	if (should_fault_in_pages(ret, from, &prev_count, &window_size)) {
+	if (should_fault_in_pages(from, &prev_count, &window_size)) {
 		gfs2_holder_allow_demote(gh);
 		window_size -= fault_in_iov_iter_readable(from, window_size);
 		gfs2_holder_disallow_demote(gh);
@@ -922,7 +926,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 			goto retry;
 		}
 	}
-out:
+out_unlock:
 	if (gfs2_holder_queued(gh))
 		gfs2_glock_dq(gh);
 out_uninit:
@@ -975,10 +979,12 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	pagefault_disable();
 	ret = generic_file_read_iter(iocb, to);
 	pagefault_enable();
+	if (ret <= 0 && ret != -EFAULT)
+		goto out_unlock;
 	if (ret > 0)
 		read += ret;
 
-	if (should_fault_in_pages(ret, to, &prev_count, &window_size)) {
+	if (should_fault_in_pages(to, &prev_count, &window_size)) {
 		gfs2_holder_allow_demote(&gh);
 		window_size -= fault_in_iov_iter_writeable(to, window_size);
 		gfs2_holder_disallow_demote(&gh);
@@ -988,6 +994,7 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 			goto retry;
 		}
 	}
+out_unlock:
 	if (gfs2_holder_queued(&gh))
 		gfs2_glock_dq(&gh);
 out_uninit:
@@ -1050,8 +1057,11 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 	if (inode == sdp->sd_rindex)
 		gfs2_glock_dq_uninit(statfs_gh);
 
+	if (ret <= 0 && ret != -EFAULT)
+		goto out_unlock;
+
 	from->count = orig_count - written;
-	if (should_fault_in_pages(ret, from, &prev_count, &window_size)) {
+	if (should_fault_in_pages(from, &prev_count, &window_size)) {
 		gfs2_holder_allow_demote(gh);
 		window_size -= fault_in_iov_iter_readable(from, window_size);
 		gfs2_holder_disallow_demote(gh);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [PATCH 5/7] gfs2: Align read and write chunks to the page cache
  2022-05-13 20:48 [Cluster-devel] [PATCH 0/7] gfs2 fixes Andreas Gruenbacher
                   ` (3 preceding siblings ...)
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 4/7] gfs2: Pull return value test out of should_fault_in_pages Andreas Gruenbacher
@ 2022-05-13 20:48 ` Andreas Gruenbacher
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 6/7] gfs2: buffered write prefaulting Andreas Gruenbacher
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 7/7] gfs2: Stop using glock holder auto-demotion for now Andreas Gruenbacher
  6 siblings, 0 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2022-05-13 20:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Align the chunks that reads and writes are carried out in to the page
cache rather than the user buffers.  This will be more efficient in
general, especially for allocating writes.  Optimizing the case that the
user buffer is gfs2 backed isn't very useful; we only need to make sure
we won't deadlock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index ea87bef7314d..11c46407d4a8 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -771,6 +771,7 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 }
 
 static inline bool should_fault_in_pages(struct iov_iter *i,
+					 struct kiocb *iocb,
 					 size_t *prev_count,
 					 size_t *window_size)
 {
@@ -783,15 +784,13 @@ static inline bool should_fault_in_pages(struct iov_iter *i,
 		return false;
 
 	size = PAGE_SIZE;
-	offs = offset_in_page(i->iov[0].iov_base + i->iov_offset);
+	offs = offset_in_page(iocb->ki_pos);
 	if (*prev_count != count || !*window_size) {
 		size_t nr_dirtied;
 
-		size = ALIGN(offs + count, PAGE_SIZE);
-		size = min_t(size_t, size, SZ_1M);
 		nr_dirtied = max(current->nr_dirtied_pause -
 				 current->nr_dirtied, 8);
-		size = min(size, nr_dirtied << PAGE_SHIFT);
+		size = min_t(size_t, SZ_1M, nr_dirtied << PAGE_SHIFT);
 	}
 
 	*prev_count = count;
@@ -845,7 +844,7 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 	if (ret > 0)
 		read = ret;
 
-	if (should_fault_in_pages(to, &prev_count, &window_size)) {
+	if (should_fault_in_pages(to, iocb, &prev_count, &window_size)) {
 		gfs2_holder_allow_demote(gh);
 		window_size -= fault_in_iov_iter_writeable(to, window_size);
 		gfs2_holder_disallow_demote(gh);
@@ -916,7 +915,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 	if (ret > 0)
 		written = ret;
 
-	if (should_fault_in_pages(from, &prev_count, &window_size)) {
+	if (should_fault_in_pages(from, iocb, &prev_count, &window_size)) {
 		gfs2_holder_allow_demote(gh);
 		window_size -= fault_in_iov_iter_readable(from, window_size);
 		gfs2_holder_disallow_demote(gh);
@@ -984,7 +983,7 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (ret > 0)
 		read += ret;
 
-	if (should_fault_in_pages(to, &prev_count, &window_size)) {
+	if (should_fault_in_pages(to, iocb, &prev_count, &window_size)) {
 		gfs2_holder_allow_demote(&gh);
 		window_size -= fault_in_iov_iter_writeable(to, window_size);
 		gfs2_holder_disallow_demote(&gh);
@@ -1061,7 +1060,7 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 		goto out_unlock;
 
 	from->count = orig_count - written;
-	if (should_fault_in_pages(from, &prev_count, &window_size)) {
+	if (should_fault_in_pages(from, iocb, &prev_count, &window_size)) {
 		gfs2_holder_allow_demote(gh);
 		window_size -= fault_in_iov_iter_readable(from, window_size);
 		gfs2_holder_disallow_demote(gh);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [PATCH 6/7] gfs2: buffered write prefaulting
  2022-05-13 20:48 [Cluster-devel] [PATCH 0/7] gfs2 fixes Andreas Gruenbacher
                   ` (4 preceding siblings ...)
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 5/7] gfs2: Align read and write chunks to the page cache Andreas Gruenbacher
@ 2022-05-13 20:48 ` Andreas Gruenbacher
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 7/7] gfs2: Stop using glock holder auto-demotion for now Andreas Gruenbacher
  6 siblings, 0 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2022-05-13 20:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

In gfs2_file_buffered_write, to increase the likelihood that all the
user memory we're trying to write will be resident in memory, carry out
the write in chunks and fault in each chunk of user memory before trying
to write it.  Otherwise, some workloads will trigger frequent short
"internal" writes, causing filesystem blocks to be allocated and then
partially deallocated again when writing into holes, which is wasteful
and breaks reservations.

Neither the chunked writes nor any of the short "internal" writes are
user visible.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 11c46407d4a8..5eda1bcc85e3 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -778,7 +778,7 @@ static inline bool should_fault_in_pages(struct iov_iter *i,
 	size_t count = iov_iter_count(i);
 	size_t size, offs;
 
-	if (likely(!count))
+	if (!count)
 		return false;
 	if (!iter_is_iovec(i))
 		return false;
@@ -1033,7 +1033,20 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 	ret = gfs2_glock_nq(gh);
 	if (ret)
 		goto out_uninit;
+	if (should_fault_in_pages(from, iocb, &prev_count, &window_size)) {
 retry_under_glock:
+		gfs2_holder_allow_demote(gh);
+		window_size -= fault_in_iov_iter_readable(from, window_size);
+		gfs2_holder_disallow_demote(gh);
+		if (!window_size) {
+			ret = -EFAULT;
+			goto out_unlock;
+		}
+		if (!gfs2_holder_queued(gh))
+			goto retry;
+		from->count = min(from->count, window_size);
+	}
+
 	if (inode == sdp->sd_rindex) {
 		struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
 
@@ -1060,17 +1073,8 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 		goto out_unlock;
 
 	from->count = orig_count - written;
-	if (should_fault_in_pages(from, iocb, &prev_count, &window_size)) {
-		gfs2_holder_allow_demote(gh);
-		window_size -= fault_in_iov_iter_readable(from, window_size);
-		gfs2_holder_disallow_demote(gh);
-		if (window_size) {
-			from->count = min(from->count, window_size);
-			if (gfs2_holder_queued(gh))
-				goto retry_under_glock;
-			goto retry;
-		}
-	}
+	if (should_fault_in_pages(from, iocb, &prev_count, &window_size))
+		goto retry_under_glock;
 out_unlock:
 	if (gfs2_holder_queued(gh))
 		gfs2_glock_dq(gh);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [PATCH 7/7] gfs2: Stop using glock holder auto-demotion for now
  2022-05-13 20:48 [Cluster-devel] [PATCH 0/7] gfs2 fixes Andreas Gruenbacher
                   ` (5 preceding siblings ...)
  2022-05-13 20:48 ` [Cluster-devel] [PATCH 6/7] gfs2: buffered write prefaulting Andreas Gruenbacher
@ 2022-05-13 20:48 ` Andreas Gruenbacher
  6 siblings, 0 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2022-05-13 20:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

We're having unresolved issues with the glock holder auto-demotion mechanism
introduced in commit dc732906c245.  This mechanism was assumed to be essential
for avoiding frequent short reads and writes until commit 296abc0d91d8
("gfs2: No short reads or writes upon glock contention").  Since then,
when the inode glock is lost, it is simply re-acquired and the operation
is resumed.  This means that apart from the performance penalty, we
might as well drop the inode glock before faulting in pages, and
re-acquire it afterwards.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c | 46 ++++++++++++++--------------------------------
 1 file changed, 14 insertions(+), 32 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 5eda1bcc85e3..2556ae1f92ea 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -832,7 +832,6 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 	ret = gfs2_glock_nq(gh);
 	if (ret)
 		goto out_uninit;
-retry_under_glock:
 	pagefault_disable();
 	to->nofault = true;
 	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL,
@@ -845,14 +844,10 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 		read = ret;
 
 	if (should_fault_in_pages(to, iocb, &prev_count, &window_size)) {
-		gfs2_holder_allow_demote(gh);
+		gfs2_glock_dq(gh);
 		window_size -= fault_in_iov_iter_writeable(to, window_size);
-		gfs2_holder_disallow_demote(gh);
-		if (window_size) {
-			if (gfs2_holder_queued(gh))
-				goto retry_under_glock;
+		if (window_size)
 			goto retry;
-		}
 	}
 out_unlock:
 	if (gfs2_holder_queued(gh))
@@ -900,7 +895,6 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 	/* Silently fall back to buffered I/O when writing beyond EOF */
 	if (iocb->ki_pos + iov_iter_count(from) > i_size_read(&ip->i_inode))
 		goto out_unlock;
-retry_under_glock:
 
 	from->nofault = true;
 	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL,
@@ -916,14 +910,10 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 		written = ret;
 
 	if (should_fault_in_pages(from, iocb, &prev_count, &window_size)) {
-		gfs2_holder_allow_demote(gh);
+		gfs2_glock_dq(gh);
 		window_size -= fault_in_iov_iter_readable(from, window_size);
-		gfs2_holder_disallow_demote(gh);
-		if (window_size) {
-			if (gfs2_holder_queued(gh))
-				goto retry_under_glock;
+		if (window_size)
 			goto retry;
-		}
 	}
 out_unlock:
 	if (gfs2_holder_queued(gh))
@@ -974,7 +964,6 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	ret = gfs2_glock_nq(&gh);
 	if (ret)
 		goto out_uninit;
-retry_under_glock:
 	pagefault_disable();
 	ret = generic_file_read_iter(iocb, to);
 	pagefault_enable();
@@ -984,14 +973,10 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		read += ret;
 
 	if (should_fault_in_pages(to, iocb, &prev_count, &window_size)) {
-		gfs2_holder_allow_demote(&gh);
+		gfs2_glock_dq(&gh);
 		window_size -= fault_in_iov_iter_writeable(to, window_size);
-		gfs2_holder_disallow_demote(&gh);
-		if (window_size) {
-			if (gfs2_holder_queued(&gh))
-				goto retry_under_glock;
+		if (window_size)
 			goto retry;
-		}
 	}
 out_unlock:
 	if (gfs2_holder_queued(&gh))
@@ -1030,22 +1015,17 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 
 	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, gh);
 retry:
-	ret = gfs2_glock_nq(gh);
-	if (ret)
-		goto out_uninit;
 	if (should_fault_in_pages(from, iocb, &prev_count, &window_size)) {
-retry_under_glock:
-		gfs2_holder_allow_demote(gh);
 		window_size -= fault_in_iov_iter_readable(from, window_size);
-		gfs2_holder_disallow_demote(gh);
 		if (!window_size) {
 			ret = -EFAULT;
-			goto out_unlock;
+			goto out_uninit;
 		}
-		if (!gfs2_holder_queued(gh))
-			goto retry;
 		from->count = min(from->count, window_size);
 	}
+	ret = gfs2_glock_nq(gh);
+	if (ret)
+		goto out_uninit;
 
 	if (inode == sdp->sd_rindex) {
 		struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
@@ -1073,8 +1053,10 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 		goto out_unlock;
 
 	from->count = orig_count - written;
-	if (should_fault_in_pages(from, iocb, &prev_count, &window_size))
-		goto retry_under_glock;
+	if (should_fault_in_pages(from, iocb, &prev_count, &window_size)) {
+		gfs2_glock_dq(gh);
+		goto retry;
+	}
 out_unlock:
 	if (gfs2_holder_queued(gh))
 		gfs2_glock_dq(gh);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-05-13 20:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-13 20:48 [Cluster-devel] [PATCH 0/7] gfs2 fixes Andreas Gruenbacher
2022-05-13 20:48 ` [Cluster-devel] [PATCH 1/7] gfs2: Fix filesystem block deallocation for short writes Andreas Gruenbacher
2022-05-13 20:48 ` [Cluster-devel] [PATCH 2/7] gfs2: Variable rename Andreas Gruenbacher
2022-05-13 20:48 ` [Cluster-devel] [PATCH 3/7] gfs2: Clean up use of fault_in_iov_iter_{read, write}able Andreas Gruenbacher
2022-05-13 20:48 ` [Cluster-devel] [PATCH 4/7] gfs2: Pull return value test out of should_fault_in_pages Andreas Gruenbacher
2022-05-13 20:48 ` [Cluster-devel] [PATCH 5/7] gfs2: Align read and write chunks to the page cache Andreas Gruenbacher
2022-05-13 20:48 ` [Cluster-devel] [PATCH 6/7] gfs2: buffered write prefaulting Andreas Gruenbacher
2022-05-13 20:48 ` [Cluster-devel] [PATCH 7/7] gfs2: Stop using glock holder auto-demotion for now Andreas Gruenbacher

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.