All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/5] iomap and gfs2 fixes
@ 2019-04-29 22:09 ` Andreas Gruenbacher
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-29 22:09 UTC (permalink / raw)
  To: cluster-devel, Darrick J . Wong
  Cc: Christoph Hellwig, Bob Peterson, Jan Kara, Dave Chinner,
	Ross Lagerwall, Mark Syms, Edwin Török, linux-fsdevel,
	linux-mm, Andreas Gruenbacher

Here's another update of this patch queue, hopefully with all wrinkles
ironed out now.

Darrick, I think Linus would be unhappy seeing the first four patches in
the gfs2 tree; could you put them into the xfs tree instead like we did
some time ago already?

Thanks,
Andreas

Andreas Gruenbacher (4):
  fs: Turn __generic_write_end into a void function
  iomap: Fix use-after-free error in page_done callback
  iomap: Add a page_prepare callback
  gfs2: Fix iomap write page reclaim deadlock

Christoph Hellwig (1):
  iomap: Clean up __generic_write_end calling

 fs/buffer.c           |   8 ++--
 fs/gfs2/aops.c        |  14 ++++--
 fs/gfs2/bmap.c        | 101 ++++++++++++++++++++++++------------------
 fs/internal.h         |   2 +-
 fs/iomap.c            |  55 ++++++++++++++---------
 include/linux/iomap.h |  22 ++++++---
 6 files changed, 124 insertions(+), 78 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 0/5] iomap and gfs2 fixes
@ 2019-04-29 22:09 ` Andreas Gruenbacher
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-29 22:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Here's another update of this patch queue, hopefully with all wrinkles
ironed out now.

Darrick, I think Linus would be unhappy seeing the first four patches in
the gfs2 tree; could you put them into the xfs tree instead like we did
some time ago already?

Thanks,
Andreas

Andreas Gruenbacher (4):
  fs: Turn __generic_write_end into a void function
  iomap: Fix use-after-free error in page_done callback
  iomap: Add a page_prepare callback
  gfs2: Fix iomap write page reclaim deadlock

Christoph Hellwig (1):
  iomap: Clean up __generic_write_end calling

 fs/buffer.c           |   8 ++--
 fs/gfs2/aops.c        |  14 ++++--
 fs/gfs2/bmap.c        | 101 ++++++++++++++++++++++++------------------
 fs/internal.h         |   2 +-
 fs/iomap.c            |  55 ++++++++++++++---------
 include/linux/iomap.h |  22 ++++++---
 6 files changed, 124 insertions(+), 78 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v7 1/5] iomap: Clean up __generic_write_end calling
  2019-04-29 22:09 ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-29 22:09   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-29 22:09 UTC (permalink / raw)
  To: cluster-devel, Darrick J . Wong
  Cc: Christoph Hellwig, Bob Peterson, Jan Kara, Dave Chinner,
	Ross Lagerwall, Mark Syms, Edwin Török, linux-fsdevel,
	linux-mm, Andreas Gruenbacher

From: Christoph Hellwig <hch@lst.de>

Move the call to __generic_write_end into iomap_write_end instead of
duplicating it in each of the three branches.  This requires open coding
the generic_write_end for the buffer_head case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/iomap.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 97cb9d486a7d..2344c662e6fc 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -738,13 +738,11 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	 * uptodate page as a zero-length write, and force the caller to redo
 	 * the whole thing.
 	 */
-	if (unlikely(copied < len && !PageUptodate(page))) {
-		copied = 0;
-	} else {
-		iomap_set_range_uptodate(page, offset_in_page(pos), len);
-		iomap_set_page_dirty(page);
-	}
-	return __generic_write_end(inode, pos, copied, page);
+	if (unlikely(copied < len && !PageUptodate(page)))
+		return 0;
+	iomap_set_range_uptodate(page, offset_in_page(pos), len);
+	iomap_set_page_dirty(page);
+	return copied;
 }
 
 static int
@@ -761,7 +759,6 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
 	kunmap_atomic(addr);
 
 	mark_inode_dirty(inode);
-	__generic_write_end(inode, pos, copied, page);
 	return copied;
 }
 
@@ -774,12 +771,13 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	if (iomap->type == IOMAP_INLINE) {
 		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
 	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
-		ret = generic_write_end(NULL, inode->i_mapping, pos, len,
-				copied, page, NULL);
+		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
+				page, NULL);
 	} else {
 		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
 	}
 
+	ret = __generic_write_end(inode, pos, ret, page);
 	if (iomap->page_done)
 		iomap->page_done(inode, pos, copied, page, iomap);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 1/5] iomap: Clean up __generic_write_end calling
@ 2019-04-29 22:09   ` Andreas Gruenbacher
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-29 22:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Christoph Hellwig <hch@lst.de>

Move the call to __generic_write_end into iomap_write_end instead of
duplicating it in each of the three branches.  This requires open coding
the generic_write_end for the buffer_head case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/iomap.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 97cb9d486a7d..2344c662e6fc 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -738,13 +738,11 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	 * uptodate page as a zero-length write, and force the caller to redo
 	 * the whole thing.
 	 */
-	if (unlikely(copied < len && !PageUptodate(page))) {
-		copied = 0;
-	} else {
-		iomap_set_range_uptodate(page, offset_in_page(pos), len);
-		iomap_set_page_dirty(page);
-	}
-	return __generic_write_end(inode, pos, copied, page);
+	if (unlikely(copied < len && !PageUptodate(page)))
+		return 0;
+	iomap_set_range_uptodate(page, offset_in_page(pos), len);
+	iomap_set_page_dirty(page);
+	return copied;
 }
 
 static int
@@ -761,7 +759,6 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
 	kunmap_atomic(addr);
 
 	mark_inode_dirty(inode);
-	__generic_write_end(inode, pos, copied, page);
 	return copied;
 }
 
@@ -774,12 +771,13 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	if (iomap->type == IOMAP_INLINE) {
 		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
 	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
-		ret = generic_write_end(NULL, inode->i_mapping, pos, len,
-				copied, page, NULL);
+		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
+				page, NULL);
 	} else {
 		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
 	}
 
+	ret = __generic_write_end(inode, pos, ret, page);
 	if (iomap->page_done)
 		iomap->page_done(inode, pos, copied, page, iomap);
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v7 2/5] fs: Turn __generic_write_end into a void function
  2019-04-29 22:09 ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-29 22:09   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-29 22:09 UTC (permalink / raw)
  To: cluster-devel, Darrick J . Wong
  Cc: Christoph Hellwig, Bob Peterson, Jan Kara, Dave Chinner,
	Ross Lagerwall, Mark Syms, Edwin Török, linux-fsdevel,
	linux-mm, Andreas Gruenbacher

The VFS-internal __generic_write_end helper always returns the value of
its @copied argument.  This can be confusing, and it isn't very useful
anyway, so turn __generic_write_end into a function returning void
instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/buffer.c   | 6 +++---
 fs/internal.h | 2 +-
 fs/iomap.c    | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index ce357602f471..e0d4c6a5e2d2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2085,7 +2085,7 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
 }
 EXPORT_SYMBOL(block_write_begin);
 
-int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
 		struct page *page)
 {
 	loff_t old_size = inode->i_size;
@@ -2116,7 +2116,6 @@ int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
 	 */
 	if (i_size_changed)
 		mark_inode_dirty(inode);
-	return copied;
 }
 
 int block_write_end(struct file *file, struct address_space *mapping,
@@ -2160,7 +2159,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 			struct page *page, void *fsdata)
 {
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
-	return __generic_write_end(mapping->host, pos, copied, page);
+	__generic_write_end(mapping->host, pos, copied, page);
+	return copied;
 }
 EXPORT_SYMBOL(generic_write_end);
 
diff --git a/fs/internal.h b/fs/internal.h
index 6a8b71643af4..530587fdf5d8 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -44,7 +44,7 @@ static inline int __sync_blockdev(struct block_device *bdev, int wait)
 extern void guard_bio_eod(int rw, struct bio *bio);
 extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
 		get_block_t *get_block, struct iomap *iomap);
-int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
 		struct page *page);
 
 /*
diff --git a/fs/iomap.c b/fs/iomap.c
index 2344c662e6fc..f8c9722d1a97 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -777,7 +777,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
 	}
 
-	ret = __generic_write_end(inode, pos, ret, page);
+	__generic_write_end(inode, pos, ret, page);
 	if (iomap->page_done)
 		iomap->page_done(inode, pos, copied, page, iomap);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 2/5] fs: Turn __generic_write_end into a void function
@ 2019-04-29 22:09   ` Andreas Gruenbacher
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-29 22:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

The VFS-internal __generic_write_end helper always returns the value of
its @copied argument.  This can be confusing, and it isn't very useful
anyway, so turn __generic_write_end into a function returning void
instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/buffer.c   | 6 +++---
 fs/internal.h | 2 +-
 fs/iomap.c    | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index ce357602f471..e0d4c6a5e2d2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2085,7 +2085,7 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
 }
 EXPORT_SYMBOL(block_write_begin);
 
-int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
 		struct page *page)
 {
 	loff_t old_size = inode->i_size;
@@ -2116,7 +2116,6 @@ int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
 	 */
 	if (i_size_changed)
 		mark_inode_dirty(inode);
-	return copied;
 }
 
 int block_write_end(struct file *file, struct address_space *mapping,
@@ -2160,7 +2159,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 			struct page *page, void *fsdata)
 {
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
-	return __generic_write_end(mapping->host, pos, copied, page);
+	__generic_write_end(mapping->host, pos, copied, page);
+	return copied;
 }
 EXPORT_SYMBOL(generic_write_end);
 
diff --git a/fs/internal.h b/fs/internal.h
index 6a8b71643af4..530587fdf5d8 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -44,7 +44,7 @@ static inline int __sync_blockdev(struct block_device *bdev, int wait)
 extern void guard_bio_eod(int rw, struct bio *bio);
 extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
 		get_block_t *get_block, struct iomap *iomap);
-int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
 		struct page *page);
 
 /*
diff --git a/fs/iomap.c b/fs/iomap.c
index 2344c662e6fc..f8c9722d1a97 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -777,7 +777,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
 	}
 
-	ret = __generic_write_end(inode, pos, ret, page);
+	__generic_write_end(inode, pos, ret, page);
 	if (iomap->page_done)
 		iomap->page_done(inode, pos, copied, page, iomap);
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v7 3/5] iomap: Fix use-after-free error in page_done callback
  2019-04-29 22:09 ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-29 22:09   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-29 22:09 UTC (permalink / raw)
  To: cluster-devel, Darrick J . Wong
  Cc: Christoph Hellwig, Bob Peterson, Jan Kara, Dave Chinner,
	Ross Lagerwall, Mark Syms, Edwin Török, linux-fsdevel,
	linux-mm, Andreas Gruenbacher

In iomap_write_end, we're not holding a page reference anymore when
calling the page_done callback, but the callback needs that reference to
access the page.  To fix that, move the put_page call in
__generic_write_end into the callers of __generic_write_end.  Then, in
iomap_write_end, put the page after calling the page_done callback.

Reported-by: Jan Kara <jack@suse.cz>
Fixes: 63899c6f8851 ("iomap: add a page_done callback")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/buffer.c | 2 +-
 fs/iomap.c  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index e0d4c6a5e2d2..0faa41fb4c88 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2104,7 +2104,6 @@ void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
 	}
 
 	unlock_page(page);
-	put_page(page);
 
 	if (old_size < pos)
 		pagecache_isize_extended(inode, old_size, pos);
@@ -2160,6 +2159,7 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 {
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
 	__generic_write_end(mapping->host, pos, copied, page);
+	put_page(page);
 	return copied;
 }
 EXPORT_SYMBOL(generic_write_end);
diff --git a/fs/iomap.c b/fs/iomap.c
index f8c9722d1a97..62e3461704ce 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -780,6 +780,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	__generic_write_end(inode, pos, ret, page);
 	if (iomap->page_done)
 		iomap->page_done(inode, pos, copied, page, iomap);
+	put_page(page);
 
 	if (ret < len)
 		iomap_write_failed(inode, pos, len);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 3/5] iomap: Fix use-after-free error in page_done callback
@ 2019-04-29 22:09   ` Andreas Gruenbacher
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-29 22:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

In iomap_write_end, we're not holding a page reference anymore when
calling the page_done callback, but the callback needs that reference to
access the page.  To fix that, move the put_page call in
__generic_write_end into the callers of __generic_write_end.  Then, in
iomap_write_end, put the page after calling the page_done callback.

Reported-by: Jan Kara <jack@suse.cz>
Fixes: 63899c6f8851 ("iomap: add a page_done callback")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/buffer.c | 2 +-
 fs/iomap.c  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index e0d4c6a5e2d2..0faa41fb4c88 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2104,7 +2104,6 @@ void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
 	}
 
 	unlock_page(page);
-	put_page(page);
 
 	if (old_size < pos)
 		pagecache_isize_extended(inode, old_size, pos);
@@ -2160,6 +2159,7 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 {
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
 	__generic_write_end(mapping->host, pos, copied, page);
+	put_page(page);
 	return copied;
 }
 EXPORT_SYMBOL(generic_write_end);
diff --git a/fs/iomap.c b/fs/iomap.c
index f8c9722d1a97..62e3461704ce 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -780,6 +780,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	__generic_write_end(inode, pos, ret, page);
 	if (iomap->page_done)
 		iomap->page_done(inode, pos, copied, page, iomap);
+	put_page(page);
 
 	if (ret < len)
 		iomap_write_failed(inode, pos, len);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v7 4/5] iomap: Add a page_prepare callback
  2019-04-29 22:09 ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-29 22:09   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-29 22:09 UTC (permalink / raw)
  To: cluster-devel, Darrick J . Wong
  Cc: Christoph Hellwig, Bob Peterson, Jan Kara, Dave Chinner,
	Ross Lagerwall, Mark Syms, Edwin Török, linux-fsdevel,
	linux-mm, Andreas Gruenbacher

Move the page_done callback into a separate iomap_page_ops structure and
add a page_prepare calback to be called before the next page is written
to.  In gfs2, we'll want to start a transaction in page_prepare and end
it in page_done.  Other filesystems that implement data journaling will
require the same kind of mechanism.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/gfs2/bmap.c        | 15 ++++++++++-----
 fs/iomap.c            | 36 ++++++++++++++++++++++++++----------
 include/linux/iomap.h | 22 +++++++++++++++++-----
 3 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 5da4ca9041c0..aa014725f84a 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -991,15 +991,20 @@ static void gfs2_write_unlock(struct inode *inode)
 	gfs2_glock_dq_uninit(&ip->i_gh);
 }
 
-static void gfs2_iomap_journaled_page_done(struct inode *inode, loff_t pos,
-				unsigned copied, struct page *page,
-				struct iomap *iomap)
+static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
+				 unsigned copied, struct page *page,
+				 struct iomap *iomap)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 
-	gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+	if (page)
+		gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
 }
 
+static const struct iomap_page_ops gfs2_iomap_page_ops = {
+	.page_done = gfs2_iomap_page_done,
+};
+
 static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 				  loff_t length, unsigned flags,
 				  struct iomap *iomap,
@@ -1077,7 +1082,7 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 		}
 	}
 	if (!gfs2_is_stuffed(ip) && gfs2_is_jdata(ip))
-		iomap->page_done = gfs2_iomap_journaled_page_done;
+		iomap->page_ops = &gfs2_iomap_page_ops;
 	return 0;
 
 out_trans_end:
diff --git a/fs/iomap.c b/fs/iomap.c
index 62e3461704ce..a3ffc83134ee 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -665,6 +665,7 @@ static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
 {
+	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	pgoff_t index = pos >> PAGE_SHIFT;
 	struct page *page;
 	int status = 0;
@@ -674,9 +675,17 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	if (fatal_signal_pending(current))
 		return -EINTR;
 
+	if (page_ops && page_ops->page_prepare) {
+		status = page_ops->page_prepare(inode, pos, len, iomap);
+		if (status)
+			return status;
+	}
+
 	page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
-	if (!page)
-		return -ENOMEM;
+	if (!page) {
+		status = -ENOMEM;
+		goto out_no_page;
+	}
 
 	if (iomap->type == IOMAP_INLINE)
 		iomap_read_inline_data(inode, page, iomap);
@@ -684,15 +693,21 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		status = __block_write_begin_int(page, pos, len, NULL, iomap);
 	else
 		status = __iomap_write_begin(inode, pos, len, page, iomap);
-	if (unlikely(status)) {
-		unlock_page(page);
-		put_page(page);
-		page = NULL;
 
-		iomap_write_failed(inode, pos, len);
-	}
+	if (unlikely(status))
+		goto out_unlock;
 
 	*pagep = page;
+	return 0;
+
+out_unlock:
+	unlock_page(page);
+	put_page(page);
+	iomap_write_failed(inode, pos, len);
+
+out_no_page:
+	if (page_ops && page_ops->page_done)
+		page_ops->page_done(inode, pos, 0, NULL, iomap);
 	return status;
 }
 
@@ -766,6 +781,7 @@ static int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 		unsigned copied, struct page *page, struct iomap *iomap)
 {
+	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	int ret;
 
 	if (iomap->type == IOMAP_INLINE) {
@@ -778,8 +794,8 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	}
 
 	__generic_write_end(inode, pos, ret, page);
-	if (iomap->page_done)
-		iomap->page_done(inode, pos, copied, page, iomap);
+	if (page_ops && page_ops->page_done)
+		page_ops->page_done(inode, pos, copied, page, iomap);
 	put_page(page);
 
 	if (ret < len)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0fefb5455bda..2103b94cb1bf 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -53,6 +53,8 @@ struct vm_fault;
  */
 #define IOMAP_NULL_ADDR -1ULL	/* addr is not valid */
 
+struct iomap_page_ops;
+
 struct iomap {
 	u64			addr; /* disk offset of mapping, bytes */
 	loff_t			offset;	/* file offset of mapping, bytes */
@@ -63,12 +65,22 @@ struct iomap {
 	struct dax_device	*dax_dev; /* dax_dev for dax operations */
 	void			*inline_data;
 	void			*private; /* filesystem private */
+	const struct iomap_page_ops *page_ops;
+};
 
-	/*
-	 * Called when finished processing a page in the mapping returned in
-	 * this iomap.  At least for now this is only supported in the buffered
-	 * write path.
-	 */
+/*
+ * When a filesystem sets page_ops in an iomap mapping it returns, page_prepare
+ * and page_done will be called for each page written to.  This only applies to
+ * buffered writes as unbuffered writes will not typically have pages
+ * associated with them.
+ *
+ * When page_prepare succeeds, page_done will always be called to do any
+ * cleanup work necessary.  In that page_done call, @page will be NULL if the
+ * associated page could not be obtained.
+ */
+struct iomap_page_ops {
+	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
+			struct iomap *iomap);
 	void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
 			struct page *page, struct iomap *iomap);
 };
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 4/5] iomap: Add a page_prepare callback
@ 2019-04-29 22:09   ` Andreas Gruenbacher
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-29 22:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Move the page_done callback into a separate iomap_page_ops structure and
add a page_prepare calback to be called before the next page is written
to.  In gfs2, we'll want to start a transaction in page_prepare and end
it in page_done.  Other filesystems that implement data journaling will
require the same kind of mechanism.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/gfs2/bmap.c        | 15 ++++++++++-----
 fs/iomap.c            | 36 ++++++++++++++++++++++++++----------
 include/linux/iomap.h | 22 +++++++++++++++++-----
 3 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 5da4ca9041c0..aa014725f84a 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -991,15 +991,20 @@ static void gfs2_write_unlock(struct inode *inode)
 	gfs2_glock_dq_uninit(&ip->i_gh);
 }
 
-static void gfs2_iomap_journaled_page_done(struct inode *inode, loff_t pos,
-				unsigned copied, struct page *page,
-				struct iomap *iomap)
+static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
+				 unsigned copied, struct page *page,
+				 struct iomap *iomap)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 
-	gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+	if (page)
+		gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
 }
 
+static const struct iomap_page_ops gfs2_iomap_page_ops = {
+	.page_done = gfs2_iomap_page_done,
+};
+
 static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 				  loff_t length, unsigned flags,
 				  struct iomap *iomap,
@@ -1077,7 +1082,7 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 		}
 	}
 	if (!gfs2_is_stuffed(ip) && gfs2_is_jdata(ip))
-		iomap->page_done = gfs2_iomap_journaled_page_done;
+		iomap->page_ops = &gfs2_iomap_page_ops;
 	return 0;
 
 out_trans_end:
diff --git a/fs/iomap.c b/fs/iomap.c
index 62e3461704ce..a3ffc83134ee 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -665,6 +665,7 @@ static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
 {
+	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	pgoff_t index = pos >> PAGE_SHIFT;
 	struct page *page;
 	int status = 0;
@@ -674,9 +675,17 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	if (fatal_signal_pending(current))
 		return -EINTR;
 
+	if (page_ops && page_ops->page_prepare) {
+		status = page_ops->page_prepare(inode, pos, len, iomap);
+		if (status)
+			return status;
+	}
+
 	page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
-	if (!page)
-		return -ENOMEM;
+	if (!page) {
+		status = -ENOMEM;
+		goto out_no_page;
+	}
 
 	if (iomap->type == IOMAP_INLINE)
 		iomap_read_inline_data(inode, page, iomap);
@@ -684,15 +693,21 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		status = __block_write_begin_int(page, pos, len, NULL, iomap);
 	else
 		status = __iomap_write_begin(inode, pos, len, page, iomap);
-	if (unlikely(status)) {
-		unlock_page(page);
-		put_page(page);
-		page = NULL;
 
-		iomap_write_failed(inode, pos, len);
-	}
+	if (unlikely(status))
+		goto out_unlock;
 
 	*pagep = page;
+	return 0;
+
+out_unlock:
+	unlock_page(page);
+	put_page(page);
+	iomap_write_failed(inode, pos, len);
+
+out_no_page:
+	if (page_ops && page_ops->page_done)
+		page_ops->page_done(inode, pos, 0, NULL, iomap);
 	return status;
 }
 
@@ -766,6 +781,7 @@ static int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 		unsigned copied, struct page *page, struct iomap *iomap)
 {
+	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	int ret;
 
 	if (iomap->type == IOMAP_INLINE) {
@@ -778,8 +794,8 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	}
 
 	__generic_write_end(inode, pos, ret, page);
-	if (iomap->page_done)
-		iomap->page_done(inode, pos, copied, page, iomap);
+	if (page_ops && page_ops->page_done)
+		page_ops->page_done(inode, pos, copied, page, iomap);
 	put_page(page);
 
 	if (ret < len)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0fefb5455bda..2103b94cb1bf 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -53,6 +53,8 @@ struct vm_fault;
  */
 #define IOMAP_NULL_ADDR -1ULL	/* addr is not valid */
 
+struct iomap_page_ops;
+
 struct iomap {
 	u64			addr; /* disk offset of mapping, bytes */
 	loff_t			offset;	/* file offset of mapping, bytes */
@@ -63,12 +65,22 @@ struct iomap {
 	struct dax_device	*dax_dev; /* dax_dev for dax operations */
 	void			*inline_data;
 	void			*private; /* filesystem private */
+	const struct iomap_page_ops *page_ops;
+};
 
-	/*
-	 * Called when finished processing a page in the mapping returned in
-	 * this iomap.  At least for now this is only supported in the buffered
-	 * write path.
-	 */
+/*
+ * When a filesystem sets page_ops in an iomap mapping it returns, page_prepare
+ * and page_done will be called for each page written to.  This only applies to
+ * buffered writes as unbuffered writes will not typically have pages
+ * associated with them.
+ *
+ * When page_prepare succeeds, page_done will always be called to do any
+ * cleanup work necessary.  In that page_done call, @page will be NULL if the
+ * associated page could not be obtained.
+ */
+struct iomap_page_ops {
+	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
+			struct iomap *iomap);
 	void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
 			struct page *page, struct iomap *iomap);
 };
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v7 5/5] gfs2: Fix iomap write page reclaim deadlock
  2019-04-29 22:09 ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-29 22:09   ` Andreas Gruenbacher
  -1 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-29 22:09 UTC (permalink / raw)
  To: cluster-devel, Darrick J . Wong
  Cc: Christoph Hellwig, Bob Peterson, Jan Kara, Dave Chinner,
	Ross Lagerwall, Mark Syms, Edwin Török, linux-fsdevel,
	linux-mm, Andreas Gruenbacher

Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
buffered writes by starting a transaction in iomap_begin, writing a range of
pages, and ending that transaction in iomap_end.  This approach suffers from
two problems:

  (1) Any allocations necessary for the write are done in iomap_begin, so when
  the data aren't journaled, there is no need for keeping the transaction open
  until iomap_end.

  (2) Transactions keep the gfs2 log flush lock held.  When
  iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
  gfs2_write_inode, which will try to flush the log.  This requires taking the
  log flush lock which is already held, resulting in a deadlock.

Fix both of these issues by not keeping transactions open from iomap_begin to
iomap_end.  Instead, start a small transaction in page_prepare and end it in
page_done when necessary.

Reported-by: Edwin Török <edvin.torok@citrix.com>
Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/aops.c | 14 +++++---
 fs/gfs2/bmap.c | 88 +++++++++++++++++++++++++++-----------------------
 2 files changed, 58 insertions(+), 44 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 05dd78f4b2b3..6210d4429d84 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -649,7 +649,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
  */
 void adjust_fs_space(struct inode *inode)
 {
-	struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
 	struct gfs2_inode *l_ip = GFS2_I(sdp->sd_sc_inode);
 	struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master;
@@ -657,10 +657,13 @@ void adjust_fs_space(struct inode *inode)
 	struct buffer_head *m_bh, *l_bh;
 	u64 fs_total, new_free;
 
+	if (gfs2_trans_begin(sdp, 2 * RES_STATFS, 0) != 0)
+		return;
+
 	/* Total up the file system space, according to the latest rindex. */
 	fs_total = gfs2_ri_total(sdp);
 	if (gfs2_meta_inode_buffer(m_ip, &m_bh) != 0)
-		return;
+		goto out;
 
 	spin_lock(&sdp->sd_statfs_spin);
 	gfs2_statfs_change_in(m_sc, m_bh->b_data +
@@ -675,11 +678,14 @@ void adjust_fs_space(struct inode *inode)
 	gfs2_statfs_change(sdp, new_free, new_free, 0);
 
 	if (gfs2_meta_inode_buffer(l_ip, &l_bh) != 0)
-		goto out;
+		goto out2;
 	update_statfs(sdp, m_bh, l_bh);
 	brelse(l_bh);
-out:
+out2:
 	brelse(m_bh);
+out:
+	sdp->sd_rindex_uptodate = 0;
+	gfs2_trans_end(sdp);
 }
 
 /**
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index aa014725f84a..27c82f4aaf32 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -991,17 +991,28 @@ static void gfs2_write_unlock(struct inode *inode)
 	gfs2_glock_dq_uninit(&ip->i_gh);
 }
 
+static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
+				   unsigned len, struct iomap *iomap)
+{
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+
+	return gfs2_trans_begin(sdp, RES_DINODE + (len >> inode->i_blkbits), 0);
+}
+
 static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
 				 unsigned copied, struct page *page,
 				 struct iomap *iomap)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
 
-	if (page)
+	if (page && !gfs2_is_stuffed(ip))
 		gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+	gfs2_trans_end(sdp);
 }
 
 static const struct iomap_page_ops gfs2_iomap_page_ops = {
+	.page_prepare = gfs2_iomap_page_prepare,
 	.page_done = gfs2_iomap_page_done,
 };
 
@@ -1057,31 +1068,45 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 	if (alloc_required)
 		rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
 
-	ret = gfs2_trans_begin(sdp, rblocks, iomap->length >> inode->i_blkbits);
-	if (ret)
-		goto out_trans_fail;
+	if (unstuff || iomap->type == IOMAP_HOLE) {
+		struct gfs2_trans *tr;
 
-	if (unstuff) {
-		ret = gfs2_unstuff_dinode(ip, NULL);
+		ret = gfs2_trans_begin(sdp, rblocks,
+				       iomap->length >> inode->i_blkbits);
 		if (ret)
-			goto out_trans_end;
-		release_metapath(mp);
-		ret = gfs2_iomap_get(inode, iomap->offset, iomap->length,
-				     flags, iomap, mp);
-		if (ret)
-			goto out_trans_end;
-	}
+			goto out_trans_fail;
 
-	if (iomap->type == IOMAP_HOLE) {
-		ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
-		if (ret) {
-			gfs2_trans_end(sdp);
-			gfs2_inplace_release(ip);
-			punch_hole(ip, iomap->offset, iomap->length);
-			goto out_qunlock;
+		if (unstuff) {
+			ret = gfs2_unstuff_dinode(ip, NULL);
+			if (ret)
+				goto out_trans_end;
+			release_metapath(mp);
+			ret = gfs2_iomap_get(inode, iomap->offset,
+					     iomap->length, flags, iomap, mp);
+			if (ret)
+				goto out_trans_end;
+		}
+
+		if (iomap->type == IOMAP_HOLE) {
+			ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
+			if (ret) {
+				gfs2_trans_end(sdp);
+				gfs2_inplace_release(ip);
+				punch_hole(ip, iomap->offset, iomap->length);
+				goto out_qunlock;
+			}
 		}
+
+		tr = current->journal_info;
+		if (tr->tr_num_buf_new)
+			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+		else
+			gfs2_trans_add_meta(ip->i_gl, mp->mp_bh[0]);
+
+		gfs2_trans_end(sdp);
 	}
-	if (!gfs2_is_stuffed(ip) && gfs2_is_jdata(ip))
+
+	if (gfs2_is_stuffed(ip) || gfs2_is_jdata(ip))
 		iomap->page_ops = &gfs2_iomap_page_ops;
 	return 0;
 
@@ -1121,10 +1146,6 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 		    iomap->type != IOMAP_MAPPED)
 			ret = -ENOTBLK;
 	}
-	if (!ret) {
-		get_bh(mp.mp_bh[0]);
-		iomap->private = mp.mp_bh[0];
-	}
 	release_metapath(&mp);
 	trace_gfs2_iomap_end(ip, iomap, ret);
 	return ret;
@@ -1135,27 +1156,16 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct gfs2_trans *tr = current->journal_info;
-	struct buffer_head *dibh = iomap->private;
 
 	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
 		goto out;
 
-	if (iomap->type != IOMAP_INLINE) {
+	if (!gfs2_is_stuffed(ip))
 		gfs2_ordered_add_inode(ip);
 
-		if (tr->tr_num_buf_new)
-			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-		else
-			gfs2_trans_add_meta(ip->i_gl, dibh);
-	}
-
-	if (inode == sdp->sd_rindex) {
+	if (inode == sdp->sd_rindex)
 		adjust_fs_space(inode);
-		sdp->sd_rindex_uptodate = 0;
-	}
 
-	gfs2_trans_end(sdp);
 	gfs2_inplace_release(ip);
 
 	if (length != written && (iomap->flags & IOMAP_F_NEW)) {
@@ -1175,8 +1185,6 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	gfs2_write_unlock(inode);
 
 out:
-	if (dibh)
-		brelse(dibh);
 	return 0;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 5/5] gfs2: Fix iomap write page reclaim deadlock
@ 2019-04-29 22:09   ` Andreas Gruenbacher
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-29 22:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
buffered writes by starting a transaction in iomap_begin, writing a range of
pages, and ending that transaction in iomap_end.  This approach suffers from
two problems:

  (1) Any allocations necessary for the write are done in iomap_begin, so when
  the data aren't journaled, there is no need for keeping the transaction open
  until iomap_end.

  (2) Transactions keep the gfs2 log flush lock held.  When
  iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
  gfs2_write_inode, which will try to flush the log.  This requires taking the
  log flush lock which is already held, resulting in a deadlock.

Fix both of these issues by not keeping transactions open from iomap_begin to
iomap_end.  Instead, start a small transaction in page_prepare and end it in
page_done when necessary.

Reported-by: Edwin T?r?k <edvin.torok@citrix.com>
Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/aops.c | 14 +++++---
 fs/gfs2/bmap.c | 88 +++++++++++++++++++++++++++-----------------------
 2 files changed, 58 insertions(+), 44 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 05dd78f4b2b3..6210d4429d84 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -649,7 +649,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
  */
 void adjust_fs_space(struct inode *inode)
 {
-	struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
 	struct gfs2_inode *l_ip = GFS2_I(sdp->sd_sc_inode);
 	struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master;
@@ -657,10 +657,13 @@ void adjust_fs_space(struct inode *inode)
 	struct buffer_head *m_bh, *l_bh;
 	u64 fs_total, new_free;
 
+	if (gfs2_trans_begin(sdp, 2 * RES_STATFS, 0) != 0)
+		return;
+
 	/* Total up the file system space, according to the latest rindex. */
 	fs_total = gfs2_ri_total(sdp);
 	if (gfs2_meta_inode_buffer(m_ip, &m_bh) != 0)
-		return;
+		goto out;
 
 	spin_lock(&sdp->sd_statfs_spin);
 	gfs2_statfs_change_in(m_sc, m_bh->b_data +
@@ -675,11 +678,14 @@ void adjust_fs_space(struct inode *inode)
 	gfs2_statfs_change(sdp, new_free, new_free, 0);
 
 	if (gfs2_meta_inode_buffer(l_ip, &l_bh) != 0)
-		goto out;
+		goto out2;
 	update_statfs(sdp, m_bh, l_bh);
 	brelse(l_bh);
-out:
+out2:
 	brelse(m_bh);
+out:
+	sdp->sd_rindex_uptodate = 0;
+	gfs2_trans_end(sdp);
 }
 
 /**
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index aa014725f84a..27c82f4aaf32 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -991,17 +991,28 @@ static void gfs2_write_unlock(struct inode *inode)
 	gfs2_glock_dq_uninit(&ip->i_gh);
 }
 
+static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
+				   unsigned len, struct iomap *iomap)
+{
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+
+	return gfs2_trans_begin(sdp, RES_DINODE + (len >> inode->i_blkbits), 0);
+}
+
 static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
 				 unsigned copied, struct page *page,
 				 struct iomap *iomap)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
 
-	if (page)
+	if (page && !gfs2_is_stuffed(ip))
 		gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+	gfs2_trans_end(sdp);
 }
 
 static const struct iomap_page_ops gfs2_iomap_page_ops = {
+	.page_prepare = gfs2_iomap_page_prepare,
 	.page_done = gfs2_iomap_page_done,
 };
 
@@ -1057,31 +1068,45 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 	if (alloc_required)
 		rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
 
-	ret = gfs2_trans_begin(sdp, rblocks, iomap->length >> inode->i_blkbits);
-	if (ret)
-		goto out_trans_fail;
+	if (unstuff || iomap->type == IOMAP_HOLE) {
+		struct gfs2_trans *tr;
 
-	if (unstuff) {
-		ret = gfs2_unstuff_dinode(ip, NULL);
+		ret = gfs2_trans_begin(sdp, rblocks,
+				       iomap->length >> inode->i_blkbits);
 		if (ret)
-			goto out_trans_end;
-		release_metapath(mp);
-		ret = gfs2_iomap_get(inode, iomap->offset, iomap->length,
-				     flags, iomap, mp);
-		if (ret)
-			goto out_trans_end;
-	}
+			goto out_trans_fail;
 
-	if (iomap->type == IOMAP_HOLE) {
-		ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
-		if (ret) {
-			gfs2_trans_end(sdp);
-			gfs2_inplace_release(ip);
-			punch_hole(ip, iomap->offset, iomap->length);
-			goto out_qunlock;
+		if (unstuff) {
+			ret = gfs2_unstuff_dinode(ip, NULL);
+			if (ret)
+				goto out_trans_end;
+			release_metapath(mp);
+			ret = gfs2_iomap_get(inode, iomap->offset,
+					     iomap->length, flags, iomap, mp);
+			if (ret)
+				goto out_trans_end;
+		}
+
+		if (iomap->type == IOMAP_HOLE) {
+			ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
+			if (ret) {
+				gfs2_trans_end(sdp);
+				gfs2_inplace_release(ip);
+				punch_hole(ip, iomap->offset, iomap->length);
+				goto out_qunlock;
+			}
 		}
+
+		tr = current->journal_info;
+		if (tr->tr_num_buf_new)
+			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+		else
+			gfs2_trans_add_meta(ip->i_gl, mp->mp_bh[0]);
+
+		gfs2_trans_end(sdp);
 	}
-	if (!gfs2_is_stuffed(ip) && gfs2_is_jdata(ip))
+
+	if (gfs2_is_stuffed(ip) || gfs2_is_jdata(ip))
 		iomap->page_ops = &gfs2_iomap_page_ops;
 	return 0;
 
@@ -1121,10 +1146,6 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 		    iomap->type != IOMAP_MAPPED)
 			ret = -ENOTBLK;
 	}
-	if (!ret) {
-		get_bh(mp.mp_bh[0]);
-		iomap->private = mp.mp_bh[0];
-	}
 	release_metapath(&mp);
 	trace_gfs2_iomap_end(ip, iomap, ret);
 	return ret;
@@ -1135,27 +1156,16 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct gfs2_trans *tr = current->journal_info;
-	struct buffer_head *dibh = iomap->private;
 
 	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
 		goto out;
 
-	if (iomap->type != IOMAP_INLINE) {
+	if (!gfs2_is_stuffed(ip))
 		gfs2_ordered_add_inode(ip);
 
-		if (tr->tr_num_buf_new)
-			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-		else
-			gfs2_trans_add_meta(ip->i_gl, dibh);
-	}
-
-	if (inode == sdp->sd_rindex) {
+	if (inode == sdp->sd_rindex)
 		adjust_fs_space(inode);
-		sdp->sd_rindex_uptodate = 0;
-	}
 
-	gfs2_trans_end(sdp);
 	gfs2_inplace_release(ip);
 
 	if (length != written && (iomap->flags & IOMAP_F_NEW)) {
@@ -1175,8 +1185,6 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	gfs2_write_unlock(inode);
 
 out:
-	if (dibh)
-		brelse(dibh);
 	return 0;
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 0/5] iomap and gfs2 fixes
  2019-04-29 22:09 ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-30  2:50   ` Darrick J. Wong
  -1 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30  2:50 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Jan Kara,
	Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
	linux-fsdevel, linux-mm

On Tue, Apr 30, 2019 at 12:09:29AM +0200, Andreas Gruenbacher wrote:
> Here's another update of this patch queue, hopefully with all wrinkles
> ironed out now.
> 
> Darrick, I think Linus would be unhappy seeing the first four patches in
> the gfs2 tree; could you put them into the xfs tree instead like we did
> some time ago already?

Sure.  When I'm done reviewing them I'll put them in the iomap tree,
though, since we now have a separate one. :)

--D

> Thanks,
> Andreas
> 
> Andreas Gruenbacher (4):
>   fs: Turn __generic_write_end into a void function
>   iomap: Fix use-after-free error in page_done callback
>   iomap: Add a page_prepare callback
>   gfs2: Fix iomap write page reclaim deadlock
> 
> Christoph Hellwig (1):
>   iomap: Clean up __generic_write_end calling
> 
>  fs/buffer.c           |   8 ++--
>  fs/gfs2/aops.c        |  14 ++++--
>  fs/gfs2/bmap.c        | 101 ++++++++++++++++++++++++------------------
>  fs/internal.h         |   2 +-
>  fs/iomap.c            |  55 ++++++++++++++---------
>  include/linux/iomap.h |  22 ++++++---
>  6 files changed, 124 insertions(+), 78 deletions(-)
> 
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 0/5] iomap and gfs2 fixes
@ 2019-04-30  2:50   ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30  2:50 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Apr 30, 2019 at 12:09:29AM +0200, Andreas Gruenbacher wrote:
> Here's another update of this patch queue, hopefully with all wrinkles
> ironed out now.
> 
> Darrick, I think Linus would be unhappy seeing the first four patches in
> the gfs2 tree; could you put them into the xfs tree instead like we did
> some time ago already?

Sure.  When I'm done reviewing them I'll put them in the iomap tree,
though, since we now have a separate one. :)

--D

> Thanks,
> Andreas
> 
> Andreas Gruenbacher (4):
>   fs: Turn __generic_write_end into a void function
>   iomap: Fix use-after-free error in page_done callback
>   iomap: Add a page_prepare callback
>   gfs2: Fix iomap write page reclaim deadlock
> 
> Christoph Hellwig (1):
>   iomap: Clean up __generic_write_end calling
> 
>  fs/buffer.c           |   8 ++--
>  fs/gfs2/aops.c        |  14 ++++--
>  fs/gfs2/bmap.c        | 101 ++++++++++++++++++++++++------------------
>  fs/internal.h         |   2 +-
>  fs/iomap.c            |  55 ++++++++++++++---------
>  include/linux/iomap.h |  22 ++++++---
>  6 files changed, 124 insertions(+), 78 deletions(-)
> 
> -- 
> 2.20.1
> 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 2/5] fs: Turn __generic_write_end into a void function
  2019-04-29 22:09   ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-30 10:29     ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2019-04-30 10:29 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: cluster-devel, Darrick J . Wong, Christoph Hellwig, Bob Peterson,
	Jan Kara, Dave Chinner, Ross Lagerwall, Mark Syms,
	Edwin Török, linux-fsdevel, linux-mm

On Tue, Apr 30, 2019 at 12:09:31AM +0200, Andreas Gruenbacher wrote:
> The VFS-internal __generic_write_end helper always returns the value of
> its @copied argument.  This can be confusing, and it isn't very useful
> anyway, so turn __generic_write_end into a function returning void
> instead.
> 
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 2/5] fs: Turn __generic_write_end into a void function
@ 2019-04-30 10:29     ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2019-04-30 10:29 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Apr 30, 2019 at 12:09:31AM +0200, Andreas Gruenbacher wrote:
> The VFS-internal __generic_write_end helper always returns the value of
> its @copied argument.  This can be confusing, and it isn't very useful
> anyway, so turn __generic_write_end into a function returning void
> instead.
> 
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 1/5] iomap: Clean up __generic_write_end calling
  2019-04-29 22:09   ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-30 15:14     ` Darrick J. Wong
  -1 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30 15:14 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Jan Kara,
	Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
	linux-fsdevel, linux-mm

On Tue, Apr 30, 2019 at 12:09:30AM +0200, Andreas Gruenbacher wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Move the call to __generic_write_end into iomap_write_end instead of
> duplicating it in each of the three branches.  This requires open coding
> the generic_write_end for the buffer_head case.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 97cb9d486a7d..2344c662e6fc 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -738,13 +738,11 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  	 * uptodate page as a zero-length write, and force the caller to redo
>  	 * the whole thing.
>  	 */
> -	if (unlikely(copied < len && !PageUptodate(page))) {
> -		copied = 0;
> -	} else {
> -		iomap_set_range_uptodate(page, offset_in_page(pos), len);
> -		iomap_set_page_dirty(page);
> -	}
> -	return __generic_write_end(inode, pos, copied, page);
> +	if (unlikely(copied < len && !PageUptodate(page)))
> +		return 0;
> +	iomap_set_range_uptodate(page, offset_in_page(pos), len);
> +	iomap_set_page_dirty(page);
> +	return copied;
>  }
>  
>  static int
> @@ -761,7 +759,6 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>  	kunmap_atomic(addr);
>  
>  	mark_inode_dirty(inode);
> -	__generic_write_end(inode, pos, copied, page);
>  	return copied;
>  }
>  
> @@ -774,12 +771,13 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  	if (iomap->type == IOMAP_INLINE) {
>  		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
>  	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> -		ret = generic_write_end(NULL, inode->i_mapping, pos, len,
> -				copied, page, NULL);
> +		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
> +				page, NULL);
>  	} else {
>  		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
>  	}
>  
> +	ret = __generic_write_end(inode, pos, ret, page);
>  	if (iomap->page_done)
>  		iomap->page_done(inode, pos, copied, page, iomap);
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 1/5] iomap: Clean up __generic_write_end calling
@ 2019-04-30 15:14     ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30 15:14 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Apr 30, 2019 at 12:09:30AM +0200, Andreas Gruenbacher wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Move the call to __generic_write_end into iomap_write_end instead of
> duplicating it in each of the three branches.  This requires open coding
> the generic_write_end for the buffer_head case.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 97cb9d486a7d..2344c662e6fc 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -738,13 +738,11 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  	 * uptodate page as a zero-length write, and force the caller to redo
>  	 * the whole thing.
>  	 */
> -	if (unlikely(copied < len && !PageUptodate(page))) {
> -		copied = 0;
> -	} else {
> -		iomap_set_range_uptodate(page, offset_in_page(pos), len);
> -		iomap_set_page_dirty(page);
> -	}
> -	return __generic_write_end(inode, pos, copied, page);
> +	if (unlikely(copied < len && !PageUptodate(page)))
> +		return 0;
> +	iomap_set_range_uptodate(page, offset_in_page(pos), len);
> +	iomap_set_page_dirty(page);
> +	return copied;
>  }
>  
>  static int
> @@ -761,7 +759,6 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>  	kunmap_atomic(addr);
>  
>  	mark_inode_dirty(inode);
> -	__generic_write_end(inode, pos, copied, page);
>  	return copied;
>  }
>  
> @@ -774,12 +771,13 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  	if (iomap->type == IOMAP_INLINE) {
>  		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
>  	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> -		ret = generic_write_end(NULL, inode->i_mapping, pos, len,
> -				copied, page, NULL);
> +		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
> +				page, NULL);
>  	} else {
>  		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
>  	}
>  
> +	ret = __generic_write_end(inode, pos, ret, page);
>  	if (iomap->page_done)
>  		iomap->page_done(inode, pos, copied, page, iomap);
>  
> -- 
> 2.20.1
> 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 2/5] fs: Turn __generic_write_end into a void function
  2019-04-29 22:09   ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-30 15:17     ` Darrick J. Wong
  -1 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30 15:17 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Jan Kara,
	Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
	linux-fsdevel, linux-mm

On Tue, Apr 30, 2019 at 12:09:31AM +0200, Andreas Gruenbacher wrote:
> The VFS-internal __generic_write_end helper always returns the value of
> its @copied argument.  This can be confusing, and it isn't very useful
> anyway, so turn __generic_write_end into a function returning void
> instead.

(Also weird that @copied is unsigned but the return value is signed...)

> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/buffer.c   | 6 +++---
>  fs/internal.h | 2 +-
>  fs/iomap.c    | 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index ce357602f471..e0d4c6a5e2d2 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2085,7 +2085,7 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
>  }
>  EXPORT_SYMBOL(block_write_begin);
>  
> -int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
> +void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
>  		struct page *page)
>  {
>  	loff_t old_size = inode->i_size;
> @@ -2116,7 +2116,6 @@ int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
>  	 */
>  	if (i_size_changed)
>  		mark_inode_dirty(inode);
> -	return copied;
>  }
>  
>  int block_write_end(struct file *file, struct address_space *mapping,
> @@ -2160,7 +2159,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
>  			struct page *page, void *fsdata)
>  {
>  	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
> -	return __generic_write_end(mapping->host, pos, copied, page);
> +	__generic_write_end(mapping->host, pos, copied, page);
> +	return copied;
>  }
>  EXPORT_SYMBOL(generic_write_end);
>  
> diff --git a/fs/internal.h b/fs/internal.h
> index 6a8b71643af4..530587fdf5d8 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -44,7 +44,7 @@ static inline int __sync_blockdev(struct block_device *bdev, int wait)
>  extern void guard_bio_eod(int rw, struct bio *bio);
>  extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
>  		get_block_t *get_block, struct iomap *iomap);
> -int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
> +void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
>  		struct page *page);
>  
>  /*
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 2344c662e6fc..f8c9722d1a97 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -777,7 +777,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
>  	}
>  
> -	ret = __generic_write_end(inode, pos, ret, page);
> +	__generic_write_end(inode, pos, ret, page);
>  	if (iomap->page_done)
>  		iomap->page_done(inode, pos, copied, page, iomap);
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 2/5] fs: Turn __generic_write_end into a void function
@ 2019-04-30 15:17     ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30 15:17 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Apr 30, 2019 at 12:09:31AM +0200, Andreas Gruenbacher wrote:
> The VFS-internal __generic_write_end helper always returns the value of
> its @copied argument.  This can be confusing, and it isn't very useful
> anyway, so turn __generic_write_end into a function returning void
> instead.

(Also weird that @copied is unsigned but the return value is signed...)

> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/buffer.c   | 6 +++---
>  fs/internal.h | 2 +-
>  fs/iomap.c    | 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index ce357602f471..e0d4c6a5e2d2 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2085,7 +2085,7 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
>  }
>  EXPORT_SYMBOL(block_write_begin);
>  
> -int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
> +void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
>  		struct page *page)
>  {
>  	loff_t old_size = inode->i_size;
> @@ -2116,7 +2116,6 @@ int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
>  	 */
>  	if (i_size_changed)
>  		mark_inode_dirty(inode);
> -	return copied;
>  }
>  
>  int block_write_end(struct file *file, struct address_space *mapping,
> @@ -2160,7 +2159,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
>  			struct page *page, void *fsdata)
>  {
>  	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
> -	return __generic_write_end(mapping->host, pos, copied, page);
> +	__generic_write_end(mapping->host, pos, copied, page);
> +	return copied;
>  }
>  EXPORT_SYMBOL(generic_write_end);
>  
> diff --git a/fs/internal.h b/fs/internal.h
> index 6a8b71643af4..530587fdf5d8 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -44,7 +44,7 @@ static inline int __sync_blockdev(struct block_device *bdev, int wait)
>  extern void guard_bio_eod(int rw, struct bio *bio);
>  extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
>  		get_block_t *get_block, struct iomap *iomap);
> -int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
> +void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
>  		struct page *page);
>  
>  /*
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 2344c662e6fc..f8c9722d1a97 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -777,7 +777,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
>  	}
>  
> -	ret = __generic_write_end(inode, pos, ret, page);
> +	__generic_write_end(inode, pos, ret, page);
>  	if (iomap->page_done)
>  		iomap->page_done(inode, pos, copied, page, iomap);
>  
> -- 
> 2.20.1
> 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 3/5] iomap: Fix use-after-free error in page_done callback
  2019-04-29 22:09   ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-30 15:23     ` Darrick J. Wong
  -1 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30 15:23 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Jan Kara,
	Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
	linux-fsdevel, linux-mm

On Tue, Apr 30, 2019 at 12:09:32AM +0200, Andreas Gruenbacher wrote:
> In iomap_write_end, we're not holding a page reference anymore when
> calling the page_done callback, but the callback needs that reference to
> access the page.  To fix that, move the put_page call in
> __generic_write_end into the callers of __generic_write_end.  Then, in
> iomap_write_end, put the page after calling the page_done callback.
> 
> Reported-by: Jan Kara <jack@suse.cz>
> Fixes: 63899c6f8851 ("iomap: add a page_done callback")
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/buffer.c | 2 +-
>  fs/iomap.c  | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index e0d4c6a5e2d2..0faa41fb4c88 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2104,7 +2104,6 @@ void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
>  	}
>  
>  	unlock_page(page);
> -	put_page(page);
>  
>  	if (old_size < pos)
>  		pagecache_isize_extended(inode, old_size, pos);
> @@ -2160,6 +2159,7 @@ int generic_write_end(struct file *file, struct address_space *mapping,
>  {
>  	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
>  	__generic_write_end(mapping->host, pos, copied, page);
> +	put_page(page);
>  	return copied;
>  }
>  EXPORT_SYMBOL(generic_write_end);
> diff --git a/fs/iomap.c b/fs/iomap.c
> index f8c9722d1a97..62e3461704ce 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -780,6 +780,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  	__generic_write_end(inode, pos, ret, page);
>  	if (iomap->page_done)
>  		iomap->page_done(inode, pos, copied, page, iomap);
> +	put_page(page);
>  
>  	if (ret < len)
>  		iomap_write_failed(inode, pos, len);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 3/5] iomap: Fix use-after-free error in page_done callback
@ 2019-04-30 15:23     ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30 15:23 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Apr 30, 2019 at 12:09:32AM +0200, Andreas Gruenbacher wrote:
> In iomap_write_end, we're not holding a page reference anymore when
> calling the page_done callback, but the callback needs that reference to
> access the page.  To fix that, move the put_page call in
> __generic_write_end into the callers of __generic_write_end.  Then, in
> iomap_write_end, put the page after calling the page_done callback.
> 
> Reported-by: Jan Kara <jack@suse.cz>
> Fixes: 63899c6f8851 ("iomap: add a page_done callback")
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/buffer.c | 2 +-
>  fs/iomap.c  | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index e0d4c6a5e2d2..0faa41fb4c88 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2104,7 +2104,6 @@ void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
>  	}
>  
>  	unlock_page(page);
> -	put_page(page);
>  
>  	if (old_size < pos)
>  		pagecache_isize_extended(inode, old_size, pos);
> @@ -2160,6 +2159,7 @@ int generic_write_end(struct file *file, struct address_space *mapping,
>  {
>  	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
>  	__generic_write_end(mapping->host, pos, copied, page);
> +	put_page(page);
>  	return copied;
>  }
>  EXPORT_SYMBOL(generic_write_end);
> diff --git a/fs/iomap.c b/fs/iomap.c
> index f8c9722d1a97..62e3461704ce 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -780,6 +780,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  	__generic_write_end(inode, pos, ret, page);
>  	if (iomap->page_done)
>  		iomap->page_done(inode, pos, copied, page, iomap);
> +	put_page(page);
>  
>  	if (ret < len)
>  		iomap_write_failed(inode, pos, len);
> -- 
> 2.20.1
> 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 4/5] iomap: Add a page_prepare callback
  2019-04-29 22:09   ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-30 15:26     ` Darrick J. Wong
  -1 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30 15:26 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Jan Kara,
	Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
	linux-fsdevel, linux-mm

On Tue, Apr 30, 2019 at 12:09:33AM +0200, Andreas Gruenbacher wrote:
> Move the page_done callback into a separate iomap_page_ops structure and
> add a page_prepare calback to be called before the next page is written
> to.  In gfs2, we'll want to start a transaction in page_prepare and end
> it in page_done.  Other filesystems that implement data journaling will
> require the same kind of mechanism.
> 
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Jan Kara <jack@suse.cz>

Looks fine, assuming you've done the appropriate QA/testing on the gfs2
side. :)

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/gfs2/bmap.c        | 15 ++++++++++-----
>  fs/iomap.c            | 36 ++++++++++++++++++++++++++----------
>  include/linux/iomap.h | 22 +++++++++++++++++-----
>  3 files changed, 53 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index 5da4ca9041c0..aa014725f84a 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -991,15 +991,20 @@ static void gfs2_write_unlock(struct inode *inode)
>  	gfs2_glock_dq_uninit(&ip->i_gh);
>  }
>  
> -static void gfs2_iomap_journaled_page_done(struct inode *inode, loff_t pos,
> -				unsigned copied, struct page *page,
> -				struct iomap *iomap)
> +static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
> +				 unsigned copied, struct page *page,
> +				 struct iomap *iomap)
>  {
>  	struct gfs2_inode *ip = GFS2_I(inode);
>  
> -	gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
> +	if (page)
> +		gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
>  }
>  
> +static const struct iomap_page_ops gfs2_iomap_page_ops = {
> +	.page_done = gfs2_iomap_page_done,
> +};
> +
>  static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
>  				  loff_t length, unsigned flags,
>  				  struct iomap *iomap,
> @@ -1077,7 +1082,7 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
>  		}
>  	}
>  	if (!gfs2_is_stuffed(ip) && gfs2_is_jdata(ip))
> -		iomap->page_done = gfs2_iomap_journaled_page_done;
> +		iomap->page_ops = &gfs2_iomap_page_ops;
>  	return 0;
>  
>  out_trans_end:
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 62e3461704ce..a3ffc83134ee 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -665,6 +665,7 @@ static int
>  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		struct page **pagep, struct iomap *iomap)
>  {
> +	const struct iomap_page_ops *page_ops = iomap->page_ops;
>  	pgoff_t index = pos >> PAGE_SHIFT;
>  	struct page *page;
>  	int status = 0;
> @@ -674,9 +675,17 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  	if (fatal_signal_pending(current))
>  		return -EINTR;
>  
> +	if (page_ops && page_ops->page_prepare) {
> +		status = page_ops->page_prepare(inode, pos, len, iomap);
> +		if (status)
> +			return status;
> +	}
> +
>  	page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
> -	if (!page)
> -		return -ENOMEM;
> +	if (!page) {
> +		status = -ENOMEM;
> +		goto out_no_page;
> +	}
>  
>  	if (iomap->type == IOMAP_INLINE)
>  		iomap_read_inline_data(inode, page, iomap);
> @@ -684,15 +693,21 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		status = __block_write_begin_int(page, pos, len, NULL, iomap);
>  	else
>  		status = __iomap_write_begin(inode, pos, len, page, iomap);
> -	if (unlikely(status)) {
> -		unlock_page(page);
> -		put_page(page);
> -		page = NULL;
>  
> -		iomap_write_failed(inode, pos, len);
> -	}
> +	if (unlikely(status))
> +		goto out_unlock;
>  
>  	*pagep = page;
> +	return 0;
> +
> +out_unlock:
> +	unlock_page(page);
> +	put_page(page);
> +	iomap_write_failed(inode, pos, len);
> +
> +out_no_page:
> +	if (page_ops && page_ops->page_done)
> +		page_ops->page_done(inode, pos, 0, NULL, iomap);
>  	return status;
>  }
>  
> @@ -766,6 +781,7 @@ static int
>  iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  		unsigned copied, struct page *page, struct iomap *iomap)
>  {
> +	const struct iomap_page_ops *page_ops = iomap->page_ops;
>  	int ret;
>  
>  	if (iomap->type == IOMAP_INLINE) {
> @@ -778,8 +794,8 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  	}
>  
>  	__generic_write_end(inode, pos, ret, page);
> -	if (iomap->page_done)
> -		iomap->page_done(inode, pos, copied, page, iomap);
> +	if (page_ops && page_ops->page_done)
> +		page_ops->page_done(inode, pos, copied, page, iomap);
>  	put_page(page);
>  
>  	if (ret < len)
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 0fefb5455bda..2103b94cb1bf 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -53,6 +53,8 @@ struct vm_fault;
>   */
>  #define IOMAP_NULL_ADDR -1ULL	/* addr is not valid */
>  
> +struct iomap_page_ops;
> +
>  struct iomap {
>  	u64			addr; /* disk offset of mapping, bytes */
>  	loff_t			offset;	/* file offset of mapping, bytes */
> @@ -63,12 +65,22 @@ struct iomap {
>  	struct dax_device	*dax_dev; /* dax_dev for dax operations */
>  	void			*inline_data;
>  	void			*private; /* filesystem private */
> +	const struct iomap_page_ops *page_ops;
> +};
>  
> -	/*
> -	 * Called when finished processing a page in the mapping returned in
> -	 * this iomap.  At least for now this is only supported in the buffered
> -	 * write path.
> -	 */
> +/*
> + * When a filesystem sets page_ops in an iomap mapping it returns, page_prepare
> + * and page_done will be called for each page written to.  This only applies to
> + * buffered writes as unbuffered writes will not typically have pages
> + * associated with them.
> + *
> + * When page_prepare succeeds, page_done will always be called to do any
> + * cleanup work necessary.  In that page_done call, @page will be NULL if the
> + * associated page could not be obtained.
> + */
> +struct iomap_page_ops {
> +	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
> +			struct iomap *iomap);
>  	void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
>  			struct page *page, struct iomap *iomap);
>  };
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 4/5] iomap: Add a page_prepare callback
@ 2019-04-30 15:26     ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30 15:26 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Apr 30, 2019 at 12:09:33AM +0200, Andreas Gruenbacher wrote:
> Move the page_done callback into a separate iomap_page_ops structure and
> add a page_prepare calback to be called before the next page is written
> to.  In gfs2, we'll want to start a transaction in page_prepare and end
> it in page_done.  Other filesystems that implement data journaling will
> require the same kind of mechanism.
> 
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Jan Kara <jack@suse.cz>

Looks fine, assuming you've done the appropriate QA/testing on the gfs2
side. :)

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/gfs2/bmap.c        | 15 ++++++++++-----
>  fs/iomap.c            | 36 ++++++++++++++++++++++++++----------
>  include/linux/iomap.h | 22 +++++++++++++++++-----
>  3 files changed, 53 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index 5da4ca9041c0..aa014725f84a 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -991,15 +991,20 @@ static void gfs2_write_unlock(struct inode *inode)
>  	gfs2_glock_dq_uninit(&ip->i_gh);
>  }
>  
> -static void gfs2_iomap_journaled_page_done(struct inode *inode, loff_t pos,
> -				unsigned copied, struct page *page,
> -				struct iomap *iomap)
> +static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
> +				 unsigned copied, struct page *page,
> +				 struct iomap *iomap)
>  {
>  	struct gfs2_inode *ip = GFS2_I(inode);
>  
> -	gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
> +	if (page)
> +		gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
>  }
>  
> +static const struct iomap_page_ops gfs2_iomap_page_ops = {
> +	.page_done = gfs2_iomap_page_done,
> +};
> +
>  static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
>  				  loff_t length, unsigned flags,
>  				  struct iomap *iomap,
> @@ -1077,7 +1082,7 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
>  		}
>  	}
>  	if (!gfs2_is_stuffed(ip) && gfs2_is_jdata(ip))
> -		iomap->page_done = gfs2_iomap_journaled_page_done;
> +		iomap->page_ops = &gfs2_iomap_page_ops;
>  	return 0;
>  
>  out_trans_end:
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 62e3461704ce..a3ffc83134ee 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -665,6 +665,7 @@ static int
>  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		struct page **pagep, struct iomap *iomap)
>  {
> +	const struct iomap_page_ops *page_ops = iomap->page_ops;
>  	pgoff_t index = pos >> PAGE_SHIFT;
>  	struct page *page;
>  	int status = 0;
> @@ -674,9 +675,17 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  	if (fatal_signal_pending(current))
>  		return -EINTR;
>  
> +	if (page_ops && page_ops->page_prepare) {
> +		status = page_ops->page_prepare(inode, pos, len, iomap);
> +		if (status)
> +			return status;
> +	}
> +
>  	page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
> -	if (!page)
> -		return -ENOMEM;
> +	if (!page) {
> +		status = -ENOMEM;
> +		goto out_no_page;
> +	}
>  
>  	if (iomap->type == IOMAP_INLINE)
>  		iomap_read_inline_data(inode, page, iomap);
> @@ -684,15 +693,21 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		status = __block_write_begin_int(page, pos, len, NULL, iomap);
>  	else
>  		status = __iomap_write_begin(inode, pos, len, page, iomap);
> -	if (unlikely(status)) {
> -		unlock_page(page);
> -		put_page(page);
> -		page = NULL;
>  
> -		iomap_write_failed(inode, pos, len);
> -	}
> +	if (unlikely(status))
> +		goto out_unlock;
>  
>  	*pagep = page;
> +	return 0;
> +
> +out_unlock:
> +	unlock_page(page);
> +	put_page(page);
> +	iomap_write_failed(inode, pos, len);
> +
> +out_no_page:
> +	if (page_ops && page_ops->page_done)
> +		page_ops->page_done(inode, pos, 0, NULL, iomap);
>  	return status;
>  }
>  
> @@ -766,6 +781,7 @@ static int
>  iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  		unsigned copied, struct page *page, struct iomap *iomap)
>  {
> +	const struct iomap_page_ops *page_ops = iomap->page_ops;
>  	int ret;
>  
>  	if (iomap->type == IOMAP_INLINE) {
> @@ -778,8 +794,8 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  	}
>  
>  	__generic_write_end(inode, pos, ret, page);
> -	if (iomap->page_done)
> -		iomap->page_done(inode, pos, copied, page, iomap);
> +	if (page_ops && page_ops->page_done)
> +		page_ops->page_done(inode, pos, copied, page, iomap);
>  	put_page(page);
>  
>  	if (ret < len)
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 0fefb5455bda..2103b94cb1bf 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -53,6 +53,8 @@ struct vm_fault;
>   */
>  #define IOMAP_NULL_ADDR -1ULL	/* addr is not valid */
>  
> +struct iomap_page_ops;
> +
>  struct iomap {
>  	u64			addr; /* disk offset of mapping, bytes */
>  	loff_t			offset;	/* file offset of mapping, bytes */
> @@ -63,12 +65,22 @@ struct iomap {
>  	struct dax_device	*dax_dev; /* dax_dev for dax operations */
>  	void			*inline_data;
>  	void			*private; /* filesystem private */
> +	const struct iomap_page_ops *page_ops;
> +};
>  
> -	/*
> -	 * Called when finished processing a page in the mapping returned in
> -	 * this iomap.  At least for now this is only supported in the buffered
> -	 * write path.
> -	 */
> +/*
> + * When a filesystem sets page_ops in an iomap mapping it returns, page_prepare
> + * and page_done will be called for each page written to.  This only applies to
> + * buffered writes as unbuffered writes will not typically have pages
> + * associated with them.
> + *
> + * When page_prepare succeeds, page_done will always be called to do any
> + * cleanup work necessary.  In that page_done call, @page will be NULL if the
> + * associated page could not be obtained.
> + */
> +struct iomap_page_ops {
> +	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
> +			struct iomap *iomap);
>  	void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
>  			struct page *page, struct iomap *iomap);
>  };
> -- 
> 2.20.1
> 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 5/5] gfs2: Fix iomap write page reclaim deadlock
  2019-04-29 22:09   ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-30 15:32     ` Darrick J. Wong
  -1 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30 15:32 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Jan Kara,
	Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
	linux-fsdevel, linux-mm

On Tue, Apr 30, 2019 at 12:09:34AM +0200, Andreas Gruenbacher wrote:
> Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
> buffered writes by starting a transaction in iomap_begin, writing a range of
> pages, and ending that transaction in iomap_end.  This approach suffers from
> two problems:
> 
>   (1) Any allocations necessary for the write are done in iomap_begin, so when
>   the data aren't journaled, there is no need for keeping the transaction open
>   until iomap_end.
> 
>   (2) Transactions keep the gfs2 log flush lock held.  When
>   iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
>   gfs2_write_inode, which will try to flush the log.  This requires taking the
>   log flush lock which is already held, resulting in a deadlock.

/me wonders how holding the log flush lock doesn't seriously limit
performance, but gfs2 isn't my fight so I'll set that aside and assume
that a patch S-o-B'd by both maintainers is ok. :)

How should we merge this patch #5?  It doesn't touch fs/iomap.c itself,
so do you want me to pull it into the iomap branch along with the
previous four patches?  That would be fine with me (and easier than a
multi-tree merge mess)...

--D

> 
> Fix both of these issues by not keeping transactions open from iomap_begin to
> iomap_end.  Instead, start a small transaction in page_prepare and end it in
> page_done when necessary.
> 
> Reported-by: Edwin Török <edvin.torok@citrix.com>
> Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/aops.c | 14 +++++---
>  fs/gfs2/bmap.c | 88 +++++++++++++++++++++++++++-----------------------
>  2 files changed, 58 insertions(+), 44 deletions(-)
> 
> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index 05dd78f4b2b3..6210d4429d84 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -649,7 +649,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
>   */
>  void adjust_fs_space(struct inode *inode)
>  {
> -	struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
> +	struct gfs2_sbd *sdp = GFS2_SB(inode);
>  	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
>  	struct gfs2_inode *l_ip = GFS2_I(sdp->sd_sc_inode);
>  	struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master;
> @@ -657,10 +657,13 @@ void adjust_fs_space(struct inode *inode)
>  	struct buffer_head *m_bh, *l_bh;
>  	u64 fs_total, new_free;
>  
> +	if (gfs2_trans_begin(sdp, 2 * RES_STATFS, 0) != 0)
> +		return;
> +
>  	/* Total up the file system space, according to the latest rindex. */
>  	fs_total = gfs2_ri_total(sdp);
>  	if (gfs2_meta_inode_buffer(m_ip, &m_bh) != 0)
> -		return;
> +		goto out;
>  
>  	spin_lock(&sdp->sd_statfs_spin);
>  	gfs2_statfs_change_in(m_sc, m_bh->b_data +
> @@ -675,11 +678,14 @@ void adjust_fs_space(struct inode *inode)
>  	gfs2_statfs_change(sdp, new_free, new_free, 0);
>  
>  	if (gfs2_meta_inode_buffer(l_ip, &l_bh) != 0)
> -		goto out;
> +		goto out2;
>  	update_statfs(sdp, m_bh, l_bh);
>  	brelse(l_bh);
> -out:
> +out2:
>  	brelse(m_bh);
> +out:
> +	sdp->sd_rindex_uptodate = 0;
> +	gfs2_trans_end(sdp);
>  }
>  
>  /**
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index aa014725f84a..27c82f4aaf32 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -991,17 +991,28 @@ static void gfs2_write_unlock(struct inode *inode)
>  	gfs2_glock_dq_uninit(&ip->i_gh);
>  }
>  
> +static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
> +				   unsigned len, struct iomap *iomap)
> +{
> +	struct gfs2_sbd *sdp = GFS2_SB(inode);
> +
> +	return gfs2_trans_begin(sdp, RES_DINODE + (len >> inode->i_blkbits), 0);
> +}
> +
>  static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
>  				 unsigned copied, struct page *page,
>  				 struct iomap *iomap)
>  {
>  	struct gfs2_inode *ip = GFS2_I(inode);
> +	struct gfs2_sbd *sdp = GFS2_SB(inode);
>  
> -	if (page)
> +	if (page && !gfs2_is_stuffed(ip))
>  		gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
> +	gfs2_trans_end(sdp);
>  }
>  
>  static const struct iomap_page_ops gfs2_iomap_page_ops = {
> +	.page_prepare = gfs2_iomap_page_prepare,
>  	.page_done = gfs2_iomap_page_done,
>  };
>  
> @@ -1057,31 +1068,45 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
>  	if (alloc_required)
>  		rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
>  
> -	ret = gfs2_trans_begin(sdp, rblocks, iomap->length >> inode->i_blkbits);
> -	if (ret)
> -		goto out_trans_fail;
> +	if (unstuff || iomap->type == IOMAP_HOLE) {
> +		struct gfs2_trans *tr;
>  
> -	if (unstuff) {
> -		ret = gfs2_unstuff_dinode(ip, NULL);
> +		ret = gfs2_trans_begin(sdp, rblocks,
> +				       iomap->length >> inode->i_blkbits);
>  		if (ret)
> -			goto out_trans_end;
> -		release_metapath(mp);
> -		ret = gfs2_iomap_get(inode, iomap->offset, iomap->length,
> -				     flags, iomap, mp);
> -		if (ret)
> -			goto out_trans_end;
> -	}
> +			goto out_trans_fail;
>  
> -	if (iomap->type == IOMAP_HOLE) {
> -		ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
> -		if (ret) {
> -			gfs2_trans_end(sdp);
> -			gfs2_inplace_release(ip);
> -			punch_hole(ip, iomap->offset, iomap->length);
> -			goto out_qunlock;
> +		if (unstuff) {
> +			ret = gfs2_unstuff_dinode(ip, NULL);
> +			if (ret)
> +				goto out_trans_end;
> +			release_metapath(mp);
> +			ret = gfs2_iomap_get(inode, iomap->offset,
> +					     iomap->length, flags, iomap, mp);
> +			if (ret)
> +				goto out_trans_end;
> +		}
> +
> +		if (iomap->type == IOMAP_HOLE) {
> +			ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
> +			if (ret) {
> +				gfs2_trans_end(sdp);
> +				gfs2_inplace_release(ip);
> +				punch_hole(ip, iomap->offset, iomap->length);
> +				goto out_qunlock;
> +			}
>  		}
> +
> +		tr = current->journal_info;
> +		if (tr->tr_num_buf_new)
> +			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> +		else
> +			gfs2_trans_add_meta(ip->i_gl, mp->mp_bh[0]);
> +
> +		gfs2_trans_end(sdp);
>  	}
> -	if (!gfs2_is_stuffed(ip) && gfs2_is_jdata(ip))
> +
> +	if (gfs2_is_stuffed(ip) || gfs2_is_jdata(ip))
>  		iomap->page_ops = &gfs2_iomap_page_ops;
>  	return 0;
>  
> @@ -1121,10 +1146,6 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
>  		    iomap->type != IOMAP_MAPPED)
>  			ret = -ENOTBLK;
>  	}
> -	if (!ret) {
> -		get_bh(mp.mp_bh[0]);
> -		iomap->private = mp.mp_bh[0];
> -	}
>  	release_metapath(&mp);
>  	trace_gfs2_iomap_end(ip, iomap, ret);
>  	return ret;
> @@ -1135,27 +1156,16 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
>  {
>  	struct gfs2_inode *ip = GFS2_I(inode);
>  	struct gfs2_sbd *sdp = GFS2_SB(inode);
> -	struct gfs2_trans *tr = current->journal_info;
> -	struct buffer_head *dibh = iomap->private;
>  
>  	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
>  		goto out;
>  
> -	if (iomap->type != IOMAP_INLINE) {
> +	if (!gfs2_is_stuffed(ip))
>  		gfs2_ordered_add_inode(ip);
>  
> -		if (tr->tr_num_buf_new)
> -			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> -		else
> -			gfs2_trans_add_meta(ip->i_gl, dibh);
> -	}
> -
> -	if (inode == sdp->sd_rindex) {
> +	if (inode == sdp->sd_rindex)
>  		adjust_fs_space(inode);
> -		sdp->sd_rindex_uptodate = 0;
> -	}
>  
> -	gfs2_trans_end(sdp);
>  	gfs2_inplace_release(ip);
>  
>  	if (length != written && (iomap->flags & IOMAP_F_NEW)) {
> @@ -1175,8 +1185,6 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
>  	gfs2_write_unlock(inode);
>  
>  out:
> -	if (dibh)
> -		brelse(dibh);
>  	return 0;
>  }
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 5/5] gfs2: Fix iomap write page reclaim deadlock
@ 2019-04-30 15:32     ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30 15:32 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Apr 30, 2019 at 12:09:34AM +0200, Andreas Gruenbacher wrote:
> Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
> buffered writes by starting a transaction in iomap_begin, writing a range of
> pages, and ending that transaction in iomap_end.  This approach suffers from
> two problems:
> 
>   (1) Any allocations necessary for the write are done in iomap_begin, so when
>   the data aren't journaled, there is no need for keeping the transaction open
>   until iomap_end.
> 
>   (2) Transactions keep the gfs2 log flush lock held.  When
>   iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
>   gfs2_write_inode, which will try to flush the log.  This requires taking the
>   log flush lock which is already held, resulting in a deadlock.

/me wonders how holding the log flush lock doesn't seriously limit
performance, but gfs2 isn't my fight so I'll set that aside and assume
that a patch S-o-B'd by both maintainers is ok. :)

How should we merge this patch #5?  It doesn't touch fs/iomap.c itself,
so do you want me to pull it into the iomap branch along with the
previous four patches?  That would be fine with me (and easier than a
multi-tree merge mess)...

--D

> 
> Fix both of these issues by not keeping transactions open from iomap_begin to
> iomap_end.  Instead, start a small transaction in page_prepare and end it in
> page_done when necessary.
> 
> Reported-by: Edwin T?r?k <edvin.torok@citrix.com>
> Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/aops.c | 14 +++++---
>  fs/gfs2/bmap.c | 88 +++++++++++++++++++++++++++-----------------------
>  2 files changed, 58 insertions(+), 44 deletions(-)
> 
> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index 05dd78f4b2b3..6210d4429d84 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -649,7 +649,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
>   */
>  void adjust_fs_space(struct inode *inode)
>  {
> -	struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
> +	struct gfs2_sbd *sdp = GFS2_SB(inode);
>  	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
>  	struct gfs2_inode *l_ip = GFS2_I(sdp->sd_sc_inode);
>  	struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master;
> @@ -657,10 +657,13 @@ void adjust_fs_space(struct inode *inode)
>  	struct buffer_head *m_bh, *l_bh;
>  	u64 fs_total, new_free;
>  
> +	if (gfs2_trans_begin(sdp, 2 * RES_STATFS, 0) != 0)
> +		return;
> +
>  	/* Total up the file system space, according to the latest rindex. */
>  	fs_total = gfs2_ri_total(sdp);
>  	if (gfs2_meta_inode_buffer(m_ip, &m_bh) != 0)
> -		return;
> +		goto out;
>  
>  	spin_lock(&sdp->sd_statfs_spin);
>  	gfs2_statfs_change_in(m_sc, m_bh->b_data +
> @@ -675,11 +678,14 @@ void adjust_fs_space(struct inode *inode)
>  	gfs2_statfs_change(sdp, new_free, new_free, 0);
>  
>  	if (gfs2_meta_inode_buffer(l_ip, &l_bh) != 0)
> -		goto out;
> +		goto out2;
>  	update_statfs(sdp, m_bh, l_bh);
>  	brelse(l_bh);
> -out:
> +out2:
>  	brelse(m_bh);
> +out:
> +	sdp->sd_rindex_uptodate = 0;
> +	gfs2_trans_end(sdp);
>  }
>  
>  /**
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index aa014725f84a..27c82f4aaf32 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -991,17 +991,28 @@ static void gfs2_write_unlock(struct inode *inode)
>  	gfs2_glock_dq_uninit(&ip->i_gh);
>  }
>  
> +static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
> +				   unsigned len, struct iomap *iomap)
> +{
> +	struct gfs2_sbd *sdp = GFS2_SB(inode);
> +
> +	return gfs2_trans_begin(sdp, RES_DINODE + (len >> inode->i_blkbits), 0);
> +}
> +
>  static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
>  				 unsigned copied, struct page *page,
>  				 struct iomap *iomap)
>  {
>  	struct gfs2_inode *ip = GFS2_I(inode);
> +	struct gfs2_sbd *sdp = GFS2_SB(inode);
>  
> -	if (page)
> +	if (page && !gfs2_is_stuffed(ip))
>  		gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
> +	gfs2_trans_end(sdp);
>  }
>  
>  static const struct iomap_page_ops gfs2_iomap_page_ops = {
> +	.page_prepare = gfs2_iomap_page_prepare,
>  	.page_done = gfs2_iomap_page_done,
>  };
>  
> @@ -1057,31 +1068,45 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
>  	if (alloc_required)
>  		rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
>  
> -	ret = gfs2_trans_begin(sdp, rblocks, iomap->length >> inode->i_blkbits);
> -	if (ret)
> -		goto out_trans_fail;
> +	if (unstuff || iomap->type == IOMAP_HOLE) {
> +		struct gfs2_trans *tr;
>  
> -	if (unstuff) {
> -		ret = gfs2_unstuff_dinode(ip, NULL);
> +		ret = gfs2_trans_begin(sdp, rblocks,
> +				       iomap->length >> inode->i_blkbits);
>  		if (ret)
> -			goto out_trans_end;
> -		release_metapath(mp);
> -		ret = gfs2_iomap_get(inode, iomap->offset, iomap->length,
> -				     flags, iomap, mp);
> -		if (ret)
> -			goto out_trans_end;
> -	}
> +			goto out_trans_fail;
>  
> -	if (iomap->type == IOMAP_HOLE) {
> -		ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
> -		if (ret) {
> -			gfs2_trans_end(sdp);
> -			gfs2_inplace_release(ip);
> -			punch_hole(ip, iomap->offset, iomap->length);
> -			goto out_qunlock;
> +		if (unstuff) {
> +			ret = gfs2_unstuff_dinode(ip, NULL);
> +			if (ret)
> +				goto out_trans_end;
> +			release_metapath(mp);
> +			ret = gfs2_iomap_get(inode, iomap->offset,
> +					     iomap->length, flags, iomap, mp);
> +			if (ret)
> +				goto out_trans_end;
> +		}
> +
> +		if (iomap->type == IOMAP_HOLE) {
> +			ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
> +			if (ret) {
> +				gfs2_trans_end(sdp);
> +				gfs2_inplace_release(ip);
> +				punch_hole(ip, iomap->offset, iomap->length);
> +				goto out_qunlock;
> +			}
>  		}
> +
> +		tr = current->journal_info;
> +		if (tr->tr_num_buf_new)
> +			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> +		else
> +			gfs2_trans_add_meta(ip->i_gl, mp->mp_bh[0]);
> +
> +		gfs2_trans_end(sdp);
>  	}
> -	if (!gfs2_is_stuffed(ip) && gfs2_is_jdata(ip))
> +
> +	if (gfs2_is_stuffed(ip) || gfs2_is_jdata(ip))
>  		iomap->page_ops = &gfs2_iomap_page_ops;
>  	return 0;
>  
> @@ -1121,10 +1146,6 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
>  		    iomap->type != IOMAP_MAPPED)
>  			ret = -ENOTBLK;
>  	}
> -	if (!ret) {
> -		get_bh(mp.mp_bh[0]);
> -		iomap->private = mp.mp_bh[0];
> -	}
>  	release_metapath(&mp);
>  	trace_gfs2_iomap_end(ip, iomap, ret);
>  	return ret;
> @@ -1135,27 +1156,16 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
>  {
>  	struct gfs2_inode *ip = GFS2_I(inode);
>  	struct gfs2_sbd *sdp = GFS2_SB(inode);
> -	struct gfs2_trans *tr = current->journal_info;
> -	struct buffer_head *dibh = iomap->private;
>  
>  	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
>  		goto out;
>  
> -	if (iomap->type != IOMAP_INLINE) {
> +	if (!gfs2_is_stuffed(ip))
>  		gfs2_ordered_add_inode(ip);
>  
> -		if (tr->tr_num_buf_new)
> -			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> -		else
> -			gfs2_trans_add_meta(ip->i_gl, dibh);
> -	}
> -
> -	if (inode == sdp->sd_rindex) {
> +	if (inode == sdp->sd_rindex)
>  		adjust_fs_space(inode);
> -		sdp->sd_rindex_uptodate = 0;
> -	}
>  
> -	gfs2_trans_end(sdp);
>  	gfs2_inplace_release(ip);
>  
>  	if (length != written && (iomap->flags & IOMAP_F_NEW)) {
> @@ -1175,8 +1185,6 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
>  	gfs2_write_unlock(inode);
>  
>  out:
> -	if (dibh)
> -		brelse(dibh);
>  	return 0;
>  }
>  
> -- 
> 2.20.1
> 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 5/5] gfs2: Fix iomap write page reclaim deadlock
  2019-04-30 15:32     ` [Cluster-devel] " Darrick J. Wong
@ 2019-04-30 15:39       ` Andreas Gruenbacher
  -1 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-30 15:39 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Jan Kara,
	Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
	linux-fsdevel, linux-mm

On Tue, 30 Apr 2019 at 17:33, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> On Tue, Apr 30, 2019 at 12:09:34AM +0200, Andreas Gruenbacher wrote:
> > Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
> > buffered writes by starting a transaction in iomap_begin, writing a range of
> > pages, and ending that transaction in iomap_end.  This approach suffers from
> > two problems:
> >
> >   (1) Any allocations necessary for the write are done in iomap_begin, so when
> >   the data aren't journaled, there is no need for keeping the transaction open
> >   until iomap_end.
> >
> >   (2) Transactions keep the gfs2 log flush lock held.  When
> >   iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
> >   gfs2_write_inode, which will try to flush the log.  This requires taking the
> >   log flush lock which is already held, resulting in a deadlock.
>
> /me wonders how holding the log flush lock doesn't seriously limit
> performance, but gfs2 isn't my fight so I'll set that aside and assume
> that a patch S-o-B'd by both maintainers is ok. :)

This only affects inline and journaled data, not standard writes, so
it's not quite as bad as it looks.

> How should we merge this patch #5?  It doesn't touch fs/iomap.c itself,
> so do you want me to pull it into the iomap branch along with the
> previous four patches?  That would be fine with me (and easier than a
> multi-tree merge mess)...

I'd prefer to get this merged via the gfs2 tree once the iomap fixes
have been pulled.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 5/5] gfs2: Fix iomap write page reclaim deadlock
@ 2019-04-30 15:39       ` Andreas Gruenbacher
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2019-04-30 15:39 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, 30 Apr 2019 at 17:33, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> On Tue, Apr 30, 2019 at 12:09:34AM +0200, Andreas Gruenbacher wrote:
> > Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
> > buffered writes by starting a transaction in iomap_begin, writing a range of
> > pages, and ending that transaction in iomap_end.  This approach suffers from
> > two problems:
> >
> >   (1) Any allocations necessary for the write are done in iomap_begin, so when
> >   the data aren't journaled, there is no need for keeping the transaction open
> >   until iomap_end.
> >
> >   (2) Transactions keep the gfs2 log flush lock held.  When
> >   iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
> >   gfs2_write_inode, which will try to flush the log.  This requires taking the
> >   log flush lock which is already held, resulting in a deadlock.
>
> /me wonders how holding the log flush lock doesn't seriously limit
> performance, but gfs2 isn't my fight so I'll set that aside and assume
> that a patch S-o-B'd by both maintainers is ok. :)

This only affects inline and journaled data, not standard writes, so
it's not quite as bad as it looks.

> How should we merge this patch #5?  It doesn't touch fs/iomap.c itself,
> so do you want me to pull it into the iomap branch along with the
> previous four patches?  That would be fine with me (and easier than a
> multi-tree merge mess)...

I'd prefer to get this merged via the gfs2 tree once the iomap fixes
have been pulled.

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 5/5] gfs2: Fix iomap write page reclaim deadlock
  2019-04-30 15:39       ` [Cluster-devel] " Andreas Gruenbacher
@ 2019-04-30 15:47         ` Darrick J. Wong
  -1 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30 15:47 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Jan Kara,
	Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
	linux-fsdevel, linux-mm

On Tue, Apr 30, 2019 at 05:39:28PM +0200, Andreas Gruenbacher wrote:
> On Tue, 30 Apr 2019 at 17:33, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > On Tue, Apr 30, 2019 at 12:09:34AM +0200, Andreas Gruenbacher wrote:
> > > Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
> > > buffered writes by starting a transaction in iomap_begin, writing a range of
> > > pages, and ending that transaction in iomap_end.  This approach suffers from
> > > two problems:
> > >
> > >   (1) Any allocations necessary for the write are done in iomap_begin, so when
> > >   the data aren't journaled, there is no need for keeping the transaction open
> > >   until iomap_end.
> > >
> > >   (2) Transactions keep the gfs2 log flush lock held.  When
> > >   iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
> > >   gfs2_write_inode, which will try to flush the log.  This requires taking the
> > >   log flush lock which is already held, resulting in a deadlock.
> >
> > /me wonders how holding the log flush lock doesn't seriously limit
> > performance, but gfs2 isn't my fight so I'll set that aside and assume
> > that a patch S-o-B'd by both maintainers is ok. :)
> 
> This only affects inline and journaled data, not standard writes, so
> it's not quite as bad as it looks.

Ah, ok.

> > How should we merge this patch #5?  It doesn't touch fs/iomap.c itself,
> > so do you want me to pull it into the iomap branch along with the
> > previous four patches?  That would be fine with me (and easier than a
> > multi-tree merge mess)...
> 
> I'd prefer to get this merged via the gfs2 tree once the iomap fixes
> have been pulled.

Ok, I'll take the first four patches through the iomap branch and cc you
on the pull request.

--D

> 
> Thanks,
> Andreas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 5/5] gfs2: Fix iomap write page reclaim deadlock
@ 2019-04-30 15:47         ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-04-30 15:47 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Apr 30, 2019 at 05:39:28PM +0200, Andreas Gruenbacher wrote:
> On Tue, 30 Apr 2019 at 17:33, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > On Tue, Apr 30, 2019 at 12:09:34AM +0200, Andreas Gruenbacher wrote:
> > > Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
> > > buffered writes by starting a transaction in iomap_begin, writing a range of
> > > pages, and ending that transaction in iomap_end.  This approach suffers from
> > > two problems:
> > >
> > >   (1) Any allocations necessary for the write are done in iomap_begin, so when
> > >   the data aren't journaled, there is no need for keeping the transaction open
> > >   until iomap_end.
> > >
> > >   (2) Transactions keep the gfs2 log flush lock held.  When
> > >   iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
> > >   gfs2_write_inode, which will try to flush the log.  This requires taking the
> > >   log flush lock which is already held, resulting in a deadlock.
> >
> > /me wonders how holding the log flush lock doesn't seriously limit
> > performance, but gfs2 isn't my fight so I'll set that aside and assume
> > that a patch S-o-B'd by both maintainers is ok. :)
> 
> This only affects inline and journaled data, not standard writes, so
> it's not quite as bad as it looks.

Ah, ok.

> > How should we merge this patch #5?  It doesn't touch fs/iomap.c itself,
> > so do you want me to pull it into the iomap branch along with the
> > previous four patches?  That would be fine with me (and easier than a
> > multi-tree merge mess)...
> 
> I'd prefer to get this merged via the gfs2 tree once the iomap fixes
> have been pulled.

Ok, I'll take the first four patches through the iomap branch and cc you
on the pull request.

--D

> 
> Thanks,
> Andreas



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 5/5] gfs2: Fix iomap write page reclaim deadlock
  2019-04-30 15:47         ` [Cluster-devel] " Darrick J. Wong
@ 2019-04-30 16:15           ` Andreas Grünbacher
  -1 siblings, 0 replies; 36+ messages in thread
From: Andreas Grünbacher @ 2019-04-30 16:15 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andreas Gruenbacher, cluster-devel, Christoph Hellwig,
	Bob Peterson, Jan Kara, Dave Chinner, Ross Lagerwall, Mark Syms,
	Edwin Török, linux-fsdevel, linux-mm

Am Di., 30. Apr. 2019 um 17:48 Uhr schrieb Darrick J. Wong
<darrick.wong@oracle.com>:
> Ok, I'll take the first four patches through the iomap branch and cc you
> on the pull request.

Ok great, thanks.

Andreas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 5/5] gfs2: Fix iomap write page reclaim deadlock
@ 2019-04-30 16:15           ` Andreas Grünbacher
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Grünbacher @ 2019-04-30 16:15 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Am Di., 30. Apr. 2019 um 17:48 Uhr schrieb Darrick J. Wong
<darrick.wong@oracle.com>:
> Ok, I'll take the first four patches through the iomap branch and cc you
> on the pull request.

Ok great, thanks.

Andreas



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 0/5] iomap and gfs2 fixes
  2019-04-30  2:50   ` [Cluster-devel] " Darrick J. Wong
@ 2019-04-30 21:21     ` Dave Chinner
  -1 siblings, 0 replies; 36+ messages in thread
From: Dave Chinner @ 2019-04-30 21:21 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andreas Gruenbacher, cluster-devel, Christoph Hellwig,
	Bob Peterson, Jan Kara, Ross Lagerwall, Mark Syms,
	Edwin Török, linux-fsdevel, linux-mm

On Mon, Apr 29, 2019 at 07:50:28PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 30, 2019 at 12:09:29AM +0200, Andreas Gruenbacher wrote:
> > Here's another update of this patch queue, hopefully with all wrinkles
> > ironed out now.
> > 
> > Darrick, I think Linus would be unhappy seeing the first four patches in
> > the gfs2 tree; could you put them into the xfs tree instead like we did
> > some time ago already?
> 
> Sure.  When I'm done reviewing them I'll put them in the iomap tree,
> though, since we now have a separate one. :)

I'd just keep the iomap stuff in the xfs tree as a separate set of
branches and merge them into the XFS for-next when composing it.
That way it still gets plenty of test coverage from all the XFS
devs and linux next without anyone having to think about.

You really only need to send separate pull requests for the iomap
and XFS branches - IMO, there's no really need to have a complete
new tree for it...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 0/5] iomap and gfs2 fixes
@ 2019-04-30 21:21     ` Dave Chinner
  0 siblings, 0 replies; 36+ messages in thread
From: Dave Chinner @ 2019-04-30 21:21 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon, Apr 29, 2019 at 07:50:28PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 30, 2019 at 12:09:29AM +0200, Andreas Gruenbacher wrote:
> > Here's another update of this patch queue, hopefully with all wrinkles
> > ironed out now.
> > 
> > Darrick, I think Linus would be unhappy seeing the first four patches in
> > the gfs2 tree; could you put them into the xfs tree instead like we did
> > some time ago already?
> 
> Sure.  When I'm done reviewing them I'll put them in the iomap tree,
> though, since we now have a separate one. :)

I'd just keep the iomap stuff in the xfs tree as a separate set of
branches and merge them into the XFS for-next when composing it.
That way it still gets plenty of test coverage from all the XFS
devs and linux next without anyone having to think about.

You really only need to send separate pull requests for the iomap
and XFS branches - IMO, there's no really need to have a complete
new tree for it...

Cheers,

Dave.
-- 
Dave Chinner
david at fromorbit.com



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 0/5] iomap and gfs2 fixes
  2019-04-30 21:21     ` [Cluster-devel] " Dave Chinner
@ 2019-05-01 15:06       ` Darrick J. Wong
  -1 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-05-01 15:06 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andreas Gruenbacher, cluster-devel, Christoph Hellwig,
	Bob Peterson, Jan Kara, Ross Lagerwall, Mark Syms,
	Edwin Török, linux-fsdevel, linux-mm

On Wed, May 01, 2019 at 07:21:46AM +1000, Dave Chinner wrote:
> On Mon, Apr 29, 2019 at 07:50:28PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 30, 2019 at 12:09:29AM +0200, Andreas Gruenbacher wrote:
> > > Here's another update of this patch queue, hopefully with all wrinkles
> > > ironed out now.
> > > 
> > > Darrick, I think Linus would be unhappy seeing the first four patches in
> > > the gfs2 tree; could you put them into the xfs tree instead like we did
> > > some time ago already?
> > 
> > Sure.  When I'm done reviewing them I'll put them in the iomap tree,
> > though, since we now have a separate one. :)
> 
> I'd just keep the iomap stuff in the xfs tree as a separate set of
> branches and merge them into the XFS for-next when composing it.
> That way it still gets plenty of test coverage from all the XFS
> devs and linux next without anyone having to think about.
> 
> You really only need to send separate pull requests for the iomap
> and XFS branches - IMO, there's no really need to have a complete
> new tree for it...

<nod> That was totally a braino on my part -- I put the patches in the
iomap *branch* since now we have a separate *branch*. :)

(and just merged that branch into for-next)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Cluster-devel] [PATCH v7 0/5] iomap and gfs2 fixes
@ 2019-05-01 15:06       ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2019-05-01 15:06 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, May 01, 2019 at 07:21:46AM +1000, Dave Chinner wrote:
> On Mon, Apr 29, 2019 at 07:50:28PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 30, 2019 at 12:09:29AM +0200, Andreas Gruenbacher wrote:
> > > Here's another update of this patch queue, hopefully with all wrinkles
> > > ironed out now.
> > > 
> > > Darrick, I think Linus would be unhappy seeing the first four patches in
> > > the gfs2 tree; could you put them into the xfs tree instead like we did
> > > some time ago already?
> > 
> > Sure.  When I'm done reviewing them I'll put them in the iomap tree,
> > though, since we now have a separate one. :)
> 
> I'd just keep the iomap stuff in the xfs tree as a separate set of
> branches and merge them into the XFS for-next when composing it.
> That way it still gets plenty of test coverage from all the XFS
> devs and linux next without anyone having to think about.
> 
> You really only need to send separate pull requests for the iomap
> and XFS branches - IMO, there's no really need to have a complete
> new tree for it...

<nod> That was totally a braino on my part -- I put the patches in the
iomap *branch* since now we have a separate *branch*. :)

(and just merged that branch into for-next)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david at fromorbit.com



^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2019-05-01 15:07 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-29 22:09 [PATCH v7 0/5] iomap and gfs2 fixes Andreas Gruenbacher
2019-04-29 22:09 ` [Cluster-devel] " Andreas Gruenbacher
2019-04-29 22:09 ` [PATCH v7 1/5] iomap: Clean up __generic_write_end calling Andreas Gruenbacher
2019-04-29 22:09   ` [Cluster-devel] " Andreas Gruenbacher
2019-04-30 15:14   ` Darrick J. Wong
2019-04-30 15:14     ` [Cluster-devel] " Darrick J. Wong
2019-04-29 22:09 ` [PATCH v7 2/5] fs: Turn __generic_write_end into a void function Andreas Gruenbacher
2019-04-29 22:09   ` [Cluster-devel] " Andreas Gruenbacher
2019-04-30 10:29   ` Christoph Hellwig
2019-04-30 10:29     ` [Cluster-devel] " Christoph Hellwig
2019-04-30 15:17   ` Darrick J. Wong
2019-04-30 15:17     ` [Cluster-devel] " Darrick J. Wong
2019-04-29 22:09 ` [PATCH v7 3/5] iomap: Fix use-after-free error in page_done callback Andreas Gruenbacher
2019-04-29 22:09   ` [Cluster-devel] " Andreas Gruenbacher
2019-04-30 15:23   ` Darrick J. Wong
2019-04-30 15:23     ` [Cluster-devel] " Darrick J. Wong
2019-04-29 22:09 ` [PATCH v7 4/5] iomap: Add a page_prepare callback Andreas Gruenbacher
2019-04-29 22:09   ` [Cluster-devel] " Andreas Gruenbacher
2019-04-30 15:26   ` Darrick J. Wong
2019-04-30 15:26     ` [Cluster-devel] " Darrick J. Wong
2019-04-29 22:09 ` [PATCH v7 5/5] gfs2: Fix iomap write page reclaim deadlock Andreas Gruenbacher
2019-04-29 22:09   ` [Cluster-devel] " Andreas Gruenbacher
2019-04-30 15:32   ` Darrick J. Wong
2019-04-30 15:32     ` [Cluster-devel] " Darrick J. Wong
2019-04-30 15:39     ` Andreas Gruenbacher
2019-04-30 15:39       ` [Cluster-devel] " Andreas Gruenbacher
2019-04-30 15:47       ` Darrick J. Wong
2019-04-30 15:47         ` [Cluster-devel] " Darrick J. Wong
2019-04-30 16:15         ` Andreas Grünbacher
2019-04-30 16:15           ` [Cluster-devel] " Andreas Grünbacher
2019-04-30  2:50 ` [PATCH v7 0/5] iomap and gfs2 fixes Darrick J. Wong
2019-04-30  2:50   ` [Cluster-devel] " Darrick J. Wong
2019-04-30 21:21   ` Dave Chinner
2019-04-30 21:21     ` [Cluster-devel] " Dave Chinner
2019-05-01 15:06     ` Darrick J. Wong
2019-05-01 15:06       ` [Cluster-devel] " Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.