linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 1/2] iomap: Add a page_prepare callback
@ 2019-04-24 17:18 Andreas Gruenbacher
  2019-04-24 17:18 ` [PATCH 2/2] gfs2: Fix iomap write page reclaim deadlock Andreas Gruenbacher
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Andreas Gruenbacher @ 2019-04-24 17:18 UTC
  To: cluster-devel, Christoph Hellwig
  Cc: Bob Peterson, Jan Kara, Dave Chinner, Ross Lagerwall, Mark Syms,
	Edwin Török, linux-fsdevel, linux-mm,
	Andreas Gruenbacher

Add a page_prepare callback that's called before a page is written to.  This
will be used by gfs2 to start a transaction in page_prepare and end it in
page_done.  Other filesystems that implement data journaling will require the
same kind of mechanism.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/iomap.c            | 4 ++++
 include/linux/iomap.h | 9 ++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 97cb9d486a7d..abd9aa76dbd1 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -684,6 +684,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		status = __block_write_begin_int(page, pos, len, NULL, iomap);
 	else
 		status = __iomap_write_begin(inode, pos, len, page, iomap);
+
+	if (likely(!status) && iomap->page_prepare)
+		status = iomap->page_prepare(inode, pos, len, page, iomap);
+
 	if (unlikely(status)) {
 		unlock_page(page);
 		put_page(page);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0fefb5455bda..0982f3e13e56 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -65,10 +65,13 @@ struct iomap {
 	void			*private; /* filesystem private */
 
 	/*
-	 * Called when finished processing a page in the mapping returned in
-	 * this iomap.  At least for now this is only supported in the buffered
-	 * write path.
+	 * Called before / after processing a page in the mapping returned in
+	 * this iomap.  At least for now, this is only supported in the
+	 * buffered write path.  When page_prepare returns 0 for a page,
+	 * page_done is called for that page as well.
 	 */
+	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
+			struct page *page, struct iomap *iomap);
 	void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
 			struct page *page, struct iomap *iomap);
 };
-- 
2.20.1
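
A minimal userspace sketch of the contract documented in the iomap.h comment
above: page_prepare runs only after the block-level preparation succeeded, and
page_done runs for a page only when page_prepare returned 0 for it.  The
structures, the demo_* hooks and the model_write_begin()/model_write_end()
helpers are hypothetical simplifications, not kernel code; only the callback
names and argument lists follow the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct page { int locked; };
struct inode { int open_trans; int fail_trans_begin; };

struct iomap {
	/* Both hooks are optional, as in the patch above. */
	int (*page_prepare)(struct inode *inode, long long pos, unsigned len,
			    struct page *page, struct iomap *iomap);
	void (*page_done)(struct inode *inode, long long pos, unsigned copied,
			  struct page *page, struct iomap *iomap);
};

/* gfs2-like hooks: open a transaction per page, close it in page_done. */
static int demo_page_prepare(struct inode *inode, long long pos, unsigned len,
			     struct page *page, struct iomap *iomap)
{
	(void)pos; (void)len; (void)page; (void)iomap;
	if (inode->fail_trans_begin)
		return -5;	/* any nonzero error, e.g. -EIO */
	inode->open_trans++;
	return 0;
}

static void demo_page_done(struct inode *inode, long long pos, unsigned copied,
			   struct page *page, struct iomap *iomap)
{
	(void)pos; (void)copied; (void)page; (void)iomap;
	inode->open_trans--;
}

/* Mirrors the new hunk in iomap_write_begin(). */
static int model_write_begin(struct inode *inode, long long pos, unsigned len,
			     struct page *page, struct iomap *iomap)
{
	int status = 0;		/* result of __iomap_write_begin() etc. */

	page->locked = 1;	/* the page is already locked at this point */
	if (status == 0 && iomap->page_prepare)
		status = iomap->page_prepare(inode, pos, len, page, iomap);
	if (status)
		page->locked = 0;	/* unlock_page() + put_page() */
	return status;
}

/* Only reached when model_write_begin() returned 0. */
static void model_write_end(struct inode *inode, long long pos,
			    unsigned copied, struct page *page,
			    struct iomap *iomap)
{
	if (iomap->page_done)
		iomap->page_done(inode, pos, copied, page, iomap);
}
```

A failing page_prepare leaves no transaction open and the page released, so
the caller never reaches page_done for that page.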



* [PATCH 2/2] gfs2: Fix iomap write page reclaim deadlock
  2019-04-24 17:18 [PATCH 1/2] iomap: Add a page_prepare callback Andreas Gruenbacher
@ 2019-04-24 17:18 ` Andreas Gruenbacher
  2019-04-25  7:59 ` [PATCH 1/2] iomap: Add a page_prepare callback Christoph Hellwig
  2019-04-25  8:32 ` Jan Kara
  2 siblings, 0 replies; 6+ messages in thread
From: Andreas Gruenbacher @ 2019-04-24 17:18 UTC
  To: cluster-devel, Christoph Hellwig
  Cc: Bob Peterson, Jan Kara, Dave Chinner, Ross Lagerwall, Mark Syms,
	Edwin Török, linux-fsdevel, linux-mm,
	Andreas Gruenbacher

Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
buffered writes by starting a transaction in iomap_begin, writing a range of
pages, and ending that transaction in iomap_end.  This approach suffers from
two problems:

  (1) Any allocations necessary for the write are done in iomap_begin, so when
  the data aren't journaled, there is no need for keeping the transaction open
  until iomap_end.

  (2) Transactions keep the gfs2 log flush lock held.  When
  iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
  gfs2_write_inode, which will try to flush the log.  This requires taking the
  log flush lock which is already held, resulting in a deadlock.

Fix both of these issues by not keeping transactions open from iomap_begin to
iomap_end.  Instead, start a small transaction in page_prepare and end it in
page_done when necessary.
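
The deadlock in (2) and the shape of the fix can be modelled in userspace.
This is a sketch under stated assumptions: the log flush lock is reduced to a
reader count (transactions hold it shared, a log flush needs it exclusively),
and flush_log() stands in for the balance_dirty_pages() -> gfs2_write_inode()
-> log flush chain.  All names are hypothetical.  The new scheme is also
simplified: per the patch, the per-page transaction is only set up for stuffed
or jdata inodes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the log flush lock: transactions take it shared. */
struct log_flush_lock { int readers; };

static void trans_begin(struct log_flush_lock *l) { l->readers++; }
static void trans_end(struct log_flush_lock *l)   { l->readers--; }

/* A log flush needs the lock exclusively.  A real lock would block forever
 * if this task itself holds it shared; the model reports false instead. */
static bool flush_log(struct log_flush_lock *l)
{
	return l->readers == 0;
}

/* Old scheme: one transaction from iomap_begin to iomap_end, with dirty
 * page throttling happening for every page in between. */
static bool write_old(struct log_flush_lock *l, int npages)
{
	bool ok = true;

	trans_begin(l);				/* iomap_begin */
	for (int i = 0; i < npages; i++)
		ok = flush_log(l) && ok;	/* balance_dirty_pages() */
	trans_end(l);				/* iomap_end */
	return ok;
}

/* New scheme: a small transaction per page, ended in page_done before the
 * throttling point is reached. */
static bool write_new(struct log_flush_lock *l, int npages)
{
	bool ok = true;

	for (int i = 0; i < npages; i++) {
		trans_begin(l);			/* page_prepare */
		trans_end(l);			/* page_done */
		ok = flush_log(l) && ok;	/* balance_dirty_pages() */
	}
	return ok;
}
```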

Reported-by: Edwin Török <edvin.torok@citrix.com>
Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/aops.c | 14 +++++--
 fs/gfs2/bmap.c | 99 ++++++++++++++++++++++++++++----------------------
 2 files changed, 65 insertions(+), 48 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 05dd78f4b2b3..6210d4429d84 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -649,7 +649,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
  */
 void adjust_fs_space(struct inode *inode)
 {
-	struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
 	struct gfs2_inode *l_ip = GFS2_I(sdp->sd_sc_inode);
 	struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master;
@@ -657,10 +657,13 @@ void adjust_fs_space(struct inode *inode)
 	struct buffer_head *m_bh, *l_bh;
 	u64 fs_total, new_free;
 
+	if (gfs2_trans_begin(sdp, 2 * RES_STATFS, 0) != 0)
+		return;
+
 	/* Total up the file system space, according to the latest rindex. */
 	fs_total = gfs2_ri_total(sdp);
 	if (gfs2_meta_inode_buffer(m_ip, &m_bh) != 0)
-		return;
+		goto out;
 
 	spin_lock(&sdp->sd_statfs_spin);
 	gfs2_statfs_change_in(m_sc, m_bh->b_data +
@@ -675,11 +678,14 @@ void adjust_fs_space(struct inode *inode)
 	gfs2_statfs_change(sdp, new_free, new_free, 0);
 
 	if (gfs2_meta_inode_buffer(l_ip, &l_bh) != 0)
-		goto out;
+		goto out2;
 	update_statfs(sdp, m_bh, l_bh);
 	brelse(l_bh);
-out:
+out2:
 	brelse(m_bh);
+out:
+	sdp->sd_rindex_uptodate = 0;
+	gfs2_trans_end(sdp);
 }
 
 /**
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 5da4ca9041c0..34543a4d4e4a 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -991,13 +991,25 @@ static void gfs2_write_unlock(struct inode *inode)
 	gfs2_glock_dq_uninit(&ip->i_gh);
 }
 
-static void gfs2_iomap_journaled_page_done(struct inode *inode, loff_t pos,
-				unsigned copied, struct page *page,
-				struct iomap *iomap)
+static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
+				   unsigned len, struct page *page,
+				   struct iomap *iomap)
+{
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+
+	return gfs2_trans_begin(sdp, RES_DINODE + (len >> inode->i_blkbits), 0);
+}
+
+static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
+				 unsigned copied, struct page *page,
+				 struct iomap *iomap)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
 
-	gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+	if (!gfs2_is_stuffed(ip))
+		gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+	gfs2_trans_end(sdp);
 }
 
 static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
@@ -1052,32 +1064,48 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 	if (alloc_required)
 		rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
 
-	ret = gfs2_trans_begin(sdp, rblocks, iomap->length >> inode->i_blkbits);
-	if (ret)
-		goto out_trans_fail;
+	if (unstuff || iomap->type == IOMAP_HOLE) {
+		struct gfs2_trans *tr;
 
-	if (unstuff) {
-		ret = gfs2_unstuff_dinode(ip, NULL);
+		ret = gfs2_trans_begin(sdp, rblocks,
+				       iomap->length >> inode->i_blkbits);
 		if (ret)
-			goto out_trans_end;
-		release_metapath(mp);
-		ret = gfs2_iomap_get(inode, iomap->offset, iomap->length,
-				     flags, iomap, mp);
-		if (ret)
-			goto out_trans_end;
-	}
+			goto out_trans_fail;
 
-	if (iomap->type == IOMAP_HOLE) {
-		ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
-		if (ret) {
-			gfs2_trans_end(sdp);
-			gfs2_inplace_release(ip);
-			punch_hole(ip, iomap->offset, iomap->length);
-			goto out_qunlock;
+		if (unstuff) {
+			ret = gfs2_unstuff_dinode(ip, NULL);
+			if (ret)
+				goto out_trans_end;
+			release_metapath(mp);
+			ret = gfs2_iomap_get(inode, iomap->offset,
+					     iomap->length, flags, iomap, mp);
+			if (ret)
+				goto out_trans_end;
+		}
+
+		if (iomap->type == IOMAP_HOLE) {
+			ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
+			if (ret) {
+				gfs2_trans_end(sdp);
+				gfs2_inplace_release(ip);
+				punch_hole(ip, iomap->offset, iomap->length);
+				goto out_qunlock;
+			}
 		}
+
+		tr = current->journal_info;
+		if (tr->tr_num_buf_new)
+			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+		else
+			gfs2_trans_add_meta(ip->i_gl, mp->mp_bh[0]);
+
+		gfs2_trans_end(sdp);
+	}
+
+	if (gfs2_is_stuffed(ip) || gfs2_is_jdata(ip)) {
+		iomap->page_prepare = gfs2_iomap_page_prepare;
+		iomap->page_done = gfs2_iomap_page_done;
 	}
-	if (!gfs2_is_stuffed(ip) && gfs2_is_jdata(ip))
-		iomap->page_done = gfs2_iomap_journaled_page_done;
 	return 0;
 
 out_trans_end:
@@ -1116,10 +1144,6 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 		    iomap->type != IOMAP_MAPPED)
 			ret = -ENOTBLK;
 	}
-	if (!ret) {
-		get_bh(mp.mp_bh[0]);
-		iomap->private = mp.mp_bh[0];
-	}
 	release_metapath(&mp);
 	trace_gfs2_iomap_end(ip, iomap, ret);
 	return ret;
@@ -1130,27 +1154,16 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct gfs2_trans *tr = current->journal_info;
-	struct buffer_head *dibh = iomap->private;
 
 	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
 		goto out;
 
-	if (iomap->type != IOMAP_INLINE) {
+	if (!gfs2_is_stuffed(ip))
 		gfs2_ordered_add_inode(ip);
 
-		if (tr->tr_num_buf_new)
-			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-		else
-			gfs2_trans_add_meta(ip->i_gl, dibh);
-	}
-
-	if (inode == sdp->sd_rindex) {
+	if (inode == sdp->sd_rindex)
 		adjust_fs_space(inode);
-		sdp->sd_rindex_uptodate = 0;
-	}
 
-	gfs2_trans_end(sdp);
 	gfs2_inplace_release(ip);
 
 	if (length != written && (iomap->flags & IOMAP_F_NEW)) {
@@ -1170,8 +1183,6 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	gfs2_write_unlock(inode);
 
 out:
-	if (dibh)
-		brelse(dibh);
 	return 0;
 }
 
-- 
2.20.1
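
The per-page reservation in gfs2_iomap_page_prepare() is
RES_DINODE + (len >> inode->i_blkbits) blocks.  A small sketch of that
arithmetic, assuming RES_DINODE is 1 (check fs/gfs2/trans.h for the actual
value); the helper name is hypothetical.  For a 4096-byte write, 4 KiB blocks
(i_blkbits = 12) give 2 blocks and 1 KiB blocks (i_blkbits = 10) give 5.
Since the shift rounds down, a len shorter than one block adds nothing
beyond the dinode.

```c
#include <assert.h>

/* Assumed value of RES_DINODE; verify against fs/gfs2/trans.h. */
#define DEMO_RES_DINODE 1

/* Journal blocks requested per page: the dinode plus one block for each
 * whole filesystem block covered by len (the shift rounds down). */
static unsigned demo_page_prepare_blocks(unsigned len, unsigned blkbits)
{
	return DEMO_RES_DINODE + (len >> blkbits);
}
```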



* Re: [PATCH 1/2] iomap: Add a page_prepare callback
  2019-04-24 17:18 [PATCH 1/2] iomap: Add a page_prepare callback Andreas Gruenbacher
  2019-04-24 17:18 ` [PATCH 2/2] gfs2: Fix iomap write page reclaim deadlock Andreas Gruenbacher
@ 2019-04-25  7:59 ` Christoph Hellwig
  2019-04-25  8:32 ` Jan Kara
  2 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2019-04-25  7:59 UTC
  To: Andreas Gruenbacher
  Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Jan Kara,
	Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
	linux-fsdevel, linux-mm

On Wed, Apr 24, 2019 at 07:18:03PM +0200, Andreas Gruenbacher wrote:
> Add a page_prepare callback that's called before a page is written to.  This
> will be used by gfs2 to start a transaction in page_prepare and end it in
> page_done.  Other filesystems that implement data journaling will require the
> same kind of mechanism.

This looks basically fine to me.  But I think it would be nicer to
add an iomap_page_ops structure so that we don't have to add more
pointers directly to the iomap.  We can make that struct pointer const
also to avoid runtime overwriting attacks.
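
A hypothetical sketch of what such an iomap_page_ops structure could look
like; the struct and field names follow this thread, while demo_page_ops and
call_page_prepare() are illustrative only.  The iomap carries a single const
pointer, and a statically allocated const table can typically be placed in
read-only memory, which is what makes runtime overwriting harder.

```c
#include <assert.h>
#include <stddef.h>

struct inode;
struct page;
struct iomap;

/* Hypothetical ops table along the lines suggested above. */
struct iomap_page_ops {
	int (*page_prepare)(struct inode *inode, long long pos, unsigned len,
			    struct page *page, struct iomap *iomap);
	void (*page_done)(struct inode *inode, long long pos, unsigned copied,
			  struct page *page, struct iomap *iomap);
};

struct iomap {
	/* ... other fields elided ... */
	const struct iomap_page_ops *page_ops;	/* NULL: no per-page hooks */
};

static int demo_prepare_calls;

static int demo_page_prepare(struct inode *inode, long long pos, unsigned len,
			     struct page *page, struct iomap *iomap)
{
	(void)inode; (void)pos; (void)len; (void)page; (void)iomap;
	demo_prepare_calls++;
	return 0;
}

/* const and static: a candidate for read-only memory. */
static const struct iomap_page_ops demo_page_ops = {
	.page_prepare	= demo_page_prepare,
	/* .page_done left NULL: each hook stays individually optional */
};

/* How the iomap side might call through the table. */
static int call_page_prepare(struct inode *inode, long long pos, unsigned len,
			     struct page *page, struct iomap *iomap)
{
	const struct iomap_page_ops *ops = iomap->page_ops;

	if (ops && ops->page_prepare)
		return ops->page_prepare(inode, pos, len, page, iomap);
	return 0;
}
```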


* Re: [PATCH 1/2] iomap: Add a page_prepare callback
  2019-04-24 17:18 [PATCH 1/2] iomap: Add a page_prepare callback Andreas Gruenbacher
  2019-04-24 17:18 ` [PATCH 2/2] gfs2: Fix iomap write page reclaim deadlock Andreas Gruenbacher
  2019-04-25  7:59 ` [PATCH 1/2] iomap: Add a page_prepare callback Christoph Hellwig
@ 2019-04-25  8:32 ` Jan Kara
  2019-04-25 15:03   ` Christoph Hellwig
  2019-04-25 15:26   ` Andreas Gruenbacher
  2 siblings, 2 replies; 6+ messages in thread
From: Jan Kara @ 2019-04-25  8:32 UTC
  To: Andreas Gruenbacher
  Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Jan Kara,
	Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
	linux-fsdevel, linux-mm

On Wed 24-04-19 19:18:03, Andreas Gruenbacher wrote:
> Add a page_prepare callback that's called before a page is written to.  This
> will be used by gfs2 to start a transaction in page_prepare and end it in
> page_done.  Other filesystems that implement data journaling will require the
> same kind of mechanism.
> 
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

Thanks for the patch. Some comments below.

> diff --git a/fs/iomap.c b/fs/iomap.c
> index 97cb9d486a7d..abd9aa76dbd1 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -684,6 +684,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		status = __block_write_begin_int(page, pos, len, NULL, iomap);
>  	else
>  		status = __iomap_write_begin(inode, pos, len, page, iomap);
> +
> +	if (likely(!status) && iomap->page_prepare)
> +		status = iomap->page_prepare(inode, pos, len, page, iomap);
> +
>  	if (unlikely(status)) {
>  		unlock_page(page);
>  		put_page(page);

So this gets called after a page is locked. Is it OK for GFS2 to acquire
sd_log_flush_lock under page lock? Because e.g. gfs2_write_jdata_pagevec()
seems to acquire these locks the other way around so that could cause ABBA
deadlocks?
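
The ordering problem can be illustrated with a tiny lock-order recorder in the
spirit of lockdep, much simplified: it only checks direct pairs, not
transitive chains.  The lock names are labels and record_order() is
hypothetical; the two call sequences in the usage follow the description above
of gfs2_write_jdata_pagevec() and of the patched iomap_write_begin().

```c
#include <assert.h>
#include <stdbool.h>

/* The two locks from the discussion; the names are labels only. */
enum demo_lock { PAGE_LOCK, LOG_FLUSH_LOCK, NR_DEMO_LOCKS };

/* seen[a][b] is true once lock b has been taken while a was held. */
static bool seen[NR_DEMO_LOCKS][NR_DEMO_LOCKS];

/* Record "outer held while acquiring inner".  Returns false if the opposite
 * order was recorded earlier, i.e. an ABBA inversion. */
static bool record_order(enum demo_lock outer, enum demo_lock inner)
{
	if (seen[inner][outer])
		return false;
	seen[outer][inner] = true;
	return true;
}
```

In the usage, the writeback path (transaction first, then lock_page()) is
recorded first; the patched write path (page locked, then ->page_prepare
starting a transaction) is then reported as an inversion.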

Also just looking at the code I was wondering about the following. E.g. in
iomap_write_end() we have code like:

        if (iomap->type == IOMAP_INLINE) {
		foo
	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
		bar
	} else {
		baz
	}

	if (iomap->page_done)
		iomap->page_done(...);

And now something very similar is in iomap_write_begin(). So wouldn't it be
more natural to just mandate ->page_prepare() and ->page_done() callbacks
and each filesystem would set it to a helper function it needs? Probably we
could get rid of IOMAP_F_BUFFER_HEAD flag that way...
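
A toy sketch of the two dispatch styles being compared; every name here is
hypothetical and the helpers return constants in place of foo/bar/baz.  The
branching version decides inside iomap, while under the proposal the
filesystem stores the helper it needs and iomap makes one unconditional
indirect call, including for the default case, which is the cost Christoph's
reply objects to.

```c
#include <assert.h>

struct demo_map;
typedef int (*prepare_fn)(const struct demo_map *m);

/* Hypothetical per-case helpers standing in for foo/bar/baz. */
static int prepare_inline(const struct demo_map *m)      { (void)m; return 1; }
static int prepare_buffer_head(const struct demo_map *m) { (void)m; return 2; }
static int prepare_default(const struct demo_map *m)     { (void)m; return 3; }

struct demo_map {
	int is_inline;			/* IOMAP_INLINE in the thread */
	int uses_buffer_heads;		/* IOMAP_F_BUFFER_HEAD in the thread */
	prepare_fn page_prepare;	/* mandatory under the proposal */
};

/* Current style: iomap branches on the mapping itself. */
static int dispatch_branching(const struct demo_map *m)
{
	if (m->is_inline)
		return prepare_inline(m);
	if (m->uses_buffer_heads)
		return prepare_buffer_head(m);
	return prepare_default(m);
}

/* Proposed style: the filesystem picked the helper; one indirect call. */
static int dispatch_mandatory(const struct demo_map *m)
{
	return m->page_prepare(m);
}
```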

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 1/2] iomap: Add a page_prepare callback
  2019-04-25  8:32 ` Jan Kara
@ 2019-04-25 15:03   ` Christoph Hellwig
  2019-04-25 15:26   ` Andreas Gruenbacher
  1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2019-04-25 15:03 UTC
  To: Jan Kara
  Cc: Andreas Gruenbacher, cluster-devel, Christoph Hellwig,
	Bob Peterson, Dave Chinner, Ross Lagerwall, Mark Syms,
	Edwin Török, linux-fsdevel, linux-mm

On Thu, Apr 25, 2019 at 10:32:52AM +0200, Jan Kara wrote:
> Also just looking at the code I was wondering about the following. E.g. in
> iomap_write_end() we have code like:
> 
>         if (iomap->type == IOMAP_INLINE) {
> 		foo
> 	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> 		bar
> 	} else {
> 		baz
> 	}
> 
> 	if (iomap->page_done)
> 		iomap->page_done(...);
> 
> And now something very similar is in iomap_write_begin(). So wouldn't it be
> more natural to just mandate ->page_prepare() and ->page_done() callbacks
> and each filesystem would set it to a helper function it needs? Probably we
> could get rid of IOMAP_F_BUFFER_HEAD flag that way...

I don't want pointless indirect calls for the default, non-buffer
head case.  Also inline really is a special case independent of
what the caller could pass in as flags or callbacks.  We could try to
hide the buffer_head stuff in there, but then again I'd rather kill
that off sooner than later.


* Re: [PATCH 1/2] iomap: Add a page_prepare callback
  2019-04-25  8:32 ` Jan Kara
  2019-04-25 15:03   ` Christoph Hellwig
@ 2019-04-25 15:26   ` Andreas Gruenbacher
  1 sibling, 0 replies; 6+ messages in thread
From: Andreas Gruenbacher @ 2019-04-25 15:26 UTC
  To: Jan Kara
  Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Dave Chinner,
	Ross Lagerwall, Mark Syms, Edwin Török, linux-fsdevel,
	linux-mm

On Thu, 25 Apr 2019 at 10:32, Jan Kara <jack@suse.cz> wrote:
> On Wed 24-04-19 19:18:03, Andreas Gruenbacher wrote:
> > Add a page_prepare callback that's called before a page is written to.  This
> > will be used by gfs2 to start a transaction in page_prepare and end it in
> > page_done.  Other filesystems that implement data journaling will require the
> > same kind of mechanism.
> >
> > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
>
> Thanks for the patch. Some comments below.
>
> > diff --git a/fs/iomap.c b/fs/iomap.c
> > index 97cb9d486a7d..abd9aa76dbd1 100644
> > --- a/fs/iomap.c
> > +++ b/fs/iomap.c
> > @@ -684,6 +684,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> >               status = __block_write_begin_int(page, pos, len, NULL, iomap);
> >       else
> >               status = __iomap_write_begin(inode, pos, len, page, iomap);
> > +
> > +     if (likely(!status) && iomap->page_prepare)
> > +             status = iomap->page_prepare(inode, pos, len, page, iomap);
> > +
> >       if (unlikely(status)) {
> >               unlock_page(page);
> >               put_page(page);
>
> So this gets called after a page is locked. Is it OK for GFS2 to acquire
> sd_log_flush_lock under page lock? Because e.g. gfs2_write_jdata_pagevec()
> seems to acquire these locks the other way around so that could cause ABBA
> deadlocks?

Good catch, the callback indeed needs to happen earlier.
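
One way to picture "earlier": start the transaction (page_prepare) before the
page is locked, so that the log flush lock is always taken before the page
lock, matching the order used by gfs2_write_jdata_pagevec().  A userspace
sketch with hypothetical names that flags any transaction begun while the
page lock is held; release order is not checked.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: which of the two locks the writing task currently holds. */
struct demo_task { bool page_locked; bool in_trans; bool inverted; };

static void demo_trans_begin(struct demo_task *t)
{
	if (t->page_locked)
		t->inverted = true;	/* log flush lock taken under page lock */
	t->in_trans = true;
}
static void demo_trans_end(struct demo_task *t)   { t->in_trans = false; }
static void demo_lock_page(struct demo_task *t)   { t->page_locked = true; }
static void demo_unlock_page(struct demo_task *t) { t->page_locked = false; }

/* Patch 1/2 as posted: ->page_prepare runs with the page already locked. */
static void write_page_as_posted(struct demo_task *t)
{
	demo_lock_page(t);
	demo_trans_begin(t);	/* ->page_prepare */
	demo_trans_end(t);	/* ->page_done; release order not modelled */
	demo_unlock_page(t);
}

/* Reordered: the transaction is started before the page is locked. */
static void write_page_reordered(struct demo_task *t)
{
	demo_trans_begin(t);	/* ->page_prepare */
	demo_lock_page(t);
	demo_unlock_page(t);
	demo_trans_end(t);	/* ->page_done; release order not modelled */
}
```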

Thanks,
Andreas


end of thread

Thread overview: 6+ messages
2019-04-24 17:18 [PATCH 1/2] iomap: Add a page_prepare callback Andreas Gruenbacher
2019-04-24 17:18 ` [PATCH 2/2] gfs2: Fix iomap write page reclaim deadlock Andreas Gruenbacher
2019-04-25  7:59 ` [PATCH 1/2] iomap: Add a page_prepare callback Christoph Hellwig
2019-04-25  8:32 ` Jan Kara
2019-04-25 15:03   ` Christoph Hellwig
2019-04-25 15:26   ` Andreas Gruenbacher
