linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Gruenbacher <agruenba@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Andreas Gruenbacher" <agruenba@redhat.com>,
	"Christoph Hellwig" <hch@lst.de>,
	cluster-devel <cluster-devel@redhat.com>,
	"Ross Lagerwall" <ross.lagerwall@citrix.com>,
	"Mark Syms" <Mark.Syms@citrix.com>,
	"Edwin Török" <edvin.torok@citrix.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: gfs2 iomap dealock, IOMAP_F_UNBALANCED
Date: Fri, 22 Mar 2019 01:21:00 +0100	[thread overview]
Message-ID: <20190322002100.5628-1-agruenba@redhat.com> (raw)
In-Reply-To: <CAHc6FU5T2cMSycj=9BHCnPLkWT8MsdJGSJ8akh3cd6ni0b6WRA@mail.gmail.com>

On Fri, 22 Mar 2019 at 00:01, Andreas Gruenbacher <agruenba@redhat.com> wrote:
> On Thu, 21 Mar 2019 at 22:43, Dave Chinner <david@fromorbit.com> wrote:
> > The problem is calling balance_dirty_pages() inside the
> > ->iomap_begin/->iomap_end calls and not that it is called by the
> > iomap infrastructure itself, right?
> >
> > Is so, I'd prefer to see this in iomap_apply() after the call to
> > ops->iomap_end because iomap_file_buffered_write() can iterate and
> > call iomap_apply() multiple times. This would keep the balancing to
> > a per-iomap granularity, rather than a per-syscall granularity.
> >
> > i.e. if we do write(2GB), we want more than one balancing call
> > during that syscall, so it would be up to the filesystem to a) limit
> > the size of write mappings to something smaller (e.g. 1024 pages)
> > so that there are still frequent balancing calls for large writes.
>
> Hmm. The looping across multiple mappings isn't done in iomap_apply
> but in iomap_file_buffered_write, so the balancing could go into
> iomap_apply or iomap_file_buffered_write, but can't go further up the
> stack. Given that, iomap_file_buffered_write seems the better place,
> but this is still quite horrible.

Here's a more reasonable version of my first patch, with a cleaned up
and hopefully fixed gfs2 part.

In addition, this checks for IOMAP_F_UNBALANCED in iomap_dirty_actor,
the actor for iomap_file_dirty.  We don't use iomap_file_dirty in gfs2,
but we should probably allowing to skip the dirty page balancing there
as well.

Thanks,
Andreas
---
 fs/gfs2/bmap.c        | 64 +++++++++++++++++++++++++++++++++----------
 fs/iomap.c            |  6 ++--
 include/linux/iomap.h |  1 +
 3 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 02b2646d84b3a..628d66d07fc6c 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -974,6 +974,19 @@ static void gfs2_iomap_journaled_page_done(struct inode *inode, loff_t pos,
 	gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
 }
 
+static void gfs2_write_end(struct inode *inode, struct buffer_head *dibh)
+{
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_trans *tr = current->journal_info;
+
+	gfs2_ordered_add_inode(ip);
+
+	if (tr->tr_num_buf_new)
+		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+	else
+		gfs2_trans_add_meta(ip->i_gl, dibh);
+}
+
 static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 				  loff_t length, unsigned flags,
 				  struct iomap *iomap,
@@ -996,6 +1009,25 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 	if (ret)
 		goto out_unlock;
 
+	if (gfs2_is_stuffed(ip) || gfs2_is_jdata(ip) || inode == sdp->sd_rindex) {
+		int max_pages;
+		u64 max_length;
+
+		iomap->flags |= IOMAP_F_UNBALANCED;
+
+		/*
+		 * Limit the write size: this ensures that write throttling
+		 * will kick in fast enough even when we don't call
+		 * balance_dirty_pages_ratelimited for each page written.
+		 */
+		max_pages = current->nr_dirtied_pause - current->nr_dirtied;
+		if (max_pages < 8)
+			max_pages = 8;
+		max_length = (u64)max_pages << PAGE_SHIFT;
+		if (iomap->length > max_length)
+			iomap->length = max_length;
+	}
+
 	alloc_required = unstuff || iomap->type == IOMAP_HOLE;
 
 	if (alloc_required || gfs2_is_jdata(ip))
@@ -1052,6 +1084,11 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 	}
 	if (!gfs2_is_stuffed(ip) && gfs2_is_jdata(ip))
 		iomap->page_done = gfs2_iomap_journaled_page_done;
+
+	if (!(iomap->flags & IOMAP_F_UNBALANCED)) {
+		gfs2_write_end(inode, mp->mp_bh[0]);
+		gfs2_trans_end(sdp);
+	}
 	return 0;
 
 out_trans_end:
@@ -1103,30 +1140,29 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 			  ssize_t written, unsigned flags, struct iomap *iomap)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct gfs2_trans *tr = current->journal_info;
 	struct buffer_head *dibh = iomap->private;
 
 	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
 		goto out;
 
-	if (iomap->type != IOMAP_INLINE) {
-		gfs2_ordered_add_inode(ip);
+	if (current->journal_info) {
+		struct gfs2_sbd *sdp = GFS2_SB(inode);
 
-		if (tr->tr_num_buf_new)
-			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-		else
-			gfs2_trans_add_meta(ip->i_gl, dibh);
-	}
+		if (iomap->type != IOMAP_INLINE)
+			gfs2_write_end(inode, dibh);
 
-	if (inode == sdp->sd_rindex) {
-		adjust_fs_space(inode);
-		sdp->sd_rindex_uptodate = 0;
-	}
+		if (inode == sdp->sd_rindex) {
+			adjust_fs_space(inode);
+			sdp->sd_rindex_uptodate = 0;
+		}
 
-	gfs2_trans_end(sdp);
+		gfs2_trans_end(sdp);
+	}
 	gfs2_inplace_release(ip);
 
+	if (iomap->flags & IOMAP_F_UNBALANCED)
+		balance_dirty_pages_ratelimited(inode->i_mapping);
+
 	if (length != written && (iomap->flags & IOMAP_F_NEW)) {
 		/* Deallocate blocks that were just allocated. */
 		loff_t blockmask = i_blocksize(inode) - 1;
diff --git a/fs/iomap.c b/fs/iomap.c
index 97cb9d486a7da..5f950fee0834f 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -863,7 +863,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		written += copied;
 		length -= copied;
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		if (!(iomap->flags & IOMAP_F_UNBALANCED))
+			balance_dirty_pages_ratelimited(inode->i_mapping);
 	} while (iov_iter_count(i) && length);
 
 	return written ? written : status;
@@ -945,7 +946,8 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		written += status;
 		length -= status;
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		if (!(iomap->flags & IOMAP_F_UNBALANCED))
+			balance_dirty_pages_ratelimited(inode->i_mapping);
 	} while (length);
 
 	return written;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0fefb5455bdaf..e9a04e76a3217 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -35,6 +35,7 @@ struct vm_fault;
 #define IOMAP_F_NEW		0x01	/* blocks have been newly allocated */
 #define IOMAP_F_DIRTY		0x02	/* uncommitted metadata */
 #define IOMAP_F_BUFFER_HEAD	0x04	/* file system requires buffer heads */
+#define IOMAP_F_UNBALANCED	0x08	/* don't balance dirty pages */
 
 /*
  * Flags that only need to be reported for IOMAP_REPORT requests:
-- 
2.20.1


  parent reply	other threads:[~2019-03-22  0:21 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-21 13:13 gfs2 iomap dealock, IOMAP_F_UNBALANCED Andreas Gruenbacher
2019-03-21 21:43 ` Dave Chinner
2019-03-21 23:01   ` Andreas Gruenbacher
2019-03-22  0:21   ` Andreas Gruenbacher [this message]
2019-03-27 16:49     ` Ross Lagerwall
2019-03-28 16:51 ` Christoph Hellwig
2019-03-29 22:13   ` Andreas Gruenbacher
2019-04-07  7:32     ` Christoph Hellwig
2019-04-08  8:53       ` Andreas Gruenbacher
2019-04-08 13:44         ` Jan Kara
2019-04-09 12:15           ` Christoph Hellwig
2019-04-09 12:27             ` Andreas Gruenbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190322002100.5628-1-agruenba@redhat.com \
    --to=agruenba@redhat.com \
    --cc=Mark.Syms@citrix.com \
    --cc=cluster-devel@redhat.com \
    --cc=david@fromorbit.com \
    --cc=edvin.torok@citrix.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=ross.lagerwall@citrix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).