linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Andreas Gruenbacher <agruenba@redhat.com>
Cc: "Christoph Hellwig" <hch@lst.de>,
	cluster-devel <cluster-devel@redhat.com>,
	"Dave Chinner" <david@fromorbit.com>,
	"Ross Lagerwall" <ross.lagerwall@citrix.com>,
	"Mark Syms" <Mark.Syms@citrix.com>,
	"Edwin Török" <edvin.torok@citrix.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"Jan Kara" <jack@suse.cz>,
	linux-mm@kvack.org
Subject: Re: gfs2 iomap dealock, IOMAP_F_UNBALANCED
Date: Mon, 8 Apr 2019 15:44:05 +0200	[thread overview]
Message-ID: <20190408134405.GA15023@quack2.suse.cz> (raw)
In-Reply-To: <CAHc6FU7kgm4OyrY-KRb8H2w6LDrWDSJ2p=UgZeeJ8YrHynKU2w@mail.gmail.com>

On Mon 08-04-19 10:53:34, Andreas Gruenbacher wrote:
> On Sun, 7 Apr 2019 at 09:32, Christoph Hellwig <hch@lst.de> wrote:
> >
> > [adding Jan and linux-mm]
> >
> > On Fri, Mar 29, 2019 at 11:13:00PM +0100, Andreas Gruenbacher wrote:
> > > > But what is the requirement to do this in writeback context?  Can't
> > > > we move it out into another context instead?
> > >
> > > Indeed, this isn't for data integrity in this case but because the
> > > dirty limit is exceeded. What other context would you suggest to move
> > > this to?
> > >
> > > (The iomap flag I've proposed would save us from getting into this
> > > situation in the first place.)
> >
> > Your patch does two things:
> >
> >  - it only calls balance_dirty_pages_ratelimited once per write
> >    operation instead of once per page.  In the past btrfs did
> >    hacks like that, but IIRC they caused VM balancing issues.
> >    That is why everyone now calls balance_dirty_pages_ratelimited
> >    one per page.  If calling it at a coarse granularity would
> >    be fine we should do it everywhere instead of just in gfs2
> >    in journaled mode
> >  - it artifically reduces the size of writes to a low value,
> >    which I suspect is going to break real life application
> 
> Not quite, balance_dirty_pages_ratelimited is called from iomap_end,
> so once per iomap mapping returned, not per write. (The first version
> of this patch got that wrong by accident, but not the second.)
> 
> We can limit the size of the mappings returned just in that case. I'm
> aware that there is a risk of balancing problems, I just don't have
> any better ideas.
> 
> This is a problem all filesystems with data-journaling will have with
> iomap, it's not that gfs2 is doing anything particularly stupid.

I agree that if ext4 would be using iomap, it would have similar issues.

> > So I really think we need to fix this properly.  And if that means
> > that you can't make use of the iomap batching for gfs2 in journaled
> > mode that is still a better option.
> 
> That would mean using the old-style, page-size allocations, and a
> completely separate write path in that case. That would be quite a
> nightmare.
> 
> > But I really think you need
> > to look into the scope of your flush_log and figure out a good way
> > to reduce that as solve the root cause.
> 
> We won't be able to do a log flush while another transaction is
> active, but that's what's needed to clean dirty pages. iomap doesn't
> allow us to put the block allocation into a separate transaction from
> the page writes; for that, the opposite to the page_done hook would
> probably be needed.

I agree that a ->page_prepare() hook would be probably the cleanest
solution for this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2019-04-08 13:44 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-21 13:13 gfs2 iomap dealock, IOMAP_F_UNBALANCED Andreas Gruenbacher
2019-03-21 21:43 ` Dave Chinner
2019-03-21 23:01   ` Andreas Gruenbacher
2019-03-22  0:21   ` Andreas Gruenbacher
2019-03-27 16:49     ` Ross Lagerwall
2019-03-28 16:51 ` Christoph Hellwig
2019-03-29 22:13   ` Andreas Gruenbacher
2019-04-07  7:32     ` Christoph Hellwig
2019-04-08  8:53       ` Andreas Gruenbacher
2019-04-08 13:44         ` Jan Kara [this message]
2019-04-09 12:15           ` Christoph Hellwig
2019-04-09 12:27             ` Andreas Gruenbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190408134405.GA15023@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=Mark.Syms@citrix.com \
    --cc=agruenba@redhat.com \
    --cc=cluster-devel@redhat.com \
    --cc=david@fromorbit.com \
    --cc=edvin.torok@citrix.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ross.lagerwall@citrix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).