linux-btrfs.vger.kernel.org archive mirror
From: David Sterba <dsterba@suse.cz>
To: Martin Raiber <martin@urbackup.org>
Cc: Chris Mason <clm@fb.com>, Ethan Lien <ethanlien@synology.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	David Sterba <dsterba@suse.cz>
Subject: Re: [PATCH v2] btrfs: balance dirty metadata pages in btrfs_finish_ordered_io
Date: Wed, 12 Dec 2018 16:36:16 +0100
Message-ID: <20181212153616.GY23615@twin.jikos.cz>
In-Reply-To: <01020167a30347da-385e2eff-ed13-422a-b27f-c3d5933aaef2-000000@eu-west-1.amazonses.com>

On Wed, Dec 12, 2018 at 03:22:40PM +0000, Martin Raiber wrote:
> On 12.12.2018 15:47 Chris Mason wrote:
> > On 28 May 2018, at 1:48, Ethan Lien wrote:
> >
> > It took me a while to trigger, but this actually deadlocks ;)  More 
> > below.
> >
> >> [Problem description and how we fix it]
> >> We should balance dirty metadata pages at the end of
> >> btrfs_finish_ordered_io, since a small, unmergeable random write can
> >> potentially produce dirty metadata which is multiple times larger than
> >> the data itself. For example, a small, unmergeable 4KiB write may
> >> produce:
> >>
> >>     16KiB dirty leaf (and possibly 16KiB dirty node) in subvolume tree
> >>     16KiB dirty leaf (and possibly 16KiB dirty node) in checksum tree
> >>     16KiB dirty leaf (and possibly 16KiB dirty node) in extent tree
> >>
> >> Although we do call balance dirty pages in write side, but in the
> >> buffered write path, most metadata are dirtied only after we reach the
> >> dirty background limit (which by far only counts dirty data pages) and
> >> wakeup the flusher thread. If there are many small, unmergeable random
> >> writes spread in a large btree, we'll find a burst of dirty pages
> >> exceeds the dirty_bytes limit after we wakeup the flusher thread - which
> >> is not what we expect. In our machine, it caused out-of-memory problem
> >> since a page cannot be dropped if it is marked dirty.
> >>
> >> Someone may worry about we may sleep in
> >> btrfs_btree_balance_dirty_nodelay, but since we do
> >> btrfs_finish_ordered_io in a separate worker, it will not stop the
> >> flusher consuming dirty pages. Also, we use different worker for
> >> metadata writeback endio, sleep in btrfs_finish_ordered_io help us
> >> throttle the size of dirty metadata pages.
> > In general, slowing down btrfs_finish_ordered_io isn't ideal because it 
> > adds latency to places we need to finish quickly.  Also, 
> > btrfs_finish_ordered_io is used by the free space cache.  Even though 
> > this happens from its own workqueue, it means completing free space 
> > cache writeback may end up waiting on balance_dirty_pages, something 
> > like this stack trace:
> >
> > [..]
> >
> > Eventually, we have every process in the system waiting on 
> > balance_dirty_pages(), and nobody is able to make progress on page 
> > writeback.
> >
> I had lockups with this patch as well. If you put e.g. a loop device on
> top of a btrfs file, loop sets PF_LESS_THROTTLE to avoid a feed back
> loop causing delays. The task balancing dirty pages in
> btrfs_finish_ordered_io doesn't have the flag and causes slow-downs. In
> my case it managed to cause a feedback loop where it queues other
> btrfs_finish_ordered_io and gets stuck completely.

This does not look like the artificial, hard-to-hit case described in
the original patch. I'm thinking about sending a revert for 4.20-rc6;
the deadlock is IMO worse than the OOM.
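
For reference, the amplification figures in the quoted patch description
can be worked through as a back-of-the-envelope calculation. This is only
an illustrative sketch, not code from the patch: it assumes the sizes given
in the example (a 4KiB write, 16KiB leaves and nodes, three trees), and the
function and constant names are hypothetical.

```python
# Rough estimate of dirty metadata produced by one small random write.
# All sizes come from the example in the quoted patch description;
# the names used here are illustrative only.

KIB = 1024
DATA_WRITE = 4 * KIB    # one small, unmergeable write
BLOCK_SIZE = 16 * KIB   # one dirty leaf or node in the example
TREES = ("subvolume", "checksum", "extent")


def dirty_metadata_bytes(with_parent_nodes: bool) -> int:
    """One dirty leaf per tree, plus optionally one dirty node per tree."""
    blocks_per_tree = 2 if with_parent_nodes else 1
    return len(TREES) * blocks_per_tree * BLOCK_SIZE


low = dirty_metadata_bytes(with_parent_nodes=False)
high = dirty_metadata_bytes(with_parent_nodes=True)

print(low // KIB, high // KIB)                # 48 96 (KiB of dirty metadata)
print(low // DATA_WRITE, high // DATA_WRITE)  # 12 24 (multiple of the data size)
```

Under these assumptions a single 4KiB write dirties between 48KiB and
96KiB of metadata, i.e. 12 to 24 times the size of the data itself.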

Thread overview: 13+ messages
2018-05-28  5:48 [PATCH v2] btrfs: balance dirty metadata pages in btrfs_finish_ordered_io Ethan Lien
2018-05-29 15:33 ` David Sterba
2018-12-12 14:47 ` Chris Mason
2018-12-12 15:22   ` Martin Raiber
2018-12-12 15:36     ` David Sterba [this message]
2018-12-12 17:55       ` Chris Mason
2018-12-14  8:07     ` ethanlien
2018-12-17 14:00       ` Martin Raiber
2018-12-19 10:33         ` ethanlien
2018-12-19 14:22           ` Chris Mason
2018-12-13  8:38   ` ethanlien
2019-01-04 15:59     ` David Sterba
2019-01-09 10:07       ` ethanlien