From: ethanlien <ethanlien@synology.com>
To: Martin Raiber <martin@urbackup.org>
Cc: Chris Mason <clm@fb.com>,
	linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.cz>,
	linux-btrfs-owner@vger.kernel.org
Subject: Re: [PATCH v2] btrfs: balance dirty metadata pages in btrfs_finish_ordered_io
Date: Wed, 19 Dec 2018 18:33:11 +0800
Message-ID: <362642e5e618dbf17effc17e0cb7457f@synology.com>
In-Reply-To: <01020167bc7811f0-cb1970f5-3d51-49f9-a5bb-63ba1ea35eea-000000@eu-west-1.amazonses.com>

Martin Raiber wrote on 2018-12-17 22:00:

>>>> 
>>> I had lockups with this patch as well. If you put e.g. a loop device
>>> on top of a btrfs file, loop sets PF_LESS_THROTTLE to avoid a
>>> feedback loop causing delays. The task balancing dirty pages in
>>> btrfs_finish_ordered_io doesn't have the flag and causes slow-downs.
>>> In my case it managed to cause a feedback loop where it queues more
>>> btrfs_finish_ordered_io work and gets stuck completely.
>>> 
>> 
>> The data writepage endio will queue a work item for
>> btrfs_finish_ordered_io() in a separate workqueue and clear the page's
>> writeback, so throttling in btrfs_finish_ordered_io() should not slow
>> down the flusher thread. One suspicious point is that while a caller
>> is waiting for a range of ordered_extents to complete, it will be
>> blocked until balance_dirty_pages_ratelimited() makes some progress,
>> since we finish ordered_extents in btrfs_finish_ordered_io().
>> Do you have call stack information for the stuck processes, or are
>> you using fsync/sync frequently? If that is the case, maybe we should
>> pull this out and balance dirty metadata pages somewhere else.
> 
> Yeah like,
> 
> [875317.071433] Call Trace:
> [875317.071438]  ? __schedule+0x306/0x7f0
> [875317.071442]  schedule+0x32/0x80
> [875317.071447]  btrfs_start_ordered_extent+0xed/0x120
> [875317.071450]  ? remove_wait_queue+0x60/0x60
> [875317.071454]  btrfs_wait_ordered_range+0xa0/0x100
> [875317.071457]  btrfs_sync_file+0x1d6/0x400
> [875317.071461]  ? do_fsync+0x38/0x60
> [875317.071463]  ? btrfs_fdatawrite_range+0x50/0x50
> [875317.071465]  do_fsync+0x38/0x60
> [875317.071468]  __x64_sys_fsync+0x10/0x20
> [875317.071470]  do_syscall_64+0x55/0x100
> [875317.071473]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> so I guess the problem is that calling balance_dirty_pages causes
> fsyncs to the same btrfs (via my unusual setup of loop+fuse)? Those
> fsyncs are deadlocked because they are called indirectly from
> btrfs_finish_ordered_io... It is an unusual setup, which is why I did
> not post it to the mailing list initially.

To me this does not look like a real deadlock. The fsync call involves
two steps: (1) flush the dirty data pages, and (2) update the
corresponding metadata to point to the flushed data. Since step 1
consumes dirty pages and step 2 produces more dirty pages, this patch
leaves step 1 unchanged and throttles step 2 in
btrfs_finish_ordered_io(), which seems reasonable for an OOM fix. The
problem is that if other processes keep writing new data, the fsync
call may have to wait a long time for its metadata update, even though
its own dirty data was flushed long ago.
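
For reference, the idea of the patch is roughly the following (a
simplified sketch, not the exact diff; the real function does the file
extent insertion, delayed ref work and cleanup before this point, and
the actual patch goes through a small helper):

/* fs/btrfs/inode.c -- simplified sketch of the idea in this patch */
static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
{
	struct inode *inode = ordered_extent->inode;
	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);

	/*
	 * Step 2: insert file extent items, update the inode item, run
	 * delayed refs, ... -- all of this dirties btree pages.
	 */

	/*
	 * Throttle against the btree inode so dirty metadata pages
	 * cannot grow without bound (the OOM this patch addresses).
	 */
	balance_dirty_pages_ratelimited(fs_info->btree_inode->i_mapping);

	return 0;
}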

Back to the deadlock problem: what Chris found is a real deadlock, and
it can be fixed by adding a check for the free space inode, along the
lines of the sketch below.
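
Something like this (untested and only to illustrate the check, reusing
inode/fs_info from the sketch above; the exact condition may end up
looking different):

	/*
	 * Hypothetical sketch: don't throttle while finishing an ordered
	 * extent of the free space inode, to avoid the deadlock Chris hit.
	 */
	if (!btrfs_is_free_space_inode(BTRFS_I(inode)))
		balance_dirty_pages_ratelimited(fs_info->btree_inode->i_mapping);
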
For the fsync delay problem, it depends on whether we want to keep the
amount of dirty metadata pages within the dirty_bytes limit here. If we
don't, we can revert this patch and try flushing dirty metadata pages
somewhere else; that should be enough to fix the OOM. But if we do want
to keep dirty metadata pages within the dirty_bytes limit, we still
need some throttling here. Setting __GFP_WRITE on extent_buffer pages
is a bad fit, since most of the time we use them as a read cache to
speed up b-tree traversal. Adding PF_LESS_THROTTLE in
btrfs_finish_ordered_io() might help, since it creates a window where
new writes are blocked but the flushing and metadata updates keep
going, while still retaining some kind of throttling. But it needs more
testing to know the worst-case fsync delay.
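
If we go the PF_LESS_THROTTLE route, I'd expect it to look roughly like
this (untested sketch, borrowing the idiom the loop driver uses for its
worker thread):

static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
{
	struct inode *inode = ordered_extent->inode;
	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
	/* Remember whether PF_LESS_THROTTLE was already set. */
	unsigned int pflags = current->flags & PF_LESS_THROTTLE;
	int ret = 0;

	current->flags |= PF_LESS_THROTTLE;

	/* ... insert file extent items, update the inode item, ... */

	/* Throttle dirty metadata, but with the less-throttled flag set. */
	balance_dirty_pages_ratelimited(fs_info->btree_inode->i_mapping);

	current_restore_flags(pflags, PF_LESS_THROTTLE);
	return ret;
}

current_restore_flags() only clears the flag if it was not already set,
so a caller that already runs with PF_LESS_THROTTLE keeps it.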

