From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Josef Bacik <josef@toxicpanda.com>,
	hannes@cmpxchg.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	kernel-team@fb.com, linux-btrfs@vger.kernel.org,
	Josef Bacik <jbacik@fb.com>
Subject: Re: [PATCH v3 06/10] writeback: introduce super_operations->write_metadata
Date: Thu, 4 Jan 2018 12:32:07 +1100
Message-ID: <20180104013207.GB32627@dastard>
In-Reply-To: <20180103135921.GF4911@quack2.suse.cz>

On Wed, Jan 03, 2018 at 02:59:21PM +0100, Jan Kara wrote:
> On Wed 03-01-18 13:32:19, Dave Chinner wrote:
> > I think we could probably block ->write_metadata if necessary via a
> > completion/wakeup style notification when a specific LSN is reached
> > by the log tail, but realistically if there's any amount of data
> > needing to be written it'll throttle data writes because the IO
> > pipeline is being kept full by background metadata writes....
> 
> So the problem I'm concerned about is a corner case. Consider a situation
> when you have no dirty data, only dirty metadata but enough of them to
> trigger background writeback. How should metadata writeback behave for XFS
> in this case? Who should be responsible that wb_writeback() just does not
> loop invoking ->write_metadata() as fast as CPU allows until xfsaild makes
> enough progress?
>
> Thinking about this today, I think this looping prevention belongs to
> wb_writeback().

Well, background data writeback can block in two ways. One is during
IO submission when the request queue is full, the other is when all
dirty inodes have had some work done on them and have all been moved
to b_more_io - wb_writeback waits for the __I_SYNC bit to be cleared
on the last(?) inode on that list, hence backing off before
submitting more IO.

IOWs, there's a "during writeback" blocking mechanism as well as a
"between cycles" blocking mechanism.

> Sadly we don't have much info to decide how long to sleep
> before trying more writeback so we'd have to just sleep for
> <some_magic_amount> if we found no writeback happened in the last writeback
> round before going through the whole writeback loop again.

Right - I don't think we can provide a generic "between cycles"
blocking mechanism for XFS, but I'm pretty sure we can emulate a
"during writeback" blocking mechanism to avoid busy looping inside
the XFS code.

e.g. if we get a writeback call that asks for 5% to be written,
and we already have a metadata writeback target of 5% in place,
that means we should block for a while. That would emulate request
queue blocking and prevent busy looping in this case....
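
To make the 5% example concrete, here is a rough userspace sketch of
the decision only. The struct and function names are made up for
illustration and are not existing XFS code, and treating "requested
target <= target already in place" as the blocking condition is my
assumption (the example above only covers the equal case):

```c
#include <stdbool.h>

/* Hypothetical per-mount state; not a real XFS structure. */
struct md_wb_state {
	unsigned int target_pct;	/* metadata writeback target already in place, in percent */
};

/*
 * Decide whether a new writeback request should block.
 *
 * If the requested target is already covered by the target in place,
 * pushing it again achieves nothing, so the caller should block for a
 * while (emulating request queue blocking).  Otherwise record the new,
 * larger target and let the caller carry on without blocking.
 */
static bool md_wb_should_block(struct md_wb_state *st, unsigned int req_pct)
{
	if (req_pct <= st->target_pct)
		return true;

	st->target_pct = req_pct;
	return false;
}
```

How the caller actually blocks (timeout, or a wakeup when the log tail
reaches some LSN) is left open here.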

> And
> ->write_metadata() for XFS would need to always return 0 (as in "no progress
> made") to make sure this busyloop avoidance logic in wb_writeback()
> triggers. ext4 and btrfs would return number of bytes written from
> ->write_metadata (or just 1 would be enough to indicate some progress in
> metadata writeback was made and busyloop avoidance is not needed).

Well, if we block for a little while, we can indicate that progress
has been made and this whole mess would go away, right?
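
A toy model of what I mean, in plain userspace C. None of this is the
real wb_writeback() or a real ->write_metadata signature; the callback
type, the stats struct and both callbacks are hypothetical, and the
"blocking" is only counted rather than slept:

```c
#include <stddef.h>

/* Hypothetical callback type: returns non-zero if progress was made. */
typedef long (*write_metadata_fn)(void *ctx);

struct loop_stats {
	unsigned int calls;	/* times the callback was invoked */
	unsigned int backoffs;	/* times the loop would have had to sleep */
};

/* Stand-in for the writeback loop: back off only when no progress is reported. */
static struct loop_stats wb_loop_sketch(write_metadata_fn fn, void *ctx,
					unsigned int rounds)
{
	struct loop_stats s = { 0, 0 };

	for (unsigned int i = 0; i < rounds; i++) {
		s.calls++;
		if (fn(ctx) == 0)
			s.backoffs++;	/* stand-in for "sleep <some_magic_amount>" */
	}
	return s;
}

/* Callback that never blocks and always reports "no progress made". */
static long md_write_noprogress(void *ctx)
{
	(void)ctx;
	return 0;
}

/* Callback that blocks internally (counted, not slept) and then reports progress. */
static long md_write_block_then_progress(void *ctx)
{
	unsigned int *blocked = ctx;

	(*blocked)++;	/* stand-in for waiting on the log tail */
	return 1;
}
```

With the first callback every round ends in the generic backoff; with
the second the loop never needs it, because the waiting has already
happened inside the callback.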

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
