Re: [PATCH v3 06/10] writeback: introduce super_operations->write_metadata

From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Josef Bacik <josef@toxicpanda.com>,
	hannes@cmpxchg.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	kernel-team@fb.com, linux-btrfs@vger.kernel.org,
	Josef Bacik <jbacik@fb.com>
Subject: Re: [PATCH v3 06/10] writeback: introduce super_operations->write_metadata
Date: Thu, 4 Jan 2018 12:32:07 +1100
Message-ID: <20180104013207.GB32627@dastard>
In-Reply-To: <20180103135921.GF4911@quack2.suse.cz>

On Wed, Jan 03, 2018 at 02:59:21PM +0100, Jan Kara wrote:
> On Wed 03-01-18 13:32:19, Dave Chinner wrote:
> > I think we could probably block ->write_metadata if necessary via a
> > completion/wakeup style notification when a specific LSN is reached
> > by the log tail, but realistically if there's any amount of data
> > needing to be written it'll throttle data writes because the IO
> > pipeline is being kept full by background metadata writes....
> 
> So the problem I'm concerned about is a corner case. Consider a situation
> when you have no dirty data, only dirty metadata but enough of them to
> trigger background writeback. How should metadata writeback behave for XFS
> in this case? Who should be responsible that wb_writeback() just does not
> loop invoking ->write_metadata() as fast as CPU allows until xfsaild makes
> enough progress?
>
> Thinking about this today, I think this looping prevention belongs to
> wb_writeback().

Well, background data writeback can block in two ways. One is during
IO submission when the request queue is full, the other is when all
dirty inodes have had some work done on them and have all been moved
to b_more_io - wb_writeback waits for the __I_SYNC bit to be cleared
on the last(?) inode on that list, hence backing off before
submitting more IO.

IOWs, there's a "during writeback" blocking mechanism as well as a
"between cycles" blocking mechanism.

> Sadly we don't have much info to decide how long to sleep
> before trying more writeback so we'd have to just sleep for
> <some_magic_amount> if we found no writeback happened in the last writeback
> round before going through the whole writeback loop again.

Right - I don't think we can provide a generic "between cycles"
blocking mechanism for XFS, but I'm pretty sure we can emulate a
"during writeback" blocking mechanism to avoid busy looping inside
the XFS code.

e.g. if we get a writeback call that asks for 5% to be written,
and we already have a metadata writeback target of 5% in place,
that means we should block for a while. That would emulate request
queue blocking and prevent busy looping in this case....
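
As a rough illustration of that decision, here is a minimal userspace
sketch in C. It is not XFS code: struct md_wb_state, md_wb_request() and
the action names are hypothetical, and the "percentage" target is only a
stand-in for whatever push target the filesystem really tracks. A request
that goes beyond the target already in place raises the target and
returns; a request that is already covered tells the caller to block.

```c
#include <stdint.h>

/* Hypothetical model of the metadata writeback target already in place. */
struct md_wb_state {
	uint32_t target_pct;	/* percent of dirty metadata already targeted */
};

/* What the (hypothetical) ->write_metadata implementation should do. */
enum md_wb_action {
	MD_WB_PUSH_TARGET,	/* new work requested: raise the target */
	MD_WB_BLOCK,		/* already covered: wait for background progress */
};

/*
 * Decide how to handle a writeback request for want_pct percent.
 * Blocking when the request is already covered is what emulates
 * request queue blocking and avoids a busy loop in the caller.
 */
static enum md_wb_action md_wb_request(struct md_wb_state *st,
				       uint32_t want_pct)
{
	if (want_pct > st->target_pct) {
		st->target_pct = want_pct;
		return MD_WB_PUSH_TARGET;
	}
	return MD_WB_BLOCK;
}
```

In this model a first request for 5% returns MD_WB_PUSH_TARGET and a
second request for 5% returns MD_WB_BLOCK, which is the case described
above.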

> And
> ->write_metadata() for XFS would need to always return 0 (as in "no progress
> made") to make sure this busyloop avoidance logic in wb_writeback()
> triggers. ext4 and btrfs would return number of bytes written from
> ->write_metadata (or just 1 would be enough to indicate some progress in
> metadata writeback was made and busyloop avoidance is not needed).

Well, if we block for a little while, we can indicate that progress
has been made and this whole mess would go away, right?
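
To make that concrete, here is a toy simulation in C, assuming a
simulated clock and a background flusher that cleans one unit of dirty
metadata every 10ms. Every name and number in it (struct sim,
sim_advance(), the 50ms block time) is invented for illustration and
none of it is kernel code. The point is only that a callback which
blocks briefly and then reports progress bounds the number of passes
through the caller's loop.

```c
#include <stdint.h>

/* Toy simulation state; all fields are hypothetical. */
struct sim {
	uint64_t now_ms;	/* simulated clock */
	uint64_t dirty;		/* dirty metadata units remaining */
	uint64_t calls;		/* number of write_metadata calls made */
};

/* Advance simulated time; the background flusher cleans 1 unit per 10ms. */
static void sim_advance(struct sim *s, uint64_t ms)
{
	uint64_t cleaned = ms / 10;

	s->now_ms += ms;
	s->dirty = cleaned > s->dirty ? 0 : s->dirty - cleaned;
}

/* Stand-in for ->write_metadata: block for a while, then report progress. */
static long write_metadata_blocking(struct sim *s)
{
	s->calls++;
	sim_advance(s, 50);	/* "block for a little while" */
	return 1;		/* nonzero: progress was made */
}

/* Stand-in for the caller's loop: keep going while metadata is dirty. */
static uint64_t writeback_loop(struct sim *s)
{
	while (s->dirty > 0) {
		if (write_metadata_blocking(s) == 0)
			break;	/* no progress: caller would need its own backoff */
	}
	return s->calls;
}
```

With 100 dirty units, each call lets the simulated flusher clean 5 units,
so the loop finishes after 20 calls and 1000ms of simulated time rather
than spinning as fast as the CPU allows.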

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com