From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Josef Bacik <josef@toxicpanda.com>, hannes@cmpxchg.org, linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kernel-team@fb.com, linux-btrfs@vger.kernel.org, Josef Bacik <jbacik@fb.com>
Subject: Re: [PATCH v3 06/10] writeback: introduce super_operations->write_metadata
Date: Thu, 4 Jan 2018 12:32:07 +1100
Message-ID: <20180104013207.GB32627@dastard>
In-Reply-To: <20180103135921.GF4911@quack2.suse.cz>

On Wed, Jan 03, 2018 at 02:59:21PM +0100, Jan Kara wrote:
> On Wed 03-01-18 13:32:19, Dave Chinner wrote:
> > I think we could probably block ->write_metadata if necessary via a
> > completion/wakeup style notification when a specific LSN is reached
> > by the log tail, but realistically if there's any amount of data
> > needing to be written it'll throttle data writes because the IO
> > pipeline is being kept full by background metadata writes....
>
> So the problem I'm concerned about is a corner case. Consider a situation
> when you have no dirty data, only dirty metadata but enough of them to
> trigger background writeback. How should metadata writeback behave for XFS
> in this case? Who should be responsible that wb_writeback() just does not
> loop invoking ->write_metadata() as fast as CPU allows until xfsaild makes
> enough progress?
>
> Thinking about this today, I think this looping prevention belongs to
> wb_writeback().

Well, background data writeback can block in two ways. One is during
IO submission when the request queue is full, the other is when all
dirty inodes have had some work done on them and have all been moved
to b_more_io - wb_writeback waits for the __I_SYNC bit to be cleared
on the last(?) inode on that list, hence backing off before
submitting more IO.

IOWs, there's a "during writeback" blocking mechanism as well as a
"between cycles" blocking mechanism.
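The two back-off points described above can be illustrated with a toy userspace model in C. This is a sketch under stated assumptions, not code from fs/fs-writeback.c: struct toy_bdi, toy_wb_writeback() and the other names are hypothetical, busy_until stands in for another writer holding __I_SYNC on an inode, and time is a virtual tick counter in which the device completes one request per tick.

```c
#include <assert.h>

/*
 * Toy userspace model (hypothetical names, virtual time) of the two
 * points where background data writeback backs off. Not kernel code.
 */
#define TOY_NR_INODES 4

struct toy_bdi {
	long dirty[TOY_NR_INODES];      /* dirty pages per inode */
	long busy_until[TOY_NR_INODES]; /* tick when another writer drops "I_SYNC" */
	int  queue_depth;               /* request queue limit */
	int  in_flight;                 /* requests submitted but not completed */
	long now;                       /* virtual clock, in ticks */
	long queue_waits;               /* "during writeback" blocks */
	long cycle_waits;               /* "between cycles" blocks */
};

/* One tick of virtual time: the device completes one request. */
static void toy_tick(struct toy_bdi *b)
{
	b->now++;
	if (b->in_flight > 0)
		b->in_flight--;
}

/* "During writeback": the submitter blocks while the queue is full. */
static void toy_submit_page(struct toy_bdi *b, int ino)
{
	while (b->in_flight >= b->queue_depth) {
		b->queue_waits++;
		toy_tick(b);
	}
	b->in_flight++;
	b->dirty[ino]--;
}

static long toy_total_dirty(const struct toy_bdi *b)
{
	long sum = 0;

	for (int i = 0; i < TOY_NR_INODES; i++)
		sum += b->dirty[i];
	return sum;
}

/* Write up to 'chunk' pages from each idle dirty inode per pass. */
static void toy_wb_writeback(struct toy_bdi *b, long chunk)
{
	assert(b->queue_depth > 0 && chunk > 0);
	while (toy_total_dirty(b) > 0) {
		long progress = 0;
		int last_busy = -1;

		for (int i = 0; i < TOY_NR_INODES; i++) {
			if (b->dirty[i] == 0)
				continue;
			if (b->busy_until[i] > b->now) {
				last_busy = i;  /* stays queued, like b_more_io */
				continue;
			}
			for (long p = 0; p < chunk && b->dirty[i] > 0; p++) {
				toy_submit_page(b, i);
				progress++;
			}
		}
		/*
		 * "Between cycles": nothing could be written because every
		 * dirty inode is busy, so wait for the last busy inode to be
		 * released instead of spinning.
		 */
		if (progress == 0) {
			assert(last_busy >= 0);
			b->cycle_waits++;
			while (b->busy_until[last_busy] > b->now)
				toy_tick(b);
		}
	}
}
```

With a queue depth of 1 and one inode holding three dirty pages, the submitter blocks twice inside the pass. With the only dirty inode busy until tick 5, the loop takes a single between-cycles wait rather than spinning, then writes both pages. Waiting on the last busy inode follows the description above; the model does not attempt to reproduce the exact conditions under which the real wb_writeback() waits.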
> Sadly we don't have much info to decide how long to sleep
> before trying more writeback so we'd have to just sleep for
> <some_magic_amount> if we found no writeback happened in the last writeback
> round before going through the whole writeback loop again.

Right - I don't think we can provide a generic "between cycles"
blocking mechanism for XFS, but I'm pretty sure we can emulate a
"during writeback" blocking mechanism to avoid busy looping inside
the XFS code.

e.g. if we get a writeback call that asks for 5% to be written,
and we already have a metadata writeback target of 5% in place,
that means we should block for a while. That would emulate request
queue blocking and prevent busy looping in this case....

> And
> ->write_metadata() for XFS would need to always return 0 (as in "no progress
> made") to make sure this busyloop avoidance logic in wb_writeback()
> triggers. ext4 and btrfs would return number of bytes written from
> ->write_metadata (or just 1 would be enough to indicate some progress in
> metadata writeback was made and busyloop avoidance is not needed).

Well, if we block for a little while, we can indicate that progress
has been made and this whole mess would go away, right?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
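The proposed emulation of request-queue blocking might look roughly like the following toy model, again in userspace C with hypothetical names: toy_write_metadata() stands in for an XFS ->write_metadata implementation, toy_push_tick() for xfsaild pushing the AIL, and amounts are abstract bytes rather than the 5% thresholds mentioned above. It is a sketch of the idea under those assumptions, not the interface from the patch series.

```c
#include <assert.h>

/*
 * Toy userspace model (hypothetical names, virtual time) of a
 * ->write_metadata-style callback that emulates request-queue blocking.
 * Not an XFS or VFS interface.
 */
struct toy_meta {
	long dirty;      /* dirty metadata, in abstract bytes */
	long target;     /* bytes the background pusher still has to write */
	long push_rate;  /* bytes the pusher writes back per tick */
	long now;        /* virtual clock, in ticks */
	long blocked;    /* calls that had to wait */
};

/* One tick of the simulated pusher (the xfsaild role). */
static long toy_push_tick(struct toy_meta *m)
{
	long n = m->target < m->push_rate ? m->target : m->push_rate;

	if (n > m->dirty)
		n = m->dirty;
	m->target -= n;
	m->dirty -= n;
	m->now++;
	return n;
}

/*
 * Returns bytes written back while the caller waited. A request larger
 * than the current target only raises the target; a request already
 * covered by the target blocks for up to 'max_ticks' until the pusher
 * makes progress, so the caller is paced instead of busy looping.
 */
static long toy_write_metadata(struct toy_meta *m, long nr_to_write,
			       long max_ticks)
{
	long done = 0;

	assert(nr_to_write > 0 && max_ticks > 0);
	if (m->target < nr_to_write) {
		m->target = nr_to_write;   /* hand work to the pusher, don't wait */
		return 0;
	}
	m->blocked++;
	for (long t = 0; t < max_ticks && done == 0; t++)
		done += toy_push_tick(m);
	return done;
}
```

The first call only installs a target of 30 and returns 0; repeating the same request blocks for one tick and reports the 10 bytes written. With push_rate set to 0 the call still waits the full max_ticks before returning 0, so a caller in the metadata-only corner case is paced rather than spinning. Reporting the bytes written while blocked, rather than a fixed 1, is a choice made for this sketch.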