* [PATCH v3 00/11] Metadata specific accouting and dirty writeout
@ 2017-12-11 21:55 Josef Bacik
From: Josef Bacik @ 2017-12-11 21:55 UTC
  To: hannes, linux-mm, akpm, jack, linux-fsdevel, kernel-team, linux-btrfs

FYI patches 8-10 are purely there so people can see how I intend to use this.
These are large changes that need to go through the btrfs tree and will
undoubtedly change a lot.  My goal is for patches 1-7 to go through Andrew via
the mm tree and then once they have landed to go ahead and work out the details
of the btrfs patches with the other btrfs developers and merge via that tree.
I'm not asking for reviews on those; Jan just mentioned that it would be easier
to tell what I was trying to do if he could see how I intended to use it.

v2->v3:
- addressed issues brought up by Jan in the actual node metadata bytes
  accounting patch.
- collapsed the fprop patch that converted everything to bytes into the patch
  that converted the wb usage of fprop stuff to bytes.

-- Original message --
These patches are to support having metadata accounting and dirty handling
in a generic way.  For dirty metadata ext4 and xfs currently are limited by
their journal size, which allows them to handle dirty metadata flushing in a
relatively easy way.  Btrfs does not have this limiting factor, we can have as
much dirty metadata on the system as we have memory, so we have a dummy inode
that all of our metadata pages are allocated from so we can call
balance_dirty_pages() on it and make sure we don't overwhelm the system with
dirty metadata pages.

The problem with this is it severely limits our ability to do things like
support sub-pagesize blocksizes.  Btrfs also supports metadata blocksizes > page
size, which makes keeping track of our metadata and its pages particularly
tricky.  We have the inode mapping with our pages, and we have another radix
tree for our actual metadata buffers.  This double accounting leads to some fun
shenanigans around reclaim and evicting pages we know we are done using.

To solve this we would like to switch to a scheme like xfs has, where we simply
have our metadata structures tied into the slab shrinking code, and we just use
alloc_page() for our pages, or kmalloc() when we add sub-pagesize blocksizes.
In order to do this we need infrastructure in place to make sure we still don't
overwhelm the system with dirty metadata pages.

Enter these patches.  Because metadata is tracked in amounts that are not a
multiple of the page size, we need to convert a bunch of our existing counters
to bytes.  From there I've
added various counters for metadata, to keep track of overall metadata bytes,
how many are dirty and how many are under writeback.  I've added a super
operation to handle the dirty writeback, which is going to be handled mostly
inside the fs since we will need a little more smarts around what we writeback.
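
To make the byte-based accounting described above concrete, here is a minimal
userspace sketch in C.  It is not code from the patches: the struct, the
function names, and the limit check are all hypothetical, and the
fs-specific writeback hook is not modeled.  It only illustrates tracking
total, dirty, and writeback metadata in bytes rather than pages, so that a
512-byte block and a 16K block are both counted exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of per-filesystem metadata counters, all in bytes. */
struct meta_stats {
	uint64_t total_bytes;     /* all metadata bytes currently allocated */
	uint64_t dirty_bytes;     /* modified, not yet submitted for I/O */
	uint64_t writeback_bytes; /* submitted, I/O not yet complete */
};

/* Account a newly allocated metadata block of arbitrary size. */
static void meta_alloc(struct meta_stats *s, uint64_t bytes)
{
	s->total_bytes += bytes;
}

/* Mark 'bytes' of allocated metadata dirty. */
static void meta_mark_dirty(struct meta_stats *s, uint64_t bytes)
{
	/* Dirty plus in-flight bytes can never exceed what is allocated. */
	assert(s->dirty_bytes + s->writeback_bytes + bytes <= s->total_bytes);
	s->dirty_bytes += bytes;
}

/* Move 'bytes' from dirty to under-writeback when I/O is submitted. */
static void meta_start_writeback(struct meta_stats *s, uint64_t bytes)
{
	assert(bytes <= s->dirty_bytes);
	s->dirty_bytes -= bytes;
	s->writeback_bytes += bytes;
}

/* Drop 'bytes' from under-writeback when the I/O completes. */
static void meta_end_writeback(struct meta_stats *s, uint64_t bytes)
{
	assert(bytes <= s->writeback_bytes);
	s->writeback_bytes -= bytes;
}

/*
 * Return nonzero when dirty plus in-flight metadata exceeds 'limit_bytes'.
 * A caller would throttle the dirtying task in that case; the limit itself
 * is an assumption of this sketch, not a value from the patches.
 */
static int meta_over_limit(const struct meta_stats *s, uint64_t limit_bytes)
{
	return s->dirty_bytes + s->writeback_bytes > limit_bytes;
}
```

In this model a page-based counter would have to round the 512-byte block up
to a full page; counting bytes avoids that rounding for both sub-pagesize and
larger-than-page blocks.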

The last three patches are just there to show how we use the infrastructure in
the first 7 patches.  The actual kill btree_inode patch is pretty big;
unfortunately, ripping out all of the pagecache-based handling and replacing it
with the new infrastructure has to be done whole-hog and can't be broken up
any more than it already has been without making it un-bisectable.

Thanks,

Josef



Thread overview: 31+ messages
2017-12-11 21:55 [PATCH v3 00/11] Metadata specific accouting and dirty writeout Josef Bacik
2017-12-11 21:55 ` [PATCH v3 01/10] remove mapping from balance_dirty_pages*() Josef Bacik
2017-12-11 21:55 ` [PATCH v3 02/10] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes Josef Bacik
2017-12-11 21:55 ` [PATCH v3 03/10] lib: add a __fprop_add_percpu_max Josef Bacik
2017-12-19  7:25   ` Jan Kara
2017-12-11 21:55 ` [PATCH v3 04/10] writeback: convert the flexible prop stuff to bytes Josef Bacik
2017-12-11 21:55 ` [PATCH v3 05/10] writeback: add counters for metadata usage Josef Bacik
2017-12-19  7:52   ` Jan Kara
2017-12-11 21:55 ` [PATCH v3 06/10] writeback: introduce super_operations->write_metadata Josef Bacik
2017-12-11 23:36   ` Dave Chinner
2017-12-12 18:05     ` Josef Bacik
2017-12-12 22:20       ` Dave Chinner
2017-12-12 23:59         ` Josef Bacik
2017-12-19 12:07         ` Jan Kara
2017-12-19 21:35           ` Dave Chinner
2017-12-20 14:30             ` Jan Kara
2018-01-02 16:13               ` Josef Bacik
2018-01-03  2:32                 ` Dave Chinner
2018-01-03 13:59                   ` Jan Kara
2018-01-03 15:49                     ` Josef Bacik
2018-01-03 16:26                       ` Jan Kara
2018-01-03 16:29                         ` Josef Bacik
2018-01-29  9:06                           ` Chandan Rajendra
2018-09-28  8:37                             ` Chandan Rajendra
2018-01-04  1:32                     ` Dave Chinner
2018-01-04  9:10                       ` Jan Kara
2017-12-19 12:21   ` Jan Kara
2017-12-11 21:55 ` [PATCH v3 07/10] export radix_tree_iter_tag_set Josef Bacik
2017-12-11 21:55 ` [PATCH v3 08/10] Btrfs: kill the btree_inode Josef Bacik
2017-12-11 21:55 ` [PATCH v3 09/10] btrfs: rework end io for extent buffer reads Josef Bacik
2017-12-11 21:55 ` [PATCH v3 10/10] btrfs: add NR_METADATA_BYTES accounting Josef Bacik
