All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Sterba <dsterba@suse.cz>
To: Josef Bacik <josef@toxicpanda.com>
Cc: hannes@cmpxchg.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, jack@suse.cz,
	linux-fsdevel@vger.kernel.org, kernel-team@fb.com,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 00/11] Metadata specific accouting and dirty writeout
Date: Thu, 23 Nov 2017 19:45:49 +0100	[thread overview]
Message-ID: <20171123184549.GT3553@twin.jikos.cz> (raw)
In-Reply-To: <1511385366-20329-1-git-send-email-josef@toxicpanda.com>

On Wed, Nov 22, 2017 at 04:15:55PM -0500, Josef Bacik wrote:
> These patches are to support having metadata accounting and dirty handling
> in a generic way.  For dirty metadata ext4 and xfs currently are limited by
> their journal size, which allows them to handle dirty metadata flushing in a
> relatively easy way.  Btrfs does not have this limiting factor, we can have as
> much dirty metadata on the system as we have memory, so we have a dummy inode
> that all of our metadat pages are allocated from so we can call
> balance_dirty_pages() on it and make sure we don't overwhelm the system with
> dirty metadata pages.
> 
> The problem with this is it severely limits our ability to do things like
> support sub-pagesize blocksizes.  Btrfs also supports metadata blocksizes > page
> size, which makes keeping track of our metadata and it's pages particularly
> tricky.  We have the inode mapping with our pages, and we have another radix
> tree for our actual metadata buffers.  This double accounting leads to some fun
> shenanigans around reclaim and evicting pages we know we are done using.
> 
> To solve this we would like to switch to a scheme like xfs has, where we simply
> have our metadata structures tied into the slab shrinking code, and we just use
> alloc_page() for our pages, or kmalloc() when we add sub-pagesize blocksizes.
> In order to do this we need infrastructure in place to make sure we still don't
> overwhelm the system with dirty metadata pages.
> 
> Enter these patches.  Because metadata is tracked on a non-pagesize amount we
> need to convert a bunch of our existing counters to bytes.  From there I've
> added various counters for metadata, to keep track of overall metadata bytes,
> how many are dirty and how many are under writeback.  I've added a super
> operation to handle the dirty writeback, which is going to be handled mostly
> inside the fs since we will need a little more smarts around what we writeback.

The text relevant for btrfs should also go to the btree_inode removal
patch changelog. The cover letter gets lost but we still might need to
refer to the overall logic that's going to be changed in that patch.

And possibly more documentation should go to the code itself, there are
some scattered comments in the tricky parts but the overall logic is not
described and the key functions lack comments.

What's your merge plan? There are other subsystem changes needed, before
the btree_inode removal can happen and can be tested within our for-next
branches. The 4.15 target is out of reach, so I assume 4.16 for the
dependencies and 4.17 for the btree_inode. We can of course test them
earlier, but 4.16 does not seem realistic for the whole patchset.

> The last three patches are just there to show how we use the infrastructure in
> the first 8 patches.  The actuall kill btree_inode patch is pretty big,
> unfortunately ripping out all of the pagecache based handling and replacing it
> with the new infrastructure has to be done whole-hog and can't be broken up
> anymore than it already has been without making it un-bisectable.

I don't completely agree that it cannot be split. I went through it a
few times, the patch is too big for review. It mixes the core part and
cleanups that do not necessarily need to be in the patch. I'd like to
see the core part minimized further, at the cost of leaving some dead
code behind (like the old callbacks).

WARNING: multiple messages have this Message-ID (diff)
From: David Sterba <dsterba@suse.cz>
To: Josef Bacik <josef@toxicpanda.com>
Cc: hannes@cmpxchg.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, jack@suse.cz,
	linux-fsdevel@vger.kernel.org, kernel-team@fb.com,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 00/11] Metadata specific accouting and dirty writeout
Date: Thu, 23 Nov 2017 19:45:49 +0100	[thread overview]
Message-ID: <20171123184549.GT3553@twin.jikos.cz> (raw)
In-Reply-To: <1511385366-20329-1-git-send-email-josef@toxicpanda.com>

On Wed, Nov 22, 2017 at 04:15:55PM -0500, Josef Bacik wrote:
> These patches are to support having metadata accounting and dirty handling
> in a generic way.  For dirty metadata ext4 and xfs currently are limited by
> their journal size, which allows them to handle dirty metadata flushing in a
> relatively easy way.  Btrfs does not have this limiting factor, we can have as
> much dirty metadata on the system as we have memory, so we have a dummy inode
> that all of our metadat pages are allocated from so we can call
> balance_dirty_pages() on it and make sure we don't overwhelm the system with
> dirty metadata pages.
> 
> The problem with this is it severely limits our ability to do things like
> support sub-pagesize blocksizes.  Btrfs also supports metadata blocksizes > page
> size, which makes keeping track of our metadata and it's pages particularly
> tricky.  We have the inode mapping with our pages, and we have another radix
> tree for our actual metadata buffers.  This double accounting leads to some fun
> shenanigans around reclaim and evicting pages we know we are done using.
> 
> To solve this we would like to switch to a scheme like xfs has, where we simply
> have our metadata structures tied into the slab shrinking code, and we just use
> alloc_page() for our pages, or kmalloc() when we add sub-pagesize blocksizes.
> In order to do this we need infrastructure in place to make sure we still don't
> overwhelm the system with dirty metadata pages.
> 
> Enter these patches.  Because metadata is tracked on a non-pagesize amount we
> need to convert a bunch of our existing counters to bytes.  From there I've
> added various counters for metadata, to keep track of overall metadata bytes,
> how many are dirty and how many are under writeback.  I've added a super
> operation to handle the dirty writeback, which is going to be handled mostly
> inside the fs since we will need a little more smarts around what we writeback.

The text relevant for btrfs should also go to the btree_inode removal
patch changelog. The cover letter gets lost but we still might need to
refer to the overall logic that's going to be changed in that patch.

And possibly more documentation should go to the code itself, there are
some scattered comments in the tricky parts but the overall logic is not
described and the key functions lack comments.

What's your merge plan? There are other subsystem changes needed, before
the btree_inode removal can happen and can be tested within our for-next
branches. The 4.15 target is out of reach, so I assume 4.16 for the
dependencies and 4.17 for the btree_inode. We can of course test them
earlier, but 4.16 does not seem realistic for the whole patchset.

> The last three patches are just there to show how we use the infrastructure in
> the first 8 patches.  The actuall kill btree_inode patch is pretty big,
> unfortunately ripping out all of the pagecache based handling and replacing it
> with the new infrastructure has to be done whole-hog and can't be broken up
> anymore than it already has been without making it un-bisectable.

I don't completely agree that it cannot be split. I went through it a
few times, the patch is too big for review. It mixes the core part and
cleanups that do not necessarily need to be in the patch. I'd like to
see the core part minimized further, at the cost of leaving some dead
code behind (like the old callbacks).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2017-11-23 18:47 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-22 21:15 [PATCH v2 00/11] Metadata specific accouting and dirty writeout Josef Bacik
2017-11-22 21:15 ` Josef Bacik
2017-11-22 21:15 ` [PATCH v2 01/11] remove mapping from balance_dirty_pages*() Josef Bacik
2017-11-22 21:15   ` Josef Bacik
2017-11-22 21:15 ` [PATCH v2 02/11] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes Josef Bacik
2017-11-22 21:15   ` Josef Bacik
2017-11-29 17:02   ` Jan Kara
2017-11-29 17:02     ` Jan Kara
2017-11-22 21:15 ` [PATCH v2 03/11] lib: make the fprop batch size a multiple of PAGE_SIZE Josef Bacik
2017-11-22 21:15   ` Josef Bacik
2017-11-29 17:04   ` Jan Kara
2017-11-29 17:04     ` Jan Kara
2017-11-30 15:48     ` David Sterba
2017-11-30 15:48       ` David Sterba
2017-11-22 21:15 ` [PATCH v2 04/11] lib: add a __fprop_add_percpu_max Josef Bacik
2017-11-22 21:15   ` Josef Bacik
2017-11-29 17:05   ` Jan Kara
2017-11-29 17:05     ` Jan Kara
2017-11-22 21:16 ` [PATCH v2 05/11] writeback: convert the flexible prop stuff to bytes Josef Bacik
2017-11-22 21:16   ` Josef Bacik
2017-11-29 17:05   ` Jan Kara
2017-11-29 17:05     ` Jan Kara
2017-11-22 21:16 ` [PATCH v2 06/11] writeback: add counters for metadata usage Josef Bacik
2017-11-22 21:16   ` Josef Bacik
2017-12-04 13:06   ` Jan Kara
2017-12-04 13:06     ` Jan Kara
2017-12-06 20:18     ` Josef Bacik
2017-12-06 20:18       ` Josef Bacik
2017-12-06 22:43       ` Johannes Weiner
2017-12-06 22:43         ` Johannes Weiner
2017-11-22 21:16 ` [PATCH v2 07/11] writeback: introduce super_operations->write_metadata Josef Bacik
2017-11-22 21:16   ` Josef Bacik
2017-11-22 21:16 ` [PATCH v2 08/11] export radix_tree_iter_tag_set Josef Bacik
2017-11-22 21:16   ` Josef Bacik
2017-11-22 21:16 ` [PATCH v2 09/11] Btrfs: kill the btree_inode Josef Bacik
2017-11-22 21:16   ` Josef Bacik
2017-11-27 17:10   ` David Sterba
2017-11-27 17:10     ` David Sterba
2017-11-22 21:16 ` [PATCH v2 10/11] btrfs: rework end io for extent buffer reads Josef Bacik
2017-11-22 21:16   ` Josef Bacik
2017-11-22 21:16 ` [PATCH v2 11/11] btrfs: add NR_METADATA_BYTES accounting Josef Bacik
2017-11-22 21:16   ` Josef Bacik
2017-11-23 18:45 ` David Sterba [this message]
2017-11-23 18:45   ` [PATCH v2 00/11] Metadata specific accouting and dirty writeout David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171123184549.GT3553@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.