All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <jbacik@fb.com>
To: Jan Kara <jack@suse.cz>
Cc: <linux-btrfs@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
	<kernel-team@fb.com>, <jack@suse.com>, <viro@zeniv.linux.org.uk>,
	<dchinner@redhat.com>, <hch@lst.de>, <linux-mm@kvack.org>
Subject: Re: [PATCH 2/3] writeback: allow for dirty metadata accounting
Date: Mon, 12 Sep 2016 10:56:04 -0400	[thread overview]
Message-ID: <bd00ed53-00c8-49d2-13b2-5f7dfa607185@fb.com> (raw)
In-Reply-To: <20160909081743.GC22777@quack2.suse.cz>

On 09/09/2016 04:17 AM, Jan Kara wrote:
> On Mon 22-08-16 13:35:01, Josef Bacik wrote:
>> Provide a mechanism for file systems to indicate how much dirty metadata they
>> are holding.  This introduces a few things
>>
>> 1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY.
>> 2) WB stat for dirty metadata.  This way we know if we need to try and call into
>> the file system to write out metadata.  This could potentially be used in the
>> future to make balancing of dirty pages smarter.
>
> So I'm curious about one thing: In the previous posting you have mentioned
> that the main motivation for this work is to have a simple support for
> sub-pagesize dirty metadata blocks that need tracking in btrfs. However you
> do the dirty accounting at page granularity. What are your plans to handle
> this mismatch?

We already track how much dirty metadata we have internally in btrfs, I 
envisioned the subpage blocksize guys just calling the accounting ever N objects 
that were dirited in order to keep the accounting correct.  This is not great, 
but it was better than the hoops we needed to jump through to deal with the 
btree_inode and subpagesize blocksizes.

>
> The thing is you actually shouldn't miscount by too much as that could
> upset some checks in mm checking how much dirty pages a node has directing
> how reclaim should be done... But it's a question whether NR_METADATA_DIRTY
> should be actually used in the checks in node_limits_ok() or in
> node_pagecache_reclaimable() at all because once you start accounting dirty
> slab objects, you are really on a thin ice...

Agreed, this does get a bit ugly.

>
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 56c8fda..d329f89 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -1809,6 +1809,7 @@ static unsigned long get_nr_dirty_pages(void)
>>  {
>>  	return global_node_page_state(NR_FILE_DIRTY) +
>>  		global_node_page_state(NR_UNSTABLE_NFS) +
>> +		global_node_page_state(NR_METADATA_DIRTY) +
>>  		get_nr_dirty_inodes();
>
> With my question is also connected this - when we have NR_METADATA_DIRTY,
> we could just account dirty inodes there and get rid of this
> get_nr_dirty_inodes() hack...
>
> But actually getting this to work right to be able to track dirty inodes would
> be useful on its own - some throlling of creation of dirty inodes would be
> useful for several filesystems (ext4, xfs, ...).

So I suppose what I could do is instead provide a callback for the vm to ask how 
many dirty objects we have in the file system, instead of adding another page 
counter.  That way the actual accounting is kept internal to the file system, 
and it gets rid of the weird mismatch when blocksize < pagesize.  Does that 
sound like a more acceptable approach?  Unfortunately I decided to do this work 
to make the blocksize < pagesize work easier, but then didn't actually think 
about how the accounting would interact with that case, because I'm an idiot.

I think that looping through all the sb's in the system would be kinda shitty 
for this tho, we want the "get number of dirty pages" part to be relatively 
fast.  What if I do something like the shrinker_control only for dirty objects. 
So the fs registers some dirty_objects_control, we call into each of those and 
get the counts from that.  Does that sound less crappy?  Thanks,

Josef

WARNING: multiple messages have this Message-ID (diff)
From: Josef Bacik <jbacik@fb.com>
To: Jan Kara <jack@suse.cz>
Cc: <linux-btrfs@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
	<kernel-team@fb.com>, <jack@suse.com>, <viro@zeniv.linux.org.uk>,
	<dchinner@redhat.com>, <hch@lst.de>, <linux-mm@kvack.org>
Subject: Re: [PATCH 2/3] writeback: allow for dirty metadata accounting
Date: Mon, 12 Sep 2016 10:56:04 -0400	[thread overview]
Message-ID: <bd00ed53-00c8-49d2-13b2-5f7dfa607185@fb.com> (raw)
In-Reply-To: <20160909081743.GC22777@quack2.suse.cz>

On 09/09/2016 04:17 AM, Jan Kara wrote:
> On Mon 22-08-16 13:35:01, Josef Bacik wrote:
>> Provide a mechanism for file systems to indicate how much dirty metadata they
>> are holding.  This introduces a few things
>>
>> 1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY.
>> 2) WB stat for dirty metadata.  This way we know if we need to try and call into
>> the file system to write out metadata.  This could potentially be used in the
>> future to make balancing of dirty pages smarter.
>
> So I'm curious about one thing: In the previous posting you have mentioned
> that the main motivation for this work is to have a simple support for
> sub-pagesize dirty metadata blocks that need tracking in btrfs. However you
> do the dirty accounting at page granularity. What are your plans to handle
> this mismatch?

We already track how much dirty metadata we have internally in btrfs, I 
envisioned the subpage blocksize guys just calling the accounting ever N objects 
that were dirited in order to keep the accounting correct.  This is not great, 
but it was better than the hoops we needed to jump through to deal with the 
btree_inode and subpagesize blocksizes.

>
> The thing is you actually shouldn't miscount by too much as that could
> upset some checks in mm checking how much dirty pages a node has directing
> how reclaim should be done... But it's a question whether NR_METADATA_DIRTY
> should be actually used in the checks in node_limits_ok() or in
> node_pagecache_reclaimable() at all because once you start accounting dirty
> slab objects, you are really on a thin ice...

Agreed, this does get a bit ugly.

>
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 56c8fda..d329f89 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -1809,6 +1809,7 @@ static unsigned long get_nr_dirty_pages(void)
>>  {
>>  	return global_node_page_state(NR_FILE_DIRTY) +
>>  		global_node_page_state(NR_UNSTABLE_NFS) +
>> +		global_node_page_state(NR_METADATA_DIRTY) +
>>  		get_nr_dirty_inodes();
>
> With my question is also connected this - when we have NR_METADATA_DIRTY,
> we could just account dirty inodes there and get rid of this
> get_nr_dirty_inodes() hack...
>
> But actually getting this to work right to be able to track dirty inodes would
> be useful on its own - some throlling of creation of dirty inodes would be
> useful for several filesystems (ext4, xfs, ...).

So I suppose what I could do is instead provide a callback for the vm to ask how 
many dirty objects we have in the file system, instead of adding another page 
counter.  That way the actual accounting is kept internal to the file system, 
and it gets rid of the weird mismatch when blocksize < pagesize.  Does that 
sound like a more acceptable approach?  Unfortunately I decided to do this work 
to make the blocksize < pagesize work easier, but then didn't actually think 
about how the accounting would interact with that case, because I'm an idiot.

I think that looping through all the sb's in the system would be kinda shitty 
for this tho, we want the "get number of dirty pages" part to be relatively 
fast.  What if I do something like the shrinker_control only for dirty objects. 
So the fs registers some dirty_objects_control, we call into each of those and 
get the counts from that.  Does that sound less crappy?  Thanks,

Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Josef Bacik <jbacik@fb.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	kernel-team@fb.com, jack@suse.com, viro@zeniv.linux.org.uk,
	dchinner@redhat.com, hch@lst.de, linux-mm@kvack.org
Subject: Re: [PATCH 2/3] writeback: allow for dirty metadata accounting
Date: Mon, 12 Sep 2016 10:56:04 -0400	[thread overview]
Message-ID: <bd00ed53-00c8-49d2-13b2-5f7dfa607185@fb.com> (raw)
In-Reply-To: <20160909081743.GC22777@quack2.suse.cz>

On 09/09/2016 04:17 AM, Jan Kara wrote:
> On Mon 22-08-16 13:35:01, Josef Bacik wrote:
>> Provide a mechanism for file systems to indicate how much dirty metadata they
>> are holding.  This introduces a few things
>>
>> 1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY.
>> 2) WB stat for dirty metadata.  This way we know if we need to try and call into
>> the file system to write out metadata.  This could potentially be used in the
>> future to make balancing of dirty pages smarter.
>
> So I'm curious about one thing: In the previous posting you have mentioned
> that the main motivation for this work is to have a simple support for
> sub-pagesize dirty metadata blocks that need tracking in btrfs. However you
> do the dirty accounting at page granularity. What are your plans to handle
> this mismatch?

We already track how much dirty metadata we have internally in btrfs, I 
envisioned the subpage blocksize guys just calling the accounting ever N objects 
that were dirited in order to keep the accounting correct.  This is not great, 
but it was better than the hoops we needed to jump through to deal with the 
btree_inode and subpagesize blocksizes.

>
> The thing is you actually shouldn't miscount by too much as that could
> upset some checks in mm checking how much dirty pages a node has directing
> how reclaim should be done... But it's a question whether NR_METADATA_DIRTY
> should be actually used in the checks in node_limits_ok() or in
> node_pagecache_reclaimable() at all because once you start accounting dirty
> slab objects, you are really on a thin ice...

Agreed, this does get a bit ugly.

>
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 56c8fda..d329f89 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -1809,6 +1809,7 @@ static unsigned long get_nr_dirty_pages(void)
>>  {
>>  	return global_node_page_state(NR_FILE_DIRTY) +
>>  		global_node_page_state(NR_UNSTABLE_NFS) +
>> +		global_node_page_state(NR_METADATA_DIRTY) +
>>  		get_nr_dirty_inodes();
>
> With my question is also connected this - when we have NR_METADATA_DIRTY,
> we could just account dirty inodes there and get rid of this
> get_nr_dirty_inodes() hack...
>
> But actually getting this to work right to be able to track dirty inodes would
> be useful on its own - some throlling of creation of dirty inodes would be
> useful for several filesystems (ext4, xfs, ...).

So I suppose what I could do is instead provide a callback for the vm to ask how 
many dirty objects we have in the file system, instead of adding another page 
counter.  That way the actual accounting is kept internal to the file system, 
and it gets rid of the weird mismatch when blocksize < pagesize.  Does that 
sound like a more acceptable approach?  Unfortunately I decided to do this work 
to make the blocksize < pagesize work easier, but then didn't actually think 
about how the accounting would interact with that case, because I'm an idiot.

I think that looping through all the sb's in the system would be kinda shitty 
for this tho, we want the "get number of dirty pages" part to be relatively 
fast.  What if I do something like the shrinker_control only for dirty objects. 
So the fs registers some dirty_objects_control, we call into each of those and 
get the counts from that.  Does that sound less crappy?  Thanks,

Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2016-09-12 14:57 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-22 17:34 [PATCH 0/3][V2] Provide accounting for dirty metadata Josef Bacik
2016-08-22 17:34 ` Josef Bacik
2016-08-22 17:34 ` Josef Bacik
2016-08-22 17:35 ` [PATCH 1/3] remove mapping from balance_dirty_pages*() Josef Bacik
2016-08-22 17:35   ` Josef Bacik
2016-08-22 17:35   ` Josef Bacik
2016-08-22 18:29   ` kbuild test robot
2016-08-22 17:35 ` [PATCH 2/3] writeback: allow for dirty metadata accounting Josef Bacik
2016-08-22 17:35   ` Josef Bacik
2016-08-22 17:35   ` Josef Bacik
2016-09-09  8:17   ` Jan Kara
2016-09-09  8:17     ` Jan Kara
2016-09-12  0:46     ` Dave Chinner
2016-09-12  0:46       ` Dave Chinner
2016-09-12  7:34       ` Jan Kara
2016-09-12  7:34         ` Jan Kara
2016-09-12 14:24         ` Josef Bacik
2016-09-12 14:24           ` Josef Bacik
2016-09-12 14:24           ` Josef Bacik
2016-09-12 23:01           ` Dave Chinner
2016-09-12 23:01             ` Dave Chinner
2016-09-12 14:56     ` Josef Bacik [this message]
2016-09-12 14:56       ` Josef Bacik
2016-09-12 14:56       ` Josef Bacik
2016-09-12 23:18       ` Dave Chinner
2016-09-12 23:18         ` Dave Chinner
2016-08-22 17:35 ` [PATCH 3/3] writeback: introduce super_operations->write_metadata Josef Bacik
2016-08-22 17:35   ` Josef Bacik
2016-08-22 17:35   ` Josef Bacik
2016-09-09  8:29   ` Jan Kara
2016-09-09  8:29     ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bd00ed53-00c8-49d2-13b2-5f7dfa607185@fb.com \
    --to=jbacik@fb.com \
    --cc=dchinner@redhat.com \
    --cc=hch@lst.de \
    --cc=jack@suse.com \
    --cc=jack@suse.cz \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.