From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Tejun Heo <tj@kernel.org>, Jens Axboe <axboe@kernel.dk>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm/filemap: don't initiate writeback if mapping has no dirty pages
Date: Tue, 30 Jul 2019 21:15:22 +0300
Message-ID: <c28d4243-aeb9-901a-46e9-bfe2e704cd8f@yandex-team.ru>
In-Reply-To: <20190730154854.GG28829@quack2.suse.cz>



On 30.07.2019 18:48, Jan Kara wrote:
> On Tue 30-07-19 17:57:18, Konstantin Khlebnikov wrote:
>> On 30.07.2019 17:14, Jan Kara wrote:
>>> On Tue 23-07-19 11:16:51, Konstantin Khlebnikov wrote:
>>>> On 23.07.2019 3:52, Andrew Morton wrote:
>>>>>
>>>>> (cc linux-fsdevel and Jan)
>>>
>>> Thanks for CC Andrew.
>>>
>>>>> On Mon, 22 Jul 2019 12:36:08 +0300 Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote:
>>>>>
>>>>>> Functions like filemap_write_and_wait_range() should do nothing if inode
>>>>>> has no dirty pages or pages currently under writeback. But they anyway
>>>>>> construct struct writeback_control and this does some atomic operations
>>>>>> if CONFIG_CGROUP_WRITEBACK=y - on fast path it locks inode->i_lock and
>>>>>> updates state of writeback ownership, on slow path might be more work.
>>>>>> Currently this path is safely avoided only when inode mapping has no pages.
>>>>>>
>>>>>> For example generic_file_read_iter() calls filemap_write_and_wait_range()
>>>>>> at each O_DIRECT read - pretty hot path.
>>>
>>> Yes, but in common case mapping_needs_writeback() is false for files you do
>>> direct IO to (exactly the case with no pages in the mapping). So you
>>> shouldn't see the overhead at all. So which case you really care about?
>>>
>>>>>> This patch skips starting new writeback if mapping has no dirty tags set.
>>>>>> If writeback is already in progress filemap_write_and_wait_range() will
>>>>>> wait for it.
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> --- a/mm/filemap.c
>>>>>> +++ b/mm/filemap.c
>>>>>> @@ -408,7 +408,8 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start,
>>>>>>     		.range_end = end,
>>>>>>     	};
>>>>>> -	if (!mapping_cap_writeback_dirty(mapping))
>>>>>> +	if (!mapping_cap_writeback_dirty(mapping) ||
>>>>>> +	    !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
>>>>>>     		return 0;
>>>>>>     	wbc_attach_fdatawrite_inode(&wbc, mapping->host);
>>>>>
>>>>> How does this play with tagged_writepages?  We assume that no tagging
>>>>> has been performed by any __filemap_fdatawrite_range() caller?
>>>>>
>>>>
>>>> Checking also PAGECACHE_TAG_TOWRITE is cheap but seems redundant.
>>>>
>>>> To-write tags are supposed to be a subset of dirty tags:
>>>> to-write is set only when dirty is set and cleared after starting writeback.
>>>>
>>>> Special case set_page_writeback_keepwrite() which does not clear to-write
>>>> should be for dirty page thus dirty tag is not going to be cleared either.
>>>> Ext4 calls it after redirty_page_for_writepage()
>>>> XFS even without clear_page_dirty_for_io()
>>>>
>>>> Anyway to-write tag without dirty tag or at clear page is confusing.
>>>
>>> Yeah, TOWRITE tag is intended to be internal to writepages logic so your
>>> patch is fine in that regard. Overall the patch looks good to me so I'm
>>> just wondering a bit about the motivation...
>>
>> In our case the file mixes cached pages and O_DIRECT reads. It is a kind of
>> database where the index header is memory mapped while the rest of the data
>> is read via O_DIRECT. I suppose that is for sharing the index between
>> multiple instances.
> 
> OK, that has always been a bit problematic but you're not the first one to
> have such design ;). So feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> to your patch.

Thanks.

O_DIRECT has a long history of misunderstandings =)
It looks like some cases are still not documented.
My favourite: an O_DIRECT write into a hole goes into the page cache, at least for ext4.

> 
>> On this path we also hit this bug:
>> https://lore.kernel.org/lkml/156355839560.2063.5265687291430814589.stgit@buzz/
>> so that's why I've started looking into this code.
> 
> I see. OK.
> 
> 								Honza
> 
