linux-ext4.vger.kernel.org archive mirror
From: Ritesh Harjani <riteshh@linux.ibm.com>
To: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
	"Theodore Y. Ts'o" <tytso@mit.edu>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Jan Kara <jack@suse.cz>
Cc: joseph.qi@linux.alibaba.com, Liu Bo <bo.liu@linux.alibaba.com>
Subject: Re: Discussion: is it time to remove dioread_nolock?
Date: Mon, 6 Jan 2020 17:54:56 +0530
Message-ID: <20200106122457.A10F7AE053@d06av26.portsmouth.uk.ibm.com>
In-Reply-To: <9042a8f4-985a-fc83-c059-241c9440200c@linux.alibaba.com>



On 12/29/19 8:33 PM, Xiaoguang Wang wrote:
> hi,
> 
>> With inclusion of Ritesh's inode lock scalability patches[1], the
>> traditional performance reasons for dioread_nolock --- namely,
>> removing the need to take an exclusive lock for Direct I/O read
>> operations --- have been removed.
>>
>> [1] 
>> https://lore.kernel.org/r/20191212055557.11151-1-riteshh@linux.ibm.com
>>
>> So... is it time to remove the code which supports dioread_nolock?
>> Doing so would simplify the code base, and reduce the test matrix.
>> This would also make it easier to restructure the write path when
>> allocating blocks so that the extent tree is updated after writing out
>> the data blocks, by clearing away the underbrush of dioread nolock
>> first.
>>
>> If we do this, we'd leave the dioread_nolock mount option for
>> backwards compatibility, but it would be a no-op and not actually do
>> anything.
>>
>> Any objections before I look into ripping out dioread_nolock?
>>
>> The one possible concern that I considered was for Alibaba, which was
>> doing something interesting with dioread_nolock plus nodelalloc.  But
>> looking at Liu Bo's explanation[2], I believe that their workload
>> would be satisfied simply by using the standard ext4 mount options
>> (that is, the default mode has the performance benefits when doing
>> parallel DIO reads, and so the need for nodelalloc to mitigate the
>> tail latency concerns which Alibaba was seeing in their workload would
>> not be needed).  Could Liu or someone from Alibaba confirm, perhaps
>> with some benchmarks using their workload?
> Currently we don't use dioread_nolock & nodelalloc on our internal
> servers; we use dioread_nolock & delalloc widely, and it works well.
> 
> The initial reason we use dioread_nolock is that it also allocates
> unwritten extents for buffered writes, so normally the corresponding
> inode is not added to the jbd2 transaction's t_inode_list. While
> committing the transaction, jbd2 then does not flush the inode's dirty
> pages, and the transaction commits quickly. Otherwise, in extreme cases, the time

I do notice this in ext4_map_blocks(). We add the inode to t_inode_list only
if we allocate written blocks. I guess this was done to avoid the
stale data exposure problem. So now, due to ordered mode, we may end up
flushing all dirty data pages in the committing transaction before the
metadata is flushed.
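
The rule described above can be sketched in user space as follows. This is
only a simplified model of the decision, not the kernel code: the names
sketch_inode, SKETCH_MAP_NEW, SKETCH_MAP_UNWRITTEN and sketch_map_blocks()
are hypothetical stand-ins, and the model assumes that ordered mode is the
only other condition that matters.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the mapping result flags. */
enum {
	SKETCH_MAP_NEW       = 1 << 0,	/* blocks were freshly allocated */
	SKETCH_MAP_UNWRITTEN = 1 << 1,	/* extent is marked unwritten */
};

/* Hypothetical model of the per-inode state relevant to this discussion. */
struct sketch_inode {
	bool ordered_mode;	/* data=ordered is in effect */
	bool on_t_inode_list;	/* attached to the running transaction */
};

/*
 * Freshly allocated *written* blocks become visible once the transaction
 * commits, so their data must reach disk first. Unwritten extents read
 * back as zeroes until converted, so no ordering is needed for them.
 */
static bool sketch_needs_ordered_flush(const struct sketch_inode *inode,
				       unsigned int m_flags)
{
	return inode->ordered_mode &&
	       (m_flags & SKETCH_MAP_NEW) &&
	       !(m_flags & SKETCH_MAP_UNWRITTEN);
}

/* Model of the bookkeeping done after a block mapping request. */
static void sketch_map_blocks(struct sketch_inode *inode, unsigned int m_flags)
{
	if (sketch_needs_ordered_flush(inode, m_flags))
		inode->on_t_inode_list = true;
}
```

In this model, a write that allocates written blocks puts the inode on the
list, so the commit has to flush its dirty pages first. A write that gets an
unwritten extent leaves the inode off the list, which is the side effect of
dioread_nolock that Xiaoguang describes.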

Do you have any benchmarks or workload where we could see this problem?
And could this actually be a problem with any real world workload too?

Strictly speaking, the dioread_nolock mount option was not meant to solve
the problem you mentioned.
At least as per the Documentation, it is meant to help with the inode lock
scalability issue in the DIO read case, which should mostly be fixed
by the recent patches Ted pointed to in the link above.



> taken to flush dirty inodes will be very long, especially when cgroup writeback
> is enabled. A previous discussion:
> https://marc.info/?l=linux-fsdevel&m=151799957104768&w=2
> I think the semantics hidden behind dioread_nolock are also important,
> so if we plan to remove this mount option, we should keep the same semantics.


Jan/Ted, your opinion on this, please?

I do see that there was a proposal by Ted @ [1] which should
also solve this problem. I do have plans to work on Ted's proposal, but
meanwhile, should we preserve this mount option for the above-mentioned use
case? Or should we make it a no-op now?

Also as an update, I am working on a patch which should fix the
stale data exposure issue which we currently have with ext4_page_mkwrite
& ext4 DIO reads. As discussed @ [2], with that fixed we don't
need dioread_nolock, so I was also planning to make this mount option a
no-op. I will soon post the RFC version of this patch too.


[1] - https://marc.info/?l=linux-ext4&m=157244559501734&w=2

[2] - https://lore.kernel.org/linux-ext4/20191120143257.GE9509@quack2.suse.cz/


-ritesh


> 
> Regards,
> Xiaoguang Wang
> 
>>
>> [2] 
>> https://lore.kernel.org/linux-ext4/20181121013035.ab4xp7evjyschecy@US-160370MP2.local/ 
>>
>>
>>                                          - Ted
>>


