linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Theodore Y. Ts'o" <tytso@mit.edu>
To: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Jan Kara <jack@suse.cz>,
	joseph.qi@linux.alibaba.com, Liu Bo <bo.liu@linux.alibaba.com>
Subject: Re: Discussion: is it time to remove dioread_nolock?
Date: Mon, 6 Jan 2020 19:43:38 -0500	[thread overview]
Message-ID: <20200107004338.GB125832@mit.edu> (raw)
In-Reply-To: <20200106122457.A10F7AE053@d06av26.portsmouth.uk.ibm.com>

On Mon, Jan 06, 2020 at 05:54:56PM +0530, Ritesh Harjani wrote:
> > The initial reason we use dioread_nolock is that it'll also allocate
> > unwritten extents for buffered write, and normally the corresponding
> > inode won't be added to jbd2 transaction's t_inode_list, so while
> > commiting transaction, it won't flush inodes' dirty pages, then
> > transaction will commit quickly, otherwise in extream case, the time
> 
> I do notice this in ext4_map_blocks(). We add inode to t_inode_list only
> in case if we allocate written blocks. I guess this was done to avoid
> stale data exposure problem. So now due to ordered mode, we may end up
> flushing all dirty data pages in committing transaction before the
> metadata is flushed.
> 
> Do you have any benchmarks or workload where we could see this problem?
> And could this actually be a problem with any real world workload too?

After thinking about this some more, I've changed my mind.

I think this is something which *can* be very noticeable in some real
world workloads.  If the workload is doing a lot of allocating,
buffered writes to an inode, and the writeback thread starts doing the
writeback for that inode right before a commit starts, then the commit
can take a long time.  The problem is that if the storage device is
particularly slow --- for example, a slow USB drive, or a 32 GiB
Standard Persistent Disk in a Google Compute Environment (which has a
max sustained throughput of 3 MiB/s), it doesn't take a lot of queued
writeback I/O to trigger a hung task warning.  Even if hung task panic
isn't enabled, if the commit thread is busied out for a minute or two,
anything that is blocked on a commit completing --- such a fsync(2)
system call, could end up getting blocked for a long time, and that
could easily make a userspace application sad.

> Jan/Ted, your opinion on this pls?
> 
> I do see that there was a proposal by Ted @ [1] which should
> also solve this problem. I do have plans to work on Ted's proposal, but
> meanwhile, should we preserve this mount option for above mentioned use
> case? Or should we make it a no-op now?

> [1] - https://marc.info/?l=linux-ext4&m=157244559501734&w=2

I agree that this was not the original intent of dioread_nolock, but I
until we can implement [1], dioread_nolock is the only workaround real
workaround we have today.  (Well, data=writeback also works, but that
has other problems.)

If dropping dioread_nolock makes it easier to implement [1], we can
certainly make that one of the first patches in a patch series which
changes how we ext4_writepages() works so it writes the data blocks
before it updates the metadata blocks.  But unless there are some real
downsides to keeping the code around in the kernel until then, I'm not
sure it's worth it to take away the diorad_nolock functionality until
we have a good replacement --- even if that wasn't the original
purpose of the code.

What do other folks think?

					- Ted

  reply	other threads:[~2020-01-07  0:43 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-26 15:31 Discussion: is it time to remove dioread_nolock? Theodore Y. Ts'o
2019-12-27 13:10 ` Joseph Qi
2019-12-29 15:03 ` Xiaoguang Wang
2020-01-06 12:24   ` Ritesh Harjani
2020-01-07  0:43     ` Theodore Y. Ts'o [this message]
2020-01-07  8:22       ` Jan Kara
2020-01-07 17:11         ` Theodore Y. Ts'o
2020-01-07 17:22           ` Jan Kara
2020-01-08 10:45             ` Ritesh Harjani
2020-01-08 17:42               ` Theodore Y. Ts'o
2020-01-09  9:21                 ` Ritesh Harjani
2020-01-09 16:38                   ` Theodore Y. Ts'o
2020-01-14 23:30                     ` Ritesh Harjani
2020-01-15 16:48                       ` Theodore Y. Ts'o
2020-01-16  9:46                         ` Ritesh Harjani
2020-01-09 12:34                 ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200107004338.GB125832@mit.edu \
    --to=tytso@mit.edu \
    --cc=bo.liu@linux.alibaba.com \
    --cc=jack@suse.cz \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=riteshh@linux.ibm.com \
    --cc=xiaoguang.wang@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).