All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: "Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com>
Cc: "linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"adilger.kernel@dilger.ca" <adilger.kernel@dilger.ca>,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: Ext4: Buffered random writes performance regression with dioread_nolock enabled
Date: Tue, 20 Sep 2022 22:07:37 -0400	[thread overview]
Message-ID: <Yypx6VQRbl3bFP2v@mit.edu> (raw)
In-Reply-To: <28460B7B-F66E-4BDC-9F6E-B7E77A3FEE83@amazon.com>

On Mon, Sep 19, 2022 at 03:06:46PM +0000, Mohamed Abuelfotoh, Hazem wrote:
> Hey Team,
> 
> 
>   *   I am sending this e-mail to report a performance regression that’s caused by commit 244adf6426(ext4: make dioread_nolock the default) , I am listing the performance regression symptoms below & our analysis for the reported regression.

Performance regressions are always tricky; dioread_nolock improves on
some workloads, and can cause regressions for others.  In this
particular case, the choice to make it the default was to also fix a
direct I/O vs. writeback race which can result in stale data being
revealed (which is a security issue).

That being said...

1)  as you've noted, this commit has been around since 5.6.

2)  as you noted,

    Increasing the journal size from ext4 128 MiB to 1GiB will also
    fix the problem .

Since 2016, the commit bbd2f78cf63a ("libext2fs: allow the default
journal size to go as large as a gigabyte") has been in e2fsprogs
v1.43.2 and newer (the current version of e2fsprogs v1.46.5; v1.43.2
was released in September 2016, six years ago).  Quoting the commit
description:

    Recent research has shown that for a metadata-heavy workload, a 128 MB
    is journal be a bottleneck on HDD's, and that the optimal journal size
    is proportional to number of unique metadata blocks that can be
    modified (and written into the journal) in a 30 second window.  One
    gigabyte should be sufficient for most workloads, which will be used
    for file systems larger than 128 gigabytes.

So this should not be a problem in practice, and if there are users
who are using antedeluvian versions of e2fsprogs, or who have old file
systems which were created many years ago, it's quite easy to adjust
the journal size.  For example, to adjust the journal to be 2GiB (2048
MiB), just run the commands:

   tune2fs -O ^has_journal /dev/sdXX
   tune2fs -O has_journal -J size=2048 /tmp/sdXX

Hence, I disagree that we should revert commit 244adf6426.  It may be
that for your workload and your file system configuration, using the
mount option nodioread_nolock (or dioread_lock), may make sense.  But
there were also workloads for which using dioread_nolock improved
benchmark numbers, so the question of which is the better default is
not at all obvious.

That being said, I do have plans for a new writeback scheme which will
replace dioread_nolock *and* dioread_lock, and which will hopefully be
faster than either approach.

						- Ted

P.S.  I'm puzzled by your comment, "we have to note that this should
be only beneficial with extent-based files" --- while this is true,
why does this matter?  Unless you're dealing with an ancient file
system that was originally created as ext2 or ext3 and then converted
to ext4, *all* ext4 files should be extent-based...

       reply	other threads:[~2022-09-21  2:07 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <28460B7B-F66E-4BDC-9F6E-B7E77A3FEE83@amazon.com>
2022-09-21  2:07 ` Theodore Ts'o [this message]
2022-09-21 12:06   ` Ext4: Buffered random writes performance regression with dioread_nolock enabled Mohamed Abuelfotoh, Hazem
2022-09-19 15:18 hazem ahmed mohamed
2022-09-20  8:07 ` Thorsten Leemhuis
2022-09-20 13:59   ` hazem ahmed mohamed
2022-09-21  2:41   ` Theodore Ts'o
2022-09-21  9:55     ` Thorsten Leemhuis
2022-09-21  4:44 ` Ritesh Harjani (IBM)
2022-09-21 12:22   ` hazem ahmed mohamed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yypx6VQRbl3bFP2v@mit.edu \
    --to=tytso@mit.edu \
    --cc=abuehaze@amazon.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=linux-ext4@vger.kernel.org \
    --cc=regressions@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.