From: Mauricio Faria de Oliveira <mfo@canonical.com>
To: "Theodore Y. Ts'o" <tytso@mit.edu>, linux-ext4@vger.kernel.org
Cc: dann frazier <dann.frazier@canonical.com>,
Andreas Dilger <adilger@dilger.ca>, Jan Kara <jack@suse.com>
Subject: Re: [RFC PATCH 00/11] ext4: data=journal: writeback mmap'ed pagecache
Date: Fri, 15 May 2020 15:39:16 -0300
Message-ID: <CAO9xwp1Gj+tffyp0Q=99VBnhX3WvHaq7qg7pf4kpty9_0+-ACQ@mail.gmail.com>
In-Reply-To: <20200423233705.5878-1-mfo@canonical.com>
Hi,
On Thu, Apr 23, 2020 at 8:37 PM Mauricio Faria de Oliveira
<mfo@canonical.com> wrote:
[snip]
> Summary:
> -------
>
> The patchset is a bit long with 11 patches, but I tried to keep
> changes tiny to help with review, and to better document how each
> of them works, and why and how this or that is done. It's RFC as I
> would like to ask for suggestions/feedback, if at all possible.
If at all possible, could this patchset get at least a cursory look?
I'm aware it's been a busy period for some of you, so I just wanted
to send a friendly ping, in case this got buried under other mail.
Thanks!
>
> Patches 01 and 02 implement the outlined fix, with a few changes
> (fix first deadlock; use existing plumbing in jbd2 as the list.)
>
> Patch 03 fixes a seconds-long delay on msync().
>
> Patch 04 introduces helpers to handle the second deadlock.
>
> Patches 05-11 handle the second deadlock (three of these patches,
> namely 07, 09 and 10, are changes not specific to data=journal,
> affecting other journaling modes, so it's not in their subject.)
>
> The order of the patches intentionally allows the issues fixed in
> 03 and 05-11 to occur (while putting the core patches first), so
> that issues can be reproduced/regression tested one by one,
> as needed. It can be changed, of course, to enable the actual
> writeback changes in the last patch (once the issues are patched.)
>
>
> Testing:
> -------
>
> This has been built and regression tested on next-20200417.
> (Also rebased and build tested on next-20200423 / "today").
>
> On xfstests (commit b2faf204) quick group (excluding
> generic/430, 431 and 434, which always hung): no regressions with
> data=ordered (default) or data=journal,journal_checksum.
>
> With data=ordered: (on both original and patched kernel)
>
> Failures: generic/223 generic/465 generic/553 generic/554 generic/565 generic/570
>
> With data=journal,journal_checksum: (likewise)
>
> Failures: ext4/044 generic/223 generic/441 generic/553 generic/554 generic/565 generic/570
>
> The test-case for the problem (and deadlocks), also used for
> further stress testing, is stress-ng (512 workers on 16 vCPUs):
>
> $ sudo mount -o data=journal,journal_checksum $DEV $MNT
> $ cd $MNT
> $ sudo stress-ng --mmap 512 --mmap-file --timeout 1w
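For readers without stress-ng at hand, here is a minimal userspace sketch
of the kind of access pattern involved: stores through a shared file
mapping dirty the pagecache without write(2), and msync(MS_SYNC) then
requests data-integrity writeback. This is only an approximation of what
the stress-ng mmap workers do, not their code. The function name and the
scratch file path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Dirty every page of a shared file mapping, then msync(MS_SYNC) it.
 * Returns the msync() latency in milliseconds, or -1.0 on error.
 * `path` is a hypothetical scratch file; it is created and truncated.
 */
double mmap_dirty_and_msync(const char *path, size_t len)
{
    assert(len > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1.0;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1.0;
    }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1.0;
    }

    /* Stores through the mapping dirty the pagecache without write(2). */
    memset(map, 0xa5, len);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = msync(map, len, MS_SYNC);  /* data-integrity writeback */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    munmap(map, len);
    close(fd);
    if (rc != 0)
        return -1.0;

    return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling it on a file under $MNT with a length of a few pages and printing
the returned latency gives a rough view of how long msync() blocks; it
does not by itself reproduce the journal corruption, which also needs the
unclean shutdown.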
>
> To reproduce the problem (without patchset), run it a bit
> and crash the kernel (to cause unclean shutdown) w/ sysrq,
> and mount the device again (it should fail / need e2fsck):
>
> Original:
>
> [ 27.660063] JBD2: Invalid checksum recovering data block 79449 in log
> [ 27.792371] JBD2: recovery failed
> [ 27.792854] EXT4-fs (vdc): error loading journal
> mount: /tmp/ext4: can't read superblock on /dev/vdc.
>
> Patched:
>
> [ 33.111230] EXT4-fs (vdc): 512 orphan inodes deleted
> [ 33.111961] EXT4-fs (vdc): recovery complete
> [ 33.114590] EXT4-fs (vdc): mounted filesystem with journalled data mode. Opts: data=journal,journal_checksum
>
>
> RFC / Questions:
> ---------------
>
> 0) Usage of ext4_inode_info.i_datasync_tid for checks
>
> We rely on the struct ext4_inode_info.i_datasync_tid field
> (set by __ext4_journalled_writepage() and others) to check
> it against the running transaction. Of course, other sites
> set it too, and it might be that some of our checks return
> false positives then (should be fine, just less efficient.)
>
> To avoid such false positives, we could add another field
> to that structure, exclusively for this, but that is 8 more
> bytes (pointer) per inode, even in non-data=journal cases,
> so it didn't seem a good enough reason. But if that
> is better/worth it for efficiency reasons (speed, in this
> case, vs. memory consumption) we could do it.
>
> Maybe there are other ideas/better ways to do it?
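To make the trade-off in question 0 concrete, here is a small userspace
model of the two options. It is a sketch under assumptions, not the
patchset's code: it assumes the check is an equality comparison against
the running transaction's tid, and the struct, the dedicated field
(wb_tid, wb_tid_valid) and all function names are hypothetical. Only
i_datasync_tid is a name taken from the text above.

```c
#include <stdbool.h>

typedef unsigned int tid_t;   /* jbd2 transaction IDs are 32-bit */

/* Userspace stand-in for the fields discussed; not the kernel struct. */
struct model_inode {
    tid_t i_datasync_tid;     /* shared: set by several call sites */
    tid_t wb_tid;             /* hypothetical dedicated field */
    bool  wb_tid_valid;       /* hypothetical: wb_tid has been set */
};

/* Some other site records the running transaction (no page writeback). */
void model_other_update(struct model_inode *inode, tid_t running)
{
    inode->i_datasync_tid = running;
}

/* The journalled writepage path records it in both fields. */
void model_journalled_writepage(struct model_inode *inode, tid_t running)
{
    inode->i_datasync_tid = running;
    inode->wb_tid = running;
    inode->wb_tid_valid = true;
}

/* Check on the shared field: may report true for unrelated updates. */
bool model_check_shared(const struct model_inode *inode, tid_t running)
{
    return inode->i_datasync_tid == running;
}

/* Check on the dedicated field: true only after journalled writepage. */
bool model_check_dedicated(const struct model_inode *inode, tid_t running)
{
    return inode->wb_tid_valid && inode->wb_tid == running;
}
```

In this model, an inode touched only by model_other_update() passes the
shared check but not the dedicated one; that is the false positive
described above. The price of avoiding it is the extra per-inode state.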
>
> 1) Usage of ext4_force_commit() in ext4_writepages()
>
> Patch 03 describes/fixes an issue where the underlying problem is,
> if __ext4_journalled_writepage() does set_page_writeback() but no
> journal commit is triggered, wait_on_page_writeback() may wait up
> to seconds until the periodic journal commit happens.
>
> The solution there, to fix the impact on msync(), is to just call
> ext4_force_commit() (as it's done anyway in ext4_sync_file()), on
> ext4_writepages().
>
> Is that a good enough solution? Other ideas?
>
> 2) Similar issue (unhandled) in ext4_writepage()
>
> The other, related question is: what about direct callers of
> ext4_writepage(), which obviously do not use ext4_writepages()?
> (e.g., pageout() and writeout(); write_one_page() is not used.)
>
> Those are memory-cleansing writeback, which should not wait.
> However, as mentioned in that patch, if that writeback goes
> on for seconds and a data-integrity writeback/system call
> comes in, it is delayed in wait_on_page_writeback() that long.
>
> So, ideally, we should be trying to kick a journal commit?
>
> It looks like ext4_handle_sync() is not the answer, since
> it waits for the commit to finish, and pageout() is called
> on a list of pages when shrinking. So it would not be
> effective to block on each one of them.
>
> We might not want to start anything right now, actually,
> since the memory-cleansing writeback can be happening in
> memory pressure scenarios, right? But we would need to do
> something to ensure that future wait_on_page_writeback()
> calls do not wait too long.
>
> Maybe the answer is something similar to jbd2 sync transaction
> batching (used by ext4_handle_sync()), but in *async* fashion,
> say, possibly implemented/batching in the jbd2 worker thread.
> Is that reasonable?
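As a way to picture the *async* batching idea in question 2, here is a
single-threaded userspace model. It is purely illustrative: none of
these names exist in jbd2, and real code would need proper locking or
atomics and an actual wakeup of the commit thread. The point it shows is
that callers only leave a note and return, and one worker pass covers
all pending notes with a single commit.

```c
#include <stdbool.h>

/* Hypothetical model of a journal with an async "please commit" flag. */
struct model_journal {
    bool commit_requested;   /* set by callers, cleared by the worker */
    unsigned requests;       /* how many kicks were issued */
    unsigned commits;        /* how many commits the worker ran */
};

/* Called from a pageout-like path: never blocks, just leaves a note. */
void model_kick_commit_async(struct model_journal *j)
{
    j->requests++;
    j->commit_requested = true;
}

/*
 * One pass of the worker: a single commit covers all pending kicks.
 * Returns true if a commit was run in this pass.
 */
bool model_worker_pass(struct model_journal *j)
{
    if (!j->commit_requested)
        return false;
    j->commit_requested = false;
    j->commits++;
    return true;
}
```

With eight kicks followed by one worker pass, the model runs exactly one
commit, and a second pass with no new kicks does nothing.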
>
> ...
>
> Any comments/feedback/reviews are much appreciated.
>
> Thank you in advance,
> Mauricio
>
> [1] https://lore.kernel.org/linux-ext4/20190830012236.GC10779@mit.edu/
>
> Mauricio Faria de Oliveira (11):
>   ext4: data=journal: introduce struct/kmem_cache
>     ext4_journalled_wb_page/_cachep
>   ext4: data=journal: handle page writeback in
>     __ext4_journalled_writepage()
>   ext4: data=journal: call ext4_force_commit() in ext4_writepages() for
>     msync()
>   ext4: data=journal: introduce helpers for journalled writeback
>     deadlock
>   ext4: data=journal: prevent journalled writeback deadlock in
>     __ext4_journalled_writepage()
>   ext4: data=journal: prevent journalled writeback deadlock in
>     ext4_write_begin()
>   ext4: grab page before starting transaction handle in
>     ext4_convert_inline_data_to_extent()
>   ext4: data=journal: prevent journalled writeback deadlock in
>     ext4_convert_inline_data_to_extent()
>   ext4: grab page before starting transaction handle in
>     ext4_try_to_write_inline_data()
>   ext4: deduplicate code with error legs in
>     ext4_try_to_write_inline_data()
>   ext4: data=journal: prevent journalled writeback deadlock in
>     ext4_try_to_write_inline_data()
>
> fs/ext4/ext4_jbd2.h | 88 +++++++++++++++++++++++++
> fs/ext4/inline.c | 153 +++++++++++++++++++++++++++++++-------------
> fs/ext4/inode.c | 137 +++++++++++++++++++++++++++++++++++++--
> fs/ext4/page-io.c | 11 ++++
> 4 files changed, 341 insertions(+), 48 deletions(-)
>
> --
> 2.20.1
>
--
Mauricio Faria de Oliveira
Thread overview: 16+ messages
2020-04-23 23:36 [RFC PATCH 00/11] ext4: data=journal: writeback mmap'ed pagecache Mauricio Faria de Oliveira
2020-04-23 23:36 ` [RFC PATCH 01/11] ext4: data=journal: introduce struct/kmem_cache ext4_journalled_wb_page/_cachep Mauricio Faria de Oliveira
2020-04-23 23:36 ` [RFC PATCH 02/11] ext4: data=journal: handle page writeback in __ext4_journalled_writepage() Mauricio Faria de Oliveira
2020-04-23 23:36 ` [RFC PATCH 03/11] ext4: data=journal: call ext4_force_commit() in ext4_writepages() for msync() Mauricio Faria de Oliveira
2020-04-23 23:36 ` [RFC PATCH 04/11] ext4: data=journal: introduce helpers for journalled writeback deadlock Mauricio Faria de Oliveira
2020-04-23 23:36 ` [RFC PATCH 05/11] ext4: data=journal: prevent journalled writeback deadlock in __ext4_journalled_writepage() Mauricio Faria de Oliveira
2020-04-23 23:37 ` [RFC PATCH 06/11] ext4: data=journal: prevent journalled writeback deadlock in ext4_write_begin() Mauricio Faria de Oliveira
2020-04-23 23:37 ` [RFC PATCH 07/11] ext4: grab page before starting transaction handle in ext4_convert_inline_data_to_extent() Mauricio Faria de Oliveira
2020-04-23 23:37 ` [RFC PATCH 08/11] ext4: data=journal: prevent journalled writeback deadlock " Mauricio Faria de Oliveira
2020-04-23 23:37 ` [RFC PATCH 09/11] ext4: grab page before starting transaction handle in ext4_try_to_write_inline_data() Mauricio Faria de Oliveira
2020-04-23 23:37 ` [RFC PATCH 10/11] ext4: deduplicate code with error legs " Mauricio Faria de Oliveira
2020-04-23 23:37 ` [RFC PATCH 11/11] ext4: data=journal: prevent journalled writeback deadlock " Mauricio Faria de Oliveira
2020-05-15 18:39 ` Mauricio Faria de Oliveira [this message]
2020-05-17 7:40 ` [RFC PATCH 00/11] ext4: data=journal: writeback mmap'ed pagecache Andreas Dilger
2020-06-10 13:21 ` Jan Kara
2020-06-10 15:15 ` Mauricio Faria de Oliveira