linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mauricio Faria de Oliveira <mfo@canonical.com>
To: "Theodore Y. Ts'o" <tytso@mit.edu>, linux-ext4@vger.kernel.org
Cc: dann frazier <dann.frazier@canonical.com>,
	Andreas Dilger <adilger@dilger.ca>, Jan Kara <jack@suse.com>
Subject: Re: [RFC PATCH 00/11] ext4: data=journal: writeback mmap'ed pagecache
Date: Fri, 15 May 2020 15:39:16 -0300	[thread overview]
Message-ID: <CAO9xwp1Gj+tffyp0Q=99VBnhX3WvHaq7qg7pf4kpty9_0+-ACQ@mail.gmail.com> (raw)
In-Reply-To: <20200423233705.5878-1-mfo@canonical.com>

Hi,

On Thu, Apr 23, 2020 at 8:37 PM Mauricio Faria de Oliveira
<mfo@canonical.com> wrote:
[snip]
> Summary:
> -------
>
> The patchset is a bit long with 11 patches, but I tried to get
> changes tiny to help with review, and better document how each
> of them work, why and how this or that is done.  It's RFC as I
> would like to ask for suggestions/feedback, if at all possible.

If at all possible, may this patchset have at least a cursory look?

I'm aware it's been a busy period for some of you, so I just wanted
to friendly ping on it, in case this got buried deep under other stuff.

Thanks!


>
> Patch 01 and 02 implement the outlined fix, with a few changes
> (fix first deadlock; use existing plumbing in jbd2 as the list.)
>
> Patch 03 fix a seconds-delay on msync().
>
> Patch 04 introduces helpers to handle the second deadlock.
>
> Patch 05-11 handle the second deadlock (three of these patches,
> namely 07, 09 and 10 are changes not specific for data=journal,
> affecting other journaling modes, so it's not on their subject.)
>
> The order of the patches intentionally allow the issues on 03
> and 05-11 to occur (while putting the core patches first), so
> to allow issues to be reproduced/regression tested one by one,
> as needed.  It can be changed, of course, so to enable actual
> writeback changes in the last patch (when issues are patched.)
>
>
> Testing:
> -------
>
> This has been built and regression tested on next-20200417.
> (Also rebased and build tested on next-20200423 / "today").
>
> On xfstests (commit b2faf204) quick group (and excluding
> generic 430/431/434 which always hung): no regressions w/
> data=ordered (default) nor data=journal,journal_checksum.
>
> With data=ordered: (on both original and patched kernel)
>
>     Failures: generic/223 generic/465 generic/553 generic/554 generic/565 generic/570
>
> With data=journal,journal_checksum: (likewise)
>
>     Failures: ext4/044 generic/223 generic/441 generic/553 generic/554 generic/565 generic/570
>
> The test-case for the problem (and deadlocks) and further
> stress testing is stress-ng (with 512 workers on 16 vCPUs)
>
>     $ sudo mount -o data=journal,journal_checksum $DEV $MNT
>     $ cd $MNT
>     $ sudo stress-ng --mmap 512 --mmap-file --timeout 1w
>
> To reproduce the problem (without patchset), run it a bit
> and crash the kernel (to cause unclean shutdown) w/ sysrq,
> and mount the device again (it should fail / need e2fsck):
>
> Original:
>
>     [   27.660063] JBD2: Invalid checksum recovering data block 79449 in log
>     [   27.792371] JBD2: recovery failed
>     [   27.792854] EXT4-fs (vdc): error loading journal
>     mount: /tmp/ext4: can't read superblock on /dev/vdc.
>
> Patched:
>
>     [  33.111230] EXT4-fs (vdc): 512 orphan inodes deleted
>     [  33.111961] EXT4-fs (vdc): recovery complete
>     [  33.114590] EXT4-fs (vdc): mounted filesystem with journalled data mode. Opts: data=journal,journal_checksum
>
>
> RFC / Questions:
> ---------------
>
> 0) Usage of ext4_inode_info.i_datasync_tid for checks
>
> We rely on the struct ext4_inode_info.i_datasync_tid field
> (set by __ext4_journalled_writepage() and others) to check
> it against the running transaction. Of course, other sites
> set it too, and it might be that some of our checks return
> false positives then (should be fine, just less efficient.)
>
> To avoid such false positives, we could add another field
> to that structure, exclusively for this, but that is more
> 8 bytes (pointer) for inodes and even on non-data=journal
> cases.. so it didn't seem good enough reason, but if that
> is better/worth it for efficiency reasons (speed, in this
> case, vs. memory consumption) we could do it.
>
> Maybe there are other ideas/better ways to do it?
>
> 1) Usage of ext4_force_commit() in ext4_writepages()
>
> Patch 03 describes/fixes an issue where the underlying problem is,
> if __ext4_journalled_writepage() does set_page_writeback() but no
> journal commit is triggered, wait_on_page_writeback() may wait up
> to seconds until the periodic journal commit happens.
>
> The solution there, to fix the impact on msync(), is to just call
> ext4_force_commit() (as it's done anyway in ext4_sync_file()), on
> ext4_writepages().
>
> Is that a good enough solution?  Other ideas?
>
> 2) Similar issue (unhandled) in ext4_writepage()
>
> The other, related question is, what about direct callers of
> ext4_writepage() that obviously do not use ext4_writepages() ?
> (e.g., pageout() and writeout(); write_one_page() not used.)
>
> Those are memory-cleasing writeback, which should not wait,
> however, as mentioned in that patch, if its writeback goes
> on for seconds and an data-integrity writeback/system call
> comes in, it is delayed/wait_on_page_writeback() that long.
>
> So, ideally, we should be trying to kick a journal commit?
>
> It looks like ext4_handle_sync() is not the answer, since
> it waits for commit to finish, and pageout() is called on
> a list of pages by shrinking.  So, not effective to block
> on each one of them.
>
> We might not want to start anything right now, actually,
> since the memory-cleasing writeback can be happening on
> memory pressure scenarios, right?  But would need to do
> something, to ensure that future wait_on_page_writeback()
> do not wait too long.
>
> Maybe the answer is something similar to jbd2 sync transaction
> batching (used by ext4_handle_sync()), but in *async* fashion,
> say, possibly implemented/batching in the jbd2 worker thread.
> Is that reasonable?
>
> ...
>
> Any comments/feedback/reviews are very appreciated.
>
> Thank you in advance,
> Mauricio
>
> [1] https://lore.kernel.org/linux-ext4/20190830012236.GC10779@mit.edu/
>
> Mauricio Faria de Oliveira (11):
>   ext4: data=journal: introduce struct/kmem_cache
>     ext4_journalled_wb_page/_cachep
>   ext4: data=journal: handle page writeback in
>     __ext4_journalled_writepage()
>   ext4: data=journal: call ext4_force_commit() in ext4_writepages() for
>     msync()
>   ext4: data=journal: introduce helpers for journalled writeback
>     deadlock
>   ext4: data=journal: prevent journalled writeback deadlock in
>     __ext4_journalled_writepage()
>   ext4: data=journal: prevent journalled writeback deadlock in
>     ext4_write_begin()
>   ext4: grab page before starting transaction handle in
>     ext4_convert_inline_data_to_extent()
>   ext4: data=journal: prevent journalled writeback deadlock in
>     ext4_convert_inline_data_to_extent()
>   ext4: grab page before starting transaction handle in
>     ext4_try_to_write_inline_data()
>   ext4: deduplicate code with error legs in
>     ext4_try_to_write_inline_data()
>   ext4: data=journal: prevent journalled writeback deadlock in
>     ext4_try_to_write_inline_data()
>
>  fs/ext4/ext4_jbd2.h |  88 +++++++++++++++++++++++++
>  fs/ext4/inline.c    | 153 +++++++++++++++++++++++++++++++-------------
>  fs/ext4/inode.c     | 137 +++++++++++++++++++++++++++++++++++++--
>  fs/ext4/page-io.c   |  11 ++++
>  4 files changed, 341 insertions(+), 48 deletions(-)
>
> --
> 2.20.1
>


--
Mauricio Faria de Oliveira

  parent reply	other threads:[~2020-05-15 18:39 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-23 23:36 [RFC PATCH 00/11] ext4: data=journal: writeback mmap'ed pagecache Mauricio Faria de Oliveira
2020-04-23 23:36 ` [RFC PATCH 01/11] ext4: data=journal: introduce struct/kmem_cache ext4_journalled_wb_page/_cachep Mauricio Faria de Oliveira
2020-04-23 23:36 ` [RFC PATCH 02/11] ext4: data=journal: handle page writeback in __ext4_journalled_writepage() Mauricio Faria de Oliveira
2020-04-23 23:36 ` [RFC PATCH 03/11] ext4: data=journal: call ext4_force_commit() in ext4_writepages() for msync() Mauricio Faria de Oliveira
2020-04-23 23:36 ` [RFC PATCH 04/11] ext4: data=journal: introduce helpers for journalled writeback deadlock Mauricio Faria de Oliveira
2020-04-23 23:36 ` [RFC PATCH 05/11] ext4: data=journal: prevent journalled writeback deadlock in __ext4_journalled_writepage() Mauricio Faria de Oliveira
2020-04-23 23:37 ` [RFC PATCH 06/11] ext4: data=journal: prevent journalled writeback deadlock in ext4_write_begin() Mauricio Faria de Oliveira
2020-04-23 23:37 ` [RFC PATCH 07/11] ext4: grab page before starting transaction handle in ext4_convert_inline_data_to_extent() Mauricio Faria de Oliveira
2020-04-23 23:37 ` [RFC PATCH 08/11] ext4: data=journal: prevent journalled writeback deadlock " Mauricio Faria de Oliveira
2020-04-23 23:37 ` [RFC PATCH 09/11] ext4: grab page before starting transaction handle in ext4_try_to_write_inline_data() Mauricio Faria de Oliveira
2020-04-23 23:37 ` [RFC PATCH 10/11] ext4: deduplicate code with error legs " Mauricio Faria de Oliveira
2020-04-23 23:37 ` [RFC PATCH 11/11] ext4: data=journal: prevent journalled writeback deadlock " Mauricio Faria de Oliveira
2020-05-15 18:39 ` Mauricio Faria de Oliveira [this message]
2020-05-17  7:40   ` [RFC PATCH 00/11] ext4: data=journal: writeback mmap'ed pagecache Andreas Dilger
2020-06-10 13:21 ` Jan Kara
2020-06-10 15:15   ` Mauricio Faria de Oliveira

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAO9xwp1Gj+tffyp0Q=99VBnhX3WvHaq7qg7pf4kpty9_0+-ACQ@mail.gmail.com' \
    --to=mfo@canonical.com \
    --cc=adilger@dilger.ca \
    --cc=dann.frazier@canonical.com \
    --cc=jack@suse.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).