From: Josef Bacik <josef@toxicpanda.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 6/7] btrfs: remove unnecessary check_parent_dirs_for_sync()
Date: Wed, 27 Jan 2021 10:42:12 -0500
Message-ID: <3fec2b88-99e8-7aba-25bd-f746aed8ac7f@toxicpanda.com>
In-Reply-To: <CAL3q7H674gb03GJh3owLSVBndSO0JsT3STVHJDeOGU72_Ar4LQ@mail.gmail.com>

On 1/27/21 10:36 AM, Filipe Manana wrote:
> On Wed, Jan 27, 2021 at 3:23 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> On 1/27/21 5:34 AM, fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> Whenever we fsync an inode that is a directory, or a regular file that was
>>> created in the current transaction or has last_unlink_trans set to the
>>> generation of the current transaction, we check if any of its ancestor
>>> inodes (and the inode itself if it is a directory) cannot be logged and
>>> needs a fallback to a full transaction commit. If so, we return a value
>>> of 1 in order to fall back to a transaction commit.
>>>
>>> However, we often do not need to fall back to a transaction commit because:
>>>
>>> 1) The ancestor inode is not an immediate parent, and therefore there is
>>>      no explicit request to log it, and logging it is needed neither to
>>>      guarantee the consistency of the inode originally asked to be logged
>>>      (fsynced) nor that of its immediate parent;
>>>
>>> 2) The ancestor inode was already logged before, in which case any link,
>>>      unlink or rename operation updates the log as needed.
>>>
>>> So for these two cases we can avoid an unnecessary transaction commit.
>>> Therefore remove check_parent_dirs_for_sync() and add a check at the top
>>> of btrfs_log_inode() to make us fall back immediately to a transaction
>>> commit when we are logging a directory inode that cannot be logged and
>>> needs a full transaction commit. All we need to protect is the case where
>>> after renaming a file someone fsyncs only the old directory, which would
>>> result in losing the renamed file after a log replay.
>>>
>>> This patch is part of a patchset comprised of the following patches:
>>>
>>>     btrfs: remove unnecessary directory inode item update when deleting dir entry
>>>     btrfs: stop setting nbytes when filling inode item for logging
>>>     btrfs: avoid logging new ancestor inodes when logging new inode
>>>     btrfs: skip logging directories already logged when logging all parents
>>>     btrfs: skip logging inodes already logged when logging new entries
>>>     btrfs: remove unnecessary check_parent_dirs_for_sync()
>>>     btrfs: make concurrent fsyncs wait less when waiting for a transaction commit
>>>
>>> Performance results, after applying all patches, are mentioned in the
>>> change log of the last patch.
>>>
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>
>> I'm having a hard time with this one.
>>
>> Previously we would commit the transaction if the inode was a regular file, that
>> was created in this current transaction, and had been renamed.  Now with this
>> patch you're only committing the transaction if we are a directory and were
>> renamed ourselves.  Before if you already had directories A and B and then did
>> something like
>>
>> echo "foo" > /mnt/test/A/blah
>> fsync(/mnt/test/A/blah);
>> fsync(/mnt/test/A);
>> mv /mnt/test/A/blah /mnt/test/B
>> fsync(/mnt/test/B/blah);
>>
>> we would commit the transaction on this second fsync, but with your patch we are
>> not.  I suppose that's keeping in line with how fsync is allowed to work, but
>> it's definitely a change in behavior from what we used to do.  Not sure if
>> that's good or not, I'll have to think about it.  Thanks,
> 
> Yes. Because of the rename (or a link), we will set last_unlink_trans
> to the current transaction, and when logging the file that will cause
> logging of all its old parents (A). That was added several years ago
> to fix corruptions, and it turned out to be needed later as well to
> ensure we have a behaviour similar to xfs and ext4 (and others)
> regarding strictly ordered metadata updates (I added several tests to
> fstests over the years for all the cases).
> There's also the fact that on replay we will delete any inode refs
> that aren't in the log (that one was added in commit 1f250e929a9c
> ("Btrfs: fix log replay failure after unlink and link combination")).
> 
> For that example we also have A updated in the log by the rename. So
> we know the log is consistent.
> 
> So that's why the whole check_parent_dirs_for_sync() is not needed.
> 

Ok that's reasonable, thanks,

Josef
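
For anyone who wants to replay the sequence from the example quoted above,
here is a minimal sketch in Python. It only issues the same system calls in
the same order; it does not show whether a transaction commit or a log sync
happened, and it runs on any Linux filesystem, not just btrfs. The helper
names fsync_path() and run_sequence() are made up for this illustration, and
the os.sync() call is an assumption standing in for "A and B already existed
before the current transaction". Checking what survives a log replay would
additionally need a btrfs mount and a simulated power failure, which this
sketch does not attempt.

```python
import os
import tempfile


def fsync_path(path):
    """Open path read-only, fsync it, and close it (works for directories on Linux)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def run_sequence(root):
    """Replay the create / fsync / rename / fsync steps from the example under root."""
    dir_a = os.path.join(root, "A")
    dir_b = os.path.join(root, "B")
    os.makedirs(dir_a, exist_ok=True)
    os.makedirs(dir_b, exist_ok=True)
    # Assumption: flush everything so that A and B predate the changes below.
    os.sync()

    old_path = os.path.join(dir_a, "blah")
    new_path = os.path.join(dir_b, "blah")

    # echo "foo" > A/blah ; fsync(A/blah)
    with open(old_path, "w") as f:
        f.write("foo\n")
        f.flush()
        os.fsync(f.fileno())

    fsync_path(dir_a)              # fsync(A)
    os.rename(old_path, new_path)  # mv A/blah B
    fsync_path(new_path)           # fsync(B/blah), the "second fsync" discussed above
    return old_path, new_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        old_path, new_path = run_sequence(root)
        print("old path exists:", os.path.exists(old_path))
        print("new path exists:", os.path.exists(new_path))
```

To try it on btrfs, point run_sequence() at a directory on a btrfs mount
instead of a temporary directory.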

