linux-fsdevel.vger.kernel.org archive mirror
From: Vijay Chidambaram <vijay@cs.utexas.edu>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dave Chinner <david@fromorbit.com>,
	Amir Goldstein <amir73il@gmail.com>,
	lsf-pc@lists.linux-foundation.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Jan Kara <jack@suse.cz>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jayashree Mohan <jaya@cs.utexas.edu>,
	Filipe Manana <fdmanana@suse.com>, Chris Mason <clm@fb.com>,
	lwn@lwn.net
Subject: Re: [TOPIC] Extending the filesystem crash recovery guaranties contract
Date: Thu, 9 May 2019 00:02:17 -0500
Message-ID: <CAHWVdUVViC_EJm3K7MfvfSQ+G1u=SX=RXAZWPYjZuS16JWxNEw@mail.gmail.com>
In-Reply-To: <20190509022013.GC7031@mit.edu>

On Wed, May 8, 2019 at 9:30 PM Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Thu, May 09, 2019 at 11:43:27AM +1000, Dave Chinner wrote:
> >
> > .... the whole point of SOMC is that allows filesystems to avoid
> > dragging external metadata into fsync() operations /unless/ there's
> > a user visible ordering dependency that must be maintained between
> > objects.  If all you are doing is stabilising file data in a stable
> > file/directory, then independent, incremental journaling of the
> > fsync operations on that file fit the SOMC model just fine.
>
> Well, that's not what Vijay's crash consistency guarantees state.  It
> guarantees quite a bit more than what you've written above.  Which is
> my concern.

The intention is to capture Dave's SOMC (Strictly Ordered Metadata
Consistency) semantics. We can iterate and rephrase until we capture
precisely what Dave meant. I am fairly confident we can do this, given
that Dave himself is participating and helping us refine the text. So,
to me, this doesn't seem like a reason not to have documentation at
all.

As we have stated multiple times on this and other threads, the
intention is *not* to come up with one set of crash-recovery
guarantees that every Linux file system must abide by forever. Ted,
you keep attributing this goal to us, though we have never said this
was our intention.

The intention behind this effort is simply to document the
crash-recovery guarantees provided today by different Linux file
systems. Ted, you question why this is required at all, and why we
can't simply use POSIX and man pages. The answer:

1. POSIX is vague. POSIX even allows fsync() not to persist data to
stable media (though no Linux file system actually does this), so it
is not very useful for understanding what crash-recovery guarantees
file systems actually provide. Given that all Linux file systems
provide something more than POSIX, the natural question is: what
exactly do they provide? We learned this while working on CrashMonkey,
and we wanted to document it.
2. Other parts of the Linux kernel have much better documentation,
even though they similarly want to preserve developers' freedom to
optimize and change the internal implementation. I don't think
documentation and the freedom to change internals are mutually
exclusive.
3. XFS provides SOMC semantics, and btrfs developers have stated that
they want to provide SOMC as well. F2FS has a mode in which its
developers seek to provide SOMC semantics. Given all this, it seemed
prudent to document SOMC.
4. Apart from developers, a document like this would also help
academic researchers understand the current state of the art in
crash-recovery guarantees and the different choices made by different
file systems. This is non-trivial to understand without
documentation.

FWIW, I think the position that "if we don't write it down, application
developers can't depend on it" is wrong: developers come to depend on
observed behavior whether or not it is written down. Even with nothing
written down, developers noticed that they could skip fsync() on ext3
when atomically updating files with rename(). This led to the whole
ext4 rename-and-delayed-allocation problem. The much better path, IMO,
is to document the current set of guarantees given by different file
systems, and to state what is intended and what is not. This would
give application developers much better guidance in writing
applications.
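For readers less familiar with the rename() pattern referred to above,
here is a minimal C sketch (an illustration only, not part of the
proposed documentation) of the conservative way to replace a file
atomically without relying on any file-system-specific ordering
guarantee: write a temporary file, fsync() it, rename() it over the
target, then fsync() the parent directory. The helper name
atomic_replace_file() and the "<path>.tmp" naming scheme are
hypothetical, and the code assumes a POSIX system such as Linux where
fsync() on a directory file descriptor is supported.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: replace the contents of `path` with `data` so that,
 * after a crash, readers are intended to see either the old or the new
 * contents, never a partially written file. Returns 0 on success, -1 on error.
 */
int atomic_replace_file(const char *path, const char *data, size_t len)
{
    char tmp[4096], dir[4096];
    int fd, dfd;

    /* Hypothetical temporary name: "<path>.tmp" in the same directory. */
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* 1. Write the new contents to the temporary file. */
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* 2. fsync() the temporary file so its data reaches stable media
     *    before the rename makes it visible under the real name.
     *    Skipping this step is the shortcut behind the ext4
     *    delayed-allocation problem. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }

    /* 3. Atomically replace the old file with the new one. */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }

    /* 4. fsync() the parent directory: fsync() on the file alone does not
     *    necessarily make the directory entry durable (see fsync(2)).
     *    dirname() may modify its argument, so work on a copy. */
    snprintf(dir, sizeof dir, "%s", path);
    dfd = open(dirname(dir), O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    if (fsync(dfd) < 0) {
        close(dfd);
        return -1;
    }
    return close(dfd);
}
```

Whether any of these steps can safely be dropped on a particular file
system is exactly the kind of question that documented crash-recovery
guarantees would answer.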

If ext4 wants to develop incremental fsync and introduce a new set of
semantics that differs from SOMC and is much closer to minimal POSIX,
I don't think the documentation affects that at all. As Dave notes,
diversity is good! Documentation is also good :)

That being said, I think we will stop our push to get this documented
inside the Linux kernel at this point. We got useful comments from
Dave, Amir, and others, so we will incorporate them and put the
documentation up on a University of Texas web page. If someone else
wants to carry this on and get it merged, you are welcome to do so :)

Thread overview: 25+ messages
2019-04-27 21:00 [TOPIC] Extending the filesystem crash recovery guaranties contract Amir Goldstein
2019-05-02 16:12 ` Amir Goldstein
2019-05-02 17:11   ` Vijay Chidambaram
2019-05-02 17:39     ` Amir Goldstein
2019-05-03  2:30       ` Theodore Ts'o
2019-05-03  3:15         ` Vijay Chidambaram
2019-05-03  9:45           ` Theodore Ts'o
2019-05-04  0:17             ` Vijay Chidambaram
2019-05-04  1:43               ` Theodore Ts'o
2019-05-07 18:38                 ` Jan Kara
2019-05-03  4:16         ` Amir Goldstein
2019-05-03  9:58           ` Theodore Ts'o
2019-05-03 14:18             ` Amir Goldstein
2019-05-09  2:36             ` Dave Chinner
2019-05-09  1:43         ` Dave Chinner
2019-05-09  2:20           ` Theodore Ts'o
2019-05-09  2:58             ` Dave Chinner
2019-05-09  3:31               ` Theodore Ts'o
2019-05-09  5:19                 ` Darrick J. Wong
2019-05-09  5:02             ` Vijay Chidambaram [this message]
2019-05-09  5:37               ` Darrick J. Wong
2019-05-09 15:46               ` Theodore Ts'o
2019-05-09  8:47           ` Amir Goldstein
2019-05-02 21:05   ` Darrick J. Wong
2019-05-02 22:19     ` Amir Goldstein