linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Vijay Chidambaram <vijay@cs.utexas.edu>,
	lsf-pc@lists.linux-foundation.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Jan Kara <jack@suse.cz>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jayashree Mohan <jaya@cs.utexas.edu>,
	Filipe Manana <fdmanana@suse.com>, Chris Mason <clm@fb.com>,
	lwn@lwn.net
Subject: Re: [TOPIC] Extending the filesystem crash recovery guaranties contract
Date: Thu, 9 May 2019 11:43:27 +1000	[thread overview]
Message-ID: <20190509014327.GT1454@dread.disaster.area> (raw)
In-Reply-To: <20190503023043.GB23724@mit.edu>

On Thu, May 02, 2019 at 10:30:43PM -0400, Theodore Ts'o wrote:
> On Thu, May 02, 2019 at 01:39:47PM -0400, Amir Goldstein wrote:
> > I am not saying there is no room for a document that elaborates on those
> > guaranties. I personally think that could be useful and certainly think that
> > your group's work for adding xfstest coverage for API guaranties is useful.
> 
> Again, here is my concern.  If we promise that ext4 will always obey
> Dave Chinner's SOMC model, it would forever rule out Daejun Park and
> Dongkun Shin's "iJournaling: Fine-grained journaling for improving the
> latency of fsync system call"[1] published in Usenix ATC 2017.

No, it doesn't rule that out at all.

In a SOMC model, incremental journalling is just fine when there are
no external dependencies on the thing being fsync'd.  If you have
other dependencies (e.g. the file has just be created and so the dir
it dirty, too) then fsync would need to do the whole shebang, but
otherwise....

> So if the crash consistency guarantees forbids future innovations
> where applications might *want* a fast fsync() that doesn't drag
> unrelated inodes into the persistence guarantees,

.... the whole point of SOMC is that allows filesystems to avoid
dragging external metadata into fsync() operations /unless/ there's
a user visible ordering dependency that must be maintained between
objects.  If all you are doing is stabilising file data in a stable
file/directory, then independent, incremental journaling of the
fsync operations on that file fit the SOMC model just fine.

> is that really what
> we want?  Do we want to forever rule out various academic
> investigations such as Park and Shin's because "it violates the crash
> consistency recovery model"?  Especially if some applications don't
> *need* the crash consistency model?

Stop with the silly inflammatory hyperbole already, Ted. It is not
necessary.

> P.P.S.  One of the other discussions that did happen during the main
> LSF/MM File system session, and for which there was general agreement
> across a number of major file system maintainers, was a fsync2()
> system call which would take a list of file descriptors (and flags)
> that should be fsync'ed.

Hmmmm, that wasn't on the agenda, and nobody has documented it as
yet.

> The semantics would be that when the
> fsync2() successfully returns, all of the guarantees of fsync() or
> fdatasync() requested by the list of file descriptors and flags would
> be satisfied.  This would allow file systems to more optimally fsync a
> batch of files, for example by implementing data integrity writebacks
> for all of the files, followed by a single journal commit to guarantee
> persistence for all of the metadata changes.

What happens when you get writeback errors on only some of the fds?
How do you report the failures and what do you do with the journal
commit on partial success?

Of course, this ignores the elephant in the room: applications can
/already do this/ using AIO_FSYNC and have individual error status
for each fd. Not to mention that filesystems already batch
concurrent fsync journal commits into a single operation. I'm not
seeing the point of a new syscall to do this right now....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2019-05-09  1:43 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-27 21:00 [TOPIC] Extending the filesystem crash recovery guaranties contract Amir Goldstein
2019-05-02 16:12 ` Amir Goldstein
2019-05-02 17:11   ` Vijay Chidambaram
2019-05-02 17:39     ` Amir Goldstein
2019-05-03  2:30       ` Theodore Ts'o
2019-05-03  3:15         ` Vijay Chidambaram
2019-05-03  9:45           ` Theodore Ts'o
2019-05-04  0:17             ` Vijay Chidambaram
2019-05-04  1:43               ` Theodore Ts'o
2019-05-07 18:38                 ` Jan Kara
2019-05-03  4:16         ` Amir Goldstein
2019-05-03  9:58           ` Theodore Ts'o
2019-05-03 14:18             ` Amir Goldstein
2019-05-09  2:36             ` Dave Chinner
2019-05-09  1:43         ` Dave Chinner [this message]
2019-05-09  2:20           ` Theodore Ts'o
2019-05-09  2:58             ` Dave Chinner
2019-05-09  3:31               ` Theodore Ts'o
2019-05-09  5:19                 ` Darrick J. Wong
2019-05-09  5:02             ` Vijay Chidambaram
2019-05-09  5:37               ` Darrick J. Wong
2019-05-09 15:46               ` Theodore Ts'o
2019-05-09  8:47           ` Amir Goldstein
2019-05-02 21:05   ` Darrick J. Wong
2019-05-02 22:19     ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190509014327.GT1454@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=amir73il@gmail.com \
    --cc=clm@fb.com \
    --cc=darrick.wong@oracle.com \
    --cc=fdmanana@suse.com \
    --cc=jack@suse.cz \
    --cc=jaya@cs.utexas.edu \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=lwn@lwn.net \
    --cc=tytso@mit.edu \
    --cc=vijay@cs.utexas.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).