linux-fsdevel.vger.kernel.org archive mirror
* [TOPIC] Extending the filesystem crash recovery guaranties contract
@ 2019-04-27 21:00 Amir Goldstein
  2019-05-02 16:12 ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2019-04-27 21:00 UTC (permalink / raw)
  To: lsf-pc
  Cc: Dave Chinner, Theodore Tso, Jan Kara, linux-fsdevel,
	Jayashree Mohan, Vijaychidambaram Velayudhan Pillai,
	Filipe Manana

Suggestion for another filesystems track topic.

Some of you may remember the emotional(?) discussions that ensued
when the CrashMonkey developers embarked on a mission to document
and verify filesystem crash recovery guarantees:

https://lore.kernel.org/linux-fsdevel/CAOQ4uxj8YpYPPdEvAvKPKXO7wdBg6T1O3osd6fSPFKH9j=i2Yg@mail.gmail.com/

There are two camps among filesystem developers, and each camp has
good arguments: one for wanting to document existing behavior, the
other for not wanting to document anything beyond "use fsync if you
want any guarantee".

I would like to take a suggestion proposed by Jan in a related discussion:
https://lore.kernel.org/linux-fsdevel/CAOQ4uxjQx+TO3Dt7TA3ocXnNxbr3+oVyJLYUSpv4QCt_Texdvw@mail.gmail.com/

and make a proposal that may be able to meet the concerns of
both camps.

The proposal is to add new APIs which communicate
crash consistency requirements of the application to the filesystem.

An example API could look like this:
renameat2(..., RENAME_METADATA_BARRIER | RENAME_DATA_BARRIER)
It's just an example. The API could take another form and may need
more barrier types (I proposed to use new sync_file_range() flags).
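
As a rough illustration of how an application might use such flags, here is a minimal sketch. The two RENAME_*_BARRIER flags are hypothetical: no kernel defines them, and the bit values below are placeholders chosen only to stay clear of the existing RENAME_* bits. The helper names rename_with_barrier() and replace_file() are also made up for this example. Because current kernels would reject the flags, the sketch emulates them conservatively with fsync()/fdatasync() before a plain rename(), which is stronger (and slower) than the ordering-only barrier the proposal describes.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* HYPOTHETICAL flags from the proposal; not defined by any kernel.
 * The values are placeholders that avoid the real RENAME_* bits. */
#define RENAME_METADATA_BARRIER (1U << 30)
#define RENAME_DATA_BARRIER     (1U << 31)

/*
 * Emulate the proposed barriers on a current kernel.
 * fd must refer to the file at oldpath.
 * This over-delivers: fsync()/fdatasync() force durability,
 * while the proposed barriers would only order the rename
 * after the file's data/metadata.
 */
static int rename_with_barrier(int fd, const char *oldpath,
                               const char *newpath, unsigned int flags)
{
    if (flags & RENAME_METADATA_BARRIER) {
        if (fsync(fd) != 0)          /* flushes data and inode metadata */
            return -1;
    } else if (flags & RENAME_DATA_BARRIER) {
        if (fdatasync(fd) != 0)      /* flushes data (and metadata needed to read it) */
            return -1;
    }
    return rename(oldpath, newpath);
}

/* Replace `path` with new contents via a temporary file and a barrier rename. */
static int replace_file(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int ret = -1;
    if (write(fd, buf, len) == (ssize_t)len)
        ret = rename_with_barrier(fd, tmp, path,
                                  RENAME_METADATA_BARRIER | RENAME_DATA_BARRIER);
    close(fd);
    if (ret != 0)
        unlink(tmp);                 /* do not leave the temporary file behind */
    return ret;
}
```

With a native implementation, the fsync() calls would go away and the flags would be passed straight to renameat2(); the application code around the rename would stay the same.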

The idea is simple though.
METADATA_BARRIER means that if the rename is observed after a crash,
all of the inode's metadata will be observed as well.
DATA_BARRIER means the same for file data.
We may also want an "ALL_METADATA_BARRIER" and/or a
"METADATA_DEPENDENCY_BARRIER" to more accurately describe what
SOMC (strictly ordered metadata consistency) guarantees actually
provide today.

The implementation is also simple. Filesystems that currently
have SOMC behavior don't need to do anything to respect
METADATA_BARRIER and only need to call
filemap_write_and_wait_range() to respect DATA_BARRIER.
Filesystem developers are thus not tying their hands w.r.t. future
performance optimizations for operations that do not explicitly
request a barrier.
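
For readers more familiar with userspace, the closest existing analogue of "write out the file's dirty data and wait for it" is probably sync_file_range() with all three flags set. The sketch below is an assumption-laden illustration, not the kernel-side implementation: data_barrier() is a hypothetical helper name, and the fdatasync() fallback is an addition for systems where sync_file_range() is unavailable or fails. Note that sync_file_range() does not flush file metadata or the disk write cache, so on its own it is not a durability guarantee.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* for sync_file_range() */
#endif
#include <fcntl.h>
#include <unistd.h>

/*
 * Write out all dirty data pages of fd and wait for the writeback
 * to finish, without forcing a metadata flush.
 * Falls back to fdatasync() if sync_file_range() fails.
 */
static int data_barrier(int fd)
{
    const unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |
                               SYNC_FILE_RANGE_WRITE |
                               SYNC_FILE_RANGE_WAIT_AFTER;

    /* offset 0 with nbytes 0 selects everything up to end of file */
    if (sync_file_range(fd, 0, 0, flags) == 0)
        return 0;

    return fdatasync(fd);
}
```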

Thanks,
Amir.


end of thread, other threads:[~2019-05-09 15:47 UTC | newest]

Thread overview: 25+ messages
2019-04-27 21:00 [TOPIC] Extending the filesystem crash recovery guaranties contract Amir Goldstein
2019-05-02 16:12 ` Amir Goldstein
2019-05-02 17:11   ` Vijay Chidambaram
2019-05-02 17:39     ` Amir Goldstein
2019-05-03  2:30       ` Theodore Ts'o
2019-05-03  3:15         ` Vijay Chidambaram
2019-05-03  9:45           ` Theodore Ts'o
2019-05-04  0:17             ` Vijay Chidambaram
2019-05-04  1:43               ` Theodore Ts'o
2019-05-07 18:38                 ` Jan Kara
2019-05-03  4:16         ` Amir Goldstein
2019-05-03  9:58           ` Theodore Ts'o
2019-05-03 14:18             ` Amir Goldstein
2019-05-09  2:36             ` Dave Chinner
2019-05-09  1:43         ` Dave Chinner
2019-05-09  2:20           ` Theodore Ts'o
2019-05-09  2:58             ` Dave Chinner
2019-05-09  3:31               ` Theodore Ts'o
2019-05-09  5:19                 ` Darrick J. Wong
2019-05-09  5:02             ` Vijay Chidambaram
2019-05-09  5:37               ` Darrick J. Wong
2019-05-09 15:46               ` Theodore Ts'o
2019-05-09  8:47           ` Amir Goldstein
2019-05-02 21:05   ` Darrick J. Wong
2019-05-02 22:19     ` Amir Goldstein
