From: Amir Goldstein <amir73il@gmail.com>
To: Theodore Tso <tytso@mit.edu>, Jan Kara <jack@suse.cz>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>,
Dave Chinner <david@fromorbit.com>, Chris Mason <clm@fb.com>,
Al Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org,
linux-api@vger.kernel.org
Subject: [RFC][PATCH] link.2: AT_ATOMIC_DATA and AT_ATOMIC_METADATA
Date: Mon, 27 May 2019 20:26:55 +0300 [thread overview]
Message-ID: <20190527172655.9287-1-amir73il@gmail.com> (raw)
New link flags to request "atomic" link.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
Hi Guys,
Following our discussions on LSF/MM and beyond [1][2], here is
an RFC documentation patch.
Ted, I know we discussed limiting the API for linking an O_TMPFILE
to avert the hardlinks issue, but I decided it would be better to
document the hardlinks non-guaranty instead. This will allow me to
replicate the same semantics and documentation to renameat(2).
Let me know how that works out for you.
I also decided to try out two separate flags for data and metadata.
I do not find any of those flags very useful without the other, but
documenting them seprately was easier, because of the fsync/fdatasync
reference. In the end, we are trying to solve a social engineering
problem, so this is the least confusing way I could think of to describe
the new API.
First implementation of AT_ATOMIC_METADATA is expected to be
noop for xfs/ext4 and probably fsync for btrfs.
First implementation of AT_ATOMIC_DATA is expected to be
filemap_write_and_wait() for xfs/ext4 and probably fdatasync for btrfs.
Thoughts?
Amir.
[1] https://lwn.net/Articles/789038/
[2] https://lore.kernel.org/linux-fsdevel/CAOQ4uxjZm6E2TmCv8JOyQr7f-2VB0uFRy7XEp8HBHQmMdQg+6w@mail.gmail.com/
man2/link.2 | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/man2/link.2 b/man2/link.2
index 649ba00c7..15c24703e 100644
--- a/man2/link.2
+++ b/man2/link.2
@@ -184,6 +184,57 @@ See
.BR openat (2)
for an explanation of the need for
.BR linkat ().
+.TP
+.BR AT_ATOMIC_METADATA " (since Linux 5.x)"
+By default, a link operation followed by a system crash, may result in the
+new file name being linked with old inode metadata, such as out dated time
+stamps or missing extended attributes.
+One way to prevent this is to call
+.BR fsync (2)
+before linking the inode, but that involves flushing of volatile disk caches.
+
+A filesystem that accepts this flag will guaranty, that old inode metadata
+will not be exposed in the new linked name.
+Some filesystems may internally perform
+.BR fsync (2)
+before linking the inode to provide this guaranty,
+but often, filesystems will have a more efficient method to provide this
+guaranty without flushing volatile disk caches.
+
+A filesystem that accepts this flag does
+.BR NOT
+guaranty that the new file name will exist after a system crash, nor that the
+current inode metadata is persisted to disk.
+Specifically, if a file has hardlinks, the existance of the linked name after
+a system crash does
+.BR NOT
+guaranty that any of the other file names exist, nor that the last observed
+value of
+.I st_nlink
+(see
+.BR stat (2))
+has persisted.
+.TP
+.BR AT_ATOMIC_DATA " (since Linux 5.x)"
+By default, a link operation followed by a system crash, may result in the
+new file name being linked with old data or missing data.
+One way to prevent this is to call
+.BR fdatasync (2)
+before linking the inode, but that involves flushing of volatile disk caches.
+
+A filesystem that accepts this flag will guaranty, that old data
+will not be exposed in the new linked name.
+Some filesystems may internally perform
+.BR fsync (2)
+before linking the inode to provide this guaranty,
+but often, filesystems will have a more efficient method to provide this
+guaranty without flushing volatile disk caches.
+
+A filesystem that accepts this flag does
+.BR NOT
+guaranty that the new file name will exist after a system crash, nor that the
+current inode data is persisted to disk.
+.TP
.SH RETURN VALUE
On success, zero is returned.
On error, \-1 is returned, and
--
2.17.1
next reply other threads:[~2019-05-27 17:27 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-27 17:26 Amir Goldstein [this message]
2019-05-28 20:06 ` [RFC][PATCH] link.2: AT_ATOMIC_DATA and AT_ATOMIC_METADATA Darrick J. Wong
2019-05-29 5:58 ` Amir Goldstein
2019-05-28 20:26 ` Theodore Ts'o
2019-05-29 5:38 ` Amir Goldstein
2019-05-31 15:21 ` Amir Goldstein
2019-05-31 16:41 ` Theodore Ts'o
2019-05-31 17:22 ` Amir Goldstein
2019-05-31 19:21 ` Theodore Ts'o
2019-05-31 22:45 ` Dave Chinner
2019-05-31 23:28 ` Dave Chinner
2019-06-01 8:01 ` Amir Goldstein
2019-06-03 4:25 ` Dave Chinner
2019-06-03 6:17 ` Amir Goldstein
2019-06-01 7:21 ` Amir Goldstein
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190527172655.9287-1-amir73il@gmail.com \
--to=amir73il@gmail.com \
--cc=clm@fb.com \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=jack@suse.cz \
--cc=linux-api@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).