linux-fsdevel.vger.kernel.org archive mirror
From: Sage Weil <sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org>
To: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
Cc: Trond Myklebust
	<trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>,
	Zach Brown <zab-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Alexander Viro
	<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
	Linux FS-devel Mailing List
	<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Linux Kernel Mailing List
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Linux API Mailing List
	<linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH RFC] vfs: add a O_NOMTIME flag
Date: Fri, 8 May 2015 15:24:11 -0700 (PDT)
Message-ID: <alpine.DEB.2.00.1505081517470.28239@cobra.newdream.net>
In-Reply-To: <20150508221325.GM4327@dastard>

On Sat, 9 May 2015, Dave Chinner wrote:
> On Thu, May 07, 2015 at 09:23:24PM -0400, Trond Myklebust wrote:
> > On Thu, May 7, 2015 at 9:01 PM, Sage Weil <sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org> wrote:
> > > On Thu, 7 May 2015, Zach Brown wrote:
> > >> On Thu, May 07, 2015 at 10:26:17AM +1000, Dave Chinner wrote:
> > >> > On Wed, May 06, 2015 at 03:00:12PM -0700, Zach Brown wrote:
> > >> > > The criteria for using O_NOMTIME is the same as for using O_NOATIME:
> > >> > > owning the file or having the CAP_FOWNER capability.  If we're not
> > >> > > comfortable allowing owners to prevent mtime/ctime updates then we
> > >> > > should add a tunable to allow O_NOMTIME.  Maybe a mount option?
> > >> >
> > >> > I dislike "turn off safety for performance" options because Joe
> > >> > SpeedRacer will always select performance over safety.
> > >>
> > >> Well, for ceph there's no safety concern.  They never use cmtime in
> > >> these files.
> > >>
> > >> So are you suggesting not implementing this and making them rework their
> > >> IO paths to avoid the fs maintaining mtime so that we don't give Joe
> > >> Speedracer more rope?  Or are we talking about adding some speed bumps
> > >> that ceph can flip on that might give Joe Speedracer pause?
> > >
> > > I think this is the fundamental question: who do we give the ammunition
> > > to, the user or app writer, or the sysadmin?
> > >
> > > One might argue that we gave the user a similar power with O_NOATIME (the
> > > power to break applications that assume atime is accurate).  Here we give
> > > developers/users the power to not update mtime and suffer the consequences
> > > (like, obviously, breaking mtime-based backups).  It should be pretty
> > > obvious to anyone using the flag what the consequences are.
> > >
> > > Note that we can suffer similar lapses in mtime with fdatasync followed by
> > > a system crash.  And as Andy points out it's semi-broken for writable
> > > mmap.  The crash case is obviously a slightly different thing, but the
> > > idea that mtime can't always be trusted certainly isn't crazy talk.
> > >
> > > Or, we can be conservative and require a mount option so that the admin
> > > has to explicitly allow behavior that might break some existing
> > > assumptions about mtime/ctime ('-o user_noatime' I guess?).
> > >
> > > I'm happy either way, so long as in the end an unprivileged ceph daemon
> > > avoids the useless work.  In our case we always own the entire mount/disk,
> > > so a mount option is just fine.
> > >
> > 
> > So, what is the expectation here for filesystems that cannot support
> > this flag? NFSv3 in particular would break pretty catastrophically if
> > someone decided on a whim to turn off mtime: they will have turned off
> > the client's ability to detect cache incoherencies.
> 
> It's worse than that, now that I think about it. I think nomtime
> will break nfsv4 as the I_VERSION check is done *after* the
> NO[C]MTIME checks. e.g. the atomic change count used to detect file
> changes is only updated during the mtime update on write() calls in
> XFS. i.e. when the timestamp is changed, a transaction to change
> mtime is run, and that transaction commit bumps the change count.
> 
> So cutting out mtime updates at the VFS will prevent XFS and other
> I_VERSION aware filesystems from updating the change count that
> NFSv4 clients rely on to detect foreign data changes in a file.
> 
> Not sure what to do here, because the current NOCMTIME
> implementation intentionally cuts out the timestamp update because
> its usage is fully invisible IO. i.e. it is used by utilities like
> xfs_fsr and HSMs to move data into and out of files without the
> application being able to detect the data movement in any way. These
> are not data modification operations, though - the file contents as
> read by the application do not change despite the fact we are moving
> data in and out of the file. In this case we don't want timestamps
> or change counters to change on the data movement, so I think we've
> actually got a difference in behaviour here between O_NOMTIME and
> O_NOCMTIME, right?
> 
> i.e. for nfsv4 sanity O_NOMTIME still needs to bump I_VERSION on
> write, just not modify the timestamp? In which case, not modifying
> the timestamps gains us nothing, because the inode is still dirtied?

Right: if we dirty the inode we've defeated the purpose of the patch.

> The list of caveats on O_NOMTIME seems to be growing...

...and remain consistent with our goals.  We couldn't care less if NFS or 
backup software or anything else doesn't notice these changes.  This is 
private data that is wholly managed by the ceph daemon.  The goal is to 
derive *some* value from the file system and avoid reimplementing it in 
userspace (without the bits we don't need).

I'm sure you realize what we're trying to achieve is the same "invisible IO" 
that the XFS open by handle ioctls do by default.  Would you be more 
comfortable if this option were only available to the generic 
open_by_handle syscall, and not to open(2)?
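
For comparison, this is roughly how an application copes with the
existing O_NOATIME flag today, whose owner-or-CAP_FOWNER rule the
proposed flag borrows. The helper name open_quiet() is hypothetical, and
the sketch deliberately uses O_NOATIME rather than O_NOMTIME, since the
latter is only an RFC and its final interface is not settled. It assumes
Linux with glibc and does not handle O_CREAT (no mode argument is passed).

```c
#define _GNU_SOURCE   /* exposes O_NOATIME in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * open_quiet() is a hypothetical helper. It first tries O_NOATIME and
 * retries without it when the kernel answers EPERM, which is what
 * open(2) returns when the caller neither owns the file nor is
 * privileged. *got_noatime (optional) reports whether the flag stuck.
 */
static int open_quiet(const char *path, int flags, int *got_noatime)
{
    int fd = -1;

    assert(path != NULL);
    if (got_noatime)
        *got_noatime = 0;
#ifdef O_NOATIME
    fd = open(path, flags | O_NOATIME);
    if (fd >= 0) {
        if (got_noatime)
            *got_noatime = 1;
        return fd;
    }
    if (errno != EPERM)
        return -1;          /* a real error, not a permission refusal */
#endif
    /* Not allowed (or not available): fall back to a normal open. */
    return open(path, flags);
}
```

A caller would use it as open_quiet(path, O_RDONLY, &used) and carry on
either way. A daemon in our position would rather fail loudly than fall
back silently, but the permission check is the same.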

sage
