From: Jan Kara <jack@suse.cz>
To: Jeff Layton <jlayton@kernel.org>
Cc: Latchesar Ionkov <lucho@ionkov.net>,
	Martin Brandenburg <martin@omnibond.com>,
	Konstantin Komarov <almaz.alexandrovich@paragon-software.com>,
	Jan Kara <jack@suse.cz>,
	linux-xfs@vger.kernel.org, "Darrick J. Wong" <djwong@kernel.org>,
	Dominique Martinet <asmadeus@codewreck.org>,
	Christian Schoenebeck <linux_oss@crudebyte.com>,
	linux-unionfs@vger.kernel.org,
	David Howells <dhowells@redhat.com>, Chris Mason <clm@fb.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Hans de Goede <hdegoede@redhat.com>,
	Marc Dionne <marc.dionne@auristor.com>,
	codalist@coda.cs.cmu.edu, linux-afs@lists.infradead.org,
	linux-mtd@lists.infradead.org,
	Mike Marshall <hubcap@omnibond.com>,
	Paulo Alcantara <pc@manguebit.com>, Amir Goldstein <l@gmail.com>,
	Eric Van Hensbergen <ericvh@kernel.org>,
	bug-gnulib@gnu.org, Miklos Szeredi <miklos@szeredi.hu>,
	Richard Weinberger <richard@nod.at>,
	Mark Fasheh <mark@fasheh.com>, Hugh Dickins <hughd@google.com>,
	Tyler Hicks <code@tyhicks.com>,
	cluster-devel@redhat.com, coda@cs.cmu.edu, linux-mm@kvack.org,
	Gao Xiang <xiang@kernel.org>, Iurii Zaikin <yzaikin@google.com>,
	Namjae Jeon <linkinjeon@kernel.org>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	Xi Ruoyao <xry111@linuxfromscratch.org>,
	Shyam Prasad N <sprasad@microsoft.com>,
	ecryptfs@vger.kernel.org, Kees Cook <keescook@chromium.org>,
	ocfs2-devel@lists.linux.dev, linux-cifs@vger.kernel.org,
	Chao Yu <chao@kernel.org>,
	linux-erofs@lists.ozlabs.org, Josef Bacik <josef@toxicpanda.com>,
	Tom Talpey <tom@talpey.com>, Tejun Heo <tj@kernel.org>,
	Yue Hu <huyue2@coolpad.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Ronnie Sahlberg <ronniesahlberg@gmail.com>,
	David Sterba <dsterba@suse.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
	ceph-devel@vger.kernel.org, Xiubo Li <xiubli@redhat.com>,
	Ilya Dryomov <idryomov@gmail.com>,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
	Jan Harkes <jaharkes@cs.cmu.edu>,
	Christian Brauner <brauner@kernel.org>,
	linux-ext4@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Joseph Qi <joseph.qi@linux.alibaba.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	v9fs@lists.linux.dev, ntfs3@lists.linux.dev,
	samba-technical@lists.samba.org, linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	Steve French <sfrench@samba.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Jeffle Xu <jefflexu@linux.alibaba.com>,
	devel@lists.orangefs.org, Anna Schumaker <anna@kernel.org>,
	Jan Kara <jack@suse.com>,
	linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Sungjong Seo <sj1557.seo@samsung.com>,
	Bruno Haible <bruno@clisp.org>,
	linux-nfs@vger.kernel.org, linux-btrfs@vger.kernel.org,
	Joel Becker <jlbec@evilplan.org>
Subject: Re: [Cluster-devel] [PATCH v7 12/13] ext4: switch to multigrain timestamps
Date: Wed, 20 Sep 2023 17:45:27 +0200
Message-ID: <20230920154527.pkwot4nu2nzrnamd@quack3>
In-Reply-To: <ca82af4d6a72d7f83223c0ddd74fd9f7bcfa96b1.camel@kernel.org>

On Wed 20-09-23 10:12:03, Jeff Layton wrote:
> On Wed, 2023-09-20 at 14:48 +0200, Jan Kara wrote:
> > On Wed 20-09-23 06:35:18, Jeff Layton wrote:
> > > On Wed, 2023-09-20 at 12:17 +0200, Jan Kara wrote:
> > > > If I were a sysadmin, I'd rather opt for something like
> > > > finegrained timestamps + lazytime (if I needed the finegrained timestamps
> > > > functionality). That should avoid the IO overhead of finegrained timestamps
> > > > as well and I'd know I can have problems with timestamps only after a
> > > > system crash.
> > > 
> > > > I've just got another idea how we could solve the problem: Couldn't we
> > > > always just report coarsegrained timestamp to userspace and provide access
> > > > to finegrained value only to NFS which should know what it's doing?
> > > > 
> > > 
> > > I think that'd be hard. First of all, where would we store the second
> > > timestamp? We can't just truncate the fine-grained ones to come up with
> > > a coarse-grained one. It might also be confusing having nfsd and local
> > > filesystems present different attributes.
> > 
> > > So what I had in mind (and I definitely miss all the NFS intricacies so the
> > > idea may be bogus) was that inode->i_ctime would be maintained exactly as
> > > it is now. There will be a new (kernel internal at least for now) STATX flag
> > > STATX_MULTIGRAIN_TS. fill_mg_cmtime() will return the timestamp truncated to
> > > sb->s_time_gran unless STATX_MULTIGRAIN_TS is set. Hence unless you set
> > > STATX_MULTIGRAIN_TS, there is no difference in the returned timestamps
> > > compared to the state before multigrain timestamps were introduced. With
> > > STATX_MULTIGRAIN_TS we return the full precision timestamp as stored in the
> > > inode. Then NFS in fh_fill_pre_attrs() and fh_fill_post_attrs() needs to
> > > make sure STATX_MULTIGRAIN_TS is set when calling vfs_getattr() to get
> > > multigrain time.
> 
> > > I agree nfsd may now be presenting slightly different timestamps than the
> > > user is able to see with stat(2) directly on the filesystem. But is that a
> > > problem? Essentially it is a solution similar to the mgtime mount option,
> > > but now the sysadmin doesn't have to decide at filesystem mount time how to
> > > report timestamps; instead the stat caller knowingly opts into possibly
> > > inconsistent (among files) but high precision timestamps. And in the
> > > particular NFS usecase where stat is called all the time anyway, timestamps
> > > will likely even be consistent among files.
> > 
> 
> I like this idea...
> 
> Would we also need to raise sb->s_time_gran to something corresponding
> to HZ on these filesystems?

I was actually a bit confused about how timestamp_truncate() works. The
jiffy granularity is just a direct consequence of current_time() using
ktime_get_coarse_real_ts64(), not of timestamp_truncate().
sb->s_time_gran seems to be more about the on-disk format, so it doesn't
seem like a great idea to touch it. So we can probably just truncate
timestamps in generic_fillattr() to HZ granularity unconditionally.

> If we truncate the timestamps at a granularity corresponding to HZ before
> presenting them via statx and the like then that should work around the
> problem with programs that compare timestamps between inodes.

Exactly.
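
To make the truncation idea concrete, here is a minimal userspace sketch
(not kernel code, and not taken from the patch series). It rounds a
timestamp down to a tick boundary and compares two timestamps. The
sketch_* names are hypothetical, and the tick rate is a parameter the
caller supplies; nothing here claims to match what generic_fillattr()
would actually do, nor that tick boundaries computed this way line up
with the values ktime_get_coarse_real_ts64() returns.

```c
#include <assert.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/*
 * Round ts down to a multiple of one tick (1/hz seconds) within the
 * current second. hz stands in for the kernel's HZ and is an assumption
 * supplied by the caller.
 */
static struct timespec sketch_truncate_to_tick(struct timespec ts, long hz)
{
	long tick_ns;

	assert(hz > 0 && hz <= SKETCH_NSEC_PER_SEC);
	tick_ns = SKETCH_NSEC_PER_SEC / hz;
	ts.tv_nsec -= ts.tv_nsec % tick_ns;
	return ts;
}

/* Returns <0, 0 or >0 if a is earlier than, equal to or later than b. */
static int sketch_timespec_cmp(struct timespec a, struct timespec b)
{
	if (a.tv_sec != b.tv_sec)
		return a.tv_sec < b.tv_sec ? -1 : 1;
	if (a.tv_nsec != b.tv_nsec)
		return a.tv_nsec < b.tv_nsec ? -1 : 1;
	return 0;
}
```

With an assumed tick rate of 250, a fine-grained stamp of 100.003999999
on one inode and a coarse stamp of 100.000000000 taken later in the same
tick on another inode compare in the wrong order as stored, but compare
as equal once both are passed through sketch_truncate_to_tick().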

> With NFSv4, when a filesystem doesn't report a STATX_CHANGE_COOKIE, nfsd
> will fake one up using the ctime. It's fine for that to use a full fine-
> grained timestamp since we don't expect to be able to compare that value
> with one of a different inode.

Yes.

> I think we'd want nfsd to present the mtime/ctime values as truncated,
> just like we would with a local fs. We could hit the same problem of an
> earlier-looking timestamp with NFS if we try to present the actual fine-
> grained values to the clients. IOW, I'm convinced that we need to avoid
> this behavior in most situations.

I wasn't sure if there's a way to do this within NFS, i.e., whether the value
communicated via the NFSv3 protocol (I know v4 has a special change cookie
field for it) that gets used for detecting the need to revalidate file
contents isn't the one presented to the client's userspace as ctime. If
there's a way to do this then great, I'm all for presenting truncated
timestamps even for NFS.

> If we do this, then we technically don't need the mount option either.

Yes, that was my hope.

> We could still add it though, and have it govern whether fill_mg_cmtime
> truncates the timestamps before storing them in the kstat.

Well, if we decide these timestamps are useful for userspace as well, I'd
rather make that a userspace-visible STATX flag than a mount option. That
way, applications aware of the pitfalls can get high precision timestamps
without possibly breaking unaware applications.
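
As a rough illustration of the opt-in behaviour discussed in this thread,
the following userspace sketch returns the stored timestamp unchanged
only when the caller sets an opt-in bit, and a truncated one otherwise.
SKETCH_STATX_FINE_TS and sketch_report_time() are hypothetical names with
an arbitrary bit value; they are not part of statx(2) or of the patch
series, and the tick length is simply passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical opt-in bit; the value is arbitrary and not a real statx mask. */
#define SKETCH_STATX_FINE_TS (1U << 30)

/*
 * Return the timestamp a stat-like call would report: full precision if
 * the caller opted in, otherwise rounded down to a multiple of tick_ns.
 */
static struct timespec sketch_report_time(struct timespec stored,
					  uint32_t request_mask, long tick_ns)
{
	assert(tick_ns > 0);
	if (!(request_mask & SKETCH_STATX_FINE_TS))
		stored.tv_nsec -= stored.tv_nsec % tick_ns;
	return stored;
}
```

Callers that never set the bit see the same coarse values for every
inode, which is the property that keeps unaware applications working.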

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

