linux-kernel.vger.kernel.org archive mirror
From: Arnd Bergmann <arnd@arndb.de>
To: Dave Chinner <david@fromorbit.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	joseph@codesourcery.com, john.stultz@linaro.org,
	hch@infradead.org, tglx@linutronix.de, geert@linux-m68k.org,
	lftan@altera.com, linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [RFC 11/32] xfs: convert to struct inode_time
Date: Tue, 03 Jun 2014 09:33:36 +0200
Message-ID: <5082342.alZgfaU1Q0@wuerfel>
In-Reply-To: <20140603003227.GP6677@dastard>

On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote:
> On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> > On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > > > In my list at http://kernelnewbies.org/y2038, I found that almost
> > > > > > all file systems support at least times until 2106, because they treat
> > > > > the on-disk value as unsigned on 64-bit systems, or they use
> > > > > a completely different representation. My guess is that somebody
> > > > > earlier spent a lot of work on making that happen.
> > > > > 
> > > > > The exceptions are:
> > > > > 
> > > > > * exofs uses signed values, which can probably be changed to be
> > > > >   consistent with the others.
> > > > > * isofs has a bug that limits it until 2027 on architectures with
> > > > >   a signed 'char' type (otherwise it's 2155).
> > > > > * udf can represent times for many thousands of years through a
> > > > >   16-bit year representation, but the code to convert to epoch
> > > > >   uses a const array that ends at 2038.
> > > > > * afs uses signed seconds and can probably be fixed
> > > > > * coda relies on user space time representation getting passed
> > > > >   through an ioctl.
> > > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > > > >   where they really use signed.
> > > > > 
> > > > > > I was confused about XFS since I didn't notice that there are
> > > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > > > > XFS to also use the 1970-2106 time range on 64-bit systems today.
> > > > 
> > > > You've missed an awful lot more than just the implications for the
> > > > core kernel code.
> > > > 
> > > > There's a good chance such changes propagate to APIs elsewhere in
> > > > the filesystems, because something you haven't realised is that XFS
> > > > effectively exposes the on-disk timestamp format directly to
> > > > userspace via the bulkstat interface (see struct xfs_bstat). It also
> > > > affects the XFS open-by-handle ioctl and the swap extent ioctl used
> > > > by the online defragmenter.
> > 
> > I really didn't look at them at all, as ioctl is very late on my
> > mental list of things to change. I do realize that a lot of drivers
> > and file systems do have ioctls that pass time values and we need to
> > address them one by one.
> > 
> > I just looked at the ioctls you mentioned but don't see how open-by-handle
> > is affected by this. Can you point me to what you mean?
> 
> Sorry, I misremembered how some of the XFS open-by-handle code works
> in userspace (XFS has a pretty rich open-by-handle ioctl() interface
> that predates the kernel syscalls by at least 10 years).  Basically
> there is code in userspace that uses the information returned from
> bulkstat to construct file handles to pass to the open-by-handle
> ioctls. xfs_fsr then uses the combination of open-by-handle from the
> bulkstat output and the bulkstat output to feed into the swap extent
> ioctls....
> 
> i.e. the filesystem's idea of what time is is passed to userspace as
> an opaque cookie in this case, but it is not used directly by the
> open-by-handle interfaces like I implied it was.

Ok, I see.

> > My patch set
> > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> > more like 64-bit kernels regarding inode time stamps, which does
> > impact all the file systems that use a 64-bit time or the NFS
> > unsigned epoch (1970-2106), while your patch extends the file
> > system internal epoch (1901-2038 for XFS) so it can be used by
> > anything that knows how to handle larger than 32-bit second values
> > (either 64-bit kernel or 32-bit with inode_time patch).
> 
> Right, but the issue is that 64 bit second counters are broken right
> now because most filesystems can't support more than 32 bit values.
> So it doesn't matter whether it's 32 bit or 64 bit machines, just
> adding explicit support for >32 bit second counters without doing
> anything else just extends that brokenness into the indefinite
> future.

Of course, "most filesystems" are obsolete, and most of the modern
file systems already support >32 bit timestamps: ext4, btrfs, cifs,
f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else
except xfs, ext2/3 and exofs uses the nfsv3 interpretation on
64-bit systems, which interprets time stamps with the high bit
set as years 2038-2106 rather than 1901-1969.
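
To make the two interpretations concrete, here is a minimal sketch in C.
The helper names decode_signed() and decode_unsigned() are made up for
illustration and are not kernel functions; the only assumption is an
on-disk field holding 32-bit seconds since 1970, already converted to
CPU byte order.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not taken from any file system driver.
 * 'raw' is a 32-bit on-disk seconds field in CPU byte order.
 */

/* Signed reading: values with the high bit set land before 1970. */
static int64_t decode_signed(uint32_t raw)
{
	/*
	 * The uint32_t -> int32_t conversion is implementation-defined
	 * in ISO C; this sketch assumes the usual two's complement wrap.
	 */
	return (int64_t)(int32_t)raw;
}

/* Unsigned (nfsv3-style) reading: the same values land after 2038. */
static int64_t decode_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```

For raw = 0x80000000 the signed helper returns -2147483648 (December
1901), while the unsigned helper returns 2147483648 (January 2038).
Both agree for every value up to 0x7fffffff, so the difference only
shows up once a 64-bit tv_sec is available to hold the result.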

> If we don't fix it now (i.e in the new user API and supporting
> infrastructure), then we'll *never be able to fix it* and we'll be
> stuck with timestamps that do really weird things when you pass
> arbitrary future dates to the kernel.

We already have that. I agree it's fixable and we should fix it,
but I don't see how this is different from what we had 20 years
ago when Linux on Alpha first introduced a 64-bit time_t. It's
been this way on every 64-bit Linux system since.

> > This is how ext4 does it (I mean
> > the sizeof() trick, not the bit stuffing they do):
> ....
> > I guess if there is general agreement on introducing 'struct inode_time',
> > we can skip that intermediate step.
> 
> Also, I don't like the concept of having filesystems that will work
> on 64 bit but not 32 bit machines. Over the past 10 years, we've
> managed to remove most of those differences from the VFS and XFS,
> so adding new distinctions between 32/64 bit machines is not the
> direction I want to head in.
> 
> As it is, I'm expecting to do this only after the struct inode_time
> and the superblock "time range" infrastructure have been added to
> the kernel and VFS.  If that change is not made, then we've still
> only got 32 bit time....

Ok.

> > Do you have to manually change it in the
> > superblock? Since most of the time I'd suspect you wouldn't actually
> > use it for the foreseeable future, would it make sense to have a mount
> > option that allows it to be set, but doesn't actually change the
> > superblock until the first inode gets written with a nonzero epoch?
> 
> Yes, we could set the flag on the first timestamp that goes beyond
> the current epoch, but that has two problems:
> 
> 	1. filesystem silently becomes incompatible with older
> 	kernels so failed upgrade rollbacks become problematic; and
> 
> 	2. It adds unnecessary complexity, as this will end up being
> 	the default behaviour for all new filesystems within a year.
> 	Then we end up with a mount option and conversion functions
> 	that never get used but we have to support for years....
> 
> > That way, you'd still be able to mount it with an older kernel but
> > also be forward compatible with time moving on.
> 
> We've got plenty of time to roll this out so I don't see any need
> for putting in place temporary support mechanisms that unnecessarily
> complicate the code.

Ok, fair enough.

	Arnd


Thread overview: 124+ messages
2014-05-30 20:01 [RFC 00/32] making inode time stamps y2038 ready Arnd Bergmann
2014-05-30 20:01 ` [RFC 01/32] fs: introduce new 'struct inode_time' Arnd Bergmann
2014-05-31  7:56   ` Geert Uytterhoeven
2014-05-31  8:39     ` Andreas Schwab
2014-05-31 13:19       ` Geert Uytterhoeven
2014-05-31 13:46         ` Andreas Schwab
2014-05-31 14:54       ` Arnd Bergmann
2014-05-31 16:15         ` Geert Uytterhoeven
2014-05-31  9:03   ` H. Peter Anvin
2014-05-31 14:53     ` Arnd Bergmann
2014-05-31 14:55       ` H. Peter Anvin
2014-05-30 20:01 ` [RFC 02/32] uapi: add struct __kernel_timespec{32,64} Arnd Bergmann
2014-05-30 20:18   ` H. Peter Anvin
2014-05-31 15:09     ` Arnd Bergmann
2014-05-30 20:01 ` [RFC 03/32] fs: introduce sys_utimens64at Arnd Bergmann
2014-05-31  9:22   ` Andreas Schwab
2014-05-31 14:55     ` Arnd Bergmann
2014-05-30 20:01 ` [RFC 04/32] fs: introduce sys_newfstat64/sys_newfstatat64 Arnd Bergmann
2014-05-30 20:01 ` [RFC 05/32] arch: hook up new stat and utimes syscalls Arnd Bergmann
2014-05-30 20:01 ` [RFC 06/32] isofs: fix timestamps beyond 2027 Arnd Bergmann
2014-05-31  7:59   ` Geert Uytterhoeven
2014-05-31  8:47     ` H. Peter Anvin
2014-05-30 20:01 ` [RFC 07/32] fs/nfs: convert to struct inode_time Arnd Bergmann
2014-05-30 20:01 ` [RFC 08/32] fs/ceph: convert to 'struct inode_time' Arnd Bergmann
2014-05-30 20:01 ` [RFC 09/32] fs/pstore: convert to struct inode_time Arnd Bergmann
2014-05-30 21:14   ` Kees Cook
2014-05-30 20:01 ` [RFC 10/32] fs/coda: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 11/32] xfs: " Arnd Bergmann
2014-05-31  0:37   ` Dave Chinner
2014-05-31  0:41     ` H. Peter Anvin
2014-05-31  1:14       ` Dave Chinner
2014-05-31  1:22         ` H. Peter Anvin
2014-05-31  5:54           ` Dave Chinner
2014-05-31  8:41             ` H. Peter Anvin
2014-05-31 15:46               ` Nicolas Pitre
2014-06-01 19:56                 ` Arnd Bergmann
2014-06-01 20:26                   ` H. Peter Anvin
2014-06-02 11:02                     ` Arnd Bergmann
2014-06-02  1:36                   ` Nicolas Pitre
2014-06-02  2:22                     ` Dave Chinner
2014-06-02  7:09                       ` Geert Uytterhoeven
2014-06-02 10:56                     ` Arnd Bergmann
2014-06-02 11:57                       ` Theodore Ts'o
2014-06-02 12:38                         ` Arnd Bergmann
2014-06-02 13:15                           ` Theodore Ts'o
2014-06-02 12:52                         ` Arnd Bergmann
2014-06-02 13:07                           ` Theodore Ts'o
2014-06-02 15:01                             ` Arnd Bergmann
2014-06-02 14:52                         ` H. Peter Anvin
2014-06-02 15:04                       ` Chuck Lever
2014-06-02 15:31                         ` Theodore Ts'o
2014-06-02 17:12                           ` H. Peter Anvin
2014-06-02 18:50                             ` Arnd Bergmann
2014-06-02 22:29                             ` Theodore Ts'o
2014-06-02 22:32                               ` H. Peter Anvin
2014-06-02 23:32                                 ` Theodore Ts'o
2014-06-02 23:33                                   ` H. Peter Anvin
2014-06-03 13:09                                   ` Roger Willcocks
2014-06-02 18:52                         ` Arnd Bergmann
2014-06-02 18:58                         ` Roger Willcocks
2014-06-02 19:04                           ` Chuck Lever
2014-06-02 19:10                             ` Arnd Bergmann
2014-06-01  0:39               ` Dave Chinner
2014-06-02 14:00             ` Joseph S. Myers
2014-05-31 15:37         ` Arnd Bergmann
2014-06-01  0:24           ` Dave Chinner
2014-06-02  0:28             ` Dave Chinner
2014-06-02 11:35               ` Roger Willcocks
2014-06-02 11:43               ` Arnd Bergmann
2014-06-03  0:32                 ` Dave Chinner
2014-06-03  7:33                   ` Arnd Bergmann [this message]
2014-06-03  8:41                     ` Dave Chinner
2014-06-03  9:16                       ` Arnd Bergmann
2014-05-30 20:01 ` [RFC 12/32] btrfs: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 13/32] ext3: " Arnd Bergmann
2014-05-31  9:10   ` H. Peter Anvin
2014-05-31 14:32     ` Arnd Bergmann
2014-05-30 20:01 ` [RFC 14/32] ext4: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 15/32] cifs: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 16/32] ntfs: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 17/32] ubifs: " Arnd Bergmann
2014-06-02  7:54   ` Artem Bityutskiy
2014-05-30 20:01 ` [RFC 18/32] ocfs2: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 19/32] fs/fat: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 20/32] afs: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 21/32] udf: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 22/32] fs: convert simple fs to inode_time Arnd Bergmann
2014-05-30 23:06   ` Greg Kroah-Hartman
2014-05-30 20:01 ` [RFC 23/32] logfs: convert to struct inode_time Arnd Bergmann
2014-05-30 20:01 ` [RFC 24/32] hfs, hfsplus: " Arnd Bergmann
2014-05-31 14:23   ` Vyacheslav Dubeyko
2014-05-30 20:01 ` [RFC 25/32] gfs2: " Arnd Bergmann
2014-06-02  9:52   ` Steven Whitehouse
2014-05-30 20:01 ` [RFC 26/32] reiserfs: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 27/32] jffs2: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 28/32] adfs: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 29/32] f2fs: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 30/32] fuse: " Arnd Bergmann
2014-05-30 20:01 ` [RFC 31/32] scsi: fnic: use current_kernel_time() for timestamp Arnd Bergmann
2014-05-30 20:01 ` [RFC 32/32] fs: use new inode_time definition unconditionally Arnd Bergmann
2014-05-31 14:30 ` [RFC 00/32] making inode time stamps y2038 ready Vyacheslav Dubeyko
2014-06-03 12:21   ` Arnd Bergmann
2014-05-31 14:51 ` Richard Cochran
2014-05-31 15:23   ` Arnd Bergmann
2014-05-31 18:22     ` Richard Cochran
2014-05-31 19:34       ` H. Peter Anvin
2014-06-01  4:46         ` Richard Cochran
2014-06-01  4:44     ` Richard Cochran
2014-06-02 13:52 ` Joseph S. Myers
2014-06-02 19:19   ` Arnd Bergmann
2014-06-02 19:26     ` H. Peter Anvin
2014-06-02 19:55       ` Arnd Bergmann
2014-06-02 21:57         ` H. Peter Anvin
2014-06-03 14:22           ` Arnd Bergmann
2014-06-03 14:33             ` Joseph S. Myers
2014-06-03 14:37               ` Arnd Bergmann
2014-06-03 21:38             ` Dave Chinner
2014-06-04 15:03               ` Arnd Bergmann
2014-06-04 17:30                 ` Nicolas Pitre
2014-06-04 19:24                   ` Arnd Bergmann
2014-06-05  0:10                     ` H. Peter Anvin
2014-06-10  9:54                       ` Arnd Bergmann
2014-06-02 21:02     ` Joseph S. Myers
2014-06-04 15:05       ` Arnd Bergmann
