From: Arnd Bergmann <arnd@arndb.de>
To: Dave Chinner <david@fromorbit.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
joseph@codesourcery.com, john.stultz@linaro.org,
hch@infradead.org, tglx@linutronix.de, geert@linux-m68k.org,
lftan@altera.com, linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [RFC 11/32] xfs: convert to struct inode_time
Date: Tue, 03 Jun 2014 09:33:36 +0200
Message-ID: <5082342.alZgfaU1Q0@wuerfel>
In-Reply-To: <20140603003227.GP6677@dastard>

On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote:
> On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> > On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > > > In my list at http://kernelnewbies.org/y2038, I found that almost
> > > > > all file systems at least handle times until 2106, because they treat
> > > > > the on-disk value as unsigned on 64-bit systems, or they use
> > > > > a completely different representation. My guess is that somebody
> > > > > earlier spent a lot of work on making that happen.
> > > > >
> > > > > The exceptions are:
> > > > >
> > > > > * exofs uses signed values, which can probably be changed to be
> > > > > consistent with the others.
> > > > > * isofs has a bug that limits it until 2027 on architectures with
> > > > > a signed 'char' type (otherwise it's 2155).
> > > > > * udf can represent times for many thousands of years through a
> > > > > 16-bit year representation, but the code to convert to epoch
> > > > > uses a const array that ends at 2038.
> > > > > * afs uses signed seconds and can probably be fixed
> > > > > * coda relies on user space time representation getting passed
> > > > > through an ioctl.
> > > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > > > > where they really use signed.
> > > > >
> > > > > I was confused about XFS since I didn't notice that there are
> > > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > > > > XFS to also use the 1970-2106 time range on 64-bit systems today.
> > > >
> > > > You've missed an awful lot more than just the implications for the
> > > > core kernel code.
> > > >
> > > > There's a good chance such changes propagate to APIs elsewhere in
> > > > the filesystems, because something you haven't realised is that XFS
> > > > effectively exposes the on-disk timestamp format directly to
> > > > userspace via the bulkstat interface (see struct xfs_bstat). It also
> > > > affects the XFS open-by-handle ioctl and the swap extent ioctl used
> > > > by the online defragmenter.
> >
> > I really didn't look at them at all, as ioctl is very late on my
> > mental list of things to change. I do realize that a lot of drivers
> > and file systems do have ioctls that pass time values and we need to
> > address them one by one.
> >
> > I just looked at the ioctls you mentioned but don't see how open-by-handle
> > is affected by this. Can you point me to what you mean?
>
> Sorry, I misremembered how some of the XFS open-by-handle code works
> in userspace (XFS has a pretty rich open-by-handle ioctl() interface
> that predates the kernel syscalls by at least 10 years). Basically
> there is code in userspace that uses the information returned from
> bulkstat to construct file handles to pass to the open-by-handle
> ioctls. xfs_fsr then uses the combination of open-by-handle from the
> bulkstat output and the bulkstat output to feed into the swap extent
> ioctls....
>
> i.e. the filesystem's idea of what time is is passed to userspace as
> an opaque cookie in this case, but it is not used directly by the
> open-by-handle interfaces like I implied it was.

Ok, I see.

> > My patch set
> > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> > more like 64-bit kernels regarding inode time stamps, which does
> > impact all the file systems that use a 64-bit time or the NFS
> > unsigned epoch (1970-2106), while your patch extends the file
> > system internal epoch (1901-2038 for XFS) so it can be used by
> > anything that knows how to handle larger than 32-bit second values
> > (either 64-bit kernel or 32-bit with inode_time patch).
>
> Right, but the issue is that 64 bit second counters are broken right
> now because most filesystems can't support more than 32 bit values.
> So it doesn't matter whether it's 32 bit or 64 bit machines, just
> adding explicit support for >32 bit second counters without doing
> anything else just extends that brokenness into the indefinite
> future.

Of course, "most filesystems" are obsolete, and most of the modern
file systems already support >32 bit timestamps: ext4, btrfs, cifs,
f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else
except xfs, ext2/3 and exofs uses the nfsv3 interpretation on
64-bit systems, which interprets time stamps with the high bit
set as years 2038-2106 rather than 1901-1969.

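
To make the two interpretations concrete, here is a minimal sketch of
how the same 32-bit on-disk seconds value widens to 64 bits under each
reading. The helper names (decode_signed32, decode_unsigned32,
reading_gap) are made up for illustration and are not taken from any
file system:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only, not kernel code. */

/* xfs/ext2/ext3 style: the 32-bit on-disk value is signed (1901-2038). */
static int64_t decode_signed32(uint32_t raw)
{
	/* Spelled out by hand to avoid an implementation-defined cast. */
	if (raw >= 0x80000000u)
		return (int64_t)raw - 4294967296LL;
	return (int64_t)raw;
}

/* nfsv3 style on 64-bit systems: the same bits are unsigned (1970-2106). */
static int64_t decode_unsigned32(uint32_t raw)
{
	return (int64_t)raw;
}

/* The two readings agree up to 2038 and differ by 2^32 seconds after. */
static int64_t reading_gap(uint32_t raw)
{
	int64_t gap = decode_unsigned32(raw) - decode_signed32(raw);

	assert(gap == 0 || gap == 4294967296LL);
	return gap;
}
```

For any value without the high bit set, both helpers return the same
number, so the difference only shows up for time stamps past January
2038 (or before 1970 in the signed reading).
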
> If we don't fix it now (i.e in the new user API and supporting
> infrastructure), then we'll *never be able to fix it* and we'll be
> stuck with timestamps that do really weird things when you pass
> arbitrary future dates to the kernel.

We already have that. I agree it's fixable and we should fix it,
but I don't see how this is different from what we had 20 years
ago when Linux on Alpha first introduced a 64-bit time_t. It's
been this way on every 64-bit Linux system since.

> > This is how ext4 does it (I mean
> > the sizeof() trick, not the bit stuffing they do):
> ....
> > I guess if there is general agreement on introducing 'struct inode_time',
> > we can skip that intermediate step.
>
> Also, I don't like the concept of having filesystems that will work
> on 64 bit but not 32 bit machines. Over the past 10 years, we've
> managed to remove most of those differences from the VFS and XFS,
> so adding new distinctions between 32/64 bit machines is not the
> direction I want to head in.
>
> As it is, I'm expecting to do this only after the struct inode_time
> and the superblock "time range" infrastructure have been added to
> the kernel and VFS. If that change is not made, then we've still
> only got 32 bit time....

Ok.

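
Just to check that we mean the same thing by a superblock "time range",
below is a minimal sketch of how I picture it. Everything here is an
assumption on my side: struct fs_time_range, clamp_to_fs_range() and
store_unchecked32() are made-up names, and clamping (rather than
returning an error) is only one possible policy:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-superblock limits; not an existing kernel structure. */
struct fs_time_range {
	int64_t min_sec;	/* earliest second the on-disk format can hold */
	int64_t max_sec;	/* latest second the on-disk format can hold */
};

/* Pin a requested time stamp to what the file system can store. */
static int64_t clamp_to_fs_range(const struct fs_time_range *r, int64_t sec)
{
	assert(r->min_sec <= r->max_sec);

	if (sec < r->min_sec)
		return r->min_sec;
	if (sec > r->max_sec)
		return r->max_sec;
	return sec;
}

/* What happens without a range check: the upper 32 bits are dropped. */
static uint32_t store_unchecked32(int64_t sec)
{
	return (uint32_t)sec;
}
```

With a signed 32-bit range, a request for the year 2100 would be pinned
to the 2038 limit instead of silently wrapping to an unrelated date.
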
> > Do you have to manually change it in the
> > superblock? Since most of the time I'd suspect you wouldn't actually
> > use it for the foreseeable future, would it make sense to have a mount
> > option that allows it to be set, but doesn't actually change the
> > superblock until the first inode gets written with a nonzero epoch?
>
> Yes, we could set the flag on the first timestamp that goes beyond
> the current epoch, but that has two problems:
>
> 1. filesystem silently becomes incompatible with older
> kernels so failed upgrade rollbacks become problematic; and
>
> 2. It adds unnecessary complexity, as this will end up being
> the default behaviour for all new filesystems within a year.
> Then we end up with a mount option and conversion functions
> that never get used but we have to support for years....
>
> > That way, you'd still be able to mount it with an older kernel but
> > also be forward compatible with time moving on.
>
> We've got plenty of time to roll this out so I don't see any need
> for putting in place temporary support mechanisms that unnecessarily
> complicate the code.

Ok, fair enough.

Arnd