linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, sandeen@sandeen.net
Subject: Re: [PATCH 08/11] xfs: widen ondisk timestamps to deal with y2038 problem
Date: Wed, 19 Aug 2020 14:43:22 -0700	[thread overview]
Message-ID: <20200819214322.GE6096@magnolia> (raw)
In-Reply-To: <20200818233535.GD21744@dread.disaster.area>

On Wed, Aug 19, 2020 at 09:35:35AM +1000, Dave Chinner wrote:
> On Mon, Aug 17, 2020 at 03:57:39PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Redesign the ondisk timestamps to be a simple unsigned 64-bit counter of
> > nanoseconds since 14 Dec 1901 (i.e. the minimum time in the 32-bit unix
> > time epoch).  This enables us to handle dates up to 2486, which solves
> > the y2038 problem.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> .....
> > +/* Convert an ondisk timestamp into the 64-bit safe incore format. */
> >  void
> >  xfs_inode_from_disk_timestamp(
> > +	struct xfs_dinode		*dip,
> >  	struct timespec64		*tv,
> >  	const union xfs_timestamp	*ts)
> >  {
> > +	if (dip->di_version >= 3 &&
> > +	    (dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_BIGTIME))) {
> > +		uint64_t		t = be64_to_cpu(ts->t_bigtime);
> > +		uint64_t		s;
> > +		uint32_t		n;
> > +
> > +		s = div_u64_rem(t, NSEC_PER_SEC, &n);
> > +		tv->tv_sec = s - XFS_INO_BIGTIME_EPOCH;
> > +		tv->tv_nsec = n;
> > +		return;
> > +	}
> > +
> >  	tv->tv_sec = (int)be32_to_cpu(ts->t_sec);
> >  	tv->tv_nsec = (int)be32_to_cpu(ts->t_nsec);
> >  }
> 
> Can't say I'm sold on this union. It seems cleaner to me to just
> make the timestamp an opaque 64 bit field on disk and convert it to
> the in-memory representation directly in the to/from disk
> operations. e.g.:
> 
> void
> xfs_inode_from_disk_timestamp(
> 	struct xfs_dinode		*dip,
> 	struct timespec64		*tv,
> 	__be64				ts)
> {
> 
> 	uint64_t		t = be64_to_cpu(ts);
> 	uint64_t		s;
> 	uint32_t		n;
> 
> 	if (xfs_dinode_is_bigtime(dip)) {
> 		s = div_u64_rem(t, NSEC_PER_SEC, &n) - XFS_INO_BIGTIME_EPOCH;
> 	} else {
> 		s = (int)(t >> 32);
> 		n = (int)(t & 0xffffffff);
> 	}
> 	tv->tv_sec = s;
> 	tv->tv_nsec = n;
> }

I don't like this open-coded union approach at all because now I have to
keep the t_sec and t_nsec bits separate in my head instead of letting
the C compiler take care of that detail.  The sample code above doesn't
handle that correctly either:

Start with an old kernel on a little endian system; each uppercase
letter represents a byte (A is the LSB of t_sec, D is the MSB of t_sec,
E is the LSB of t_nsec, and H is the MSB of t_nsec):

	sec  nsec (incore)
	ABCD EFGH

That gets written out as:

	sec  nsec (ondisk)
	DCBA HGFE

Now reboot with a new kernel that only knows 64bit timestamps on disk:

	64bit (ondisk)
	DCBAHGFE

Now it does the first be64_to_cpu conversion:
	64bit (incore)
	EFGHABCD

And then masks and shifts:
	sec  nsec (incore)
	EFGH ABCD

Oops, we just switched the values!

The correct approach (I think) is to perform the shifting and masking on
the raw __be64 value before converting them to incore format via
be32_to_cpu, but now I have to work out all four cases by hand instead
of letting the compiler do the legwork for me.  I don't remember if it's
correct to go around shifting and masking __be64 values.

I guess the good news is that at least we have generic/402 to catch
these kinds of persistence problems, but ugh.

Anyway, what are you afraid of?  The C compiler smoking crack and not
actually overlapping the two union elements?  We could control for
that...

> > @@ -220,9 +234,9 @@ xfs_inode_from_disk(
> >  	 * a time before epoch is converted to a time long after epoch
> >  	 * on 64 bit systems.
> >  	 */
> > -	xfs_inode_from_disk_timestamp(&inode->i_atime, &from->di_atime);
> > -	xfs_inode_from_disk_timestamp(&inode->i_mtime, &from->di_mtime);
> > -	xfs_inode_from_disk_timestamp(&inode->i_ctime, &from->di_ctime);
> > +	xfs_inode_from_disk_timestamp(from, &inode->i_atime, &from->di_atime);
> > +	xfs_inode_from_disk_timestamp(from, &inode->i_mtime, &from->di_mtime);
> > +	xfs_inode_from_disk_timestamp(from, &inode->i_ctime, &from->di_ctime);
> >  
> >  	to->di_size = be64_to_cpu(from->di_size);
> >  	to->di_nblocks = be64_to_cpu(from->di_nblocks);
> > @@ -235,9 +249,17 @@ xfs_inode_from_disk(
> >  	if (xfs_sb_version_has_v3inode(&ip->i_mount->m_sb)) {
> >  		inode_set_iversion_queried(inode,
> >  					   be64_to_cpu(from->di_changecount));
> > -		xfs_inode_from_disk_timestamp(&to->di_crtime, &from->di_crtime);
> > +		xfs_inode_from_disk_timestamp(from, &to->di_crtime,
> > +				&from->di_crtime);
> >  		to->di_flags2 = be64_to_cpu(from->di_flags2);
> >  		to->di_cowextsize = be32_to_cpu(from->di_cowextsize);
> > +		/*
> > +		 * Set the bigtime flag incore so that we automatically convert
> > +		 * this inode's ondisk timestamps to bigtime format the next
> > +		 * time we write the inode core to disk.
> > +		 */
> > +		if (xfs_sb_version_hasbigtime(&ip->i_mount->m_sb))
> > +			to->di_flags2 |= XFS_DIFLAG2_BIGTIME;
> >  	}
> 
> We do not want on-disk flags to be changed outside transactions like
> this. Indeed, this has implications for O_DSYNC operation, in that
> we do not trigger inode sync operations if the inode is only
> timestamp dirty. If we've changed this flag, then the inode is more
> than "timestamp dirty" and O_DSYNC will need to flush the entire
> inode.... :/

I forgot about XFS_ILOG_TIMESTAMP.

> IOWs, I think we should only change this flag in a timestamp
> transaction where the timestamps are actually being logged and hence
> we can set inode dirty state appropriately so that everything will
> get logged, changed and written back correctly....

Yeah, that's fair.  I'll change xfs_trans_log_inode to set the bigtime
flag if we're logging either the timestamps or the core.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

  reply	other threads:[~2020-08-19 21:43 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-17 22:56 [PATCH v2 00/11] xfs: widen timestamps to deal with y2038 Darrick J. Wong
2020-08-17 22:56 ` [PATCH 01/11] xfs: explicitly define inode timestamp range Darrick J. Wong
2020-08-18  6:25   ` Amir Goldstein
2020-08-17 22:57 ` [PATCH 02/11] xfs: refactor quota expiration timer modification Darrick J. Wong
2020-08-18  6:48   ` Amir Goldstein
2020-08-18 15:40     ` Darrick J. Wong
2020-08-17 22:57 ` [PATCH 03/11] xfs: refactor default quota grace period setting code Darrick J. Wong
2020-08-18 10:46   ` Amir Goldstein
2020-08-17 22:57 ` [PATCH 04/11] xfs: remove xfs_timestamp_t Darrick J. Wong
2020-08-18 10:50   ` Amir Goldstein
2020-08-17 22:57 ` [PATCH 05/11] xfs: move xfs_log_dinode_to_disk to the log code Darrick J. Wong
2020-08-18 10:51   ` Amir Goldstein
2020-08-17 22:57 ` [PATCH 06/11] xfs: refactor inode timestamp coding Darrick J. Wong
2020-08-18 11:20   ` Amir Goldstein
2020-08-18 15:42     ` Darrick J. Wong
2020-08-17 22:57 ` [PATCH 07/11] xfs: convert struct xfs_timestamp to union Darrick J. Wong
2020-08-18 11:24   ` Amir Goldstein
2020-08-17 22:57 ` [PATCH 08/11] xfs: widen ondisk timestamps to deal with y2038 problem Darrick J. Wong
2020-08-18 12:00   ` Amir Goldstein
2020-08-18 12:53     ` Amir Goldstein
2020-08-18 15:53       ` Darrick J. Wong
2020-08-18 20:52         ` Darrick J. Wong
2020-08-18 15:44     ` Darrick J. Wong
2020-08-18 23:35   ` Dave Chinner
2020-08-19 21:43     ` Darrick J. Wong [this message]
2020-08-19 23:58       ` Dave Chinner
2020-08-20  0:01       ` Darrick J. Wong
2020-08-20  4:42         ` griffin tucker
2020-08-20 16:23           ` Darrick J. Wong
2020-08-21  5:02             ` griffin tucker
2020-08-21 15:31               ` Mike Fleetwood
2020-08-20  5:11         ` Amir Goldstein
2020-08-20 22:47           ` Dave Chinner
2020-08-17 22:57 ` [PATCH 09/11] xfs: refactor quota timestamp coding Darrick J. Wong
2020-08-18 12:25   ` Amir Goldstein
2020-08-17 22:57 ` [PATCH 10/11] xfs: enable bigtime for quota timers Darrick J. Wong
2020-08-18 13:58   ` Amir Goldstein
2020-08-18 15:59     ` Darrick J. Wong
2020-08-17 22:57 ` [PATCH 11/11] xfs: enable big timestamps Darrick J. Wong
2020-08-18 14:04   ` Amir Goldstein
2020-08-18 23:01 ` [PATCH v2 00/11] xfs: widen timestamps to deal with y2038 Dave Chinner
2020-08-18 23:10   ` Darrick J. Wong
2020-08-18 23:41     ` Dave Chinner
2020-08-21  2:11 [PATCH v3 " Darrick J. Wong
2020-08-21  2:12 ` [PATCH 08/11] xfs: widen ondisk timestamps to deal with y2038 problem Darrick J. Wong
2020-08-22  7:33   ` Christoph Hellwig
2020-08-24  2:43     ` Darrick J. Wong
2020-08-25  0:39       ` Darrick J. Wong
2020-08-24  1:25   ` Dave Chinner
2020-08-24  3:13     ` Darrick J. Wong
2020-08-24  6:15       ` Dave Chinner
2020-08-24 16:24         ` Darrick J. Wong
2020-08-24 21:13           ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200819214322.GE6096@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=amir73il@gmail.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sandeen@sandeen.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).