From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>
Cc: Amir Goldstein <amir73il@gmail.com>,
	linux-xfs@vger.kernel.org, sandeen@sandeen.net
Subject: Re: [PATCH 08/11] xfs: widen ondisk timestamps to deal with y2038 problem
Date: Mon, 24 Aug 2020 17:39:45 -0700
Message-ID: <20200825003945.GA6096@magnolia>
In-Reply-To: <20200824024341.GT6096@magnolia>

On Sun, Aug 23, 2020 at 07:43:41PM -0700, Darrick J. Wong wrote:
> On Sat, Aug 22, 2020 at 08:33:19AM +0100, Christoph Hellwig wrote:
> > >   * in the AGI header so that we can skip the finobt walk at mount time when
> > > @@ -855,12 +862,18 @@ struct xfs_agfl {
> > >   *
> > >   * Inode timestamps consist of signed 32-bit counters for seconds and
> > >   * nanoseconds; time zero is the Unix epoch, Jan  1 00:00:00 UTC 1970.
> > > + *
> > > + * When bigtime is enabled, timestamps become an unsigned 64-bit nanoseconds
> > > + * counter.  Time zero is the start of the classic timestamp range.
> > >   */
> > >  union xfs_timestamp {
> > >  	struct {
> > >  		__be32		t_sec;		/* timestamp seconds */
> > >  		__be32		t_nsec;		/* timestamp nanoseconds */
> > >  	};
> > > +
> > > +	/* Nanoseconds since the bigtime epoch. */
> > > +	__be64			t_bigtime;
> > >  };
> > 
> > So do we really need the union here?  What about:
> > 
> >  (1) keep the typedef instead of removing it
> >  (2) switch the typedef to be just a __be64, and use trivial helpers
> >      to extract the two separate legacy sec/nsec fields
> >  (3) PROFIT!!!
> 
> Been there, done that.  Dave suggested some replacement code (which
> corrupted the values), then I modified that into a correct version,
> which then made smatch angry because it doesn't like code that does bit
> shifts on __be64 values.

Backing up here, I've realized that my own analysis of Dave's pseudocode
was incorrect.

On a little endian machine, we'll start with the following.  A is the
LSB of seconds; D is the MSB of seconds; E is the LSB of nsec, and H is
the MSB of nsec.

  sec  nsec (incore)
  l  m l  m
  ABCD EFGH

Now we encode that with an old kernel, which calls cpu_to_be32 to turn
that into:

  sec  nsec (ondisk)
  m  l m  l
  DCBA HGFE

Move over to a new kernel, and that becomes:

  tstamp (ondisk)
  m      l
  DCBAHGFE

Next we decode with be64_to_cpu:

  tstamp (incore)
  l      m
  EFGHABCD

Now we extract nsec from (tstamp & -1U) and sec from (tstamp >> 32):

  sec  nsec
  l  m l  m
  ABCD EFGH

So yes, masking and shifting /after/ the endian conversion works just
fine and doesn't throw any sparse/smatch errors.

Now on a big endian machine:

  sec  nsec (incore)
  m  l m  l
  DCBA HGFE

Now we encode that with an old kernel, which calls cpu_to_be32 (a nop)
to turn that into:

  sec  nsec (ondisk)
  m  l m  l
  DCBA HGFE

Move over to a new kernel, and that becomes:

  tstamp (ondisk)
  m      l
  DCBAHGFE

Next we decode with be64_to_cpu (a nop):

  tstamp (incore)
  m      l
  DCBAHGFE

Now we extract nsec from (tstamp & -1U) and sec from (tstamp >> 32):

  sec  nsec
  m  l m  l
  DCBA HGFE

Works fine here too.
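
A minimal userspace sketch of the decode path walked through above, for
anyone who wants to check it outside the kernel.  The helper names
(load_be64, store_legacy_ts, decode_legacy_ts) are hypothetical and not
part of the patch; load_be64 stands in for be64_to_cpu by assembling the
value byte by byte, so the result does not depend on host endianness.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for be64_to_cpu(): assemble the value
 * byte by byte so the result is the same on any host endianness.
 */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical model of what an old kernel writes: two big-endian
 * 32-bit fields, seconds first, then nanoseconds.
 */
static void store_legacy_ts(unsigned char *p, int32_t sec, int32_t nsec)
{
	uint32_t s = (uint32_t)sec, n = (uint32_t)nsec;
	int i;

	for (i = 0; i < 4; i++) {
		p[i] = (unsigned char)(s >> (24 - 8 * i));
		p[4 + i] = (unsigned char)(n >> (24 - 8 * i));
	}
}

/*
 * Read the same eight bytes as one 64-bit value, then mask and shift
 * /after/ the endian conversion: sec is the high half, nsec the low.
 */
static void decode_legacy_ts(const unsigned char *p, int32_t *sec,
			     int32_t *nsec)
{
	uint64_t t = load_be64(p);

	*sec = (int32_t)(uint32_t)(t >> 32);
	*nsec = (int32_t)(uint32_t)(t & -1U);
}
```

Storing a timestamp with store_legacy_ts and reading it back with
decode_legacy_ts should return the original sec and nsec on both little
and big endian hosts, which is the property argued for above.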

Now the /truly/ nasty case here is xfs_ictimestamp, since we log the
inode core in host endian format.  If we start with this vfs
timestamp on a new kernel:

  sec  nsec (incore)
  l  m l  m
  ABCD EFGH

We need to encode that as:

  tstamp (ondisk)
  l      m
  ABCDEFGH

The only way to do this is: (nsec << 32) | (sec & -1U).  That makes the
log timestamp encoding the opposite of what we do for the ondisk
inodes, because log formats don't use cpu_to_be64.

At least for a big endian machine, log timestamp coding is easy:

  sec  nsec (incore)
  m  l m  l
  DCBA HGFE

We need to encode that as:

  tstamp (ondisk)
  m      l
  DCBAHGFE

And the only way to get there is (sec << 32) | (nsec & -1U), which is
what the ondisk inode timestamp coding does.
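
The two log cases above can be sketched in userspace as follows.  This
is only an illustration under stated assumptions: struct legacy_log_ts,
host_is_little_endian and encode_log_ts are hypothetical names, and the
runtime endianness probe stands in for whatever compile-time mechanism
real code would use.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the legacy log timestamp: two host-endian fields. */
struct legacy_log_ts {
	int32_t	t_sec;
	int32_t	t_nsec;
};

/* Returns nonzero if this host stores the least significant byte first. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 1;
}

/*
 * Pack sec/nsec into one host-endian 64-bit word whose in-memory bytes
 * match struct legacy_log_ts on the same machine.
 */
static uint64_t encode_log_ts(int32_t sec, int32_t nsec)
{
	uint64_t s = (uint32_t)sec, n = (uint32_t)nsec;

	if (host_is_little_endian())
		return (n << 32) | s;	/* sec lands in the low-address bytes */
	return (s << 32) | n;		/* same as the ondisk inode coding */
}
```

Comparing the packed word against a struct legacy_log_ts with memcmp on
the same host is a quick way to confirm that the byte layout matches.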

I still think this is grody, but at least now I have a new fstest to
make sure that log recovery doesn't trip over this.  So, you were
technically right and I was wrong.  We'll see how you like the new
stuff. ;)

--D

> > > +/* Convert an ondisk timestamp into the 64-bit safe incore format. */
> > >  void
> > >  xfs_inode_from_disk_timestamp(
> > > +	struct xfs_dinode		*dip,
> > >  	struct timespec64		*tv,
> > >  	const union xfs_timestamp	*ts)
> > 
> > I think passing ts by value might lead to somewhat better code
> > generation on modern ABIs (and older ABIs just fall back to pass
> > by reference transparently).
> 
> Hm, ok.  I did not know that. :)
> 
> > >  {
> > > +	if (dip->di_version >= 3 &&
> > > +	    (dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_BIGTIME))) {
> > 
> > Do we want a helper for this condition?
> 
> Yes, yes we do.  Will add.
> 
> > > +		uint64_t		t = be64_to_cpu(ts->t_bigtime);
> > > +		uint64_t		s;
> > > +		uint32_t		n;
> > > +
> > > +		s = div_u64_rem(t, NSEC_PER_SEC, &n);
> > > +		tv->tv_sec = s - XFS_INO_BIGTIME_EPOCH;
> > > +		tv->tv_nsec = n;
> > > +		return;
> > > +	}
> > > +
> > >  	tv->tv_sec = (int)be32_to_cpu(ts->t_sec);
> > >  	tv->tv_nsec = (int)be32_to_cpu(ts->t_nsec);
> > 
> > Nit: for these kinds of symmetric conditions an if/else feels a little
> > more natural.
> > 
> > > +		xfs_log_dinode_to_disk_ts(from, &to->di_crtime, &from->di_crtime);
> > 
> > This adds a > 80 char line.
> 
> Do we care now that checkpatch has been changed to allow up to 100
> columns?
> 
> > > +	if (from->di_flags2 & XFS_DIFLAG2_BIGTIME) {
> > > +		uint64_t		t;
> > > +
> > > +		t = (uint64_t)(ts->tv_sec + XFS_INO_BIGTIME_EPOCH);
> > > +		t *= NSEC_PER_SEC;
> > > +		its->t_bigtime = t + ts->tv_nsec;
> > 
> > This calculation is duplicated in two places, might be worth
> > adding a little helper (which will need to get the sec/nsec values
> > passed separately due to the different structures).
> > 
> > > +		xfs_inode_to_log_dinode_ts(from, &to->di_crtime, &from->di_crtime);
> > 
> > Another line over 80 characters here.
> > 
> > > +	if (xfs_sb_version_hasbigtime(&mp->m_sb)) {
> > > +		sb->s_time_min = XFS_INO_BIGTIME_MIN;
> > > +		sb->s_time_max = XFS_INO_BIGTIME_MAX;
> > > +	} else {
> > > +		sb->s_time_min = XFS_INO_TIME_MIN;
> > > +		sb->s_time_max = XFS_INO_TIME_MAX;
> > > +	}
> > 
> > This is really a comment on the earlier patch, but maybe we should
> > name the old constants with "OLD" or "LEGACY" or "SMALL" in the name?
> 
> Yes, good suggestion!
> 
> > > @@ -1494,6 +1499,10 @@ xfs_fc_fill_super(
> > >  	if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
> > >  		sb->s_flags |= SB_I_VERSION;
> > >  
> > > +	if (xfs_sb_version_hasbigtime(&mp->m_sb))
> > > +		xfs_warn(mp,
> > > + "EXPERIMENTAL big timestamp feature in use. Use at your own risk!");
> > > +
> > 
> > Is there any good reason to mark this experimental?
> 
> As you and Dave have both pointed out, there are plenty of stupid bugs
> still in this.  I think I'd like to have at least one EXPERIMENTAL cycle
> to make sure I didn't commit anything pathologically stupid in here.
> 
> <cough> ext4 34-bit sign extension bug <cough>.
> 
> --D
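
For reference, the bigtime conversion quoted in the patch hunks above
can be sketched in userspace like this.  The names bigtime_encode,
bigtime_decode and BIGTIME_EPOCH_OFFSET are hypothetical.  The offset
value is an assumption derived from the comment in the patch ("time zero
is the start of the classic timestamp range"), which for signed 32-bit
seconds is 2^31 seconds before the Unix epoch.  Plain / and % are used
here in place of the kernel's div_u64_rem.

```c
#include <stdint.h>

#define NSEC_PER_SEC		1000000000ULL

/*
 * Assumption: the bigtime epoch sits at the start of the classic range,
 * i.e. 2^31 seconds before the Unix epoch.
 */
#define BIGTIME_EPOCH_OFFSET	2147483648LL

/* Hypothetical helper: Unix sec/nsec -> unsigned nanosecond counter. */
static uint64_t bigtime_encode(int64_t sec, uint32_t nsec)
{
	uint64_t t = (uint64_t)(sec + BIGTIME_EPOCH_OFFSET);

	return t * NSEC_PER_SEC + nsec;
}

/* Hypothetical helper: unsigned nanosecond counter -> Unix sec/nsec. */
static void bigtime_decode(uint64_t t, int64_t *sec, uint32_t *nsec)
{
	*sec = (int64_t)(t / NSEC_PER_SEC) - BIGTIME_EPOCH_OFFSET;
	*nsec = (uint32_t)(t % NSEC_PER_SEC);
}
```

Taking sec and nsec as separate arguments follows the review suggestion
above, since the incore and log structures differ.  The sketch does no
range checking; callers are assumed to pass values inside the supported
range.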

