linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/5] xfs: y2038 conversion
@ 2019-11-12 12:09 Arnd Bergmann
  2019-11-12 12:09 ` [RFC 1/5] xfs: [variant A] avoid time_t in user api Arnd Bergmann
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Arnd Bergmann @ 2019-11-12 12:09 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: linux-xfs, y2038, arnd, Deepa Dinamani, Christoph Hellwig,
	Dave Chinner, Brian Foster, Allison Collins, linux-fsdevel

This is part of a longer set of changes to clean up the last remaining
bits for the y2038 conversion. In XFS, three distinct problems need to
be addressed:

1. The use of time_t in kernel sources -- I'm in the process
   of removing all of them so we can remove the definition itself,
   making it harder to write new y2038-unsafe code. This part is
   trivially done as a side-effect of the other two.

2. The use of time_t in a user API header for ioctls. When
   building against a new 32-bit libc with 64-bit time_t, the structures
   no longer match and we get incorrect data from ioctls.  Unfortunately,
   there is no good way to fix XFS_IOC_FSBULKSTAT, I considered different
   approaches and in the end came up with three variants that are all
   part of this series. The idea is to pick just one of course.

3. On-disk timestamps hitting the y2038 limit. This applies to both
   inode timestamps and quota data. Both are extended to 40 bits,
   with the minimum timestamp still being year 1902, and the maximum
   getting extended from 2038 to 36744.

Please review and let me know which of ioctl API changes makes the
most sense. I have not done any actual runtime testing on the patches,
so this is clearly too late for the next merge window, but I hope to
get it all merged for v5.6.

      Arnd

Arnd Bergmann (5):
  xfs: [variant A] avoid time_t in user api
  xfs: [variant B] add time64 version of xfs_bstat
  xfs: [variant C] avoid i386-misaligned xfs_bstat
  xfs: extend inode format for 40-bit timestamps
  xfs: use 40-bit quota time limits

 fs/xfs/libxfs/xfs_dquot_buf.c   |   6 +-
 fs/xfs/libxfs/xfs_format.h      |  11 +-
 fs/xfs/libxfs/xfs_fs.h          |  37 +++++-
 fs/xfs/libxfs/xfs_inode_buf.c   |  28 +++--
 fs/xfs/libxfs/xfs_inode_buf.h   |   1 +
 fs/xfs/libxfs/xfs_log_format.h  |   6 +-
 fs/xfs/libxfs/xfs_trans_inode.c |   3 +-
 fs/xfs/xfs_dquot.c              |  29 +++--
 fs/xfs/xfs_inode.c              |   3 +-
 fs/xfs/xfs_inode_item.c         |  10 +-
 fs/xfs/xfs_ioctl.c              | 195 +++++++++++++++++++++++++++++++-
 fs/xfs/xfs_ioctl.h              |  12 ++
 fs/xfs/xfs_ioctl32.c            | 160 +++++---------------------
 fs/xfs/xfs_ioctl32.h            |  26 ++---
 fs/xfs/xfs_iops.c               |   3 +-
 fs/xfs/xfs_itable.c             |   2 +-
 fs/xfs/xfs_qm.c                 |  18 ++-
 fs/xfs/xfs_qm.h                 |   6 +-
 fs/xfs/xfs_qm_syscalls.c        |  16 ++-
 fs/xfs/xfs_quotaops.c           |   6 +-
 fs/xfs/xfs_super.c              |   2 +-
 fs/xfs/xfs_trans_dquot.c        |  17 ++-
 22 files changed, 387 insertions(+), 210 deletions(-)

-- 
2.20.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC 1/5] xfs: [variant A] avoid time_t in user api
  2019-11-12 12:09 [RFC 0/5] xfs: y2038 conversion Arnd Bergmann
@ 2019-11-12 12:09 ` Arnd Bergmann
  2019-11-12 14:16   ` Christoph Hellwig
  2019-11-12 12:09 ` [RFC 2/5] xfs: [variant B] add time64 version of xfs_bstat Arnd Bergmann
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Arnd Bergmann @ 2019-11-12 12:09 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: linux-xfs, y2038, arnd, Deepa Dinamani, Christoph Hellwig,
	Dave Chinner, Brian Foster, Allison Collins, linux-fsdevel

The ioctl definitions for XFS_IOC_SWAPEXT, XFS_IOC_FSBULKSTAT and
XFS_IOC_FSBULKSTAT_SINGLE are part of libxfs and based on time_t.

The definition for time_t differs between current kernels and coming
32-bit libc variants that define it as 64-bit. For most ioctls, that
means the kernel has to be able to handle two different command codes
based on the different structure sizes.

The same solution could be applied for XFS_IOC_SWAPEXT, but it would
not work for XFS_IOC_FSBULKSTAT and XFS_IOC_FSBULKSTAT_SINGLE because
the structure with the time_t is passed through an indirect pointer,
and the command number itself is based on struct xfs_fsop_bulkreq,
which does not differ based on time_t.

This means any solution that can be applied requires a change of the
ABI definition in the xfs_fs.h header file, as well as doing the same
change in any user application that contains a copy of this header.

The usual solution would be to define a replacement structure and
use conditional compilation for the ioctl command codes to use
one or the other, such as

 #define XFS_IOC_FSBULKSTAT_OLD _IOWR('X', 101, struct xfs_fsop_bulkreq)
 #define XFS_IOC_FSBULKSTAT_NEW _IOWR('X', 129, struct xfs_fsop_bulkreq)
 #define XFS_IOC_FSBULKSTAT ((sizeof(time_t) == sizeof(__kernel_long_t)) ? \
			     XFS_IOC_FSBULKSTAT_OLD : XFS_IOC_FSBULKSTAT_NEW)

After this, the kernel would be able to implement both
XFS_IOC_FSBULKSTAT_OLD and XFS_IOC_FSBULKSTAT_NEW handlers on
32-bit architectures with the correct ABI for either definition
of time_t.

However, as long as two observations are true, a much simpler solution
can be used:

1. xfsprogs is the only user space project that has a copy of this header
2. xfsprogs already has a replacement for all three affected ioctl commands,
   based on the xfs_bulkstat structure to pass 64-bit timestamps
   regardless of the architecture

Based on those assumptions, changing xfs_bstime to use __kernel_long_t
instead of time_t in both the kernel and in xfsprogs preserves the current
ABI for any libc definition of time_t and solves the problem of passing
64-bit timestamps to 32-bit user space.

If either of the two assumptions is invalid, more discussion is needed
for coming up with a way to fix as much of the affected user space
code as possible.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 fs/xfs/libxfs/xfs_fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index e9371a8e0e26..4c4330f6e653 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -324,7 +324,7 @@ typedef struct xfs_growfs_rt {
  * Structures returned from ioctl XFS_IOC_FSBULKSTAT & XFS_IOC_FSBULKSTAT_SINGLE
  */
 typedef struct xfs_bstime {
-	time_t		tv_sec;		/* seconds		*/
+	__kernel_long_t tv_sec;		/* seconds		*/
 	__s32		tv_nsec;	/* and nanoseconds	*/
 } xfs_bstime_t;
 
-- 
2.20.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC 2/5] xfs: [variant B] add time64 version of xfs_bstat
  2019-11-12 12:09 [RFC 0/5] xfs: y2038 conversion Arnd Bergmann
  2019-11-12 12:09 ` [RFC 1/5] xfs: [variant A] avoid time_t in user api Arnd Bergmann
@ 2019-11-12 12:09 ` Arnd Bergmann
  2019-11-12 12:09 ` [RFC 3/5] xfs: [variant C] avoid i386-misaligned xfs_bstat Arnd Bergmann
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Arnd Bergmann @ 2019-11-12 12:09 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: linux-xfs, y2038, arnd, Deepa Dinamani, Christoph Hellwig,
	Dave Chinner, Brian Foster, Allison Collins, linux-fsdevel

The ioctl definitions for XFS_IOC_SWAPEXT, XFS_IOC_FSBULKSTAT and
XFS_IOC_FSBULKSTAT_SINGLE are part of libxfs and based on time_t.

The definition for time_t differs between current kernels and coming
32-bit libc variants that define it as 64-bit. For most ioctls, that
means the kernel has to be able to handle two different command codes
based on the different structure sizes.

The same solution is be applied for XFS_IOC_SWAPEXT, but it would does
work for XFS_IOC_FSBULKSTAT and XFS_IOC_FSBULKSTAT_SINGLE because the
structure with the time_t is passed through an indirect pointer, and
the command number itself is based on struct xfs_fsop_bulkreq, which
does not differ based on time_t.

The best workaround I could come up with is to change the header file to
define new command numbers with the same structure and have users pick
one or the other at compile-time based on the time_t definition in the
C library, like:

 #define XFS_IOC_FSBULKSTAT_OLD _IOWR('X', 101, struct xfs_fsop_bulkreq)
 #define XFS_IOC_FSBULKSTAT_NEW _IOWR('X', 129, struct xfs_fsop_bulkreq)
 #define XFS_IOC_FSBULKSTAT ((sizeof(time_t) == sizeof(__kernel_long_t)) ? \
                             XFS_IOC_FSBULKSTAT_OLD : XFS_IOC_FSBULKSTAT_NEW)

The native xfs_ioctl now handles both the time32 and the time64 version
of the xfs_bstat data structure, and this gets called indirectly by the
compat code implementing the xfs_fsop_bulkreq commands. For x86, we
still need another implementation to deal with the broken alignment,
so the existing code is left in ioctl32 but changed to now handle
the misaligned time64 structure.

Based on the requirement to change the header file, a much simpler fix
would be to change the time_t reference to 'long' and always keep
passing the shorter timestamp as on 32-bit applications using the old
ioctl, forcing code changes to use the v5 API to access post-y2038
timestamps, and still requiring the updated header when building with
a new C library.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
Note: I have not tested this patch beyond compiling it, for now this
is for discussion, to decide which approach makes more sense.
---
 fs/xfs/libxfs/xfs_fs.h |  19 +++-
 fs/xfs/xfs_ioctl.c     | 195 +++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_ioctl.h     |  12 +++
 fs/xfs/xfs_ioctl32.c   |  82 ++++++++++-------
 fs/xfs/xfs_ioctl32.h   |  26 +++---
 5 files changed, 278 insertions(+), 56 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 4c4330f6e653..9310576a45e5 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -324,8 +324,13 @@ typedef struct xfs_growfs_rt {
  * Structures returned from ioctl XFS_IOC_FSBULKSTAT & XFS_IOC_FSBULKSTAT_SINGLE
  */
 typedef struct xfs_bstime {
-	__kernel_long_t tv_sec;		/* seconds		*/
+#ifdef __KERNEL__
+	__s64		tv_sec;		/* seconds		*/
 	__s32		tv_nsec;	/* and nanoseconds	*/
+#else
+	time_t		tv_sec;		/* seconds		*/
+	__s32		tv_nsec;	/* and nanoseconds	*/
+#endif
 } xfs_bstime_t;
 
 struct xfs_bstat {
@@ -775,8 +780,8 @@ struct xfs_scrub_metadata {
  * ioctl commands that replace IRIX syssgi()'s
  */
 #define XFS_IOC_FSGEOMETRY_V1	     _IOR ('X', 100, struct xfs_fsop_geom_v1)
-#define XFS_IOC_FSBULKSTAT	     _IOWR('X', 101, struct xfs_fsop_bulkreq)
-#define XFS_IOC_FSBULKSTAT_SINGLE    _IOWR('X', 102, struct xfs_fsop_bulkreq)
+#define XFS_IOC_FSBULKSTAT_OLD	     _IOWR('X', 101, struct xfs_fsop_bulkreq)
+#define XFS_IOC_FSBULKSTAT_SINGLE_OLD _IOWR('X', 102, struct xfs_fsop_bulkreq)
 #define XFS_IOC_FSINUMBERS	     _IOWR('X', 103, struct xfs_fsop_bulkreq)
 #define XFS_IOC_PATH_TO_FSHANDLE     _IOWR('X', 104, struct xfs_fsop_handlereq)
 #define XFS_IOC_PATH_TO_HANDLE	     _IOWR('X', 105, struct xfs_fsop_handlereq)
@@ -805,6 +810,14 @@ struct xfs_scrub_metadata {
 #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 126, struct xfs_fsop_geom)
 #define XFS_IOC_BULKSTAT	     _IOR ('X', 127, struct xfs_bulkstat_req)
 #define XFS_IOC_INUMBERS	     _IOR ('X', 128, struct xfs_inumbers_req)
+#define XFS_IOC_FSBULKSTAT_NEW	     _IOWR('X', 129, struct xfs_fsop_bulkreq)
+#define XFS_IOC_FSBULKSTAT_SINGLE_NEW _IOWR('X', 130, struct xfs_fsop_bulkreq)
+
+#define XFS_IOC_FSBULKSTAT	  ((sizeof(time_t) == sizeof(__kernel_long_t)) ? \
+				   XFS_IOC_FSBULKSTAT_OLD : XFS_IOC_FSBULKSTAT_NEW)
+#define XFS_IOC_FSBULKSTAT_SINGLE ((sizeof(time_t) == sizeof(__kernel_long_t)) ? \
+				   XFS_IOC_FSBULKSTAT_OLD : XFS_IOC_FSBULKSTAT_NEW)
+
 /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
 
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d58f0d6a699e..d50135760622 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -24,6 +24,7 @@
 #include "xfs_export.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
+#include "xfs_ioctl.h"
 #include "xfs_trans.h"
 #include "xfs_acl.h"
 #include "xfs_btree.h"
@@ -713,7 +714,96 @@ xfs_ioc_space(
 	return error;
 }
 
+/*
+ * Structures returned from ioctl XFS_IOC_FSBULKSTAT_TIME32 & XFS_IOC_FSBULKSTAT_SINGLE_TIME32
+ */
+struct xfs_bstime32 {
+	__s32		tv_sec;		/* seconds		*/
+	__s32		tv_nsec;	/* and nanoseconds	*/
+};
+
+struct xfs_bstat_time32 {
+	__u64		bs_ino;		/* inode number			*/
+	__u16		bs_mode;	/* type and mode		*/
+	__u16		bs_nlink;	/* number of links		*/
+	__u32		bs_uid;		/* user id			*/
+	__u32		bs_gid;		/* group id			*/
+	__u32		bs_rdev;	/* device value			*/
+	__s32		bs_blksize;	/* block size			*/
+	__s64		bs_size;	/* file size			*/
+	struct xfs_bstime32 bs_atime;	/* access time			*/
+	struct xfs_bstime32 bs_mtime;	/* modify time			*/
+	struct xfs_bstime32 bs_ctime;	/* inode change time		*/
+	int64_t		bs_blocks;	/* number of blocks		*/
+	__u32		bs_xflags;	/* extended flags		*/
+	__s32		bs_extsize;	/* extent size			*/
+	__s32		bs_extents;	/* number of extents		*/
+	__u32		bs_gen;		/* generation count		*/
+	__u16		bs_projid_lo;	/* lower part of project id	*/
+	__u16		bs_forkoff;	/* inode fork offset in bytes	*/
+	__u16		bs_projid_hi;	/* higher part of project id	*/
+	uint16_t	bs_sick;	/* sick inode metadata		*/
+	uint16_t	bs_checked;	/* checked inode metadata	*/
+	unsigned char	bs_pad[2];	/* pad space, unused		*/
+	__u32		bs_cowextsize;	/* cow extent size		*/
+	__u32		bs_dmevmask;	/* DMIG event mask		*/
+	__u16		bs_dmstate;	/* DMIG state info		*/
+	__u16		bs_aextents;	/* attribute number of extents	*/
+} __compat_packed; /* packing for x86-64 compat mode */
+
+/* Convert bulkstat (v5) to bstat (v1) with 32-bit time_t. */
+void
+xfs_bulkstat_to_bstat_time32(
+	struct xfs_mount		*mp,
+	struct xfs_bstat_time32		*bs1,
+	const struct xfs_bulkstat	*bstat)
+{
+	/* memset is needed here because of padding holes in the structure. */
+	memset(bs1, 0, sizeof(struct xfs_bstat));
+	bs1->bs_ino = bstat->bs_ino;
+	bs1->bs_mode = bstat->bs_mode;
+	bs1->bs_nlink = bstat->bs_nlink;
+	bs1->bs_uid = bstat->bs_uid;
+	bs1->bs_gid = bstat->bs_gid;
+	bs1->bs_rdev = bstat->bs_rdev;
+	bs1->bs_blksize = bstat->bs_blksize;
+	bs1->bs_size = bstat->bs_size;
+	bs1->bs_atime.tv_sec = bstat->bs_atime;
+	bs1->bs_mtime.tv_sec = bstat->bs_mtime;
+	bs1->bs_ctime.tv_sec = bstat->bs_ctime;
+	bs1->bs_atime.tv_nsec = bstat->bs_atime_nsec;
+	bs1->bs_mtime.tv_nsec = bstat->bs_mtime_nsec;
+	bs1->bs_ctime.tv_nsec = bstat->bs_ctime_nsec;
+	bs1->bs_blocks = bstat->bs_blocks;
+	bs1->bs_xflags = bstat->bs_xflags;
+	bs1->bs_extsize = XFS_FSB_TO_B(mp, bstat->bs_extsize_blks);
+	bs1->bs_extents = bstat->bs_extents;
+	bs1->bs_gen = bstat->bs_gen;
+	bs1->bs_projid_lo = bstat->bs_projectid & 0xFFFF;
+	bs1->bs_forkoff = bstat->bs_forkoff;
+	bs1->bs_projid_hi = bstat->bs_projectid >> 16;
+	bs1->bs_sick = bstat->bs_sick;
+	bs1->bs_checked = bstat->bs_checked;
+	bs1->bs_cowextsize = XFS_FSB_TO_B(mp, bstat->bs_cowextsize_blks);
+	bs1->bs_dmevmask = 0;
+	bs1->bs_dmstate = 0;
+	bs1->bs_aextents = bstat->bs_aextents;
+}
+
 /* Return 0 on success or positive error */
+int
+xfs_fsbulkstat_time32_one_fmt(
+	struct xfs_ibulk		*breq,
+	const struct xfs_bulkstat	*bstat)
+{
+	struct xfs_bstat_time32		bs1;
+
+	xfs_bulkstat_to_bstat_time32(breq->mp, &bs1, bstat);
+	if (copy_to_user(breq->ubuffer, &bs1, sizeof(bs1)))
+		return -EFAULT;
+	return xfs_ibulk_advance(breq, sizeof(struct xfs_bstat_time32));
+}
+
 int
 xfs_fsbulkstat_one_fmt(
 	struct xfs_ibulk		*breq,
@@ -789,18 +879,40 @@ xfs_ioc_fsbulkstat(
 	 * is a special case because it has traditionally meant "first inode
 	 * in filesystem".
 	 */
-	if (cmd == XFS_IOC_FSINUMBERS) {
+	switch (cmd) {
+	case XFS_IOC_FSINUMBERS:
 		breq.startino = lastino ? lastino + 1 : 0;
 		error = xfs_inumbers(&breq, xfs_fsinumbers_fmt);
 		lastino = breq.startino - 1;
-	} else if (cmd == XFS_IOC_FSBULKSTAT_SINGLE) {
+		break;
+	case XFS_IOC_FSBULKSTAT_SINGLE_OLD:
+		if (!IS_ENABLED(CONFIG_64BIT)) {
+			breq.startino = lastino;
+			breq.icount = 1;
+			error = xfs_bulkstat_one(&breq, xfs_fsbulkstat_time32_one_fmt);
+			break;
+		}
+		/* Fallthrough */
+	case XFS_IOC_FSBULKSTAT_SINGLE_NEW:
 		breq.startino = lastino;
 		breq.icount = 1;
 		error = xfs_bulkstat_one(&breq, xfs_fsbulkstat_one_fmt);
-	} else {	/* XFS_IOC_FSBULKSTAT */
+		break;
+	case XFS_IOC_FSBULKSTAT_OLD:
+		if (!IS_ENABLED(CONFIG_64BIT)) {
+			breq.startino = lastino ? lastino + 1 : 0;
+			error = xfs_bulkstat(&breq, xfs_fsbulkstat_time32_one_fmt);
+			lastino = breq.startino - 1;
+			break;
+		}
+		/* Fallthrough */
+	case XFS_IOC_FSBULKSTAT_NEW:
 		breq.startino = lastino ? lastino + 1 : 0;
 		error = xfs_bulkstat(&breq, xfs_fsbulkstat_one_fmt);
 		lastino = breq.startino - 1;
+		break;
+	default:
+		error = -EINVAL;
 	}
 
 	if (error)
@@ -2093,6 +2205,74 @@ xfs_ioc_setlabel(
 	return error;
 }
 
+static int get_xfs_bstime32(struct xfs_bstime *bstime,
+				struct xfs_bstime32 __user *bstime32)
+{
+	struct xfs_bstime32 t;
+
+	if (copy_from_user(&t, bstime32, sizeof(t)))
+		return -EFAULT;
+
+	*bstime = (struct xfs_bstime){
+		.tv_sec = t.tv_sec,
+		.tv_nsec = t.tv_nsec,
+	};
+
+	return 0;
+}
+
+static int get_xfs_bstat_time32(struct xfs_bstat *bstat,
+				struct xfs_bstat_time32 __user *bstat32)
+{
+	if (copy_from_user(bstat, bstat32,
+			   offsetof(struct xfs_bstat, bs_atime)))
+		return -EFAULT;
+
+	if (get_xfs_bstime32(&bstat->bs_atime, &bstat32->bs_atime) ||
+	    get_xfs_bstime32(&bstat->bs_mtime, &bstat32->bs_mtime) ||
+	    get_xfs_bstime32(&bstat->bs_ctime, &bstat32->bs_ctime))
+		return -EFAULT;
+
+	if (copy_from_user(&bstat->bs_blocks, &bstat32->bs_blocks,
+			   sizeof(struct xfs_bstat) -
+			   offsetof(struct xfs_bstat, bs_blocks)))
+		return -EFAULT;
+
+	return 0;
+}
+
+/*
+ * Structure passed to XFS_IOC_SWAPEXT_TIME32
+ */
+struct xfs_swapext_time32
+{
+	int64_t		sx_version;	/* version */
+	int64_t		sx_fdtarget;	/* fd of target file */
+	int64_t		sx_fdtmp;	/* fd of tmp file */
+	xfs_off_t	sx_offset;	/* offset into file */
+	xfs_off_t	sx_length;	/* leng from offset */
+	char		sx_pad[16];	/* pad space, unused */
+	struct xfs_bstat_time32 sx_stat;/* stat of target b4 copy */
+};
+#define XFS_IOC_SWAPEXT_TIME32    _IOWR('X', 109, struct xfs_swapext_time32)
+
+static int get_xfs_swapext(struct xfs_swapext *sxp, unsigned int cmd, void __user *arg)
+{
+	struct xfs_swapext_time32 *sxp32 = arg;
+	int ret;
+
+	if (cmd == XFS_IOC_SWAPEXT) {
+		ret = copy_from_user(sxp, arg, sizeof(struct xfs_swapext));
+		return ret ? -EFAULT : 0;
+	}
+
+	ret = copy_from_user(sxp, arg, offsetof(struct xfs_swapext, sx_stat));
+	if (ret)
+		return -EFAULT;
+
+	return get_xfs_bstat_time32(&sxp->sx_stat, &sxp32->sx_stat);
+}
+
 /*
  * Note: some of the ioctl's return positive numbers as a
  * byte count indicating success, such as readlink_by_handle.
@@ -2149,8 +2329,10 @@ xfs_file_ioctl(
 		return 0;
 	}
 
-	case XFS_IOC_FSBULKSTAT_SINGLE:
-	case XFS_IOC_FSBULKSTAT:
+	case XFS_IOC_FSBULKSTAT_SINGLE_OLD:
+	case XFS_IOC_FSBULKSTAT_OLD:
+	case XFS_IOC_FSBULKSTAT_SINGLE_NEW:
+	case XFS_IOC_FSBULKSTAT_NEW:
 	case XFS_IOC_FSINUMBERS:
 		return xfs_ioc_fsbulkstat(mp, cmd, arg);
 
@@ -2242,10 +2424,11 @@ xfs_file_ioctl(
 	case XFS_IOC_ATTRMULTI_BY_HANDLE:
 		return xfs_attrmulti_by_handle(filp, arg);
 
+	case XFS_IOC_SWAPEXT_TIME32:
 	case XFS_IOC_SWAPEXT: {
 		struct xfs_swapext	sxp;
 
-		if (copy_from_user(&sxp, arg, sizeof(xfs_swapext_t)))
+		if (get_xfs_swapext(&sxp, cmd, arg))
 			return -EFAULT;
 		error = mnt_want_write_file(filp);
 		if (error)
diff --git a/fs/xfs/xfs_ioctl.h b/fs/xfs/xfs_ioctl.h
index 654c0bb1bcf8..318ff243258f 100644
--- a/fs/xfs/xfs_ioctl.h
+++ b/fs/xfs/xfs_ioctl.h
@@ -83,6 +83,18 @@ struct xfs_inogrp;
 
 int xfs_fsbulkstat_one_fmt(struct xfs_ibulk *breq,
 			   const struct xfs_bulkstat *bstat);
+int xfs_fsbulkstat_time32_one_fmt(struct xfs_ibulk *breq,
+			   const struct xfs_bulkstat *bstat);
 int xfs_fsinumbers_fmt(struct xfs_ibulk *breq, const struct xfs_inumbers *igrp);
 
+/*
+ * On intel, even if sizes match, alignment and/or padding may differ.
+ */
+#if defined(CONFIG_IA64) || defined(CONFIG_X86_64)
+#define BROKEN_X86_ALIGNMENT
+#define __compat_packed __attribute__((packed))
+#else
+#define __compat_packed
+#endif
+
 #endif
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 1e08bf79b478..2ea7d3e12b4b 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -98,28 +98,18 @@ xfs_fsinumbers_fmt_compat(
 	return xfs_ibulk_advance(breq, sizeof(struct compat_xfs_inogrp));
 }
 
-#else
-#define xfs_fsinumbers_fmt_compat xfs_fsinumbers_fmt
-#endif	/* BROKEN_X86_ALIGNMENT */
-
+/* struct xfs_bstat has differing alignment on intel */
 STATIC int
 xfs_ioctl32_bstime_copyin(
 	xfs_bstime_t		*bstime,
 	compat_xfs_bstime_t	__user *bstime32)
 {
-	compat_time_t		sec32;	/* tv_sec differs on 64 vs. 32 */
-
-	if (get_user(sec32,		&bstime32->tv_sec)	||
+	if (get_user(bstime->tv_sec,	&bstime32->tv_sec)	||
 	    get_user(bstime->tv_nsec,	&bstime32->tv_nsec))
 		return -EFAULT;
-	bstime->tv_sec = sec32;
 	return 0;
 }
 
-/*
- * struct xfs_bstat has differing alignment on intel, & bstime_t sizes
- * everywhere
- */
 STATIC int
 xfs_ioctl32_bstat_copyin(
 	struct xfs_bstat		*bstat,
@@ -158,10 +148,7 @@ xfs_bstime_store_compat(
 	compat_xfs_bstime_t	__user *p32,
 	const xfs_bstime_t	*p)
 {
-	__s32			sec32;
-
-	sec32 = p->tv_sec;
-	if (put_user(sec32, &p32->tv_sec) ||
+	if (put_user(p->tv_sec, &p32->tv_sec) ||
 	    put_user(p->tv_nsec, &p32->tv_nsec))
 		return -EFAULT;
 	return 0;
@@ -206,6 +193,10 @@ xfs_fsbulkstat_one_fmt_compat(
 	return xfs_ibulk_advance(breq, sizeof(struct compat_xfs_bstat));
 }
 
+#else
+#define xfs_fsinumbers_fmt_compat xfs_fsinumbers_fmt
+#endif	/* BROKEN_X86_ALIGNMENT */
+
 /* copied from xfs_ioctl.c */
 STATIC int
 xfs_compat_ioc_fsbulkstat(
@@ -229,7 +220,12 @@ xfs_compat_ioc_fsbulkstat(
 	 * functions and structure size are the correct ones to use ...
 	 */
 	inumbers_fmt_pf		inumbers_func = xfs_fsinumbers_fmt_compat;
-	bulkstat_one_fmt_pf	bs_one_func = xfs_fsbulkstat_one_fmt_compat;
+	bulkstat_one_fmt_pf	bs_one_func_old = xfs_fsbulkstat_time32_one_fmt;
+	bulkstat_one_fmt_pf	bs_one_func_new = xfs_fsbulkstat_one_fmt;
+
+#ifdef BROKEN_X86_ALIGNMENT
+	bs_one_func_new = xfs_fsbulkstat_one_fmt_compat;
+#endif
 
 #ifdef CONFIG_X86_X32
 	if (in_x32_syscall()) {
@@ -242,7 +238,7 @@ xfs_compat_ioc_fsbulkstat(
 		 * x32 userspace expects.
 		 */
 		inumbers_func = xfs_fsinumbers_fmt;
-		bs_one_func = xfs_fsbulkstat_one_fmt;
+		bs_one_func_old = xfs_fsbulkstat_one_fmt;
 	}
 #endif
 
@@ -289,21 +285,37 @@ xfs_compat_ioc_fsbulkstat(
 	 * is a special case because it has traditionally meant "first inode
 	 * in filesystem".
 	 */
-	if (cmd == XFS_IOC_FSINUMBERS_32) {
+	switch (cmd) {
+	case XFS_IOC_FSINUMBERS_32:
 		breq.startino = lastino ? lastino + 1 : 0;
 		error = xfs_inumbers(&breq, inumbers_func);
 		lastino = breq.startino - 1;
-	} else if (cmd == XFS_IOC_FSBULKSTAT_SINGLE_32) {
+		break;
+	case XFS_IOC_FSBULKSTAT_SINGLE_OLD32:
 		breq.startino = lastino;
 		breq.icount = 1;
-		error = xfs_bulkstat_one(&breq, bs_one_func);
+		error = xfs_bulkstat_one(&breq, bs_one_func_old);
 		lastino = breq.startino;
-	} else if (cmd == XFS_IOC_FSBULKSTAT_32) {
+		break;
+	case XFS_IOC_FSBULKSTAT_OLD32:
 		breq.startino = lastino ? lastino + 1 : 0;
-		error = xfs_bulkstat(&breq, bs_one_func);
+		error = xfs_bulkstat(&breq, bs_one_func_old);
 		lastino = breq.startino - 1;
-	} else {
+		break;
+	case XFS_IOC_FSBULKSTAT_SINGLE_NEW32:
+		breq.startino = lastino;
+		breq.icount = 1;
+		error = xfs_bulkstat_one(&breq, bs_one_func_new);
+		lastino = breq.startino;
+		break;
+	case XFS_IOC_FSBULKSTAT_NEW32:
+		breq.startino = lastino ? lastino + 1 : 0;
+		error = xfs_bulkstat(&breq, bs_one_func_new);
+		lastino = breq.startino - 1;
+		break;
+	default:
 		error = -EINVAL;
+		break;
 	}
 	if (error)
 		return error;
@@ -548,7 +560,9 @@ xfs_file_compat_ioctl(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	void			__user *arg = compat_ptr(p);
+#if defined(BROKEN_X86_ALIGNMENT)
 	int			error;
+#endif
 
 	trace_xfs_file_compat_ioctl(ip);
 
@@ -596,13 +610,6 @@ xfs_file_compat_ioctl(
 		mnt_drop_write_file(filp);
 		return error;
 	}
-#endif
-	/* long changes size, but xfs only copiese out 32 bits */
-	case XFS_IOC_GETXFLAGS_32:
-	case XFS_IOC_SETXFLAGS_32:
-	case XFS_IOC_GETVERSION_32:
-		cmd = _NATIVE_IOC(cmd, long);
-		return xfs_file_ioctl(filp, cmd, p);
 	case XFS_IOC_SWAPEXT_32: {
 		struct xfs_swapext	  sxp;
 		struct compat_xfs_swapext __user *sxu = arg;
@@ -619,8 +626,17 @@ xfs_file_compat_ioctl(
 		mnt_drop_write_file(filp);
 		return error;
 	}
-	case XFS_IOC_FSBULKSTAT_32:
-	case XFS_IOC_FSBULKSTAT_SINGLE_32:
+#endif
+	/* long changes size, but xfs only copiese out 32 bits */
+	case XFS_IOC_GETXFLAGS_32:
+	case XFS_IOC_SETXFLAGS_32:
+	case XFS_IOC_GETVERSION_32:
+		cmd = _NATIVE_IOC(cmd, long);
+		return xfs_file_ioctl(filp, cmd, p);
+	case XFS_IOC_FSBULKSTAT_OLD32:
+	case XFS_IOC_FSBULKSTAT_SINGLE_OLD32:
+	case XFS_IOC_FSBULKSTAT_NEW32:
+	case XFS_IOC_FSBULKSTAT_SINGLE_NEW32:
 	case XFS_IOC_FSINUMBERS_32:
 		return xfs_compat_ioc_fsbulkstat(mp, cmd, arg);
 	case XFS_IOC_FD_TO_HANDLE_32:
diff --git a/fs/xfs/xfs_ioctl32.h b/fs/xfs/xfs_ioctl32.h
index 7985344d3aa6..d7050f4360e9 100644
--- a/fs/xfs/xfs_ioctl32.h
+++ b/fs/xfs/xfs_ioctl32.h
@@ -21,20 +21,11 @@
 #define XFS_IOC_SETXFLAGS_32	FS_IOC32_SETFLAGS
 #define XFS_IOC_GETVERSION_32	FS_IOC32_GETVERSION
 
-/*
- * On intel, even if sizes match, alignment and/or padding may differ.
- */
-#if defined(CONFIG_IA64) || defined(CONFIG_X86_64)
-#define BROKEN_X86_ALIGNMENT
-#define __compat_packed __attribute__((packed))
-#else
-#define __compat_packed
-#endif
-
+#ifdef BROKEN_X86_ALIGNMENT
 typedef struct compat_xfs_bstime {
-	compat_time_t	tv_sec;		/* seconds		*/
+	__s64		tv_sec;		/* seconds		*/
 	__s32		tv_nsec;	/* and nanoseconds	*/
-} compat_xfs_bstime_t;
+} __compat_packed compat_xfs_bstime_t;
 
 struct compat_xfs_bstat {
 	__u64		bs_ino;		/* inode number			*/
@@ -62,6 +53,7 @@ struct compat_xfs_bstat {
 	__u16		bs_dmstate;	/* DMIG state info		*/
 	__u16		bs_aextents;	/* attribute number of extents	*/
 } __compat_packed;
+#endif
 
 struct compat_xfs_fsop_bulkreq {
 	compat_uptr_t	lastip;		/* last inode # pointer		*/
@@ -70,10 +62,14 @@ struct compat_xfs_fsop_bulkreq {
 	compat_uptr_t	ocount;		/* output count pointer		*/
 };
 
-#define XFS_IOC_FSBULKSTAT_32 \
+#define XFS_IOC_FSBULKSTAT_OLD32 \
 	_IOWR('X', 101, struct compat_xfs_fsop_bulkreq)
-#define XFS_IOC_FSBULKSTAT_SINGLE_32 \
+#define XFS_IOC_FSBULKSTAT_SINGLE_OLD32 \
 	_IOWR('X', 102, struct compat_xfs_fsop_bulkreq)
+#define XFS_IOC_FSBULKSTAT_NEW32 \
+	_IOWR('X', 129, struct compat_xfs_fsop_bulkreq)
+#define XFS_IOC_FSBULKSTAT_SINGLE_NEW32 \
+	_IOWR('X', 130, struct compat_xfs_fsop_bulkreq)
 #define XFS_IOC_FSINUMBERS_32 \
 	_IOWR('X', 103, struct compat_xfs_fsop_bulkreq)
 
@@ -98,6 +94,7 @@ typedef struct compat_xfs_fsop_handlereq {
 #define XFS_IOC_READLINK_BY_HANDLE_32 \
 	_IOWR('X', 108, struct compat_xfs_fsop_handlereq)
 
+#ifdef BROKEN_X86_ALIGNMENT
 /* The bstat field in the swapext struct needs translation */
 typedef struct compat_xfs_swapext {
 	int64_t			sx_version;	/* version */
@@ -110,6 +107,7 @@ typedef struct compat_xfs_swapext {
 } __compat_packed compat_xfs_swapext_t;
 
 #define XFS_IOC_SWAPEXT_32	_IOWR('X', 109, struct compat_xfs_swapext)
+#endif
 
 typedef struct compat_xfs_fsop_attrlist_handlereq {
 	struct compat_xfs_fsop_handlereq hreq; /* handle interface structure */
-- 
2.20.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC 3/5] xfs: [variant C] avoid i386-misaligned xfs_bstat
  2019-11-12 12:09 [RFC 0/5] xfs: y2038 conversion Arnd Bergmann
  2019-11-12 12:09 ` [RFC 1/5] xfs: [variant A] avoid time_t in user api Arnd Bergmann
  2019-11-12 12:09 ` [RFC 2/5] xfs: [variant B] add time64 version of xfs_bstat Arnd Bergmann
@ 2019-11-12 12:09 ` Arnd Bergmann
  2019-11-12 12:09 ` [RFC 4/5] xfs: extend inode format for 40-bit timestamps Arnd Bergmann
  2019-11-12 12:09 ` [RFC 5/5] xfs: use 40-bit quota time limits Arnd Bergmann
  4 siblings, 0 replies; 15+ messages in thread
From: Arnd Bergmann @ 2019-11-12 12:09 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: linux-xfs, y2038, arnd, Deepa Dinamani, Christoph Hellwig,
	Dave Chinner, Brian Foster, Allison Collins, linux-fsdevel

The change to the ioctl commands requires updating the header file. If
we do that, the alignment problem on i386 could be solved for the new
format by explictly aligning all 64-bit members to time_t. This complicates
the header file definition, but saves the code to deal with a third format.

The native ioctl simply deals with both the time32 and the time64 version,
and the latter is the same across all 64-bit architectures and all 32-bit
user space with 32-bit time_t including i386 and x32.

If we decide to do this, this change should be folded into "xfs: [variant B]
add time64 version of xfs_bstat".

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 fs/xfs/libxfs/xfs_fs.h |  20 +++++--
 fs/xfs/xfs_ioctl32.c   | 120 +----------------------------------------
 2 files changed, 18 insertions(+), 122 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 9310576a45e5..e95807d223ad 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -319,6 +319,17 @@ typedef struct xfs_growfs_rt {
 	__u32		extsize;	/* new realtime extent size, fsblocks */
 } xfs_growfs_rt_t;
 
+/*
+ * on i386, u64 has 32-bit alignment, which makes the traditional xfs_bstat
+ * different from other architectures. Aligning all 64-bit members to
+ * sizeof(time_t) means that the new version with 64-bit time_t is padded
+ * the same as on all other architectures.
+ */
+#ifdef __i386__
+#define __TIME_ALIGN __attribute__((aligned(sizeof(time_t))))
+#else
+#define __TIME_ALIGN
+#endif
 
 /*
  * Structures returned from ioctl XFS_IOC_FSBULKSTAT & XFS_IOC_FSBULKSTAT_SINGLE
@@ -328,24 +339,24 @@ typedef struct xfs_bstime {
 	__s64		tv_sec;		/* seconds		*/
 	__s32		tv_nsec;	/* and nanoseconds	*/
 #else
-	time_t		tv_sec;		/* seconds		*/
+	time_t		tv_sec __TIME_ALIGN; /* seconds		*/
 	__s32		tv_nsec;	/* and nanoseconds	*/
 #endif
 } xfs_bstime_t;
 
 struct xfs_bstat {
-	__u64		bs_ino;		/* inode number			*/
+	__u64		bs_ino __TIME_ALIGN; /* inode number		*/
 	__u16		bs_mode;	/* type and mode		*/
 	__u16		bs_nlink;	/* number of links		*/
 	__u32		bs_uid;		/* user id			*/
 	__u32		bs_gid;		/* group id			*/
 	__u32		bs_rdev;	/* device value			*/
 	__s32		bs_blksize;	/* block size			*/
-	__s64		bs_size;	/* file size			*/
+	__s64		bs_size __TIME_ALIGN;	/* file size		*/
 	xfs_bstime_t	bs_atime;	/* access time			*/
 	xfs_bstime_t	bs_mtime;	/* modify time			*/
 	xfs_bstime_t	bs_ctime;	/* inode change time		*/
-	int64_t		bs_blocks;	/* number of blocks		*/
+	int64_t		bs_blocks __TIME_ALIGN;	/* number of blocks	*/
 	__u32		bs_xflags;	/* extended flags		*/
 	__s32		bs_extsize;	/* extent size			*/
 	__s32		bs_extents;	/* number of extents		*/
@@ -362,6 +373,7 @@ struct xfs_bstat {
 	__u16		bs_dmstate;	/* DMIG state info		*/
 	__u16		bs_aextents;	/* attribute number of extents	*/
 };
+#undef __TIME_ALIGN
 
 /* New bulkstat structure that reports v5 features and fixes padding issues */
 struct xfs_bulkstat {
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 2ea7d3e12b4b..8059cdc2d5ca 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -98,101 +98,6 @@ xfs_fsinumbers_fmt_compat(
 	return xfs_ibulk_advance(breq, sizeof(struct compat_xfs_inogrp));
 }
 
-/* struct xfs_bstat has differing alignment on intel */
-STATIC int
-xfs_ioctl32_bstime_copyin(
-	xfs_bstime_t		*bstime,
-	compat_xfs_bstime_t	__user *bstime32)
-{
-	if (get_user(bstime->tv_sec,	&bstime32->tv_sec)	||
-	    get_user(bstime->tv_nsec,	&bstime32->tv_nsec))
-		return -EFAULT;
-	return 0;
-}
-
-STATIC int
-xfs_ioctl32_bstat_copyin(
-	struct xfs_bstat		*bstat,
-	struct compat_xfs_bstat	__user	*bstat32)
-{
-	if (get_user(bstat->bs_ino,	&bstat32->bs_ino)	||
-	    get_user(bstat->bs_mode,	&bstat32->bs_mode)	||
-	    get_user(bstat->bs_nlink,	&bstat32->bs_nlink)	||
-	    get_user(bstat->bs_uid,	&bstat32->bs_uid)	||
-	    get_user(bstat->bs_gid,	&bstat32->bs_gid)	||
-	    get_user(bstat->bs_rdev,	&bstat32->bs_rdev)	||
-	    get_user(bstat->bs_blksize,	&bstat32->bs_blksize)	||
-	    get_user(bstat->bs_size,	&bstat32->bs_size)	||
-	    xfs_ioctl32_bstime_copyin(&bstat->bs_atime, &bstat32->bs_atime) ||
-	    xfs_ioctl32_bstime_copyin(&bstat->bs_mtime, &bstat32->bs_mtime) ||
-	    xfs_ioctl32_bstime_copyin(&bstat->bs_ctime, &bstat32->bs_ctime) ||
-	    get_user(bstat->bs_blocks,	&bstat32->bs_size)	||
-	    get_user(bstat->bs_xflags,	&bstat32->bs_size)	||
-	    get_user(bstat->bs_extsize,	&bstat32->bs_extsize)	||
-	    get_user(bstat->bs_extents,	&bstat32->bs_extents)	||
-	    get_user(bstat->bs_gen,	&bstat32->bs_gen)	||
-	    get_user(bstat->bs_projid_lo, &bstat32->bs_projid_lo) ||
-	    get_user(bstat->bs_projid_hi, &bstat32->bs_projid_hi) ||
-	    get_user(bstat->bs_forkoff,	&bstat32->bs_forkoff)	||
-	    get_user(bstat->bs_dmevmask, &bstat32->bs_dmevmask)	||
-	    get_user(bstat->bs_dmstate,	&bstat32->bs_dmstate)	||
-	    get_user(bstat->bs_aextents, &bstat32->bs_aextents))
-		return -EFAULT;
-	return 0;
-}
-
-/* XFS_IOC_FSBULKSTAT and friends */
-
-STATIC int
-xfs_bstime_store_compat(
-	compat_xfs_bstime_t	__user *p32,
-	const xfs_bstime_t	*p)
-{
-	if (put_user(p->tv_sec, &p32->tv_sec) ||
-	    put_user(p->tv_nsec, &p32->tv_nsec))
-		return -EFAULT;
-	return 0;
-}
-
-/* Return 0 on success or positive error (to xfs_bulkstat()) */
-STATIC int
-xfs_fsbulkstat_one_fmt_compat(
-	struct xfs_ibulk		*breq,
-	const struct xfs_bulkstat	*bstat)
-{
-	struct compat_xfs_bstat	__user	*p32 = breq->ubuffer;
-	struct xfs_bstat		bs1;
-	struct xfs_bstat		*buffer = &bs1;
-
-	xfs_bulkstat_to_bstat(breq->mp, &bs1, bstat);
-
-	if (put_user(buffer->bs_ino,	  &p32->bs_ino)		||
-	    put_user(buffer->bs_mode,	  &p32->bs_mode)	||
-	    put_user(buffer->bs_nlink,	  &p32->bs_nlink)	||
-	    put_user(buffer->bs_uid,	  &p32->bs_uid)		||
-	    put_user(buffer->bs_gid,	  &p32->bs_gid)		||
-	    put_user(buffer->bs_rdev,	  &p32->bs_rdev)	||
-	    put_user(buffer->bs_blksize,  &p32->bs_blksize)	||
-	    put_user(buffer->bs_size,	  &p32->bs_size)	||
-	    xfs_bstime_store_compat(&p32->bs_atime, &buffer->bs_atime) ||
-	    xfs_bstime_store_compat(&p32->bs_mtime, &buffer->bs_mtime) ||
-	    xfs_bstime_store_compat(&p32->bs_ctime, &buffer->bs_ctime) ||
-	    put_user(buffer->bs_blocks,	  &p32->bs_blocks)	||
-	    put_user(buffer->bs_xflags,	  &p32->bs_xflags)	||
-	    put_user(buffer->bs_extsize,  &p32->bs_extsize)	||
-	    put_user(buffer->bs_extents,  &p32->bs_extents)	||
-	    put_user(buffer->bs_gen,	  &p32->bs_gen)		||
-	    put_user(buffer->bs_projid,	  &p32->bs_projid)	||
-	    put_user(buffer->bs_projid_hi,	&p32->bs_projid_hi)	||
-	    put_user(buffer->bs_forkoff,  &p32->bs_forkoff)	||
-	    put_user(buffer->bs_dmevmask, &p32->bs_dmevmask)	||
-	    put_user(buffer->bs_dmstate,  &p32->bs_dmstate)	||
-	    put_user(buffer->bs_aextents, &p32->bs_aextents))
-		return -EFAULT;
-
-	return xfs_ibulk_advance(breq, sizeof(struct compat_xfs_bstat));
-}
-
 #else
 #define xfs_fsinumbers_fmt_compat xfs_fsinumbers_fmt
 #endif	/* BROKEN_X86_ALIGNMENT */
@@ -221,11 +126,6 @@ xfs_compat_ioc_fsbulkstat(
 	 */
 	inumbers_fmt_pf		inumbers_func = xfs_fsinumbers_fmt_compat;
 	bulkstat_one_fmt_pf	bs_one_func_old = xfs_fsbulkstat_time32_one_fmt;
-	bulkstat_one_fmt_pf	bs_one_func_new = xfs_fsbulkstat_one_fmt;
-
-#ifdef BROKEN_X86_ALIGNMENT
-	bs_one_func_new = xfs_fsbulkstat_one_fmt_compat;
-#endif
 
 #ifdef CONFIG_X86_X32
 	if (in_x32_syscall()) {
@@ -305,12 +205,12 @@ xfs_compat_ioc_fsbulkstat(
 	case XFS_IOC_FSBULKSTAT_SINGLE_NEW32:
 		breq.startino = lastino;
 		breq.icount = 1;
-		error = xfs_bulkstat_one(&breq, bs_one_func_new);
+		error = xfs_bulkstat_one(&breq, xfs_fsbulkstat_one_fmt);
 		lastino = breq.startino;
 		break;
 	case XFS_IOC_FSBULKSTAT_NEW32:
 		breq.startino = lastino ? lastino + 1 : 0;
-		error = xfs_bulkstat(&breq, bs_one_func_new);
+		error = xfs_bulkstat(&breq, xfs_fsbulkstat_one_fmt);
 		lastino = breq.startino - 1;
 		break;
 	default:
@@ -610,22 +510,6 @@ xfs_file_compat_ioctl(
 		mnt_drop_write_file(filp);
 		return error;
 	}
-	case XFS_IOC_SWAPEXT_32: {
-		struct xfs_swapext	  sxp;
-		struct compat_xfs_swapext __user *sxu = arg;
-
-		/* Bulk copy in up to the sx_stat field, then copy bstat */
-		if (copy_from_user(&sxp, sxu,
-				   offsetof(struct xfs_swapext, sx_stat)) ||
-		    xfs_ioctl32_bstat_copyin(&sxp.sx_stat, &sxu->sx_stat))
-			return -EFAULT;
-		error = mnt_want_write_file(filp);
-		if (error)
-			return error;
-		error = xfs_ioc_swapext(&sxp);
-		mnt_drop_write_file(filp);
-		return error;
-	}
 #endif
 	/* long changes size, but xfs only copiese out 32 bits */
 	case XFS_IOC_GETXFLAGS_32:
-- 
2.20.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC 4/5] xfs: extend inode format for 40-bit timestamps
  2019-11-12 12:09 [RFC 0/5] xfs: y2038 conversion Arnd Bergmann
                   ` (2 preceding siblings ...)
  2019-11-12 12:09 ` [RFC 3/5] xfs: [variant C] avoid i386-misaligned xfs_bstat Arnd Bergmann
@ 2019-11-12 12:09 ` Arnd Bergmann
  2019-11-12 14:16   ` Christoph Hellwig
  2019-11-12 21:32   ` Dave Chinner
  2019-11-12 12:09 ` [RFC 5/5] xfs: use 40-bit quota time limits Arnd Bergmann
  4 siblings, 2 replies; 15+ messages in thread
From: Arnd Bergmann @ 2019-11-12 12:09 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: linux-xfs, y2038, arnd, Deepa Dinamani, Christoph Hellwig,
	Dave Chinner, Brian Foster, Allison Collins, linux-fsdevel

XFS is the only major file system that lacks timestamps beyond year 2038,
and is already being deployed in systems that may have to be supported
beyond that time.

Fortunately, the inode format still has a few reserved bits that can be
used to extend the current format. There are two bits in the nanosecond
portion that could be used in the same way that ext4 does, extending
the timestamps until year 2378, as well as 12 unused bytes after the
already allocated fields.

There are four timestamps that need to be extended, so using four
bytes out of the reserved space gets us all the way until year 36676,
by extending the current 1902-2036 with another 255 epochs, which
seems to be a reasonable range.

I am not sure whether this change to the inode format requires a
new version for the inode. All existing file system images remain
compatible, while mounting a file systems with extended timestamps
beyond 2038 would report that timestamp incorrectly in the 1902
through 2038 range, matching the traditional Linux behavior of
wrapping timestamps.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 fs/xfs/libxfs/xfs_format.h      |  6 +++++-
 fs/xfs/libxfs/xfs_inode_buf.c   | 28 ++++++++++++++++++++--------
 fs/xfs/libxfs/xfs_inode_buf.h   |  1 +
 fs/xfs/libxfs/xfs_log_format.h  |  6 +++++-
 fs/xfs/libxfs/xfs_trans_inode.c |  3 ++-
 fs/xfs/xfs_inode.c              |  3 ++-
 fs/xfs/xfs_inode_item.c         | 10 +++++++---
 fs/xfs/xfs_iops.c               |  3 ++-
 fs/xfs/xfs_itable.c             |  2 +-
 fs/xfs/xfs_super.c              |  2 +-
 10 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index c968b60cee15..dc8d160775fb 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -883,7 +883,11 @@ typedef struct xfs_dinode {
 	__be64		di_lsn;		/* flush sequence */
 	__be64		di_flags2;	/* more random flags */
 	__be32		di_cowextsize;	/* basic cow extent size for file */
-	__u8		di_pad2[12];	/* more padding for future expansion */
+	__u8		di_atime_hi;	/* upper 8 bits of di_atime */
+	__u8		di_mtime_hi;	/* upper 8 bits of di_mtime */
+	__u8		di_ctime_hi;	/* upper 8 bits of di_ctime */
+	__u8		di_crtime_hi;	/* upper 8 bits of di_crtime */
+	__u8		di_pad2[8];	/* more padding for future expansion */
 
 	/* fields only written to during inode creation */
 	xfs_timestamp_t	di_crtime;	/* time created */
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 28ab3c5255e1..4989b6f1ac6f 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -228,16 +228,19 @@ xfs_inode_from_disk(
 	to->di_flushiter = be16_to_cpu(from->di_flushiter);
 
 	/*
-	 * Time is signed, so need to convert to signed 32 bit before
-	 * storing in inode timestamp which may be 64 bit. Otherwise
-	 * a time before epoch is converted to a time long after epoch
-	 * on 64 bit systems.
+	 * The supported time range starts at INT_MIN, corresponding to
+	 * year 1902. With the traditional low 32 bits, this ends in
+	 * year 2038, the extra 8 bits extend it by another 255 epochs
+	 * of 136.1 years each, up to year 36744.
 	 */
-	inode->i_atime.tv_sec = (int)be32_to_cpu(from->di_atime.t_sec);
+	inode->i_atime.tv_sec = be32_to_cpu(from->di_atime.t_sec) +
+				((u64)from->di_atime_hi << 32);
 	inode->i_atime.tv_nsec = (int)be32_to_cpu(from->di_atime.t_nsec);
-	inode->i_mtime.tv_sec = (int)be32_to_cpu(from->di_mtime.t_sec);
+	inode->i_mtime.tv_sec = (int)be32_to_cpu(from->di_mtime.t_sec) +
+				((u64)from->di_mtime_hi << 32);
 	inode->i_mtime.tv_nsec = (int)be32_to_cpu(from->di_mtime.t_nsec);
-	inode->i_ctime.tv_sec = (int)be32_to_cpu(from->di_ctime.t_sec);
+	inode->i_ctime.tv_sec = (int)be32_to_cpu(from->di_ctime.t_sec) +
+				((u64)from->di_ctime_hi << 32);
 	inode->i_ctime.tv_nsec = (int)be32_to_cpu(from->di_ctime.t_nsec);
 	inode->i_generation = be32_to_cpu(from->di_gen);
 	inode->i_mode = be16_to_cpu(from->di_mode);
@@ -256,7 +259,8 @@ xfs_inode_from_disk(
 	if (to->di_version == 3) {
 		inode_set_iversion_queried(inode,
 					   be64_to_cpu(from->di_changecount));
-		to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec);
+		to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec) +
+				((u64)from->di_crtime_hi << 32);
 		to->di_crtime.t_nsec = be32_to_cpu(from->di_crtime.t_nsec);
 		to->di_flags2 = be64_to_cpu(from->di_flags2);
 		to->di_cowextsize = be32_to_cpu(from->di_cowextsize);
@@ -284,10 +288,13 @@ xfs_inode_to_disk(
 
 	memset(to->di_pad, 0, sizeof(to->di_pad));
 	to->di_atime.t_sec = cpu_to_be32(inode->i_atime.tv_sec);
+	to->di_atime_hi = upper_32_bits(inode->i_atime.tv_sec);
 	to->di_atime.t_nsec = cpu_to_be32(inode->i_atime.tv_nsec);
 	to->di_mtime.t_sec = cpu_to_be32(inode->i_mtime.tv_sec);
+	to->di_mtime_hi = upper_32_bits(inode->i_mtime.tv_sec);
 	to->di_mtime.t_nsec = cpu_to_be32(inode->i_mtime.tv_nsec);
 	to->di_ctime.t_sec = cpu_to_be32(inode->i_ctime.tv_sec);
+	to->di_ctime_hi = upper_32_bits(inode->i_ctime.tv_sec);
 	to->di_ctime.t_nsec = cpu_to_be32(inode->i_ctime.tv_nsec);
 	to->di_nlink = cpu_to_be32(inode->i_nlink);
 	to->di_gen = cpu_to_be32(inode->i_generation);
@@ -307,6 +314,7 @@ xfs_inode_to_disk(
 	if (from->di_version == 3) {
 		to->di_changecount = cpu_to_be64(inode_peek_iversion(inode));
 		to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
+		to->di_crtime_hi = upper_32_bits(from->di_crtime.t_sec);
 		to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
 		to->di_flags2 = cpu_to_be64(from->di_flags2);
 		to->di_cowextsize = cpu_to_be32(from->di_cowextsize);
@@ -338,10 +346,13 @@ xfs_log_dinode_to_disk(
 	memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
 
 	to->di_atime.t_sec = cpu_to_be32(from->di_atime.t_sec);
+	to->di_atime_hi = from->di_atime_hi;
 	to->di_atime.t_nsec = cpu_to_be32(from->di_atime.t_nsec);
 	to->di_mtime.t_sec = cpu_to_be32(from->di_mtime.t_sec);
+	to->di_mtime_hi = from->di_mtime_hi;
 	to->di_mtime.t_nsec = cpu_to_be32(from->di_mtime.t_nsec);
 	to->di_ctime.t_sec = cpu_to_be32(from->di_ctime.t_sec);
+	to->di_ctime_hi = from->di_ctime_hi;
 	to->di_ctime.t_nsec = cpu_to_be32(from->di_ctime.t_nsec);
 
 	to->di_size = cpu_to_be64(from->di_size);
@@ -359,6 +370,7 @@ xfs_log_dinode_to_disk(
 	if (from->di_version == 3) {
 		to->di_changecount = cpu_to_be64(from->di_changecount);
 		to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
+		to->di_crtime_hi = from->di_crtime_hi;
 		to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
 		to->di_flags2 = cpu_to_be64(from->di_flags2);
 		to->di_cowextsize = cpu_to_be32(from->di_cowextsize);
diff --git a/fs/xfs/libxfs/xfs_inode_buf.h b/fs/xfs/libxfs/xfs_inode_buf.h
index ab0f84165317..49556e1898da 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.h
+++ b/fs/xfs/libxfs/xfs_inode_buf.h
@@ -38,6 +38,7 @@ struct xfs_icdinode {
 	uint32_t	di_cowextsize;	/* basic cow extent size for file */
 
 	xfs_ictimestamp_t di_crtime;	/* time created */
+	uint8_t		di_crtime_hi;	/* upper 8 bites of di_crtime */
 };
 
 /*
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index e5f97c69b320..c17e7c6511ff 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -414,7 +414,11 @@ struct xfs_log_dinode {
 	xfs_lsn_t	di_lsn;		/* flush sequence */
 	uint64_t	di_flags2;	/* more random flags */
 	uint32_t	di_cowextsize;	/* basic cow extent size for file */
-	uint8_t		di_pad2[12];	/* more padding for future expansion */
+	uint8_t		di_atime_hi;	/* upper 8 bits of di_atime */
+	uint8_t		di_mtime_hi;	/* upper 8 bits of di_mtime */
+	uint8_t		di_ctime_hi;	/* upper 8 bits of di_ctime */
+	uint8_t		di_crtime_hi;	/* upper 8 bits of di_crtime */
+	uint8_t		di_pad2[8];	/* more padding for future expansion */
 
 	/* fields only written to during inode creation */
 	xfs_ictimestamp_t di_crtime;	/* time created */
diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
index a9ad90926b87..419356eec52c 100644
--- a/fs/xfs/libxfs/xfs_trans_inode.c
+++ b/fs/xfs/libxfs/xfs_trans_inode.c
@@ -67,7 +67,8 @@ xfs_trans_ichgtime(
 	if (flags & XFS_ICHGTIME_CHG)
 		inode->i_ctime = tv;
 	if (flags & XFS_ICHGTIME_CREATE) {
-		ip->i_d.di_crtime.t_sec = (int32_t)tv.tv_sec;
+		ip->i_d.di_crtime.t_sec = lower_32_bits(tv.tv_sec);
+		ip->i_d.di_crtime_hi = upper_32_bits(tv.tv_sec);
 		ip->i_d.di_crtime.t_nsec = (int32_t)tv.tv_nsec;
 	}
 }
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 18f4b262e61c..c0d9d568ea4f 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -845,7 +845,8 @@ xfs_ialloc(
 		inode_set_iversion(inode, 1);
 		ip->i_d.di_flags2 = 0;
 		ip->i_d.di_cowextsize = 0;
-		ip->i_d.di_crtime.t_sec = (int32_t)tv.tv_sec;
+		ip->i_d.di_crtime.t_sec = lower_32_bits(tv.tv_sec);
+		ip->i_d.di_crtime_hi = upper_32_bits(tv.tv_sec);
 		ip->i_d.di_crtime.t_nsec = (int32_t)tv.tv_nsec;
 	}
 
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index bb8f076805b9..338188a5a698 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -314,11 +314,14 @@ xfs_inode_to_log_dinode(
 
 	memset(to->di_pad, 0, sizeof(to->di_pad));
 	memset(to->di_pad3, 0, sizeof(to->di_pad3));
-	to->di_atime.t_sec = inode->i_atime.tv_sec;
+	to->di_atime.t_sec = lower_32_bits(inode->i_atime.tv_sec);
+	to->di_atime_hi = upper_32_bits(inode->i_atime.tv_sec);
 	to->di_atime.t_nsec = inode->i_atime.tv_nsec;
-	to->di_mtime.t_sec = inode->i_mtime.tv_sec;
+	to->di_mtime.t_sec = lower_32_bits(inode->i_mtime.tv_sec);
+	to->di_mtime_hi = upper_32_bits(inode->i_mtime.tv_sec);
 	to->di_mtime.t_nsec = inode->i_mtime.tv_nsec;
-	to->di_ctime.t_sec = inode->i_ctime.tv_sec;
+	to->di_ctime.t_sec = lower_32_bits(inode->i_ctime.tv_sec);
+	to->di_ctime_hi = upper_32_bits(inode->i_ctime.tv_sec);
 	to->di_ctime.t_nsec = inode->i_ctime.tv_nsec;
 	to->di_nlink = inode->i_nlink;
 	to->di_gen = inode->i_generation;
@@ -341,6 +344,7 @@ xfs_inode_to_log_dinode(
 	if (from->di_version == 3) {
 		to->di_changecount = inode_peek_iversion(inode);
 		to->di_crtime.t_sec = from->di_crtime.t_sec;
+		to->di_crtime_hi = from->di_crtime_hi;
 		to->di_crtime.t_nsec = from->di_crtime.t_nsec;
 		to->di_flags2 = from->di_flags2;
 		to->di_cowextsize = from->di_cowextsize;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index fe285d123d69..72d40ae1e91f 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -516,7 +516,8 @@ xfs_vn_getattr(
 	if (ip->i_d.di_version == 3) {
 		if (request_mask & STATX_BTIME) {
 			stat->result_mask |= STATX_BTIME;
-			stat->btime.tv_sec = ip->i_d.di_crtime.t_sec;
+			stat->btime.tv_sec = ip->i_d.di_crtime.t_sec +
+					((u64)ip->i_d.di_crtime_hi << 32);
 			stat->btime.tv_nsec = ip->i_d.di_crtime.t_nsec;
 		}
 	}
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 884950adbd16..ea4bf4475727 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -97,7 +97,7 @@ xfs_bulkstat_one_int(
 	buf->bs_mtime_nsec = inode->i_mtime.tv_nsec;
 	buf->bs_ctime = inode->i_ctime.tv_sec;
 	buf->bs_ctime_nsec = inode->i_ctime.tv_nsec;
-	buf->bs_btime = dic->di_crtime.t_sec;
+	buf->bs_btime = dic->di_crtime.t_sec + ((u64)dic->di_crtime_hi << 32);
 	buf->bs_btime_nsec = dic->di_crtime.t_nsec;
 	buf->bs_gen = inode->i_generation;
 	buf->bs_mode = inode->i_mode;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8d1df9f8be07..2adfe1039693 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1665,7 +1665,7 @@ xfs_fs_fill_super(
 	sb->s_max_links = XFS_MAXLINK;
 	sb->s_time_gran = 1;
 	sb->s_time_min = S32_MIN;
-	sb->s_time_max = S32_MAX;
+	sb->s_time_max = S32_MAX + 255 * 0x100000000ull;
 	sb->s_iflags |= SB_I_CGROUPWB;
 
 	set_posix_acl_flag(sb);
-- 
2.20.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC 5/5] xfs: use 40-bit quota time limits
  2019-11-12 12:09 [RFC 0/5] xfs: y2038 conversion Arnd Bergmann
                   ` (3 preceding siblings ...)
  2019-11-12 12:09 ` [RFC 4/5] xfs: extend inode format for 40-bit timestamps Arnd Bergmann
@ 2019-11-12 12:09 ` Arnd Bergmann
  4 siblings, 0 replies; 15+ messages in thread
From: Arnd Bergmann @ 2019-11-12 12:09 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: linux-xfs, y2038, arnd, Deepa Dinamani, Christoph Hellwig,
	Dave Chinner, Brian Foster, Allison Collins, linux-fsdevel

The quota handling in xfs is based around an in-memory representation
of time_t, which overflows in year 2038 on 32-bit architectures, and an
on-disk representation of __be32, which overflows in year 2106 based
on interpreting the values as unsigned.

Extend both to allow for much longer times: the in-memory representation
should just use time64_t and the on-disk representation has to live with
the spare bits in struct xfs_disk_dquot. As there is an unused 32-bit
field, and three time limits in it, allocating 8 bits per timeout
seems appropriate.

Note: the quotactl() syscall is not affected by this, it has its
own struct fs_disk_quota that may need a similar conversion.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 fs/xfs/libxfs/xfs_dquot_buf.c |  6 +++---
 fs/xfs/libxfs/xfs_format.h    |  5 ++++-
 fs/xfs/xfs_dquot.c            | 29 ++++++++++++++++++++---------
 fs/xfs/xfs_qm.c               | 18 ++++++++++++------
 fs/xfs/xfs_qm.h               |  6 +++---
 fs/xfs/xfs_qm_syscalls.c      | 16 +++++++++++-----
 fs/xfs/xfs_quotaops.c         |  6 +++---
 fs/xfs/xfs_trans_dquot.c      | 17 +++++++++++------
 8 files changed, 67 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_dquot_buf.c b/fs/xfs/libxfs/xfs_dquot_buf.c
index e8bd688a4073..ee59c539f9ab 100644
--- a/fs/xfs/libxfs/xfs_dquot_buf.c
+++ b/fs/xfs/libxfs/xfs_dquot_buf.c
@@ -75,17 +75,17 @@ xfs_dquot_verify(
 
 	if (ddq->d_blk_softlimit &&
 	    be64_to_cpu(ddq->d_bcount) > be64_to_cpu(ddq->d_blk_softlimit) &&
-	    !ddq->d_btimer)
+	    !ddq->d_btimer && !ddq->d_btimer_high)
 		return __this_address;
 
 	if (ddq->d_ino_softlimit &&
 	    be64_to_cpu(ddq->d_icount) > be64_to_cpu(ddq->d_ino_softlimit) &&
-	    !ddq->d_itimer)
+	    !ddq->d_itimer && !ddq->d_itimer_high)
 		return __this_address;
 
 	if (ddq->d_rtb_softlimit &&
 	    be64_to_cpu(ddq->d_rtbcount) > be64_to_cpu(ddq->d_rtb_softlimit) &&
-	    !ddq->d_rtbtimer)
+	    !ddq->d_rtbtimer && !ddq->d_rtbtimer_high)
 		return __this_address;
 
 	return NULL;
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index dc8d160775fb..83bd5166c0ee 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1168,7 +1168,10 @@ typedef struct	xfs_disk_dquot {
 	__be32		d_btimer;	/* similar to above; for disk blocks */
 	__be16		d_iwarns;	/* warnings issued wrt num inodes */
 	__be16		d_bwarns;	/* warnings issued wrt disk blocks */
-	__be32		d_pad0;		/* 64 bit align */
+	__u8		d_itimer_high;	/* upper bits of d_itimer */
+	__u8		d_btimer_high;	/* upper bits of d_btimer */
+	__u8		d_rtbtimer_high;/* upper bits of d_rtbtimer */
+	__u8		d_pad0;		/* 64 bit align */
 	__be64		d_rtb_hardlimit;/* absolute limit on realtime blks */
 	__be64		d_rtb_softlimit;/* preferred limit on RT disk blks */
 	__be64		d_rtbcount;	/* realtime blocks owned */
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index aeb95e7391c1..15b5a339f6df 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -116,6 +116,8 @@ xfs_qm_adjust_dqtimers(
 	xfs_mount_t		*mp,
 	xfs_disk_dquot_t	*d)
 {
+	time64_t timer;
+
 	ASSERT(d->d_id);
 
 #ifdef DEBUG
@@ -130,15 +132,17 @@ xfs_qm_adjust_dqtimers(
 		       be64_to_cpu(d->d_rtb_hardlimit));
 #endif
 
-	if (!d->d_btimer) {
+	if (!d->d_btimer && !d->d_btimer_high) {
 		if ((d->d_blk_softlimit &&
 		     (be64_to_cpu(d->d_bcount) >
 		      be64_to_cpu(d->d_blk_softlimit))) ||
 		    (d->d_blk_hardlimit &&
 		     (be64_to_cpu(d->d_bcount) >
 		      be64_to_cpu(d->d_blk_hardlimit)))) {
-			d->d_btimer = cpu_to_be32(get_seconds() +
-					mp->m_quotainfo->qi_btimelimit);
+			timer = ktime_get_real_seconds() +
+				mp->m_quotainfo->qi_btimelimit;
+			d->d_btimer = cpu_to_be32(lower_32_bits(timer));
+			d->d_btimer_high = (u8)upper_32_bits(timer);
 		} else {
 			d->d_bwarns = 0;
 		}
@@ -150,18 +154,21 @@ xfs_qm_adjust_dqtimers(
 		    (be64_to_cpu(d->d_bcount) <=
 		     be64_to_cpu(d->d_blk_hardlimit)))) {
 			d->d_btimer = 0;
+			d->d_btimer_high = 0;
 		}
 	}
 
-	if (!d->d_itimer) {
+	if (!d->d_itimer && !d->d_itimer_high) {
 		if ((d->d_ino_softlimit &&
 		     (be64_to_cpu(d->d_icount) >
 		      be64_to_cpu(d->d_ino_softlimit))) ||
 		    (d->d_ino_hardlimit &&
 		     (be64_to_cpu(d->d_icount) >
 		      be64_to_cpu(d->d_ino_hardlimit)))) {
-			d->d_itimer = cpu_to_be32(get_seconds() +
-					mp->m_quotainfo->qi_itimelimit);
+			timer = ktime_get_real_seconds() +
+				mp->m_quotainfo->qi_itimelimit;
+			d->d_itimer = cpu_to_be32(lower_32_bits(timer));
+			d->d_itimer_high = (u8)upper_32_bits(timer);
 		} else {
 			d->d_iwarns = 0;
 		}
@@ -173,18 +180,21 @@ xfs_qm_adjust_dqtimers(
 		     (be64_to_cpu(d->d_icount) <=
 		      be64_to_cpu(d->d_ino_hardlimit)))) {
 			d->d_itimer = 0;
+			d->d_itimer_high = 0;
 		}
 	}
 
-	if (!d->d_rtbtimer) {
+	if (!d->d_rtbtimer && !d->d_rtbtimer_high) {
 		if ((d->d_rtb_softlimit &&
 		     (be64_to_cpu(d->d_rtbcount) >
 		      be64_to_cpu(d->d_rtb_softlimit))) ||
 		    (d->d_rtb_hardlimit &&
 		     (be64_to_cpu(d->d_rtbcount) >
 		      be64_to_cpu(d->d_rtb_hardlimit)))) {
-			d->d_rtbtimer = cpu_to_be32(get_seconds() +
-					mp->m_quotainfo->qi_rtbtimelimit);
+			timer = ktime_get_real_seconds() +
+				mp->m_quotainfo->qi_rtbtimelimit;
+			d->d_rtbtimer = cpu_to_be32(lower_32_bits(timer));
+			d->d_rtbtimer_high = (u8)upper_32_bits(timer);
 		} else {
 			d->d_rtbwarns = 0;
 		}
@@ -196,6 +206,7 @@ xfs_qm_adjust_dqtimers(
 		     (be64_to_cpu(d->d_rtbcount) <=
 		      be64_to_cpu(d->d_rtb_hardlimit)))) {
 			d->d_rtbtimer = 0;
+			d->d_rtbtimer_high = 0;
 		}
 	}
 }
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index ecd8ce152ab1..afd0384850f9 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -613,12 +613,15 @@ xfs_qm_init_timelimits(
 	 * a user or group before he or she can not perform any
 	 * more writing. If it is zero, a default is used.
 	 */
-	if (ddqp->d_btimer)
-		qinf->qi_btimelimit = be32_to_cpu(ddqp->d_btimer);
-	if (ddqp->d_itimer)
-		qinf->qi_itimelimit = be32_to_cpu(ddqp->d_itimer);
-	if (ddqp->d_rtbtimer)
-		qinf->qi_rtbtimelimit = be32_to_cpu(ddqp->d_rtbtimer);
+	if (ddqp->d_btimer || ddqp->d_btimer_high)
+		qinf->qi_btimelimit = be32_to_cpu(ddqp->d_btimer) +
+					((u64)ddqp->d_btimer_high << 32);
+	if (ddqp->d_itimer || ddqp->d_itimer_high)
+		qinf->qi_itimelimit = be32_to_cpu(ddqp->d_itimer) +
+					((u64)ddqp->d_itimer_high << 32);
+	if (ddqp->d_rtbtimer || ddqp->d_rtbtimer_high)
+		qinf->qi_rtbtimelimit = be32_to_cpu(ddqp->d_rtbtimer) +
+					((u64)ddqp->d_rtbtimer_high << 32);
 	if (ddqp->d_bwarns)
 		qinf->qi_bwarnlimit = be16_to_cpu(ddqp->d_bwarns);
 	if (ddqp->d_iwarns)
@@ -867,8 +870,11 @@ xfs_qm_reset_dqcounts(
 		ddq->d_icount = 0;
 		ddq->d_rtbcount = 0;
 		ddq->d_btimer = 0;
+		ddq->d_btimer_high = 0;
 		ddq->d_itimer = 0;
+		ddq->d_itimer_high = 0;
 		ddq->d_rtbtimer = 0;
+		ddq->d_rtbtimer_high = 0;
 		ddq->d_bwarns = 0;
 		ddq->d_iwarns = 0;
 		ddq->d_rtbwarns = 0;
diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h
index b41b75089548..4742686d522e 100644
--- a/fs/xfs/xfs_qm.h
+++ b/fs/xfs/xfs_qm.h
@@ -64,9 +64,9 @@ typedef struct xfs_quotainfo {
 	struct xfs_inode	*qi_pquotaip;	/* project quota inode */
 	struct list_lru	 qi_lru;
 	int		 qi_dquots;
-	time_t		 qi_btimelimit;	 /* limit for blks timer */
-	time_t		 qi_itimelimit;	 /* limit for inodes timer */
-	time_t		 qi_rtbtimelimit;/* limit for rt blks timer */
+	time64_t	 qi_btimelimit;	 /* limit for blks timer */
+	time64_t	 qi_itimelimit;	 /* limit for inodes timer */
+	time64_t	 qi_rtbtimelimit;/* limit for rt blks timer */
 	xfs_qwarncnt_t	 qi_bwarnlimit;	 /* limit for blks warnings */
 	xfs_qwarncnt_t	 qi_iwarnlimit;	 /* limit for inodes warnings */
 	xfs_qwarncnt_t	 qi_rtbwarnlimit;/* limit for rt blks warnings */
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index da7ad0383037..8d7d075d7779 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -500,15 +500,18 @@ xfs_qm_scall_setqlim(
 		 */
 		if (newlim->d_fieldmask & QC_SPC_TIMER) {
 			q->qi_btimelimit = newlim->d_spc_timer;
-			ddq->d_btimer = cpu_to_be32(newlim->d_spc_timer);
+			ddq->d_btimer = cpu_to_be32(lower_32_bits(newlim->d_spc_timer));
+			ddq->d_btimer_high = (u8)upper_32_bits(newlim->d_spc_timer);
 		}
 		if (newlim->d_fieldmask & QC_INO_TIMER) {
 			q->qi_itimelimit = newlim->d_ino_timer;
-			ddq->d_itimer = cpu_to_be32(newlim->d_ino_timer);
+			ddq->d_itimer = cpu_to_be32(lower_32_bits(newlim->d_ino_timer));
+			ddq->d_itimer_high = (u8)upper_32_bits(newlim->d_ino_timer);
 		}
 		if (newlim->d_fieldmask & QC_RT_SPC_TIMER) {
 			q->qi_rtbtimelimit = newlim->d_rt_spc_timer;
 			ddq->d_rtbtimer = cpu_to_be32(newlim->d_rt_spc_timer);
+			ddq->d_rtbtimer_high = (u8)upper_32_bits(newlim->d_rt_spc_timer);
 		}
 		if (newlim->d_fieldmask & QC_SPC_WARNS)
 			q->qi_bwarnlimit = newlim->d_spc_warns;
@@ -623,8 +626,10 @@ xfs_qm_scall_getquota_fill_qc(
 	dst->d_ino_softlimit = be64_to_cpu(dqp->q_core.d_ino_softlimit);
 	dst->d_space = XFS_FSB_TO_B(mp, dqp->q_res_bcount);
 	dst->d_ino_count = dqp->q_res_icount;
-	dst->d_spc_timer = be32_to_cpu(dqp->q_core.d_btimer);
-	dst->d_ino_timer = be32_to_cpu(dqp->q_core.d_itimer);
+	dst->d_spc_timer = be32_to_cpu(dqp->q_core.d_btimer) +
+			   ((u64)dqp->q_core.d_btimer_high << 32);
+	dst->d_ino_timer = be32_to_cpu(dqp->q_core.d_itimer) +
+			   ((u64)dqp->q_core.d_itimer_high << 32);
 	dst->d_ino_warns = be16_to_cpu(dqp->q_core.d_iwarns);
 	dst->d_spc_warns = be16_to_cpu(dqp->q_core.d_bwarns);
 	dst->d_rt_spc_hardlimit =
@@ -632,7 +637,8 @@ xfs_qm_scall_getquota_fill_qc(
 	dst->d_rt_spc_softlimit =
 		XFS_FSB_TO_B(mp, be64_to_cpu(dqp->q_core.d_rtb_softlimit));
 	dst->d_rt_space = XFS_FSB_TO_B(mp, dqp->q_res_rtbcount);
-	dst->d_rt_spc_timer = be32_to_cpu(dqp->q_core.d_rtbtimer);
+	dst->d_rt_spc_timer = be32_to_cpu(dqp->q_core.d_rtbtimer) +
+			   ((u64)dqp->q_core.d_rtbtimer_high << 32);
 	dst->d_rt_spc_warns = be16_to_cpu(dqp->q_core.d_rtbwarns);
 
 	/*
diff --git a/fs/xfs/xfs_quotaops.c b/fs/xfs/xfs_quotaops.c
index cd6c7210a373..96c3818b27ad 100644
--- a/fs/xfs/xfs_quotaops.c
+++ b/fs/xfs/xfs_quotaops.c
@@ -37,9 +37,9 @@ xfs_qm_fill_state(
 	tstate->flags |= QCI_SYSFILE;
 	tstate->blocks = ip->i_d.di_nblocks;
 	tstate->nextents = ip->i_d.di_nextents;
-	tstate->spc_timelimit = q->qi_btimelimit;
-	tstate->ino_timelimit = q->qi_itimelimit;
-	tstate->rt_spc_timelimit = q->qi_rtbtimelimit;
+	tstate->spc_timelimit = (u32)q->qi_btimelimit;
+	tstate->ino_timelimit = (u32)q->qi_itimelimit;
+	tstate->rt_spc_timelimit = (u32)q->qi_rtbtimelimit;
 	tstate->spc_warnlimit = q->qi_bwarnlimit;
 	tstate->ino_warnlimit = q->qi_iwarnlimit;
 	tstate->rt_spc_warnlimit = q->qi_rtbwarnlimit;
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index 16457465833b..6efca54e0edb 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -580,7 +580,7 @@ xfs_trans_dqresv(
 {
 	xfs_qcnt_t	hardlimit;
 	xfs_qcnt_t	softlimit;
-	time_t		timer;
+	time64_t	timer;
 	xfs_qwarncnt_t	warns;
 	xfs_qwarncnt_t	warnlimit;
 	xfs_qcnt_t	total_count;
@@ -600,7 +600,8 @@ xfs_trans_dqresv(
 		softlimit = be64_to_cpu(dqp->q_core.d_blk_softlimit);
 		if (!softlimit)
 			softlimit = defq->bsoftlimit;
-		timer = be32_to_cpu(dqp->q_core.d_btimer);
+		timer = be32_to_cpu(dqp->q_core.d_btimer) +
+			((u64)dqp->q_core.d_btimer_high << 32);
 		warns = be16_to_cpu(dqp->q_core.d_bwarns);
 		warnlimit = dqp->q_mount->m_quotainfo->qi_bwarnlimit;
 		resbcountp = &dqp->q_res_bcount;
@@ -612,7 +613,8 @@ xfs_trans_dqresv(
 		softlimit = be64_to_cpu(dqp->q_core.d_rtb_softlimit);
 		if (!softlimit)
 			softlimit = defq->rtbsoftlimit;
-		timer = be32_to_cpu(dqp->q_core.d_rtbtimer);
+		timer = be32_to_cpu(dqp->q_core.d_rtbtimer) +
+			((u64)dqp->q_core.d_rtbtimer_high << 32);
 		warns = be16_to_cpu(dqp->q_core.d_rtbwarns);
 		warnlimit = dqp->q_mount->m_quotainfo->qi_rtbwarnlimit;
 		resbcountp = &dqp->q_res_rtbcount;
@@ -635,7 +637,8 @@ xfs_trans_dqresv(
 				goto error_return;
 			}
 			if (softlimit && total_count > softlimit) {
-				if ((timer != 0 && get_seconds() > timer) ||
+				if ((timer != 0 &&
+				     ktime_get_real_seconds() > timer) ||
 				    (warns != 0 && warns >= warnlimit)) {
 					xfs_quota_warn(mp, dqp,
 						       QUOTA_NL_BSOFTLONGWARN);
@@ -647,7 +650,8 @@ xfs_trans_dqresv(
 		}
 		if (ninos > 0) {
 			total_count = be64_to_cpu(dqp->q_core.d_icount) + ninos;
-			timer = be32_to_cpu(dqp->q_core.d_itimer);
+			timer = be32_to_cpu(dqp->q_core.d_itimer) +
+				((u64)dqp->q_core.d_itimer_high << 32);
 			warns = be16_to_cpu(dqp->q_core.d_iwarns);
 			warnlimit = dqp->q_mount->m_quotainfo->qi_iwarnlimit;
 			hardlimit = be64_to_cpu(dqp->q_core.d_ino_hardlimit);
@@ -662,7 +666,8 @@ xfs_trans_dqresv(
 				goto error_return;
 			}
 			if (softlimit && total_count > softlimit) {
-				if  ((timer != 0 && get_seconds() > timer) ||
+				if  ((timer != 0 &&
+				      ktime_get_real_seconds() > timer) ||
 				     (warns != 0 && warns >= warnlimit)) {
 					xfs_quota_warn(mp, dqp,
 						       QUOTA_NL_ISOFTLONGWARN);
-- 
2.20.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC 1/5] xfs: [variant A] avoid time_t in user api
  2019-11-12 12:09 ` [RFC 1/5] xfs: [variant A] avoid time_t in user api Arnd Bergmann
@ 2019-11-12 14:16   ` Christoph Hellwig
  2019-11-13  5:06     ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2019-11-12 14:16 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Darrick J . Wong, linux-xfs, y2038, Deepa Dinamani,
	Christoph Hellwig, Dave Chinner, Brian Foster, Allison Collins,
	linux-fsdevel

On Tue, Nov 12, 2019 at 01:09:06PM +0100, Arnd Bergmann wrote:
> However, as long as two observations are true, a much simpler solution
> can be used:
> 
> 1. xfsprogs is the only user space project that has a copy of this header

We can't guarantee that.

> 2. xfsprogs already has a replacement for all three affected ioctl commands,
>    based on the xfs_bulkstat structure to pass 64-bit timestamps
>    regardless of the architecture

XFS_IOC_BULKSTAT replaces XFS_IOC_FSBULKSTAT directly, and can replace
XFS_IOC_FSBULKSTAT_SINGLE indirectly, so that is easy.  Most users
actually use the new one now through libfrog, although I found a user
of the direct ioctl in the xfs_io tool, which could easily be fixed as
well.

XFS_IOC_SWAPEXT does not have a direct replacement, but the timestamp
is only used to verify that the file did not change vs the previous
stat.  So not being able to represent > 2038 times is not a real
problem anyway.

At some point we should probably look into a file system independent
defrag ioctl anyway, at which point we can deprecate XFS_IOC_SWAPEXT.

> Based on those assumptions, changing xfs_bstime to use __kernel_long_t
> instead of time_t in both the kernel and in xfsprogs preserves the current
> ABI for any libc definition of time_t and solves the problem of passing
> 64-bit timestamps to 32-bit user space.

As said above their are not entirely true, but I still think this patch
is the right thing to do, if only to get the time_t out of the ABI..

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 4/5] xfs: extend inode format for 40-bit timestamps
  2019-11-12 12:09 ` [RFC 4/5] xfs: extend inode format for 40-bit timestamps Arnd Bergmann
@ 2019-11-12 14:16   ` Christoph Hellwig
  2019-11-12 15:02     ` Amir Goldstein
  2019-11-12 21:32   ` Dave Chinner
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2019-11-12 14:16 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Darrick J . Wong, linux-xfs, y2038, Deepa Dinamani,
	Christoph Hellwig, Dave Chinner, Brian Foster, Allison Collins,
	linux-fsdevel, Amir Goldstein

Amir just send another patch dealing with the time stamps.  I'd suggest
you chime into the discussion in that thread.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 4/5] xfs: extend inode format for 40-bit timestamps
  2019-11-12 14:16   ` Christoph Hellwig
@ 2019-11-12 15:02     ` Amir Goldstein
  2019-11-12 15:29       ` [Y2038] " Arnd Bergmann
  0 siblings, 1 reply; 15+ messages in thread
From: Amir Goldstein @ 2019-11-12 15:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Arnd Bergmann, Darrick J . Wong, linux-xfs, y2038,
	Deepa Dinamani, Dave Chinner, Brian Foster, Allison Collins,
	linux-fsdevel

On Tue, Nov 12, 2019 at 4:16 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Amir just send another patch dealing with the time stamps.  I'd suggest
> you chime into the discussion in that thread.

That's right I just posted the ext4 style extend to 34bits yesterday [1],
but I like your version so much better, so I will withdraw mine.

Sorry I did not CC you nor Deepa nor y2038 list.
I did not think you were going to actually deal with specific filesystems.

I'd also like to hear people's thoughts about migration process.
Should the new feature be ro_compat as I defined it or incompat?

If all agree that Arnd's format change is preferred, I can assist with
xfsprogs patches, tests or whatnot.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-xfs/20191112082524.GA18779@infradead.org/T/#mfa11ea3c035d4c21ec6a56b7c83a6dfa76e48068

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Y2038] [RFC 4/5] xfs: extend inode format for 40-bit timestamps
  2019-11-12 15:02     ` Amir Goldstein
@ 2019-11-12 15:29       ` Arnd Bergmann
  0 siblings, 0 replies; 15+ messages in thread
From: Arnd Bergmann @ 2019-11-12 15:29 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christoph Hellwig, Darrick J . Wong, y2038 Mailman List,
	Brian Foster, Dave Chinner, linux-xfs, Deepa Dinamani,
	linux-fsdevel, Allison Collins

On Tue, Nov 12, 2019 at 4:02 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 4:16 PM Christoph Hellwig <hch@lst.de> wrote:
> >
> > Amir just send another patch dealing with the time stamps.  I'd suggest
> > you chime into the discussion in that thread.
>
> That's right I just posted the ext4 style extend to 34bits yesterday [1],
> but I like your version so much better, so I will withdraw mine.

Thanks! I guess we probably want part of both in the end. I considered
adding wrappers for encoding and decoding the timestamp like yours
but in the end went with open-coding that part. The difference is pretty
minimal, should we leave it with the open-coded addition?

One part that I was missing (as described in my changelog) is any
versioning or feature flag, including the dynamically set sb->s_time_max
value. From looking at your patch, I guess this is something we want.

> Sorry I did not CC you nor Deepa nor y2038 list.
> I did not think you were going to actually deal with specific filesystems.

I was not going to, but in the process of cleaning up the remaining ioctls
I came to xfs last week and thought it would be silly to extend the uapi
but not the file system ;-)

> I'd also like to hear people's thoughts about migration process.
> Should the new feature be ro_compat as I defined it or incompat?
>
> If all agree that Arnd's format change is preferred, I can assist with
> xfsprogs patches, tests or whatnot.

Awesome, that would be great indeed!

      Arnd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 4/5] xfs: extend inode format for 40-bit timestamps
  2019-11-12 12:09 ` [RFC 4/5] xfs: extend inode format for 40-bit timestamps Arnd Bergmann
  2019-11-12 14:16   ` Christoph Hellwig
@ 2019-11-12 21:32   ` Dave Chinner
  1 sibling, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2019-11-12 21:32 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Darrick J . Wong, linux-xfs, y2038, Deepa Dinamani,
	Christoph Hellwig, Brian Foster, Allison Collins, linux-fsdevel

On Tue, Nov 12, 2019 at 01:09:09PM +0100, Arnd Bergmann wrote:
> XFS is the only major file system that lacks timestamps beyond year 2038,
> and is already being deployed in systems that may have to be supported
> beyond that time.
> 
> Fortunately, the inode format still has a few reserved bits that can be
> used to extend the current format. There are two bits in the nanosecond
> portion that could be used in the same way that ext4 does, extending
> the timestamps until year 2378, as well as 12 unused bytes after the
> already allocated fields.
> 
> There are four timestamps that need to be extended, so using four
> bytes out of the reserved space gets us all the way until year 36676,
> by extending the current 1902-2036 with another 255 epochs, which
> seems to be a reasonable range.
> 
> I am not sure whether this change to the inode format requires a
> new version for the inode. All existing file system images remain
> compatible, while mounting a file systems with extended timestamps
> beyond 2038 would report that timestamp incorrectly in the 1902
> through 2038 range, matching the traditional Linux behavior of
> wrapping timestamps.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

This is basically what I proposed ~5 years or so ago and posted a
patch to implement it in an early y2038 discussion with you. I jsut
mentioned that very patch in my reposnse to Amir's timestamp
extension patchset, pointing out that this isn't the way we want
to proceed with >y2038 on-disk support.

https://lore.kernel.org/linux-xfs/20191112161242.GA19334@infradead.org/T/#maf6b2719ed561cc2865cc5e7eb82df206b971261

I'd suggest taking the discussion there....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 1/5] xfs: [variant A] avoid time_t in user api
  2019-11-12 14:16   ` Christoph Hellwig
@ 2019-11-13  5:06     ` Darrick J. Wong
  2019-11-13 13:42       ` Arnd Bergmann
  0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2019-11-13  5:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Arnd Bergmann, linux-xfs, y2038, Deepa Dinamani, Dave Chinner,
	Brian Foster, Allison Collins, linux-fsdevel

On Tue, Nov 12, 2019 at 03:16:00PM +0100, Christoph Hellwig wrote:
> On Tue, Nov 12, 2019 at 01:09:06PM +0100, Arnd Bergmann wrote:
> > However, as long as two observations are true, a much simpler solution
> > can be used:
> > 
> > 1. xfsprogs is the only user space project that has a copy of this header
> 
> We can't guarantee that.
> 
> > 2. xfsprogs already has a replacement for all three affected ioctl commands,
> >    based on the xfs_bulkstat structure to pass 64-bit timestamps
> >    regardless of the architecture
> 
> XFS_IOC_BULKSTAT replaces XFS_IOC_FSBULKSTAT directly, and can replace
> XFS_IOC_FSBULKSTAT_SINGLE indirectly, so that is easy.  Most users
> actually use the new one now through libfrog, although I found a user
> of the direct ioctl in the xfs_io tool, which could easily be fixed as
> well.

Agreed, XFS_IOC_BULKSTAT is the replacement for the two FSBULKSTAT
variants.  The only question in my mind for the old ioctls is whether we
should return EOVERFLOW if the timestamp would overflow?  Or just
truncate the results?

> XFS_IOC_SWAPEXT does not have a direct replacement, but the timestamp
> is only used to verify that the file did not change vs the previous
> stat.  So not being able to represent > 2038 times is not a real
> problem anyway.

Won't it become a problem when the tv_sec comparison in xfs_swap_extents
is type-promoted to the larger type but lacks the upper bits?

I guess we could force the check to the lower 32 bits on the assumption
that those are the most likely to change due to a file write.

I kinda want to do a SWAPEXT v5, fwiw....

> At some point we should probably look into a file system independent
> defrag ioctl anyway, at which point we can deprecate XFS_IOC_SWAPEXT.
> 
> > Based on those assumptions, changing xfs_bstime to use __kernel_long_t
> > instead of time_t in both the kernel and in xfsprogs preserves the current
> > ABI for any libc definition of time_t and solves the problem of passing
> > 64-bit timestamps to 32-bit user space.
> 
> As said above their are not entirely true, but I still think this patch
> is the right thing to do, if only to get the time_t out of the ABI..
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Seconded,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 1/5] xfs: [variant A] avoid time_t in user api
  2019-11-13  5:06     ` Darrick J. Wong
@ 2019-11-13 13:42       ` Arnd Bergmann
  2019-11-13 16:34         ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Arnd Bergmann @ 2019-11-13 13:42 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, linux-xfs, y2038 Mailman List, Deepa Dinamani,
	Dave Chinner, Brian Foster, Allison Collins,
	Linux FS-devel Mailing List

On Wed, Nov 13, 2019 at 6:08 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> On Tue, Nov 12, 2019 at 03:16:00PM +0100, Christoph Hellwig wrote:
> > On Tue, Nov 12, 2019 at 01:09:06PM +0100, Arnd Bergmann wrote:
> > > However, as long as two observations are true, a much simpler solution
> > > can be used:
> > >
> > > 1. xfsprogs is the only user space project that has a copy of this header
> >
> > We can't guarantee that.
> >
> > > 2. xfsprogs already has a replacement for all three affected ioctl commands,
> > >    based on the xfs_bulkstat structure to pass 64-bit timestamps
> > >    regardless of the architecture
> >
> > XFS_IOC_BULKSTAT replaces XFS_IOC_FSBULKSTAT directly, and can replace
> > XFS_IOC_FSBULKSTAT_SINGLE indirectly, so that is easy.  Most users
> > actually use the new one now through libfrog, although I found a user
> > of the direct ioctl in the xfs_io tool, which could easily be fixed as
> > well.
>
> Agreed, XFS_IOC_BULKSTAT is the replacement for the two FSBULKSTAT
> variants.  The only question in my mind for the old ioctls is whether we
> should return EOVERFLOW if the timestamp would overflow?  Or just
> truncate the results?

I think neither of these would be particularly helpful, the result is that users
see no change in behavior until it's actually too late and the timestamps have
overrun.

If we take variant A and just fix the ABI to 32-bit time_t, it is important
that all user space stop using these ioctls and moves to the v5 interfaces
instead (including SWAPEXT I guess).

Something along the lines of this change would work:

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d50135760622..87318486c96e 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -830,6 +830,23 @@ xfs_fsinumbers_fmt(
        return xfs_ibulk_advance(breq, sizeof(struct xfs_inogrp));
 }

+/* disallow y2038-unsafe ioctls with CONFIG_COMPAT_32BIT_TIME=n */
+static bool xfs_have_compat_bstat_time32(unsigned int cmd)
+{
+       if (IS_ENABLED(CONFIG_COMPAT_32BIT_TIME))
+               return true;
+
+       if (IS_ENABLED(CONFIG_64BIT) && !in_compat_syscall())
+               return true;
+
+       if (cmd == XFS_IOC_FSBULKSTAT_SINGLE ||
+           cmd == XFS_IOC_FSBULKSTAT ||
+           cmd == XFS_IOC_SWAPEXT)
+               return false;
+
+       return true;
+}
+
 STATIC int
 xfs_ioc_fsbulkstat(
        xfs_mount_t             *mp,
@@ -850,6 +867,9 @@ xfs_ioc_fsbulkstat(
        if (!capable(CAP_SYS_ADMIN))
                return -EPERM;

+       if (!xfs_have_compat_bstat_time32())
+               return -EINVAL;
+
        if (XFS_FORCED_SHUTDOWN(mp))
                return -EIO;

@@ -2051,6 +2071,11 @@ xfs_ioc_swapext(
        struct fd       f, tmp;
        int             error = 0;

+       if (xfs_have_compat_bstat_time32(XFS_IOC_SWAPEXT)) {
+               error = -EINVAL
+               got out;
+       }
+
        /* Pull information for the target fd */
        f = fdget((int)sxp->sx_fdtarget);
        if (!f.file) {

This way, at least users that intentionally turn off CONFIG_COMPAT_32BIT_TIME
run into the broken application right away, which forces them to upgrade or fix
the code to use the v5 ioctl.

> > > Based on those assumptions, changing xfs_bstime to use __kernel_long_t
> > > instead of time_t in both the kernel and in xfsprogs preserves the current
> > > ABI for any libc definition of time_t and solves the problem of passing
> > > 64-bit timestamps to 32-bit user space.
> >
> > As said above their are not entirely true, but I still think this patch
> > is the right thing to do, if only to get the time_t out of the ABI..
> >
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
>
> Seconded,
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks!

     Arnd

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC 1/5] xfs: [variant A] avoid time_t in user api
  2019-11-13 13:42       ` Arnd Bergmann
@ 2019-11-13 16:34         ` Darrick J. Wong
  2019-11-13 17:14           ` Arnd Bergmann
  0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2019-11-13 16:34 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, linux-xfs, y2038 Mailman List, Deepa Dinamani,
	Dave Chinner, Brian Foster, Allison Collins,
	Linux FS-devel Mailing List

On Wed, Nov 13, 2019 at 02:42:24PM +0100, Arnd Bergmann wrote:
> On Wed, Nov 13, 2019 at 6:08 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > On Tue, Nov 12, 2019 at 03:16:00PM +0100, Christoph Hellwig wrote:
> > > On Tue, Nov 12, 2019 at 01:09:06PM +0100, Arnd Bergmann wrote:
> > > > However, as long as two observations are true, a much simpler solution
> > > > can be used:
> > > >
> > > > 1. xfsprogs is the only user space project that has a copy of this header
> > >
> > > We can't guarantee that.
> > >
> > > > 2. xfsprogs already has a replacement for all three affected ioctl commands,
> > > >    based on the xfs_bulkstat structure to pass 64-bit timestamps
> > > >    regardless of the architecture
> > >
> > > XFS_IOC_BULKSTAT replaces XFS_IOC_FSBULKSTAT directly, and can replace
> > > XFS_IOC_FSBULKSTAT_SINGLE indirectly, so that is easy.  Most users
> > > actually use the new one now through libfrog, although I found a user
> > > of the direct ioctl in the xfs_io tool, which could easily be fixed as
> > > well.
> >
> > Agreed, XFS_IOC_BULKSTAT is the replacement for the two FSBULKSTAT
> > variants.  The only question in my mind for the old ioctls is whether we
> > should return EOVERFLOW if the timestamp would overflow?  Or just
> > truncate the results?
> 
> I think neither of these would be particularly helpful, the result is that users
> see no change in behavior until it's actually too late and the timestamps have
> overrun.
> 
> If we take variant A and just fix the ABI to 32-bit time_t, it is important
> that all user space stop using these ioctls and moves to the v5 interfaces
> instead (including SWAPEXT I guess).
> 
> Something along the lines of this change would work:
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index d50135760622..87318486c96e 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -830,6 +830,23 @@ xfs_fsinumbers_fmt(
>         return xfs_ibulk_advance(breq, sizeof(struct xfs_inogrp));
>  }
> 
> +/* disallow y2038-unsafe ioctls with CONFIG_COMPAT_32BIT_TIME=n */
> +static bool xfs_have_compat_bstat_time32(unsigned int cmd)
> +{

Wouldn't we want a test here like:

	if (!xfs_sb_version_hasbigtimestamps(&mp->m_sb))
		return true;

Since date overflow isn't a problem for existing xfs with 32-bit
timestamps, right?

> +       if (IS_ENABLED(CONFIG_COMPAT_32BIT_TIME))

Heh, I didn't know that existed.

> +               return true;
> +
> +       if (IS_ENABLED(CONFIG_64BIT) && !in_compat_syscall())
> +               return true;
> +       if (cmd == XFS_IOC_FSBULKSTAT_SINGLE ||
> +           cmd == XFS_IOC_FSBULKSTAT ||
> +           cmd == XFS_IOC_SWAPEXT)
> +               return false;
> +
> +       return true;
> +}
> +
>  STATIC int
>  xfs_ioc_fsbulkstat(
>         xfs_mount_t             *mp,
> @@ -850,6 +867,9 @@ xfs_ioc_fsbulkstat(
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EPERM;
> 
> +       if (!xfs_have_compat_bstat_time32())
> +               return -EINVAL;
> +
>         if (XFS_FORCED_SHUTDOWN(mp))
>                 return -EIO;
> 
> @@ -2051,6 +2071,11 @@ xfs_ioc_swapext(
>         struct fd       f, tmp;
>         int             error = 0;
> 
> +       if (xfs_have_compat_bstat_time32(XFS_IOC_SWAPEXT)) {
> +               error = -EINVAL
> +               got out;
> +       }
> +
>         /* Pull information for the target fd */
>         f = fdget((int)sxp->sx_fdtarget);
>         if (!f.file) {
> 
> This way, at least users that intentionally turn off CONFIG_COMPAT_32BIT_TIME
> run into the broken application right away, which forces them to upgrade or fix
> the code to use the v5 ioctl.

Sounds reasonable.

--D

> > > > Based on those assumptions, changing xfs_bstime to use __kernel_long_t
> > > > instead of time_t in both the kernel and in xfsprogs preserves the current
> > > > ABI for any libc definition of time_t and solves the problem of passing
> > > > 64-bit timestamps to 32-bit user space.
> > >
> > > As said above their are not entirely true, but I still think this patch
> > > is the right thing to do, if only to get the time_t out of the ABI..
> > >
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> >
> > Seconded,
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Thanks!
> 
>      Arnd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 1/5] xfs: [variant A] avoid time_t in user api
  2019-11-13 16:34         ` Darrick J. Wong
@ 2019-11-13 17:14           ` Arnd Bergmann
  0 siblings, 0 replies; 15+ messages in thread
From: Arnd Bergmann @ 2019-11-13 17:14 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, linux-xfs, y2038 Mailman List, Deepa Dinamani,
	Dave Chinner, Brian Foster, Allison Collins,
	Linux FS-devel Mailing List

On Wed, Nov 13, 2019 at 5:34 PM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Wed, Nov 13, 2019 at 02:42:24PM +0100, Arnd Bergmann wrote:
> > On Wed, Nov 13, 2019 at 6:08 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > > On Tue, Nov 12, 2019 at 03:16:00PM +0100, Christoph Hellwig wrote:
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index d50135760622..87318486c96e 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -830,6 +830,23 @@ xfs_fsinumbers_fmt(
> >         return xfs_ibulk_advance(breq, sizeof(struct xfs_inogrp));
> >  }
> >
> > +/* disallow y2038-unsafe ioctls with CONFIG_COMPAT_32BIT_TIME=n */
> > +static bool xfs_have_compat_bstat_time32(unsigned int cmd)
> > +{
>
> Wouldn't we want a test here like:
>
>         if (!xfs_sb_version_hasbigtimestamps(&mp->m_sb))
>                 return true;
>
> Since date overflow isn't a problem for existing xfs with 32-bit
> timestamps, right?

I'd say probably not.

This would be for a kernel that intentionally only supports y2038-safe
user space to ensure that we don't get any surprises later. In that
configuration, I think we're still better off not allowing broken ioctls
regardless of whether the file system is also broken. ;-)

> > +       if (IS_ENABLED(CONFIG_COMPAT_32BIT_TIME))
>
> Heh, I didn't know that existed.

It barely does. At the moment it is always enabled on all architectures
except for 32-bit riscv, but I submitted a patch to make it user selectable
last week.

      Arnd

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-11-13 17:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-12 12:09 [RFC 0/5] xfs: y2038 conversion Arnd Bergmann
2019-11-12 12:09 ` [RFC 1/5] xfs: [variant A] avoid time_t in user api Arnd Bergmann
2019-11-12 14:16   ` Christoph Hellwig
2019-11-13  5:06     ` Darrick J. Wong
2019-11-13 13:42       ` Arnd Bergmann
2019-11-13 16:34         ` Darrick J. Wong
2019-11-13 17:14           ` Arnd Bergmann
2019-11-12 12:09 ` [RFC 2/5] xfs: [variant B] add time64 version of xfs_bstat Arnd Bergmann
2019-11-12 12:09 ` [RFC 3/5] xfs: [variant C] avoid i386-misaligned xfs_bstat Arnd Bergmann
2019-11-12 12:09 ` [RFC 4/5] xfs: extend inode format for 40-bit timestamps Arnd Bergmann
2019-11-12 14:16   ` Christoph Hellwig
2019-11-12 15:02     ` Amir Goldstein
2019-11-12 15:29       ` [Y2038] " Arnd Bergmann
2019-11-12 21:32   ` Dave Chinner
2019-11-12 12:09 ` [RFC 5/5] xfs: use 40-bit quota time limits Arnd Bergmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).