* [PATCH 00/10] xfs: online health tracking support
@ 2019-04-01 17:10 Darrick J. Wong
  2019-04-01 17:10 ` [PATCH 01/10] xfs: track metadata health levels Darrick J. Wong
                   ` (9 more replies)
  0 siblings, 10 replies; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-01 17:10 UTC
  To: darrick.wong; +Cc: linux-xfs

Hi all,

This series adds online health tracking capabilities to XFS, which
enables userspace to discover if any metadata corruptions have been
found (and not fixed) within a given class of metadata.

The reporting of metadata health problems is triggered only by the
online scrub code, though in principle a metadata read encountering
corruption could also set a sick flag.

Online repair will clear the appropriate sick flags when metadata passes
its inspection after a repair attempt.
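
To make the flag lifecycle above concrete, here is a minimal userspace
sketch of it.  This is not the kernel code: struct demo_mount and the
demo_* names are invented for illustration, the two flag values only
mirror the bit layout used in patch 01, and all locking and tracing is
left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for two of the per-fs flags in patch 01. */
#define DEMO_HEALTH_FS_COUNTERS	(1u << 0)	/* summary counters */
#define DEMO_HEALTH_FS_UQUOTA	(1u << 1)	/* user quota */

/* Hypothetical, lock-free stand-in for the m_sick field of xfs_mount. */
struct demo_mount {
	unsigned int	sick;
};

/* Scrub found corruption: remember which metadata is affected. */
static void demo_mark_sick(struct demo_mount *mp, unsigned int mask)
{
	mp->sick |= mask;
}

/* Repair ran and the metadata passed re-inspection: forget the problem. */
static void demo_mark_healthy(struct demo_mount *mp, unsigned int mask)
{
	mp->sick &= ~mask;
}

/* Is any of the metadata named in @mask still flagged as sick? */
static bool demo_is_sick(const struct demo_mount *mp, unsigned int mask)
{
	return (mp->sick & mask) != 0;
}
```

A caller would set bits when scrub reports corruption and clear the same
bits once repair passes re-inspection; any bit still set afterwards is an
unfixed problem.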

Reporting to userspace is handled by three ioctl modifications: an
enhancement of the existing fs geometry ioctl to include a health
field, an enhancement of the existing bulkstat ioctl to report health,
and a totally new ioctl to report allocation group geometry and status.

On the userspace side of things, xfs_scrub is adapted to give a clean
bill of health to the kernel when it is warranted, and xfs_spaceman can
now perform live health reporting.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=health-tracking

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=health-tracking


* [PATCH 01/10] xfs: track metadata health levels
  2019-04-01 17:10 [PATCH 00/10] xfs: online health tracking support Darrick J. Wong
@ 2019-04-01 17:10 ` Darrick J. Wong
  2019-04-02 13:22   ` Brian Foster
  2019-04-01 17:10 ` [PATCH 02/10] xfs: replace the BAD_SUMMARY mount flag with the equivalent health code Darrick J. Wong
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-01 17:10 UTC
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Add the necessary in-core metadata fields to keep track of which parts
of the filesystem have been observed to be unhealthy, and print a
warning at unmount time if we have unfixed problems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile            |    1 
 fs/xfs/libxfs/xfs_health.h |  201 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_health.c        |  192 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_inode.h         |    7 ++
 fs/xfs/xfs_mount.c         |    1 
 fs/xfs/xfs_mount.h         |   23 +++++
 fs/xfs/xfs_trace.h         |   73 ++++++++++++++++
 7 files changed, 498 insertions(+)
 create mode 100644 fs/xfs/libxfs/xfs_health.h
 create mode 100644 fs/xfs/xfs_health.c
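
Note for readers: every *_SECONDARY mask in this patch is 0, so the
"forget secondary evidence" branch in the mark_healthy helpers is a
no-op for now.  The sketch below shows what that branch would do once a
secondary flag exists.  The TIER_* flags and tier_mark_healthy() are
hypothetical names made up for this illustration; only the shape of the
function body is taken from the patch, minus locking and tracing.

```c
#include <assert.h>

#define TIER_PRIMARY_A		(1u << 0)	/* hypothetical primary flag */
#define TIER_PRIMARY_B		(1u << 1)	/* hypothetical primary flag */
#define TIER_SECONDARY_X	(1u << 2)	/* hypothetical secondary flag */

#define TIER_PRIMARY		(TIER_PRIMARY_A | TIER_PRIMARY_B)
#define TIER_SECONDARY		(TIER_SECONDARY_X)

/*
 * Clear the requested bits, then drop every secondary bit once no primary
 * bit remains.  Returns the updated sickness mask.
 */
static unsigned int tier_mark_healthy(unsigned int sick, unsigned int mask)
{
	sick &= ~mask;
	if (!(sick & TIER_PRIMARY))
		sick &= ~TIER_SECONDARY;
	return sick;
}
```

Starting from all three bits set, clearing TIER_PRIMARY_A leaves the
secondary bit alone because TIER_PRIMARY_B is still set; clearing
TIER_PRIMARY_B afterwards drops the secondary bit as well.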


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 7f96bdadc372..786379c143f4 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -73,6 +73,7 @@ xfs-y				+= xfs_aops.o \
 				   xfs_fsmap.o \
 				   xfs_fsops.o \
 				   xfs_globals.o \
+				   xfs_health.o \
 				   xfs_icache.o \
 				   xfs_ioctl.o \
 				   xfs_iomap.o \
diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
new file mode 100644
index 000000000000..0d51bd2689ea
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -0,0 +1,201 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#ifndef __XFS_HEALTH_H__
+#define __XFS_HEALTH_H__
+
+/*
+ * In-Core Filesystem Health Assessments
+ * =====================================
+ *
+ * We'd like to be able to summarize the current health status of the
+ * filesystem so that the administrator knows when it's necessary to schedule
+ * some downtime for repairs.  Until then, we would also like to avoid abrupt
+ * shutdowns due to corrupt metadata.
+ *
+ * The online scrub feature evaluates the health of all filesystem metadata.
+ * When scrub detects corruption in a piece of metadata it will set the
+ * corresponding sickness flag, and repair will clear it if successful.
+ *
+ * If problems remain at unmount time, we can also request manual intervention
+ * by logging a notice to run xfs_repair.
+ *
+ * Evidence of health problems can be sorted into three basic categories:
+ *
+ * a) Primary evidence, which signals that something is defective within the
+ *    general grouping of metadata.
+ *
+ * b) Secondary evidence, which is a side effect of a primary problem but is
+ *    not itself a problem.  This can be forgotten when the primary
+ *    health problems are addressed.
+ *
+ * c) Indirect evidence, which points to something being wrong in another
+ *    group, but we had to release resources and this is all that's left of
+ *    that state.
+ */
+
+struct xfs_mount;
+struct xfs_perag;
+struct xfs_inode;
+
+/* Observable health issues for metadata spanning the entire filesystem. */
+#define XFS_HEALTH_FS_COUNTERS	(1 << 0)  /* summary counters */
+#define XFS_HEALTH_FS_UQUOTA	(1 << 1)  /* user quota */
+#define XFS_HEALTH_FS_GQUOTA	(1 << 2)  /* group quota */
+#define XFS_HEALTH_FS_PQUOTA	(1 << 3)  /* project quota */
+
+/* Observable health issues for realtime volume metadata. */
+#define XFS_HEALTH_RT_BITMAP	(1 << 0)  /* realtime bitmap */
+#define XFS_HEALTH_RT_SUMMARY	(1 << 1)  /* realtime summary */
+
+/* Observable health issues for AG metadata. */
+#define XFS_HEALTH_AG_SB	(1 << 0)  /* superblock */
+#define XFS_HEALTH_AG_AGF	(1 << 1)  /* AGF header */
+#define XFS_HEALTH_AG_AGFL	(1 << 2)  /* AGFL header */
+#define XFS_HEALTH_AG_AGI	(1 << 3)  /* AGI header */
+#define XFS_HEALTH_AG_BNOBT	(1 << 4)  /* free space by block */
+#define XFS_HEALTH_AG_CNTBT	(1 << 5)  /* free space by length */
+#define XFS_HEALTH_AG_INOBT	(1 << 6)  /* inode index */
+#define XFS_HEALTH_AG_FINOBT	(1 << 7)  /* free inode index */
+#define XFS_HEALTH_AG_RMAPBT	(1 << 8)  /* reverse mappings */
+#define XFS_HEALTH_AG_REFCNTBT	(1 << 9)  /* reference counts */
+
+/* Observable health issues for inode metadata. */
+#define XFS_HEALTH_INO_CORE	(1 << 0)  /* inode core */
+#define XFS_HEALTH_INO_BMBTD	(1 << 1)  /* data fork */
+#define XFS_HEALTH_INO_BMBTA	(1 << 2)  /* attr fork */
+#define XFS_HEALTH_INO_BMBTC	(1 << 3)  /* cow fork */
+#define XFS_HEALTH_INO_DIR	(1 << 4)  /* directory */
+#define XFS_HEALTH_INO_XATTR	(1 << 5)  /* extended attributes */
+#define XFS_HEALTH_INO_SYMLINK	(1 << 6)  /* symbolic link remote target */
+#define XFS_HEALTH_INO_PARENT	(1 << 7)  /* parent pointers */
+
+/* Primary evidence of health problems in a given group. */
+#define XFS_HEALTH_FS_PRIMARY	(XFS_HEALTH_FS_COUNTERS | \
+				 XFS_HEALTH_FS_UQUOTA | \
+				 XFS_HEALTH_FS_GQUOTA | \
+				 XFS_HEALTH_FS_PQUOTA)
+
+#define XFS_HEALTH_RT_PRIMARY	(XFS_HEALTH_RT_BITMAP | \
+				 XFS_HEALTH_RT_SUMMARY)
+
+#define XFS_HEALTH_AG_PRIMARY	(XFS_HEALTH_AG_SB | \
+				 XFS_HEALTH_AG_AGF | \
+				 XFS_HEALTH_AG_AGFL | \
+				 XFS_HEALTH_AG_AGI | \
+				 XFS_HEALTH_AG_BNOBT | \
+				 XFS_HEALTH_AG_CNTBT | \
+				 XFS_HEALTH_AG_INOBT | \
+				 XFS_HEALTH_AG_FINOBT | \
+				 XFS_HEALTH_AG_RMAPBT | \
+				 XFS_HEALTH_AG_REFCNTBT)
+
+#define XFS_HEALTH_INO_PRIMARY	(XFS_HEALTH_INO_CORE | \
+				 XFS_HEALTH_INO_BMBTD | \
+				 XFS_HEALTH_INO_BMBTA | \
+				 XFS_HEALTH_INO_BMBTC | \
+				 XFS_HEALTH_INO_DIR | \
+				 XFS_HEALTH_INO_XATTR | \
+				 XFS_HEALTH_INO_SYMLINK | \
+				 XFS_HEALTH_INO_PARENT)
+
+/* Secondary state related to (but not primary evidence of) health problems. */
+#define XFS_HEALTH_FS_SECONDARY	(0)
+#define XFS_HEALTH_RT_SECONDARY	(0)
+#define XFS_HEALTH_AG_SECONDARY	(0)
+#define XFS_HEALTH_INO_SECONDARY (0)
+
+/* Evidence of health problems elsewhere. */
+#define XFS_HEALTH_FS_INDIRECT	(0)
+#define XFS_HEALTH_RT_INDIRECT	(0)
+#define XFS_HEALTH_AG_INDIRECT	(0)
+#define XFS_HEALTH_INO_INDIRECT	(0)
+
+/* All health masks. */
+#define XFS_HEALTH_FS_ALL	(XFS_HEALTH_FS_PRIMARY | \
+				 XFS_HEALTH_FS_SECONDARY | \
+				 XFS_HEALTH_FS_INDIRECT)
+
+#define XFS_HEALTH_RT_ALL	(XFS_HEALTH_RT_PRIMARY | \
+				 XFS_HEALTH_RT_SECONDARY | \
+				 XFS_HEALTH_RT_INDIRECT)
+
+#define XFS_HEALTH_AG_ALL	(XFS_HEALTH_AG_PRIMARY | \
+				 XFS_HEALTH_AG_SECONDARY | \
+				 XFS_HEALTH_AG_INDIRECT)
+
+#define XFS_HEALTH_INO_ALL	(XFS_HEALTH_INO_PRIMARY | \
+				 XFS_HEALTH_INO_SECONDARY | \
+				 XFS_HEALTH_INO_INDIRECT)
+
+/* These functions must be provided by the xfs implementation. */
+
+void xfs_fs_mark_sick(struct xfs_mount *mp, unsigned int mask);
+void xfs_fs_mark_healthy(struct xfs_mount *mp, unsigned int mask);
+unsigned int xfs_fs_measure_sickness(struct xfs_mount *mp);
+
+void xfs_rt_mark_sick(struct xfs_mount *mp, unsigned int mask);
+void xfs_rt_mark_healthy(struct xfs_mount *mp, unsigned int mask);
+unsigned int xfs_rt_measure_sickness(struct xfs_mount *mp);
+
+void xfs_ag_mark_sick(struct xfs_perag *pag, unsigned int mask);
+void xfs_ag_mark_healthy(struct xfs_perag *pag, unsigned int mask);
+unsigned int xfs_ag_measure_sickness(struct xfs_perag *pag);
+
+void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask);
+void xfs_inode_mark_healthy(struct xfs_inode *ip, unsigned int mask);
+unsigned int xfs_inode_measure_sickness(struct xfs_inode *ip);
+
+/* Now some helpers. */
+
+static inline bool
+xfs_fs_is_sick(struct xfs_mount *mp, unsigned int mask)
+{
+	return (xfs_fs_measure_sickness(mp) & mask) != 0;
+}
+
+static inline bool
+xfs_rt_is_sick(struct xfs_mount *mp, unsigned int mask)
+{
+	return (xfs_rt_measure_sickness(mp) & mask) != 0;
+}
+
+static inline bool
+xfs_ag_is_sick(struct xfs_perag *pag, unsigned int mask)
+{
+	return (xfs_ag_measure_sickness(pag) & mask) != 0;
+}
+
+static inline bool
+xfs_inode_is_sick(struct xfs_inode *ip, unsigned int mask)
+{
+	return (xfs_inode_measure_sickness(ip) & mask) != 0;
+}
+
+static inline bool
+xfs_fs_healthy(struct xfs_mount *mp)
+{
+	return xfs_fs_measure_sickness(mp) == 0;
+}
+
+static inline bool
+xfs_rt_healthy(struct xfs_mount *mp)
+{
+	return xfs_rt_measure_sickness(mp) == 0;
+}
+
+static inline bool
+xfs_ag_healthy(struct xfs_perag *pag)
+{
+	return xfs_ag_measure_sickness(pag) == 0;
+}
+
+static inline bool
+xfs_inode_healthy(struct xfs_inode *ip)
+{
+	return xfs_inode_measure_sickness(ip) == 0;
+}
+
+#endif	/* __XFS_HEALTH_H__ */
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
new file mode 100644
index 000000000000..e9d6859f7501
--- /dev/null
+++ b/fs/xfs/xfs_health.c
@@ -0,0 +1,192 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_inode.h"
+#include "xfs_trace.h"
+#include "xfs_health.h"
+
+/* Mark unhealthy per-fs metadata. */
+void
+xfs_fs_mark_sick(
+	struct xfs_mount	*mp,
+	unsigned int		mask)
+{
+	ASSERT(!(mask & ~XFS_HEALTH_FS_ALL));
+	trace_xfs_fs_mark_sick(mp, mask);
+
+	spin_lock(&mp->m_sb_lock);
+	mp->m_sick |= mask;
+	spin_unlock(&mp->m_sb_lock);
+}
+
+/* Mark per-fs metadata as healed. */
+void
+xfs_fs_mark_healthy(
+	struct xfs_mount	*mp,
+	unsigned int		mask)
+{
+	ASSERT(!(mask & ~XFS_HEALTH_FS_ALL));
+	trace_xfs_fs_mark_healthy(mp, mask);
+
+	spin_lock(&mp->m_sb_lock);
+	mp->m_sick &= ~mask;
+	if (!(mp->m_sick & XFS_HEALTH_FS_PRIMARY))
+		mp->m_sick &= ~XFS_HEALTH_FS_SECONDARY;
+	spin_unlock(&mp->m_sb_lock);
+}
+
+/* Sample which per-fs metadata are unhealthy. */
+unsigned int
+xfs_fs_measure_sickness(
+	struct xfs_mount	*mp)
+{
+	unsigned int		ret;
+
+	spin_lock(&mp->m_sb_lock);
+	ret = mp->m_sick;
+	spin_unlock(&mp->m_sb_lock);
+	return ret;
+}
+
+/* Mark unhealthy realtime metadata. */
+void
+xfs_rt_mark_sick(
+	struct xfs_mount	*mp,
+	unsigned int		mask)
+{
+	ASSERT(!(mask & ~XFS_HEALTH_RT_ALL));
+	trace_xfs_rt_mark_sick(mp, mask);
+
+	spin_lock(&mp->m_sb_lock);
+	mp->m_rt_sick |= mask;
+	spin_unlock(&mp->m_sb_lock);
+}
+
+/* Mark realtime metadata as healed. */
+void
+xfs_rt_mark_healthy(
+	struct xfs_mount	*mp,
+	unsigned int		mask)
+{
+	ASSERT(!(mask & ~XFS_HEALTH_RT_ALL));
+	trace_xfs_rt_mark_healthy(mp, mask);
+
+	spin_lock(&mp->m_sb_lock);
+	mp->m_rt_sick &= ~mask;
+	if (!(mp->m_rt_sick & XFS_HEALTH_RT_PRIMARY))
+		mp->m_rt_sick &= ~XFS_HEALTH_RT_SECONDARY;
+	spin_unlock(&mp->m_sb_lock);
+}
+
+/* Sample which realtime metadata are unhealthy. */
+unsigned int
+xfs_rt_measure_sickness(
+	struct xfs_mount	*mp)
+{
+	unsigned int		ret;
+
+	spin_lock(&mp->m_sb_lock);
+	ret = mp->m_rt_sick;
+	spin_unlock(&mp->m_sb_lock);
+	return ret;
+}
+
+/* Mark unhealthy per-ag metadata. */
+void
+xfs_ag_mark_sick(
+	struct xfs_perag	*pag,
+	unsigned int		mask)
+{
+	ASSERT(!(mask & ~XFS_HEALTH_AG_ALL));
+	trace_xfs_ag_mark_sick(pag->pag_mount, pag->pag_agno, mask);
+
+	spin_lock(&pag->pag_state_lock);
+	pag->pag_sick |= mask;
+	spin_unlock(&pag->pag_state_lock);
+}
+
+/* Mark per-ag metadata ok. */
+void
+xfs_ag_mark_healthy(
+	struct xfs_perag	*pag,
+	unsigned int		mask)
+{
+	ASSERT(!(mask & ~XFS_HEALTH_AG_ALL));
+	trace_xfs_ag_mark_healthy(pag->pag_mount, pag->pag_agno, mask);
+
+	spin_lock(&pag->pag_state_lock);
+	pag->pag_sick &= ~mask;
+	if (!(pag->pag_sick & XFS_HEALTH_AG_PRIMARY))
+		pag->pag_sick &= ~XFS_HEALTH_AG_SECONDARY;
+	spin_unlock(&pag->pag_state_lock);
+}
+
+/* Sample which per-ag metadata are unhealthy. */
+unsigned int
+xfs_ag_measure_sickness(
+	struct xfs_perag	*pag)
+{
+	unsigned int		ret;
+
+	spin_lock(&pag->pag_state_lock);
+	ret = pag->pag_sick;
+	spin_unlock(&pag->pag_state_lock);
+	return ret;
+}
+
+/* Mark the unhealthy parts of an inode. */
+void
+xfs_inode_mark_sick(
+	struct xfs_inode	*ip,
+	unsigned int		mask)
+{
+	ASSERT(!(mask & ~XFS_HEALTH_INO_ALL));
+	trace_xfs_inode_mark_sick(ip, mask);
+
+	spin_lock(&ip->i_flags_lock);
+	ip->i_sick |= mask;
+	spin_unlock(&ip->i_flags_lock);
+}
+
+/* Mark parts of an inode healed. */
+void
+xfs_inode_mark_healthy(
+	struct xfs_inode	*ip,
+	unsigned int		mask)
+{
+	ASSERT(!(mask & ~XFS_HEALTH_INO_ALL));
+	trace_xfs_inode_mark_healthy(ip, mask);
+
+	spin_lock(&ip->i_flags_lock);
+	ip->i_sick &= ~mask;
+	if (!(ip->i_sick & XFS_HEALTH_INO_PRIMARY))
+		ip->i_sick &= ~XFS_HEALTH_INO_SECONDARY;
+	spin_unlock(&ip->i_flags_lock);
+}
+
+/* Sample which parts of an inode are unhealthy. */
+unsigned int
+xfs_inode_measure_sickness(
+	struct xfs_inode	*ip)
+{
+	unsigned int		ret;
+
+	spin_lock(&ip->i_flags_lock);
+	ret = ip->i_sick;
+	spin_unlock(&ip->i_flags_lock);
+	return ret;
+}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 88239c2dd824..877acdd5f026 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -45,6 +45,13 @@ typedef struct xfs_inode {
 	mrlock_t		i_lock;		/* inode lock */
 	mrlock_t		i_mmaplock;	/* inode mmap IO lock */
 	atomic_t		i_pincount;	/* inode pin count */
+
+	/*
+	 * Bitset noting which parts of an inode are not healthy.
+	 * Callers must hold i_flags_lock before accessing this field.
+	 */
+	unsigned int		i_sick;
+
 	spinlock_t		i_flags_lock;	/* inode i_flags lock */
 	/* Miscellaneous state. */
 	unsigned long		i_flags;	/* see defined flags below */
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 950752e5ec2c..fc1f24dd0386 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -231,6 +231,7 @@ xfs_initialize_perag(
 		error = xfs_iunlink_init(pag);
 		if (error)
 			goto out_hash_destroy;
+		spin_lock_init(&pag->pag_state_lock);
 	}
 
 	index = xfs_set_inode_alloc(mp, agcount);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 15dc02964113..63bbafb01eb5 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -60,6 +60,13 @@ struct xfs_error_cfg {
 typedef struct xfs_mount {
 	struct super_block	*m_super;
 	xfs_tid_t		m_tid;		/* next unused tid for fs */
+
+	/*
+	 * Bitset of unhealthy per-fs metadata.
+	 * Callers must hold m_sb_lock to access this field.
+	 */
+	unsigned int		m_sick;
+
 	struct xfs_ail		*m_ail;		/* fs active log item list */
 
 	struct xfs_sb		m_sb;		/* copy of fs superblock */
@@ -71,6 +78,11 @@ typedef struct xfs_mount {
 	struct xfs_buf		*m_sb_bp;	/* buffer for superblock */
 	char			*m_fsname;	/* filesystem name */
 	int			m_fsname_len;	/* strlen of fs name */
+	/*
+	 * Bitset of unhealthy rt volume metadata.
+	 * Callers must hold m_sb_lock to access this field.
+	 */
+	unsigned int		m_rt_sick;
 	char			*m_rtname;	/* realtime device name */
 	char			*m_logname;	/* external log device name */
 	int			m_bsize;	/* fs logical block size */
@@ -389,6 +401,17 @@ typedef struct xfs_perag {
 	 * or have some other means to control concurrency.
 	 */
 	struct rhashtable	pagi_unlinked_hash;
+
+	/* Spinlock to protect in-core per-ag state */
+	spinlock_t	pag_state_lock;
+
+	/*
+	 * Bitset of unhealthy AG metadata.
+	 *
+	 * Callers should hold pag_state_lock and the relevant AG header buffer
+	 * lock before accessing this field.
+	 */
+	unsigned int	pag_sick;
 } xfs_perag_t;
 
 static inline struct xfs_ag_resv *
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 47fb07d86efd..f079841c7af6 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3440,6 +3440,79 @@ DEFINE_AGINODE_EVENT(xfs_iunlink);
 DEFINE_AGINODE_EVENT(xfs_iunlink_remove);
 DEFINE_AG_EVENT(xfs_iunlink_map_prev_fallback);
 
+DECLARE_EVENT_CLASS(xfs_fs_corrupt_class,
+	TP_PROTO(struct xfs_mount *mp, unsigned int flags),
+	TP_ARGS(mp, flags),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, flags)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->flags = flags;
+	),
+	TP_printk("dev %d:%d flags 0x%x",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->flags)
+);
+#define DEFINE_FS_CORRUPT_EVENT(name)	\
+DEFINE_EVENT(xfs_fs_corrupt_class, name,	\
+	TP_PROTO(struct xfs_mount *mp, unsigned int flags), \
+	TP_ARGS(mp, flags))
+DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_sick);
+DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_healthy);
+DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_sick);
+DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_healthy);
+
+DECLARE_EVENT_CLASS(xfs_ag_corrupt_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int flags),
+	TP_ARGS(mp, agno, flags),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(unsigned int, flags)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->flags = flags;
+	),
+	TP_printk("dev %d:%d agno %u flags 0x%x",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno, __entry->flags)
+);
+#define DEFINE_AG_CORRUPT_EVENT(name)	\
+DEFINE_EVENT(xfs_ag_corrupt_class, name,	\
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 unsigned int flags), \
+	TP_ARGS(mp, agno, flags))
+DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick);
+DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy);
+
+DECLARE_EVENT_CLASS(xfs_inode_corrupt_class,
+	TP_PROTO(struct xfs_inode *ip, unsigned int flags),
+	TP_ARGS(ip, flags),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, flags)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->flags = flags;
+	),
+	TP_printk("dev %d:%d ino 0x%llx flags 0x%x",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino, __entry->flags)
+);
+#define DEFINE_INODE_CORRUPT_EVENT(name)	\
+DEFINE_EVENT(xfs_inode_corrupt_class, name,	\
+	TP_PROTO(struct xfs_inode *ip, unsigned int flags), \
+	TP_ARGS(ip, flags))
+DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_sick);
+DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_healthy);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH


* [PATCH 02/10] xfs: replace the BAD_SUMMARY mount flag with the equivalent health code
  2019-04-01 17:10 [PATCH 00/10] xfs: online health tracking support Darrick J. Wong
  2019-04-01 17:10 ` [PATCH 01/10] xfs: track metadata health levels Darrick J. Wong
@ 2019-04-01 17:10 ` Darrick J. Wong
  2019-04-02 13:22   ` Brian Foster
  2019-04-01 17:10 ` [PATCH 03/10] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem Darrick J. Wong
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-01 17:10 UTC
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Replace the BAD_SUMMARY mount flag with calls to the equivalent health
tracking code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_sb.c |    5 +++--
 fs/xfs/xfs_log.c       |    3 ++-
 fs/xfs/xfs_mount.c     |    9 ++++-----
 fs/xfs/xfs_mount.h     |    1 -
 4 files changed, 9 insertions(+), 9 deletions(-)
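
For reference, the decision that xfs_check_summary_counts makes after
this change can be written as a small predicate.  This is a sketch for
discussion only: recalc_needed() and its three boolean parameters are
hypothetical stand-ins for xfs_sb_version_haslazysbcount(),
XFS_LAST_UNMOUNT_WAS_CLEAN() and xfs_fs_is_sick(mp,
XFS_HEALTH_FS_COUNTERS) respectively.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true when the summary counters would be rebuilt from the per-AG
 * data at mount time.  Mirrors the condition in xfs_check_summary_counts:
 * the early "return 0" is taken only if the counters are trustworthy
 * (no lazy counters, or a clean unmount) and not flagged as sick.
 */
static bool recalc_needed(bool has_lazysbcount, bool last_unmount_clean,
			  bool counters_sick)
{
	if ((!has_lazysbcount || last_unmount_clean) && !counters_sick)
		return false;
	return true;
}
```

In other words, a sick FS_COUNTERS flag forces recalculation regardless
of the other two inputs, which is the same role BAD_SUMMARY played.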


diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index f96b1997938e..f0309b74e377 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -30,6 +30,7 @@
 #include "xfs_refcount_btree.h"
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
+#include "xfs_health.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -907,7 +908,7 @@ xfs_initialize_perag_data(
 	/*
 	 * If the new summary counts are obviously incorrect, fail the
 	 * mount operation because that implies the AGFs are also corrupt.
-	 * Clear BAD_SUMMARY so that we don't unmount with a dirty log, which
+	 * Clear FS_COUNTERS so that we don't unmount with a dirty log, which
 	 * will prevent xfs_repair from fixing anything.
 	 */
 	if (fdblocks > sbp->sb_dblocks || ifree > ialloc) {
@@ -925,7 +926,7 @@ xfs_initialize_perag_data(
 
 	xfs_reinit_percpu_counters(mp);
 out:
-	mp->m_flags &= ~XFS_MOUNT_BAD_SUMMARY;
+	xfs_fs_mark_healthy(mp, XFS_HEALTH_FS_COUNTERS);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index c3b610b687d1..0f418842a035 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -23,6 +23,7 @@
 #include "xfs_cksum.h"
 #include "xfs_sysfs.h"
 #include "xfs_sb.h"
+#include "xfs_health.h"
 
 kmem_zone_t	*xfs_log_ticket_zone;
 
@@ -861,7 +862,7 @@ xfs_log_write_unmount_record(
 	 * recalculated during log recovery at next mount.  Refer to
 	 * xlog_check_unmount_rec for more details.
 	 */
-	if (XFS_TEST_ERROR((mp->m_flags & XFS_MOUNT_BAD_SUMMARY), mp,
+	if (XFS_TEST_ERROR(xfs_fs_is_sick(mp, XFS_HEALTH_FS_COUNTERS), mp,
 			XFS_ERRTAG_FORCE_SUMMARY_RECALC)) {
 		xfs_alert(mp, "%s: will fix summary counters at next mount",
 				__func__);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index fc1f24dd0386..a43ca655a431 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -34,6 +34,7 @@
 #include "xfs_refcount_btree.h"
 #include "xfs_reflink.h"
 #include "xfs_extent_busy.h"
+#include "xfs_health.h"
 
 
 static DEFINE_MUTEX(xfs_uuid_table_mutex);
@@ -647,7 +648,7 @@ xfs_check_summary_counts(
 	    (mp->m_sb.sb_fdblocks > mp->m_sb.sb_dblocks ||
 	     !xfs_verify_icount(mp, mp->m_sb.sb_icount) ||
 	     mp->m_sb.sb_ifree > mp->m_sb.sb_icount))
-		mp->m_flags |= XFS_MOUNT_BAD_SUMMARY;
+		xfs_fs_mark_sick(mp, XFS_HEALTH_FS_COUNTERS);
 
 	/*
 	 * We can safely re-initialise incore superblock counters from the
@@ -662,7 +663,7 @@ xfs_check_summary_counts(
 	 */
 	if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
 	     XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
-	    !(mp->m_flags & XFS_MOUNT_BAD_SUMMARY))
+	    !xfs_fs_is_sick(mp, XFS_HEALTH_FS_COUNTERS))
 		return 0;
 
 	return xfs_initialize_perag_data(mp, mp->m_sb.sb_agcount);
@@ -1451,7 +1452,5 @@ xfs_force_summary_recalc(
 	if (!xfs_sb_version_haslazysbcount(&mp->m_sb))
 		return;
 
-	spin_lock(&mp->m_sb_lock);
-	mp->m_flags |= XFS_MOUNT_BAD_SUMMARY;
-	spin_unlock(&mp->m_sb_lock);
+	xfs_fs_mark_sick(mp, XFS_HEALTH_FS_COUNTERS);
 }
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 63bbafb01eb5..6e7728340ca7 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -211,7 +211,6 @@ typedef struct xfs_mount {
 						   must be synchronous except
 						   for space allocations */
 #define XFS_MOUNT_UNMOUNTING	(1ULL << 1)	/* filesystem is unmounting */
-#define XFS_MOUNT_BAD_SUMMARY	(1ULL << 2)	/* summary counters are bad */
 #define XFS_MOUNT_WAS_CLEAN	(1ULL << 3)
 #define XFS_MOUNT_FS_SHUTDOWN	(1ULL << 4)	/* atomic stop of all filesystem
 						   operations, typically for


* [PATCH 03/10] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem
  2019-04-01 17:10 [PATCH 00/10] xfs: online health tracking support Darrick J. Wong
  2019-04-01 17:10 ` [PATCH 01/10] xfs: track metadata health levels Darrick J. Wong
  2019-04-01 17:10 ` [PATCH 02/10] xfs: replace the BAD_SUMMARY mount flag with the equivalent health code Darrick J. Wong
@ 2019-04-01 17:10 ` Darrick J. Wong
  2019-04-02 13:24   ` Brian Foster
  2019-04-01 17:10 ` [PATCH 04/10] xfs: expand xfs_fsop_geom Darrick J. Wong
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-01 17:10 UTC
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

If we know the filesystem metadata isn't healthy during unmount, we want
to encourage the administrator to run xfs_repair right away.  We can't
do this if BAD_SUMMARY will cause an unclean log unmount to force
summary recalculation, so turn it off if the fs is bad.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_health.h |    2 +
 fs/xfs/xfs_health.c        |   59 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_mount.c         |    2 +
 fs/xfs/xfs_trace.h         |    3 ++
 4 files changed, 66 insertions(+)
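
The control flow of xfs_health_unmount is easier to see without the
locking and tracing, so here is a rough userspace model of it.  Assume
everything named um_* is hypothetical: struct um_fs just bundles the
three kinds of sickness mask, and UM_HEALTH_FS_COUNTERS mirrors the bit
position of XFS_HEALTH_FS_COUNTERS.  The forced-shutdown early return is
omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UM_HEALTH_FS_COUNTERS	(1u << 0)	/* mirrors XFS_HEALTH_FS_COUNTERS */

/* Hypothetical bundle of the in-core sickness masks. */
struct um_fs {
	unsigned int		fs_sick;	/* per-fs mask */
	unsigned int		rt_sick;	/* realtime volume mask */
	const unsigned int	*ag_sick;	/* one mask per AG */
	size_t			agcount;
};

/*
 * Returns true if the "please run xfs_repair" warning would be logged.
 * When it warns, the summary counter flag is cleared so that the log can
 * still be unmounted cleanly.
 */
static bool um_health_unmount(struct um_fs *fs)
{
	bool	warn = false;
	size_t	agno;

	for (agno = 0; agno < fs->agcount; agno++)
		if (fs->ag_sick[agno])
			warn = true;
	if (fs->rt_sick)
		warn = true;
	if (fs->fs_sick)
		warn = true;

	if (warn)
		fs->fs_sick &= ~UM_HEALTH_FS_COUNTERS;
	return warn;
}
```

One consequence visible in the model: a filesystem whose only problem is
the summary counter flag still triggers the warning and has that flag
cleared.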


diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index 0d51bd2689ea..269b124dc1d7 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -148,6 +148,8 @@ void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask);
 void xfs_inode_mark_healthy(struct xfs_inode *ip, unsigned int mask);
 unsigned int xfs_inode_measure_sickness(struct xfs_inode *ip);
 
+void xfs_health_unmount(struct xfs_mount *mp);
+
 /* Now some helpers. */
 
 static inline bool
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index e9d6859f7501..6e2da858c356 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -19,6 +19,65 @@
 #include "xfs_trace.h"
 #include "xfs_health.h"
 
+/*
+ * Warn about metadata corruption that we detected but haven't fixed, and
+ * make sure we're not sitting on anything that would get in the way of
+ * recovery.
+ */
+void
+xfs_health_unmount(
+	struct xfs_mount	*mp)
+{
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno;
+	unsigned int		sick;
+	bool			warn = false;
+
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return;
+
+	/* Measure AG corruption levels. */
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		pag = xfs_perag_get(mp, agno);
+		spin_lock(&pag->pag_state_lock);
+		if (pag->pag_sick) {
+			trace_xfs_ag_unfixed_corruption(mp, agno, pag->pag_sick);
+			warn = true;
+		}
+		spin_unlock(&pag->pag_state_lock);
+		xfs_perag_put(pag);
+	}
+
+	/* Measure realtime volume corruption levels. */
+	sick = xfs_rt_measure_sickness(mp);
+	if (sick) {
+		trace_xfs_rt_unfixed_corruption(mp, sick);
+		warn = true;
+	}
+
+	/* Measure fs corruption and keep the sample around for the warning. */
+	sick = xfs_fs_measure_sickness(mp);
+	if (sick) {
+		trace_xfs_fs_unfixed_corruption(mp, sick);
+		warn = true;
+	}
+
+	if (warn) {
+		xfs_warn(mp,
+"Uncorrected metadata errors detected; please run xfs_repair.");
+
+		/*
+		 * If we have unhealthy metadata, we want the admin to run
+		 * xfs_repair after unmounting.  They can't do that if the log
+		 * is written out without a clean unmount record (such as when
+		 * the summary counters are marked unhealthy to force
+		 * recalculation of the summary counters) so clear it.
+		 */
+		if (sick & XFS_HEALTH_FS_COUNTERS)
+			xfs_fs_mark_healthy(mp, XFS_HEALTH_FS_COUNTERS);
+	}
+}
+
 /* Mark unhealthy per-fs metadata. */
 void
 xfs_fs_mark_sick(
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index a43ca655a431..f0f73d598a0c 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1075,6 +1075,7 @@ xfs_mountfs(
 	 */
 	cancel_delayed_work_sync(&mp->m_reclaim_work);
 	xfs_reclaim_inodes(mp, SYNC_WAIT);
+	xfs_health_unmount(mp);
  out_log_dealloc:
 	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
 	xfs_log_mount_cancel(mp);
@@ -1157,6 +1158,7 @@ xfs_unmountfs(
 	 */
 	cancel_delayed_work_sync(&mp->m_reclaim_work);
 	xfs_reclaim_inodes(mp, SYNC_WAIT);
+	xfs_health_unmount(mp);
 
 	xfs_qm_unmount(mp);
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index f079841c7af6..2464ea351f83 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3461,8 +3461,10 @@ DEFINE_EVENT(xfs_fs_corrupt_class, name,	\
 	TP_ARGS(mp, flags))
 DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_sick);
 DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_healthy);
+DEFINE_FS_CORRUPT_EVENT(xfs_fs_unfixed_corruption);
 DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_sick);
 DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_healthy);
+DEFINE_FS_CORRUPT_EVENT(xfs_rt_unfixed_corruption);
 
 DECLARE_EVENT_CLASS(xfs_ag_corrupt_class,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int flags),
@@ -3488,6 +3490,7 @@ DEFINE_EVENT(xfs_ag_corrupt_class, name,	\
 	TP_ARGS(mp, agno, flags))
 DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick);
 DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy);
+DEFINE_AG_CORRUPT_EVENT(xfs_ag_unfixed_corruption);
 
 DECLARE_EVENT_CLASS(xfs_inode_corrupt_class,
 	TP_PROTO(struct xfs_inode *ip, unsigned int flags),


* [PATCH 04/10] xfs: expand xfs_fsop_geom
  2019-04-01 17:10 [PATCH 00/10] xfs: online health tracking support Darrick J. Wong
                   ` (2 preceding siblings ...)
  2019-04-01 17:10 ` [PATCH 03/10] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem Darrick J. Wong
@ 2019-04-01 17:10 ` Darrick J. Wong
  2019-04-02 17:34   ` Brian Foster
  2019-04-02 21:53   ` Dave Chinner
  2019-04-01 17:10 ` [PATCH 05/10] xfs: add a new ioctl to describe allocation group geometry Darrick J. Wong
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-01 17:10 UTC
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Rename the current (v2-v4) geometry ioctl to XFS_IOC_FSGEOMETRY_V2 and
expand the existing xfs_fsop_geom to reserve empty space for more
fields.  This means that newly built binaries will pick up the new
format and existing programs will simply end up in the V2 handler.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h |   32 +++++++++++++++++++++++++++++++-
 fs/xfs/libxfs/xfs_sb.c |    5 +++++
 fs/xfs/xfs_ioctl.c     |   22 ++++++++++++++++++++--
 fs/xfs/xfs_ioctl32.c   |    1 +
 4 files changed, 57 insertions(+), 3 deletions(-)
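
As a sanity check on the size change, the two layouts can be modelled in
userspace.  This is only a sketch: struct geom_v2 and struct geom_v5 are
hypothetical names, the <stdint.h> types stand in for __u32/__u64/__s32,
and the v5 model embeds the v2 model as its first member instead of
repeating the fields, which should yield the same layout on ABIs where
the old structure's size is a multiple of the __u64 alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field-for-field model of the structure kept for the V2 ioctl. */
struct geom_v2 {
	uint32_t	blocksize, rtextsize, agblocks, agcount;
	uint32_t	logblocks, sectsize, inodesize, imaxpct;
	uint64_t	datablocks, rtblocks, rtextents, logstart;
	unsigned char	uuid[16];
	uint32_t	sunit, swidth;
	int32_t		version;
	uint32_t	flags, logsectsize, rtsectsize;
	uint32_t	dirblocksize, logsunit;
};

/* Model of the expanded structure: old fields plus the reserved area. */
struct geom_v5 {
	struct geom_v2	base;		/* same leading fields as before */
	uint64_t	reserved[18];	/* room for future fields */
};
```

On common 32-bit and 64-bit ABIs this works out to 112 bytes for the old
layout and 256 bytes for the new one, with the reserved area starting
exactly where the old structure ends.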


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index f3aa59302fef..1dba751cde60 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -148,7 +148,34 @@ typedef struct xfs_fsop_geom_v1 {
 } xfs_fsop_geom_v1_t;
 
 /*
- * Output for XFS_IOC_FSGEOMETRY
+ * Output for XFS_IOC_FSGEOMETRY_V2
+ */
+typedef struct xfs_fsop_geom_v2 {
+	__u32		blocksize;	/* filesystem (data) block size */
+	__u32		rtextsize;	/* realtime extent size		*/
+	__u32		agblocks;	/* fsblocks in an AG		*/
+	__u32		agcount;	/* number of allocation groups	*/
+	__u32		logblocks;	/* fsblocks in the log		*/
+	__u32		sectsize;	/* (data) sector size, bytes	*/
+	__u32		inodesize;	/* inode size in bytes		*/
+	__u32		imaxpct;	/* max allowed inode space(%)	*/
+	__u64		datablocks;	/* fsblocks in data subvolume	*/
+	__u64		rtblocks;	/* fsblocks in realtime subvol	*/
+	__u64		rtextents;	/* rt extents in realtime subvol*/
+	__u64		logstart;	/* starting fsblock of the log	*/
+	unsigned char	uuid[16];	/* unique id of the filesystem	*/
+	__u32		sunit;		/* stripe unit, fsblocks	*/
+	__u32		swidth;		/* stripe width, fsblocks	*/
+	__s32		version;	/* structure version		*/
+	__u32		flags;		/* superblock version flags	*/
+	__u32		logsectsize;	/* log sector size, bytes	*/
+	__u32		rtsectsize;	/* realtime sector size, bytes	*/
+	__u32		dirblocksize;	/* directory block size, bytes	*/
+	__u32		logsunit;	/* log stripe unit, bytes */
+} xfs_fsop_geom_v2_t;
+
+/*
+ * Output for XFS_IOC_FSGEOMETRY (v5)
  */
 typedef struct xfs_fsop_geom {
 	__u32		blocksize;	/* filesystem (data) block size */
@@ -172,6 +199,7 @@ typedef struct xfs_fsop_geom {
 	__u32		rtsectsize;	/* realtime sector size, bytes	*/
 	__u32		dirblocksize;	/* directory block size, bytes	*/
 	__u32		logsunit;	/* log stripe unit, bytes */
+	__u64		reserved[18];	/* reserved space */
 } xfs_fsop_geom_t;
 
 /* Output for XFS_FS_COUNTS */
@@ -189,6 +217,7 @@ typedef struct xfs_fsop_resblks {
 } xfs_fsop_resblks_t;
 
 #define XFS_FSOP_GEOM_VERSION	0
+#define XFS_FSOP_GEOM_V5	5
 
 #define XFS_FSOP_GEOM_FLAGS_ATTR	0x0001	/* attributes in use	*/
 #define XFS_FSOP_GEOM_FLAGS_NLINK	0x0002	/* 32-bit nlink values	*/
@@ -620,6 +649,7 @@ struct xfs_scrub_metadata {
 #define XFS_IOC_FSSETDM_BY_HANDLE    _IOW ('X', 121, struct xfs_fsop_setdm_handlereq)
 #define XFS_IOC_ATTRLIST_BY_HANDLE   _IOW ('X', 122, struct xfs_fsop_attrlist_handlereq)
 #define XFS_IOC_ATTRMULTI_BY_HANDLE  _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
+#define XFS_IOC_FSGEOMETRY_V2	     _IOR ('X', 124, struct xfs_fsop_geom_v2)
 #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 124, struct xfs_fsop_geom)
 #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, uint32_t)
 /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index f0309b74e377..c2ca3a816c41 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1168,6 +1168,11 @@ xfs_fs_geometry(
 
 	geo->logsunit = sbp->sb_logsunit;
 
+	if (struct_version < 5)
+		return 0;
+
+	geo->version = XFS_FSOP_GEOM_V5;
+
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 6ecdbb3af7de..7fd8815633dc 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -801,7 +801,7 @@ xfs_ioc_fsgeometry_v1(
 }
 
 STATIC int
-xfs_ioc_fsgeometry(
+xfs_ioc_fsgeometry_v2(
 	xfs_mount_t		*mp,
 	void			__user *arg)
 {
@@ -812,6 +812,23 @@ xfs_ioc_fsgeometry(
 	if (error)
 		return error;
 
+	if (copy_to_user(arg, &fsgeo, sizeof(struct xfs_fsop_geom_v2)))
+		return -EFAULT;
+	return 0;
+}
+
+STATIC int
+xfs_ioc_fsgeometry(
+	struct xfs_mount	*mp,
+	void			__user *arg)
+{
+	struct xfs_fsop_geom	fsgeo;
+	int			error;
+
+	error = xfs_fs_geometry(&mp->m_sb, &fsgeo, 5);
+	if (error)
+		return error;
+
 	if (copy_to_user(arg, &fsgeo, sizeof(fsgeo)))
 		return -EFAULT;
 	return 0;
@@ -1938,7 +1955,8 @@ xfs_file_ioctl(
 
 	case XFS_IOC_FSGEOMETRY_V1:
 		return xfs_ioc_fsgeometry_v1(mp, arg);
-
+	case XFS_IOC_FSGEOMETRY_V2:
+		return xfs_ioc_fsgeometry_v2(mp, arg);
 	case XFS_IOC_FSGEOMETRY:
 		return xfs_ioc_fsgeometry(mp, arg);
 
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 5001dca361e9..323cfd4b15dc 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -561,6 +561,7 @@ xfs_file_compat_ioctl(
 	switch (cmd) {
 	/* No size or alignment issues on any arch */
 	case XFS_IOC_DIOINFO:
+	case XFS_IOC_FSGEOMETRY_V2:
 	case XFS_IOC_FSGEOMETRY:
 	case XFS_IOC_FSGETXATTR:
 	case XFS_IOC_FSSETXATTR:


* [PATCH 05/10] xfs: add a new ioctl to describe allocation group geometry
  2019-04-01 17:10 [PATCH 00/10] xfs: online health tracking support Darrick J. Wong
                   ` (3 preceding siblings ...)
  2019-04-01 17:10 ` [PATCH 04/10] xfs: expand xfs_fsop_geom Darrick J. Wong
@ 2019-04-01 17:10 ` Darrick J. Wong
  2019-04-02 17:34   ` Brian Foster
  2019-04-01 17:10 ` [PATCH 06/10] xfs: report fs and rt health via geometry structure Darrick J. Wong
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-01 17:10 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Add a new ioctl to describe an allocation group's geometry.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_ag.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_ag.h |    2 ++
 fs/xfs/libxfs/xfs_fs.h |   14 ++++++++++++++
 fs/xfs/xfs_ioctl.c     |   24 ++++++++++++++++++++++++
 fs/xfs/xfs_ioctl32.c   |    1 +
 5 files changed, 89 insertions(+)
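
Reviewer aid, not part of the patch: a minimal sketch of how userspace
might call the new ioctl.  The structure and command value below are a
local mirror of what this patch proposes (the demo_* names are
hypothetical), so they may not match any released kernel ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Local mirror of the structure proposed in this patch. */
struct demo_ag_geometry {
	uint32_t	ag_number;	/* i/o: AG number */
	uint32_t	ag_length;	/* o: length in blocks */
	uint32_t	ag_freeblks;	/* o: free space */
	uint32_t	ag_icount;	/* o: inodes allocated */
	uint32_t	ag_ifree;	/* o: inodes free */
	uint32_t	ag_reserved32;	/* o: zero */
	uint64_t	ag_reserved[5];	/* o: zero */
};

#define DEMO_IOC_AG_GEOMETRY	_IOWR('X', 61, struct demo_ag_geometry)

/*
 * Ask the kernel about one AG.  fd must be open on a file or directory
 * in the XFS filesystem.  Returns 0 on success or a negative errno.
 */
static int demo_query_ag(int fd, uint32_t agno, struct demo_ag_geometry *ageo)
{
	memset(ageo, 0, sizeof(*ageo));
	ageo->ag_number = agno;		/* the only input field */
	if (ioctl(fd, DEMO_IOC_AG_GEOMETRY, ageo) < 0)
		return -errno;
	return 0;
}
```

A caller would loop agno from 0 up to the agcount reported by the fs
geometry ioctl; per this patch, an out-of-range AG number should be
rejected with EINVAL.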


diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index 1ef8acf35e7d..1679e37fe28d 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -19,6 +19,7 @@
 #include "xfs_ialloc.h"
 #include "xfs_rmap.h"
 #include "xfs_ag.h"
+#include "xfs_ag_resv.h"
 
 static struct xfs_buf *
 xfs_get_aghdr_buf(
@@ -461,3 +462,50 @@ xfs_ag_extend_space(
 				len, &XFS_RMAP_OINFO_SKIP_UPDATE,
 				XFS_AG_RESV_NONE);
 }
+
+/* Retrieve AG geometry. */
+int
+xfs_ag_get_geometry(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct xfs_ag_geometry	*ageo)
+{
+	struct xfs_buf		*bp;
+	struct xfs_agi		*agi;
+	struct xfs_agf		*agf;
+	struct xfs_perag	*pag;
+	unsigned int		freeblks;
+	int			error;
+
+	memset(ageo, 0, sizeof(*ageo));
+
+	if (agno >= mp->m_sb.sb_agcount)
+		return -EINVAL;
+
+	error = xfs_ialloc_read_agi(mp, NULL, agno, &bp);
+	if (error)
+		return error;
+
+	agi = XFS_BUF_TO_AGI(bp);
+	ageo->ag_icount = be32_to_cpu(agi->agi_count);
+	ageo->ag_ifree = be32_to_cpu(agi->agi_freecount);
+	xfs_buf_relse(bp);
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &bp);
+	if (error)
+		return error;
+
+	agf = XFS_BUF_TO_AGF(bp);
+	pag = xfs_perag_get(mp, agno);
+	ageo->ag_length = be32_to_cpu(agf->agf_length);
+	freeblks = pag->pagf_freeblks +
+		   pag->pagf_flcount +
+		   pag->pagf_btreeblks -
+		   xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE);
+	ageo->ag_freeblks = freeblks;
+	xfs_perag_put(pag);
+	xfs_buf_relse(bp);
+
+	ageo->ag_number = agno;
+	return 0;
+}
diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
index 412702e23f61..5166322807e7 100644
--- a/fs/xfs/libxfs/xfs_ag.h
+++ b/fs/xfs/libxfs/xfs_ag.h
@@ -26,5 +26,7 @@ struct aghdr_init_data {
 int xfs_ag_init_headers(struct xfs_mount *mp, struct aghdr_init_data *id);
 int xfs_ag_extend_space(struct xfs_mount *mp, struct xfs_trans *tp,
 			struct aghdr_init_data *id, xfs_extlen_t len);
+int xfs_ag_get_geometry(struct xfs_mount *mp, xfs_agnumber_t agno,
+			struct xfs_ag_geometry *ageo);
 
 #endif /* __LIBXFS_AG_H */
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 1dba751cde60..87226e00e7bd 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -266,6 +266,19 @@ typedef struct xfs_fsop_resblks {
 #define XFS_MIN_DBLOCKS(s) ((xfs_rfsblock_t)((s)->sb_agcount - 1) *	\
 			 (s)->sb_agblocks + XFS_MIN_AG_BLOCKS)
 
+/*
+ * Output for XFS_IOC_AG_GEOMETRY
+ */
+struct xfs_ag_geometry {
+	__u32		ag_number;	/* i/o: AG number */
+	__u32		ag_length;	/* o: length in blocks */
+	__u32		ag_freeblks;	/* o: free space */
+	__u32		ag_icount;	/* o: inodes allocated */
+	__u32		ag_ifree;	/* o: inodes free */
+	__u32		ag_reserved32;	/* o: zero */
+	__u64		ag_reserved[5];	/* o: zero */
+};
+
 /*
  * Structures for XFS_IOC_FSGROWFSDATA, XFS_IOC_FSGROWFSLOG & XFS_IOC_FSGROWFSRT
  */
@@ -619,6 +632,7 @@ struct xfs_scrub_metadata {
 #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
 #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
+#define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 7fd8815633dc..b5918ce656bd 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -33,6 +33,7 @@
 #include "xfs_fsmap.h"
 #include "scrub/xfs_scrub.h"
 #include "xfs_sb.h"
+#include "xfs_ag.h"
 
 #include <linux/capability.h>
 #include <linux/cred.h>
@@ -834,6 +835,26 @@ xfs_ioc_fsgeometry(
 	return 0;
 }
 
+STATIC int
+xfs_ioc_ag_geometry(
+	struct xfs_mount	*mp,
+	void			__user *arg)
+{
+	struct xfs_ag_geometry	ageo;
+	int			error;
+
+	if (copy_from_user(&ageo, arg, sizeof(ageo)))
+		return -EFAULT;
+
+	error = xfs_ag_get_geometry(mp, ageo.ag_number, &ageo);
+	if (error)
+		return error;
+
+	if (copy_to_user(arg, &ageo, sizeof(ageo)))
+		return -EFAULT;
+	return 0;
+}
+
 /*
  * Linux extended inode flags interface.
  */
@@ -1960,6 +1981,9 @@ xfs_file_ioctl(
 	case XFS_IOC_FSGEOMETRY:
 		return xfs_ioc_fsgeometry(mp, arg);
 
+	case XFS_IOC_AG_GEOMETRY:
+		return xfs_ioc_ag_geometry(mp, arg);
+
 	case XFS_IOC_GETVERSION:
 		return put_user(inode->i_generation, (int __user *)arg);
 
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 323cfd4b15dc..28d2110dd871 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -563,6 +563,7 @@ xfs_file_compat_ioctl(
 	case XFS_IOC_DIOINFO:
 	case XFS_IOC_FSGEOMETRY_V2:
 	case XFS_IOC_FSGEOMETRY:
+	case XFS_IOC_AG_GEOMETRY:
 	case XFS_IOC_FSGETXATTR:
 	case XFS_IOC_FSSETXATTR:
 	case XFS_IOC_FSGETXATTRA:


* [PATCH 06/10] xfs: report fs and rt health via geometry structure
  2019-04-01 17:10 [PATCH 00/10] xfs: online health tracking support Darrick J. Wong
                   ` (4 preceding siblings ...)
  2019-04-01 17:10 ` [PATCH 05/10] xfs: add a new ioctl to describe allocation group geometry Darrick J. Wong
@ 2019-04-01 17:10 ` Darrick J. Wong
  2019-04-02 17:35   ` Brian Foster
  2019-04-01 17:10 ` [PATCH 07/10] xfs: report AG health via AG geometry ioctl Darrick J. Wong
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-01 17:10 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Use our newly expanded geometry structure to report the overall fs and
realtime health status.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h     |   11 ++++++++++-
 fs/xfs/libxfs/xfs_health.h |    3 +++
 fs/xfs/xfs_health.c        |   27 +++++++++++++++++++++++++++
 fs/xfs/xfs_ioctl.c         |    3 +++
 4 files changed, 43 insertions(+), 1 deletion(-)
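
Reviewer aid, not part of the patch: a minimal sketch of how a userspace
tool might turn the new health field into readable text.  The bit values
are copied from this patch; the demo_* names and the short labels are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit values copied from this patch. */
#define DEMO_HEALTH_FS_COUNTERS	(1 << 0)
#define DEMO_HEALTH_FS_UQUOTA	(1 << 1)
#define DEMO_HEALTH_FS_GQUOTA	(1 << 2)
#define DEMO_HEALTH_FS_PQUOTA	(1 << 3)
#define DEMO_HEALTH_RT_BITMAP	(1 << 4)
#define DEMO_HEALTH_RT_SUMMARY	(1 << 5)

struct demo_health_name {
	uint32_t	mask;
	const char	*name;
};

static const struct demo_health_name demo_fs_health_names[] = {
	{ DEMO_HEALTH_FS_COUNTERS,	"counters" },
	{ DEMO_HEALTH_FS_UQUOTA,	"uquota" },
	{ DEMO_HEALTH_FS_GQUOTA,	"gquota" },
	{ DEMO_HEALTH_FS_PQUOTA,	"pquota" },
	{ DEMO_HEALTH_RT_BITMAP,	"rtbitmap" },
	{ DEMO_HEALTH_RT_SUMMARY,	"rtsummary" },
};

/*
 * Write a space-separated list of sick metadata classes into buf and
 * return how many known bits were set.  Output is truncated if buf is
 * too small.
 */
static int demo_describe_fs_health(uint32_t health, char *buf, size_t len)
{
	size_t	used = 0;
	size_t	i;
	int	nr = 0;

	if (len > 0)
		buf[0] = '\0';
	for (i = 0; i < sizeof(demo_fs_health_names) /
			sizeof(demo_fs_health_names[0]); i++) {
		int	n;

		if (!(health & demo_fs_health_names[i].mask))
			continue;
		nr++;
		if (used >= len)
			continue;	/* no room left, keep counting */
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", demo_fs_health_names[i].name);
		if (n > 0)
			used += (size_t)n;
	}
	return nr;
}
```

A zero health field produces an empty string, which only means that no
problems have been recorded so far.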


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 87226e00e7bd..ddbfde7ff79d 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -199,9 +199,18 @@ typedef struct xfs_fsop_geom {
 	__u32		rtsectsize;	/* realtime sector size, bytes	*/
 	__u32		dirblocksize;	/* directory block size, bytes	*/
 	__u32		logsunit;	/* log stripe unit, bytes */
-	__u64		reserved[18];	/* reserved space */
+	__u32		health;		/* o: unhealthy fs & rt metadata */
+	__u32		reserved32;	/* reserved space */
+	__u64		reserved[17];	/* reserved space */
 } xfs_fsop_geom_t;
 
+#define XFS_FSOP_GEOM_HEALTH_FS_COUNTERS (1 << 0) /* summary counters */
+#define XFS_FSOP_GEOM_HEALTH_FS_UQUOTA	(1 << 1)  /* user quota */
+#define XFS_FSOP_GEOM_HEALTH_FS_GQUOTA	(1 << 2)  /* group quota */
+#define XFS_FSOP_GEOM_HEALTH_FS_PQUOTA	(1 << 3)  /* project quota */
+#define XFS_FSOP_GEOM_HEALTH_RT_BITMAP	(1 << 4)  /* realtime bitmap */
+#define XFS_FSOP_GEOM_HEALTH_RT_SUMMARY	(1 << 5)  /* realtime summary */
+
 /* Output for XFS_FS_COUNTS */
 typedef struct xfs_fsop_counts {
 	__u64	freedata;	/* free data section blocks */
diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index 269b124dc1d7..36736d54a3e3 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -39,6 +39,7 @@
 struct xfs_mount;
 struct xfs_perag;
 struct xfs_inode;
+struct xfs_fsop_geom;
 
 /* Observable health issues for metadata spanning the entire filesystem. */
 #define XFS_HEALTH_FS_COUNTERS	(1 << 0)  /* summary counters */
@@ -200,4 +201,6 @@ xfs_inode_healthy(struct xfs_inode *ip)
 	return xfs_inode_measure_sickness(ip) == 0;
 }
 
+void xfs_fsop_geom_health(struct xfs_mount *mp, struct xfs_fsop_geom *geo);
+
 #endif	/* __XFS_HEALTH_H__ */
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index 6e2da858c356..151c98693bef 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -249,3 +249,30 @@ xfs_inode_measure_sickness(
 	spin_unlock(&ip->i_flags_lock);
 	return ret;
 }
+
+/* Fill out fs geometry health info. */
+void
+xfs_fsop_geom_health(
+	struct xfs_mount	*mp,
+	struct xfs_fsop_geom	*geo)
+{
+	unsigned int		sick;
+
+	geo->health = 0;
+
+	sick = xfs_fs_measure_sickness(mp);
+	if (sick & XFS_HEALTH_FS_COUNTERS)
+		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_COUNTERS;
+	if (sick & XFS_HEALTH_FS_UQUOTA)
+		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_UQUOTA;
+	if (sick & XFS_HEALTH_FS_GQUOTA)
+		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_GQUOTA;
+	if (sick & XFS_HEALTH_FS_PQUOTA)
+		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_PQUOTA;
+
+	sick = xfs_rt_measure_sickness(mp);
+	if (sick & XFS_HEALTH_RT_BITMAP)
+		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_BITMAP;
+	if (sick & XFS_HEALTH_RT_SUMMARY)
+		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_SUMMARY;
+}
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index b5918ce656bd..f9bf11b6a055 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -34,6 +34,7 @@
 #include "scrub/xfs_scrub.h"
 #include "xfs_sb.h"
 #include "xfs_ag.h"
+#include "xfs_health.h"
 
 #include <linux/capability.h>
 #include <linux/cred.h>
@@ -830,6 +831,8 @@ xfs_ioc_fsgeometry(
 	if (error)
 		return error;
 
+	xfs_fsop_geom_health(mp, &fsgeo);
+
 	if (copy_to_user(arg, &fsgeo, sizeof(fsgeo)))
 		return -EFAULT;
 	return 0;


* [PATCH 07/10] xfs: report AG health via AG geometry ioctl
  2019-04-01 17:10 [PATCH 00/10] xfs: online health tracking support Darrick J. Wong
                   ` (5 preceding siblings ...)
  2019-04-01 17:10 ` [PATCH 06/10] xfs: report fs and rt health via geometry structure Darrick J. Wong
@ 2019-04-01 17:10 ` Darrick J. Wong
  2019-04-03 14:30   ` Brian Foster
  2019-04-01 17:11 ` [PATCH 08/10] xfs: report inode health via bulkstat Darrick J. Wong
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-01 17:10 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Use the AG geometry info ioctl to report health status too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h     |   12 +++++++++++-
 fs/xfs/libxfs/xfs_health.h |    2 ++
 fs/xfs/xfs_health.c        |   40 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_ioctl.c         |    2 ++
 4 files changed, 55 insertions(+), 1 deletion(-)
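
Reviewer aid, not part of the patch: the flag translation in this patch
is written as a chain of if statements.  A table-driven variant is
sketched below purely for comparison.  The in-core values are
hypothetical (the real XFS_HEALTH_AG_* definitions live in patch 01),
and only four of the ten flags are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-core flag values, chosen to differ from the ABI bits. */
#define DEMO_INCORE_AG_SB	(1 << 4)
#define DEMO_INCORE_AG_AGF	(1 << 5)
#define DEMO_INCORE_AG_AGFL	(1 << 6)
#define DEMO_INCORE_AG_AGI	(1 << 7)

/* Userspace-visible bits, copied from this patch. */
#define DEMO_GEOM_HEALTH_AG_SB		(1 << 0)
#define DEMO_GEOM_HEALTH_AG_AGF		(1 << 1)
#define DEMO_GEOM_HEALTH_AG_AGFL	(1 << 2)
#define DEMO_GEOM_HEALTH_AG_AGI		(1 << 3)

struct demo_health_map {
	unsigned int	incore;		/* flag as tracked in memory */
	uint32_t	external;	/* flag as reported to userspace */
};

static const struct demo_health_map demo_ag_map[] = {
	{ DEMO_INCORE_AG_SB,	DEMO_GEOM_HEALTH_AG_SB },
	{ DEMO_INCORE_AG_AGF,	DEMO_GEOM_HEALTH_AG_AGF },
	{ DEMO_INCORE_AG_AGFL,	DEMO_GEOM_HEALTH_AG_AGFL },
	{ DEMO_INCORE_AG_AGI,	DEMO_GEOM_HEALTH_AG_AGI },
};

/* Translate an in-core sickness mask into the userspace bit layout. */
static uint32_t demo_translate_ag_health(unsigned int sick)
{
	uint32_t	out = 0;
	size_t		i;

	for (i = 0; i < sizeof(demo_ag_map) / sizeof(demo_ag_map[0]); i++) {
		if (sick & demo_ag_map[i].incore)
			out |= demo_ag_map[i].external;
	}
	return out;
}
```

Either form keeps the in-core flag values decoupled from the ABI; the
table simply makes each mapping a single line.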


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index ddbfde7ff79d..dc2c538e6b92 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -284,9 +284,19 @@ struct xfs_ag_geometry {
 	__u32		ag_freeblks;	/* o: free space */
 	__u32		ag_icount;	/* o: inodes allocated */
 	__u32		ag_ifree;	/* o: inodes free */
-	__u32		ag_reserved32;	/* o: zero */
+	__u32		ag_health;	/* o: sick things in ag */
 	__u64		ag_reserved[5];	/* o: zero */
 };
+#define XFS_AG_GEOM_HEALTH_AG_SB	(1 << 0)  /* superblock */
+#define XFS_AG_GEOM_HEALTH_AG_AGF	(1 << 1)  /* AGF header */
+#define XFS_AG_GEOM_HEALTH_AG_AGFL	(1 << 2)  /* AGFL header */
+#define XFS_AG_GEOM_HEALTH_AG_AGI	(1 << 3)  /* AGI header */
+#define XFS_AG_GEOM_HEALTH_AG_BNOBT	(1 << 4)  /* free space by block */
+#define XFS_AG_GEOM_HEALTH_AG_CNTBT	(1 << 5)  /* free space by length */
+#define XFS_AG_GEOM_HEALTH_AG_INOBT	(1 << 6)  /* inode index */
+#define XFS_AG_GEOM_HEALTH_AG_FINOBT	(1 << 7)  /* free inode index */
+#define XFS_AG_GEOM_HEALTH_AG_RMAPBT	(1 << 8)  /* reverse mappings */
+#define XFS_AG_GEOM_HEALTH_AG_REFCNTBT	(1 << 9)  /* reference counts */
 
 /*
  * Structures for XFS_IOC_FSGROWFSDATA, XFS_IOC_FSGROWFSLOG & XFS_IOC_FSGROWFSRT
diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index 36736d54a3e3..2d3b879da9b5 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -202,5 +202,7 @@ xfs_inode_healthy(struct xfs_inode *ip)
 }
 
 void xfs_fsop_geom_health(struct xfs_mount *mp, struct xfs_fsop_geom *geo);
+void xfs_ag_geom_health(struct xfs_mount *mp, xfs_agnumber_t agno,
+		struct xfs_ag_geometry *ageo);
 
 #endif	/* __XFS_HEALTH_H__ */
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index 151c98693bef..5ca471bd41ad 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -276,3 +276,43 @@ xfs_fsop_geom_health(
 	if (sick & XFS_HEALTH_RT_SUMMARY)
 		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_SUMMARY;
 }
+
+/* Fill out ag geometry health info. */
+void
+xfs_ag_geom_health(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct xfs_ag_geometry	*ageo)
+{
+	struct xfs_perag	*pag;
+	unsigned int		sick;
+
+	if (agno >= mp->m_sb.sb_agcount)
+		return;
+
+	ageo->ag_health = 0;
+
+	pag = xfs_perag_get(mp, agno);
+	sick = xfs_ag_measure_sickness(pag);
+	if (sick & XFS_HEALTH_AG_SB)
+		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_SB;
+	if (sick & XFS_HEALTH_AG_AGF)
+		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGF;
+	if (sick & XFS_HEALTH_AG_AGFL)
+		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGFL;
+	if (sick & XFS_HEALTH_AG_AGI)
+		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGI;
+	if (sick & XFS_HEALTH_AG_BNOBT)
+		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_BNOBT;
+	if (sick & XFS_HEALTH_AG_CNTBT)
+		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_CNTBT;
+	if (sick & XFS_HEALTH_AG_INOBT)
+		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_INOBT;
+	if (sick & XFS_HEALTH_AG_FINOBT)
+		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_FINOBT;
+	if (sick & XFS_HEALTH_AG_RMAPBT)
+		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_RMAPBT;
+	if (sick & XFS_HEALTH_AG_REFCNTBT)
+		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_REFCNTBT;
+	xfs_perag_put(pag);
+}
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index f9bf11b6a055..f1fc5e53cfc1 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -853,6 +853,8 @@ xfs_ioc_ag_geometry(
 	if (error)
 		return error;
 
+	xfs_ag_geom_health(mp, ageo.ag_number, &ageo);
+
 	if (copy_to_user(arg, &ageo, sizeof(ageo)))
 		return -EFAULT;
 	return 0;


* [PATCH 08/10] xfs: report inode health via bulkstat
  2019-04-01 17:10 [PATCH 00/10] xfs: online health tracking support Darrick J. Wong
                   ` (6 preceding siblings ...)
  2019-04-01 17:10 ` [PATCH 07/10] xfs: report AG health via AG geometry ioctl Darrick J. Wong
@ 2019-04-01 17:11 ` Darrick J. Wong
  2019-04-01 17:11 ` [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health Darrick J. Wong
  2019-04-01 17:11 ` [PATCH 10/10] xfs: update health status if we get a clean bill of health Darrick J. Wong
  9 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-01 17:11 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Use space in the bulkstat ioctl structure to report any problems
observed with the inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h     |   13 ++++++++++++-
 fs/xfs/libxfs/xfs_health.h |    1 +
 fs/xfs/xfs_health.c        |   27 +++++++++++++++++++++++++++
 fs/xfs/xfs_itable.c        |    2 ++
 4 files changed, 42 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index dc2c538e6b92..fffaead718a5 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -346,13 +346,24 @@ typedef struct xfs_bstat {
 #define	bs_projid	bs_projid_lo	/* (previously just bs_projid)	*/
 	__u16		bs_forkoff;	/* inode fork offset in bytes	*/
 	__u16		bs_projid_hi;	/* higher part of project id	*/
-	unsigned char	bs_pad[6];	/* pad space, unused		*/
+	__u16		bs_health;	/* sick inode metadata		*/
+	unsigned char	bs_pad[4];	/* pad space, unused		*/
 	__u32		bs_cowextsize;	/* cow extent size		*/
 	__u32		bs_dmevmask;	/* DMIG event mask		*/
 	__u16		bs_dmstate;	/* DMIG state info		*/
 	__u16		bs_aextents;	/* attribute number of extents	*/
 } xfs_bstat_t;
 
+/* bs_health flags */
+#define XFS_BS_HEALTH_INODE	(1 << 0)  /* inode core */
+#define XFS_BS_HEALTH_BMBTD	(1 << 1)  /* data fork */
+#define XFS_BS_HEALTH_BMBTA	(1 << 2)  /* attr fork */
+#define XFS_BS_HEALTH_BMBTC	(1 << 3)  /* cow fork */
+#define XFS_BS_HEALTH_DIR	(1 << 4)  /* directory */
+#define XFS_BS_HEALTH_XATTR	(1 << 5)  /* extended attributes */
+#define XFS_BS_HEALTH_SYMLINK	(1 << 6)  /* symbolic link remote target */
+#define XFS_BS_HEALTH_PARENT	(1 << 7)  /* parent pointers */
+
 /*
  * Project quota id helpers (previously projid was 16bit only
  * and using two 16bit values to hold new 32bit projid was choosen
diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index 2d3b879da9b5..a6446d1dd8a7 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -204,5 +204,6 @@ xfs_inode_healthy(struct xfs_inode *ip)
 void xfs_fsop_geom_health(struct xfs_mount *mp, struct xfs_fsop_geom *geo);
 void xfs_ag_geom_health(struct xfs_mount *mp, xfs_agnumber_t agno,
 		struct xfs_ag_geometry *ageo);
+void xfs_bulkstat_health(struct xfs_inode *ip, struct xfs_bstat *bs);
 
 #endif	/* __XFS_HEALTH_H__ */
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index 5ca471bd41ad..1c9b71949410 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -316,3 +316,30 @@ xfs_ag_geom_health(
 		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_REFCNTBT;
 	xfs_perag_put(pag);
 }
+
+/* Fill out bulkstat health info. */
+void
+xfs_bulkstat_health(
+	struct xfs_inode	*ip,
+	struct xfs_bstat	*bs)
+{
+	unsigned int		sick = xfs_inode_measure_sickness(ip);
+
+	bs->bs_health = 0;
+	if (sick & XFS_HEALTH_INO_CORE)
+		bs->bs_health |= XFS_BS_HEALTH_INODE;
+	if (sick & XFS_HEALTH_INO_BMBTD)
+		bs->bs_health |= XFS_BS_HEALTH_BMBTD;
+	if (sick & XFS_HEALTH_INO_BMBTA)
+		bs->bs_health |= XFS_BS_HEALTH_BMBTA;
+	if (sick & XFS_HEALTH_INO_BMBTC)
+		bs->bs_health |= XFS_BS_HEALTH_BMBTC;
+	if (sick & XFS_HEALTH_INO_DIR)
+		bs->bs_health |= XFS_BS_HEALTH_DIR;
+	if (sick & XFS_HEALTH_INO_XATTR)
+		bs->bs_health |= XFS_BS_HEALTH_XATTR;
+	if (sick & XFS_HEALTH_INO_SYMLINK)
+		bs->bs_health |= XFS_BS_HEALTH_SYMLINK;
+	if (sick & XFS_HEALTH_INO_PARENT)
+		bs->bs_health |= XFS_BS_HEALTH_PARENT;
+}
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 1861289bf823..cff28ee73deb 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -18,6 +18,7 @@
 #include "xfs_error.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
+#include "xfs_health.h"
 
 /*
  * Return stat information for one inode.
@@ -84,6 +85,7 @@ xfs_bulkstat_one_int(
 	buf->bs_extsize = dic->di_extsize << mp->m_sb.sb_blocklog;
 	buf->bs_extents = dic->di_nextents;
 	memset(buf->bs_pad, 0, sizeof(buf->bs_pad));
+	xfs_bulkstat_health(ip, buf);
 	buf->bs_dmevmask = dic->di_dmevmask;
 	buf->bs_dmstate = dic->di_dmstate;
 	buf->bs_aextents = dic->di_anextents;


* [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health
  2019-04-01 17:10 [PATCH 00/10] xfs: online health tracking support Darrick J. Wong
                   ` (7 preceding siblings ...)
  2019-04-01 17:11 ` [PATCH 08/10] xfs: report inode health via bulkstat Darrick J. Wong
@ 2019-04-01 17:11 ` Darrick J. Wong
  2019-04-04 11:50   ` Brian Foster
  2019-04-01 17:11 ` [PATCH 10/10] xfs: update health status if we get a clean bill of health Darrick J. Wong
  9 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-01 17:11 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Now that we have the ability to track sick metadata in-core, make scrub
and repair update those health assessments after doing work.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile       |    1 
 fs/xfs/scrub/health.c |  180 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/health.h |   12 +++
 fs/xfs/scrub/scrub.c  |    8 ++
 fs/xfs/scrub/scrub.h  |   11 +++
 5 files changed, 212 insertions(+)
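
Reviewer aid, not part of the patch: a small model of the three-way
decision made by xchk_update_health().  It assumes, as a simplification,
that health is a single word in which marking sick sets bits and marking
healthy clears them; the real helpers from patch 01 may track more
state.  The demo_ name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the new sickness word after one scrub pass.
 *
 * corrupt:       the scrubber flagged corruption on this pass
 * already_fixed: this pass is the re-check that follows a repair
 * sick_mask:     what the scrub type examines
 * heal_mask:     what the repair claims to have rebuilt
 */
static unsigned int demo_update_health(unsigned int sick_state, bool corrupt,
				       bool already_fixed,
				       unsigned int sick_mask,
				       unsigned int heal_mask)
{
	if (corrupt)
		return sick_state | sick_mask;	/* errors found: mark sick */
	if (already_fixed)
		return sick_state & ~heal_mask;	/* clean after repair */
	return sick_state & ~sick_mask;		/* clean scan, no repair */
}
```

The heal mask can be wider than the sick mask because a single repair
may rebuild several structures at once, which is why the two are kept
separate in struct xfs_scrub.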
 create mode 100644 fs/xfs/scrub/health.c
 create mode 100644 fs/xfs/scrub/health.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 786379c143f4..b20964e26a22 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -143,6 +143,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   common.o \
 				   dabtree.o \
 				   dir.o \
+				   health.o \
 				   ialloc.o \
 				   inode.o \
 				   parent.o \
diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
new file mode 100644
index 000000000000..dd9986500801
--- /dev/null
+++ b/fs/xfs/scrub/health.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_health.h"
+#include "scrub/scrub.h"
+#include "scrub/health.h"
+
+static const unsigned int xchk_type_to_health_flag[XFS_SCRUB_TYPE_NR] = {
+	[XFS_SCRUB_TYPE_SB]		= XFS_HEALTH_AG_SB,
+	[XFS_SCRUB_TYPE_AGF]		= XFS_HEALTH_AG_AGF,
+	[XFS_SCRUB_TYPE_AGFL]		= XFS_HEALTH_AG_AGFL,
+	[XFS_SCRUB_TYPE_AGI]		= XFS_HEALTH_AG_AGI,
+	[XFS_SCRUB_TYPE_BNOBT]		= XFS_HEALTH_AG_BNOBT,
+	[XFS_SCRUB_TYPE_CNTBT]		= XFS_HEALTH_AG_CNTBT,
+	[XFS_SCRUB_TYPE_INOBT]		= XFS_HEALTH_AG_INOBT,
+	[XFS_SCRUB_TYPE_FINOBT]		= XFS_HEALTH_AG_FINOBT,
+	[XFS_SCRUB_TYPE_RMAPBT]		= XFS_HEALTH_AG_RMAPBT,
+	[XFS_SCRUB_TYPE_REFCNTBT]	= XFS_HEALTH_AG_REFCNTBT,
+	[XFS_SCRUB_TYPE_INODE]		= XFS_HEALTH_INO_CORE,
+	[XFS_SCRUB_TYPE_BMBTD]		= XFS_HEALTH_INO_BMBTD,
+	[XFS_SCRUB_TYPE_BMBTA]		= XFS_HEALTH_INO_BMBTA,
+	[XFS_SCRUB_TYPE_BMBTC]		= XFS_HEALTH_INO_BMBTC,
+	[XFS_SCRUB_TYPE_DIR]		= XFS_HEALTH_INO_DIR,
+	[XFS_SCRUB_TYPE_XATTR]		= XFS_HEALTH_INO_XATTR,
+	[XFS_SCRUB_TYPE_SYMLINK]	= XFS_HEALTH_INO_SYMLINK,
+	[XFS_SCRUB_TYPE_PARENT]		= XFS_HEALTH_INO_PARENT,
+	[XFS_SCRUB_TYPE_RTBITMAP]	= XFS_HEALTH_RT_BITMAP,
+	[XFS_SCRUB_TYPE_RTSUM]		= XFS_HEALTH_RT_SUMMARY,
+	[XFS_SCRUB_TYPE_UQUOTA]		= XFS_HEALTH_FS_UQUOTA,
+	[XFS_SCRUB_TYPE_GQUOTA]		= XFS_HEALTH_FS_GQUOTA,
+	[XFS_SCRUB_TYPE_PQUOTA]		= XFS_HEALTH_FS_PQUOTA,
+};
+
+/* Return the health status mask for this scrub type. */
+unsigned int
+xchk_health_mask_for_scrub_type(
+	__u32			scrub_type)
+{
+	return xchk_type_to_health_flag[scrub_type];
+}
+
+/* Mark metadata unhealthy. */
+static void
+xchk_mark_sick(
+	struct xfs_scrub	*sc,
+	unsigned int		mask)
+{
+	struct xfs_perag	*pag;
+
+	if (!mask)
+		return;
+
+	switch (sc->sm->sm_type) {
+	case XFS_SCRUB_TYPE_SB:
+	case XFS_SCRUB_TYPE_AGF:
+	case XFS_SCRUB_TYPE_AGFL:
+	case XFS_SCRUB_TYPE_AGI:
+	case XFS_SCRUB_TYPE_BNOBT:
+	case XFS_SCRUB_TYPE_CNTBT:
+	case XFS_SCRUB_TYPE_INOBT:
+	case XFS_SCRUB_TYPE_FINOBT:
+	case XFS_SCRUB_TYPE_RMAPBT:
+	case XFS_SCRUB_TYPE_REFCNTBT:
+		pag = xfs_perag_get(sc->mp, sc->sm->sm_agno);
+		xfs_ag_mark_sick(pag, mask);
+		xfs_perag_put(pag);
+		break;
+	case XFS_SCRUB_TYPE_INODE:
+	case XFS_SCRUB_TYPE_BMBTD:
+	case XFS_SCRUB_TYPE_BMBTA:
+	case XFS_SCRUB_TYPE_BMBTC:
+	case XFS_SCRUB_TYPE_DIR:
+	case XFS_SCRUB_TYPE_XATTR:
+	case XFS_SCRUB_TYPE_SYMLINK:
+	case XFS_SCRUB_TYPE_PARENT:
+		xfs_inode_mark_sick(sc->ip, mask);
+		break;
+	case XFS_SCRUB_TYPE_UQUOTA:
+	case XFS_SCRUB_TYPE_GQUOTA:
+	case XFS_SCRUB_TYPE_PQUOTA:
+		xfs_fs_mark_sick(sc->mp, mask);
+		break;
+	case XFS_SCRUB_TYPE_RTBITMAP:
+	case XFS_SCRUB_TYPE_RTSUM:
+		xfs_rt_mark_sick(sc->mp, mask);
+		break;
+	default:
+		break;
+	}
+}
+
+/* Mark metadata healed after a repair or healthy after a clean scan. */
+static void
+xchk_mark_healthy(
+	struct xfs_scrub	*sc,
+	unsigned int		mask)
+{
+	struct xfs_perag	*pag;
+
+	if (!mask)
+		return;
+
+	switch (sc->sm->sm_type) {
+	case XFS_SCRUB_TYPE_SB:
+	case XFS_SCRUB_TYPE_AGF:
+	case XFS_SCRUB_TYPE_AGFL:
+	case XFS_SCRUB_TYPE_AGI:
+	case XFS_SCRUB_TYPE_BNOBT:
+	case XFS_SCRUB_TYPE_CNTBT:
+	case XFS_SCRUB_TYPE_INOBT:
+	case XFS_SCRUB_TYPE_FINOBT:
+	case XFS_SCRUB_TYPE_RMAPBT:
+	case XFS_SCRUB_TYPE_REFCNTBT:
+		pag = xfs_perag_get(sc->mp, sc->sm->sm_agno);
+		xfs_ag_mark_healthy(pag, mask);
+		xfs_perag_put(pag);
+		break;
+	case XFS_SCRUB_TYPE_INODE:
+	case XFS_SCRUB_TYPE_BMBTD:
+	case XFS_SCRUB_TYPE_BMBTA:
+	case XFS_SCRUB_TYPE_BMBTC:
+	case XFS_SCRUB_TYPE_DIR:
+	case XFS_SCRUB_TYPE_XATTR:
+	case XFS_SCRUB_TYPE_SYMLINK:
+	case XFS_SCRUB_TYPE_PARENT:
+		xfs_inode_mark_healthy(sc->ip, mask);
+		break;
+	case XFS_SCRUB_TYPE_UQUOTA:
+	case XFS_SCRUB_TYPE_GQUOTA:
+	case XFS_SCRUB_TYPE_PQUOTA:
+		xfs_fs_mark_healthy(sc->mp, mask);
+		break;
+	case XFS_SCRUB_TYPE_RTBITMAP:
+	case XFS_SCRUB_TYPE_RTSUM:
+		xfs_rt_mark_healthy(sc->mp, mask);
+		break;
+	default:
+		break;
+	}
+}
+
+/* Update filesystem health assessments based on what we found and did. */
+void
+xchk_update_health(
+	struct xfs_scrub	*sc,
+	bool			already_fixed)
+{
+	/*
+	 * If the scrubber finds errors, we mark sick whatever's mentioned in
+	 * sick_mask, no matter whether this is a first scan or an evaluation
+	 * of repair effectiveness.
+	 *
+	 * If there is no direct corruption and we're called after a repair,
+	 * clear whatever's in heal_mask because that's what we fixed.
+	 *
+	 * Otherwise, there's no direct corruption and we didn't repair
+	 * anything, so mark whatever's in sick_mask as healthy.
+	 */
+	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		xchk_mark_sick(sc, sc->sick_mask);
+	else if (already_fixed)
+		xchk_mark_healthy(sc, sc->heal_mask);
+	else
+		xchk_mark_healthy(sc, sc->sick_mask);
+}
diff --git a/fs/xfs/scrub/health.h b/fs/xfs/scrub/health.h
new file mode 100644
index 000000000000..e795f4c9a23c
--- /dev/null
+++ b/fs/xfs/scrub/health.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#ifndef __XFS_SCRUB_HEALTH_H__
+#define __XFS_SCRUB_HEALTH_H__
+
+unsigned int xchk_health_mask_for_scrub_type(__u32 scrub_type);
+void xchk_update_health(struct xfs_scrub *sc, bool already_fixed);
+
+#endif /* __XFS_SCRUB_HEALTH_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 1b2344d00525..b1519dfc5811 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -40,6 +40,7 @@
 #include "scrub/trace.h"
 #include "scrub/btree.h"
 #include "scrub/repair.h"
+#include "scrub/health.h"
 
 /*
  * Online Scrub and Repair
@@ -468,6 +469,7 @@ xfs_scrub_metadata(
 {
 	struct xfs_scrub		sc;
 	struct xfs_mount		*mp = ip->i_mount;
+	unsigned int			heal_mask;
 	bool				try_harder = false;
 	bool				already_fixed = false;
 	int				error = 0;
@@ -488,6 +490,7 @@ xfs_scrub_metadata(
 	error = xchk_validate_inputs(mp, sm);
 	if (error)
 		goto out;
+	heal_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
 
 	xchk_experimental_warning(mp);
 
@@ -499,6 +502,8 @@ xfs_scrub_metadata(
 	sc.ops = &meta_scrub_ops[sm->sm_type];
 	sc.try_harder = try_harder;
 	sc.sa.agno = NULLAGNUMBER;
+	sc.heal_mask = heal_mask;
+	sc.sick_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
 	error = sc.ops->setup(&sc, ip);
 	if (error)
 		goto out_teardown;
@@ -519,6 +524,8 @@ xfs_scrub_metadata(
 	} else if (error)
 		goto out_teardown;
 
+	xchk_update_health(&sc, already_fixed);
+
 	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed) {
 		bool needs_fix;
 
@@ -551,6 +558,7 @@ xfs_scrub_metadata(
 				xrep_failure(mp);
 				goto out;
 			}
+			heal_mask = sc.heal_mask;
 			goto retry_op;
 		}
 	}
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 22f754fba8e5..05f1ad242a35 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -62,6 +62,17 @@ struct xfs_scrub {
 	struct xfs_inode		*ip;
 	void				*buf;
 	uint				ilock_flags;
+
+	/* Metadata to be marked sick if scrub finds errors. */
+	unsigned int			sick_mask;
+
+	/*
+	 * Metadata to be marked healthy if repair fixes errors.  Some repair
+	 * functions can fix multiple data structures at once, so we have to
+	 * treat sick and heal masks separately.
+	 */
+	unsigned int			heal_mask;
+
 	bool				try_harder;
 	bool				has_quotaofflock;
 


* [PATCH 10/10] xfs: update health status if we get a clean bill of health
  2019-04-01 17:10 [PATCH 00/10] xfs: online health tracking support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2019-04-01 17:11 ` [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health Darrick J. Wong
@ 2019-04-01 17:11 ` Darrick J. Wong
  2019-04-04 11:51   ` Brian Foster
  9 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-01 17:11 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

If scrub finds that everything is ok with the filesystem, we need a way
to tell the health tracking that it can let go of indirect health flags,
since indirect flags only mean that at some point in the past we lost
some context.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h |    3 ++
 fs/xfs/scrub/common.c  |   12 ++++++++++
 fs/xfs/scrub/common.h  |    1 +
 fs/xfs/scrub/health.c  |   58 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/health.h  |    1 +
 fs/xfs/scrub/repair.c  |    1 +
 fs/xfs/scrub/scrub.c   |    6 +++++
 fs/xfs/scrub/trace.h   |    4 ++-
 8 files changed, 84 insertions(+), 2 deletions(-)
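
Not part of the patch: a minimal userspace sketch of the idea in the commit
message, assuming made-up bit values (in this series the real
XFS_HEALTH_*_INDIRECT masks are all zero).  The DEMO_* and demo_* names are
hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Hypothetical bit assignments, for illustration only. */
#define DEMO_PRIMARY	(0x0fU)	/* direct evidence of corruption */
#define DEMO_INDIRECT	(0xf0U)	/* reminders that something went wrong */

/* Same test as xchk_health_record(): any primary bit means still sick. */
static bool demo_has_primary_evidence(unsigned int sick)
{
	return (sick & DEMO_PRIMARY) != 0;
}

/*
 * Models the "healthy" scrub type: the indirect reminders are dropped
 * only when no primary evidence remains.  Returns the new sick mask.
 */
static unsigned int demo_clean_bill_of_health(unsigned int sick)
{
	if (demo_has_primary_evidence(sick))
		return sick;		/* still corrupt; keep everything */
	return sick & ~DEMO_INDIRECT;	/* forget the stale reminders */
}
```

With these placeholder values, a mask of 0x10 (indirect only) comes back as
0, while 0x11 (indirect plus primary) is returned unchanged.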


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index fffaead718a5..320274d3809a 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -574,9 +574,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_UQUOTA	21	/* user quotas */
 #define XFS_SCRUB_TYPE_GQUOTA	22	/* group quotas */
 #define XFS_SCRUB_TYPE_PQUOTA	23	/* project quotas */
+#define XFS_SCRUB_TYPE_HEALTHY	24	/* everything checked out ok */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	24
+#define XFS_SCRUB_TYPE_NR	25
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 0c54ff55b901..9064ed567e37 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -208,6 +208,18 @@ xchk_ino_set_preen(
 	trace_xchk_ino_preen(sc, ino, __return_address);
 }
 
+/* Record non-specific corruption. */
+void
+xchk_set_corrupt(
+	struct xfs_scrub	*sc)
+{
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+	xfs_scrub_whine(sc->mp, "type %d ret_ip %pS",
+			sc->sm->sm_type,
+			__return_address);
+	trace_xchk_fs_error(sc, 0, __return_address);
+}
+
 /* Record a corrupt block. */
 void
 xchk_block_set_corrupt(
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index e26a430bd466..1b7b8c555f2e 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -39,6 +39,7 @@ void xchk_block_set_preen(struct xfs_scrub *sc,
 		struct xfs_buf *bp);
 void xchk_ino_set_preen(struct xfs_scrub *sc, xfs_ino_t ino);
 
+void xchk_set_corrupt(struct xfs_scrub *sc);
 void xchk_block_set_corrupt(struct xfs_scrub *sc,
 		struct xfs_buf *bp);
 void xchk_ino_set_corrupt(struct xfs_scrub *sc, xfs_ino_t ino);
diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
index dd9986500801..049e802b9418 100644
--- a/fs/xfs/scrub/health.c
+++ b/fs/xfs/scrub/health.c
@@ -19,6 +19,7 @@
 #include "xfs_health.h"
 #include "scrub/scrub.h"
 #include "scrub/health.h"
+#include "scrub/common.h"
 
 static const unsigned int xchk_type_to_health_flag[XFS_SCRUB_TYPE_NR] = {
 	[XFS_SCRUB_TYPE_SB]		= XFS_HEALTH_AG_SB,
@@ -54,6 +55,60 @@ xchk_health_mask_for_scrub_type(
 	return xchk_type_to_health_flag[scrub_type];
 }
 
+/*
+ * Quick scan to double-check that there isn't any evidence of lingering
+ * primary health problems.  If we're still clear, then the health update will
+ * take care of clearing the indirect evidence.
+ */
+int
+xchk_health_record(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno;
+	unsigned int		sick;
+
+	sick = xfs_fs_measure_sickness(mp);
+	if (sick & XFS_HEALTH_FS_PRIMARY)
+		xchk_set_corrupt(sc);
+
+	sick = xfs_rt_measure_sickness(mp);
+	if (sick & XFS_HEALTH_RT_PRIMARY)
+		xchk_set_corrupt(sc);
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		pag = xfs_perag_get(mp, agno);
+		sick = xfs_ag_measure_sickness(pag);
+		if (sick & XFS_HEALTH_AG_PRIMARY)
+			xchk_set_corrupt(sc);
+		xfs_perag_put(pag);
+	}
+
+	return 0;
+}
+
+/*
+ * Scrub gave the filesystem a clean bill of health, so clear all the indirect
+ * markers of past problems (at least for the fs and ags) so that we can be
+ * healthy again.
+ */
+STATIC void
+xchk_mark_all_healthy(
+	struct xfs_mount	*mp)
+{
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno;
+	int			error = 0;
+
+	xfs_fs_mark_healthy(mp, XFS_HEALTH_FS_INDIRECT);
+	xfs_rt_mark_healthy(mp, XFS_HEALTH_RT_INDIRECT);
+	for (agno = 0; error == 0 && agno < mp->m_sb.sb_agcount; agno++) {
+		pag = xfs_perag_get(mp, agno);
+		xfs_ag_mark_healthy(pag, XFS_HEALTH_AG_INDIRECT);
+		xfs_perag_put(pag);
+	}
+}
 /* Mark metadata unhealthy. */
 static void
 xchk_mark_sick(
@@ -149,6 +204,9 @@ xchk_mark_healthy(
 	case XFS_SCRUB_TYPE_RTSUM:
 		xfs_rt_mark_healthy(sc->mp, mask);
 		break;
+	case XFS_SCRUB_TYPE_HEALTHY:
+		xchk_mark_all_healthy(sc->mp);
+		break;
 	default:
 		break;
 	}
diff --git a/fs/xfs/scrub/health.h b/fs/xfs/scrub/health.h
index e795f4c9a23c..001e5a93273d 100644
--- a/fs/xfs/scrub/health.h
+++ b/fs/xfs/scrub/health.h
@@ -8,5 +8,6 @@
 
 unsigned int xchk_health_mask_for_scrub_type(__u32 scrub_type);
 void xchk_update_health(struct xfs_scrub *sc, bool already_fixed);
+int xchk_health_record(struct xfs_scrub *sc);
 
 #endif /* __XFS_SCRUB_HEALTH_H__ */
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index f28f4bad317b..5df67fe5d8ac 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -31,6 +31,7 @@
 #include "xfs_quota.h"
 #include "xfs_attr.h"
 #include "xfs_reflink.h"
+#include "xfs_health.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index b1519dfc5811..f446ab57d7b0 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -348,6 +348,12 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.scrub	= xchk_quota,
 		.repair	= xrep_notsupported,
 	},
+	[XFS_SCRUB_TYPE_HEALTHY] = {	/* fs healthy; clean all reminders */
+		.type	= ST_FS,
+		.setup	= xchk_setup_fs,
+		.scrub	= xchk_health_record,
+		.repair = xrep_notsupported,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 3c83e8b3b39c..7c25a38c6f81 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -75,7 +75,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_PQUOTA);
 	{ XFS_SCRUB_TYPE_RTSUM,		"rtsummary" }, \
 	{ XFS_SCRUB_TYPE_UQUOTA,	"usrquota" }, \
 	{ XFS_SCRUB_TYPE_GQUOTA,	"grpquota" }, \
-	{ XFS_SCRUB_TYPE_PQUOTA,	"prjquota" }
+	{ XFS_SCRUB_TYPE_PQUOTA,	"prjquota" }, \
+	{ XFS_SCRUB_TYPE_HEALTHY,	"healthy" }
 
 DECLARE_EVENT_CLASS(xchk_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
@@ -223,6 +224,7 @@ DEFINE_EVENT(xchk_block_error_class, name, \
 		 void *ret_ip), \
 	TP_ARGS(sc, daddr, ret_ip))
 
+DEFINE_SCRUB_BLOCK_ERROR_EVENT(xchk_fs_error);
 DEFINE_SCRUB_BLOCK_ERROR_EVENT(xchk_block_error);
 DEFINE_SCRUB_BLOCK_ERROR_EVENT(xchk_block_preen);
 


* Re: [PATCH 01/10] xfs: track metadata health levels
  2019-04-01 17:10 ` [PATCH 01/10] xfs: track metadata health levels Darrick J. Wong
@ 2019-04-02 13:22   ` Brian Foster
  2019-04-02 13:30     ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Brian Foster @ 2019-04-02 13:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Apr 01, 2019 at 10:10:15AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Add the necessary in-core metadata fields to keep track of which parts
> of the filesystem have been observed to be unhealthy, and print a
> warning at unmount time if we have unfixed problems.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile            |    1 
>  fs/xfs/libxfs/xfs_health.h |  201 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_health.c        |  192 ++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_inode.h         |    7 ++
>  fs/xfs/xfs_mount.c         |    1 
>  fs/xfs/xfs_mount.h         |   23 +++++
>  fs/xfs/xfs_trace.h         |   73 ++++++++++++++++
>  7 files changed, 498 insertions(+)
>  create mode 100644 fs/xfs/libxfs/xfs_health.h
>  create mode 100644 fs/xfs/xfs_health.c
> 
> 
...
> diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
> new file mode 100644
> index 000000000000..0d51bd2689ea
> --- /dev/null
> +++ b/fs/xfs/libxfs/xfs_health.h
> @@ -0,0 +1,201 @@
...
> +/* Secondary state related to (but not primary evidence of) health problems. */
> +#define XFS_HEALTH_FS_SECONDARY	(0)
> +#define XFS_HEALTH_RT_SECONDARY	(0)
> +#define XFS_HEALTH_AG_SECONDARY	(0)
> +#define XFS_HEALTH_INO_SECONDARY (0)
> +
> +/* Evidence of health problems elsewhere. */
> +#define XFS_HEALTH_FS_INDIRECT	(0)
> +#define XFS_HEALTH_RT_INDIRECT	(0)
> +#define XFS_HEALTH_AG_INDIRECT	(0)
> +#define XFS_HEALTH_INO_INDIRECT	(0)
> +

I'm a little confused by the secondary tracking logic in general. Some
of these masks are cleared in the associated helpers below (i.e., when
the fs is made healthy and all primary bits are cleared), but there are
no secondary or indirect bits in use. On a quick look ahead, these are
still zeroed as of the end of this series as well. Can we defer this
secondary tracking logic until there's a demonstrated use? Otherwise the
rest looks reasonable to me.
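
To make the concern concrete, here is a standalone sketch (userspace C, no
locking or tracing) of the shape of the mark_healthy helpers.  Only the
zero-valued SECONDARY mask mirrors the patch; DEMO_PRIMARY and the demo_*
name are placeholders.

```c
/* Placeholder values; only the zero SECONDARY mask mirrors the patch. */
#define DEMO_PRIMARY	(0x3U)
#define DEMO_SECONDARY	(0U)

/* Same shape as xfs_fs_mark_healthy(), minus locking and tracing. */
static unsigned int demo_mark_healthy(unsigned int sick, unsigned int mask)
{
	sick &= ~mask;
	if (!(sick & DEMO_PRIMARY))
		sick &= ~DEMO_SECONDARY;	/* ~0U keeps every bit: a no-op */
	return sick;
}
```

As long as the SECONDARY mask is zero, the second clear cannot change the
result, so the branch is dead weight until a secondary bit is defined.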

Brian

> +/* All health masks. */
> +#define XFS_HEALTH_FS_ALL	(XFS_HEALTH_FS_PRIMARY | \
> +				 XFS_HEALTH_FS_SECONDARY | \
> +				 XFS_HEALTH_FS_INDIRECT)
> +
> +#define XFS_HEALTH_RT_ALL	(XFS_HEALTH_RT_PRIMARY | \
> +				 XFS_HEALTH_RT_SECONDARY | \
> +				 XFS_HEALTH_RT_INDIRECT)
> +
> +#define XFS_HEALTH_AG_ALL	(XFS_HEALTH_AG_PRIMARY | \
> +				 XFS_HEALTH_AG_SECONDARY | \
> +				 XFS_HEALTH_AG_INDIRECT)
> +
> +#define XFS_HEALTH_INO_ALL	(XFS_HEALTH_INO_PRIMARY | \
> +				 XFS_HEALTH_INO_SECONDARY | \
> +				 XFS_HEALTH_INO_INDIRECT)
> +
> +/* These functions must be provided by the xfs implementation. */
> +
> +void xfs_fs_mark_sick(struct xfs_mount *mp, unsigned int mask);
> +void xfs_fs_mark_healthy(struct xfs_mount *mp, unsigned int mask);
> +unsigned int xfs_fs_measure_sickness(struct xfs_mount *mp);
> +
> +void xfs_rt_mark_sick(struct xfs_mount *mp, unsigned int mask);
> +void xfs_rt_mark_healthy(struct xfs_mount *mp, unsigned int mask);
> +unsigned int xfs_rt_measure_sickness(struct xfs_mount *mp);
> +
> +void xfs_ag_mark_sick(struct xfs_perag *pag, unsigned int mask);
> +void xfs_ag_mark_healthy(struct xfs_perag *pag, unsigned int mask);
> +unsigned int xfs_ag_measure_sickness(struct xfs_perag *pag);
> +
> +void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask);
> +void xfs_inode_mark_healthy(struct xfs_inode *ip, unsigned int mask);
> +unsigned int xfs_inode_measure_sickness(struct xfs_inode *ip);
> +
> +/* Now some helpers. */
> +
> +static inline bool
> +xfs_fs_is_sick(struct xfs_mount *mp, unsigned int mask)
> +{
> +	return (xfs_fs_measure_sickness(mp) & mask) != 0;
> +}
> +
> +static inline bool
> +xfs_rt_is_sick(struct xfs_mount *mp, unsigned int mask)
> +{
> +	return (xfs_rt_measure_sickness(mp) & mask) != 0;
> +}
> +
> +static inline bool
> +xfs_ag_is_sick(struct xfs_perag *pag, unsigned int mask)
> +{
> +	return (xfs_ag_measure_sickness(pag) & mask) != 0;
> +}
> +
> +static inline bool
> +xfs_inode_is_sick(struct xfs_inode *ip, unsigned int mask)
> +{
> +	return (xfs_inode_measure_sickness(ip) & mask) != 0;
> +}
> +
> +static inline bool
> +xfs_fs_healthy(struct xfs_mount *mp)
> +{
> +	return xfs_fs_measure_sickness(mp) == 0;
> +}
> +
> +static inline bool
> +xfs_rt_healthy(struct xfs_mount *mp)
> +{
> +	return xfs_rt_measure_sickness(mp) == 0;
> +}
> +
> +static inline bool
> +xfs_ag_healthy(struct xfs_perag *pag)
> +{
> +	return xfs_ag_measure_sickness(pag) == 0;
> +}
> +
> +static inline bool
> +xfs_inode_healthy(struct xfs_inode *ip)
> +{
> +	return xfs_inode_measure_sickness(ip) == 0;
> +}
> +
> +#endif	/* __XFS_HEALTH_H__ */
> diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> new file mode 100644
> index 000000000000..e9d6859f7501
> --- /dev/null
> +++ b/fs/xfs/xfs_health.c
> @@ -0,0 +1,192 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_bit.h"
> +#include "xfs_sb.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_inode.h"
> +#include "xfs_trace.h"
> +#include "xfs_health.h"
> +
> +/* Mark unhealthy per-fs metadata. */
> +void
> +xfs_fs_mark_sick(
> +	struct xfs_mount	*mp,
> +	unsigned int		mask)
> +{
> +	ASSERT(!(mask & ~XFS_HEALTH_FS_ALL));
> +	trace_xfs_fs_mark_sick(mp, mask);
> +
> +	spin_lock(&mp->m_sb_lock);
> +	mp->m_sick |= mask;
> +	spin_unlock(&mp->m_sb_lock);
> +}
> +
> +/* Mark a per-fs metadata healed. */
> +void
> +xfs_fs_mark_healthy(
> +	struct xfs_mount	*mp,
> +	unsigned int		mask)
> +{
> +	ASSERT(!(mask & ~XFS_HEALTH_FS_ALL));
> +	trace_xfs_fs_mark_healthy(mp, mask);
> +
> +	spin_lock(&mp->m_sb_lock);
> +	mp->m_sick &= ~mask;
> +	if (!(mp->m_sick & XFS_HEALTH_FS_PRIMARY))
> +		mp->m_sick &= ~XFS_HEALTH_FS_SECONDARY;
> +	spin_unlock(&mp->m_sb_lock);
> +}
> +
> +/* Sample which per-fs metadata are unhealthy. */
> +unsigned int
> +xfs_fs_measure_sickness(
> +	struct xfs_mount	*mp)
> +{
> +	unsigned int		ret;
> +
> +	spin_lock(&mp->m_sb_lock);
> +	ret = mp->m_sick;
> +	spin_unlock(&mp->m_sb_lock);
> +	return ret;
> +}
> +
> +/* Mark unhealthy realtime metadata. */
> +void
> +xfs_rt_mark_sick(
> +	struct xfs_mount	*mp,
> +	unsigned int		mask)
> +{
> +	ASSERT(!(mask & ~XFS_HEALTH_RT_ALL));
> +	trace_xfs_rt_mark_sick(mp, mask);
> +
> +	spin_lock(&mp->m_sb_lock);
> +	mp->m_rt_sick |= mask;
> +	spin_unlock(&mp->m_sb_lock);
> +}
> +
> +/* Mark a realtime metadata healed. */
> +void
> +xfs_rt_mark_healthy(
> +	struct xfs_mount	*mp,
> +	unsigned int		mask)
> +{
> +	ASSERT(!(mask & ~XFS_HEALTH_RT_ALL));
> +	trace_xfs_rt_mark_healthy(mp, mask);
> +
> +	spin_lock(&mp->m_sb_lock);
> +	mp->m_rt_sick &= ~mask;
> +	if (!(mp->m_rt_sick & XFS_HEALTH_RT_PRIMARY))
> +		mp->m_rt_sick &= ~XFS_HEALTH_RT_SECONDARY;
> +	spin_unlock(&mp->m_sb_lock);
> +}
> +
> +/* Sample which realtime metadata are unhealthy. */
> +unsigned int
> +xfs_rt_measure_sickness(
> +	struct xfs_mount	*mp)
> +{
> +	unsigned int		ret;
> +
> +	spin_lock(&mp->m_sb_lock);
> +	ret = mp->m_rt_sick;
> +	spin_unlock(&mp->m_sb_lock);
> +	return ret;
> +}
> +
> +/* Mark unhealthy per-ag metadata. */
> +void
> +xfs_ag_mark_sick(
> +	struct xfs_perag	*pag,
> +	unsigned int		mask)
> +{
> +	ASSERT(!(mask & ~XFS_HEALTH_AG_ALL));
> +	trace_xfs_ag_mark_sick(pag->pag_mount, pag->pag_agno, mask);
> +
> +	spin_lock(&pag->pag_state_lock);
> +	pag->pag_sick |= mask;
> +	spin_unlock(&pag->pag_state_lock);
> +}
> +
> +/* Mark per-ag metadata ok. */
> +void
> +xfs_ag_mark_healthy(
> +	struct xfs_perag	*pag,
> +	unsigned int		mask)
> +{
> +	ASSERT(!(mask & ~XFS_HEALTH_AG_ALL));
> +	trace_xfs_ag_mark_healthy(pag->pag_mount, pag->pag_agno, mask);
> +
> +	spin_lock(&pag->pag_state_lock);
> +	pag->pag_sick &= ~mask;
> +	if (!(pag->pag_sick & XFS_HEALTH_AG_PRIMARY))
> +		pag->pag_sick &= ~XFS_HEALTH_AG_SECONDARY;
> +	spin_unlock(&pag->pag_state_lock);
> +}
> +
> +/* Sample which per-ag metadata are unhealthy. */
> +unsigned int
> +xfs_ag_measure_sickness(
> +	struct xfs_perag	*pag)
> +{
> +	unsigned int		ret;
> +
> +	spin_lock(&pag->pag_state_lock);
> +	ret = pag->pag_sick;
> +	spin_unlock(&pag->pag_state_lock);
> +	return ret;
> +}
> +
> +/* Mark the unhealthy parts of an inode. */
> +void
> +xfs_inode_mark_sick(
> +	struct xfs_inode	*ip,
> +	unsigned int		mask)
> +{
> +	ASSERT(!(mask & ~XFS_HEALTH_INO_ALL));
> +	trace_xfs_inode_mark_sick(ip, mask);
> +
> +	spin_lock(&ip->i_flags_lock);
> +	ip->i_sick |= mask;
> +	spin_unlock(&ip->i_flags_lock);
> +}
> +
> +/* Mark parts of an inode healed. */
> +void
> +xfs_inode_mark_healthy(
> +	struct xfs_inode	*ip,
> +	unsigned int		mask)
> +{
> +	ASSERT(!(mask & ~XFS_HEALTH_INO_ALL));
> +	trace_xfs_inode_mark_healthy(ip, mask);
> +
> +	spin_lock(&ip->i_flags_lock);
> +	ip->i_sick &= ~mask;
> +	if (!(ip->i_sick & XFS_HEALTH_INO_PRIMARY))
> +		ip->i_sick &= ~XFS_HEALTH_INO_SECONDARY;
> +	spin_unlock(&ip->i_flags_lock);
> +}
> +
> +/* Sample which parts of an inode are unhealthy. */
> +unsigned int
> +xfs_inode_measure_sickness(
> +	struct xfs_inode	*ip)
> +{
> +	unsigned int		ret;
> +
> +	spin_lock(&ip->i_flags_lock);
> +	ret = ip->i_sick;
> +	spin_unlock(&ip->i_flags_lock);
> +	return ret;
> +}
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 88239c2dd824..877acdd5f026 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -45,6 +45,13 @@ typedef struct xfs_inode {
>  	mrlock_t		i_lock;		/* inode lock */
>  	mrlock_t		i_mmaplock;	/* inode mmap IO lock */
>  	atomic_t		i_pincount;	/* inode pin count */
> +
> +	/*
> +	 * Bitset noting which parts of an inode are not healthy.
> +	 * Callers must hold i_flags_lock before accessing this field.
> +	 */
> +	unsigned int		i_sick;
> +
>  	spinlock_t		i_flags_lock;	/* inode i_flags lock */
>  	/* Miscellaneous state. */
>  	unsigned long		i_flags;	/* see defined flags below */
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 950752e5ec2c..fc1f24dd0386 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -231,6 +231,7 @@ xfs_initialize_perag(
>  		error = xfs_iunlink_init(pag);
>  		if (error)
>  			goto out_hash_destroy;
> +		spin_lock_init(&pag->pag_state_lock);
>  	}
>  
>  	index = xfs_set_inode_alloc(mp, agcount);
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 15dc02964113..63bbafb01eb5 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -60,6 +60,13 @@ struct xfs_error_cfg {
>  typedef struct xfs_mount {
>  	struct super_block	*m_super;
>  	xfs_tid_t		m_tid;		/* next unused tid for fs */
> +
> +	/*
> +	 * Bitset of unhealthy per-fs metadata.
> +	 * Callers must hold m_sb_lock to access this field.
> +	 */
> +	unsigned int		m_sick;
> +
>  	struct xfs_ail		*m_ail;		/* fs active log item list */
>  
>  	struct xfs_sb		m_sb;		/* copy of fs superblock */
> @@ -71,6 +78,11 @@ typedef struct xfs_mount {
>  	struct xfs_buf		*m_sb_bp;	/* buffer for superblock */
>  	char			*m_fsname;	/* filesystem name */
>  	int			m_fsname_len;	/* strlen of fs name */
> +	/*
> +	 * Bitset of unhealthy rt volume metadata.
> +	 * Callers must hold m_sb_lock to access this field.
> +	 */
> +	unsigned int		m_rt_sick;
>  	char			*m_rtname;	/* realtime device name */
>  	char			*m_logname;	/* external log device name */
>  	int			m_bsize;	/* fs logical block size */
> @@ -389,6 +401,17 @@ typedef struct xfs_perag {
>  	 * or have some other means to control concurrency.
>  	 */
>  	struct rhashtable	pagi_unlinked_hash;
> +
> +	/* Spinlock to protect in-core per-ag state */
> +	spinlock_t	pag_state_lock;
> +
> +	/*
> +	 * Bitset of unhealthy AG metadata.
> +	 *
> +	 * Callers should hold pag_state_lock and the relevant AG header buffer
> +	 * lock before accessing this field.
> +	 */
> +	unsigned int	pag_sick;
>  } xfs_perag_t;
>  
>  static inline struct xfs_ag_resv *
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 47fb07d86efd..f079841c7af6 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3440,6 +3440,79 @@ DEFINE_AGINODE_EVENT(xfs_iunlink);
>  DEFINE_AGINODE_EVENT(xfs_iunlink_remove);
>  DEFINE_AG_EVENT(xfs_iunlink_map_prev_fallback);
>  
> +DECLARE_EVENT_CLASS(xfs_fs_corrupt_class,
> +	TP_PROTO(struct xfs_mount *mp, unsigned int flags),
> +	TP_ARGS(mp, flags),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(unsigned int, flags)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->flags = flags;
> +	),
> +	TP_printk("dev %d:%d flags 0x%x",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->flags)
> +);
> +#define DEFINE_FS_CORRUPT_EVENT(name)	\
> +DEFINE_EVENT(xfs_fs_corrupt_class, name,	\
> +	TP_PROTO(struct xfs_mount *mp, unsigned int flags), \
> +	TP_ARGS(mp, flags))
> +DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_sick);
> +DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_healthy);
> +DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_sick);
> +DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_healthy);
> +
> +DECLARE_EVENT_CLASS(xfs_ag_corrupt_class,
> +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int flags),
> +	TP_ARGS(mp, agno, flags),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_agnumber_t, agno)
> +		__field(unsigned int, flags)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->agno = agno;
> +		__entry->flags = flags;
> +	),
> +	TP_printk("dev %d:%d agno %u flags 0x%x",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->agno, __entry->flags)
> +);
> +#define DEFINE_AG_CORRUPT_EVENT(name)	\
> +DEFINE_EVENT(xfs_ag_corrupt_class, name,	\
> +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
> +		 unsigned int flags), \
> +	TP_ARGS(mp, agno, flags))
> +DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick);
> +DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy);
> +
> +DECLARE_EVENT_CLASS(xfs_inode_corrupt_class,
> +	TP_PROTO(struct xfs_inode *ip, unsigned int flags),
> +	TP_ARGS(ip, flags),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_ino_t, ino)
> +		__field(unsigned int, flags)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = ip->i_mount->m_super->s_dev;
> +		__entry->ino = ip->i_ino;
> +		__entry->flags = flags;
> +	),
> +	TP_printk("dev %d:%d ino 0x%llx flags 0x%x",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->ino, __entry->flags)
> +);
> +#define DEFINE_INODE_CORRUPT_EVENT(name)	\
> +DEFINE_EVENT(xfs_inode_corrupt_class, name,	\
> +	TP_PROTO(struct xfs_inode *ip, unsigned int flags), \
> +	TP_ARGS(ip, flags))
> +DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_sick);
> +DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_healthy);
> +
>  #endif /* _TRACE_XFS_H */
>  
>  #undef TRACE_INCLUDE_PATH
> 


* Re: [PATCH 02/10] xfs: replace the BAD_SUMMARY mount flag with the equivalent health code
  2019-04-01 17:10 ` [PATCH 02/10] xfs: replace the BAD_SUMMARY mount flag with the equivalent health code Darrick J. Wong
@ 2019-04-02 13:22   ` Brian Foster
  0 siblings, 0 replies; 41+ messages in thread
From: Brian Foster @ 2019-04-02 13:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Apr 01, 2019 at 10:10:21AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Replace the BAD_SUMMARY mount flag with calls to the equivalent health
> tracking code.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_sb.c |    5 +++--
>  fs/xfs/xfs_log.c       |    3 ++-
>  fs/xfs/xfs_mount.c     |    9 ++++-----
>  fs/xfs/xfs_mount.h     |    1 -
>  4 files changed, 9 insertions(+), 9 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index f96b1997938e..f0309b74e377 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -30,6 +30,7 @@
>  #include "xfs_refcount_btree.h"
>  #include "xfs_da_format.h"
>  #include "xfs_da_btree.h"
> +#include "xfs_health.h"
>  
>  /*
>   * Physical superblock buffer manipulations. Shared with libxfs in userspace.
> @@ -907,7 +908,7 @@ xfs_initialize_perag_data(
>  	/*
>  	 * If the new summary counts are obviously incorrect, fail the
>  	 * mount operation because that implies the AGFs are also corrupt.
> -	 * Clear BAD_SUMMARY so that we don't unmount with a dirty log, which
> +	 * Clear FS_COUNTERS so that we don't unmount with a dirty log, which
>  	 * will prevent xfs_repair from fixing anything.
>  	 */
>  	if (fdblocks > sbp->sb_dblocks || ifree > ialloc) {
> @@ -925,7 +926,7 @@ xfs_initialize_perag_data(
>  
>  	xfs_reinit_percpu_counters(mp);
>  out:
> -	mp->m_flags &= ~XFS_MOUNT_BAD_SUMMARY;
> +	xfs_fs_mark_healthy(mp, XFS_HEALTH_FS_COUNTERS);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index c3b610b687d1..0f418842a035 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -23,6 +23,7 @@
>  #include "xfs_cksum.h"
>  #include "xfs_sysfs.h"
>  #include "xfs_sb.h"
> +#include "xfs_health.h"
>  
>  kmem_zone_t	*xfs_log_ticket_zone;
>  
> @@ -861,7 +862,7 @@ xfs_log_write_unmount_record(
>  	 * recalculated during log recovery at next mount.  Refer to
>  	 * xlog_check_unmount_rec for more details.
>  	 */
> -	if (XFS_TEST_ERROR((mp->m_flags & XFS_MOUNT_BAD_SUMMARY), mp,
> +	if (XFS_TEST_ERROR(xfs_fs_is_sick(mp, XFS_HEALTH_FS_COUNTERS), mp,
>  			XFS_ERRTAG_FORCE_SUMMARY_RECALC)) {
>  		xfs_alert(mp, "%s: will fix summary counters at next mount",
>  				__func__);
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index fc1f24dd0386..a43ca655a431 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -34,6 +34,7 @@
>  #include "xfs_refcount_btree.h"
>  #include "xfs_reflink.h"
>  #include "xfs_extent_busy.h"
> +#include "xfs_health.h"
>  
>  
>  static DEFINE_MUTEX(xfs_uuid_table_mutex);
> @@ -647,7 +648,7 @@ xfs_check_summary_counts(
>  	    (mp->m_sb.sb_fdblocks > mp->m_sb.sb_dblocks ||
>  	     !xfs_verify_icount(mp, mp->m_sb.sb_icount) ||
>  	     mp->m_sb.sb_ifree > mp->m_sb.sb_icount))
> -		mp->m_flags |= XFS_MOUNT_BAD_SUMMARY;
> +		xfs_fs_mark_sick(mp, XFS_HEALTH_FS_COUNTERS);
>  
>  	/*
>  	 * We can safely re-initialise incore superblock counters from the
> @@ -662,7 +663,7 @@ xfs_check_summary_counts(
>  	 */
>  	if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
>  	     XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
> -	    !(mp->m_flags & XFS_MOUNT_BAD_SUMMARY))
> +	    !xfs_fs_is_sick(mp, XFS_HEALTH_FS_COUNTERS))
>  		return 0;
>  
>  	return xfs_initialize_perag_data(mp, mp->m_sb.sb_agcount);
> @@ -1451,7 +1452,5 @@ xfs_force_summary_recalc(
>  	if (!xfs_sb_version_haslazysbcount(&mp->m_sb))
>  		return;
>  
> -	spin_lock(&mp->m_sb_lock);
> -	mp->m_flags |= XFS_MOUNT_BAD_SUMMARY;
> -	spin_unlock(&mp->m_sb_lock);
> +	xfs_fs_mark_sick(mp, XFS_HEALTH_FS_COUNTERS);
>  }
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 63bbafb01eb5..6e7728340ca7 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -211,7 +211,6 @@ typedef struct xfs_mount {
>  						   must be synchronous except
>  						   for space allocations */
>  #define XFS_MOUNT_UNMOUNTING	(1ULL << 1)	/* filesystem is unmounting */
> -#define XFS_MOUNT_BAD_SUMMARY	(1ULL << 2)	/* summary counters are bad */
>  #define XFS_MOUNT_WAS_CLEAN	(1ULL << 3)
>  #define XFS_MOUNT_FS_SHUTDOWN	(1ULL << 4)	/* atomic stop of all filesystem
>  						   operations, typically for
> 


* Re: [PATCH 03/10] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem
  2019-04-01 17:10 ` [PATCH 03/10] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem Darrick J. Wong
@ 2019-04-02 13:24   ` Brian Foster
  2019-04-02 13:40     ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Brian Foster @ 2019-04-02 13:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Apr 01, 2019 at 10:10:28AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> If we know the filesystem metadata isn't healthy during unmount, we want
> to encourage the administrator to run xfs_repair right away.  We can't
> do this if BAD_SUMMARY will cause an unclean log unmount to force
> summary recalculation, so turn it off if the fs is bad.
> 

Do you mean we don't want to suggest xfs_repair because we intentionally
cause a dirty log, and thus xfs_repair would have to zap it? If so, the
wording above and the comment in xfs_health_unmount() could be a bit
more specific about the reasoning.

Also, what exactly is the side effect without this change in place? The
user would have to zap the log from xfs_repair, but the somewhat
artificial unclean unmount doesn't actually require log recovery to fix
up the fs outside of the whole summary counter thing, right? IOW, would
the user zapping the log actually lose anything besides the bad summary
counter indication? I ask just because even though we warn the user to
run repair, that doesn't mean they'll actually do it and so it seems
there is a bit of a tradeoff in that regard.
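
For reference, a rough userspace model of the interaction being asked about,
based only on the hunks in this series.  The DEMO_* bits and demo_* functions
are hypothetical stand-ins, and the model assumes the health check runs
before the unmount record is written.

```c
#include <stdbool.h>

#define DEMO_FS_COUNTERS	(1U << 0)	/* stand-in for XFS_HEALTH_FS_COUNTERS */
#define DEMO_FS_OTHER		(1U << 1)	/* stand-in for any other fs sick bit */

/*
 * Shape of xfs_health_unmount(): if anything is sick we warn, and then
 * drop the counters bit so that it no longer forces a dirty log.
 * other_sick models unfixed AG or realtime problems.
 */
static unsigned int demo_health_unmount(unsigned int fs_sick, bool other_sick)
{
	bool	warn = other_sick || fs_sick != 0;

	if (warn)
		fs_sick &= ~DEMO_FS_COUNTERS;
	return fs_sick;
}

/* Shape of the test in xfs_log_write_unmount_record(). */
static bool demo_leaves_log_dirty(unsigned int fs_sick)
{
	return (fs_sick & DEMO_FS_COUNTERS) != 0;
}
```

Under this reading, a mask with only the counters bit set is cleared as
well, because any nonzero fs mask sets warn.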

> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

BTW, I get the following compiler warning on this patch:

In file included from fs/xfs/xfs_trace.h:12,
                 from fs/xfs/xfs_health.c:19:
fs/xfs/xfs_health.c: In function ‘xfs_health_unmount’:
./include/linux/tracepoint.h:195:6: warning: ‘sick’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     ((void(*)(proto))(it_func))(args); \
      ^
fs/xfs/xfs_health.c:33:16: note: ‘sick’ was declared here
  unsigned int  sick;
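
The warning appears to come from the AG loop in xfs_health_unmount(), which
hands 'sick' to the tracepoint before anything has been assigned to it.  One
possible fix, sketched here in standalone C with stand-in types (the demo_*
names are hypothetical and locking is omitted), is to sample the per-AG mask
into 'sick' first; in the patch that would presumably be a call to
xfs_ag_measure_sickness(pag).

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_ag {
	unsigned int	sick;	/* stand-in for pag->pag_sick */
};

/* Stand-in for the tracepoint; remembers the last value it was given. */
static unsigned int demo_last_traced;

static void demo_trace_unfixed(unsigned int sick)
{
	demo_last_traced = sick;
}

/* Returns true if any AG still has unfixed problems. */
static bool demo_measure_ags(const struct demo_ag *ags, size_t nr)
{
	unsigned int	sick;
	bool		warn = false;
	size_t		i;

	for (i = 0; i < nr; i++) {
		sick = ags[i].sick;	/* assign before use */
		if (sick) {
			demo_trace_unfixed(sick);
			warn = true;
		}
	}
	return warn;
}
```

Because 'sick' is now written on every iteration before it is read, the
tracepoint always sees the mask of the AG being reported.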

Brian

>  fs/xfs/libxfs/xfs_health.h |    2 +
>  fs/xfs/xfs_health.c        |   59 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_mount.c         |    2 +
>  fs/xfs/xfs_trace.h         |    3 ++
>  4 files changed, 66 insertions(+)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
> index 0d51bd2689ea..269b124dc1d7 100644
> --- a/fs/xfs/libxfs/xfs_health.h
> +++ b/fs/xfs/libxfs/xfs_health.h
> @@ -148,6 +148,8 @@ void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask);
>  void xfs_inode_mark_healthy(struct xfs_inode *ip, unsigned int mask);
>  unsigned int xfs_inode_measure_sickness(struct xfs_inode *ip);
>  
> +void xfs_health_unmount(struct xfs_mount *mp);
> +
>  /* Now some helpers. */
>  
>  static inline bool
> diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> index e9d6859f7501..6e2da858c356 100644
> --- a/fs/xfs/xfs_health.c
> +++ b/fs/xfs/xfs_health.c
> @@ -19,6 +19,65 @@
>  #include "xfs_trace.h"
>  #include "xfs_health.h"
>  
> +/*
> + * Warn about metadata corruption that we detected but haven't fixed, and
> + * make sure we're not sitting on anything that would get in the way of
> + * recovery.
> + */
> +void
> +xfs_health_unmount(
> +	struct xfs_mount	*mp)
> +{
> +	struct xfs_perag	*pag;
> +	xfs_agnumber_t		agno;
> +	unsigned int		sick;
> +	bool			warn = false;
> +
> +	if (XFS_FORCED_SHUTDOWN(mp))
> +		return;
> +
> +	/* Measure AG corruption levels. */
> +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> +		pag = xfs_perag_get(mp, agno);
> +		spin_lock(&pag->pag_state_lock);
> +		if (pag->pag_sick) {
> +			trace_xfs_ag_unfixed_corruption(mp, agno, sick);
> +			warn = true;
> +		}
> +		spin_unlock(&pag->pag_state_lock);
> +		xfs_perag_put(pag);
> +	}
> +
> +	/* Measure realtime volume corruption levels. */
> +	sick = xfs_rt_measure_sickness(mp);
> +	if (sick) {
> +		trace_xfs_rt_unfixed_corruption(mp, sick);
> +		warn = true;
> +	}
> +
> +	/* Measure fs corruption and keep the sample around for the warning. */
> +	sick = xfs_fs_measure_sickness(mp);
> +	if (sick) {
> +		trace_xfs_fs_unfixed_corruption(mp, sick);
> +		warn = true;
> +	}
> +
> +	if (warn) {
> +		xfs_warn(mp,
> +"Uncorrected metadata errors detected; please run xfs_repair.");
> +
> +		/*
> +		 * If we have unhealthy metadata, we want the admin to run
> +		 * xfs_repair after unmounting.  They can't do that if the log
> +		 * is written out without a clean unmount record (such as when
> +		 * the summary counters are marked unhealthy to force
> +		 * recalculation of the summary counters) so clear it.
> +		 */
> +		if (sick & XFS_HEALTH_FS_COUNTERS)
> +			xfs_fs_mark_healthy(mp, XFS_HEALTH_FS_COUNTERS);
> +	}
> +}
> +
>  /* Mark unhealthy per-fs metadata. */
>  void
>  xfs_fs_mark_sick(
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index a43ca655a431..f0f73d598a0c 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1075,6 +1075,7 @@ xfs_mountfs(
>  	 */
>  	cancel_delayed_work_sync(&mp->m_reclaim_work);
>  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> +	xfs_health_unmount(mp);
>   out_log_dealloc:
>  	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
>  	xfs_log_mount_cancel(mp);
> @@ -1157,6 +1158,7 @@ xfs_unmountfs(
>  	 */
>  	cancel_delayed_work_sync(&mp->m_reclaim_work);
>  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> +	xfs_health_unmount(mp);
>  
>  	xfs_qm_unmount(mp);
>  
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index f079841c7af6..2464ea351f83 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3461,8 +3461,10 @@ DEFINE_EVENT(xfs_fs_corrupt_class, name,	\
>  	TP_ARGS(mp, flags))
>  DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_sick);
>  DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_healthy);
> +DEFINE_FS_CORRUPT_EVENT(xfs_fs_unfixed_corruption);
>  DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_sick);
>  DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_healthy);
> +DEFINE_FS_CORRUPT_EVENT(xfs_rt_unfixed_corruption);
>  
>  DECLARE_EVENT_CLASS(xfs_ag_corrupt_class,
>  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int flags),
> @@ -3488,6 +3490,7 @@ DEFINE_EVENT(xfs_ag_corrupt_class, name,	\
>  	TP_ARGS(mp, agno, flags))
>  DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick);
>  DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy);
> +DEFINE_AG_CORRUPT_EVENT(xfs_ag_unfixed_corruption);
>  
>  DECLARE_EVENT_CLASS(xfs_inode_corrupt_class,
>  	TP_PROTO(struct xfs_inode *ip, unsigned int flags),
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 01/10] xfs: track metadata health levels
  2019-04-02 13:22   ` Brian Foster
@ 2019-04-02 13:30     ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-02 13:30 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Apr 02, 2019 at 09:22:40AM -0400, Brian Foster wrote:
> On Mon, Apr 01, 2019 at 10:10:15AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Add the necessary in-core metadata fields to keep track of which parts
> > of the filesystem have been observed to be unhealthy, and print a
> > warning at unmount time if we have unfixed problems.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/Makefile            |    1 
> >  fs/xfs/libxfs/xfs_health.h |  201 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_health.c        |  192 ++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_inode.h         |    7 ++
> >  fs/xfs/xfs_mount.c         |    1 
> >  fs/xfs/xfs_mount.h         |   23 +++++
> >  fs/xfs/xfs_trace.h         |   73 ++++++++++++++++
> >  7 files changed, 498 insertions(+)
> >  create mode 100644 fs/xfs/libxfs/xfs_health.h
> >  create mode 100644 fs/xfs/xfs_health.c
> > 
> > 
> ...
> > diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
> > new file mode 100644
> > index 000000000000..0d51bd2689ea
> > --- /dev/null
> > +++ b/fs/xfs/libxfs/xfs_health.h
> > @@ -0,0 +1,201 @@
> ...
> > +/* Secondary state related to (but not primary evidence of) health problems. */
> > +#define XFS_HEALTH_FS_SECONDARY	(0)
> > +#define XFS_HEALTH_RT_SECONDARY	(0)
> > +#define XFS_HEALTH_AG_SECONDARY	(0)
> > +#define XFS_HEALTH_INO_SECONDARY (0)
> > +
> > +/* Evidence of health problems elsewhere. */
> > +#define XFS_HEALTH_FS_INDIRECT	(0)
> > +#define XFS_HEALTH_RT_INDIRECT	(0)
> > +#define XFS_HEALTH_AG_INDIRECT	(0)
> > +#define XFS_HEALTH_INO_INDIRECT	(0)
> > +
> 
> I'm a little confused by the secondary tracking logic in general. Some
> of these masks are cleared in the associated helpers below (i.e., when
> the fs is made healthy and all primary bits are cleared), but there are
> no secondary or indirect bits in use. On a quick look ahead, these are
> still zeroed as of the end of this series as well. Can we defer this
> secondary tracking logic until there's a demonstrated use? Otherwise the
> rest looks reasonable to me.

Doh, I forgot that I don't start using the indirect flags until much
later (specifically, repair part 2) so all this can fall out until then.
The future use for indirect flags is so that the AG health flags can
remember if we inactivated an inode that had primary health flags set.

But we don't need that part yet, so I think all of that can drop out until
that later series.
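
For reference, here is a rough userspace sketch of the mask semantics under
discussion.  It is only an illustration: the demo_* names and the bit values
are made up, and DEMO_HEALTH_SECONDARY is given a nonzero value purely so the
clearing step has something to do (the posted patch defines every SECONDARY
and INDIRECT mask as 0, which makes that step a no-op).  Locking is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit assignments, for illustration only. */
#define DEMO_HEALTH_COUNTERS	(1u << 0)	/* a primary flag */
#define DEMO_HEALTH_QUOTA	(1u << 1)	/* another primary flag */
#define DEMO_HEALTH_PRIMARY	(DEMO_HEALTH_COUNTERS | DEMO_HEALTH_QUOTA)
#define DEMO_HEALTH_SECONDARY	(1u << 8)	/* 0 in the posted patch */
#define DEMO_HEALTH_ALL		(DEMO_HEALTH_PRIMARY | DEMO_HEALTH_SECONDARY)

/* Stand-in for the in-core state; the kernel guards it with a spinlock. */
struct demo_mount {
	unsigned int	sick;
};

/* Record that some class of metadata was observed to be unhealthy. */
static inline void
demo_mark_sick(struct demo_mount *mp, unsigned int mask)
{
	assert(!(mask & ~DEMO_HEALTH_ALL));
	mp->sick |= mask;
}

/* Clear flags after a successful check or repair. */
static inline void
demo_mark_healthy(struct demo_mount *mp, unsigned int mask)
{
	assert(!(mask & ~DEMO_HEALTH_ALL));
	mp->sick &= ~mask;
	/* Once no primary flag remains, drop the secondary state as well. */
	if (!(mp->sick & DEMO_HEALTH_PRIMARY))
		mp->sick &= ~DEMO_HEALTH_SECONDARY;
}

/* Test whether any of the flags in the mask are set. */
static inline bool
demo_is_sick(const struct demo_mount *mp, unsigned int mask)
{
	return (mp->sick & mask) != 0;
}
```

With these definitions, marking COUNTERS and SECONDARY sick and then marking
only COUNTERS healthy leaves the whole bitset clear, because the last primary
flag going away takes the secondary bit with it.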

--D

> Brian
> 
> > +/* All health masks. */
> > +#define XFS_HEALTH_FS_ALL	(XFS_HEALTH_FS_PRIMARY | \
> > +				 XFS_HEALTH_FS_SECONDARY | \
> > +				 XFS_HEALTH_FS_INDIRECT)
> > +
> > +#define XFS_HEALTH_RT_ALL	(XFS_HEALTH_RT_PRIMARY | \
> > +				 XFS_HEALTH_RT_SECONDARY | \
> > +				 XFS_HEALTH_RT_INDIRECT)
> > +
> > +#define XFS_HEALTH_AG_ALL	(XFS_HEALTH_AG_PRIMARY | \
> > +				 XFS_HEALTH_AG_SECONDARY | \
> > +				 XFS_HEALTH_AG_INDIRECT)
> > +
> > +#define XFS_HEALTH_INO_ALL	(XFS_HEALTH_INO_PRIMARY | \
> > +				 XFS_HEALTH_INO_SECONDARY | \
> > +				 XFS_HEALTH_INO_INDIRECT)
> > +
> > +/* These functions must be provided by the xfs implementation. */
> > +
> > +void xfs_fs_mark_sick(struct xfs_mount *mp, unsigned int mask);
> > +void xfs_fs_mark_healthy(struct xfs_mount *mp, unsigned int mask);
> > +unsigned int xfs_fs_measure_sickness(struct xfs_mount *mp);
> > +
> > +void xfs_rt_mark_sick(struct xfs_mount *mp, unsigned int mask);
> > +void xfs_rt_mark_healthy(struct xfs_mount *mp, unsigned int mask);
> > +unsigned int xfs_rt_measure_sickness(struct xfs_mount *mp);
> > +
> > +void xfs_ag_mark_sick(struct xfs_perag *pag, unsigned int mask);
> > +void xfs_ag_mark_healthy(struct xfs_perag *pag, unsigned int mask);
> > +unsigned int xfs_ag_measure_sickness(struct xfs_perag *pag);
> > +
> > +void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask);
> > +void xfs_inode_mark_healthy(struct xfs_inode *ip, unsigned int mask);
> > +unsigned int xfs_inode_measure_sickness(struct xfs_inode *ip);
> > +
> > +/* Now some helpers. */
> > +
> > +static inline bool
> > +xfs_fs_is_sick(struct xfs_mount *mp, unsigned int mask)
> > +{
> > +	return (xfs_fs_measure_sickness(mp) & mask) != 0;
> > +}
> > +
> > +static inline bool
> > +xfs_rt_is_sick(struct xfs_mount *mp, unsigned int mask)
> > +{
> > +	return (xfs_rt_measure_sickness(mp) & mask) != 0;
> > +}
> > +
> > +static inline bool
> > +xfs_ag_is_sick(struct xfs_perag *pag, unsigned int mask)
> > +{
> > +	return (xfs_ag_measure_sickness(pag) & mask) != 0;
> > +}
> > +
> > +static inline bool
> > +xfs_inode_is_sick(struct xfs_inode *ip, unsigned int mask)
> > +{
> > +	return (xfs_inode_measure_sickness(ip) & mask) != 0;
> > +}
> > +
> > +static inline bool
> > +xfs_fs_healthy(struct xfs_mount *mp)
> > +{
> > +	return xfs_fs_measure_sickness(mp) == 0;
> > +}
> > +
> > +static inline bool
> > +xfs_rt_healthy(struct xfs_mount *mp)
> > +{
> > +	return xfs_rt_measure_sickness(mp) == 0;
> > +}
> > +
> > +static inline bool
> > +xfs_ag_healthy(struct xfs_perag *pag)
> > +{
> > +	return xfs_ag_measure_sickness(pag) == 0;
> > +}
> > +
> > +static inline bool
> > +xfs_inode_healthy(struct xfs_inode *ip)
> > +{
> > +	return xfs_inode_measure_sickness(ip) == 0;
> > +}
> > +
> > +#endif	/* __XFS_HEALTH_H__ */
> > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> > new file mode 100644
> > index 000000000000..e9d6859f7501
> > --- /dev/null
> > +++ b/fs/xfs/xfs_health.c
> > @@ -0,0 +1,192 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/*
> > + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + */
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_format.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_bit.h"
> > +#include "xfs_sb.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_defer.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> > +#include "xfs_inode.h"
> > +#include "xfs_trace.h"
> > +#include "xfs_health.h"
> > +
> > +/* Mark unhealthy per-fs metadata. */
> > +void
> > +xfs_fs_mark_sick(
> > +	struct xfs_mount	*mp,
> > +	unsigned int		mask)
> > +{
> > +	ASSERT(!(mask & ~XFS_HEALTH_FS_ALL));
> > +	trace_xfs_fs_mark_sick(mp, mask);
> > +
> > +	spin_lock(&mp->m_sb_lock);
> > +	mp->m_sick |= mask;
> > +	spin_unlock(&mp->m_sb_lock);
> > +}
> > +
> > +/* Mark a per-fs metadata healed. */
> > +void
> > +xfs_fs_mark_healthy(
> > +	struct xfs_mount	*mp,
> > +	unsigned int		mask)
> > +{
> > +	ASSERT(!(mask & ~XFS_HEALTH_FS_ALL));
> > +	trace_xfs_fs_mark_healthy(mp, mask);
> > +
> > +	spin_lock(&mp->m_sb_lock);
> > +	mp->m_sick &= ~mask;
> > +	if (!(mp->m_sick & XFS_HEALTH_FS_PRIMARY))
> > +		mp->m_sick &= ~XFS_HEALTH_FS_SECONDARY;
> > +	spin_unlock(&mp->m_sb_lock);
> > +}
> > +
> > +/* Sample which per-fs metadata are unhealthy. */
> > +unsigned int
> > +xfs_fs_measure_sickness(
> > +	struct xfs_mount	*mp)
> > +{
> > +	unsigned int		ret;
> > +
> > +	spin_lock(&mp->m_sb_lock);
> > +	ret = mp->m_sick;
> > +	spin_unlock(&mp->m_sb_lock);
> > +	return ret;
> > +}
> > +
> > +/* Mark unhealthy realtime metadata. */
> > +void
> > +xfs_rt_mark_sick(
> > +	struct xfs_mount	*mp,
> > +	unsigned int		mask)
> > +{
> > +	ASSERT(!(mask & ~XFS_HEALTH_RT_ALL));
> > +	trace_xfs_rt_mark_sick(mp, mask);
> > +
> > +	spin_lock(&mp->m_sb_lock);
> > +	mp->m_rt_sick |= mask;
> > +	spin_unlock(&mp->m_sb_lock);
> > +}
> > +
> > +/* Mark a realtime metadata healed. */
> > +void
> > +xfs_rt_mark_healthy(
> > +	struct xfs_mount	*mp,
> > +	unsigned int		mask)
> > +{
> > +	ASSERT(!(mask & ~XFS_HEALTH_RT_ALL));
> > +	trace_xfs_rt_mark_healthy(mp, mask);
> > +
> > +	spin_lock(&mp->m_sb_lock);
> > +	mp->m_rt_sick &= ~mask;
> > +	if (!(mp->m_rt_sick & XFS_HEALTH_RT_PRIMARY))
> > +		mp->m_rt_sick &= ~XFS_HEALTH_RT_SECONDARY;
> > +	spin_unlock(&mp->m_sb_lock);
> > +}
> > +
> > +/* Sample which realtime metadata are unhealthy. */
> > +unsigned int
> > +xfs_rt_measure_sickness(
> > +	struct xfs_mount	*mp)
> > +{
> > +	unsigned int		ret;
> > +
> > +	spin_lock(&mp->m_sb_lock);
> > +	ret = mp->m_rt_sick;
> > +	spin_unlock(&mp->m_sb_lock);
> > +	return ret;
> > +}
> > +
> > +/* Mark unhealthy per-ag metadata. */
> > +void
> > +xfs_ag_mark_sick(
> > +	struct xfs_perag	*pag,
> > +	unsigned int		mask)
> > +{
> > +	ASSERT(!(mask & ~XFS_HEALTH_AG_ALL));
> > +	trace_xfs_ag_mark_sick(pag->pag_mount, pag->pag_agno, mask);
> > +
> > +	spin_lock(&pag->pag_state_lock);
> > +	pag->pag_sick |= mask;
> > +	spin_unlock(&pag->pag_state_lock);
> > +}
> > +
> > +/* Mark per-ag metadata ok. */
> > +void
> > +xfs_ag_mark_healthy(
> > +	struct xfs_perag	*pag,
> > +	unsigned int		mask)
> > +{
> > +	ASSERT(!(mask & ~XFS_HEALTH_AG_ALL));
> > +	trace_xfs_ag_mark_healthy(pag->pag_mount, pag->pag_agno, mask);
> > +
> > +	spin_lock(&pag->pag_state_lock);
> > +	pag->pag_sick &= ~mask;
> > +	if (!(pag->pag_sick & XFS_HEALTH_AG_PRIMARY))
> > +		pag->pag_sick &= ~XFS_HEALTH_AG_SECONDARY;
> > +	spin_unlock(&pag->pag_state_lock);
> > +}
> > +
> > +/* Sample which per-ag metadata are unhealthy. */
> > +unsigned int
> > +xfs_ag_measure_sickness(
> > +	struct xfs_perag	*pag)
> > +{
> > +	unsigned int		ret;
> > +
> > +	spin_lock(&pag->pag_state_lock);
> > +	ret = pag->pag_sick;
> > +	spin_unlock(&pag->pag_state_lock);
> > +	return ret;
> > +}
> > +
> > +/* Mark the unhealthy parts of an inode. */
> > +void
> > +xfs_inode_mark_sick(
> > +	struct xfs_inode	*ip,
> > +	unsigned int		mask)
> > +{
> > +	ASSERT(!(mask & ~XFS_HEALTH_INO_ALL));
> > +	trace_xfs_inode_mark_sick(ip, mask);
> > +
> > +	spin_lock(&ip->i_flags_lock);
> > +	ip->i_sick |= mask;
> > +	spin_unlock(&ip->i_flags_lock);
> > +}
> > +
> > +/* Mark parts of an inode healed. */
> > +void
> > +xfs_inode_mark_healthy(
> > +	struct xfs_inode	*ip,
> > +	unsigned int		mask)
> > +{
> > +	ASSERT(!(mask & ~XFS_HEALTH_INO_ALL));
> > +	trace_xfs_inode_mark_healthy(ip, mask);
> > +
> > +	spin_lock(&ip->i_flags_lock);
> > +	ip->i_sick &= ~mask;
> > +	if (!(ip->i_sick & XFS_HEALTH_INO_PRIMARY))
> > +		ip->i_sick &= ~XFS_HEALTH_INO_SECONDARY;
> > +	spin_unlock(&ip->i_flags_lock);
> > +}
> > +
> > +/* Sample which parts of an inode are unhealthy. */
> > +unsigned int
> > +xfs_inode_measure_sickness(
> > +	struct xfs_inode	*ip)
> > +{
> > +	unsigned int		ret;
> > +
> > +	spin_lock(&ip->i_flags_lock);
> > +	ret = ip->i_sick;
> > +	spin_unlock(&ip->i_flags_lock);
> > +	return ret;
> > +}
> > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > index 88239c2dd824..877acdd5f026 100644
> > --- a/fs/xfs/xfs_inode.h
> > +++ b/fs/xfs/xfs_inode.h
> > @@ -45,6 +45,13 @@ typedef struct xfs_inode {
> >  	mrlock_t		i_lock;		/* inode lock */
> >  	mrlock_t		i_mmaplock;	/* inode mmap IO lock */
> >  	atomic_t		i_pincount;	/* inode pin count */
> > +
> > +	/*
> > +	 * Bitset noting which parts of an inode are not healthy.
> > +	 * Callers must hold i_flags_lock before accessing this field.
> > +	 */
> > +	unsigned int		i_sick;
> > +
> >  	spinlock_t		i_flags_lock;	/* inode i_flags lock */
> >  	/* Miscellaneous state. */
> >  	unsigned long		i_flags;	/* see defined flags below */
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index 950752e5ec2c..fc1f24dd0386 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -231,6 +231,7 @@ xfs_initialize_perag(
> >  		error = xfs_iunlink_init(pag);
> >  		if (error)
> >  			goto out_hash_destroy;
> > +		spin_lock_init(&pag->pag_state_lock);
> >  	}
> >  
> >  	index = xfs_set_inode_alloc(mp, agcount);
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index 15dc02964113..63bbafb01eb5 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -60,6 +60,13 @@ struct xfs_error_cfg {
> >  typedef struct xfs_mount {
> >  	struct super_block	*m_super;
> >  	xfs_tid_t		m_tid;		/* next unused tid for fs */
> > +
> > +	/*
> > +	 * Bitset of unhealthy per-fs metadata.
> > +	 * Callers must hold m_sb_lock to access this field.
> > +	 */
> > +	unsigned int		m_sick;
> > +
> >  	struct xfs_ail		*m_ail;		/* fs active log item list */
> >  
> >  	struct xfs_sb		m_sb;		/* copy of fs superblock */
> > @@ -71,6 +78,11 @@ typedef struct xfs_mount {
> >  	struct xfs_buf		*m_sb_bp;	/* buffer for superblock */
> >  	char			*m_fsname;	/* filesystem name */
> >  	int			m_fsname_len;	/* strlen of fs name */
> > +	/*
> > +	 * Bitset of unhealthy rt volume metadata.
> > +	 * Callers must hold m_sb_lock to access this field.
> > +	 */
> > +	unsigned int		m_rt_sick;
> >  	char			*m_rtname;	/* realtime device name */
> >  	char			*m_logname;	/* external log device name */
> >  	int			m_bsize;	/* fs logical block size */
> > @@ -389,6 +401,17 @@ typedef struct xfs_perag {
> >  	 * or have some other means to control concurrency.
> >  	 */
> >  	struct rhashtable	pagi_unlinked_hash;
> > +
> > +	/* Spinlock to protect in-core per-ag state */
> > +	spinlock_t	pag_state_lock;
> > +
> > +	/*
> > +	 * Bitset of unhealthy AG metadata.
> > +	 *
> > +	 * Callers should hold pag_state_lock and the relevant AG header buffer
> > +	 * lock before accessing this field.
> > +	 */
> > +	unsigned int	pag_sick;
> >  } xfs_perag_t;
> >  
> >  static inline struct xfs_ag_resv *
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index 47fb07d86efd..f079841c7af6 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -3440,6 +3440,79 @@ DEFINE_AGINODE_EVENT(xfs_iunlink);
> >  DEFINE_AGINODE_EVENT(xfs_iunlink_remove);
> >  DEFINE_AG_EVENT(xfs_iunlink_map_prev_fallback);
> >  
> > +DECLARE_EVENT_CLASS(xfs_fs_corrupt_class,
> > +	TP_PROTO(struct xfs_mount *mp, unsigned int flags),
> > +	TP_ARGS(mp, flags),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(unsigned int, flags)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = mp->m_super->s_dev;
> > +		__entry->flags = flags;
> > +	),
> > +	TP_printk("dev %d:%d flags 0x%x",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > +		  __entry->flags)
> > +);
> > +#define DEFINE_FS_CORRUPT_EVENT(name)	\
> > +DEFINE_EVENT(xfs_fs_corrupt_class, name,	\
> > +	TP_PROTO(struct xfs_mount *mp, unsigned int flags), \
> > +	TP_ARGS(mp, flags))
> > +DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_sick);
> > +DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_healthy);
> > +DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_sick);
> > +DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_healthy);
> > +
> > +DECLARE_EVENT_CLASS(xfs_ag_corrupt_class,
> > +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int flags),
> > +	TP_ARGS(mp, agno, flags),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(xfs_agnumber_t, agno)
> > +		__field(unsigned int, flags)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = mp->m_super->s_dev;
> > +		__entry->agno = agno;
> > +		__entry->flags = flags;
> > +	),
> > +	TP_printk("dev %d:%d agno %u flags 0x%x",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > +		  __entry->agno, __entry->flags)
> > +);
> > +#define DEFINE_AG_CORRUPT_EVENT(name)	\
> > +DEFINE_EVENT(xfs_ag_corrupt_class, name,	\
> > +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
> > +		 unsigned int flags), \
> > +	TP_ARGS(mp, agno, flags))
> > +DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick);
> > +DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy);
> > +
> > +DECLARE_EVENT_CLASS(xfs_inode_corrupt_class,
> > +	TP_PROTO(struct xfs_inode *ip, unsigned int flags),
> > +	TP_ARGS(ip, flags),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(xfs_ino_t, ino)
> > +		__field(unsigned int, flags)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = ip->i_mount->m_super->s_dev;
> > +		__entry->ino = ip->i_ino;
> > +		__entry->flags = flags;
> > +	),
> > +	TP_printk("dev %d:%d ino 0x%llx flags 0x%x",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > +		  __entry->ino, __entry->flags)
> > +);
> > +#define DEFINE_INODE_CORRUPT_EVENT(name)	\
> > +DEFINE_EVENT(xfs_inode_corrupt_class, name,	\
> > +	TP_PROTO(struct xfs_inode *ip, unsigned int flags), \
> > +	TP_ARGS(ip, flags))
> > +DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_sick);
> > +DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_healthy);
> > +
> >  #endif /* _TRACE_XFS_H */
> >  
> >  #undef TRACE_INCLUDE_PATH
> > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 03/10] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem
  2019-04-02 13:24   ` Brian Foster
@ 2019-04-02 13:40     ` Darrick J. Wong
  2019-04-02 13:53       ` Brian Foster
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-02 13:40 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Apr 02, 2019 at 09:24:45AM -0400, Brian Foster wrote:
> On Mon, Apr 01, 2019 at 10:10:28AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > If we know the filesystem metadata isn't healthy during unmount, we want
> > to encourage the administrator to run xfs_repair right away.  We can't
> > do this if BAD_SUMMARY will cause an unclean log unmount to force
> > summary recalculation, so turn it off if the fs is bad.
> > 
> 
> Do you mean we don't want to suggest xfs_repair because we intentionally
> cause a dirty log and thus xfs_repair will require to zap it? If so, the
> wording above and the comment in xfs_health_unmount() could be a bit
> more specific on the reasoning.

Sort of the opposite?  We want to suggest xfs_repair, but we don't want
to leave the log dirty because that adds the additional step of running
xfs_repair -L to zap the log.  I get the sense that we don't really want
to encourage admins to be cavalier about running that...?

(Would be nice if we could just port log recovery to repair, but that's
a whole separate project...)

> Also, what exactly is the side effect without this change in place? The
> user would have to zap the log from xfs_repair, but the somewhat
> artificial unclean unmount doesn't actually require log recovery to fix
> up the fs outside of the whole summary counter thing, right? IOW, would
> the user zapping the log actually lose anything besides the bad summary
> counter indication?

The log checkpoints at unmount, right?  So I think it's ok to zap the
log when its status is "cleanly unmounted but we didn't record the clean
unmount because we want to force summary recalculation at next mount".

> I ask just because even though we warn the user to
> run repair, that doesn't mean they'll actually do it and so it seems
> there is a bit of a tradeoff in that regard.

Yeah, it's a pity we can't just run xfs_repair ourselves. :)

> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> BTW, I get the following compiler warning on this patch:
> 
> In file included from fs/xfs/xfs_trace.h:12,
>                  from fs/xfs/xfs_health.c:19:
> fs/xfs/xfs_health.c: In function ‘xfs_health_unmount’:
> ./include/linux/tracepoint.h:195:6: warning: ‘sick’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>      ((void(*)(proto))(it_func))(args); \
>       ^
> fs/xfs/xfs_health.c:33:16: note: ‘sick’ was declared here
>   unsigned int  sick;

<nod> Thanks, will fix that for v2.
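
The warning comes from the AG loop, which passes 'sick' to the tracepoint
before anything has been assigned to it.  One plausible shape for a fix is to
sample the per-AG value into the local first and test that.  Below is a
userspace sketch of that pattern only; it is not the v2 patch, and demo_perag,
demo_any_ag_sick and the last_seen out-parameter (which stands in for the
tracepoint argument) are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xfs_perag; only the field we need. */
struct demo_perag {
	unsigned int	sick;
};

/*
 * Walk the AGs and report whether any of them is unhealthy.  *last_seen
 * receives the bitset of the last sick AG visited, or 0 if none was sick.
 */
static inline bool
demo_any_ag_sick(const struct demo_perag *ags, size_t count,
		 unsigned int *last_seen)
{
	unsigned int	sick;
	bool		warn = false;
	size_t		i;

	assert(ags != NULL || count == 0);

	*last_seen = 0;
	for (i = 0; i < count; i++) {
		/* Sample first, so the reported value is always initialized. */
		sick = ags[i].sick;
		if (sick) {
			*last_seen = sick;
			warn = true;
		}
	}
	return warn;
}
```

The point of the sketch is just the ordering: assign, then test and report
the same local, so the compiler can see the value is set on every path.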

--D

> Brian
> 
> >  fs/xfs/libxfs/xfs_health.h |    2 +
> >  fs/xfs/xfs_health.c        |   59 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_mount.c         |    2 +
> >  fs/xfs/xfs_trace.h         |    3 ++
> >  4 files changed, 66 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
> > index 0d51bd2689ea..269b124dc1d7 100644
> > --- a/fs/xfs/libxfs/xfs_health.h
> > +++ b/fs/xfs/libxfs/xfs_health.h
> > @@ -148,6 +148,8 @@ void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask);
> >  void xfs_inode_mark_healthy(struct xfs_inode *ip, unsigned int mask);
> >  unsigned int xfs_inode_measure_sickness(struct xfs_inode *ip);
> >  
> > +void xfs_health_unmount(struct xfs_mount *mp);
> > +
> >  /* Now some helpers. */
> >  
> >  static inline bool
> > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> > index e9d6859f7501..6e2da858c356 100644
> > --- a/fs/xfs/xfs_health.c
> > +++ b/fs/xfs/xfs_health.c
> > @@ -19,6 +19,65 @@
> >  #include "xfs_trace.h"
> >  #include "xfs_health.h"
> >  
> > +/*
> > + * Warn about metadata corruption that we detected but haven't fixed, and
> > + * make sure we're not sitting on anything that would get in the way of
> > + * recovery.
> > + */
> > +void
> > +xfs_health_unmount(
> > +	struct xfs_mount	*mp)
> > +{
> > +	struct xfs_perag	*pag;
> > +	xfs_agnumber_t		agno;
> > +	unsigned int		sick;
> > +	bool			warn = false;
> > +
> > +	if (XFS_FORCED_SHUTDOWN(mp))
> > +		return;
> > +
> > +	/* Measure AG corruption levels. */
> > +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> > +		pag = xfs_perag_get(mp, agno);
> > +		spin_lock(&pag->pag_state_lock);
> > +		if (pag->pag_sick) {
> > +			trace_xfs_ag_unfixed_corruption(mp, agno, sick);
> > +			warn = true;
> > +		}
> > +		spin_unlock(&pag->pag_state_lock);
> > +		xfs_perag_put(pag);
> > +	}
> > +
> > +	/* Measure realtime volume corruption levels. */
> > +	sick = xfs_rt_measure_sickness(mp);
> > +	if (sick) {
> > +		trace_xfs_rt_unfixed_corruption(mp, sick);
> > +		warn = true;
> > +	}
> > +
> > +	/* Measure fs corruption and keep the sample around for the warning. */
> > +	sick = xfs_fs_measure_sickness(mp);
> > +	if (sick) {
> > +		trace_xfs_fs_unfixed_corruption(mp, sick);
> > +		warn = true;
> > +	}
> > +
> > +	if (warn) {
> > +		xfs_warn(mp,
> > +"Uncorrected metadata errors detected; please run xfs_repair.");
> > +
> > +		/*
> > +		 * If we have unhealthy metadata, we want the admin to run
> > +		 * xfs_repair after unmounting.  They can't do that if the log
> > +		 * is written out without a clean unmount record (such as when
> > +		 * the summary counters are marked unhealthy to force
> > +		 * recalculation of the summary counters) so clear it.
> > +		 */
> > +		if (sick & XFS_HEALTH_FS_COUNTERS)
> > +			xfs_fs_mark_healthy(mp, XFS_HEALTH_FS_COUNTERS);
> > +	}
> > +}
> > +
> >  /* Mark unhealthy per-fs metadata. */
> >  void
> >  xfs_fs_mark_sick(
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index a43ca655a431..f0f73d598a0c 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -1075,6 +1075,7 @@ xfs_mountfs(
> >  	 */
> >  	cancel_delayed_work_sync(&mp->m_reclaim_work);
> >  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> > +	xfs_health_unmount(mp);
> >   out_log_dealloc:
> >  	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
> >  	xfs_log_mount_cancel(mp);
> > @@ -1157,6 +1158,7 @@ xfs_unmountfs(
> >  	 */
> >  	cancel_delayed_work_sync(&mp->m_reclaim_work);
> >  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> > +	xfs_health_unmount(mp);
> >  
> >  	xfs_qm_unmount(mp);
> >  
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index f079841c7af6..2464ea351f83 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -3461,8 +3461,10 @@ DEFINE_EVENT(xfs_fs_corrupt_class, name,	\
> >  	TP_ARGS(mp, flags))
> >  DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_sick);
> >  DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_healthy);
> > +DEFINE_FS_CORRUPT_EVENT(xfs_fs_unfixed_corruption);
> >  DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_sick);
> >  DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_healthy);
> > +DEFINE_FS_CORRUPT_EVENT(xfs_rt_unfixed_corruption);
> >  
> >  DECLARE_EVENT_CLASS(xfs_ag_corrupt_class,
> >  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int flags),
> > @@ -3488,6 +3490,7 @@ DEFINE_EVENT(xfs_ag_corrupt_class, name,	\
> >  	TP_ARGS(mp, agno, flags))
> >  DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick);
> >  DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy);
> > +DEFINE_AG_CORRUPT_EVENT(xfs_ag_unfixed_corruption);
> >  
> >  DECLARE_EVENT_CLASS(xfs_inode_corrupt_class,
> >  	TP_PROTO(struct xfs_inode *ip, unsigned int flags),
> > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 03/10] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem
  2019-04-02 13:40     ` Darrick J. Wong
@ 2019-04-02 13:53       ` Brian Foster
  2019-04-02 18:16         ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Brian Foster @ 2019-04-02 13:53 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 02, 2019 at 06:40:10AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 02, 2019 at 09:24:45AM -0400, Brian Foster wrote:
> > On Mon, Apr 01, 2019 at 10:10:28AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > If we know the filesystem metadata isn't healthy during unmount, we want
> > > to encourage the administrator to run xfs_repair right away.  We can't
> > > do this if BAD_SUMMARY will cause an unclean log unmount to force
> > > summary recalculation, so turn it off if the fs is bad.
> > > 
> > 
> > Do you mean we don't want to suggest xfs_repair because we intentionally
> > cause a dirty log and thus xfs_repair will require to zap it? If so, the
> > wording above and the comment in xfs_health_unmount() could be a bit
> > more specific on the reasoning.
> 
> Sort of the opposite?  We want to suggest xfs_repair, but we don't want
> to leave the log dirty because that adds the additional step of running
> xfs_repair -L to zap the log.  I get the sense that we don't really want
> to encourage admins to be cavalier about running that...?
> 

Ok... I guess what sounded funny to me is the suggestion that "we can't"
leave a dirty log and suggest repair to the user. xfs_repair can clearly
handle a dirty log, so it suggested to me that perhaps there was some
other critical side effect I wasn't thinking of that we wanted to avoid
as a result of needing to zap the log.

To be clear, I don't feel strongly about it and still think the change
is reasonable enough (it only matters when something else is broken in
the fs after all). I just wanted to clarify the above, call out the
potential tradeoff in the event that others might have further thoughts
on it, and suggest the full reasoning eventually make it into the commit
log for future reference.

> (Would be nice if we could just port log recovery to repair, but that's
> a whole separate project...)
> 

Or work around the whole "trigger summary recalc via log recovery"
thing, but I guess that might not be trivial either...

> > Also, what exactly is the side effect without this change in place? The
> > user would have to zap the log from xfs_repair, but the somewhat
> > artificial unclean unmount doesn't actually require log recovery to fix
> > up the fs outside of the whole summary counter thing, right? IOW, would
> > the user zapping the log actually lose anything besides the bad summary
> > counter indication?
> 
> The log checkpoints at unmount, right?  So I think it's ok to zap the
> log when its status is "cleanly unmounted but we didn't record the clean
> unmount because we want to force summary recalculation at next mount".
> 

That's what I would expect, but I haven't tested it. :)

Brian

> > I ask just because even though we warn the user to
> > run repair, that doesn't mean they'll actually do it and so it seems
> > there is a bit of a tradeoff in that regard.
> 
> Yeah, it's a pity we can't just run xfs_repair ourselves. :)
> 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > 
> > BTW, I get the following compiler warning on this patch:
> > 
> > In file included from fs/xfs/xfs_trace.h:12,
> >                  from fs/xfs/xfs_health.c:19:
> > fs/xfs/xfs_health.c: In function ‘xfs_health_unmount’:
> > ./include/linux/tracepoint.h:195:6: warning: ‘sick’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >      ((void(*)(proto))(it_func))(args); \
> >       ^
> > fs/xfs/xfs_health.c:33:16: note: ‘sick’ was declared here
> >   unsigned int  sick;
> 
> <nod> Thanks, will fix that for v2.
> 
> --D
> 
> > Brian
> > 
> > >  fs/xfs/libxfs/xfs_health.h |    2 +
> > >  fs/xfs/xfs_health.c        |   59 ++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_mount.c         |    2 +
> > >  fs/xfs/xfs_trace.h         |    3 ++
> > >  4 files changed, 66 insertions(+)
> > > 
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
> > > index 0d51bd2689ea..269b124dc1d7 100644
> > > --- a/fs/xfs/libxfs/xfs_health.h
> > > +++ b/fs/xfs/libxfs/xfs_health.h
> > > @@ -148,6 +148,8 @@ void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask);
> > >  void xfs_inode_mark_healthy(struct xfs_inode *ip, unsigned int mask);
> > >  unsigned int xfs_inode_measure_sickness(struct xfs_inode *ip);
> > >  
> > > +void xfs_health_unmount(struct xfs_mount *mp);
> > > +
> > >  /* Now some helpers. */
> > >  
> > >  static inline bool
> > > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> > > index e9d6859f7501..6e2da858c356 100644
> > > --- a/fs/xfs/xfs_health.c
> > > +++ b/fs/xfs/xfs_health.c
> > > @@ -19,6 +19,65 @@
> > >  #include "xfs_trace.h"
> > >  #include "xfs_health.h"
> > >  
> > > +/*
> > > + * Warn about metadata corruption that we detected but haven't fixed, and
> > > + * make sure we're not sitting on anything that would get in the way of
> > > + * recovery.
> > > + */
> > > +void
> > > +xfs_health_unmount(
> > > +	struct xfs_mount	*mp)
> > > +{
> > > +	struct xfs_perag	*pag;
> > > +	xfs_agnumber_t		agno;
> > > +	unsigned int		sick;
> > > +	bool			warn = false;
> > > +
> > > +	if (XFS_FORCED_SHUTDOWN(mp))
> > > +		return;
> > > +
> > > +	/* Measure AG corruption levels. */
> > > +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> > > +		pag = xfs_perag_get(mp, agno);
> > > +		spin_lock(&pag->pag_state_lock);
> > > +		if (pag->pag_sick) {
> > > +			trace_xfs_ag_unfixed_corruption(mp, agno, sick);
> > > +			warn = true;
> > > +		}
> > > +		spin_unlock(&pag->pag_state_lock);
> > > +		xfs_perag_put(pag);
> > > +	}
> > > +
> > > +	/* Measure realtime volume corruption levels. */
> > > +	sick = xfs_rt_measure_sickness(mp);
> > > +	if (sick) {
> > > +		trace_xfs_rt_unfixed_corruption(mp, sick);
> > > +		warn = true;
> > > +	}
> > > +
> > > +	/* Measure fs corruption and keep the sample around for the warning. */
> > > +	sick = xfs_fs_measure_sickness(mp);
> > > +	if (sick) {
> > > +		trace_xfs_fs_unfixed_corruption(mp, sick);
> > > +		warn = true;
> > > +	}
> > > +
> > > +	if (warn) {
> > > +		xfs_warn(mp,
> > > +"Uncorrected metadata errors detected; please run xfs_repair.");
> > > +
> > > +		/*
> > > +		 * If we have unhealthy metadata, we want the admin to run
> > > +		 * xfs_repair after unmounting.  They can't do that if the log
> > > +		 * is written out without a clean unmount record (such as when
> > > +		 * the summary counters are marked unhealthy to force
> > > +		 * recalculation of the summary counters) so clear it.
> > > +		 */
> > > +		if (sick & XFS_HEALTH_FS_COUNTERS)
> > > +			xfs_fs_mark_healthy(mp, XFS_HEALTH_FS_COUNTERS);
> > > +	}
> > > +}
> > > +
> > >  /* Mark unhealthy per-fs metadata. */
> > >  void
> > >  xfs_fs_mark_sick(
> > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > index a43ca655a431..f0f73d598a0c 100644
> > > --- a/fs/xfs/xfs_mount.c
> > > +++ b/fs/xfs/xfs_mount.c
> > > @@ -1075,6 +1075,7 @@ xfs_mountfs(
> > >  	 */
> > >  	cancel_delayed_work_sync(&mp->m_reclaim_work);
> > >  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> > > +	xfs_health_unmount(mp);
> > >   out_log_dealloc:
> > >  	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
> > >  	xfs_log_mount_cancel(mp);
> > > @@ -1157,6 +1158,7 @@ xfs_unmountfs(
> > >  	 */
> > >  	cancel_delayed_work_sync(&mp->m_reclaim_work);
> > >  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> > > +	xfs_health_unmount(mp);
> > >  
> > >  	xfs_qm_unmount(mp);
> > >  
> > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > index f079841c7af6..2464ea351f83 100644
> > > --- a/fs/xfs/xfs_trace.h
> > > +++ b/fs/xfs/xfs_trace.h
> > > @@ -3461,8 +3461,10 @@ DEFINE_EVENT(xfs_fs_corrupt_class, name,	\
> > >  	TP_ARGS(mp, flags))
> > >  DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_sick);
> > >  DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_healthy);
> > > +DEFINE_FS_CORRUPT_EVENT(xfs_fs_unfixed_corruption);
> > >  DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_sick);
> > >  DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_healthy);
> > > +DEFINE_FS_CORRUPT_EVENT(xfs_rt_unfixed_corruption);
> > >  
> > >  DECLARE_EVENT_CLASS(xfs_ag_corrupt_class,
> > >  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int flags),
> > > @@ -3488,6 +3490,7 @@ DEFINE_EVENT(xfs_ag_corrupt_class, name,	\
> > >  	TP_ARGS(mp, agno, flags))
> > >  DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick);
> > >  DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy);
> > > +DEFINE_AG_CORRUPT_EVENT(xfs_ag_unfixed_corruption);
> > >  
> > >  DECLARE_EVENT_CLASS(xfs_inode_corrupt_class,
> > >  	TP_PROTO(struct xfs_inode *ip, unsigned int flags),
> > > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 04/10] xfs: expand xfs_fsop_geom
  2019-04-01 17:10 ` [PATCH 04/10] xfs: expand xfs_fsop_geom Darrick J. Wong
@ 2019-04-02 17:34   ` Brian Foster
  2019-04-02 21:53   ` Dave Chinner
  1 sibling, 0 replies; 41+ messages in thread
From: Brian Foster @ 2019-04-02 17:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Apr 01, 2019 at 10:10:34AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Rename the current (v2-v4) geometry ioctl XFS_IOC_FSGEOMETRY_V2 and
> expand the existing xfs_fsop_geom to reserve empty space for more
> fields.  This means that newly built binaries will pick up the new
> format and existing programs will simply end up in the V2 handler.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_fs.h |   32 +++++++++++++++++++++++++++++++-
>  fs/xfs/libxfs/xfs_sb.c |    5 +++++
>  fs/xfs/xfs_ioctl.c     |   22 ++++++++++++++++++++--
>  fs/xfs/xfs_ioctl32.c   |    1 +
>  4 files changed, 57 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index f3aa59302fef..1dba751cde60 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -148,7 +148,34 @@ typedef struct xfs_fsop_geom_v1 {
>  } xfs_fsop_geom_v1_t;
>  
>  /*
> - * Output for XFS_IOC_FSGEOMETRY
> + * Output for XFS_IOC_FSGEOMETRY_V2
> + */
> +typedef struct xfs_fsop_geom_v2 {
...
> +} xfs_fsop_geom_v2_t;
> +

Do we need the typedef for a new struct?

> +/*
> + * Output for XFS_IOC_FSGEOMETRY (v5)
>   */
>  typedef struct xfs_fsop_geom {
>  	__u32		blocksize;	/* filesystem (data) block size */
...
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index f0309b74e377..c2ca3a816c41 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -1168,6 +1168,11 @@ xfs_fs_geometry(
>  
>  	geo->logsunit = sbp->sb_logsunit;
>  
> +	if (struct_version < 5)
> +		return 0;
> +
> +	geo->version = XFS_FSOP_GEOM_V5;
> +

It's interesting that we've presumably had the version field since
struct_version >= 4, but it's always been set to zero. Now we go and set
it to 5. I'm not sure it really matters, but any idea what's behind
that?

Brian

>  	return 0;
>  }
>  
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 6ecdbb3af7de..7fd8815633dc 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -801,7 +801,7 @@ xfs_ioc_fsgeometry_v1(
>  }
>  
>  STATIC int
> -xfs_ioc_fsgeometry(
> +xfs_ioc_fsgeometry_v2(
>  	xfs_mount_t		*mp,
>  	void			__user *arg)
>  {
> @@ -812,6 +812,23 @@ xfs_ioc_fsgeometry(
>  	if (error)
>  		return error;
>  
> +	if (copy_to_user(arg, &fsgeo, sizeof(struct xfs_fsop_geom_v2)))
> +		return -EFAULT;
> +	return 0;
> +}
> +
> +STATIC int
> +xfs_ioc_fsgeometry(
> +	struct xfs_mount	*mp,
> +	void			__user *arg)
> +{
> +	struct xfs_fsop_geom	fsgeo;
> +	int			error;
> +
> +	error = xfs_fs_geometry(&mp->m_sb, &fsgeo, 5);
> +	if (error)
> +		return error;
> +
>  	if (copy_to_user(arg, &fsgeo, sizeof(fsgeo)))
>  		return -EFAULT;
>  	return 0;
> @@ -1938,7 +1955,8 @@ xfs_file_ioctl(
>  
>  	case XFS_IOC_FSGEOMETRY_V1:
>  		return xfs_ioc_fsgeometry_v1(mp, arg);
> -
> +	case XFS_IOC_FSGEOMETRY_V2:
> +		return xfs_ioc_fsgeometry_v2(mp, arg);
>  	case XFS_IOC_FSGEOMETRY:
>  		return xfs_ioc_fsgeometry(mp, arg);
>  
> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> index 5001dca361e9..323cfd4b15dc 100644
> --- a/fs/xfs/xfs_ioctl32.c
> +++ b/fs/xfs/xfs_ioctl32.c
> @@ -561,6 +561,7 @@ xfs_file_compat_ioctl(
>  	switch (cmd) {
>  	/* No size or alignment issues on any arch */
>  	case XFS_IOC_DIOINFO:
> +	case XFS_IOC_FSGEOMETRY_V2:
>  	case XFS_IOC_FSGEOMETRY:
>  	case XFS_IOC_FSGETXATTR:
>  	case XFS_IOC_FSSETXATTR:
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 05/10] xfs: add a new ioctl to describe allocation group geometry
  2019-04-01 17:10 ` [PATCH 05/10] xfs: add a new ioctl to describe allocation group geometry Darrick J. Wong
@ 2019-04-02 17:34   ` Brian Foster
  2019-04-02 21:35     ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Brian Foster @ 2019-04-02 17:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Apr 01, 2019 at 10:10:40AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Add a new ioctl to describe an allocation group's geometry.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Just a couple nits...

>  fs/xfs/libxfs/xfs_ag.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_ag.h |    2 ++
>  fs/xfs/libxfs/xfs_fs.h |   14 ++++++++++++++
>  fs/xfs/xfs_ioctl.c     |   24 ++++++++++++++++++++++++
>  fs/xfs/xfs_ioctl32.c   |    1 +
>  5 files changed, 89 insertions(+)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index 1ef8acf35e7d..1679e37fe28d 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -19,6 +19,7 @@
>  #include "xfs_ialloc.h"
>  #include "xfs_rmap.h"
>  #include "xfs_ag.h"
> +#include "xfs_ag_resv.h"
>  
>  static struct xfs_buf *
>  xfs_get_aghdr_buf(
> @@ -461,3 +462,50 @@ xfs_ag_extend_space(
>  				len, &XFS_RMAP_OINFO_SKIP_UPDATE,
>  				XFS_AG_RESV_NONE);
>  }
> +
> +/* Retrieve AG geometry. */
> +int
> +xfs_ag_get_geometry(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno,
> +	struct xfs_ag_geometry	*ageo)
> +{
> +	struct xfs_buf		*bp;
> +	struct xfs_agi		*agi;
> +	struct xfs_agf		*agf;
> +	struct xfs_perag	*pag;
> +	unsigned int		freeblks;
> +	int			error;
> +
> +	memset(ageo, 0, sizeof(*ageo));
> +
> +	if (agno >= mp->m_sb.sb_agcount)
> +		return -EINVAL;
> +

I'd probably error check prior to the memset().
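
For illustration, the reordering could look roughly like the sketch below.
It is standalone C with stand-in types (demo_mount and demo_ag_geometry are
made up for this example, not the kernel structures); the only point is
that a failed range check leaves the caller's buffer untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for struct xfs_mount / struct xfs_ag_geometry. */
struct demo_mount {
	uint32_t	agcount;	/* number of allocation groups */
};

struct demo_ag_geometry {
	uint32_t	ag_number;
	uint32_t	ag_length;
};

/* Validate agno first; zero the output only once we know we will fill it. */
static inline int
demo_ag_get_geometry(
	const struct demo_mount	*mp,
	uint32_t		agno,
	struct demo_ag_geometry	*ageo)
{
	if (agno >= mp->agcount)
		return -EINVAL;

	memset(ageo, 0, sizeof(*ageo));
	ageo->ag_number = agno;
	return 0;
}
```

In the ioctl path quoted further down, an error return skips copy_to_user()
either way, so the ordering is cosmetic for that caller.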

> +	error = xfs_ialloc_read_agi(mp, NULL, agno, &bp);
> +	if (error)
> +		return error;
> +
> +	agi = XFS_BUF_TO_AGI(bp);
> +	ageo->ag_icount = be32_to_cpu(agi->agi_count);
> +	ageo->ag_ifree = be32_to_cpu(agi->agi_freecount);
> +	xfs_buf_relse(bp);
> +
> +	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &bp);
> +	if (error)
> +		return error;
> +
> +	agf = XFS_BUF_TO_AGF(bp);
> +	pag = xfs_perag_get(mp, agno);
> +	ageo->ag_length = be32_to_cpu(agf->agf_length);
> +	freeblks = pag->pagf_freeblks +
> +		   pag->pagf_flcount +
> +		   pag->pagf_btreeblks -
> +		   xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE);
> +	ageo->ag_freeblks = freeblks;
> +	xfs_perag_put(pag);
> +	xfs_buf_relse(bp);
> +

I wonder if we should lock down the agi and agf together to prevent any
potential incoherent reporting between them..? For example, suppose we
collect the agi values, release the agi and lose a race to the agf by an
inode allocator who consumes more free blocks before we ultimately read
the agf values and return.

Brian
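
A rough standalone sketch of the "hold both headers" idea, for reference.
This is a toy model only: demo_hdr_buf, demo_buf_hold() and demo_buf_relse()
are invented here and merely stand in for the locked AGI/AGF buffers and
xfs_buf_relse(). Taking the AGI before the AGF is an assumption about the
preferred lock order.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a locked AG header buffer; not a real xfs_buf. */
struct demo_hdr_buf {
	int		held;		/* nonzero while "locked" */
	uint32_t	icount;		/* AGI: inodes allocated */
	uint32_t	ifree;		/* AGI: inodes free */
	uint32_t	freeblks;	/* AGF: free blocks */
};

struct demo_ag_sample {
	uint32_t	icount;
	uint32_t	ifree;
	uint32_t	freeblks;
};

static inline void demo_buf_hold(struct demo_hdr_buf *bp)  { bp->held = 1; }
static inline void demo_buf_relse(struct demo_hdr_buf *bp) { bp->held = 0; }

/* Take AGI, then AGF, and sample both before releasing either one. */
static inline void
demo_sample_ag(
	struct demo_hdr_buf	*agi,
	struct demo_hdr_buf	*agf,
	struct demo_ag_sample	*out)
{
	demo_buf_hold(agi);
	demo_buf_hold(agf);

	/* Both headers are held here, so the two reads belong together. */
	assert(agi->held && agf->held);
	out->icount = agi->icount;
	out->ifree = agi->ifree;
	out->freeblks = agf->freeblks;

	/* Release in the reverse order of acquisition. */
	demo_buf_relse(agf);
	demo_buf_relse(agi);
}
```

The cost is that the AGI stays locked a little longer than in the posted
version, which releases it before reading the AGF.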

> +	ageo->ag_number = agno;
> +	return 0;
> +}
> diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> index 412702e23f61..5166322807e7 100644
> --- a/fs/xfs/libxfs/xfs_ag.h
> +++ b/fs/xfs/libxfs/xfs_ag.h
> @@ -26,5 +26,7 @@ struct aghdr_init_data {
>  int xfs_ag_init_headers(struct xfs_mount *mp, struct aghdr_init_data *id);
>  int xfs_ag_extend_space(struct xfs_mount *mp, struct xfs_trans *tp,
>  			struct aghdr_init_data *id, xfs_extlen_t len);
> +int xfs_ag_get_geometry(struct xfs_mount *mp, xfs_agnumber_t agno,
> +			struct xfs_ag_geometry *ageo);
>  
>  #endif /* __LIBXFS_AG_H */
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 1dba751cde60..87226e00e7bd 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -266,6 +266,19 @@ typedef struct xfs_fsop_resblks {
>  #define XFS_MIN_DBLOCKS(s) ((xfs_rfsblock_t)((s)->sb_agcount - 1) *	\
>  			 (s)->sb_agblocks + XFS_MIN_AG_BLOCKS)
>  
> +/*
> + * Output for XFS_IOC_AG_GEOMETRY
> + */
> +struct xfs_ag_geometry {
> +	__u32		ag_number;	/* i/o: AG number */
> +	__u32		ag_length;	/* o: length in blocks */
> +	__u32		ag_freeblks;	/* o: free space */
> +	__u32		ag_icount;	/* o: inodes allocated */
> +	__u32		ag_ifree;	/* o: inodes free */
> +	__u32		ag_reserved32;	/* o: zero */
> +	__u64		ag_reserved[5];	/* o: zero */
> +};
> +
>  /*
>   * Structures for XFS_IOC_FSGROWFSDATA, XFS_IOC_FSGROWFSLOG & XFS_IOC_FSGROWFSRT
>   */
> @@ -619,6 +632,7 @@ struct xfs_scrub_metadata {
>  #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
>  /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
>  #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
> +#define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
>  
>  /*
>   * ioctl commands that replace IRIX syssgi()'s
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 7fd8815633dc..b5918ce656bd 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -33,6 +33,7 @@
>  #include "xfs_fsmap.h"
>  #include "scrub/xfs_scrub.h"
>  #include "xfs_sb.h"
> +#include "xfs_ag.h"
>  
>  #include <linux/capability.h>
>  #include <linux/cred.h>
> @@ -834,6 +835,26 @@ xfs_ioc_fsgeometry(
>  	return 0;
>  }
>  
> +STATIC int
> +xfs_ioc_ag_geometry(
> +	struct xfs_mount	*mp,
> +	void			__user *arg)
> +{
> +	struct xfs_ag_geometry	ageo;
> +	int			error;
> +
> +	if (copy_from_user(&ageo, arg, sizeof(ageo)))
> +		return -EFAULT;
> +
> +	error = xfs_ag_get_geometry(mp, ageo.ag_number, &ageo);
> +	if (error)
> +		return error;
> +
> +	if (copy_to_user(arg, &ageo, sizeof(ageo)))
> +		return -EFAULT;
> +	return 0;
> +}
> +
>  /*
>   * Linux extended inode flags interface.
>   */
> @@ -1960,6 +1981,9 @@ xfs_file_ioctl(
>  	case XFS_IOC_FSGEOMETRY:
>  		return xfs_ioc_fsgeometry(mp, arg);
>  
> +	case XFS_IOC_AG_GEOMETRY:
> +		return xfs_ioc_ag_geometry(mp, arg);
> +
>  	case XFS_IOC_GETVERSION:
>  		return put_user(inode->i_generation, (int __user *)arg);
>  
> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> index 323cfd4b15dc..28d2110dd871 100644
> --- a/fs/xfs/xfs_ioctl32.c
> +++ b/fs/xfs/xfs_ioctl32.c
> @@ -563,6 +563,7 @@ xfs_file_compat_ioctl(
>  	case XFS_IOC_DIOINFO:
>  	case XFS_IOC_FSGEOMETRY_V2:
>  	case XFS_IOC_FSGEOMETRY:
> +	case XFS_IOC_AG_GEOMETRY:
>  	case XFS_IOC_FSGETXATTR:
>  	case XFS_IOC_FSSETXATTR:
>  	case XFS_IOC_FSGETXATTRA:
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 06/10] xfs: report fs and rt health via geometry structure
  2019-04-01 17:10 ` [PATCH 06/10] xfs: report fs and rt health via geometry structure Darrick J. Wong
@ 2019-04-02 17:35   ` Brian Foster
  2019-04-02 18:23     ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Brian Foster @ 2019-04-02 17:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Apr 01, 2019 at 10:10:46AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Use our newly expanded geometry structure to report the overall fs and
> realtime health status.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

I kind of wonder whether it's possible (or makes sense) to make the core
sickness bits exportable to userspace and thus avoid some of the
translation code, but that aside this looks fine to me:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_fs.h     |   11 ++++++++++-
>  fs/xfs/libxfs/xfs_health.h |    3 +++
>  fs/xfs/xfs_health.c        |   27 +++++++++++++++++++++++++++
>  fs/xfs/xfs_ioctl.c         |    3 +++
>  4 files changed, 43 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 87226e00e7bd..ddbfde7ff79d 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -199,9 +199,18 @@ typedef struct xfs_fsop_geom {
>  	__u32		rtsectsize;	/* realtime sector size, bytes	*/
>  	__u32		dirblocksize;	/* directory block size, bytes	*/
>  	__u32		logsunit;	/* log stripe unit, bytes */
> -	__u64		reserved[18];	/* reserved space */
> +	__u32		health;		/* o: unhealthy fs & rt metadata */
> +	__u32		reserved32;	/* reserved space */
> +	__u64		reserved[17];	/* reserved space */
>  } xfs_fsop_geom_t;
>  
> +#define XFS_FSOP_GEOM_HEALTH_FS_COUNTERS (1 << 0) /* summary counters */
> +#define XFS_FSOP_GEOM_HEALTH_FS_UQUOTA	(1 << 1)  /* user quota */
> +#define XFS_FSOP_GEOM_HEALTH_FS_GQUOTA	(1 << 2)  /* group quota */
> +#define XFS_FSOP_GEOM_HEALTH_FS_PQUOTA	(1 << 3)  /* project quota */
> +#define XFS_FSOP_GEOM_HEALTH_RT_BITMAP	(1 << 4)  /* realtime bitmap */
> +#define XFS_FSOP_GEOM_HEALTH_RT_SUMMARY	(1 << 5)  /* realtime summary */
> +
>  /* Output for XFS_FS_COUNTS */
>  typedef struct xfs_fsop_counts {
>  	__u64	freedata;	/* free data section blocks */
> diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
> index 269b124dc1d7..36736d54a3e3 100644
> --- a/fs/xfs/libxfs/xfs_health.h
> +++ b/fs/xfs/libxfs/xfs_health.h
> @@ -39,6 +39,7 @@
>  struct xfs_mount;
>  struct xfs_perag;
>  struct xfs_inode;
> +struct xfs_fsop_geom;
>  
>  /* Observable health issues for metadata spanning the entire filesystem. */
>  #define XFS_HEALTH_FS_COUNTERS	(1 << 0)  /* summary counters */
> @@ -200,4 +201,6 @@ xfs_inode_healthy(struct xfs_inode *ip)
>  	return xfs_inode_measure_sickness(ip) == 0;
>  }
>  
> +void xfs_fsop_geom_health(struct xfs_mount *mp, struct xfs_fsop_geom *geo);
> +
>  #endif	/* __XFS_HEALTH_H__ */
> diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> index 6e2da858c356..151c98693bef 100644
> --- a/fs/xfs/xfs_health.c
> +++ b/fs/xfs/xfs_health.c
> @@ -249,3 +249,30 @@ xfs_inode_measure_sickness(
>  	spin_unlock(&ip->i_flags_lock);
>  	return ret;
>  }
> +
> +/* Fill out fs geometry health info. */
> +void
> +xfs_fsop_geom_health(
> +	struct xfs_mount	*mp,
> +	struct xfs_fsop_geom	*geo)
> +{
> +	unsigned int		sick;
> +
> +	geo->health = 0;
> +
> +	sick = xfs_fs_measure_sickness(mp);
> +	if (sick & XFS_HEALTH_FS_COUNTERS)
> +		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_COUNTERS;
> +	if (sick & XFS_HEALTH_FS_UQUOTA)
> +		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_UQUOTA;
> +	if (sick & XFS_HEALTH_FS_GQUOTA)
> +		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_GQUOTA;
> +	if (sick & XFS_HEALTH_FS_PQUOTA)
> +		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_PQUOTA;
> +
> +	sick = xfs_rt_measure_sickness(mp);
> +	if (sick & XFS_HEALTH_RT_BITMAP)
> +		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_BITMAP;
> +	if (sick & XFS_HEALTH_RT_SUMMARY)
> +		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_SUMMARY;
> +}
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index b5918ce656bd..f9bf11b6a055 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -34,6 +34,7 @@
>  #include "scrub/xfs_scrub.h"
>  #include "xfs_sb.h"
>  #include "xfs_ag.h"
> +#include "xfs_health.h"
>  
>  #include <linux/capability.h>
>  #include <linux/cred.h>
> @@ -830,6 +831,8 @@ xfs_ioc_fsgeometry(
>  	if (error)
>  		return error;
>  
> +	xfs_fsop_geom_health(mp, &fsgeo);
> +
>  	if (copy_to_user(arg, &fsgeo, sizeof(fsgeo)))
>  		return -EFAULT;
>  	return 0;
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 03/10] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem
  2019-04-02 13:53       ` Brian Foster
@ 2019-04-02 18:16         ` Darrick J. Wong
  2019-04-02 18:32           ` Brian Foster
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-02 18:16 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Apr 02, 2019 at 09:53:41AM -0400, Brian Foster wrote:
> On Tue, Apr 02, 2019 at 06:40:10AM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 02, 2019 at 09:24:45AM -0400, Brian Foster wrote:
> > > On Mon, Apr 01, 2019 at 10:10:28AM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > If we know the filesystem metadata isn't healthy during unmount, we want
> > > > to encourage the administrator to run xfs_repair right away.  We can't
> > > > do this if BAD_SUMMARY will cause an unclean log unmount to force
> > > > summary recalculation, so turn it off if the fs is bad.
> > > > 
> > > 
> > > Do you mean we don't want to suggest xfs_repair because we intentionally
> > > cause a dirty log and thus xfs_repair will require to zap it? If so, the
> > > wording above and the comment in xfs_health_unmount() could be a bit
> > > more specific on the reasoning.
> > 
> > Sort of the opposite?  We want to suggest xfs_repair, but we don't want
> > to leave the log dirty because that adds the additional step of running
> > xfs_repair -L to zap the log.  I get the sense that we don't really want
> > to encourage admins to be cavalier about running that...?
> > 
> 
> Ok.. I guess what sounded funny to me is the suggestion that "we can't"
> leave a dirty log and suggest repair to the user. xfs_repair can clearly
> handle a dirty log, so it suggested to me that perhaps there was some
> other critical side effect I wasn't thinking of that we wanted to avoid
> as a result of needing to zap the log.
> 
> To be clear, I don't feel strongly about it and still think the change
> is reasonable enough (it only matters when something else is broken in
> the fs after all). I just wanted to clarify the above, call out the
> potential tradeoff in the event that others might have further thoughts
> on it, and suggest the full reasoning eventually make it into the commit
> log for future reference.

Ahh, I see the confusion here.  What if I reworked the comment:

/*
 * We discovered uncorrected metadata problems at some point during this
 * filesystem mount and have advised the administrator to run repair
 * once the unmount completes.
 *
 * However, we must be careful -- when FSCOUNTERS are flagged unhealthy,
 * the unmount procedure omits writing the clean unmount record to the
 * log so that the next mount will run recovery and recompute the
 * summary counters.  In other words, we leave a dirty log to get the
 * counters fixed.
 *
 * Unfortunately, xfs_repair cannot recover dirty logs, so if there were
 * filesystem problems, FSCOUNTERS was flagged, and the administrator
 * takes our advice to run xfs_repair, they'll have to zap the log
 * before repairing structures.  We don't really want to encourage this,
 * so we mark the FSCOUNTERS healthy so that a subsequent repair run
 * won't see a dirty log.
 */

Also the "if (sick)" check needs to mask off FSCOUNTERS; I'll fix that
too.
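
One reading of that fix is that FSCOUNTERS on its own should not trigger
the warning.  A standalone sketch of the masking follows; the DEMO_* names
are stand-ins for the XFS_HEALTH_FS_* bits, and the UQUOTA bit value is
assumed to match the ioctl define.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_HEALTH_FS_COUNTERS	(1U << 0)	/* summary counters */
#define DEMO_HEALTH_FS_UQUOTA	(1U << 1)	/* user quota (assumed value) */

/* Should unmount warn, given the fs-wide sick mask? */
static inline bool
demo_fs_should_warn(unsigned int sick)
{
	/* Ignore FSCOUNTERS; any other sick bit still warrants a warning. */
	return (sick & ~DEMO_HEALTH_FS_COUNTERS) != 0;
}
```

With this, a mask of only DEMO_HEALTH_FS_COUNTERS yields false, while
FSCOUNTERS plus any other bit still yields true.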

> > (Would be nice if we could just port log recovery to repair, but that's
> > a whole separate project...)
> > 
> 
> Or work around the whole "trigger summary recalc via log recovery"
> thing, but I guess that might not be trivial either..

Well I do have a prototype fscounters scrubber and repairer, so in the
future we might be able to avoid this dirty log dance.

> > > Also, what exactly is the side effect without this change in place? The
> > > user would have to zap the log from xfs_repair, but the somewhat
> > > artificial unclean unmount doesn't actually require log recovery to fix
> > > up the fs outside of the whole summary counter thing, right? IOW, would
> > > the user zapping the log actually lose anything besides the bad summary
> > > counter indication?
> > 
> > The log checkpoints at unmount, right?  So I think it's ok to zap the
> > log when its status is "cleanly unmounted but we didn't record the clean
> > unmount because we want to force summary recalculation at next mount".
> > 
> 
> That's what I would expect, but I haven't tested it. :)

Technically, I have, what with this obnoxious "hard shutdowns are a
normal part of our workflow" use case I've been wrangling with... the
clean umounts /do/ seem to leave a clean log.

--D

> Brian
> 
> > > I ask just because even though we warn the user to
> > > run repair, that doesn't mean they'll actually do it and so it seems
> > > there is a bit of a tradeoff in that regard.
> > 
> > Yeah, it's a pity we can't just run xfs_repair ourselves. :)
> > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > 
> > > BTW, I get the following compiler warning on this patch:
> > > 
> > > In file included from fs/xfs/xfs_trace.h:12,
> > >                  from fs/xfs/xfs_health.c:19:
> > > fs/xfs/xfs_health.c: In function ‘xfs_health_unmount’:
> > > ./include/linux/tracepoint.h:195:6: warning: ‘sick’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> > >      ((void(*)(proto))(it_func))(args); \
> > >       ^
> > > fs/xfs/xfs_health.c:33:16: note: ‘sick’ was declared here
> > >   unsigned int  sick;
> > 
> > <nod> Thanks, will fix that for v2.
> > 
> > --D
> > 
> > > Brian
> > > 
> > > >  fs/xfs/libxfs/xfs_health.h |    2 +
> > > >  fs/xfs/xfs_health.c        |   59 ++++++++++++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/xfs_mount.c         |    2 +
> > > >  fs/xfs/xfs_trace.h         |    3 ++
> > > >  4 files changed, 66 insertions(+)
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
> > > > index 0d51bd2689ea..269b124dc1d7 100644
> > > > --- a/fs/xfs/libxfs/xfs_health.h
> > > > +++ b/fs/xfs/libxfs/xfs_health.h
> > > > @@ -148,6 +148,8 @@ void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask);
> > > >  void xfs_inode_mark_healthy(struct xfs_inode *ip, unsigned int mask);
> > > >  unsigned int xfs_inode_measure_sickness(struct xfs_inode *ip);
> > > >  
> > > > +void xfs_health_unmount(struct xfs_mount *mp);
> > > > +
> > > >  /* Now some helpers. */
> > > >  
> > > >  static inline bool
> > > > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> > > > index e9d6859f7501..6e2da858c356 100644
> > > > --- a/fs/xfs/xfs_health.c
> > > > +++ b/fs/xfs/xfs_health.c
> > > > @@ -19,6 +19,65 @@
> > > >  #include "xfs_trace.h"
> > > >  #include "xfs_health.h"
> > > >  
> > > > +/*
> > > > + * Warn about metadata corruption that we detected but haven't fixed, and
> > > > + * make sure we're not sitting on anything that would get in the way of
> > > > + * recovery.
> > > > + */
> > > > +void
> > > > +xfs_health_unmount(
> > > > +	struct xfs_mount	*mp)
> > > > +{
> > > > +	struct xfs_perag	*pag;
> > > > +	xfs_agnumber_t		agno;
> > > > +	unsigned int		sick;
> > > > +	bool			warn = false;
> > > > +
> > > > +	if (XFS_FORCED_SHUTDOWN(mp))
> > > > +		return;
> > > > +
> > > > +	/* Measure AG corruption levels. */
> > > > +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> > > > +		pag = xfs_perag_get(mp, agno);
> > > > +		spin_lock(&pag->pag_state_lock);
> > > > +		if (pag->pag_sick) {
> > > > +			trace_xfs_ag_unfixed_corruption(mp, agno, sick);
> > > > +			warn = true;
> > > > +		}
> > > > +		spin_unlock(&pag->pag_state_lock);
> > > > +		xfs_perag_put(pag);
> > > > +	}
> > > > +
> > > > +	/* Measure realtime volume corruption levels. */
> > > > +	sick = xfs_rt_measure_sickness(mp);
> > > > +	if (sick) {
> > > > +		trace_xfs_rt_unfixed_corruption(mp, sick);
> > > > +		warn = true;
> > > > +	}
> > > > +
> > > > +	/* Measure fs corruption and keep the sample around for the warning. */
> > > > +	sick = xfs_fs_measure_sickness(mp);
> > > > +	if (sick) {
> > > > +		trace_xfs_fs_unfixed_corruption(mp, sick);
> > > > +		warn = true;
> > > > +	}
> > > > +
> > > > +	if (warn) {
> > > > +		xfs_warn(mp,
> > > > +"Uncorrected metadata errors detected; please run xfs_repair.");
> > > > +
> > > > +		/*
> > > > +		 * If we have unhealthy metadata, we want the admin to run
> > > > +		 * xfs_repair after unmounting.  They can't do that if the log
> > > > +		 * is written out without a clean unmount record (such as when
> > > > +		 * the summary counters are marked unhealthy to force
> > > > +		 * recalculation of the summary counters) so clear it.
> > > > +		 */
> > > > +		if (sick & XFS_HEALTH_FS_COUNTERS)
> > > > +			xfs_fs_mark_healthy(mp, XFS_HEALTH_FS_COUNTERS);
> > > > +	}
> > > > +}
> > > > +
> > > >  /* Mark unhealthy per-fs metadata. */
> > > >  void
> > > >  xfs_fs_mark_sick(
> > > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > > index a43ca655a431..f0f73d598a0c 100644
> > > > --- a/fs/xfs/xfs_mount.c
> > > > +++ b/fs/xfs/xfs_mount.c
> > > > @@ -1075,6 +1075,7 @@ xfs_mountfs(
> > > >  	 */
> > > >  	cancel_delayed_work_sync(&mp->m_reclaim_work);
> > > >  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> > > > +	xfs_health_unmount(mp);
> > > >   out_log_dealloc:
> > > >  	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
> > > >  	xfs_log_mount_cancel(mp);
> > > > @@ -1157,6 +1158,7 @@ xfs_unmountfs(
> > > >  	 */
> > > >  	cancel_delayed_work_sync(&mp->m_reclaim_work);
> > > >  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> > > > +	xfs_health_unmount(mp);
> > > >  
> > > >  	xfs_qm_unmount(mp);
> > > >  
> > > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > > index f079841c7af6..2464ea351f83 100644
> > > > --- a/fs/xfs/xfs_trace.h
> > > > +++ b/fs/xfs/xfs_trace.h
> > > > @@ -3461,8 +3461,10 @@ DEFINE_EVENT(xfs_fs_corrupt_class, name,	\
> > > >  	TP_ARGS(mp, flags))
> > > >  DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_sick);
> > > >  DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_healthy);
> > > > +DEFINE_FS_CORRUPT_EVENT(xfs_fs_unfixed_corruption);
> > > >  DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_sick);
> > > >  DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_healthy);
> > > > +DEFINE_FS_CORRUPT_EVENT(xfs_rt_unfixed_corruption);
> > > >  
> > > >  DECLARE_EVENT_CLASS(xfs_ag_corrupt_class,
> > > >  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int flags),
> > > > @@ -3488,6 +3490,7 @@ DEFINE_EVENT(xfs_ag_corrupt_class, name,	\
> > > >  	TP_ARGS(mp, agno, flags))
> > > >  DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick);
> > > >  DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy);
> > > > +DEFINE_AG_CORRUPT_EVENT(xfs_ag_unfixed_corruption);
> > > >  
> > > >  DECLARE_EVENT_CLASS(xfs_inode_corrupt_class,
> > > >  	TP_PROTO(struct xfs_inode *ip, unsigned int flags),
> > > > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 06/10] xfs: report fs and rt health via geometry structure
  2019-04-02 17:35   ` Brian Foster
@ 2019-04-02 18:23     ` Darrick J. Wong
  2019-04-02 23:34       ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-02 18:23 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Apr 02, 2019 at 01:35:04PM -0400, Brian Foster wrote:
> On Mon, Apr 01, 2019 at 10:10:46AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Use our newly expanded geometry structure to report the overall fs and
> > realtime health status.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> I kind of wonder whether it's possible (or makes sense) to make the core
> sickness bits exportable to userspace and thus avoid some of the
> translation code, but that aside this looks fine to me:

Not sure -- for now I went with having separate bits and mapping
functions so that the internal implementation doesn't get fused to the
userspace interface, but seeing as the defines are the same we really
could just copy straight from the incore structure and have a bunch of
BUILD_BUG_ON(internal bit == ioctl bit) until they diverge more.
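
As a standalone sketch of that alternative, using C11 _Static_assert in
place of the kernel's BUILD_BUG_ON and covering only the fs-wide bits: the
DEMO_* names are stand-ins, and the incore values other than COUNTERS are
assumed to match the ioctl defines.

```c
#include <assert.h>

/* Stand-ins for the incore XFS_HEALTH_FS_* bits (values partly assumed). */
#define DEMO_HEALTH_FS_COUNTERS		(1U << 0)
#define DEMO_HEALTH_FS_UQUOTA		(1U << 1)
#define DEMO_HEALTH_FS_GQUOTA		(1U << 2)
#define DEMO_HEALTH_FS_PQUOTA		(1U << 3)

/* Stand-ins for the XFS_FSOP_GEOM_HEALTH_FS_* ioctl bits. */
#define DEMO_GEOM_HEALTH_FS_COUNTERS	(1U << 0)
#define DEMO_GEOM_HEALTH_FS_UQUOTA	(1U << 1)
#define DEMO_GEOM_HEALTH_FS_GQUOTA	(1U << 2)
#define DEMO_GEOM_HEALTH_FS_PQUOTA	(1U << 3)

#define DEMO_GEOM_HEALTH_FS_ALL		(DEMO_GEOM_HEALTH_FS_COUNTERS | \
					 DEMO_GEOM_HEALTH_FS_UQUOTA | \
					 DEMO_GEOM_HEALTH_FS_GQUOTA | \
					 DEMO_GEOM_HEALTH_FS_PQUOTA)

/* Break the build as soon as the two namespaces diverge. */
_Static_assert(DEMO_HEALTH_FS_COUNTERS == DEMO_GEOM_HEALTH_FS_COUNTERS,
	       "counters bit mismatch");
_Static_assert(DEMO_HEALTH_FS_UQUOTA == DEMO_GEOM_HEALTH_FS_UQUOTA,
	       "uquota bit mismatch");
_Static_assert(DEMO_HEALTH_FS_GQUOTA == DEMO_GEOM_HEALTH_FS_GQUOTA,
	       "gquota bit mismatch");
_Static_assert(DEMO_HEALTH_FS_PQUOTA == DEMO_GEOM_HEALTH_FS_PQUOTA,
	       "pquota bit mismatch");

/* With identical bits, translation is a plain mask of the exported set. */
static inline unsigned int
demo_fs_health_to_geom(unsigned int sick)
{
	return sick & DEMO_GEOM_HEALTH_FS_ALL;
}
```

The mask keeps any incore-only bit added later from leaking out to
userspace by accident.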

--D

> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> >  fs/xfs/libxfs/xfs_fs.h     |   11 ++++++++++-
> >  fs/xfs/libxfs/xfs_health.h |    3 +++
> >  fs/xfs/xfs_health.c        |   27 +++++++++++++++++++++++++++
> >  fs/xfs/xfs_ioctl.c         |    3 +++
> >  4 files changed, 43 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index 87226e00e7bd..ddbfde7ff79d 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -199,9 +199,18 @@ typedef struct xfs_fsop_geom {
> >  	__u32		rtsectsize;	/* realtime sector size, bytes	*/
> >  	__u32		dirblocksize;	/* directory block size, bytes	*/
> >  	__u32		logsunit;	/* log stripe unit, bytes */
> > -	__u64		reserved[18];	/* reserved space */
> > +	__u32		health;		/* o: unhealthy fs & rt metadata */
> > +	__u32		reserved32;	/* reserved space */
> > +	__u64		reserved[17];	/* reserved space */
> >  } xfs_fsop_geom_t;
> >  
> > +#define XFS_FSOP_GEOM_HEALTH_FS_COUNTERS (1 << 0) /* summary counters */
> > +#define XFS_FSOP_GEOM_HEALTH_FS_UQUOTA	(1 << 1)  /* user quota */
> > +#define XFS_FSOP_GEOM_HEALTH_FS_GQUOTA	(1 << 2)  /* group quota */
> > +#define XFS_FSOP_GEOM_HEALTH_FS_PQUOTA	(1 << 3)  /* project quota */
> > +#define XFS_FSOP_GEOM_HEALTH_RT_BITMAP	(1 << 4)  /* realtime bitmap */
> > +#define XFS_FSOP_GEOM_HEALTH_RT_SUMMARY	(1 << 5)  /* realtime summary */
> > +
> >  /* Output for XFS_FS_COUNTS */
> >  typedef struct xfs_fsop_counts {
> >  	__u64	freedata;	/* free data section blocks */
> > diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
> > index 269b124dc1d7..36736d54a3e3 100644
> > --- a/fs/xfs/libxfs/xfs_health.h
> > +++ b/fs/xfs/libxfs/xfs_health.h
> > @@ -39,6 +39,7 @@
> >  struct xfs_mount;
> >  struct xfs_perag;
> >  struct xfs_inode;
> > +struct xfs_fsop_geom;
> >  
> >  /* Observable health issues for metadata spanning the entire filesystem. */
> >  #define XFS_HEALTH_FS_COUNTERS	(1 << 0)  /* summary counters */
> > @@ -200,4 +201,6 @@ xfs_inode_healthy(struct xfs_inode *ip)
> >  	return xfs_inode_measure_sickness(ip) == 0;
> >  }
> >  
> > +void xfs_fsop_geom_health(struct xfs_mount *mp, struct xfs_fsop_geom *geo);
> > +
> >  #endif	/* __XFS_HEALTH_H__ */
> > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> > index 6e2da858c356..151c98693bef 100644
> > --- a/fs/xfs/xfs_health.c
> > +++ b/fs/xfs/xfs_health.c
> > @@ -249,3 +249,30 @@ xfs_inode_measure_sickness(
> >  	spin_unlock(&ip->i_flags_lock);
> >  	return ret;
> >  }
> > +
> > +/* Fill out fs geometry health info. */
> > +void
> > +xfs_fsop_geom_health(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_fsop_geom	*geo)
> > +{
> > +	unsigned int		sick;
> > +
> > +	geo->health = 0;
> > +
> > +	sick = xfs_fs_measure_sickness(mp);
> > +	if (sick & XFS_HEALTH_FS_COUNTERS)
> > +		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_COUNTERS;
> > +	if (sick & XFS_HEALTH_FS_UQUOTA)
> > +		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_UQUOTA;
> > +	if (sick & XFS_HEALTH_FS_GQUOTA)
> > +		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_GQUOTA;
> > +	if (sick & XFS_HEALTH_FS_PQUOTA)
> > +		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_PQUOTA;
> > +
> > +	sick = xfs_rt_measure_sickness(mp);
> > +	if (sick & XFS_HEALTH_RT_BITMAP)
> > +		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_BITMAP;
> > +	if (sick & XFS_HEALTH_RT_SUMMARY)
> > +		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_SUMMARY;
> > +}
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index b5918ce656bd..f9bf11b6a055 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -34,6 +34,7 @@
> >  #include "scrub/xfs_scrub.h"
> >  #include "xfs_sb.h"
> >  #include "xfs_ag.h"
> > +#include "xfs_health.h"
> >  
> >  #include <linux/capability.h>
> >  #include <linux/cred.h>
> > @@ -830,6 +831,8 @@ xfs_ioc_fsgeometry(
> >  	if (error)
> >  		return error;
> >  
> > +	xfs_fsop_geom_health(mp, &fsgeo);
> > +
> >  	if (copy_to_user(arg, &fsgeo, sizeof(fsgeo)))
> >  		return -EFAULT;
> >  	return 0;
> > 
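
For anyone wondering what a consumer of the new health field might look
like, here is a small sketch that turns the mask into names.  The flag
values are the ones from the xfs_fs.h hunk above; the name table and
describe_health() are invented for illustration and are not what
xfs_spaceman does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define XFS_FSOP_GEOM_HEALTH_FS_COUNTERS (1 << 0) /* summary counters */
#define XFS_FSOP_GEOM_HEALTH_FS_UQUOTA	(1 << 1)  /* user quota */
#define XFS_FSOP_GEOM_HEALTH_FS_GQUOTA	(1 << 2)  /* group quota */
#define XFS_FSOP_GEOM_HEALTH_FS_PQUOTA	(1 << 3)  /* project quota */
#define XFS_FSOP_GEOM_HEALTH_RT_BITMAP	(1 << 4)  /* realtime bitmap */
#define XFS_FSOP_GEOM_HEALTH_RT_SUMMARY	(1 << 5)  /* realtime summary */

struct health_name {
	unsigned int	mask;
	const char	*name;
};

static const struct health_name health_names[] = {
	{ XFS_FSOP_GEOM_HEALTH_FS_COUNTERS,	"summary counters" },
	{ XFS_FSOP_GEOM_HEALTH_FS_UQUOTA,	"user quota" },
	{ XFS_FSOP_GEOM_HEALTH_FS_GQUOTA,	"group quota" },
	{ XFS_FSOP_GEOM_HEALTH_FS_PQUOTA,	"project quota" },
	{ XFS_FSOP_GEOM_HEALTH_RT_BITMAP,	"realtime bitmap" },
	{ XFS_FSOP_GEOM_HEALTH_RT_SUMMARY,	"realtime summary" },
};

/*
 * Write a comma-separated list of the sick metadata classes into buf.
 * Returns the number of classes that were written.
 */
static int
describe_health(unsigned int health, char *buf, size_t len)
{
	size_t		i, used = 0;
	int		nr = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(health_names) / sizeof(health_names[0]); i++) {
		int	n;

		if (!(health & health_names[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     nr ? ", " : "", health_names[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial entry */
			break;
		}
		used += (size_t)n;
		nr++;
	}
	return nr;
}
```

A caller would pass in the health word from the geometry ioctl and print
the buffer when the return value is nonzero.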

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 03/10] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem
  2019-04-02 18:16         ` Darrick J. Wong
@ 2019-04-02 18:32           ` Brian Foster
  0 siblings, 0 replies; 41+ messages in thread
From: Brian Foster @ 2019-04-02 18:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 02, 2019 at 11:16:14AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 02, 2019 at 09:53:41AM -0400, Brian Foster wrote:
> > On Tue, Apr 02, 2019 at 06:40:10AM -0700, Darrick J. Wong wrote:
> > > On Tue, Apr 02, 2019 at 09:24:45AM -0400, Brian Foster wrote:
> > > > On Mon, Apr 01, 2019 at 10:10:28AM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > 
> > > > > If we know the filesystem metadata isn't healthy during unmount, we want
> > > > > to encourage the administrator to run xfs_repair right away.  We can't
> > > > > do this if BAD_SUMMARY will cause an unclean log unmount to force
> > > > > summary recalculation, so turn it off if the fs is bad.
> > > > > 
> > > > 
> > > > Do you mean we don't want to suggest xfs_repair because we intentionally
> > > > cause a dirty log and thus xfs_repair will need to zap it? If so, the
> > > > wording above and the comment in xfs_health_unmount() could be a bit
> > > > more specific on the reasoning.
> > > 
> > > Sort of the opposite?  We want to suggest xfs_repair, but we don't want
> > > to leave the log dirty because that adds the additional step of running
> > > xfs_repair -L to zap the log.  I get the sense that we don't really want
> > > to encourage admins to be cavalier about running that...?
> > > 
> > 
> > Ok.. I guess what sounded funny to me is the suggestion that "we can't"
> > leave a dirty log and suggest repair to the user. xfs_repair can clearly
> > handle a dirty log, so it suggested to me that perhaps there was some
> > other critical side effect I wasn't thinking of that we wanted to avoid
> > as a result of needing to zap the log.
> > 
> > To be clear, I don't feel strongly about it and still think the change
> > is reasonable enough (it only matters when something else is broken in
> > the fs after all). I just wanted to clarify the above, call out the
> > potential tradeoff in the event that others might have further thoughts
> > on it, and suggest the full reasoning eventually make it into the commit
> > log for future reference.
> 
> Ahh, I see the confusion here.  What if I reworked the comment:
> 
> /*
>  * We discovered uncorrected metadata problems at some point during this
>  * filesystem mount and have advised the administrator to run repair
>  * once the unmount completes.
>  *
>  * However, we must be careful -- when FSCOUNTERS are flagged unhealthy,
>  * the unmount procedure omits writing the clean unmount record to the
>  * log so that the next mount will run recovery and recompute the
>  * summary counters.  In other words, we leave a dirty log to get the
>  * counters fixed.
>  *
>  * Unfortunately, xfs_repair cannot recover dirty logs, so if there were
>  * filesystem problems, FSCOUNTERS was flagged, and the administrator
>  * takes our advice to run xfs_repair, they'll have to zap the log
>  * before repairing structures.  We don't really want to encourage this,
>  * so we mark the FSCOUNTERS healthy so that a subsequent repair run
>  * won't see a dirty log.
>  */
> 

Yep, that sounds good to me.

> Also the "if (sick)" check needs to mask off FSCOUNTERS; I'll fix that
> too.
> 
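
If I'm reading that right, the decision would end up looking something
like the sketch below.  This is userspace C with a made-up helper
(health_unmount_decide) and a made-up result struct; only
XFS_HEALTH_FS_COUNTERS comes from the patch, and the exact policy is my
interpretation of the comment above.

```c
#include <assert.h>
#include <stdbool.h>

#define XFS_HEALTH_FS_COUNTERS	(1 << 0)	/* from xfs_health.h */

struct unmount_decision {
	bool		warn;		/* tell the admin to run xfs_repair */
	unsigned int	fs_sick;	/* fs sick mask left set at unmount */
};

static struct unmount_decision
health_unmount_decide(bool ag_sick, unsigned int rt_sick, unsigned int fs_sick)
{
	struct unmount_decision d = { .warn = false, .fs_sick = fs_sick };

	/* Bad counters alone get fixed by log recovery; don't warn for them. */
	if (ag_sick || rt_sick || (fs_sick & ~XFS_HEALTH_FS_COUNTERS))
		d.warn = true;

	/* Repair is coming anyway, so don't leave a dirty log for counters. */
	if (d.warn)
		d.fs_sick &= ~XFS_HEALTH_FS_COUNTERS;

	/* A warned unmount never leaves the counters flag set. */
	assert(!d.warn || !(d.fs_sick & XFS_HEALTH_FS_COUNTERS));
	return d;
}
```

So a filesystem with only bad counters keeps the flag (and the dirty log)
and gets its recalculation at next mount, while anything worse gets the
warning and a clean log.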
> > > (Would be nice if we could just port log recovery to repair, but that's
> > > a whole separate project...)
> > > 
> > 
> > Or work around the whole "trigger summary recalc via log recovery"
> > thing, but I guess that might not be trivial either..
> 
> Well I do have a prototype fscounters scrubber and repairer, so in the
> future we might be able to avoid this dirty log dance.
> 

Ok, perhaps that is the ideal solution and the current approach can fall
away with kernels that otherwise don't know how to online repair.

> > > > Also, what exactly is the side effect without this change in place? The
> > > > user would have to zap the log from xfs_repair, but the somewhat
> > > > artificial unclean unmount doesn't actually require log recovery to fix
> > > > up the fs outside of the whole summary counter thing, right? IOW, would
> > > > the user zapping the log actually lose anything besides the bad summary
> > > > counter indication?
> > > 
> > > The log checkpoints at unmount, right?  So I think it's ok to zap the
> > > log when its status is "cleanly unmounted but we didn't record the clean
> > > unmount because we want to force summary recalculation at next mount".
> > > 
> > 
> > That's what I would expect, but I haven't tested it. :)
> 
> Technically, I have, what with this obnoxious "hard shutdowns are a
> normal part of our workflow" use case I've been wrangling with... the
> clean umounts /do/ seem to leave a clean log.
> 

Heh, good to know. Thanks!

Brian

> --D
> 
> > Brian
> > 
> > > > I ask just because even though we warn the user to
> > > > run repair, that doesn't mean they'll actually do it and so it seems
> > > > there is a bit of a tradeoff in that regard.
> > > 
> > > Yeah, it's a pity we can't just run xfs_repair ourselves. :)
> > > 
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > ---
> > > > 
> > > > BTW, I get the following compiler warning on this patch:
> > > > 
> > > > In file included from fs/xfs/xfs_trace.h:12,
> > > >                  from fs/xfs/xfs_health.c:19:
> > > > fs/xfs/xfs_health.c: In function ‘xfs_health_unmount’:
> > > > ./include/linux/tracepoint.h:195:6: warning: ‘sick’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> > > >      ((void(*)(proto))(it_func))(args); \
> > > >       ^
> > > > fs/xfs/xfs_health.c:33:16: note: ‘sick’ was declared here
> > > >   unsigned int  sick;
> > > 
> > > <nod> Thanks, will fix that for v2.
> > > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > > >  fs/xfs/libxfs/xfs_health.h |    2 +
> > > > >  fs/xfs/xfs_health.c        |   59 ++++++++++++++++++++++++++++++++++++++++++++
> > > > >  fs/xfs/xfs_mount.c         |    2 +
> > > > >  fs/xfs/xfs_trace.h         |    3 ++
> > > > >  4 files changed, 66 insertions(+)
> > > > > 
> > > > > 
> > > > > diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
> > > > > index 0d51bd2689ea..269b124dc1d7 100644
> > > > > --- a/fs/xfs/libxfs/xfs_health.h
> > > > > +++ b/fs/xfs/libxfs/xfs_health.h
> > > > > @@ -148,6 +148,8 @@ void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask);
> > > > >  void xfs_inode_mark_healthy(struct xfs_inode *ip, unsigned int mask);
> > > > >  unsigned int xfs_inode_measure_sickness(struct xfs_inode *ip);
> > > > >  
> > > > > +void xfs_health_unmount(struct xfs_mount *mp);
> > > > > +
> > > > >  /* Now some helpers. */
> > > > >  
> > > > >  static inline bool
> > > > > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> > > > > index e9d6859f7501..6e2da858c356 100644
> > > > > --- a/fs/xfs/xfs_health.c
> > > > > +++ b/fs/xfs/xfs_health.c
> > > > > @@ -19,6 +19,65 @@
> > > > >  #include "xfs_trace.h"
> > > > >  #include "xfs_health.h"
> > > > >  
> > > > > +/*
> > > > > + * Warn about metadata corruption that we detected but haven't fixed, and
> > > > > + * make sure we're not sitting on anything that would get in the way of
> > > > > + * recovery.
> > > > > + */
> > > > > +void
> > > > > +xfs_health_unmount(
> > > > > +	struct xfs_mount	*mp)
> > > > > +{
> > > > > +	struct xfs_perag	*pag;
> > > > > +	xfs_agnumber_t		agno;
> > > > > +	unsigned int		sick;
> > > > > +	bool			warn = false;
> > > > > +
> > > > > +	if (XFS_FORCED_SHUTDOWN(mp))
> > > > > +		return;
> > > > > +
> > > > > +	/* Measure AG corruption levels. */
> > > > > +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> > > > > +		pag = xfs_perag_get(mp, agno);
> > > > > +		spin_lock(&pag->pag_state_lock);
> > > > > +		if (pag->pag_sick) {
> > > > > +			trace_xfs_ag_unfixed_corruption(mp, agno, sick);
> > > > > +			warn = true;
> > > > > +		}
> > > > > +		spin_unlock(&pag->pag_state_lock);
> > > > > +		xfs_perag_put(pag);
> > > > > +	}
> > > > > +
> > > > > +	/* Measure realtime volume corruption levels. */
> > > > > +	sick = xfs_rt_measure_sickness(mp);
> > > > > +	if (sick) {
> > > > > +		trace_xfs_rt_unfixed_corruption(mp, sick);
> > > > > +		warn = true;
> > > > > +	}
> > > > > +
> > > > > +	/* Measure fs corruption and keep the sample around for the warning. */
> > > > > +	sick = xfs_fs_measure_sickness(mp);
> > > > > +	if (sick) {
> > > > > +		trace_xfs_fs_unfixed_corruption(mp, sick);
> > > > > +		warn = true;
> > > > > +	}
> > > > > +
> > > > > +	if (warn) {
> > > > > +		xfs_warn(mp,
> > > > > +"Uncorrected metadata errors detected; please run xfs_repair.");
> > > > > +
> > > > > +		/*
> > > > > +		 * If we have unhealthy metadata, we want the admin to run
> > > > > +		 * xfs_repair after unmounting.  They can't do that if the log
> > > > > +		 * is written out without a clean unmount record (such as when
> > > > > +		 * the summary counters are marked unhealthy to force
> > > > > +		 * recalculation of the summary counters) so clear it.
> > > > > +		 */
> > > > > +		if (sick & XFS_HEALTH_FS_COUNTERS)
> > > > > +			xfs_fs_mark_healthy(mp, XFS_HEALTH_FS_COUNTERS);
> > > > > +	}
> > > > > +}
> > > > > +
> > > > >  /* Mark unhealthy per-fs metadata. */
> > > > >  void
> > > > >  xfs_fs_mark_sick(
> > > > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > > > index a43ca655a431..f0f73d598a0c 100644
> > > > > --- a/fs/xfs/xfs_mount.c
> > > > > +++ b/fs/xfs/xfs_mount.c
> > > > > @@ -1075,6 +1075,7 @@ xfs_mountfs(
> > > > >  	 */
> > > > >  	cancel_delayed_work_sync(&mp->m_reclaim_work);
> > > > >  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> > > > > +	xfs_health_unmount(mp);
> > > > >   out_log_dealloc:
> > > > >  	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
> > > > >  	xfs_log_mount_cancel(mp);
> > > > > @@ -1157,6 +1158,7 @@ xfs_unmountfs(
> > > > >  	 */
> > > > >  	cancel_delayed_work_sync(&mp->m_reclaim_work);
> > > > >  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> > > > > +	xfs_health_unmount(mp);
> > > > >  
> > > > >  	xfs_qm_unmount(mp);
> > > > >  
> > > > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > > > index f079841c7af6..2464ea351f83 100644
> > > > > --- a/fs/xfs/xfs_trace.h
> > > > > +++ b/fs/xfs/xfs_trace.h
> > > > > @@ -3461,8 +3461,10 @@ DEFINE_EVENT(xfs_fs_corrupt_class, name,	\
> > > > >  	TP_ARGS(mp, flags))
> > > > >  DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_sick);
> > > > >  DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_healthy);
> > > > > +DEFINE_FS_CORRUPT_EVENT(xfs_fs_unfixed_corruption);
> > > > >  DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_sick);
> > > > >  DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_healthy);
> > > > > +DEFINE_FS_CORRUPT_EVENT(xfs_rt_unfixed_corruption);
> > > > >  
> > > > >  DECLARE_EVENT_CLASS(xfs_ag_corrupt_class,
> > > > >  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int flags),
> > > > > @@ -3488,6 +3490,7 @@ DEFINE_EVENT(xfs_ag_corrupt_class, name,	\
> > > > >  	TP_ARGS(mp, agno, flags))
> > > > >  DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick);
> > > > >  DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy);
> > > > > +DEFINE_AG_CORRUPT_EVENT(xfs_ag_unfixed_corruption);
> > > > >  
> > > > >  DECLARE_EVENT_CLASS(xfs_inode_corrupt_class,
> > > > >  	TP_PROTO(struct xfs_inode *ip, unsigned int flags),
> > > > > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 05/10] xfs: add a new ioctl to describe allocation group geometry
  2019-04-02 17:34   ` Brian Foster
@ 2019-04-02 21:35     ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-02 21:35 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Apr 02, 2019 at 01:34:46PM -0400, Brian Foster wrote:
> On Mon, Apr 01, 2019 at 10:10:40AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Add a new ioctl to describe an allocation group's geometry.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> Just a couple nits...
> 
> >  fs/xfs/libxfs/xfs_ag.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_ag.h |    2 ++
> >  fs/xfs/libxfs/xfs_fs.h |   14 ++++++++++++++
> >  fs/xfs/xfs_ioctl.c     |   24 ++++++++++++++++++++++++
> >  fs/xfs/xfs_ioctl32.c   |    1 +
> >  5 files changed, 89 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> > index 1ef8acf35e7d..1679e37fe28d 100644
> > --- a/fs/xfs/libxfs/xfs_ag.c
> > +++ b/fs/xfs/libxfs/xfs_ag.c
> > @@ -19,6 +19,7 @@
> >  #include "xfs_ialloc.h"
> >  #include "xfs_rmap.h"
> >  #include "xfs_ag.h"
> > +#include "xfs_ag_resv.h"
> >  
> >  static struct xfs_buf *
> >  xfs_get_aghdr_buf(
> > @@ -461,3 +462,50 @@ xfs_ag_extend_space(
> >  				len, &XFS_RMAP_OINFO_SKIP_UPDATE,
> >  				XFS_AG_RESV_NONE);
> >  }
> > +
> > +/* Retrieve AG geometry. */
> > +int
> > +xfs_ag_get_geometry(
> > +	struct xfs_mount	*mp,
> > +	xfs_agnumber_t		agno,
> > +	struct xfs_ag_geometry	*ageo)
> > +{
> > +	struct xfs_buf		*bp;
> > +	struct xfs_agi		*agi;
> > +	struct xfs_agf		*agf;
> > +	struct xfs_perag	*pag;
> > +	unsigned int		freeblks;
> > +	int			error;
> > +
> > +	memset(ageo, 0, sizeof(*ageo));
> > +
> > +	if (agno >= mp->m_sb.sb_agcount)
> > +		return -EINVAL;
> > +
> 
> I'd probably error check prior to the memset().

Ok.
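
That is, something shaped like the following.  It's a userspace stand-in
so the ordering is obvious: the struct is cut down to two fields and
get_geometry() is a placeholder name, not the real function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct ag_geometry {		/* cut-down stand-in for xfs_ag_geometry */
	uint32_t	ag_number;
	uint32_t	ag_length;
};

static int
get_geometry(uint32_t agcount, uint32_t agno, struct ag_geometry *ageo)
{
	assert(ageo != NULL);

	/* Reject a bad AG number before touching the caller's buffer. */
	if (agno >= agcount)
		return -EINVAL;

	memset(ageo, 0, sizeof(*ageo));
	ageo->ag_number = agno;
	return 0;
}
```

On the error path the buffer keeps whatever the caller put in it.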

> > +	error = xfs_ialloc_read_agi(mp, NULL, agno, &bp);
> > +	if (error)
> > +		return error;
> > +
> > +	agi = XFS_BUF_TO_AGI(bp);
> > +	ageo->ag_icount = be32_to_cpu(agi->agi_count);
> > +	ageo->ag_ifree = be32_to_cpu(agi->agi_freecount);
> > +	xfs_buf_relse(bp);
> > +
> > +	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &bp);
> > +	if (error)
> > +		return error;
> > +
> > +	agf = XFS_BUF_TO_AGF(bp);
> > +	pag = xfs_perag_get(mp, agno);
> > +	ageo->ag_length = be32_to_cpu(agf->agf_length);
> > +	freeblks = pag->pagf_freeblks +
> > +		   pag->pagf_flcount +
> > +		   pag->pagf_btreeblks -
> > +		   xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE);
> > +	ageo->ag_freeblks = freeblks;
> > +	xfs_perag_put(pag);
> > +	xfs_buf_relse(bp);
> > +
> 
> I wonder if we should lock down the agi and agf together to prevent any
> potential incoherent reporting between them..? For example, suppose we
> collect the agi values, release the agi and lose a race to the agf by an
> inode allocator who consumes more free blocks before we ultimately read
> the agf values and return.

Good point, will fix.

--D

> Brian
> 
> > +	ageo->ag_number = agno;
> > +	return 0;
> > +}
> > diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> > index 412702e23f61..5166322807e7 100644
> > --- a/fs/xfs/libxfs/xfs_ag.h
> > +++ b/fs/xfs/libxfs/xfs_ag.h
> > @@ -26,5 +26,7 @@ struct aghdr_init_data {
> >  int xfs_ag_init_headers(struct xfs_mount *mp, struct aghdr_init_data *id);
> >  int xfs_ag_extend_space(struct xfs_mount *mp, struct xfs_trans *tp,
> >  			struct aghdr_init_data *id, xfs_extlen_t len);
> > +int xfs_ag_get_geometry(struct xfs_mount *mp, xfs_agnumber_t agno,
> > +			struct xfs_ag_geometry *ageo);
> >  
> >  #endif /* __LIBXFS_AG_H */
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index 1dba751cde60..87226e00e7bd 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -266,6 +266,19 @@ typedef struct xfs_fsop_resblks {
> >  #define XFS_MIN_DBLOCKS(s) ((xfs_rfsblock_t)((s)->sb_agcount - 1) *	\
> >  			 (s)->sb_agblocks + XFS_MIN_AG_BLOCKS)
> >  
> > +/*
> > + * Output for XFS_IOC_AG_GEOMETRY
> > + */
> > +struct xfs_ag_geometry {
> > +	__u32		ag_number;	/* i/o: AG number */
> > +	__u32		ag_length;	/* o: length in blocks */
> > +	__u32		ag_freeblks;	/* o: free space */
> > +	__u32		ag_icount;	/* o: inodes allocated */
> > +	__u32		ag_ifree;	/* o: inodes free */
> > +	__u32		ag_reserved32;	/* o: zero */
> > +	__u64		ag_reserved[5];	/* o: zero */
> > +};
> > +
> >  /*
> >   * Structures for XFS_IOC_FSGROWFSDATA, XFS_IOC_FSGROWFSLOG & XFS_IOC_FSGROWFSRT
> >   */
> > @@ -619,6 +632,7 @@ struct xfs_scrub_metadata {
> >  #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
> >  /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
> >  #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
> > +#define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
> >  
> >  /*
> >   * ioctl commands that replace IRIX syssgi()'s
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 7fd8815633dc..b5918ce656bd 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -33,6 +33,7 @@
> >  #include "xfs_fsmap.h"
> >  #include "scrub/xfs_scrub.h"
> >  #include "xfs_sb.h"
> > +#include "xfs_ag.h"
> >  
> >  #include <linux/capability.h>
> >  #include <linux/cred.h>
> > @@ -834,6 +835,26 @@ xfs_ioc_fsgeometry(
> >  	return 0;
> >  }
> >  
> > +STATIC int
> > +xfs_ioc_ag_geometry(
> > +	struct xfs_mount	*mp,
> > +	void			__user *arg)
> > +{
> > +	struct xfs_ag_geometry	ageo;
> > +	int			error;
> > +
> > +	if (copy_from_user(&ageo, arg, sizeof(ageo)))
> > +		return -EFAULT;
> > +
> > +	error = xfs_ag_get_geometry(mp, ageo.ag_number, &ageo);
> > +	if (error)
> > +		return error;
> > +
> > +	if (copy_to_user(arg, &ageo, sizeof(ageo)))
> > +		return -EFAULT;
> > +	return 0;
> > +}
> > +
> >  /*
> >   * Linux extended inode flags interface.
> >   */
> > @@ -1960,6 +1981,9 @@ xfs_file_ioctl(
> >  	case XFS_IOC_FSGEOMETRY:
> >  		return xfs_ioc_fsgeometry(mp, arg);
> >  
> > +	case XFS_IOC_AG_GEOMETRY:
> > +		return xfs_ioc_ag_geometry(mp, arg);
> > +
> >  	case XFS_IOC_GETVERSION:
> >  		return put_user(inode->i_generation, (int __user *)arg);
> >  
> > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > index 323cfd4b15dc..28d2110dd871 100644
> > --- a/fs/xfs/xfs_ioctl32.c
> > +++ b/fs/xfs/xfs_ioctl32.c
> > @@ -563,6 +563,7 @@ xfs_file_compat_ioctl(
> >  	case XFS_IOC_DIOINFO:
> >  	case XFS_IOC_FSGEOMETRY_V2:
> >  	case XFS_IOC_FSGEOMETRY:
> > +	case XFS_IOC_AG_GEOMETRY:
> >  	case XFS_IOC_FSGETXATTR:
> >  	case XFS_IOC_FSSETXATTR:
> >  	case XFS_IOC_FSGETXATTRA:
> > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 04/10] xfs: expand xfs_fsop_geom
  2019-04-01 17:10 ` [PATCH 04/10] xfs: expand xfs_fsop_geom Darrick J. Wong
  2019-04-02 17:34   ` Brian Foster
@ 2019-04-02 21:53   ` Dave Chinner
  2019-04-02 22:31     ` Darrick J. Wong
  1 sibling, 1 reply; 41+ messages in thread
From: Dave Chinner @ 2019-04-02 21:53 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Apr 01, 2019 at 10:10:34AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Rename the current (v2-v4) geometry ioctl XFS_IOC_FSGEOMETRY_V2 and
> expand the existing xfs_fsop_geom to reserve empty space for more
> fields.  This means that newly built binaries will pick up the new
> format and existing programs will simply end up in the V2 handler.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

This looks familiar....

Thu, 26 Oct 2017 19:33:19 +1100
[PATCH 11/14] xfs: bump XFS_IOC_FSGEOMETRY to v5 structures

https://patchwork.kernel.org/patch/10027799/

> ---
>  fs/xfs/libxfs/xfs_fs.h |   32 +++++++++++++++++++++++++++++++-
>  fs/xfs/libxfs/xfs_sb.c |    5 +++++
>  fs/xfs/xfs_ioctl.c     |   22 ++++++++++++++++++++--
>  fs/xfs/xfs_ioctl32.c   |    1 +
>  4 files changed, 57 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index f3aa59302fef..1dba751cde60 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -148,7 +148,34 @@ typedef struct xfs_fsop_geom_v1 {
>  } xfs_fsop_geom_v1_t;
>  
>  /*
> - * Output for XFS_IOC_FSGEOMETRY
> + * Output for XFS_IOC_FSGEOMETRY_V2
> + */
> +typedef struct xfs_fsop_geom_v2 {
> +	__u32		blocksize;	/* filesystem (data) block size */
> +	__u32		rtextsize;	/* realtime extent size		*/
> +	__u32		agblocks;	/* fsblocks in an AG		*/
> +	__u32		agcount;	/* number of allocation groups	*/
> +	__u32		logblocks;	/* fsblocks in the log		*/
> +	__u32		sectsize;	/* (data) sector size, bytes	*/
> +	__u32		inodesize;	/* inode size in bytes		*/
> +	__u32		imaxpct;	/* max allowed inode space(%)	*/
> +	__u64		datablocks;	/* fsblocks in data subvolume	*/
> +	__u64		rtblocks;	/* fsblocks in realtime subvol	*/
> +	__u64		rtextents;	/* rt extents in realtime subvol*/
> +	__u64		logstart;	/* starting fsblock of the log	*/
> +	unsigned char	uuid[16];	/* unique id of the filesystem	*/
> +	__u32		sunit;		/* stripe unit, fsblocks	*/
> +	__u32		swidth;		/* stripe width, fsblocks	*/
> +	__s32		version;	/* structure version		*/
> +	__u32		flags;		/* superblock version flags	*/
> +	__u32		logsectsize;	/* log sector size, bytes	*/
> +	__u32		rtsectsize;	/* realtime sector size, bytes	*/
> +	__u32		dirblocksize;	/* directory block size, bytes	*/
> +	__u32		logsunit;	/* log stripe unit, bytes */
> +} xfs_fsop_geom_v2_t;

That's actually the v4 structure, not the v2 structure. fsgeom
versions 1-3 used the v1 structure, v4 uses this structure, and v5
uses the current structure. So this (and the renamed ioctl) should
really be named "v4", not "v2".


> +/*
> + * Output for XFS_IOC_FSGEOMETRY (v5)
>   */
>  typedef struct xfs_fsop_geom {
>  	__u32		blocksize;	/* filesystem (data) block size */
> @@ -172,6 +199,7 @@ typedef struct xfs_fsop_geom {
>  	__u32		rtsectsize;	/* realtime sector size, bytes	*/
>  	__u32		dirblocksize;	/* directory block size, bytes	*/
>  	__u32		logsunit;	/* log stripe unit, bytes */
> +	__u64		reserved[18];	/* reserved space */
>  } xfs_fsop_geom_t;
>  
>  /* Output for XFS_FS_COUNTS */
> @@ -189,6 +217,7 @@ typedef struct xfs_fsop_resblks {
>  } xfs_fsop_resblks_t;
>  
>  #define XFS_FSOP_GEOM_VERSION	0
> +#define XFS_FSOP_GEOM_V5	5
>  
>  #define XFS_FSOP_GEOM_FLAGS_ATTR	0x0001	/* attributes in use	*/
>  #define XFS_FSOP_GEOM_FLAGS_NLINK	0x0002	/* 32-bit nlink values	*/
> @@ -620,6 +649,7 @@ struct xfs_scrub_metadata {
>  #define XFS_IOC_FSSETDM_BY_HANDLE    _IOW ('X', 121, struct xfs_fsop_setdm_handlereq)
>  #define XFS_IOC_ATTRLIST_BY_HANDLE   _IOW ('X', 122, struct xfs_fsop_attrlist_handlereq)
>  #define XFS_IOC_ATTRMULTI_BY_HANDLE  _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
> +#define XFS_IOC_FSGEOMETRY_V2	     _IOR ('X', 124, struct xfs_fsop_geom_v2)
>  #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 124, struct xfs_fsop_geom)
                                                ^^^
The identifier for the XFS_IOC_FSGEOMETRY ioctl needs to change
because it's now a new ioctl.


>  #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, uint32_t)
>  /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index f0309b74e377..c2ca3a816c41 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -1168,6 +1168,11 @@ xfs_fs_geometry(
>  
>  	geo->logsunit = sbp->sb_logsunit;
>  
> +	if (struct_version < 5)
> +		return 0;
> +
> +	geo->version = XFS_FSOP_GEOM_V5;
> +
>  	return 0;
>  }
>  
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 6ecdbb3af7de..7fd8815633dc 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -801,7 +801,7 @@ xfs_ioc_fsgeometry_v1(
>  }
>  
>  STATIC int
> -xfs_ioc_fsgeometry(
> +xfs_ioc_fsgeometry_v2(
>  	xfs_mount_t		*mp,
>  	void			__user *arg)
>  {
> @@ -812,6 +812,23 @@ xfs_ioc_fsgeometry(
>  	if (error)
>  		return error;
>  
> +	if (copy_to_user(arg, &fsgeo, sizeof(struct xfs_fsop_geom_v2)))
> +		return -EFAULT;
> +	return 0;
> +}
> +
> +STATIC int
> +xfs_ioc_fsgeometry(
> +	struct xfs_mount	*mp,
> +	void			__user *arg)
> +{
> +	struct xfs_fsop_geom	fsgeo;
> +	int			error;
> +
> +	error = xfs_fs_geometry(&mp->m_sb, &fsgeo, 5);
> +	if (error)
> +		return error;
> +
>  	if (copy_to_user(arg, &fsgeo, sizeof(fsgeo)))
>  		return -EFAULT;
>  	return 0;

And I factored all this into a single function, because it's just
boilerplate that can be done with a version switch passed in from
the XFS_IOC_FSGEOMETRY* calls themselves. See the patch I referenced
above....
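
Roughly the shape I mean, as a userspace sketch and not the code from that
patch: the structs are compact stand-ins with the same sizes as the ones
quoted above, fsgeometry_copy() is a hypothetical name, and memcpy() plays
the part of copy_to_user().  Handling of versions 1-3 is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the 112-byte structure this patch calls xfs_fsop_geom_v2. */
struct geom_v4 {
	uint32_t	head32[8];	/* blocksize .. imaxpct */
	uint64_t	mid64[4];	/* datablocks .. logstart */
	unsigned char	uuid[16];
	uint32_t	tail32[8];	/* sunit .. logsunit */
};

/* Stand-in for the expanded structure: same prefix plus reserved space. */
struct geom_v5 {
	struct geom_v4	v4;
	uint64_t	reserved[18];
};

static_assert(sizeof(struct geom_v4) == 0x70, "v4 stand-in has wrong size");
static_assert(sizeof(struct geom_v5) == 0x100, "v5 stand-in has wrong size");

/*
 * One handler for every geometry ioctl: the caller fills the largest
 * structure once and the version picks how much of it is copied out.
 */
static int
fsgeometry_copy(void *dst, const struct geom_v5 *src, int struct_version)
{
	size_t		len;

	switch (struct_version) {
	case 4:
		len = sizeof(struct geom_v4);
		break;
	case 5:
		len = sizeof(struct geom_v5);
		break;
	default:
		return -EINVAL;	/* v1-v3 not modelled in this sketch */
	}

	memcpy(dst, src, len);
	return 0;
}
```

Each ioctl case then becomes a one-line call with its own version number.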

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 04/10] xfs: expand xfs_fsop_geom
  2019-04-02 21:53   ` Dave Chinner
@ 2019-04-02 22:31     ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-02 22:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 03, 2019 at 08:53:46AM +1100, Dave Chinner wrote:
> On Mon, Apr 01, 2019 at 10:10:34AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Rename the current (v2-v4) geometry ioctl XFS_IOC_FSGEOMETRY_V2 and
> > expand the existing xfs_fsop_geom to reserve empty space for more
> > fields.  This means that newly built binaries will pick up the new
> > format and existing programs will simply end up in the V2 handler.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> This looks familiar....
> 
> Thu, 26 Oct 2017 19:33:19 +1100
> [PATCH 11/14] xfs: bump XFS_IOC_FSGEOMETRY to v5 structures
> 
> https://patchwork.kernel.org/patch/10027799/

Heh, yes. :)

> > ---
> >  fs/xfs/libxfs/xfs_fs.h |   32 +++++++++++++++++++++++++++++++-
> >  fs/xfs/libxfs/xfs_sb.c |    5 +++++
> >  fs/xfs/xfs_ioctl.c     |   22 ++++++++++++++++++++--
> >  fs/xfs/xfs_ioctl32.c   |    1 +
> >  4 files changed, 57 insertions(+), 3 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index f3aa59302fef..1dba751cde60 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -148,7 +148,34 @@ typedef struct xfs_fsop_geom_v1 {
> >  } xfs_fsop_geom_v1_t;
> >  
> >  /*
> > - * Output for XFS_IOC_FSGEOMETRY
> > + * Output for XFS_IOC_FSGEOMETRY_V2
> > + */
> > +typedef struct xfs_fsop_geom_v2 {
> > +	__u32		blocksize;	/* filesystem (data) block size */
> > +	__u32		rtextsize;	/* realtime extent size		*/
> > +	__u32		agblocks;	/* fsblocks in an AG		*/
> > +	__u32		agcount;	/* number of allocation groups	*/
> > +	__u32		logblocks;	/* fsblocks in the log		*/
> > +	__u32		sectsize;	/* (data) sector size, bytes	*/
> > +	__u32		inodesize;	/* inode size in bytes		*/
> > +	__u32		imaxpct;	/* max allowed inode space(%)	*/
> > +	__u64		datablocks;	/* fsblocks in data subvolume	*/
> > +	__u64		rtblocks;	/* fsblocks in realtime subvol	*/
> > +	__u64		rtextents;	/* rt extents in realtime subvol*/
> > +	__u64		logstart;	/* starting fsblock of the log	*/
> > +	unsigned char	uuid[16];	/* unique id of the filesystem	*/
> > +	__u32		sunit;		/* stripe unit, fsblocks	*/
> > +	__u32		swidth;		/* stripe width, fsblocks	*/
> > +	__s32		version;	/* structure version		*/
> > +	__u32		flags;		/* superblock version flags	*/
> > +	__u32		logsectsize;	/* log sector size, bytes	*/
> > +	__u32		rtsectsize;	/* realtime sector size, bytes	*/
> > +	__u32		dirblocksize;	/* directory block size, bytes	*/
> > +	__u32		logsunit;	/* log stripe unit, bytes */
> > +} xfs_fsop_geom_v2_t;
> 
> That's actually the v4 structure, not the v2 structure. fsgeom
> versions 1-3 used the v1 structure, v4 uses this structure, and v5
> uses the current structure. So this (and the renamed ioctl) should
> really be named "v4", not "v2".

Fair enough.  Like Brian, I wasn't 100% sure whether we were on v4 or v2
or vMILLION. :)

> 
> > +/*
> > + * Output for XFS_IOC_FSGEOMETRY (v5)
> >   */
> >  typedef struct xfs_fsop_geom {
> >  	__u32		blocksize;	/* filesystem (data) block size */
> > @@ -172,6 +199,7 @@ typedef struct xfs_fsop_geom {
> >  	__u32		rtsectsize;	/* realtime sector size, bytes	*/
> >  	__u32		dirblocksize;	/* directory block size, bytes	*/
> >  	__u32		logsunit;	/* log stripe unit, bytes */
> > +	__u64		reserved[18];	/* reserved space */
> >  } xfs_fsop_geom_t;
> >  
> >  /* Output for XFS_FS_COUNTS */
> > @@ -189,6 +217,7 @@ typedef struct xfs_fsop_resblks {
> >  } xfs_fsop_resblks_t;
> >  
> >  #define XFS_FSOP_GEOM_VERSION	0
> > +#define XFS_FSOP_GEOM_V5	5
> >  
> >  #define XFS_FSOP_GEOM_FLAGS_ATTR	0x0001	/* attributes in use	*/
> >  #define XFS_FSOP_GEOM_FLAGS_NLINK	0x0002	/* 32-bit nlink values	*/
> > @@ -620,6 +649,7 @@ struct xfs_scrub_metadata {
> >  #define XFS_IOC_FSSETDM_BY_HANDLE    _IOW ('X', 121, struct xfs_fsop_setdm_handlereq)
> >  #define XFS_IOC_ATTRLIST_BY_HANDLE   _IOW ('X', 122, struct xfs_fsop_attrlist_handlereq)
> >  #define XFS_IOC_ATTRMULTI_BY_HANDLE  _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
> > +#define XFS_IOC_FSGEOMETRY_V2	     _IOR ('X', 124, struct xfs_fsop_geom_v2)
> >  #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 124, struct xfs_fsop_geom)
>                                                 ^^^
> The identifier for the XFS_IOC_FSGEOMETRY ioctl needs to change
> because it's now a new ioctl.

It /is/ different, because the sizeof(third parameter) is encoded in the
ioctl number...

printf("x%lx x%lx\n", XFS_IOC_FSGEOMETRY, XFS_IOC_FSGEOMETRY_V2);

Yields:

x8100587c x8070587c
                 ^^ 124
               ^^ the 'X'
            ^^^ structure size
           ^ ioctl direction

The size went from 0x70 to 0x100, so the number's different, unless
someone wants to make a drastic change to how the ioctl macros work.

Granted, it's a little subtle.
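
For anyone who wants to check the encoding outside the kernel tree, here is a
minimal userspace sketch.  The two structs are hypothetical stand-ins sized
0x70 and 0x100 bytes (not the real geometry layouts), and every demo_*/DEMO_*
name is made up for illustration; only the _IOR and _IOC_* macros from
<linux/ioctl.h> are real.

```c
#include <assert.h>
#include <linux/ioctl.h>

/* Hypothetical stand-ins: only the sizes matter, not the field layout. */
struct demo_geom_old { unsigned char pad[0x70]; };
struct demo_geom_new { unsigned char pad[0x100]; };

static_assert(sizeof(struct demo_geom_old) == 0x70, "old struct size");
static_assert(sizeof(struct demo_geom_new) == 0x100, "new struct size");

/* Same magic ('X') and same command number (124), different argument type. */
#define DEMO_IOC_GEOM_OLD	_IOR('X', 124, struct demo_geom_old)
#define DEMO_IOC_GEOM_NEW	_IOR('X', 124, struct demo_geom_new)

/* Return nonzero if two ioctl numbers differ only in the encoded size. */
static int
demo_differ_only_in_size(
	unsigned long		a,
	unsigned long		b)
{
	return _IOC_DIR(a) == _IOC_DIR(b) &&
	       _IOC_TYPE(a) == _IOC_TYPE(b) &&
	       _IOC_NR(a) == _IOC_NR(b) &&
	       _IOC_SIZE(a) != _IOC_SIZE(b);
}
```

Calling demo_differ_only_in_size(DEMO_IOC_GEOM_OLD, DEMO_IOC_GEOM_NEW) should
return nonzero, which is the whole point: the command number and magic are
shared, and the size field alone keeps the two requests apart.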

> 
> >  #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, uint32_t)
> >  /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
> > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > index f0309b74e377..c2ca3a816c41 100644
> > --- a/fs/xfs/libxfs/xfs_sb.c
> > +++ b/fs/xfs/libxfs/xfs_sb.c
> > @@ -1168,6 +1168,11 @@ xfs_fs_geometry(
> >  
> >  	geo->logsunit = sbp->sb_logsunit;
> >  
> > +	if (struct_version < 5)
> > +		return 0;
> > +
> > +	geo->version = XFS_FSOP_GEOM_V5;
> > +
> >  	return 0;
> >  }
> >  
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 6ecdbb3af7de..7fd8815633dc 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -801,7 +801,7 @@ xfs_ioc_fsgeometry_v1(
> >  }
> >  
> >  STATIC int
> > -xfs_ioc_fsgeometry(
> > +xfs_ioc_fsgeometry_v2(
> >  	xfs_mount_t		*mp,
> >  	void			__user *arg)
> >  {
> > @@ -812,6 +812,23 @@ xfs_ioc_fsgeometry(
> >  	if (error)
> >  		return error;
> >  
> > +	if (copy_to_user(arg, &fsgeo, sizeof(struct xfs_fsop_geom_v2)))
> > +		return -EFAULT;
> > +	return 0;
> > +}
> > +
> > +STATIC int
> > +xfs_ioc_fsgeometry(
> > +	struct xfs_mount	*mp,
> > +	void			__user *arg)
> > +{
> > +	struct xfs_fsop_geom	fsgeo;
> > +	int			error;
> > +
> > +	error = xfs_fs_geometry(&mp->m_sb, &fsgeo, 5);
> > +	if (error)
> > +		return error;
> > +
> >  	if (copy_to_user(arg, &fsgeo, sizeof(fsgeo)))
> >  		return -EFAULT;
> >  	return 0;
> 
> And I factored all this into a single function, because it's just
> boilerplate that can be done with a version switch passed in from
> the XFS_IOC_FSGEOMETRY* calls themselves. See the patch I referenced
> above....

Yeah, that is a lot less gross.  I'll have a look at your patch from
ages ago.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 06/10] xfs: report fs and rt health via geometry structure
  2019-04-02 18:23     ` Darrick J. Wong
@ 2019-04-02 23:34       ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-02 23:34 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Apr 02, 2019 at 11:23:10AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 02, 2019 at 01:35:04PM -0400, Brian Foster wrote:
> > On Mon, Apr 01, 2019 at 10:10:46AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Use our newly expanded geometry structure to report the overall fs and
> > > realtime health status.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > 
> > I kind of wonder whether it's possible (or makes sense) to make the core
> > sickness bits exportable to userspace and thus avoid some of the
> > translation code, but that aside this looks fine to me:
> 
> Not sure -- for now I went with having separate bits and mapping
> functions so that the internal implementation doesn't get fused to the
> userspace interface, but seeing as the defines are the same we really
> could just copy straight from the incore structure and have a bunch of
> BUILD_BUG_ON(internal bit == ioctl bit) until they diverge more.

(Heh, the fsgeometry health bits -- we mix the fs and realtime
reporting, which means that we still need the kinda ugly function...)
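
To make the BUILD_BUG_ON idea concrete, here is a rough userspace sketch.  All
DEMO_* names are hypothetical stand-ins for the XFS_HEALTH_* and
XFS_FSOP_GEOM_HEALTH_* defines, static_assert stands in for BUILD_BUG_ON, and
the incore rt bit numbering is purely an assumption.  The fs bits can be
copied straight across once the build-time checks hold, but the rt bits come
from a second measurement and still need mapping by hand.

```c
#include <assert.h>

/* Hypothetical incore fs bits, modelled on XFS_HEALTH_FS_*. */
#define DEMO_SICK_FS_COUNTERS	(1U << 0)
#define DEMO_SICK_FS_UQUOTA	(1U << 1)
#define DEMO_SICK_FS_GQUOTA	(1U << 2)
#define DEMO_SICK_FS_PQUOTA	(1U << 3)
#define DEMO_SICK_FS_ALL	0xFU

/* Assumption: incore rt bits are numbered from 0 in their own word. */
#define DEMO_SICK_RT_BITMAP	(1U << 0)
#define DEMO_SICK_RT_SUMMARY	(1U << 1)

/* Hypothetical ioctl bits, modelled on XFS_FSOP_GEOM_HEALTH_*. */
#define DEMO_GEOM_FS_COUNTERS	(1U << 0)
#define DEMO_GEOM_FS_UQUOTA	(1U << 1)
#define DEMO_GEOM_FS_GQUOTA	(1U << 2)
#define DEMO_GEOM_FS_PQUOTA	(1U << 3)
#define DEMO_GEOM_RT_BITMAP	(1U << 4)
#define DEMO_GEOM_RT_SUMMARY	(1U << 5)

/* Build-time guard: the fs bits may only be copied while these hold. */
static_assert(DEMO_SICK_FS_COUNTERS == DEMO_GEOM_FS_COUNTERS, "counters");
static_assert(DEMO_SICK_FS_UQUOTA == DEMO_GEOM_FS_UQUOTA, "uquota");
static_assert(DEMO_SICK_FS_GQUOTA == DEMO_GEOM_FS_GQUOTA, "gquota");
static_assert(DEMO_SICK_FS_PQUOTA == DEMO_GEOM_FS_PQUOTA, "pquota");

/* Merge fs and rt sickness into one userspace-visible health word. */
static unsigned int
demo_geom_health(
	unsigned int		fs_sick,
	unsigned int		rt_sick)
{
	unsigned int		health = fs_sick & DEMO_SICK_FS_ALL;

	if (rt_sick & DEMO_SICK_RT_BITMAP)
		health |= DEMO_GEOM_RT_BITMAP;
	if (rt_sick & DEMO_SICK_RT_SUMMARY)
		health |= DEMO_GEOM_RT_SUMMARY;
	return health;
}
```

If the two sets of defines ever diverge, the build breaks at the
static_assert lines instead of silently reporting the wrong bits.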

--D

> 
> --D
> 
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > 
> > >  fs/xfs/libxfs/xfs_fs.h     |   11 ++++++++++-
> > >  fs/xfs/libxfs/xfs_health.h |    3 +++
> > >  fs/xfs/xfs_health.c        |   27 +++++++++++++++++++++++++++
> > >  fs/xfs/xfs_ioctl.c         |    3 +++
> > >  4 files changed, 43 insertions(+), 1 deletion(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > > index 87226e00e7bd..ddbfde7ff79d 100644
> > > --- a/fs/xfs/libxfs/xfs_fs.h
> > > +++ b/fs/xfs/libxfs/xfs_fs.h
> > > @@ -199,9 +199,18 @@ typedef struct xfs_fsop_geom {
> > >  	__u32		rtsectsize;	/* realtime sector size, bytes	*/
> > >  	__u32		dirblocksize;	/* directory block size, bytes	*/
> > >  	__u32		logsunit;	/* log stripe unit, bytes */
> > > -	__u64		reserved[18];	/* reserved space */
> > > +	__u32		health;		/* o: unhealthy fs & rt metadata */
> > > +	__u32		reserved32;	/* reserved space */
> > > +	__u64		reserved[17];	/* reserved space */
> > >  } xfs_fsop_geom_t;
> > >  
> > > +#define XFS_FSOP_GEOM_HEALTH_FS_COUNTERS (1 << 0) /* summary counters */
> > > +#define XFS_FSOP_GEOM_HEALTH_FS_UQUOTA	(1 << 1)  /* user quota */
> > > +#define XFS_FSOP_GEOM_HEALTH_FS_GQUOTA	(1 << 2)  /* group quota */
> > > +#define XFS_FSOP_GEOM_HEALTH_FS_PQUOTA	(1 << 3)  /* project quota */
> > > +#define XFS_FSOP_GEOM_HEALTH_RT_BITMAP	(1 << 4)  /* realtime bitmap */
> > > +#define XFS_FSOP_GEOM_HEALTH_RT_SUMMARY	(1 << 5)  /* realtime summary */
> > > +
> > >  /* Output for XFS_FS_COUNTS */
> > >  typedef struct xfs_fsop_counts {
> > >  	__u64	freedata;	/* free data section blocks */
> > > diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
> > > index 269b124dc1d7..36736d54a3e3 100644
> > > --- a/fs/xfs/libxfs/xfs_health.h
> > > +++ b/fs/xfs/libxfs/xfs_health.h
> > > @@ -39,6 +39,7 @@
> > >  struct xfs_mount;
> > >  struct xfs_perag;
> > >  struct xfs_inode;
> > > +struct xfs_fsop_geom;
> > >  
> > >  /* Observable health issues for metadata spanning the entire filesystem. */
> > >  #define XFS_HEALTH_FS_COUNTERS	(1 << 0)  /* summary counters */
> > > @@ -200,4 +201,6 @@ xfs_inode_healthy(struct xfs_inode *ip)
> > >  	return xfs_inode_measure_sickness(ip) == 0;
> > >  }
> > >  
> > > +void xfs_fsop_geom_health(struct xfs_mount *mp, struct xfs_fsop_geom *geo);
> > > +
> > >  #endif	/* __XFS_HEALTH_H__ */
> > > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> > > index 6e2da858c356..151c98693bef 100644
> > > --- a/fs/xfs/xfs_health.c
> > > +++ b/fs/xfs/xfs_health.c
> > > @@ -249,3 +249,30 @@ xfs_inode_measure_sickness(
> > >  	spin_unlock(&ip->i_flags_lock);
> > >  	return ret;
> > >  }
> > > +
> > > +/* Fill out fs geometry health info. */
> > > +void
> > > +xfs_fsop_geom_health(
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_fsop_geom	*geo)
> > > +{
> > > +	unsigned int		sick;
> > > +
> > > +	geo->health = 0;
> > > +
> > > +	sick = xfs_fs_measure_sickness(mp);
> > > +	if (sick & XFS_HEALTH_FS_COUNTERS)
> > > +		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_COUNTERS;
> > > +	if (sick & XFS_HEALTH_FS_UQUOTA)
> > > +		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_UQUOTA;
> > > +	if (sick & XFS_HEALTH_FS_GQUOTA)
> > > +		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_GQUOTA;
> > > +	if (sick & XFS_HEALTH_FS_PQUOTA)
> > > +		geo->health |= XFS_FSOP_GEOM_HEALTH_FS_PQUOTA;
> > > +
> > > +	sick = xfs_rt_measure_sickness(mp);
> > > +	if (sick & XFS_HEALTH_RT_BITMAP)
> > > +		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_BITMAP;
> > > +	if (sick & XFS_HEALTH_RT_SUMMARY)
> > > +		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_SUMMARY;
> > > +}
> > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > index b5918ce656bd..f9bf11b6a055 100644
> > > --- a/fs/xfs/xfs_ioctl.c
> > > +++ b/fs/xfs/xfs_ioctl.c
> > > @@ -34,6 +34,7 @@
> > >  #include "scrub/xfs_scrub.h"
> > >  #include "xfs_sb.h"
> > >  #include "xfs_ag.h"
> > > +#include "xfs_health.h"
> > >  
> > >  #include <linux/capability.h>
> > >  #include <linux/cred.h>
> > > @@ -830,6 +831,8 @@ xfs_ioc_fsgeometry(
> > >  	if (error)
> > >  		return error;
> > >  
> > > +	xfs_fsop_geom_health(mp, &fsgeo);
> > > +
> > >  	if (copy_to_user(arg, &fsgeo, sizeof(fsgeo)))
> > >  		return -EFAULT;
> > >  	return 0;
> > > 


* Re: [PATCH 07/10] xfs: report AG health via AG geometry ioctl
  2019-04-01 17:10 ` [PATCH 07/10] xfs: report AG health via AG geometry ioctl Darrick J. Wong
@ 2019-04-03 14:30   ` Brian Foster
  2019-04-03 16:11     ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Brian Foster @ 2019-04-03 14:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Apr 01, 2019 at 10:10:52AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Use the AG geometry info ioctl to report health status too.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_fs.h     |   12 +++++++++++-
>  fs/xfs/libxfs/xfs_health.h |    2 ++
>  fs/xfs/xfs_health.c        |   40 ++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_ioctl.c         |    2 ++
>  4 files changed, 55 insertions(+), 1 deletion(-)
> 
> 
...
> diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> index 151c98693bef..5ca471bd41ad 100644
> --- a/fs/xfs/xfs_health.c
> +++ b/fs/xfs/xfs_health.c
> @@ -276,3 +276,43 @@ xfs_fsop_geom_health(
>  	if (sick & XFS_HEALTH_RT_SUMMARY)
>  		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_SUMMARY;
>  }
> +
> +/* Fill out ag geometry health info. */
> +void
> +xfs_ag_geom_health(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno,
> +	struct xfs_ag_geometry	*ageo)
> +{
> +	struct xfs_perag	*pag;
> +	unsigned int		sick;
> +
> +	if (agno >= mp->m_sb.sb_agcount)
> +		return;

The call to xfs_ag_get_geometry() would have already returned an error
in the ioctl path for the above scenario. It might still make sense to
check here, but perhaps we could let this function also return an int
and return an error for consistency?

> +
> +	ageo->ag_health = 0;
> +
> +	pag = xfs_perag_get(mp, agno);
> +	sick = xfs_ag_measure_sickness(pag);
> +	if (sick & XFS_HEALTH_AG_SB)
> +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_SB;

I'm starting to wonder whether "health" is the best term to use for the
interface bits just because it reads a little weird to measure
"sickness" and then apply all the sick state to something called
"health." I don't have a better suggestion off the top of my head,
though. Just something to think about a bit more from an API
standpoint..

Brian

> +	if (sick & XFS_HEALTH_AG_AGF)
> +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGF;
> +	if (sick & XFS_HEALTH_AG_AGFL)
> +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGFL;
> +	if (sick & XFS_HEALTH_AG_AGI)
> +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGI;
> +	if (sick & XFS_HEALTH_AG_BNOBT)
> +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_BNOBT;
> +	if (sick & XFS_HEALTH_AG_CNTBT)
> +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_CNTBT;
> +	if (sick & XFS_HEALTH_AG_INOBT)
> +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_INOBT;
> +	if (sick & XFS_HEALTH_AG_FINOBT)
> +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_FINOBT;
> +	if (sick & XFS_HEALTH_AG_RMAPBT)
> +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_RMAPBT;
> +	if (sick & XFS_HEALTH_AG_REFCNTBT)
> +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_REFCNTBT;
> +	xfs_perag_put(pag);
> +}
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index f9bf11b6a055..f1fc5e53cfc1 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -853,6 +853,8 @@ xfs_ioc_ag_geometry(
>  	if (error)
>  		return error;
>  
> +	xfs_ag_geom_health(mp, ageo.ag_number, &ageo);
> +
>  	if (copy_to_user(arg, &ageo, sizeof(ageo)))
>  		return -EFAULT;
>  	return 0;
> 


* Re: [PATCH 07/10] xfs: report AG health via AG geometry ioctl
  2019-04-03 14:30   ` Brian Foster
@ 2019-04-03 16:11     ` Darrick J. Wong
  2019-04-04 11:48       ` Brian Foster
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-03 16:11 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Wed, Apr 03, 2019 at 10:30:05AM -0400, Brian Foster wrote:
> On Mon, Apr 01, 2019 at 10:10:52AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Use the AG geometry info ioctl to report health status too.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_fs.h     |   12 +++++++++++-
> >  fs/xfs/libxfs/xfs_health.h |    2 ++
> >  fs/xfs/xfs_health.c        |   40 ++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_ioctl.c         |    2 ++
> >  4 files changed, 55 insertions(+), 1 deletion(-)
> > 
> > 
> ...
> > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> > index 151c98693bef..5ca471bd41ad 100644
> > --- a/fs/xfs/xfs_health.c
> > +++ b/fs/xfs/xfs_health.c
> > @@ -276,3 +276,43 @@ xfs_fsop_geom_health(
> >  	if (sick & XFS_HEALTH_RT_SUMMARY)
> >  		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_SUMMARY;
> >  }
> > +
> > +/* Fill out ag geometry health info. */
> > +void
> > +xfs_ag_geom_health(
> > +	struct xfs_mount	*mp,
> > +	xfs_agnumber_t		agno,
> > +	struct xfs_ag_geometry	*ageo)
> > +{
> > +	struct xfs_perag	*pag;
> > +	unsigned int		sick;
> > +
> > +	if (agno >= mp->m_sb.sb_agcount)
> > +		return;
> 
> The call to xfs_ag_get_geometry() would have already returned an error
> in the ioctl path for the above scenario. It might still make sense to
> check here, but perhaps we could let this function also return an int
> and return an error for consistency?

Or maybe just ASSERT on the agno and add a note that the caller is
required to pass in a valid ag number.
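
Roughly like this userspace sketch, where the demo_* structures are
hypothetical, cut-down stand-ins for the real mount and AG geometry types,
assert() stands in for the kernel's ASSERT(), and the bit translation is
elided:

```c
#include <assert.h>

/* Hypothetical, cut-down stand-ins for the mount and AG geometry types. */
struct demo_mount {
	unsigned int		agcount;	/* number of AGs */
	const unsigned int	*ag_sick;	/* per-AG sick bits */
};

struct demo_ag_geometry {
	unsigned int		ag_number;
	unsigned int		ag_health;
};

/*
 * Fill out AG health info.  The caller must pass a valid AG number; the
 * range check is assumed to have happened when the geometry was looked up.
 */
static void
demo_ag_geom_health(
	const struct demo_mount	*mp,
	unsigned int		agno,
	struct demo_ag_geometry	*ageo)
{
	assert(agno < mp->agcount);

	/* Translation between incore and ioctl bits is elided here. */
	ageo->ag_health = mp->ag_sick[agno];
}
```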

> > +
> > +	ageo->ag_health = 0;
> > +
> > +	pag = xfs_perag_get(mp, agno);
> > +	sick = xfs_ag_measure_sickness(pag);
> > +	if (sick & XFS_HEALTH_AG_SB)
> > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_SB;
> 
> I'm starting to wonder whether "health" is the best term to use for the
> interface bits just because it reads a little weird to measure
> "sickness" and then apply all the sick state to something called
> "health." I don't have a better suggestion off the top of my head,
> though. Just something to think about a bit more from an API
> standpoint..

I had the same conundrum.  I guess we could start the bitset with -1 and
clear bits when scrub says they've gone bad?  That would be much clearer
with regards to the names, but technically we don't know the health of a
structure until we scan it, so I wouldn't want to represent the fs as
being "healthy" having not actually looked for problems.

What we /really/ need is a tri-state bitset[1]:

enum Bool
{
    True,
    False,
    FileNotFound
};

But maybe I will try renaming all this to "sick" again.

if (sick & XFS_SICK_AG_AGF)
	ageo->ag_sick |= XFS_AG_GEOM_SICK_AG_AGF;

Gosh.  That second name is gross.  XFS_AG_GEOM_SICK_AGF.

Sick sick sick sick sick.  Ok, I've convinced myself of the name change. :P
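
Joking aside, one way to get the tri-state behaviour is to keep two masks
instead of one.  This is only a sketch of the idea and not what the posted
patches do; every demo_* name here is hypothetical.  A bit that has never been
checked reports "unknown", so a clear sick bit is never mistaken for a clean
bill of health.

```c
#include <assert.h>

/* Hypothetical per-structure state; not taken from the patch series. */
enum demo_state {
	DEMO_UNKNOWN,	/* never scanned, so no claim either way */
	DEMO_HEALTHY,	/* scanned, no problems found */
	DEMO_SICK,	/* scanned, problems found and not yet fixed */
};

struct demo_health {
	unsigned int	checked;	/* bits that scrub has looked at */
	unsigned int	sick;		/* subset of checked that is bad */
};

/* Record that scrub examined these structures and found problems. */
static void
demo_mark_sick(struct demo_health *h, unsigned int mask)
{
	h->checked |= mask;
	h->sick |= mask;
}

/* Record that scrub examined these structures and found them clean. */
static void
demo_mark_healthy(struct demo_health *h, unsigned int mask)
{
	h->checked |= mask;
	h->sick &= ~mask;
}

/* Report the state of a single structure bit. */
static enum demo_state
demo_query(const struct demo_health *h, unsigned int bit)
{
	/* Invariant: nothing can be sick without having been checked. */
	assert((h->sick & ~h->checked) == 0);

	if (!(h->checked & bit))
		return DEMO_UNKNOWN;
	return (h->sick & bit) ? DEMO_SICK : DEMO_HEALTHY;
}
```

The cost is a second word per tracked object, and the userspace interface
would need a way to export both masks.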

--D

[1] https://thedailywtf.com/articles/What_Is_Truth_0x3f_

> Brian
> 
> > +	if (sick & XFS_HEALTH_AG_AGF)
> > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGF;
> > +	if (sick & XFS_HEALTH_AG_AGFL)
> > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGFL;
> > +	if (sick & XFS_HEALTH_AG_AGI)
> > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGI;
> > +	if (sick & XFS_HEALTH_AG_BNOBT)
> > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_BNOBT;
> > +	if (sick & XFS_HEALTH_AG_CNTBT)
> > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_CNTBT;
> > +	if (sick & XFS_HEALTH_AG_INOBT)
> > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_INOBT;
> > +	if (sick & XFS_HEALTH_AG_FINOBT)
> > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_FINOBT;
> > +	if (sick & XFS_HEALTH_AG_RMAPBT)
> > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_RMAPBT;
> > +	if (sick & XFS_HEALTH_AG_REFCNTBT)
> > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_REFCNTBT;
> > +	xfs_perag_put(pag);
> > +}
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index f9bf11b6a055..f1fc5e53cfc1 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -853,6 +853,8 @@ xfs_ioc_ag_geometry(
> >  	if (error)
> >  		return error;
> >  
> > +	xfs_ag_geom_health(mp, ageo.ag_number, &ageo);
> > +
> >  	if (copy_to_user(arg, &ageo, sizeof(ageo)))
> >  		return -EFAULT;
> >  	return 0;
> > 


* Re: [PATCH 07/10] xfs: report AG health via AG geometry ioctl
  2019-04-03 16:11     ` Darrick J. Wong
@ 2019-04-04 11:48       ` Brian Foster
  2019-04-05 20:33         ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Brian Foster @ 2019-04-04 11:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Apr 03, 2019 at 09:11:06AM -0700, Darrick J. Wong wrote:
> On Wed, Apr 03, 2019 at 10:30:05AM -0400, Brian Foster wrote:
> > On Mon, Apr 01, 2019 at 10:10:52AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Use the AG geometry info ioctl to report health status too.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_fs.h     |   12 +++++++++++-
> > >  fs/xfs/libxfs/xfs_health.h |    2 ++
> > >  fs/xfs/xfs_health.c        |   40 ++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_ioctl.c         |    2 ++
> > >  4 files changed, 55 insertions(+), 1 deletion(-)
> > > 
> > > 
> > ...
> > > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> > > index 151c98693bef..5ca471bd41ad 100644
> > > --- a/fs/xfs/xfs_health.c
> > > +++ b/fs/xfs/xfs_health.c
> > > @@ -276,3 +276,43 @@ xfs_fsop_geom_health(
> > >  	if (sick & XFS_HEALTH_RT_SUMMARY)
> > >  		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_SUMMARY;
> > >  }
> > > +
> > > +/* Fill out ag geometry health info. */
> > > +void
> > > +xfs_ag_geom_health(
> > > +	struct xfs_mount	*mp,
> > > +	xfs_agnumber_t		agno,
> > > +	struct xfs_ag_geometry	*ageo)
> > > +{
> > > +	struct xfs_perag	*pag;
> > > +	unsigned int		sick;
> > > +
> > > +	if (agno >= mp->m_sb.sb_agcount)
> > > +		return;
> > 
> > The call to xfs_ag_get_geometry() would have already returned an error
> > in the ioctl path for the above scenario. It might still make sense to
> > check here, but perhaps we could let this function also return an int
> > and return an error for consistency?
> 
> Or maybe just ASSERT on the agno and add a note that the caller is
> required to pass in a valid ag number.
> 
> > > +
> > > +	ageo->ag_health = 0;
> > > +
> > > +	pag = xfs_perag_get(mp, agno);
> > > +	sick = xfs_ag_measure_sickness(pag);
> > > +	if (sick & XFS_HEALTH_AG_SB)
> > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_SB;
> > 
> > I'm starting to wonder whether "health" is the best term to use for the
> > interface bits just because it reads a little weird to measure
> > "sickness" and then apply all the sick state to something called
> > "health." I don't have a better suggestion off the top of my head,
> > though. Just something to think about a bit more from an API
> > standpoint..
> 
> I had the same conundrum.  I guess we could start the bitset with -1 and
> clear bits when scrub says they've gone bad?  That would be much clearer
> with regards to the names, but technically we don't know the health of a
> structure until we scan it, so I wouldn't want to represent the fs as
> being "healthy" having not actually looked for problems.
> 
> What we /really/ need is a tri-state bitset[1]:
> 
> enum Bool
> {
>     True,
>     False,
>     FileNotFound
> };
> 
> But maybe I will try renaming all this to "sick" again.
> 
> if (sick & XFS_SICK_AG_AGF)
> 	ageo->ag_sick |= XFS_AG_GEOM_SICK_AG_AGF;
> 
> Gosh.  That second name is gross.  XFS_AG_GEOM_SICK_AGF.
> 
> Sick sick sick sick sick.  Ok, I've convinced myself of the name change. :P
> 

Heh. I suppose we could either invert the logic or perhaps try to come
up with a better keyword than "health" for the exported bits (at least).
If I see ag_health in a data structure, for example, I'm assuming it's
telling me what is healthy. Of course we'll have documentation and
whatnot to clear that up..

Another term that came to mind is "fault" or "faulted" as it has
precedent in storage contexts wrt RAID. I.e., ag_faults and
XFS_AG_GEOM_FAULT_AGF, etc. etc. To me it also kind of covers the angle
that we aren't necessarily stating a subset of the filesystem is healthy
due to lack of faults if we just haven't scrubbed/found anything. Hm? I
guess it could be confused with reporting underlying storage problems. I
dunno... it's more clear to me, but maybe others have ideas..

Brian

> --D
> 
> [1] https://thedailywtf.com/articles/What_Is_Truth_0x3f_
> 
> > Brian
> > 
> > > +	if (sick & XFS_HEALTH_AG_AGF)
> > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGF;
> > > +	if (sick & XFS_HEALTH_AG_AGFL)
> > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGFL;
> > > +	if (sick & XFS_HEALTH_AG_AGI)
> > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGI;
> > > +	if (sick & XFS_HEALTH_AG_BNOBT)
> > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_BNOBT;
> > > +	if (sick & XFS_HEALTH_AG_CNTBT)
> > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_CNTBT;
> > > +	if (sick & XFS_HEALTH_AG_INOBT)
> > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_INOBT;
> > > +	if (sick & XFS_HEALTH_AG_FINOBT)
> > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_FINOBT;
> > > +	if (sick & XFS_HEALTH_AG_RMAPBT)
> > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_RMAPBT;
> > > +	if (sick & XFS_HEALTH_AG_REFCNTBT)
> > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_REFCNTBT;
> > > +	xfs_perag_put(pag);
> > > +}
> > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > index f9bf11b6a055..f1fc5e53cfc1 100644
> > > --- a/fs/xfs/xfs_ioctl.c
> > > +++ b/fs/xfs/xfs_ioctl.c
> > > @@ -853,6 +853,8 @@ xfs_ioc_ag_geometry(
> > >  	if (error)
> > >  		return error;
> > >  
> > > +	xfs_ag_geom_health(mp, ageo.ag_number, &ageo);
> > > +
> > >  	if (copy_to_user(arg, &ageo, sizeof(ageo)))
> > >  		return -EFAULT;
> > >  	return 0;
> > > 


* Re: [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health
  2019-04-01 17:11 ` [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health Darrick J. Wong
@ 2019-04-04 11:50   ` Brian Foster
  2019-04-04 18:01     ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Brian Foster @ 2019-04-04 11:50 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Apr 01, 2019 at 10:11:12AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Now that we have the ability to track sick metadata in-core, make scrub
> and repair update those health assessments after doing work.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile       |    1 
>  fs/xfs/scrub/health.c |  180 +++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/health.h |   12 +++
>  fs/xfs/scrub/scrub.c  |    8 ++
>  fs/xfs/scrub/scrub.h  |   11 +++
>  5 files changed, 212 insertions(+)
>  create mode 100644 fs/xfs/scrub/health.c
>  create mode 100644 fs/xfs/scrub/health.h
> 
> 
...
> diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
> new file mode 100644
> index 000000000000..dd9986500801
> --- /dev/null
> +++ b/fs/xfs/scrub/health.c
> @@ -0,0 +1,180 @@
...
> +/* Update filesystem health assessments based on what we found and did. */
> +void
> +xchk_update_health(
> +	struct xfs_scrub	*sc,
> +	bool			already_fixed)
> +{
> +	/*
> +	 * If the scrubber finds errors, we mark sick whatever's mentioned in
> +	 * sick_mask, no matter whether this is a first scan or an evaluation
> +	 * of repair effectiveness.
> +	 *
> +	 * If there is no direct corruption and we're called after a repair,
> +	 * clear whatever's in heal_mask because that's what we fixed.
> +	 *
> +	 * Otherwise, there's no direct corruption and we didn't repair
> +	 * anything, so mark whatever's in sick_mask as healthy.
> +	 */
> +	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> +		xchk_mark_sick(sc, sc->sick_mask);
> +	else if (already_fixed)
> +		xchk_mark_healthy(sc, sc->heal_mask);
> +	else
> +		xchk_mark_healthy(sc, sc->sick_mask);
> +}

Hmm, I think I follow what we're doing here but it's a bit confusing
without the additional context of where these bits will be set/cleared
at the lower scrub layers (or at least without an example). Some
questions on that below...
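
For my own understanding, here is how I read the three-way decision in the
quoted function, reduced to a standalone model.  The demo_* names and the
plain "incore_sick" word are hypothetical simplifications; the real code
dispatches to per-fs, per-AG, or per-inode state depending on the scrub type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down scrub context; field names mirror the patch. */
struct demo_scrub {
	bool		corrupt;	/* models XFS_SCRUB_OFLAG_CORRUPT */
	unsigned int	sick_mask;	/* what this scrub type examines */
	unsigned int	heal_mask;	/* what a repair claims to have fixed */
};

/* Apply one scrub result to the incore sick bits; return the new value. */
static unsigned int
demo_update_health(
	unsigned int		incore_sick,
	const struct demo_scrub	*sc,
	bool			already_fixed)
{
	assert(sc != NULL);

	/* Corruption found: mark sick, on a first scan or after repair. */
	if (sc->corrupt)
		return incore_sick | sc->sick_mask;

	/* Clean after a repair: clear everything the repair covered. */
	if (already_fixed)
		return incore_sick & ~sc->heal_mask;

	/* Clean on a plain scan: clear only what this scrub looked at. */
	return incore_sick & ~sc->sick_mask;
}
```

With sick_mask = 0x2 and heal_mask = 0x6, a clean plain scan of incore state
0x6 leaves 0x4 behind, while a clean post-repair scan clears it to 0x0.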

...
> diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> index 1b2344d00525..b1519dfc5811 100644
> --- a/fs/xfs/scrub/scrub.c
> +++ b/fs/xfs/scrub/scrub.c
> @@ -40,6 +40,7 @@
>  #include "scrub/trace.h"
>  #include "scrub/btree.h"
>  #include "scrub/repair.h"
> +#include "scrub/health.h"
>  
>  /*
>   * Online Scrub and Repair
> @@ -468,6 +469,7 @@ xfs_scrub_metadata(
>  {
>  	struct xfs_scrub		sc;
>  	struct xfs_mount		*mp = ip->i_mount;
> +	unsigned int			heal_mask;
>  	bool				try_harder = false;
>  	bool				already_fixed = false;
>  	int				error = 0;
> @@ -488,6 +490,7 @@ xfs_scrub_metadata(
>  	error = xchk_validate_inputs(mp, sm);
>  	if (error)
>  		goto out;
> +	heal_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
>  
>  	xchk_experimental_warning(mp);
>  
> @@ -499,6 +502,8 @@ xfs_scrub_metadata(
>  	sc.ops = &meta_scrub_ops[sm->sm_type];
>  	sc.try_harder = try_harder;
>  	sc.sa.agno = NULLAGNUMBER;
> +	sc.heal_mask = heal_mask;
> +	sc.sick_mask = xchk_health_mask_for_scrub_type(sm->sm_type);

Ok, so we initialize the heal/sick masks based on the scrub type that
was requested on the first pass through...

>  	error = sc.ops->setup(&sc, ip);
>  	if (error)
>  		goto out_teardown;
> @@ -519,6 +524,8 @@ xfs_scrub_metadata(
>  	} else if (error)
>  		goto out_teardown;
>  
> +	xchk_update_health(&sc, already_fixed);
> +

... then update the in-core fs health state based on the sick mask. Is
it possible for the scrub operation to set more sick mask bits based on
what it finds? More specifically, I'm wondering why the masks wouldn't
start as zero and toggle based on finding/fixing corruption(s). Or if
the sick mask value is essentially fixed, whether we need to store it in
the xfs_scrub context...

>  	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed) {
>  		bool needs_fix;
>  
> @@ -551,6 +558,7 @@ xfs_scrub_metadata(
>  				xrep_failure(mp);
>  				goto out;
>  			}
> +			heal_mask = sc.heal_mask;

And if we end up doing a repair, we presumably can repair multiple
things and so we track that separately and persist the heal mask across
a potential retry. What about the case where we don't retry, but scrub
finds something and then immediately repairs it? Should we update the fs
state after both detecting and clearing the problem, or does that happen
elsewhere?

Also, if repair can potentially clear multiple bits, what's the
possibility of a repair clearing one failure and then failing on
another, causing the broader repair op to return an error or jump into
this retry? ISTM that it might be possible to skip clearing one fail
state bit so long as the original thing remained corrupted, but I feel
like I'm still missing some context on the bigger picture scrub
tracking...

Brian

>  			goto retry_op;
>  		}
>  	}
> diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> index 22f754fba8e5..05f1ad242a35 100644
> --- a/fs/xfs/scrub/scrub.h
> +++ b/fs/xfs/scrub/scrub.h
> @@ -62,6 +62,17 @@ struct xfs_scrub {
>  	struct xfs_inode		*ip;
>  	void				*buf;
>  	uint				ilock_flags;
> +
> +	/* Metadata to be marked sick if scrub finds errors. */
> +	unsigned int			sick_mask;
> +
> +	/*
> +	 * Metadata to be marked healthy if repair fixes errors.  Some repair
> +	 * functions can fix multiple data structures at once, so we have to
> +	 * treat sick and heal masks separately.
> +	 */
> +	unsigned int			heal_mask;
> +
>  	bool				try_harder;
>  	bool				has_quotaofflock;
>  
> 


* Re: [PATCH 10/10] xfs: update health status if we get a clean bill of health
  2019-04-01 17:11 ` [PATCH 10/10] xfs: update health status if we get a clean bill of health Darrick J. Wong
@ 2019-04-04 11:51   ` Brian Foster
  2019-04-04 15:48     ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Brian Foster @ 2019-04-04 11:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Apr 01, 2019 at 10:11:18AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> If scrub finds that everything is ok with the filesystem, we need a way
> to tell the health tracking that it can let go of indirect health flags,
> since indirect flags only mean that at some point in the past we lost
> some context.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

FYI this one doesn't compile for me:

...
fs/xfs/scrub/common.c: In function ‘xchk_set_corrupt’:
fs/xfs/scrub/common.c:217:2: error: implicit declaration of function ‘xfs_scrub_whine’; did you mean ‘xfs_bmapi_write’? [-Werror=implicit-function-declaration]
  xfs_scrub_whine(sc->mp, "type %d ret_ip %pS",
...

>  fs/xfs/libxfs/xfs_fs.h |    3 ++
>  fs/xfs/scrub/common.c  |   12 ++++++++++
>  fs/xfs/scrub/common.h  |    1 +
>  fs/xfs/scrub/health.c  |   58 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/health.h  |    1 +
>  fs/xfs/scrub/repair.c  |    1 +
>  fs/xfs/scrub/scrub.c   |    6 +++++
>  fs/xfs/scrub/trace.h   |    4 ++-
>  8 files changed, 84 insertions(+), 2 deletions(-)
> 
> 
...
> diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
> index dd9986500801..049e802b9418 100644
> --- a/fs/xfs/scrub/health.c
> +++ b/fs/xfs/scrub/health.c
...
> @@ -54,6 +55,60 @@ xchk_health_mask_for_scrub_type(
...
> +/*
> + * Scrub gave the filesystem a clean bill of health, so clear all the indirect
> + * markers of past problems (at least for the fs and ags) so that we can be
> + * healthy again.
> + */
> +STATIC void
> +xchk_mark_all_healthy(
> +	struct xfs_mount	*mp)
> +{
> +	struct xfs_perag	*pag;
> +	xfs_agnumber_t		agno;
> +	int			error = 0;
> +
> +	xfs_fs_mark_healthy(mp, XFS_HEALTH_FS_INDIRECT);
> +	xfs_rt_mark_healthy(mp, XFS_HEALTH_RT_INDIRECT);
> +	for (agno = 0; error == 0 && agno < mp->m_sb.sb_agcount; agno++) {
> +		pag = xfs_perag_get(mp, agno);
> +		xfs_ag_mark_healthy(pag, XFS_HEALTH_AG_INDIRECT);
> +		xfs_perag_put(pag);
> +	}
> +}
>  /* Mark metadata unhealthy. */
>  static void
>  xchk_mark_sick(
> @@ -149,6 +204,9 @@ xchk_mark_healthy(
>  	case XFS_SCRUB_TYPE_RTSUM:
>  		xfs_rt_mark_healthy(sc->mp, mask);
>  		break;
> +	case XFS_SCRUB_TYPE_HEALTHY:
> +		xchk_mark_all_healthy(sc->mp);
> +		break;

Should this scrub type have a corresponding health flag? It kind of
looks like it being zeroed could prevent us from getting here because of
the 'if (!mask)' check in xchk_update_health(), but it's a bit twisty
from there to here.. :P

Brian

>  	default:
>  		break;
>  	}
> diff --git a/fs/xfs/scrub/health.h b/fs/xfs/scrub/health.h
> index e795f4c9a23c..001e5a93273d 100644
> --- a/fs/xfs/scrub/health.h
> +++ b/fs/xfs/scrub/health.h
> @@ -8,5 +8,6 @@
>  
>  unsigned int xchk_health_mask_for_scrub_type(__u32 scrub_type);
>  void xchk_update_health(struct xfs_scrub *sc, bool already_fixed);
> +int xchk_health_record(struct xfs_scrub *sc);
>  
>  #endif /* __XFS_SCRUB_HEALTH_H__ */
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index f28f4bad317b..5df67fe5d8ac 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -31,6 +31,7 @@
>  #include "xfs_quota.h"
>  #include "xfs_attr.h"
>  #include "xfs_reflink.h"
> +#include "xfs_health.h"
>  #include "scrub/xfs_scrub.h"
>  #include "scrub/scrub.h"
>  #include "scrub/common.h"
> diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> index b1519dfc5811..f446ab57d7b0 100644
> --- a/fs/xfs/scrub/scrub.c
> +++ b/fs/xfs/scrub/scrub.c
> @@ -348,6 +348,12 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
>  		.scrub	= xchk_quota,
>  		.repair	= xrep_notsupported,
>  	},
> +	[XFS_SCRUB_TYPE_HEALTHY] = {	/* fs healthy; clean all reminders */
> +		.type	= ST_FS,
> +		.setup	= xchk_setup_fs,
> +		.scrub	= xchk_health_record,
> +		.repair = xrep_notsupported,
> +	},
>  };
>  
>  /* This isn't a stable feature, warn once per day. */
> diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> index 3c83e8b3b39c..7c25a38c6f81 100644
> --- a/fs/xfs/scrub/trace.h
> +++ b/fs/xfs/scrub/trace.h
> @@ -75,7 +75,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_PQUOTA);
>  	{ XFS_SCRUB_TYPE_RTSUM,		"rtsummary" }, \
>  	{ XFS_SCRUB_TYPE_UQUOTA,	"usrquota" }, \
>  	{ XFS_SCRUB_TYPE_GQUOTA,	"grpquota" }, \
> -	{ XFS_SCRUB_TYPE_PQUOTA,	"prjquota" }
> +	{ XFS_SCRUB_TYPE_PQUOTA,	"prjquota" }, \
> +	{ XFS_SCRUB_TYPE_HEALTHY,	"healthy" }
>  
>  DECLARE_EVENT_CLASS(xchk_class,
>  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
> @@ -223,6 +224,7 @@ DEFINE_EVENT(xchk_block_error_class, name, \
>  		 void *ret_ip), \
>  	TP_ARGS(sc, daddr, ret_ip))
>  
> +DEFINE_SCRUB_BLOCK_ERROR_EVENT(xchk_fs_error);
>  DEFINE_SCRUB_BLOCK_ERROR_EVENT(xchk_block_error);
>  DEFINE_SCRUB_BLOCK_ERROR_EVENT(xchk_block_preen);
>  
> 


* Re: [PATCH 10/10] xfs: update health status if we get a clean bill of health
  2019-04-04 11:51   ` Brian Foster
@ 2019-04-04 15:48     ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-04 15:48 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Apr 04, 2019 at 07:51:43AM -0400, Brian Foster wrote:
> On Mon, Apr 01, 2019 at 10:11:18AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > If scrub finds that everything is ok with the filesystem, we need a way
> > to tell the health tracking that it can let go of indirect health flags,
> > since indirect flags only mean that at some point in the past we lost
> > some context.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> FYI this one doesn't compile for me:
> 
> ...
> fs/xfs/scrub/common.c: In function ‘xchk_set_corrupt’:
> fs/xfs/scrub/common.c:217:2: error: implicit declaration of function ‘xfs_scrub_whine’; did you mean ‘xfs_bmapi_write’? [-Werror=implicit-function-declaration]
>   xfs_scrub_whine(sc->mp, "type %d ret_ip %pS",

Uhh... April Fools! :)

> ...
> 
> >  fs/xfs/libxfs/xfs_fs.h |    3 ++
> >  fs/xfs/scrub/common.c  |   12 ++++++++++
> >  fs/xfs/scrub/common.h  |    1 +
> >  fs/xfs/scrub/health.c  |   58 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/health.h  |    1 +
> >  fs/xfs/scrub/repair.c  |    1 +
> >  fs/xfs/scrub/scrub.c   |    6 +++++
> >  fs/xfs/scrub/trace.h   |    4 ++-
> >  8 files changed, 84 insertions(+), 2 deletions(-)
> > 
> > 
> ...
> > diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
> > index dd9986500801..049e802b9418 100644
> > --- a/fs/xfs/scrub/health.c
> > +++ b/fs/xfs/scrub/health.c
> ...
> > @@ -54,6 +55,60 @@ xchk_health_mask_for_scrub_type(
> ...
> > +/*
> > + * Scrub gave the filesystem a clean bill of health, so clear all the indirect
> > + * markers of past problems (at least for the fs and ags) so that we can be
> > + * healthy again.
> > + */
> > +STATIC void
> > +xchk_mark_all_healthy(
> > +	struct xfs_mount	*mp)
> > +{
> > +	struct xfs_perag	*pag;
> > +	xfs_agnumber_t		agno;
> > +	int			error = 0;
> > +
> > +	xfs_fs_mark_healthy(mp, XFS_HEALTH_FS_INDIRECT);
> > +	xfs_rt_mark_healthy(mp, XFS_HEALTH_RT_INDIRECT);
> > +	for (agno = 0; error == 0 && agno < mp->m_sb.sb_agcount; agno++) {
> > +		pag = xfs_perag_get(mp, agno);
> > +		xfs_ag_mark_healthy(pag, XFS_HEALTH_AG_INDIRECT);
> > +		xfs_perag_put(pag);
> > +	}
> > +}
> >  /* Mark metadata unhealthy. */
> >  static void
> >  xchk_mark_sick(
> > @@ -149,6 +204,9 @@ xchk_mark_healthy(
> >  	case XFS_SCRUB_TYPE_RTSUM:
> >  		xfs_rt_mark_healthy(sc->mp, mask);
> >  		break;
> > +	case XFS_SCRUB_TYPE_HEALTHY:
> > +		xchk_mark_all_healthy(sc->mp);
> > +		break;
> 
> Should this scrub type have a corresponding health flag? It kind of
> looks like it being zeroed could prevent us from getting here because of
> the 'if (!mask)' check in xchk_update_health(), but it's a bit twisty
> from there to here.. :P

Oops, no, that's just a bug. :(

SCRUB_TYPE_HEALTHY is a convenience method for xfs_scrub to ask the
kernel to clear all the indirect sick flags if it saw no errors and
nothing else got marked sick since xfs_scrub started running.  It's not
collecting primary evidence, so there's no health flag associated with
it and *_mask should be zero.
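
To make the failure concrete, here is a rough userspace sketch (not kernel
code) of how a zero mask can keep the HEALTHY case from ever running, and
how checking the scrub type first avoids that.  The toy_* names, the enum
values, and the single "indirect" boolean are all made up for illustration,
and the sketch puts the zero-mask check in the same function as the switch
purely to keep it short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for scrub types; the values are arbitrary. */
enum toy_scrub_type {
	TOY_SCRUB_TYPE_INOBT,
	TOY_SCRUB_TYPE_HEALTHY,
};

#define TOY_SICK_INOBT		(1U << 0)

/* Toy filesystem health state; not the kernel's layout. */
struct toy_fs {
	unsigned int	sick;		/* direct evidence of corruption */
	bool		indirect;	/* reminder that something was sick once */
};

/* Order as posted: a zero mask returns before HEALTHY is dispatched. */
static void
toy_mark_healthy_as_posted(
	struct toy_fs		*fs,
	enum toy_scrub_type	type,
	unsigned int		mask)
{
	assert(fs);
	if (!mask)
		return;

	switch (type) {
	case TOY_SCRUB_TYPE_INOBT:
		fs->sick &= ~mask;
		break;
	case TOY_SCRUB_TYPE_HEALTHY:
		fs->indirect = false;
		break;
	}
}

/* Fixed order: handle HEALTHY first, then reject empty masks. */
static void
toy_mark_healthy_fixed(
	struct toy_fs		*fs,
	enum toy_scrub_type	type,
	unsigned int		mask)
{
	assert(fs);
	if (type == TOY_SCRUB_TYPE_HEALTHY) {
		fs->indirect = false;
		return;
	}

	if (!mask)
		return;

	switch (type) {
	case TOY_SCRUB_TYPE_INOBT:
		fs->sick &= ~mask;
		break;
	default:
		break;
	}
}
```

With a zero mask and TOY_SCRUB_TYPE_HEALTHY, the first variant leaves the
indirect reminder set and the second one clears it.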

The top of xchk_mark_healthy should have been:

if (sc->sm->sm_type == XFS_SCRUB_TYPE_HEALTHY) {
	xchk_mark_all_healthy(sc->mp);
	return;
}

if (!mask)
	return;

switch (sc->sm->sm_type) {
<etc>

--D

> Brian
> 
> >  	default:
> >  		break;
> >  	}
> > diff --git a/fs/xfs/scrub/health.h b/fs/xfs/scrub/health.h
> > index e795f4c9a23c..001e5a93273d 100644
> > --- a/fs/xfs/scrub/health.h
> > +++ b/fs/xfs/scrub/health.h
> > @@ -8,5 +8,6 @@
> >  
> >  unsigned int xchk_health_mask_for_scrub_type(__u32 scrub_type);
> >  void xchk_update_health(struct xfs_scrub *sc, bool already_fixed);
> > +int xchk_health_record(struct xfs_scrub *sc);
> >  
> >  #endif /* __XFS_SCRUB_HEALTH_H__ */
> > diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> > index f28f4bad317b..5df67fe5d8ac 100644
> > --- a/fs/xfs/scrub/repair.c
> > +++ b/fs/xfs/scrub/repair.c
> > @@ -31,6 +31,7 @@
> >  #include "xfs_quota.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_reflink.h"
> > +#include "xfs_health.h"
> >  #include "scrub/xfs_scrub.h"
> >  #include "scrub/scrub.h"
> >  #include "scrub/common.h"
> > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > index b1519dfc5811..f446ab57d7b0 100644
> > --- a/fs/xfs/scrub/scrub.c
> > +++ b/fs/xfs/scrub/scrub.c
> > @@ -348,6 +348,12 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
> >  		.scrub	= xchk_quota,
> >  		.repair	= xrep_notsupported,
> >  	},
> > +	[XFS_SCRUB_TYPE_HEALTHY] = {	/* fs healthy; clean all reminders */
> > +		.type	= ST_FS,
> > +		.setup	= xchk_setup_fs,
> > +		.scrub	= xchk_health_record,
> > +		.repair = xrep_notsupported,
> > +	},
> >  };
> >  
> >  /* This isn't a stable feature, warn once per day. */
> > diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> > index 3c83e8b3b39c..7c25a38c6f81 100644
> > --- a/fs/xfs/scrub/trace.h
> > +++ b/fs/xfs/scrub/trace.h
> > @@ -75,7 +75,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_PQUOTA);
> >  	{ XFS_SCRUB_TYPE_RTSUM,		"rtsummary" }, \
> >  	{ XFS_SCRUB_TYPE_UQUOTA,	"usrquota" }, \
> >  	{ XFS_SCRUB_TYPE_GQUOTA,	"grpquota" }, \
> > -	{ XFS_SCRUB_TYPE_PQUOTA,	"prjquota" }
> > +	{ XFS_SCRUB_TYPE_PQUOTA,	"prjquota" }, \
> > +	{ XFS_SCRUB_TYPE_HEALTHY,	"healthy" }
> >  
> >  DECLARE_EVENT_CLASS(xchk_class,
> >  	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
> > @@ -223,6 +224,7 @@ DEFINE_EVENT(xchk_block_error_class, name, \
> >  		 void *ret_ip), \
> >  	TP_ARGS(sc, daddr, ret_ip))
> >  
> > +DEFINE_SCRUB_BLOCK_ERROR_EVENT(xchk_fs_error);
> >  DEFINE_SCRUB_BLOCK_ERROR_EVENT(xchk_block_error);
> >  DEFINE_SCRUB_BLOCK_ERROR_EVENT(xchk_block_preen);
> >  
> > 


* Re: [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health
  2019-04-04 11:50   ` Brian Foster
@ 2019-04-04 18:01     ` Darrick J. Wong
  2019-04-05 13:07       ` Brian Foster
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-04 18:01 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Apr 04, 2019 at 07:50:11AM -0400, Brian Foster wrote:
> On Mon, Apr 01, 2019 at 10:11:12AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Now that we have the ability to track sick metadata in-core, make scrub
> > and repair update those health assessments after doing work.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/Makefile       |    1 
> >  fs/xfs/scrub/health.c |  180 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/health.h |   12 +++
> >  fs/xfs/scrub/scrub.c  |    8 ++
> >  fs/xfs/scrub/scrub.h  |   11 +++
> >  5 files changed, 212 insertions(+)
> >  create mode 100644 fs/xfs/scrub/health.c
> >  create mode 100644 fs/xfs/scrub/health.h
> > 
> > 
> ...
> > diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
> > new file mode 100644
> > index 000000000000..dd9986500801
> > --- /dev/null
> > +++ b/fs/xfs/scrub/health.c
> > @@ -0,0 +1,180 @@
> ...
> > +/* Update filesystem health assessments based on what we found and did. */
> > +void
> > +xchk_update_health(
> > +	struct xfs_scrub	*sc,
> > +	bool			already_fixed)
> > +{
> > +	/*
> > +	 * If the scrubber finds errors, we mark sick whatever's mentioned in
> > +	 * sick_mask, no matter whether this is a first scan or an evaluation
> > +	 * of repair effectiveness.
> > +	 *
> > +	 * If there is no direct corruption and we're called after a repair,
> > +	 * clear whatever's in heal_mask because that's what we fixed.
> > +	 *
> > +	 * Otherwise, there's no direct corruption and we didn't repair
> > +	 * anything, so mark whatever's in sick_mask as healthy.
> > +	 */
> > +	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > +		xchk_mark_sick(sc, sc->sick_mask);
> > +	else if (already_fixed)
> > +		xchk_mark_healthy(sc, sc->heal_mask);
> > +	else
> > +		xchk_mark_healthy(sc, sc->sick_mask);
> > +}
> 
> Hmm, I think I follow what we're doing here but it's a bit confusing
> without the additional context of where these bits will be set/cleared
> at the lower scrub layers (or at least without an example). Some
> questions on that below...
> 
> ...
> > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > index 1b2344d00525..b1519dfc5811 100644
> > --- a/fs/xfs/scrub/scrub.c
> > +++ b/fs/xfs/scrub/scrub.c
> > @@ -40,6 +40,7 @@
> >  #include "scrub/trace.h"
> >  #include "scrub/btree.h"
> >  #include "scrub/repair.h"
> > +#include "scrub/health.h"
> >  
> >  /*
> >   * Online Scrub and Repair
> > @@ -468,6 +469,7 @@ xfs_scrub_metadata(
> >  {
> >  	struct xfs_scrub		sc;
> >  	struct xfs_mount		*mp = ip->i_mount;
> > +	unsigned int			heal_mask;
> >  	bool				try_harder = false;
> >  	bool				already_fixed = false;
> >  	int				error = 0;
> > @@ -488,6 +490,7 @@ xfs_scrub_metadata(
> >  	error = xchk_validate_inputs(mp, sm);
> >  	if (error)
> >  		goto out;
> > +	heal_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
> >  
> >  	xchk_experimental_warning(mp);
> >  
> > @@ -499,6 +502,8 @@ xfs_scrub_metadata(
> >  	sc.ops = &meta_scrub_ops[sm->sm_type];
> >  	sc.try_harder = try_harder;
> >  	sc.sa.agno = NULLAGNUMBER;
> > +	sc.heal_mask = heal_mask;
> > +	sc.sick_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
> 
> Ok, so we initialize the heal/sick masks based on the scrub type that
> was requested on the first pass through...
> 
> >  	error = sc.ops->setup(&sc, ip);
> >  	if (error)
> >  		goto out_teardown;
> > @@ -519,6 +524,8 @@ xfs_scrub_metadata(
> >  	} else if (error)
> >  		goto out_teardown;
> >  
> > +	xchk_update_health(&sc, already_fixed);
> > +
> 
> ... then update the in-core fs health state based on the sick mask. Is
> it possible for the scrub operation to set more sick mask bits based on
> what it finds?

Theoretically, yes, but in practice none of the current scrubbers need
to touch sick_mask.

heal_mask, OTOH, will be adjusted by the free space / inode repair
functions since they rebuild multiple structures.

> More specifically, I'm wondering why the masks wouldn't start as zero
> and toggle based on finding/fixing corruption(s).

sick_mask is also the mask we feed to xfs_*_mark_healthy if the scan
returns clean, which is why we set the default value before dispatching
the scrub.

> Or if the sick mask value is essentially fixed, whether we need to
> store it in the xfs_scrub context...

We could probably get away with generating it in xchk_update_health at
the end, but it feels weird to have heal_mask in the scrub context but
sick_mask gets auto-generated.

> 
> >  	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed) {
> >  		bool needs_fix;
> >  
> > @@ -551,6 +558,7 @@ xfs_scrub_metadata(
> >  				xrep_failure(mp);
> >  				goto out;
> >  			}
> > +			heal_mask = sc.heal_mask;
> 
> And if we end up doing a repair, we presumably can repair multiple
> things and so we track that separately and persist the heal mask across
> a potential retry.

Right.

> What about the case where we don't retry, but scrub finds something
> and then immediately repairs it?

The repair jumps back to retry_op if either (a) we couldn't get all the
resources we needed and therefore sc.try_harder = true and we need to
start over; or (b) repair thinks it fixed a thing, so we need to scrub
the thing again to see if it's really fixed...

> Should we update the fs state after both detecting and clearing the
> problem, or does that happen elsewhere?

...so if scrub immediately repairs a thing, we preserve heal_mask, jump
back to the scrub, and if the scrub says clean we'll mark everything in
heal_mask healthy.

If the repair has to retry then we'll call the repair function
again, which (presumably) will set (again) the heal_mask appropriately,
and then we have the same post-repair state updating as above.

Does that make sense? :)

> Also, if repair can potentially clear multiple bits, what's the
> possibility of a repair clearing one failure and then failing on
> another, causing the broader repair op to return an error or jump into
> this retry?

Scrub doesn't touch the fs health state at all until after the ->scrub
or ->repair function succeeds.  If the scrub or the repair functions
fail for any non-retry reason, we back out to userspace without updating
anything.  It's as if we'd never called the failed function.

Maybe some worked examples will help?

Let's say both inode btrees are corrupt.  We run xfs_scrub -n,
xchk_inobt will record the corruption, and (assuming it hits no runtime
errors) once we return to xfs_scrub_metadata, it'll set
XFS_SICK_AG_INOBT.  Presumably xfs_scrub will also call the finobt scrub
and SICK_AG_FINOBT will also get set.

If we run xfs_scrub without the -n, xchk_inobt will record the
corruption and set SICK_AG_INOBT per above.  Then it'll run xrep_inobt,
which will set heal_mask to SICK_AG_INOBT | SICK_AG_FINOBT.  If the
repair fails with a non-retry runtime error, we exit to userspace and
ignore heal_mask.

If instead the repair succeeds, we scan the inobt again.  If that comes
up clear then we use heal_mask to clear SICK_AG_INOBT | SICK_AG_FINOBT.
xfs_scrub will call again later to repair the finobt, but the initial
finobt scan will see no errors in the finobt, clear SICK_AG_FINOBT
(which isn't set) and exit.

If the inobt repair function is buggy and says it repaired the inode
btrees but leaves corruptions, then the rescan of the inobt will notice
and set SICK_AG_INOBT (which is already set) and exit.  Similarly, when
xfs_scrub calls back about the finobt, it will notice the corrupt
finobt, try to set SICK_AG_FINOBT (also already set), try to fix it, and
the rescan of the finobt will notice that the finobt is still corrupt
and try to set SICK_AG_FINOBT (which is still set).

The end result (I think) is that we always set the sick bits if a scan
shows problems, and we only clear the sick bits for things if we can
prove that the things are no longer sick.  Does that help?
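
The worked examples above can be replayed with a small userspace model.
This is only a sketch: the toy_* names are invented, "corrupt" stands in for
XFS_SCRUB_OFLAG_CORRUPT, and a single unsigned int stands in for the per-AG
sick state.  The three-way branch is meant to mirror xchk_update_health()
as posted in this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SICK_INOBT		(1U << 0)
#define TOY_SICK_FINOBT		(1U << 1)

/* Toy scrub context carrying only the fields this example needs. */
struct toy_scrub {
	unsigned int	sick_mask;	/* what to mark sick on corruption */
	unsigned int	heal_mask;	/* what a repair claims to have fixed */
	bool		corrupt;	/* did the scan report corruption? */
};

/*
 * Update the toy AG sick state after a scan that completed without a
 * runtime error.  A scan that errors out never reaches this function,
 * which models "as if we'd never called the failed function".
 */
static void
toy_update_health(
	unsigned int		*ag_sick,
	const struct toy_scrub	*sc,
	bool			already_fixed)
{
	assert(ag_sick && sc);

	if (sc->corrupt)
		*ag_sick |= sc->sick_mask;	/* scan found problems */
	else if (already_fixed)
		*ag_sick &= ~sc->heal_mask;	/* clean rescan after repair */
	else
		*ag_sick &= ~sc->sick_mask;	/* clean first scan */
}
```

For the inobt case, a repair that rebuilds both btrees would widen
heal_mask to TOY_SICK_INOBT | TOY_SICK_FINOBT before the rescan, so a clean
rescan clears both bits while a still-corrupt rescan leaves them set.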

> ISTM that it might be possible to skip clearing one fail state bit so
> long as the original thing remained corrupted, but I feel like I'm
> still missing some context on the bigger picture scrub tracking...

Yeah, the state machine is pretty squirrely. :/

--D

> Brian
> 
> >  			goto retry_op;
> >  		}
> >  	}
> > diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> > index 22f754fba8e5..05f1ad242a35 100644
> > --- a/fs/xfs/scrub/scrub.h
> > +++ b/fs/xfs/scrub/scrub.h
> > @@ -62,6 +62,17 @@ struct xfs_scrub {
> >  	struct xfs_inode		*ip;
> >  	void				*buf;
> >  	uint				ilock_flags;
> > +
> > +	/* Metadata to be marked sick if scrub finds errors. */
> > +	unsigned int			sick_mask;
> > +
> > +	/*
> > +	 * Metadata to be marked healthy if repair fixes errors.  Some repair
> > +	 * functions can fix multiple data structures at once, so we have to
> > +	 * treat sick and heal masks separately.
> > +	 */
> > +	unsigned int			heal_mask;
> > +
> >  	bool				try_harder;
> >  	bool				has_quotaofflock;
> >  
> > 


* Re: [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health
  2019-04-04 18:01     ` Darrick J. Wong
@ 2019-04-05 13:07       ` Brian Foster
  2019-04-05 20:54         ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Brian Foster @ 2019-04-05 13:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Apr 04, 2019 at 11:01:33AM -0700, Darrick J. Wong wrote:
> On Thu, Apr 04, 2019 at 07:50:11AM -0400, Brian Foster wrote:
> > On Mon, Apr 01, 2019 at 10:11:12AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Now that we have the ability to track sick metadata in-core, make scrub
> > > and repair update those health assessments after doing work.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/Makefile       |    1 
> > >  fs/xfs/scrub/health.c |  180 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/scrub/health.h |   12 +++
> > >  fs/xfs/scrub/scrub.c  |    8 ++
> > >  fs/xfs/scrub/scrub.h  |   11 +++
> > >  5 files changed, 212 insertions(+)
> > >  create mode 100644 fs/xfs/scrub/health.c
> > >  create mode 100644 fs/xfs/scrub/health.h
> > > 
> > > 
> > ...
> > > diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
> > > new file mode 100644
> > > index 000000000000..dd9986500801
> > > --- /dev/null
> > > +++ b/fs/xfs/scrub/health.c
> > > @@ -0,0 +1,180 @@
> > ...
> > > +/* Update filesystem health assessments based on what we found and did. */
> > > +void
> > > +xchk_update_health(
> > > +	struct xfs_scrub	*sc,
> > > +	bool			already_fixed)
> > > +{
> > > +	/*
> > > +	 * If the scrubber finds errors, we mark sick whatever's mentioned in
> > > +	 * sick_mask, no matter whether this is a first scan or an evaluation
> > > +	 * of repair effectiveness.
> > > +	 *
> > > +	 * If there is no direct corruption and we're called after a repair,
> > > +	 * clear whatever's in heal_mask because that's what we fixed.
> > > +	 *
> > > +	 * Otherwise, there's no direct corruption and we didn't repair
> > > +	 * anything, so mark whatever's in sick_mask as healthy.
> > > +	 */
> > > +	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > > +		xchk_mark_sick(sc, sc->sick_mask);
> > > +	else if (already_fixed)
> > > +		xchk_mark_healthy(sc, sc->heal_mask);
> > > +	else
> > > +		xchk_mark_healthy(sc, sc->sick_mask);
> > > +}
> > 
> > Hmm, I think I follow what we're doing here but it's a bit confusing
> > without the additional context of where these bits will be set/cleared
> > at the lower scrub layers (or at least without an example). Some
> > questions on that below...
> > 
> > ...
> > > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > > index 1b2344d00525..b1519dfc5811 100644
> > > --- a/fs/xfs/scrub/scrub.c
> > > +++ b/fs/xfs/scrub/scrub.c
> > > @@ -40,6 +40,7 @@
> > >  #include "scrub/trace.h"
> > >  #include "scrub/btree.h"
> > >  #include "scrub/repair.h"
> > > +#include "scrub/health.h"
> > >  
> > >  /*
> > >   * Online Scrub and Repair
> > > @@ -468,6 +469,7 @@ xfs_scrub_metadata(
> > >  {
> > >  	struct xfs_scrub		sc;
> > >  	struct xfs_mount		*mp = ip->i_mount;
> > > +	unsigned int			heal_mask;
> > >  	bool				try_harder = false;
> > >  	bool				already_fixed = false;
> > >  	int				error = 0;
> > > @@ -488,6 +490,7 @@ xfs_scrub_metadata(
> > >  	error = xchk_validate_inputs(mp, sm);
> > >  	if (error)
> > >  		goto out;
> > > +	heal_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
> > >  
> > >  	xchk_experimental_warning(mp);
> > >  
> > > @@ -499,6 +502,8 @@ xfs_scrub_metadata(
> > >  	sc.ops = &meta_scrub_ops[sm->sm_type];
> > >  	sc.try_harder = try_harder;
> > >  	sc.sa.agno = NULLAGNUMBER;
> > > +	sc.heal_mask = heal_mask;
> > > +	sc.sick_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
> > 
> > Ok, so we initialize the heal/sick masks based on the scrub type that
> > was requested on the first pass through...
> > 
> > >  	error = sc.ops->setup(&sc, ip);
> > >  	if (error)
> > >  		goto out_teardown;
> > > @@ -519,6 +524,8 @@ xfs_scrub_metadata(
> > >  	} else if (error)
> > >  		goto out_teardown;
> > >  
> > > +	xchk_update_health(&sc, already_fixed);
> > > +
> > 
> > ... then update the in-core fs health state based on the sick mask. Is
> > it possible for the scrub operation to set more sick mask bits based on
> > what it finds?
> 
> Theoretically, yes, but in practice none of the current scrubbers need
> to touch sick_mask.
> 
> heal_mask, OTOH, will be adjusted by the free space / inode repair
> functions since they rebuild multiple structures.
> 

Ok..

> > More specifically, I'm wondering why the masks wouldn't start as zero
> > and toggle based on finding/fixing corruption(s).
> 
> sick_mask is also the mask we feed to xfs_*_mark_healthy if the scan
> returns clean, which is why we set the default value before dispatching
> the scrub.
> 
> > Or if the sick mask value is essentially fixed, whether we need to
> > store it in the xfs_scrub context...
> 
> We could probably get away with generating it in xchk_update_health at
> the end, but it feels weird to have heal_mask in the scrub context but
> sick_mask gets auto-generated.
> 

Ok.. hmm. Both feel a little weird to me, but this is really just an
aesthetic/factoring thing so I'll think about it a bit more and come
back to it.

> > 
> > >  	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed) {
> > >  		bool needs_fix;
> > >  
> > > @@ -551,6 +558,7 @@ xfs_scrub_metadata(
> > >  				xrep_failure(mp);
> > >  				goto out;
> > >  			}
> > > +			heal_mask = sc.heal_mask;
> > 
> > And if we end up doing a repair, we presumably can repair multiple
> > things and so we track that separately and persist the heal mask across
> > a potential retry.
> 
> Right.
> 
> > What about the case where we don't retry, but scrub finds something
> > and then immediately repairs it?
> 
> The repair jumps back to retry_op if either (a) we couldn't get all the
> resources we needed and therefore sc.try_harder = true and we need to
> start over; or (b) repair thinks it fixed a thing, so we need to scrub
> the thing again to see if it's really fixed...
> 
> > Should we update the fs state after both detecting and clearing the
> > problem, or does that happen elsewhere?
> 
> ...so if scrub immediately repairs a thing, we preserve heal_mask, jump
> back to the scrub, and if the scrub says clean we'll mark everything in
> heal_mask healthy.
> 
> If the repair has to retry then we'll call the repair function
> again, which (presumably) will set (again) the heal_mask appropriately,
> and then we have the same post-repair state updating as above.
> 
> Does that make sense? :)
> 

Ah, Ok. I didn't realize that a successful repair looped back to the
scrub code (and thus the health update). Yes, that makes more sense.

> > Also, if repair can potentially clear multiple bits, what's the
> > possibility of a repair clearing one failure and then failing on
> > another, causing the broader repair op to return an error or jump into
> > this retry?
> 
> Scrub doesn't touch the fs health state at all until after the ->scrub
> or ->repair function succeeds.  If the scrub or the repair functions
> fail for any non-retry reason, we back out to userspace without updating
> anything.  It's as if we'd never called the failed function.
> 

Right.. what I was getting at above is seeing whether we'd actually
update partial repair state in-core. E.g., suppose things A and B are
faulted in-core and it's one of these cases where repair can fix A and B
at the same time. If it fixes thing A and fails on thing B, it sounds
like we'd not clear the in-core fault state on A even though it's
technically repaired.

> Maybe some worked examples will help?
> 
> Let's say both inode btrees are corrupt.  We run xfs_scrub -n,
> xchk_inobt will record the corruption, and (assuming it hits no runtime
> errors) once we return to xfs_scrub_metadata, it'll set
> XFS_SICK_AG_INOBT.  Presumably xfs_scrub will also call the finobt scrub
> and SICK_AG_FINOBT will also get set.
> 
> If we run xfs_scrub without the -n, xchk_inobt will record the
> corruption and set SICK_AG_INOBT per above.  Then it'll run xrep_inobt,
> which will set heal_mask to SICK_AG_INOBT | SICK_AG_FINOBT.  If the
> repair fails with a non-retry runtime error, we exit to userspace and
> ignore heal_mask.
> 

Ok, this sounds like the case I'm theorizing about above (where suppose
repair fixed the inobt and then failed on the finobt, but hasn't cleared
faults for either..).

> If instead the repair succeeds, we scan the inobt again.  If that comes
> up clear then we use heal_mask to clear SICK_AG_INOBT | SICK_AG_FINOBT.
> xfs_scrub will call again later to repair the finobt, but the initial
> finobt scan will see no errors in the finobt, clear SICK_AG_FINOBT
> (which isn't set) and exit.
> 

So it sounds like the state would have to be cleared by a subsequent
scrub request. The scan would find thing A healthy and mark it so
regardless, to clear any potential previous faults that might have
already been repaired. Right?

> If the inobt repair function is buggy and says it repaired the inode
> btrees but leaves corruptions, then the rescan of the inobt will notice
> and set SICK_AG_INOBT (which is already set) and exit.  Similarly, when
> xfs_scrub calls back about the finobt, it will notice the corrupt
> finobt, try to set SICK_AG_FINOBT (also already set), try to fix it, and
> the rescan of the finobt will notice that the finobt is still corrupt
> and try to set SICK_AG_FINOBT (which is still set).
> 
> The end result (I think) is that we always set the sick bits if a scan
> shows problems, and we only clear the sick bits for things if we can
> prove that the things are no longer sick.  Does that help?
> 

Yes, thanks for the explanation. I think the confusion is mostly due to
not being able to fully see how these scrub states are managed,
particularly the bits that warranted the creation of separate masks in
the first place.

This does still have me wondering whether separate masks are necessary,
or whether more selective health update logic could do the same job. I
think it might be better to either bundle this patch with whatever other
changes actually make use of the separate masks, or alternatively to
simplify the current logic and just defer the separate mask thing until
those more complex repair algorithms come along..

Brian

> > ISTM that it might be possible to skip clearing one fail state bit so
> > long as the original thing remained corrupted, but I feel like I'm
> > still missing some context on the bigger picture scrub tracking...
> 
> Yeah, the state machine is pretty squirrely. :/
> 
> --D
> 
> > Brian
> > 
> > >  			goto retry_op;
> > >  		}
> > >  	}
> > > diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> > > index 22f754fba8e5..05f1ad242a35 100644
> > > --- a/fs/xfs/scrub/scrub.h
> > > +++ b/fs/xfs/scrub/scrub.h
> > > @@ -62,6 +62,17 @@ struct xfs_scrub {
> > >  	struct xfs_inode		*ip;
> > >  	void				*buf;
> > >  	uint				ilock_flags;
> > > +
> > > +	/* Metadata to be marked sick if scrub finds errors. */
> > > +	unsigned int			sick_mask;
> > > +
> > > +	/*
> > > +	 * Metadata to be marked healthy if repair fixes errors.  Some repair
> > > +	 * functions can fix multiple data structures at once, so we have to
> > > +	 * treat sick and heal masks separately.
> > > +	 */
> > > +	unsigned int			heal_mask;
> > > +
> > >  	bool				try_harder;
> > >  	bool				has_quotaofflock;
> > >  
> > > 


* Re: [PATCH 07/10] xfs: report AG health via AG geometry ioctl
  2019-04-04 11:48       ` Brian Foster
@ 2019-04-05 20:33         ` Darrick J. Wong
  2019-04-08 11:34           ` Brian Foster
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-05 20:33 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Apr 04, 2019 at 07:48:57AM -0400, Brian Foster wrote:
> On Wed, Apr 03, 2019 at 09:11:06AM -0700, Darrick J. Wong wrote:
> > On Wed, Apr 03, 2019 at 10:30:05AM -0400, Brian Foster wrote:
> > > On Mon, Apr 01, 2019 at 10:10:52AM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Use the AG geometry info ioctl to report health status too.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_fs.h     |   12 +++++++++++-
> > > >  fs/xfs/libxfs/xfs_health.h |    2 ++
> > > >  fs/xfs/xfs_health.c        |   40 ++++++++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/xfs_ioctl.c         |    2 ++
> > > >  4 files changed, 55 insertions(+), 1 deletion(-)
> > > > 
> > > > 
> > > ...
> > > > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> > > > index 151c98693bef..5ca471bd41ad 100644
> > > > --- a/fs/xfs/xfs_health.c
> > > > +++ b/fs/xfs/xfs_health.c
> > > > @@ -276,3 +276,43 @@ xfs_fsop_geom_health(
> > > >  	if (sick & XFS_HEALTH_RT_SUMMARY)
> > > >  		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_SUMMARY;
> > > >  }
> > > > +
> > > > +/* Fill out ag geometry health info. */
> > > > +void
> > > > +xfs_ag_geom_health(
> > > > +	struct xfs_mount	*mp,
> > > > +	xfs_agnumber_t		agno,
> > > > +	struct xfs_ag_geometry	*ageo)
> > > > +{
> > > > +	struct xfs_perag	*pag;
> > > > +	unsigned int		sick;
> > > > +
> > > > +	if (agno >= mp->m_sb.sb_agcount)
> > > > +		return;
> > > 
> > > The call to xfs_ag_get_geometry() would have already returned an error
> > > in the ioctl path for the above scenario. It might still make sense to
> > > check here, but perhaps we could let this function also return an int
> > > and return an error for consistency?
> > 
> > Or maybe just ASSERT on the agno and add a note that the caller is
> > required to pass in a valid ag number.
> > 
> > > > +
> > > > +	ageo->ag_health = 0;
> > > > +
> > > > +	pag = xfs_perag_get(mp, agno);
> > > > +	sick = xfs_ag_measure_sickness(pag);
> > > > +	if (sick & XFS_HEALTH_AG_SB)
> > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_SB;
> > > 
> > > I'm starting to wonder whether "health" is the best term to use for the
> > > interface bits just because it reads a little weird to measure
> > > "sickness" and then apply all the sick state to something called
> > > "health." I don't have a better suggestion off the top of my head,
> > > though. Just something to think about a bit more from an API
> > > standpoint..
> > 
> > I had the same conundrum.  I guess we could start the bitset with -1 and
> > clear bits when scrub says they've gone bad?  That would be much clearer
> > with regards to the names, but technically we don't know the health of a
> > structure until we scan it, so I wouldn't want to represent the fs as
> > being "healthy" having not actually looked for problems.
> > 
> > What we /really/ need is a tri-state bitset[1]:
> > 
> > enum Bool
> > {
> >     True,
> >     False,
> >     FileNotFound
> > };
> > 
> > But maybe I will try renaming all this to "sick" again.
> > 
> > if (sick & XFS_SICK_AG_AGF)
> > 	ageo->ag_sick |= XFS_AG_GEOM_SICK_AG_AGF;
> > 
> > Gosh.  That second name is gross.  XFS_AG_GEOM_SICK_AGF.
> > 
> > Sick sick sick sick sick.  Ok, I've convinced myself of the name change. :P
> > 
> 
> Heh. I suppose we could either invert the logic or perhaps try to come
> up with a better keyword than "health" for the exported bits (at least).
> If I see ag_health in a data structure, for example, I'm assuming it's
> telling me what is healthy. Of course we'll have documentation and
> whatnot to clear that up..
> 
> Another term that came to mind is "fault" or "faulted" as it has
> > precedent in storage contexts wrt raid. I.e., ag_faults and
> XFS_AG_GEOM_FAULT_AGF, etc. etc. To me it also kind of covers the angle
> that we aren't necessarily stating a subset of the filesystem is healthy
> due to lack of faults if we just haven't scrubbed/found anything. Hm? I
> guess it could be confused with reporting underlying storage problems. I
> dunno... it's more clear to me, but maybe others have ideas..

I have a (not very strong) preference for 'sick' over 'fault' because
there are other parts of xfs where we deal with (page) faults and I
don't really want to get "file metadata faults" and "file page faults"
confused.

(I'm not sure anyone is really going to confuse them, though...)
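
On the tri-state point quoted above: one way to get "unchecked / healthy /
sick" without a real tri-state type would be to keep two masks per AG, one
recording what has been scanned and one recording what was found bad.  This
is only a rough sketch to illustrate the idea; the struct, the helper names
and the flag values below are all made up for the example and are not what
the patch does today.

```c
/* Hypothetical flag values, for illustration only. */
#define SICK_AG_AGF	(1U << 0)
#define SICK_AG_INOBT	(1U << 1)

enum meta_state { META_UNCHECKED, META_HEALTHY, META_SICK };

/* Hypothetical per-AG record: what was scanned, and what failed the scan. */
struct ag_health {
	unsigned int	checked;
	unsigned int	sick;
};

/* A scan found problems: remember that we looked and that it was bad. */
static void ag_mark_sick(struct ag_health *h, unsigned int mask)
{
	h->checked |= mask;
	h->sick |= mask;
}

/* A scan came back clean: remember that we looked and clear the sick bit. */
static void ag_mark_healthy(struct ag_health *h, unsigned int mask)
{
	h->checked |= mask;
	h->sick &= ~mask;
}

/* Report the state of a single structure (exactly one bit set in flag). */
static enum meta_state ag_state(const struct ag_health *h, unsigned int flag)
{
	if (!(h->checked & flag))
		return META_UNCHECKED;
	return (h->sick & flag) ? META_SICK : META_HEALTHY;
}
```

With something like this, a zeroed record reads as "unchecked" rather than
"healthy", which is the distinction the bitset-starting-at-minus-one idea
could not express.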

--D

> Brian
> 
> > --D
> > 
> > [1] https://thedailywtf.com/articles/What_Is_Truth_0x3f_
> > 
> > > Brian
> > > 
> > > > +	if (sick & XFS_HEALTH_AG_AGF)
> > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGF;
> > > > +	if (sick & XFS_HEALTH_AG_AGFL)
> > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGFL;
> > > > +	if (sick & XFS_HEALTH_AG_AGI)
> > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGI;
> > > > +	if (sick & XFS_HEALTH_AG_BNOBT)
> > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_BNOBT;
> > > > +	if (sick & XFS_HEALTH_AG_CNTBT)
> > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_CNTBT;
> > > > +	if (sick & XFS_HEALTH_AG_INOBT)
> > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_INOBT;
> > > > +	if (sick & XFS_HEALTH_AG_FINOBT)
> > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_FINOBT;
> > > > +	if (sick & XFS_HEALTH_AG_RMAPBT)
> > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_RMAPBT;
> > > > +	if (sick & XFS_HEALTH_AG_REFCNTBT)
> > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_REFCNTBT;
> > > > +	xfs_perag_put(pag);
> > > > +}
> > > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > > index f9bf11b6a055..f1fc5e53cfc1 100644
> > > > --- a/fs/xfs/xfs_ioctl.c
> > > > +++ b/fs/xfs/xfs_ioctl.c
> > > > @@ -853,6 +853,8 @@ xfs_ioc_ag_geometry(
> > > >  	if (error)
> > > >  		return error;
> > > >  
> > > > +	xfs_ag_geom_health(mp, ageo.ag_number, &ageo);
> > > > +
> > > >  	if (copy_to_user(arg, &ageo, sizeof(ageo)))
> > > >  		return -EFAULT;
> > > >  	return 0;
> > > > 


* Re: [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health
  2019-04-05 13:07       ` Brian Foster
@ 2019-04-05 20:54         ` Darrick J. Wong
  2019-04-08 11:35           ` Brian Foster
  0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-05 20:54 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 05, 2019 at 09:07:39AM -0400, Brian Foster wrote:
> On Thu, Apr 04, 2019 at 11:01:33AM -0700, Darrick J. Wong wrote:
> > On Thu, Apr 04, 2019 at 07:50:11AM -0400, Brian Foster wrote:
> > > On Mon, Apr 01, 2019 at 10:11:12AM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Now that we have the ability to track sick metadata in-core, make scrub
> > > > and repair update those health assessments after doing work.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/Makefile       |    1 
> > > >  fs/xfs/scrub/health.c |  180 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/scrub/health.h |   12 +++
> > > >  fs/xfs/scrub/scrub.c  |    8 ++
> > > >  fs/xfs/scrub/scrub.h  |   11 +++
> > > >  5 files changed, 212 insertions(+)
> > > >  create mode 100644 fs/xfs/scrub/health.c
> > > >  create mode 100644 fs/xfs/scrub/health.h
> > > > 
> > > > 
> > > ...
> > > > diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
> > > > new file mode 100644
> > > > index 000000000000..dd9986500801
> > > > --- /dev/null
> > > > +++ b/fs/xfs/scrub/health.c
> > > > @@ -0,0 +1,180 @@
> > > ...
> > > > +/* Update filesystem health assessments based on what we found and did. */
> > > > +void
> > > > +xchk_update_health(
> > > > +	struct xfs_scrub	*sc,
> > > > +	bool			already_fixed)
> > > > +{
> > > > +	/*
> > > > +	 * If the scrubber finds errors, we mark sick whatever's mentioned in
> > > > +	 * sick_mask, no matter whether this is a first scan or an evaluation
> > > > +	 * of repair effectiveness.
> > > > +	 *
> > > > +	 * If there is no direct corruption and we're called after a repair,
> > > > +	 * clear whatever's in heal_mask because that's what we fixed.
> > > > +	 *
> > > > +	 * Otherwise, there's no direct corruption and we didn't repair
> > > > +	 * anything, so mark whatever's in sick_mask as healthy.
> > > > +	 */
> > > > +	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > > > +		xchk_mark_sick(sc, sc->sick_mask);
> > > > +	else if (already_fixed)
> > > > +		xchk_mark_healthy(sc, sc->heal_mask);
> > > > +	else
> > > > +		xchk_mark_healthy(sc, sc->sick_mask);
> > > > +}
> > > 
> > > Hmm, I think I follow what we're doing here but it's a bit confusing
> > > without the additional context of where these bits will be set/cleared
> > > at the lower scrub layers (or at least without an example). Some
> > > questions on that below...
> > > 
> > > ...
> > > > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > > > index 1b2344d00525..b1519dfc5811 100644
> > > > --- a/fs/xfs/scrub/scrub.c
> > > > +++ b/fs/xfs/scrub/scrub.c
> > > > @@ -40,6 +40,7 @@
> > > >  #include "scrub/trace.h"
> > > >  #include "scrub/btree.h"
> > > >  #include "scrub/repair.h"
> > > > +#include "scrub/health.h"
> > > >  
> > > >  /*
> > > >   * Online Scrub and Repair
> > > > @@ -468,6 +469,7 @@ xfs_scrub_metadata(
> > > >  {
> > > >  	struct xfs_scrub		sc;
> > > >  	struct xfs_mount		*mp = ip->i_mount;
> > > > +	unsigned int			heal_mask;
> > > >  	bool				try_harder = false;
> > > >  	bool				already_fixed = false;
> > > >  	int				error = 0;
> > > > @@ -488,6 +490,7 @@ xfs_scrub_metadata(
> > > >  	error = xchk_validate_inputs(mp, sm);
> > > >  	if (error)
> > > >  		goto out;
> > > > +	heal_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
> > > >  
> > > >  	xchk_experimental_warning(mp);
> > > >  
> > > > @@ -499,6 +502,8 @@ xfs_scrub_metadata(
> > > >  	sc.ops = &meta_scrub_ops[sm->sm_type];
> > > >  	sc.try_harder = try_harder;
> > > >  	sc.sa.agno = NULLAGNUMBER;
> > > > +	sc.heal_mask = heal_mask;
> > > > +	sc.sick_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
> > > 
> > > Ok, so we initialize the heal/sick masks based on the scrub type that
> > > was requested on the first pass through...
> > > 
> > > >  	error = sc.ops->setup(&sc, ip);
> > > >  	if (error)
> > > >  		goto out_teardown;
> > > > @@ -519,6 +524,8 @@ xfs_scrub_metadata(
> > > >  	} else if (error)
> > > >  		goto out_teardown;
> > > >  
> > > > +	xchk_update_health(&sc, already_fixed);
> > > > +
> > > 
> > > ... then update the in-core fs health state based on the sick mask. Is
> > > it possible for the scrub operation to set more sick mask bits based on
> > > what it finds?
> > 
> > Theoretically, yes, but in practice none of the current scrubbers need
> > to touch sick_mask.
> > 
> > heal_mask, OTOH, will be adjusted by the free space / inode repair
> > functions since they rebuild multiple structures.
> > 
> 
> Ok..
> 
> > > More specifically, I'm wondering why the masks wouldn't start as zero
> > > and toggle based on finding/fixing corruption(s).
> > 
> > sick_mask is also the mask we feed to xfs_*_mark_healthy if the scan
> > returns clean, which is why we set the default value before dispatching
> > the scrub.
> > 
> > > Or if the sick mask value is essentially fixed, whether we need to
> > > store it in the xfs_scrub context...
> > 
> > We could probably get away with generating it in xchk_update_health at
> > the end, but it feels weird to have heal_mask in the scrub context but
> > sick_mask gets auto-generated.
> > 
> 
> Ok.. hmm. Both feel a little weird to me, but this is really just an
> aesthetic/factoring thing so I'll think about it a bit more and come
> back to it.
> 
> > > 
> > > >  	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed) {
> > > >  		bool needs_fix;
> > > >  
> > > > @@ -551,6 +558,7 @@ xfs_scrub_metadata(
> > > >  				xrep_failure(mp);
> > > >  				goto out;
> > > >  			}
> > > > +			heal_mask = sc.heal_mask;
> > > 
> > > And if we end up doing a repair, we presumably can repair multiple
> > > things and so we track that separately and persist the heal mask across
> > > a potential retry.
> > 
> > Right.
> > 
> > > What about the case where we don't retry, but scrub finds something
> > > and then immediately repairs it?
> > 
> > The repair jumps back to retry_op if either (a) we couldn't get all the
> > resources we needed and therefore sc.try_harder = true and we need to
> > start over; or (b) repair thinks it fixed a thing, so we need to scrub
> > the thing again to see if it's really fixed...
> > 
> > > Should we update the fs state after both detecting and clearing the
> > > problem, or does that happen elsewhere?
> > 
> > ...so if scrub immediately repairs a thing, we preserve heal_mask, jump
> > back to the scrub, and if the scrub says clean we'll mark heal mask
> > healthy.
> > 
> > > If the repair has to retry then we'll call the repair function
> > again, which (presumably) will set (again) the heal_mask appropriately,
> > and then we have the same post-repair state updating as above.
> > 
> > Does that make sense? :)
> > 
> 
> Ah, Ok. I didn't realize that a successful repair looped back to the
> scrub code (and thus the health update). Yes, that makes more sense.
> 
> > > Also, if repair can potentially clear multiple bits, what's the
> > > possibility of a repair clearing one failure and then failing on
> > > another, causing the broader repair op to return an error or jump into
> > > this retry?
> > 
> > Scrub doesn't touch the fs health state at all until after the ->scrub
> > or ->repair function succeeds.  If the scrub or the repair functions
> > fail for any non-retry reason, we back out to userspace without updating
> > anything.  It's as if we'd never called the failed function.
> > 
> 
> Right.. what I was getting at above is seeing whether we'd actually
> update partial repair state in-core. E.g., suppose things A and B are
> faulted in-core and it's one of these cases where repair can fix A and B
> at the same time. If it fixes thing A and fails on thing B, it sounds
> like we'd not clear the in-core fault state on A even though it's
> technically repaired.

Hmm.  If the repair function returns a runtime error (having fixed A but
not B) then yes, we won't clear the incore fault state on A (or B) even
though we fixed A.  Something weird happened, so we shouldn't be too
hasty to clear things.  A subsequent re-scrub of A will clear the fault
on A, though.

OTOH... if the A/B repair function returns 0 having fixed A but left B
corrupt, the rescan will see that A is fine and (incorrectly) clear both
A and B.  I would say that's a bug, so maybe I should rethink the need
for sick_mask and heal_mask.

That said, a normal xfs_scrub run will check (or will already have checked)
B and notice that it is corrupt, so it will circle back and try to fix B
separately.  In that sense we don't really need heal_mask at all.
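
To make the A/B problem concrete, here is a small userspace model of the
three-way decision in xchk_update_health() from this patch.  It is a sketch
only: the in-core state is reduced to a single bitmask, and SICK_A, SICK_B,
struct scrub_ctx, update_health() and repair_scenario() are names invented
for the illustration.

```c
#include <stdbool.h>

#define SICK_A	(1U << 0)	/* hypothetical flag for structure A */
#define SICK_B	(1U << 1)	/* hypothetical flag for structure B */

/* Simplified stand-in for the two masks kept in the scrub context. */
struct scrub_ctx {
	unsigned int	sick_mask;	/* marked sick if the scan finds errors */
	unsigned int	heal_mask;	/* marked healthy after a repair */
};

/*
 * Same three branches as xchk_update_health() in the patch, applied to a
 * plain bitmask (*incore) instead of the real in-core health state.
 */
static void update_health(unsigned int *incore, const struct scrub_ctx *sc,
			  bool corrupt, bool already_fixed)
{
	if (corrupt)
		*incore |= sc->sick_mask;
	else if (already_fixed)
		*incore &= ~sc->heal_mask;
	else
		*incore &= ~sc->sick_mask;
}

/*
 * A and B both start out sick.  The repair of A returns 0 and leaves
 * heal_mask set to the given value, but B is in fact still corrupt.  The
 * rescan only looks at A and comes back clean.  Returns the resulting
 * in-core sick state.
 */
static unsigned int repair_scenario(unsigned int heal_mask)
{
	unsigned int		incore = SICK_A | SICK_B;
	struct scrub_ctx	sc = {
		.sick_mask = SICK_A,
		.heal_mask = heal_mask,
	};

	update_health(&incore, &sc, false, true);
	return incore;
}
```

In this model repair_scenario(SICK_A | SICK_B) returns 0, i.e. B is
reported healthy although it is still corrupt, whereas
repair_scenario(SICK_A) leaves SICK_B set.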

> > Maybe some worked examples will help?
> > 
> > Let's say both inode btrees are corrupt.  We run xfs_scrub -n,
> > xchk_inobt will record the corruption, and (assuming it hits no runtime
> > errors) once we return to xfs_scrub_metadata, it'll set
> > XFS_SICK_AG_INOBT.  Presumably xfs_scrub will also call the finobt scrub
> > and SICK_AG_FINOBT will also get set.
> > 
> > If we run xfs_scrub without the -n, xchk_inobt will record the
> > corruption and set SICK_AG_INOBT per above.  Then it'll run xrep_inobt,
> > which will set heal_mask to SICK_AG_INOBT | SICK_AG_FINOBT.  If the
> > repair fails with a non-retry runtime error, we exit to userspace and
> > ignore heal_mask.
> > 
> 
> Ok, this sounds like the case I'm theorizing about above (where suppose
> repair fixed the inobt and then failed on the finobt, but hasn't cleared
> faults for either..).
> 
> > If instead the repair succeeds, we scan the inobt again.  If that comes
> > up clear then we use heal_mask to clear SICK_AG_INOBT | SICK_AG_FINOBT.
> > xfs_scrub will call again later to repair the finobt, but the initial
> > finobt scan will see no errors in the finobt, clear SICK_AG_FINOBT
> > (which isn't set) and exit.
> > 
> 
> So it sounds like the state would have to be cleared by a subsequent
> scrub request. The scan would find thing A healthy and mark it so
> regardless, to clear any potential previous faults that might have
> already been repaired. Right?

Right.

> > If the inobt repair function is buggy and says it repaired the inode
> > btrees but leaves corruptions, then the rescan of the inobt will notice
> > and set SICK_AG_INOBT (which is already set) and exit.  Similarly, when
> > xfs_scrub calls back about the finobt, it will notice the corrupt
> > finobt, try to set SICK_AG_FINOBT (also already set), try to fix it, and
> > the rescan of the finobt will notice that the finobt is still corrupt
> > and try to set SICK_AG_FINOBT (which is still set).
> > 
> > The end result (I think) is that we always set the sick bits if a scan
> > shows problems, and we only clear the sick bits for things if we can
> > prove that the things are no longer sick.  Does that help?
> > 
> 
> Yes, thanks for the explanation. I think the confusion is mostly due to
> not being able to fully see how these scrub states are managed,
> particularly the bits that warranted the creation of separate masks in
> the first place.

You've convinced me that this patch is too convoluted to understand, so
I think I want to simplify it some more.  First, I'd rename the field
to "sick_mask_update" (s_m_u below) and change the behavior so that we:

 1. Set sick_mask_update to the default XFS_SICK flag for this scrub
    type (call it A).  (We already do this)

 2. If the scrubber returns an error code, we exit making no changes to
    the incore sick state.

 3. If the scrubber finds that A is clean, clear the incore sick flags
    that are set in s_m_u and exit.

 4. If the scrubber finds that A is corrupt, set the incore sick flags
    that are set in s_m_u.

    a. If the user doesn't want to repair, then we exit, having
       previously set incore sick flags.

 5. Now we know that A is corrupt and the user wants to repair.
    If repair returns an error code, we exit with that error code, having
    made no further changes to the incore sick state.

 6. If repair rebuilds both A & B correctly and the re-scrub of A is
    clean, we'll clear the incore sick flags using s_m_u.  This should
    clear A.

 7. If repair rebuilds both A & B and screws up A, the re-scrub will find
    it corrupt and leave the sick flags as they are, which is to say that
    A is marked sick.

 8. If repair rebuilds A correctly but leaves B corrupt, the re-scrub of
    A will be clean and we'll clear the incore sick flags using s_m_u.
    This should clear A, even though B is corrupt.

 9. No matter whether we encountered scenarios 6, 7, or 8, if xfs_scrub
    previously scrubbed B and found it corrupt, it will call again to
    repair B, which will set the incore sick state appropriately.  If
    xfs_scrub has not yet scrubbed B then it will call later to scrub B,
    which will set the incore sick state appropriately.

I hope that's easier to understand...
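
If it helps, the steps above can be modelled in a few lines of userspace C.
This is a sketch under heavy assumptions, not the real scrub code: the
in-core sick state is a single bitmask, the scrub and repair outcomes are
supplied up front instead of being computed, and struct scrub_result,
struct scrub_req, apply_scan() and scrub_metadata() are hypothetical names.

```c
#include <stdbool.h>

#define SICK_A	(1U << 0)	/* hypothetical flag for structure A */
#define SICK_B	(1U << 1)	/* hypothetical flag for structure B */

/* Hypothetical outcome of one scrub call. */
struct scrub_result {
	int	error;		/* nonzero means a runtime error */
	bool	corrupt;	/* the scan found corruption */
};

/* Everything that happens during one request, decided in advance. */
struct scrub_req {
	unsigned int		s_m_u;		/* step 1: default flag for A */
	bool			want_repair;
	struct scrub_result	first_scan;
	int			repair_error;
	struct scrub_result	rescan;
};

/* Steps 3, 4, 6, 7 and 8: set or clear exactly the bits in s_m_u. */
static void apply_scan(unsigned int *incore, unsigned int s_m_u, bool corrupt)
{
	if (corrupt)
		*incore |= s_m_u;
	else
		*incore &= ~s_m_u;
}

static int scrub_metadata(unsigned int *incore, const struct scrub_req *rq)
{
	/* Step 2: a scrub error leaves the in-core state untouched. */
	if (rq->first_scan.error)
		return rq->first_scan.error;

	/* Steps 3 and 4: record what the first scan found. */
	apply_scan(incore, rq->s_m_u, rq->first_scan.corrupt);
	if (!rq->first_scan.corrupt || !rq->want_repair)
		return 0;

	/* Step 5: a repair error makes no further changes. */
	if (rq->repair_error)
		return rq->repair_error;

	/* Steps 6 to 8: the re-scrub of A decides A's state, never B's. */
	if (rq->rescan.error)
		return rq->rescan.error;
	apply_scan(incore, rq->s_m_u, rq->rescan.corrupt);
	return 0;
}
```

The point of the model is that only the bits in s_m_u are ever touched, so
B's state is left for B's own scrub or repair call to settle (step 9).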

> This does still have me wondering if separate masks are necessary, if we
> perhaps had more selective health update logic, for example. I think it
> might be better to either bundle this patch with whatever other changes
> actually make use of the separate masks, or alternatively to simplify
> the current logic and just defer the separate mask thing until those
> more complex repair algorithms come along..

--D

> Brian
> 
> > > ISTM that it might be possible to skip clearing one fail state bit so
> > > long as the original thing remained corrupted, but I feel like I'm
> > > still missing some context on the bigger picture scrub tracking...
> > 
> > Yeah, the state machine is pretty squirrely. :/
> > 
> > --D
> > 
> > > Brian
> > > 
> > > >  			goto retry_op;
> > > >  		}
> > > >  	}
> > > > diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> > > > index 22f754fba8e5..05f1ad242a35 100644
> > > > --- a/fs/xfs/scrub/scrub.h
> > > > +++ b/fs/xfs/scrub/scrub.h
> > > > @@ -62,6 +62,17 @@ struct xfs_scrub {
> > > >  	struct xfs_inode		*ip;
> > > >  	void				*buf;
> > > >  	uint				ilock_flags;
> > > > +
> > > > +	/* Metadata to be marked sick if scrub finds errors. */
> > > > +	unsigned int			sick_mask;
> > > > +
> > > > +	/*
> > > > +	 * Metadata to be marked healthy if repair fixes errors.  Some repair
> > > > +	 * functions can fix multiple data structures at once, so we have to
> > > > +	 * treat sick and heal masks separately.
> > > > +	 */
> > > > +	unsigned int			heal_mask;
> > > > +
> > > >  	bool				try_harder;
> > > >  	bool				has_quotaofflock;
> > > >  
> > > > 


* Re: [PATCH 07/10] xfs: report AG health via AG geometry ioctl
  2019-04-05 20:33         ` Darrick J. Wong
@ 2019-04-08 11:34           ` Brian Foster
  2019-04-09  3:25             ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Brian Foster @ 2019-04-08 11:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, Apr 05, 2019 at 01:33:19PM -0700, Darrick J. Wong wrote:
> On Thu, Apr 04, 2019 at 07:48:57AM -0400, Brian Foster wrote:
> > On Wed, Apr 03, 2019 at 09:11:06AM -0700, Darrick J. Wong wrote:
> > > On Wed, Apr 03, 2019 at 10:30:05AM -0400, Brian Foster wrote:
> > > > On Mon, Apr 01, 2019 at 10:10:52AM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > 
> > > > > Use the AG geometry info ioctl to report health status too.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > ---
> > > > >  fs/xfs/libxfs/xfs_fs.h     |   12 +++++++++++-
> > > > >  fs/xfs/libxfs/xfs_health.h |    2 ++
> > > > >  fs/xfs/xfs_health.c        |   40 ++++++++++++++++++++++++++++++++++++++++
> > > > >  fs/xfs/xfs_ioctl.c         |    2 ++
> > > > >  4 files changed, 55 insertions(+), 1 deletion(-)
> > > > > 
> > > > > 
> > > > ...
> > > > > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> > > > > index 151c98693bef..5ca471bd41ad 100644
> > > > > --- a/fs/xfs/xfs_health.c
> > > > > +++ b/fs/xfs/xfs_health.c
> > > > > @@ -276,3 +276,43 @@ xfs_fsop_geom_health(
> > > > >  	if (sick & XFS_HEALTH_RT_SUMMARY)
> > > > >  		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_SUMMARY;
> > > > >  }
> > > > > +
> > > > > +/* Fill out ag geometry health info. */
> > > > > +void
> > > > > +xfs_ag_geom_health(
> > > > > +	struct xfs_mount	*mp,
> > > > > +	xfs_agnumber_t		agno,
> > > > > +	struct xfs_ag_geometry	*ageo)
> > > > > +{
> > > > > +	struct xfs_perag	*pag;
> > > > > +	unsigned int		sick;
> > > > > +
> > > > > +	if (agno >= mp->m_sb.sb_agcount)
> > > > > +		return;
> > > > 
> > > > The call to xfs_ag_get_geometry() would have already returned an error
> > > > in the ioctl path for the above scenario. It might still make sense to
> > > > check here, but perhaps we could let this function also return an int
> > > > and return an error for consistency?
> > > 
> > > Or maybe just ASSERT on the agno and add a note that the caller is
> > > required to pass in a valid ag number.
> > > 
> > > > > +
> > > > > +	ageo->ag_health = 0;
> > > > > +
> > > > > +	pag = xfs_perag_get(mp, agno);
> > > > > +	sick = xfs_ag_measure_sickness(pag);
> > > > > +	if (sick & XFS_HEALTH_AG_SB)
> > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_SB;
> > > > 
> > > > I'm starting to wonder whether "health" is the best term to use for the
> > > > interface bits just because it reads a little weird to measure
> > > > "sickness" and then apply all the sick state to something called
> > > > "health." I don't have a better suggestion off the top of my head,
> > > > though. Just something to think about a bit more from an API
> > > > standpoint..
> > > 
> > > I had the same conundrum.  I guess we could start the bitset with -1 and
> > > clear bits when scrub says they've gone bad?  That would be much clearer
> > > with regards to the names, but technically we don't know the health of a
> > > structure until we scan it, so I wouldn't want to represent the fs as
> > > being "healthy" having not actually looked for problems.
> > > 
> > > What we /really/ need is a tri-state bitset[1]:
> > > 
> > > enum Bool
> > > {
> > >     True,
> > >     False,
> > >     FileNotFound
> > > };
> > > 
> > > But maybe I will try renaming all this to "sick" again.
> > > 
> > > if (sick & XFS_SICK_AG_AGF)
> > > 	ageo->ag_sick |= XFS_AG_GEOM_SICK_AG_AGF;
> > > 
> > > Gosh.  That second name is gross.  XFS_AG_GEOM_SICK_AGF.
> > > 
> > > Sick sick sick sick sick.  Ok, I've convinced myself of the name change. :P
> > > 
> > 
> > Heh. I suppose we could either invert the logic or perhaps try to come
> > up with a better keyword than "health" for the exported bits (at least).
> > If I see ag_health in a data structure, for example, I'm assuming it's
> > telling me what is healthy. Of course we'll have documentation and
> > whatnot to clear that up..
> > 
> > Another term that came to mind is "fault" or "faulted" as it has
> > > precedent in storage contexts wrt raid. I.e., ag_faults and
> > XFS_AG_GEOM_FAULT_AGF, etc. etc. To me it also kind of covers the angle
> > that we aren't necessarily stating a subset of the filesystem is healthy
> > due to lack of faults if we just haven't scrubbed/found anything. Hm? I
> > guess it could be confused with reporting underlying storage problems. I
> > dunno... it's more clear to me, but maybe others have ideas..
> 
> I have a (not very strong) preference for 'sick' over 'fault' because
> there are other parts of xfs where we deal with (page) faults and I
> don't really want to get "file metadata faults" and "file page faults"
> confused.
> 
> (I'm not sure anyone is really going to confuse them, though...)
> 

Ok. Either way, I think a field/bit prefix name that reflects borkedness
over health is a bit more intuitive with the current semantics (i.e.,
bit set means something is borked).
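
As a rough illustration of what the export side might look like with a
"sick" prefix, the chain of if statements could also be driven by a small
table.  This is just a sketch: XFS_SICK_AG_AGF and XFS_AG_GEOM_SICK_AGF are
the names floated earlier in the thread, while the other names, all of the
numeric values, struct sick_map and ag_geom_sick() are assumptions made up
for the example.

```c
#include <stddef.h>

/* In-core flags; values are hypothetical. */
#define XFS_SICK_AG_SB		(1U << 8)
#define XFS_SICK_AG_AGF		(1U << 9)
#define XFS_SICK_AG_AGFL	(1U << 10)

/* Exported flags; values are hypothetical and deliberately different. */
#define XFS_AG_GEOM_SICK_SB	(1U << 0)
#define XFS_AG_GEOM_SICK_AGF	(1U << 1)
#define XFS_AG_GEOM_SICK_AGFL	(1U << 2)

struct sick_map {
	unsigned int	sick_mask;	/* in-core flag */
	unsigned int	ioctl_mask;	/* flag reported to userspace */
};

static const struct sick_map ag_map[] = {
	{ XFS_SICK_AG_SB,	XFS_AG_GEOM_SICK_SB },
	{ XFS_SICK_AG_AGF,	XFS_AG_GEOM_SICK_AGF },
	{ XFS_SICK_AG_AGFL,	XFS_AG_GEOM_SICK_AGFL },
};

/* Translate in-core sick flags into the exported bitset. */
static unsigned int ag_geom_sick(unsigned int sick)
{
	unsigned int	out = 0;
	size_t		i;

	for (i = 0; i < sizeof(ag_map) / sizeof(ag_map[0]); i++) {
		if (sick & ag_map[i].sick_mask)
			out |= ag_map[i].ioctl_mask;
	}
	return out;
}
```

Either way, a set bit in the exported field then reads as "this thing is
sick", which matches the name of the field.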

Brian

> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > [1] https://thedailywtf.com/articles/What_Is_Truth_0x3f_
> > > 
> > > > Brian
> > > > 
> > > > > +	if (sick & XFS_HEALTH_AG_AGF)
> > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGF;
> > > > > +	if (sick & XFS_HEALTH_AG_AGFL)
> > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGFL;
> > > > > +	if (sick & XFS_HEALTH_AG_AGI)
> > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGI;
> > > > > +	if (sick & XFS_HEALTH_AG_BNOBT)
> > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_BNOBT;
> > > > > +	if (sick & XFS_HEALTH_AG_CNTBT)
> > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_CNTBT;
> > > > > +	if (sick & XFS_HEALTH_AG_INOBT)
> > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_INOBT;
> > > > > +	if (sick & XFS_HEALTH_AG_FINOBT)
> > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_FINOBT;
> > > > > +	if (sick & XFS_HEALTH_AG_RMAPBT)
> > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_RMAPBT;
> > > > > +	if (sick & XFS_HEALTH_AG_REFCNTBT)
> > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_REFCNTBT;
> > > > > +	xfs_perag_put(pag);
> > > > > +}
> > > > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > > > index f9bf11b6a055..f1fc5e53cfc1 100644
> > > > > --- a/fs/xfs/xfs_ioctl.c
> > > > > +++ b/fs/xfs/xfs_ioctl.c
> > > > > @@ -853,6 +853,8 @@ xfs_ioc_ag_geometry(
> > > > >  	if (error)
> > > > >  		return error;
> > > > >  
> > > > > +	xfs_ag_geom_health(mp, ageo.ag_number, &ageo);
> > > > > +
> > > > >  	if (copy_to_user(arg, &ageo, sizeof(ageo)))
> > > > >  		return -EFAULT;
> > > > >  	return 0;
> > > > > 


* Re: [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health
  2019-04-05 20:54         ` Darrick J. Wong
@ 2019-04-08 11:35           ` Brian Foster
  2019-04-09  3:30             ` Darrick J. Wong
  0 siblings, 1 reply; 41+ messages in thread
From: Brian Foster @ 2019-04-08 11:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, Apr 05, 2019 at 01:54:47PM -0700, Darrick J. Wong wrote:
> On Fri, Apr 05, 2019 at 09:07:39AM -0400, Brian Foster wrote:
> > On Thu, Apr 04, 2019 at 11:01:33AM -0700, Darrick J. Wong wrote:
> > > On Thu, Apr 04, 2019 at 07:50:11AM -0400, Brian Foster wrote:
> > > > On Mon, Apr 01, 2019 at 10:11:12AM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > 
> > > > > Now that we have the ability to track sick metadata in-core, make scrub
> > > > > and repair update those health assessments after doing work.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > ---
> > > > >  fs/xfs/Makefile       |    1 
> > > > >  fs/xfs/scrub/health.c |  180 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  fs/xfs/scrub/health.h |   12 +++
> > > > >  fs/xfs/scrub/scrub.c  |    8 ++
> > > > >  fs/xfs/scrub/scrub.h  |   11 +++
> > > > >  5 files changed, 212 insertions(+)
> > > > >  create mode 100644 fs/xfs/scrub/health.c
> > > > >  create mode 100644 fs/xfs/scrub/health.h
> > > > > 
> > > > > 
> > > > ...
> > > > > diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
> > > > > new file mode 100644
> > > > > index 000000000000..dd9986500801
> > > > > --- /dev/null
> > > > > +++ b/fs/xfs/scrub/health.c
> > > > > @@ -0,0 +1,180 @@
> > > > ...
> > > > > +/* Update filesystem health assessments based on what we found and did. */
> > > > > +void
> > > > > +xchk_update_health(
> > > > > +	struct xfs_scrub	*sc,
> > > > > +	bool			already_fixed)
> > > > > +{
> > > > > +	/*
> > > > > +	 * If the scrubber finds errors, we mark sick whatever's mentioned in
> > > > > +	 * sick_mask, no matter whether this is a first scan or an evaluation
> > > > > +	 * of repair effectiveness.
> > > > > +	 *
> > > > > +	 * If there is no direct corruption and we're called after a repair,
> > > > > +	 * clear whatever's in heal_mask because that's what we fixed.
> > > > > +	 *
> > > > > +	 * Otherwise, there's no direct corruption and we didn't repair
> > > > > +	 * anything, so mark whatever's in sick_mask as healthy.
> > > > > +	 */
> > > > > +	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > > > > +		xchk_mark_sick(sc, sc->sick_mask);
> > > > > +	else if (already_fixed)
> > > > > +		xchk_mark_healthy(sc, sc->heal_mask);
> > > > > +	else
> > > > > +		xchk_mark_healthy(sc, sc->sick_mask);
> > > > > +}
> > > > 
> > > > Hmm, I think I follow what we're doing here but it's a bit confusing
> > > > without the additional context of where these bits will be set/cleared
> > > > at the lower scrub layers (or at least without an example). Some
> > > > questions on that below...
> > > > 
> > > > ...
> > > > > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > > > > index 1b2344d00525..b1519dfc5811 100644
> > > > > --- a/fs/xfs/scrub/scrub.c
> > > > > +++ b/fs/xfs/scrub/scrub.c
> > > > > @@ -40,6 +40,7 @@
> > > > >  #include "scrub/trace.h"
> > > > >  #include "scrub/btree.h"
> > > > >  #include "scrub/repair.h"
> > > > > +#include "scrub/health.h"
> > > > >  
> > > > >  /*
> > > > >   * Online Scrub and Repair
> > > > > @@ -468,6 +469,7 @@ xfs_scrub_metadata(
> > > > >  {
> > > > >  	struct xfs_scrub		sc;
> > > > >  	struct xfs_mount		*mp = ip->i_mount;
> > > > > +	unsigned int			heal_mask;
> > > > >  	bool				try_harder = false;
> > > > >  	bool				already_fixed = false;
> > > > >  	int				error = 0;
> > > > > @@ -488,6 +490,7 @@ xfs_scrub_metadata(
> > > > >  	error = xchk_validate_inputs(mp, sm);
> > > > >  	if (error)
> > > > >  		goto out;
> > > > > +	heal_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
> > > > >  
> > > > >  	xchk_experimental_warning(mp);
> > > > >  
> > > > > @@ -499,6 +502,8 @@ xfs_scrub_metadata(
> > > > >  	sc.ops = &meta_scrub_ops[sm->sm_type];
> > > > >  	sc.try_harder = try_harder;
> > > > >  	sc.sa.agno = NULLAGNUMBER;
> > > > > +	sc.heal_mask = heal_mask;
> > > > > +	sc.sick_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
> > > > 
> > > > Ok, so we initialize the heal/sick masks based on the scrub type that
> > > > was requested on the first pass through...
> > > > 
> > > > >  	error = sc.ops->setup(&sc, ip);
> > > > >  	if (error)
> > > > >  		goto out_teardown;
> > > > > @@ -519,6 +524,8 @@ xfs_scrub_metadata(
> > > > >  	} else if (error)
> > > > >  		goto out_teardown;
> > > > >  
> > > > > +	xchk_update_health(&sc, already_fixed);
> > > > > +
> > > > 
> > > > ... then update the in-core fs health state based on the sick mask. Is
> > > > it possible for the scrub operation to set more sick mask bits based on
> > > > what it finds?
> > > 
> > > Theoretically, yes, but in practice none of the current scrubbers need
> > > to touch sick_mask.
> > > 
> > > heal_mask, OTOH, will be adjusted by the free space / inode repair
> > > functions since they rebuild multiple structures.
> > > 
> > 
> > Ok..
> > 
> > > > More specifically, I'm wondering why the masks wouldn't start as zero
> > > > and toggle based on finding/fixing corruption(s).
> > > 
> > > sick_mask is also the mask we feed to xfs_*_mark_healthy if the scan
> > > returns clean, which is why we set the default value before dispatching
> > > the scrub.
> > > 
> > > > Or if the sick mask value is essentially fixed, whether we need to
> > > > store it in the xfs_scrub context...
> > > 
> > > We could probably get away with generating it in xchk_update_health at
> > > the end, but it feels weird to have heal_mask in the scrub context but
> > > sick_mask gets auto-generated.
> > > 
> > 
> > Ok.. hmm. Both feel a little weird to me, but this is really just an
> > aesthetic/factoring thing so I'll think about it a bit more and come
> > back to it.
> > 
> > > > 
> > > > >  	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed) {
> > > > >  		bool needs_fix;
> > > > >  
> > > > > @@ -551,6 +558,7 @@ xfs_scrub_metadata(
> > > > >  				xrep_failure(mp);
> > > > >  				goto out;
> > > > >  			}
> > > > > +			heal_mask = sc.heal_mask;
> > > > 
> > > > And if we end up doing a repair, we presumably can repair multiple
> > > > things and so we track that separately and persist the heal mask across
> > > > a potential retry.
> > > 
> > > Right.
> > > 
> > > > What about the case where we don't retry, but scrub finds something
> > > > and then immediately repairs it?
> > > 
> > > The repair jumps back to retry_op if either (a) we couldn't get all the
> > > resources we needed and therefore sc.try_harder = true and we need to
> > > start over; or (b) repair thinks it fixed a thing, so we need to scrub
> > > the thing again to see if it's really fixed...
> > > 
> > > > Should we update the fs state after both detecting and clearing the
> > > > problem, or does that happen elsewhere?
> > > 
> > > ...so if scrub immediately repairs a thing, we preserve heal_mask, jump
> > > back to the scrub, and if the scrub says clean we'll mark heal mask
> > > healthy.
> > > 
> > > If the repair has to retry then we'll call the repair function
> > > again, which (presumably) will set (again) the heal_mask appropriately,
> > > and then we have the same post-repair state updating as above.
> > > 
> > > Does that make sense? :)
> > > 
> > 
> > Ah, Ok. I didn't realize that a successful repair looped back to the
> > scrub code (and thus the health update). Yes, that makes more sense.
> > 
> > > > Also, if repair can potentially clear multiple bits, what's the
> > > > possibility of a repair clearing one failure and then failing on
> > > > another, causing the broader repair op to return an error or jump into
> > > > this retry?
> > > 
> > > Scrub doesn't touch the fs health state at all until after the ->scrub
> > > or ->repair function succeeds.  If the scrub or the repair functions
> > > fail for any non-retry reason, we back out to userspace without updating
> > > anything.  It's as if we'd never called the failed function.
> > > 
> > 
> > Right.. what I was getting at above is seeing whether we'd actually
> > update partial repair state in-core. E.g., suppose things A and B are
> > faulted in-core and it's one of these cases where repair can fix A and B
> > at the same time. If it fixes thing A and fails on thing B, it sounds
> > like we'd not clear the in-core fault state on A even though it's
> > technically repaired.
> 
> Hmm.  If the repair function returns a runtime error (having fixed A but
> not B) then yes, we won't clear the incore fault state on A (or B) even
> though we fixed A.  Something weird happened, so we shouldn't be too
> hasty to clear things.  A subsequent re-scrub of A will clear the fault
> on A, though.
> 

Ok. Indeed, it doesn't seem that unreasonable to me for an operational
error to fail to clear health state for something that was repaired.

> OTOH... if the A/B repair function returns 0 having fixed A but left B
> corrupt, the rescan will see that A is fine and (incorrectly) clear both
> A and B.  I would say that's a bug, so maybe I should rethink the need
> for sick_mask and heal_mask.
> 

That one sounds more dodgy. ;P

> That said, a normal xfs_scrub run will check (or have already checked) B
> and noticed that it was corrupt, so it will circle back and try to fix B
> separately, so in a sense we don't really need heal_mask at all.
> 

Ok..

> > > Maybe some worked examples will help?
> > > 
> > > Let's say both inode btrees are corrupt.  We run xfs_scrub -n,
> > > xchk_inobt will record the corruption, and (assuming it hits no runtime
> > > errors) once we return to xfs_scrub_metadata, it'll set
> > > XFS_SICK_AG_INOBT.  Presumably xfs_scrub will also call the finobt scrub
> > > and SICK_AG_FINOBT will also get set.
> > > 
> > > If we run xfs_scrub without the -n, xchk_inobt will record the
> > > corruption and set SICK_AG_INOBT per above.  Then it'll run xrep_inobt,
> > > which will set heal_mask to SICK_AG_INOBT | SICK_AG_FINOBT.  If the
> > > repair fails with a non-retry runtime error, we exit to userspace and
> > > ignore heal_mask.
> > > 
> > 
> > Ok, this sounds like the case I'm theorizing about above (where suppose
> > repair fixed the inobt and then failed on the finobt, but hasn't cleared
> > faults for either..).
> > 
> > > If instead the repair succeeds, we scan the inobt again.  If that comes
> > > up clear then we use heal_mask to clear SICK_AG_INOBT | SICK_AG_FINOBT.
> > > xfs_scrub will call again later to repair the finobt, but the initial
> > > finobt scan will see no errors in the finobt, clear SICK_AG_FINOBT
> > > (which isn't set) and exit.
> > > 
> > 
> > So it sounds like the state would have to be cleared by a subsequent
> > scrub request. The scan would find thing A healthy and mark it so
> > regardless, to clear any potential previous faults that might have
> > already been repaired. Right?
> 
> Right.
> 
> > > If the inobt repair function is buggy and says it repaired the inode
> > > btrees but leaves corruptions, then the rescan of the inobt will notice
> > > and set SICK_AG_INOBT (which is already set) and exit.  Similarly, when
> > > xfs_scrub calls back about the finobt, it will notice the corrupt
> > > finobt, try to set SICK_AG_FINOBT (also already set), try to fix it, and
> > > the rescan of the finobt will notice that the finobt is still corrupt
> > > and try to set SICK_AG_FINOBT (which is still set).
> > > 
> > > The end result (I think) is that we always set the sick bits if a scan
> > > shows problems, and we only clear the sick bits for things if we can
> > > prove that the things are no longer sick.  Does that help?
> > > 
> > 
> > Yes, thanks for the explanation. I think the confusion is mostly due to
> > not being able to fully see how these scrub states are managed,
> > particularly the bits that warranted the creation of separate masks in
> > the first place.
> 
> You've convinced me that this patch is too convoluted to understand, so
> I think I want to simplify it some more.  First, I'd rename the field
> to "sick_mask_update" and change the behavior so that we:
> 
>  1. Set sick_mask_update to the default XFS_SICK flag for this scrub
>     type (call it A).  (We already do this)
> 
>  2. If the scrubber returns an error code, we exit making no changes to
>     the incore sick state.
> 
>  3. If the scrubber finds that A is clean, clear the incore sick flags
>     that are set in s_m_u and exit.
> 
>  4. If the scrubber finds that A is corrupt, set the incore sick flags
>     that are set in s_m_u.
> 
>     a. If the user doesn't want to repair, then we exit, having
>        previously set incore sick flags.
> 
>  5. Now we know that A is corrupt and the user wants to repair.
>     If repair returns an error code, we exit with that error code, having
>     made no further changes to the incore sick state.
> 
>  6. If repair rebuilds both A & B correctly and the re-scrub of A is
>     clean, we'll clear the incore sick flags using s_m_u.  This should
>     clear A.
> 
>  7. If repair rebuilds both A & B and screws up A, the re-scrub will find
>     it corrupt and leave the sick flags as they are, which is to say that
>     A is marked sick.
> 
>  8. If repair rebuilds A correctly but leaves B corrupt, the re-scrub of
>     A will be clean and we'll clear the incore sick flags using s_m_u.
>     This should clear A, even though B is corrupt.
> 
>  9. No matter whether we encountered scenarios 6, 7, or 8, if xfs_scrub
>     previously scrubbed B and found it corrupt, it will call again to
>     repair B, which will set the incore sick state appropriately.  If
>     xfs_scrub has not yet scrubbed B then it will call later to scrub B,
>     which will set the incore sick state appropriately.
> 
> I hope that's easier to understand...
> 

It sounds like the primary difference here is giving up the ability to
clear both A and B flags at the same time during a scrub+repair of A,
and instead relying on the separate scrub of B to detect that B is no
longer corrupt.

That sounds much more straightforward to me provided it works well
enough with the userspace tool (i.e., xfs_scrub will eventually mark B
healthy before it returns either way). It simplifies the tracking, and if
we consider that the normal sequence for a corrupted thing is scrub(A)
-> setcorrupt(A) -> repair(A) -> scrub(A) -> sethealthy(A), then
clearing in-core sick state of B at the end kind of violates the model
where we'd expect another scrub(B) to take place first.
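
As a rough illustration of that model (a minimal sketch, not code from
the series; the flag values, the scan_result enum, and
update_sick_state() are all hypothetical names), the incore update rule
from the numbered list reduces to something like:

```c
#include <assert.h>

/* Hypothetical flag values; the real XFS_SICK_* bits live in the series. */
#define SICK_A		(1u << 0)
#define SICK_B		(1u << 1)
#define SICK_ALL	(SICK_A | SICK_B)

enum scan_result {
	SCAN_ERROR,	/* scrub or repair hit a runtime error */
	SCAN_CLEAN,	/* scan (or post-repair re-scan) found nothing */
	SCAN_CORRUPT,	/* scan (or post-repair re-scan) found corruption */
};

/*
 * Apply one scan result to the incore sick state.  "update" plays the
 * role of sick_mask_update: only the bits for the structure that was
 * actually scanned.
 */
unsigned int
update_sick_state(
	unsigned int		incore,
	unsigned int		update,
	enum scan_result	res)
{
	assert((update & ~SICK_ALL) == 0);

	switch (res) {
	case SCAN_CLEAN:
		return incore & ~update;	/* steps 3, 6 and 8 */
	case SCAN_CORRUPT:
		return incore | update;		/* steps 4 and 7 */
	case SCAN_ERROR:
	default:
		return incore;			/* steps 2 and 5 */
	}
}
```

Under this rule a clean re-scrub of A after a combined A/B repair clears
only A, and B stays marked until its own scrub runs.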

Brian

> > This does still have me wondering if separate masks are necessary, if we
> > perhaps had more selective health update logic, for example. I think it
> > might be better to either bundle this patch with whatever other changes
> > actually make use of the separate masks, or alternatively to simplify
> > the current logic and just defer the separate mask thing until those
> > more complex repair algorithms come along..
> 
> --D
> 
> > Brian
> > 
> > > > ISTM that it might be possible to skip clearing one fail state bit so
> > > > long as the original thing remained corrupted, but I feel like I'm
> > > > still missing some context on the bigger picture scrub tracking...
> > > 
> > > Yeah, the state machine is pretty squirrely. :/
> > > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > > >  			goto retry_op;
> > > > >  		}
> > > > >  	}
> > > > > diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> > > > > index 22f754fba8e5..05f1ad242a35 100644
> > > > > --- a/fs/xfs/scrub/scrub.h
> > > > > +++ b/fs/xfs/scrub/scrub.h
> > > > > @@ -62,6 +62,17 @@ struct xfs_scrub {
> > > > >  	struct xfs_inode		*ip;
> > > > >  	void				*buf;
> > > > >  	uint				ilock_flags;
> > > > > +
> > > > > +	/* Metadata to be marked sick if scrub finds errors. */
> > > > > +	unsigned int			sick_mask;
> > > > > +
> > > > > +	/*
> > > > > +	 * Metadata to be marked healthy if repair fixes errors.  Some repair
> > > > > +	 * functions can fix multiple data structures at once, so we have to
> > > > > +	 * treat sick and heal masks separately.
> > > > > +	 */
> > > > > +	unsigned int			heal_mask;
> > > > > +
> > > > >  	bool				try_harder;
> > > > >  	bool				has_quotaofflock;
> > > > >  
> > > > > 

* Re: [PATCH 07/10] xfs: report AG health via AG geometry ioctl
  2019-04-08 11:34           ` Brian Foster
@ 2019-04-09  3:25             ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-09  3:25 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Apr 08, 2019 at 07:34:39AM -0400, Brian Foster wrote:
> On Fri, Apr 05, 2019 at 01:33:19PM -0700, Darrick J. Wong wrote:
> > On Thu, Apr 04, 2019 at 07:48:57AM -0400, Brian Foster wrote:
> > > On Wed, Apr 03, 2019 at 09:11:06AM -0700, Darrick J. Wong wrote:
> > > > On Wed, Apr 03, 2019 at 10:30:05AM -0400, Brian Foster wrote:
> > > > > On Mon, Apr 01, 2019 at 10:10:52AM -0700, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > 
> > > > > > Use the AG geometry info ioctl to report health status too.
> > > > > > 
> > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > ---
> > > > > >  fs/xfs/libxfs/xfs_fs.h     |   12 +++++++++++-
> > > > > >  fs/xfs/libxfs/xfs_health.h |    2 ++
> > > > > >  fs/xfs/xfs_health.c        |   40 ++++++++++++++++++++++++++++++++++++++++
> > > > > >  fs/xfs/xfs_ioctl.c         |    2 ++
> > > > > >  4 files changed, 55 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > 
> > > > > ...
> > > > > > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> > > > > > index 151c98693bef..5ca471bd41ad 100644
> > > > > > --- a/fs/xfs/xfs_health.c
> > > > > > +++ b/fs/xfs/xfs_health.c
> > > > > > @@ -276,3 +276,43 @@ xfs_fsop_geom_health(
> > > > > >  	if (sick & XFS_HEALTH_RT_SUMMARY)
> > > > > >  		geo->health |= XFS_FSOP_GEOM_HEALTH_RT_SUMMARY;
> > > > > >  }
> > > > > > +
> > > > > > +/* Fill out ag geometry health info. */
> > > > > > +void
> > > > > > +xfs_ag_geom_health(
> > > > > > +	struct xfs_mount	*mp,
> > > > > > +	xfs_agnumber_t		agno,
> > > > > > +	struct xfs_ag_geometry	*ageo)
> > > > > > +{
> > > > > > +	struct xfs_perag	*pag;
> > > > > > +	unsigned int		sick;
> > > > > > +
> > > > > > +	if (agno >= mp->m_sb.sb_agcount)
> > > > > > +		return;
> > > > > 
> > > > > The call to xfs_ag_get_geometry() would have already returned an error
> > > > > in the ioctl path for the above scenario. It might still make sense to
> > > > > check here, but perhaps we could let this function also return an int
> > > > > and return an error for consistency?
> > > > 
> > > > Or maybe just ASSERT on the agno and add a note that the caller is
> > > > required to pass in a valid ag number.
> > > > 
> > > > > > +
> > > > > > +	ageo->ag_health = 0;
> > > > > > +
> > > > > > +	pag = xfs_perag_get(mp, agno);
> > > > > > +	sick = xfs_ag_measure_sickness(pag);
> > > > > > +	if (sick & XFS_HEALTH_AG_SB)
> > > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_SB;
> > > > > 
> > > > > I'm starting to wonder whether "health" is the best term to use for the
> > > > > interface bits just because it reads a little weird to measure
> > > > > "sickness" and then apply all the sick state to something called
> > > > > "health." I don't have a better suggestion off the top of my head,
> > > > > though. Just something to think about a bit more from an API
> > > > > standpoint..
> > > > 
> > > > I had the same conundrum.  I guess we could start the bitset with -1 and
> > > > clear bits when scrub says they've gone bad?  That would be much clearer
> > > > with regards to the names, but technically we don't know the health of a
> > > > structure until we scan it, so I wouldn't want to represent the fs as
> > > > being "healthy" having not actually looked for problems.
> > > > 
> > > > What we /really/ need is a tri-state bitset[1]:
> > > > 
> > > > enum Bool
> > > > {
> > > >     True,
> > > >     False,
> > > >     FileNotFound
> > > > };
> > > > 
> > > > But maybe I will try renaming all this to "sick" again.
> > > > 
> > > > if (sick & XFS_SICK_AG_AGF)
> > > > 	ageo->ag_sick |= XFS_AG_GEOM_SICK_AG_AGF;
> > > > 
> > > > Gosh.  That second name is gross.  XFS_AG_GEOM_SICK_AGF.
> > > > 
> > > > Sick sick sick sick sick.  Ok, I've convinced myself of the name change. :P
> > > > 
> > > 
> > > Heh. I suppose we could either invert the logic or perhaps try to come
> > > up with a better keyword than "health" for the exported bits (at least).
> > > If I see ag_health in a data structure, for example, I'm assuming it's
> > > telling me what is healthy. Of course we'll have documentation and
> > > whatnot to clear that up..
> > > 
> > > Another term that came to mind is "fault" or "faulted" as it has
> > > precedent in storage contexts wrt RAID. I.e., ag_faults and
> > > XFS_AG_GEOM_FAULT_AGF, etc. etc. To me it also kind of covers the angle
> > > that we aren't necessarily stating a subset of the filesystem is healthy
> > > due to lack of faults if we just haven't scrubbed/found anything. Hm? I
> > > guess it could be confused with reporting underlying storage problems. I
> > > dunno... it's more clear to me, but maybe others have ideas..
> > 
> > I have a (not very strong) preference for 'sick' over 'fault' because
> > there are other parts of xfs where we deal with (page) faults and I
> > don't really want to get "file metadata faults" and "file page faults"
> > confused.
> > 
> > (I'm not sure anyone is really going to confuse them, though...)
> > 
> 
> Ok. Either way, I think a field/bit prefix name that reflects borkedness
> over health is a bit more intuitive with the current semantics (i.e.,
> bit set means something is borked).

<nod> I'm nearly ready to send v2, which will have all the fields
renamed to "sick" and the bit flags named "SICK" so it'll be consistent
and (hopefully) obvious to all.
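
For illustration only, here is a minimal sketch of how the tri-state
idea from earlier in this thread could be expressed with two plain
bitmasks, one for "has been scanned" and one for "was found sick".
This is an assumption made for the example, not what the posted patches
do, and struct health_masks, record_scan() and query_state() are
hypothetical names:

```c
#include <assert.h>

enum meta_state {
	META_UNKNOWN,	/* never scanned, so no claim either way */
	META_HEALTHY,	/* scanned and found clean */
	META_SICK,	/* scanned and found corrupt */
};

struct health_masks {
	unsigned int	checked;	/* structures that have been scanned */
	unsigned int	sick;		/* scanned structures found corrupt */
};

/* Record the outcome of scanning the structure identified by "bit". */
void
record_scan(
	struct health_masks	*hm,
	unsigned int		bit,
	int			corrupt)
{
	hm->checked |= bit;
	if (corrupt)
		hm->sick |= bit;
	else
		hm->sick &= ~bit;
}

/* Report what is known about the structure identified by "bit". */
enum meta_state
query_state(
	const struct health_masks	*hm,
	unsigned int			bit)
{
	assert(bit != 0);

	if (!(hm->checked & bit))
		return META_UNKNOWN;
	return (hm->sick & bit) ? META_SICK : META_HEALTHY;
}
```

The point of the second mask is that a clear sick bit by itself never
has to be read as "healthy" for something nobody has looked at yet.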

--D

> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > > > --D
> > > > 
> > > > [1] https://thedailywtf.com/articles/What_Is_Truth_0x3f_
> > > > 
> > > > > Brian
> > > > > 
> > > > > > +	if (sick & XFS_HEALTH_AG_AGF)
> > > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGF;
> > > > > > +	if (sick & XFS_HEALTH_AG_AGFL)
> > > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGFL;
> > > > > > +	if (sick & XFS_HEALTH_AG_AGI)
> > > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGI;
> > > > > > +	if (sick & XFS_HEALTH_AG_BNOBT)
> > > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_BNOBT;
> > > > > > +	if (sick & XFS_HEALTH_AG_CNTBT)
> > > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_CNTBT;
> > > > > > +	if (sick & XFS_HEALTH_AG_INOBT)
> > > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_INOBT;
> > > > > > +	if (sick & XFS_HEALTH_AG_FINOBT)
> > > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_FINOBT;
> > > > > > +	if (sick & XFS_HEALTH_AG_RMAPBT)
> > > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_RMAPBT;
> > > > > > +	if (sick & XFS_HEALTH_AG_REFCNTBT)
> > > > > > +		ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_REFCNTBT;
> > > > > > +	xfs_perag_put(pag);
> > > > > > +}
> > > > > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > > > > index f9bf11b6a055..f1fc5e53cfc1 100644
> > > > > > --- a/fs/xfs/xfs_ioctl.c
> > > > > > +++ b/fs/xfs/xfs_ioctl.c
> > > > > > @@ -853,6 +853,8 @@ xfs_ioc_ag_geometry(
> > > > > >  	if (error)
> > > > > >  		return error;
> > > > > >  
> > > > > > +	xfs_ag_geom_health(mp, ageo.ag_number, &ageo);
> > > > > > +
> > > > > >  	if (copy_to_user(arg, &ageo, sizeof(ageo)))
> > > > > >  		return -EFAULT;
> > > > > >  	return 0;
> > > > > > 

* Re: [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health
  2019-04-08 11:35           ` Brian Foster
@ 2019-04-09  3:30             ` Darrick J. Wong
  0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2019-04-09  3:30 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Apr 08, 2019 at 07:35:41AM -0400, Brian Foster wrote:
> On Fri, Apr 05, 2019 at 01:54:47PM -0700, Darrick J. Wong wrote:
> > On Fri, Apr 05, 2019 at 09:07:39AM -0400, Brian Foster wrote:
> > > On Thu, Apr 04, 2019 at 11:01:33AM -0700, Darrick J. Wong wrote:
> > > > On Thu, Apr 04, 2019 at 07:50:11AM -0400, Brian Foster wrote:
> > > > > On Mon, Apr 01, 2019 at 10:11:12AM -0700, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > 
> > > > > > Now that we have the ability to track sick metadata in-core, make scrub
> > > > > > and repair update those health assessments after doing work.
> > > > > > 
> > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > ---
> > > > > >  fs/xfs/Makefile       |    1 
> > > > > >  fs/xfs/scrub/health.c |  180 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > >  fs/xfs/scrub/health.h |   12 +++
> > > > > >  fs/xfs/scrub/scrub.c  |    8 ++
> > > > > >  fs/xfs/scrub/scrub.h  |   11 +++
> > > > > >  5 files changed, 212 insertions(+)
> > > > > >  create mode 100644 fs/xfs/scrub/health.c
> > > > > >  create mode 100644 fs/xfs/scrub/health.h
> > > > > > 
> > > > > > 
> > > > > ...
> > > > > > diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
> > > > > > new file mode 100644
> > > > > > index 000000000000..dd9986500801
> > > > > > --- /dev/null
> > > > > > +++ b/fs/xfs/scrub/health.c
> > > > > > @@ -0,0 +1,180 @@
> > > > > ...
> > > > > > +/* Update filesystem health assessments based on what we found and did. */
> > > > > > +void
> > > > > > +xchk_update_health(
> > > > > > +	struct xfs_scrub	*sc,
> > > > > > +	bool			already_fixed)
> > > > > > +{
> > > > > > +	/*
> > > > > > +	 * If the scrubber finds errors, we mark sick whatever's mentioned in
> > > > > > +	 * sick_mask, no matter whether this is a first scan or an evaluation
> > > > > > +	 * of repair effectiveness.
> > > > > > +	 *
> > > > > > +	 * If there is no direct corruption and we're called after a repair,
> > > > > > +	 * clear whatever's in heal_mask because that's what we fixed.
> > > > > > +	 *
> > > > > > +	 * Otherwise, there's no direct corruption and we didn't repair
> > > > > > +	 * anything, so mark whatever's in sick_mask as healthy.
> > > > > > +	 */
> > > > > > +	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > > > > > +		xchk_mark_sick(sc, sc->sick_mask);
> > > > > > +	else if (already_fixed)
> > > > > > +		xchk_mark_healthy(sc, sc->heal_mask);
> > > > > > +	else
> > > > > > +		xchk_mark_healthy(sc, sc->sick_mask);
> > > > > > +}
> > > > > 
> > > > > Hmm, I think I follow what we're doing here but it's a bit confusing
> > > > > without the additional context of where these bits will be set/cleared
> > > > > at the lower scrub layers (or at least without an example). Some
> > > > > questions on that below...
> > > > > 
> > > > > ...
> > > > > > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > > > > > index 1b2344d00525..b1519dfc5811 100644
> > > > > > --- a/fs/xfs/scrub/scrub.c
> > > > > > +++ b/fs/xfs/scrub/scrub.c
> > > > > > @@ -40,6 +40,7 @@
> > > > > >  #include "scrub/trace.h"
> > > > > >  #include "scrub/btree.h"
> > > > > >  #include "scrub/repair.h"
> > > > > > +#include "scrub/health.h"
> > > > > >  
> > > > > >  /*
> > > > > >   * Online Scrub and Repair
> > > > > > @@ -468,6 +469,7 @@ xfs_scrub_metadata(
> > > > > >  {
> > > > > >  	struct xfs_scrub		sc;
> > > > > >  	struct xfs_mount		*mp = ip->i_mount;
> > > > > > +	unsigned int			heal_mask;
> > > > > >  	bool				try_harder = false;
> > > > > >  	bool				already_fixed = false;
> > > > > >  	int				error = 0;
> > > > > > @@ -488,6 +490,7 @@ xfs_scrub_metadata(
> > > > > >  	error = xchk_validate_inputs(mp, sm);
> > > > > >  	if (error)
> > > > > >  		goto out;
> > > > > > +	heal_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
> > > > > >  
> > > > > >  	xchk_experimental_warning(mp);
> > > > > >  
> > > > > > @@ -499,6 +502,8 @@ xfs_scrub_metadata(
> > > > > >  	sc.ops = &meta_scrub_ops[sm->sm_type];
> > > > > >  	sc.try_harder = try_harder;
> > > > > >  	sc.sa.agno = NULLAGNUMBER;
> > > > > > +	sc.heal_mask = heal_mask;
> > > > > > +	sc.sick_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
> > > > > 
> > > > > Ok, so we initialize the heal/sick masks based on the scrub type that
> > > > > was requested on the first pass through...
> > > > > 
> > > > > >  	error = sc.ops->setup(&sc, ip);
> > > > > >  	if (error)
> > > > > >  		goto out_teardown;
> > > > > > @@ -519,6 +524,8 @@ xfs_scrub_metadata(
> > > > > >  	} else if (error)
> > > > > >  		goto out_teardown;
> > > > > >  
> > > > > > +	xchk_update_health(&sc, already_fixed);
> > > > > > +
> > > > > 
> > > > > ... then update the in-core fs health state based on the sick mask. Is
> > > > > it possible for the scrub operation to set more sick mask bits based on
> > > > > what it finds?
> > > > 
> > > > Theoretically, yes, but in practice none of the current scrubbers need
> > > > to touch sick_mask.
> > > > 
> > > > heal_mask, OTOH, will be adjusted by the free space / inode repair
> > > > functions since they rebuild multiple structures.
> > > > 
> > > 
> > > Ok..
> > > 
> > > > > More specifically, I'm wondering why the masks wouldn't start as zero
> > > > > and toggle based on finding/fixing corruption(s).
> > > > 
> > > > sick_mask is also the mask we feed to xfs_*_mark_healthy if the scan
> > > > returns clean, which is why we set the default value before dispatching
> > > > the scrub.
> > > > 
> > > > > Or if the sick mask value is essentially fixed, whether we need to
> > > > > store it in the xfs_scrub context...
> > > > 
> > > > We could probably get away with generating it in xchk_update_health at
> > > > the end, but it feels weird to have heal_mask in the scrub context but
> > > > sick_mask gets auto-generated.
> > > > 
> > > 
> > > Ok.. hmm. Both feel a little weird to me, but this is really just an
> > > aesthetic/factoring thing so I'll think about it a bit more and come
> > > back to it.
> > > 
> > > > > 
> > > > > >  	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed) {
> > > > > >  		bool needs_fix;
> > > > > >  
> > > > > > @@ -551,6 +558,7 @@ xfs_scrub_metadata(
> > > > > >  				xrep_failure(mp);
> > > > > >  				goto out;
> > > > > >  			}
> > > > > > +			heal_mask = sc.heal_mask;
> > > > > 
> > > > > And if we end up doing a repair, we presumably can repair multiple
> > > > > things and so we track that separately and persist the heal mask across
> > > > > a potential retry.
> > > > 
> > > > Right.
> > > > 
> > > > > What about the case where we don't retry, but scrub finds something
> > > > > and then immediately repairs it?
> > > > 
> > > > The repair jumps back to retry_op if either (a) we couldn't get all the
> > > > resources we needed and therefore sc.try_harder = true and we need to
> > > > start over; or (b) repair thinks it fixed a thing, so we need to scrub
> > > > the thing again to see if it's really fixed...
> > > > 
> > > > > Should we update the fs state after both detecting and clearing the
> > > > > problem, or does that happen elsewhere?
> > > > 
> > > > ...so if scrub immediately repairs a thing, we preserve heal_mask, jump
> > > > back to the scrub, and if the scrub says clean we'll mark heal mask
> > > > healthy.
> > > > 
> > > > If the repair has to retry then we'll call the repair function
> > > > again, which (presumably) will set (again) the heal_mask appropriately,
> > > > and then we have the same post-repair state updating as above.
> > > > 
> > > > Does that make sense? :)
> > > > 
> > > 
> > > Ah, Ok. I didn't realize that a successful repair looped back to the
> > > scrub code (and thus the health update). Yes, that makes more sense.
> > > 
> > > > > Also, if repair can potentially clear multiple bits, what's the
> > > > > possibility of a repair clearing one failure and then failing on
> > > > > another, causing the broader repair op to return an error or jump into
> > > > > this retry?
> > > > 
> > > > Scrub doesn't touch the fs health state at all until after the ->scrub
> > > > or ->repair function succeeds.  If the scrub or the repair functions
> > > > fail for any non-retry reason, we back out to userspace without updating
> > > > anything.  It's as if we'd never called the failed function.
> > > > 
> > > 
> > > Right.. what I was getting at above is seeing whether we'd actually
> > > update partial repair state in-core. E.g., suppose things A and B are
> > > faulted in-core and it's one of these cases where repair can fix A and B
> > > at the same time. If it fixes thing A and fails on thing B, it sounds
> > > like we'd not clear the in-core fault state on A even though it's
> > > technically repaired.
> > 
> > Hmm.  If the repair function returns a runtime error (having fixed A but
> > not B) then yes, we won't clear the incore fault state on A (or B) even
> > though we fixed A.  Something weird happened, so we shouldn't be too
> > hasty to clear things.  A subsequent re-scrub of A will clear the fault
> > on A, though.
> > 
> 
> Ok. Indeed, it doesn't seem that unreasonable to me for an operational
> error to fail to clear health state for something that was repaired.
> 
> > OTOH... if the A/B repair function returns 0 having fixed A but left B
> > corrupt, the rescan will see that A is fine and (incorrectly) clear both
> > A and B.  I would say that's a bug, so maybe I should rethink the need
> > for sick_mask and heal_mask.
> > 
> 
> That one sounds more dodgy. ;P
> 
> > That said, a normal xfs_scrub run will check (or have already checked) B
> > and noticed that it was corrupt, so it will circle back and try to fix B
> > separately, so in a sense we don't really need heal_mask at all.
> > 
> 
> Ok..
> 
> > > > Maybe some worked examples will help?
> > > > 
> > > > Let's say both inode btrees are corrupt.  We run xfs_scrub -n,
> > > > xchk_inobt will record the corruption, and (assuming it hits no runtime
> > > > errors) once we return to xfs_scrub_metadata, it'll set
> > > > XFS_SICK_AG_INOBT.  Presumably xfs_scrub will also call the finobt scrub
> > > > and SICK_AG_FINOBT will also get set.
> > > > 
> > > > If we run xfs_scrub without the -n, xchk_inobt will record the
> > > > corruption and set SICK_AG_INOBT per above.  Then it'll run xrep_inobt,
> > > > which will set heal_mask to SICK_AG_INOBT | SICK_AG_FINOBT.  If the
> > > > repair fails with a non-retry runtime error, we exit to userspace and
> > > > ignore heal_mask.
> > > > 
> > > 
> > > Ok, this sounds like the case I'm theorizing about above (where suppose
> > > repair fixed the inobt and then failed on the finobt, but hasn't cleared
> > > faults for either..).
> > > 
> > > > If instead the repair succeeds, we scan the inobt again.  If that comes
> > > > up clear then we use heal_mask to clear SICK_AG_INOBT | SICK_AG_FINOBT.
> > > > xfs_scrub will call again later to repair the finobt, but the initial
> > > > finobt scan will see no errors in the finobt, clear SICK_AG_FINOBT
> > > > (which isn't set) and exit.
> > > > 
> > > 
> > > So it sounds like the state would have to be cleared by a subsequent
> > > scrub request. The scan would find thing A healthy and mark it so
> > > regardless, to clear any potential previous faults that might have
> > > already been repaired. Right?
> > 
> > Right.
> > 
> > > > If the inobt repair function is buggy and says it repaired the inode
> > > > btrees but leaves corruptions, then the rescan of the inobt will notice
> > > > and set SICK_AG_INOBT (which is already set) and exit.  Similarly, when
> > > > xfs_scrub calls back about the finobt, it will notice the corrupt
> > > > finobt, try to set SICK_AG_FINOBT (also already set), try to fix it, and
> > > > the rescan of the finobt will notice that the finobt is still corrupt
> > > > and try to set SICK_AG_FINOBT (which is still set).
> > > > 
> > > > The end result (I think) is that we always set the sick bits if a scan
> > > > shows problems, and we only clear the sick bits for things if we can
> > > > prove that the things are no longer sick.  Does that help?
> > > > 
> > > 
> > > Yes, thanks for the explanation. I think the confusion is mostly due to
> > > not being able to fully see how these scrub states are managed,
> > > particularly the bits that warranted the creation of separate masks in
> > > the first place.
> > 
> > You've convinced me that this patch is too convoluted to understand, so
> > I think I want to simplify it some more.  First, I'd rename the field
> > to "sick_mask_update" and change the behavior so that we:
> > 
> >  1. Set sick_mask_update to the default XFS_SICK flag for this scrub
> >     type (call it A).  (We already do this)
> > 
> >  2. If the scrubber returns an error code, we exit making no changes to
> >     the incore sick state.
> > 
> >  3. If the scrubber finds that A is clean, clear the incore sick flags
> >     that are set in s_m_u and exit.
> > 
> >  4. If the scrubber finds that A is corrupt, set the incore sick flags
> >     that are set in s_m_u.
> > 
> >     a. If the user doesn't want to repair, then we exit, having
> >        previously set incore sick flags.
> > 
> >  5. Now we know that A is corrupt and the user wants to repair.
> >     If repair returns an error code, we exit with that error code, having
> >     made no further changes to the incore sick state.
> > 
> >  6. If repair rebuilds both A & B correctly and the re-scrub of A is
> >     clean, we'll clear the incore sick flags using s_m_u.  This should
> >     clear A.
> > 
> >  7. If repair rebuilds both A & B and screws up A, the re-scrub will find
> >     it corrupt and leave the sick flags as they are, which is to say that
> >     A is marked sick.
> > 
> >  8. If repair rebuilds A correctly but leaves B corrupt, the re-scrub of
> >     A will be clean and we'll clear the incore sick flags using s_m_u.
> >     This should clear A, even though B is corrupt.
> > 
> >  9. No matter whether we encountered scenarios 6, 7, or 8, if xfs_scrub
> >     previously scrubbed B and found it corrupt, it will call again to
> >     repair B, which will set the incore sick state appropriately.  If
> >     xfs_scrub has not yet scrubbed B then it will call later to scrub B,
> >     which will set the incore sick state appropriately.
> > 
> > I hope that's easier to understand...
> > 
> 
> It sounds like the primary difference here is giving up the ability to
> clear both A and B flags at the same time during a scrub+repair of A,
> and instead relying on the separate scrub of B to detect that B is no
> longer corrupt.
> 
> That sounds much more straightforward to me provided it works well
> enough with the userspace tool (i.e., xfs_scrub will eventually mark B
> healthy before it returns either way). It simplifies the tracking and if
> we consider the normal sequence for a corrupted thing should be scrub(A)
> -> setcorrupt(A) -> repair(A) -> scrub(A) -> sethealthy(A), then
> clearing in-core sick state of B at the end kind of violates the model
> where we'd expect another scrub(B) to take place first.

<nod> While I was busy revising patches today I realized that I could
change meta_scrub_ops to supply a "revalidate" function to check
repair's work, and then the revalidation function would know to check
both of the rebuilt btrees.

This should simplify it further -- if we make a mistake anywhere then
the repair operation fails and both structures are marked sick.  At that
point we know we have to try again, though probably that means umount +
xfs_repair.

(I also pasted that numbered list of goals into the code comments,
though massaged a bit to reflect the revalidation functions.)
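
For anyone following along, here is a rough userspace sketch of how the
flow could look once a revalidate hook exists.  It is not the kernel
code: struct health, struct scrub_ops, repair_mask, scrub_and_repair()
and the toy_* helpers are all made-up names for illustration.  Clearing
both bits after a clean revalidation is my assumption about what
"check both of the rebuilt btrees" would imply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical health bits for two structures rebuilt by one repair. */
#define SICK_A	(1U << 0)
#define SICK_B	(1U << 1)

/* Hypothetical stand-in for the incore sick state. */
struct health {
	unsigned int	sick;
};

/* Hypothetical per-type hooks; each returns 0 or a negative errno. */
struct scrub_ops {
	int		(*scrub)(void *ctx, bool *corrupt);
	int		(*repair)(void *ctx);
	int		(*revalidate)(void *ctx, bool *corrupt);
	unsigned int	sick_mask;	/* what scrub alone checks */
	unsigned int	repair_mask;	/* everything repair rebuilds */
};

int
scrub_and_repair(
	struct health		*h,
	const struct scrub_ops	*ops,
	void			*ctx,
	bool			want_repair)
{
	bool			corrupt = false;
	int			error;

	assert(h != NULL && ops != NULL);

	error = ops->scrub(ctx, &corrupt);
	if (error)
		return error;		/* scrub failed: state untouched */
	if (!corrupt) {
		h->sick &= ~ops->sick_mask;
		return 0;
	}
	h->sick |= ops->sick_mask;
	if (!want_repair)
		return 0;

	error = ops->repair(ctx);
	if (error)
		return error;		/* repair failed: no further changes */

	/* Revalidate everything that repair rebuilt, not just A. */
	error = ops->revalidate(ctx, &corrupt);
	if (error)
		return error;
	if (corrupt)
		h->sick |= ops->repair_mask;	/* any mistake: both sick */
	else
		h->sick &= ~ops->repair_mask;	/* assumption: both healthy */
	return 0;
}

/* Toy fixture with fixed outcomes, only for trying the flow out. */
struct toy_ctx {
	bool	scrub_corrupt;
	int	repair_error;
	bool	reval_corrupt;
};

static int toy_scrub(void *ctx, bool *corrupt)
{
	*corrupt = ((struct toy_ctx *)ctx)->scrub_corrupt;
	return 0;
}

static int toy_repair(void *ctx)
{
	return ((struct toy_ctx *)ctx)->repair_error;
}

static int toy_revalidate(void *ctx, bool *corrupt)
{
	*corrupt = ((struct toy_ctx *)ctx)->reval_corrupt;
	return 0;
}

static const struct scrub_ops toy_ops = {
	.scrub		= toy_scrub,
	.repair		= toy_repair,
	.revalidate	= toy_revalidate,
	.sick_mask	= SICK_A,
	.repair_mask	= SICK_A | SICK_B,
};
```

In this sketch a scrub-only pass touches only A's bit, while a repair
pass that reaches revalidation sets or clears A and B together.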

--D

> Brian
> 
> > > This does still have me wondering if separate masks are necessary, if we
> > > perhaps had more selective health update logic, for example. I think it
> > > might be better to either bundle this patch with whatever other changes
> > > actually make use of the separate masks, or alternatively to simplify
> > > the current logic and just defer the separate mask thing until those
> > > > more complex repair algorithms come along.
> > 
> > --D
> > 
> > > Brian
> > > 
> > > > > ISTM that it might be possible to skip clearing one fail state bit so
> > > > > long as the original thing remained corrupted, but I feel like I'm
> > > > > still missing some context on the bigger picture scrub tracking...
> > > > 
> > > > Yeah, the state machine is pretty squirrely. :/
> > > > 
> > > > --D
> > > > 
> > > > > Brian
> > > > > 
> > > > > >  			goto retry_op;
> > > > > >  		}
> > > > > >  	}
> > > > > > diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> > > > > > index 22f754fba8e5..05f1ad242a35 100644
> > > > > > --- a/fs/xfs/scrub/scrub.h
> > > > > > +++ b/fs/xfs/scrub/scrub.h
> > > > > > @@ -62,6 +62,17 @@ struct xfs_scrub {
> > > > > >  	struct xfs_inode		*ip;
> > > > > >  	void				*buf;
> > > > > >  	uint				ilock_flags;
> > > > > > +
> > > > > > +	/* Metadata to be marked sick if scrub finds errors. */
> > > > > > +	unsigned int			sick_mask;
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * Metadata to be marked healthy if repair fixes errors.  Some repair
> > > > > > +	 * functions can fix multiple data structures at once, so we have to
> > > > > > +	 * treat sick and heal masks separately.
> > > > > > +	 */
> > > > > > +	unsigned int			heal_mask;
> > > > > > +
> > > > > >  	bool				try_harder;
> > > > > >  	bool				has_quotaofflock;
> > > > > >  
> > > > > > 


end of thread, other threads:[~2019-04-09  3:30 UTC | newest]

Thread overview: 41+ messages
2019-04-01 17:10 [PATCH 00/10] xfs: online health tracking support Darrick J. Wong
2019-04-01 17:10 ` [PATCH 01/10] xfs: track metadata health levels Darrick J. Wong
2019-04-02 13:22   ` Brian Foster
2019-04-02 13:30     ` Darrick J. Wong
2019-04-01 17:10 ` [PATCH 02/10] xfs: replace the BAD_SUMMARY mount flag with the equivalent health code Darrick J. Wong
2019-04-02 13:22   ` Brian Foster
2019-04-01 17:10 ` [PATCH 03/10] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem Darrick J. Wong
2019-04-02 13:24   ` Brian Foster
2019-04-02 13:40     ` Darrick J. Wong
2019-04-02 13:53       ` Brian Foster
2019-04-02 18:16         ` Darrick J. Wong
2019-04-02 18:32           ` Brian Foster
2019-04-01 17:10 ` [PATCH 04/10] xfs: expand xfs_fsop_geom Darrick J. Wong
2019-04-02 17:34   ` Brian Foster
2019-04-02 21:53   ` Dave Chinner
2019-04-02 22:31     ` Darrick J. Wong
2019-04-01 17:10 ` [PATCH 05/10] xfs: add a new ioctl to describe allocation group geometry Darrick J. Wong
2019-04-02 17:34   ` Brian Foster
2019-04-02 21:35     ` Darrick J. Wong
2019-04-01 17:10 ` [PATCH 06/10] xfs: report fs and rt health via geometry structure Darrick J. Wong
2019-04-02 17:35   ` Brian Foster
2019-04-02 18:23     ` Darrick J. Wong
2019-04-02 23:34       ` Darrick J. Wong
2019-04-01 17:10 ` [PATCH 07/10] xfs: report AG health via AG geometry ioctl Darrick J. Wong
2019-04-03 14:30   ` Brian Foster
2019-04-03 16:11     ` Darrick J. Wong
2019-04-04 11:48       ` Brian Foster
2019-04-05 20:33         ` Darrick J. Wong
2019-04-08 11:34           ` Brian Foster
2019-04-09  3:25             ` Darrick J. Wong
2019-04-01 17:11 ` [PATCH 08/10] xfs: report inode health via bulkstat Darrick J. Wong
2019-04-01 17:11 ` [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health Darrick J. Wong
2019-04-04 11:50   ` Brian Foster
2019-04-04 18:01     ` Darrick J. Wong
2019-04-05 13:07       ` Brian Foster
2019-04-05 20:54         ` Darrick J. Wong
2019-04-08 11:35           ` Brian Foster
2019-04-09  3:30             ` Darrick J. Wong
2019-04-01 17:11 ` [PATCH 10/10] xfs: update health status if we get a clean bill of health Darrick J. Wong
2019-04-04 11:51   ` Brian Foster
2019-04-04 15:48     ` Darrick J. Wong
