* [PATCH v9 00/19] xfs: online fs repair support
From: Darrick J. Wong @ 2017-08-25 22:16 UTC
To: darrick.wong; +Cc: linux-xfs
Hi all,
This is the ninth revision of a patchset that adds kernel support to XFS
for online metadata scrubbing and repair. There aren't any on-disk
format changes. The overview of the online scrub functionality is the
same as it was in the first kernel series, so I'll dive straight into
what's in this set.
Today's submission is the fourth of four parts; here we add the ability
to regenerate damaged per-AG metadata and inode fork maps from the
reverse mapping btree. We also implement a limited ability to correct
minor problems in AG headers and inode records.
If you're going to start using this mess, you probably ought to just
pull from my git trees. The kernel patches[1] should apply against
4.13-rc6. xfsprogs[2] and xfstests[3] can be found in their usual
places. The git trees contain all four series' worth of changes.
This is an extraordinary way to eat your data. Enjoy!
Comments and questions are, as always, welcome.
--D
[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
[2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
[3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel
* [PATCH 01/19] xfs: add helpers to calculate btree size
From: Darrick J. Wong @ 2017-08-25 22:16 UTC
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Add a bunch of helper functions that calculate the sizes of various
btrees. These will be used to repair btrees and btree headers.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
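A rough userspace sketch of the kind of calculation these helpers wrap. The hunk below only shows the top of the loop in xfs_btree_calc_size(), so the loop body here (a ceiling division per level) is an assumption, as are the name btree_calc_size_sketch and the sample limits. limits[0] stands for the records per leaf block and limits[1] for the keys/pointers per node block; both are assumed to be at least 2.

```c
/*
 * Sketch only: estimate how many blocks a btree needs to hold `len`
 * records.  limits[0] is the assumed number of records per leaf block,
 * limits[1] the assumed number of keys/pointers per node block.
 * Both must be >= 2 or the loop would never terminate.
 */
unsigned long long
btree_calc_size_sketch(
	const unsigned int	*limits,
	unsigned long long	len)
{
	unsigned long long	rval = 0;
	unsigned int		maxrecs = limits[0];

	/* Each pass counts the blocks needed for one level of the tree. */
	while (len > 1) {
		len = (len + maxrecs - 1) / maxrecs;	/* round up */
		rval += len;
		maxrecs = limits[1];	/* upper levels use the node limit */
	}
	return rval;
}
```

With 4 records per leaf and 2 pointers per node, 10 records would need 3 leaf blocks, then 2 node blocks, then 1 root block, for 6 blocks in total. Because the loop only runs while len > 1, a single record yields 0 in this sketch.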
fs/xfs/libxfs/xfs_alloc_btree.c | 9 +++++++++
fs/xfs/libxfs/xfs_alloc_btree.h | 2 ++
fs/xfs/libxfs/xfs_bmap_btree.c | 9 +++++++++
fs/xfs/libxfs/xfs_bmap_btree.h | 3 +++
fs/xfs/libxfs/xfs_btree.c | 4 ++--
fs/xfs/libxfs/xfs_btree.h | 2 +-
fs/xfs/libxfs/xfs_ialloc_btree.c | 9 +++++++++
fs/xfs/libxfs/xfs_ialloc_btree.h | 2 ++
8 files changed, 37 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index cfde0a0..89346e6 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -544,3 +544,12 @@ xfs_allocbt_maxrecs(
return blocklen / sizeof(xfs_alloc_rec_t);
return blocklen / (sizeof(xfs_alloc_key_t) + sizeof(xfs_alloc_ptr_t));
}
+
+/* Calculate the freespace btree size for some records. */
+xfs_extlen_t
+xfs_allocbt_calc_size(
+ struct xfs_mount *mp,
+ unsigned long long len)
+{
+ return xfs_btree_calc_size(mp, mp->m_alloc_mnr, len);
+}
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.h b/fs/xfs/libxfs/xfs_alloc_btree.h
index 45e189e..2fd5472 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.h
+++ b/fs/xfs/libxfs/xfs_alloc_btree.h
@@ -61,5 +61,7 @@ extern struct xfs_btree_cur *xfs_allocbt_init_cursor(struct xfs_mount *,
struct xfs_trans *, struct xfs_buf *,
xfs_agnumber_t, xfs_btnum_t);
extern int xfs_allocbt_maxrecs(struct xfs_mount *, int, int);
+extern xfs_extlen_t xfs_allocbt_calc_size(struct xfs_mount *mp,
+ unsigned long long len);
#endif /* __XFS_ALLOC_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 85de225..eddc1df 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -863,3 +863,12 @@ xfs_bmbt_change_owner(
xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
return error;
}
+
+/* Calculate the bmap btree size for some records. */
+unsigned long long
+xfs_bmbt_calc_size(
+ struct xfs_mount *mp,
+ unsigned long long len)
+{
+ return xfs_btree_calc_size(mp, mp->m_bmap_dmnr, len);
+}
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.h b/fs/xfs/libxfs/xfs_bmap_btree.h
index 9da5a8d..d6dae86 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.h
+++ b/fs/xfs/libxfs/xfs_bmap_btree.h
@@ -146,4 +146,7 @@ static inline bool xfs_bmbt_validate_extent(struct xfs_mount *mp, int whichfork,
return false;
}
+extern unsigned long long xfs_bmbt_calc_size(struct xfs_mount *mp,
+ unsigned long long len);
+
#endif /* __XFS_BMAP_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 42af403..11cf13e 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4863,7 +4863,7 @@ xfs_btree_query_all(
* Calculate the number of blocks needed to store a given number of records
* in a short-format (per-AG metadata) btree.
*/
-xfs_extlen_t
+unsigned long long
xfs_btree_calc_size(
struct xfs_mount *mp,
uint *limits,
@@ -4871,7 +4871,7 @@ xfs_btree_calc_size(
{
int level;
int maxrecs;
- xfs_extlen_t rval;
+ unsigned long long rval;
maxrecs = limits[0];
for (level = 0, rval = 0; len > 1; level++) {
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 3c6b966..89da887 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -482,7 +482,7 @@ bool xfs_btree_sblock_v5hdr_verify(struct xfs_buf *bp);
bool xfs_btree_sblock_verify(struct xfs_buf *bp, unsigned int max_recs);
uint xfs_btree_compute_maxlevels(struct xfs_mount *mp, uint *limits,
unsigned long len);
-xfs_extlen_t xfs_btree_calc_size(struct xfs_mount *mp, uint *limits,
+unsigned long long xfs_btree_calc_size(struct xfs_mount *mp, uint *limits,
unsigned long long len);
/* return codes */
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 317caba..2d2c3ea 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -584,3 +584,12 @@ xfs_finobt_calc_reserves(
*used += tree_len;
return 0;
}
+
+/* Calculate the inode btree size for some records. */
+xfs_extlen_t
+xfs_iallocbt_calc_size(
+ struct xfs_mount *mp,
+ unsigned long long len)
+{
+ return xfs_btree_calc_size(mp, mp->m_inobt_mnr, len);
+}
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h
index aa81e2e..4acdd54 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.h
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
@@ -74,5 +74,7 @@ int xfs_inobt_rec_check_count(struct xfs_mount *,
int xfs_finobt_calc_reserves(struct xfs_mount *mp, xfs_agnumber_t agno,
xfs_extlen_t *ask, xfs_extlen_t *used);
+extern xfs_extlen_t xfs_iallocbt_calc_size(struct xfs_mount *mp,
+ unsigned long long len);
#endif /* __XFS_IALLOC_BTREE_H__ */
* [PATCH 02/19] xfs: expose various functions to repair code
From: Darrick J. Wong @ 2017-08-25 22:17 UTC
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Expose various helpers that the repair code will want to use.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_ialloc.c | 2 +-
fs/xfs/libxfs/xfs_ialloc.h | 3 +++
fs/xfs/libxfs/xfs_refcount.c | 4 ++--
fs/xfs/libxfs/xfs_refcount.h | 5 +++++
4 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 6ec2655..b7d4a1b 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -147,7 +147,7 @@ xfs_inobt_get_rec(
/*
* Insert a single inobt record. Cursor must already point to desired location.
*/
-STATIC int
+int
xfs_inobt_insert_rec(
struct xfs_btree_cur *cur,
uint16_t holemask,
diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
index 7a26cf7..42b8c34 100644
--- a/fs/xfs/libxfs/xfs_ialloc.h
+++ b/fs/xfs/libxfs/xfs_ialloc.h
@@ -177,6 +177,9 @@ int xfs_ialloc_has_inode_record(struct xfs_btree_cur *cur, xfs_agino_t low,
xfs_agino_t high, bool *exists);
int xfs_ialloc_count_inodes(struct xfs_btree_cur *cur, xfs_agino_t *count,
xfs_agino_t *freecount);
+int xfs_inobt_insert_rec(struct xfs_btree_cur *cur, uint16_t holemask,
+ uint8_t count, int32_t freecount, xfs_inofree_t free,
+ int *stat);
int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 728133f..dc3fbcb 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -88,7 +88,7 @@ xfs_refcount_lookup_ge(
}
/* Convert on-disk record to in-core format. */
-static inline void
+void
xfs_refcount_btrec_to_irec(
union xfs_btree_rec *rec,
struct xfs_refcount_irec *irec)
@@ -148,7 +148,7 @@ xfs_refcount_update(
* by [bno, len, refcount].
* This either works (return 0) or gets an EFSCORRUPTED error.
*/
-STATIC int
+int
xfs_refcount_insert(
struct xfs_btree_cur *cur,
struct xfs_refcount_irec *irec,
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
index 2a731ac..5856abb 100644
--- a/fs/xfs/libxfs/xfs_refcount.h
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -85,5 +85,10 @@ static inline xfs_fileoff_t xfs_refcount_max_unmap(int log_res)
extern int xfs_refcount_has_record(struct xfs_btree_cur *cur,
xfs_agblock_t bno, xfs_extlen_t len, bool *exists);
+union xfs_btree_rec;
+extern void xfs_refcount_btrec_to_irec(union xfs_btree_rec *rec,
+ struct xfs_refcount_irec *irec);
+extern int xfs_refcount_insert(struct xfs_btree_cur *cur,
+ struct xfs_refcount_irec *irec, int *stat);
#endif /* __XFS_REFCOUNT_H__ */
* [PATCH 03/19] xfs: add repair helpers for the reverse mapping btree
From: Darrick J. Wong @ 2017-08-25 22:17 UTC
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Add a couple of functions to the reverse mapping btree that will be used
to repair the rmapbt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
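The "has other keys" check in this patch is a range query whose callback aborts the walk at the first record that does not match the expected owner. Here is a toy userspace model of that pattern. Every toy_* name is hypothetical, the record is reduced to an owner field (the real helper also compares offset and flags), and a linear array scan stands in for the btree range query.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an rmap record; not the kernel struct. */
struct toy_rmap {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
};

#define TOY_QUERY_ABORT	1	/* stand-in for XFS_BTREE_QUERY_RANGE_ABORT */

typedef int (*toy_query_fn)(const struct toy_rmap *rec, void *priv);

/* Call fn for every record overlapping [bno, bno + len - 1]. */
static int
toy_query_range(const struct toy_rmap *recs, size_t nr,
		uint32_t bno, uint32_t len, toy_query_fn fn, void *priv)
{
	size_t	i;
	int	ret;

	for (i = 0; i < nr; i++) {
		if (recs[i].startblock + recs[i].blockcount <= bno ||
		    recs[i].startblock >= bno + len)
			continue;	/* no overlap with the extent */
		ret = fn(&recs[i], priv);
		if (ret)
			return ret;	/* callback asked us to stop */
	}
	return 0;
}

struct toy_other_keys {
	uint64_t	owner;
	bool		*has_rmap;
};

/* Stop the walk at the first record owned by somebody else. */
static int
toy_other_keys_helper(const struct toy_rmap *rec, void *priv)
{
	struct toy_other_keys	*tok = priv;

	if (rec->owner == tok->owner)
		return 0;
	*tok->has_rmap = true;
	return TOY_QUERY_ABORT;
}

/* Does any record overlapping the extent belong to a different owner? */
bool
toy_has_other_keys(const struct toy_rmap *recs, size_t nr,
		   uint32_t bno, uint32_t len, uint64_t owner)
{
	bool			has = false;
	struct toy_other_keys	tok = { .owner = owner, .has_rmap = &has };

	/* The abort code is not an error here, so it is dropped. */
	toy_query_range(recs, nr, bno, len, toy_other_keys_helper, &tok);
	return has;
}
```

One thing the toy makes visible: the abort value signals "found one", not failure. As far as I can tell, xfs_rmap_has_other_keys() in this patch returns the query result directly, so callers presumably need to treat the abort code as success.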
fs/xfs/libxfs/xfs_rmap.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_rmap.h | 4 ++
2 files changed, 83 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 9c33fd6..d14af8b 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -1977,6 +1977,34 @@ xfs_rmap_map_shared(
return error;
}
+/* Insert a raw rmap into the rmapbt. */
+int
+xfs_rmap_map_raw(
+ struct xfs_btree_cur *cur,
+ struct xfs_rmap_irec *rmap)
+{
+ struct xfs_owner_info oinfo;
+
+ oinfo.oi_owner = rmap->rm_owner;
+ oinfo.oi_offset = rmap->rm_offset;
+ oinfo.oi_flags = 0;
+ if (rmap->rm_flags & XFS_RMAP_ATTR_FORK)
+ oinfo.oi_flags |= XFS_OWNER_INFO_ATTR_FORK;
+ if (rmap->rm_flags & XFS_RMAP_BMBT_BLOCK)
+ oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK;
+
+ if (rmap->rm_flags || XFS_RMAP_NON_INODE_OWNER(rmap->rm_owner))
+ return xfs_rmap_map(cur, rmap->rm_startblock,
+ rmap->rm_blockcount,
+ rmap->rm_flags & XFS_RMAP_UNWRITTEN,
+ &oinfo);
+
+ return xfs_rmap_map_shared(cur, rmap->rm_startblock,
+ rmap->rm_blockcount,
+ rmap->rm_flags & XFS_RMAP_UNWRITTEN,
+ &oinfo);
+}
+
struct xfs_rmap_query_range_info {
xfs_rmap_query_range_fn fn;
void *priv;
@@ -2391,3 +2419,54 @@ xfs_rmap_record_exists(
irec.rm_startblock + irec.rm_blockcount >= bno + len);
return 0;
}
+
+struct xfs_rmap_has_other_keys {
+ uint64_t owner;
+ uint64_t offset;
+ bool *has_rmap;
+ unsigned int flags;
+};
+
+/* For each rmap given, figure out if it doesn't match the key we want. */
+STATIC int
+xfs_rmap_has_other_keys_helper(
+ struct xfs_btree_cur *cur,
+ struct xfs_rmap_irec *rec,
+ void *priv)
+{
+ struct xfs_rmap_has_other_keys *rhok = priv;
+
+ if (rhok->owner == rec->rm_owner && rhok->offset == rec->rm_offset &&
+ ((rhok->flags & rec->rm_flags) & XFS_RMAP_KEY_FLAGS) == rhok->flags)
+ return 0;
+ *rhok->has_rmap = true;
+ return XFS_BTREE_QUERY_RANGE_ABORT;
+}
+
+/*
+ * Given an extent and some owner info, can we find records overlapping
+ * the extent whose owner info does not match the given owner?
+ */
+int
+xfs_rmap_has_other_keys(
+ struct xfs_btree_cur *cur,
+ xfs_agblock_t bno,
+ xfs_extlen_t len,
+ struct xfs_owner_info *oinfo,
+ bool *has_rmap)
+{
+ struct xfs_rmap_irec low = {0};
+ struct xfs_rmap_irec high;
+ struct xfs_rmap_has_other_keys rhok;
+
+ xfs_owner_info_unpack(oinfo, &rhok.owner, &rhok.offset, &rhok.flags);
+ *has_rmap = false;
+ rhok.has_rmap = has_rmap;
+
+ low.rm_startblock = bno;
+ memset(&high, 0xFF, sizeof(high));
+ high.rm_startblock = bno + len - 1;
+
+ return xfs_rmap_query_range(cur, &low, &high,
+ xfs_rmap_has_other_keys_helper, &rhok);
+}
diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h
index 32c9382..180b127 100644
--- a/fs/xfs/libxfs/xfs_rmap.h
+++ b/fs/xfs/libxfs/xfs_rmap.h
@@ -224,5 +224,9 @@ int xfs_rmap_has_record(struct xfs_btree_cur *cur, xfs_agblock_t bno,
int xfs_rmap_record_exists(struct xfs_btree_cur *cur, xfs_agblock_t bno,
xfs_extlen_t len, struct xfs_owner_info *oinfo,
bool *has_rmap);
+int xfs_rmap_has_other_keys(struct xfs_btree_cur *cur, xfs_agblock_t bno,
+ xfs_extlen_t len, struct xfs_owner_info *oinfo,
+ bool *has_rmap);
+int xfs_rmap_map_raw(struct xfs_btree_cur *cur, struct xfs_rmap_irec *rmap);
#endif /* __XFS_RMAP_H__ */
* [PATCH 04/19] xfs: add repair helpers for the reference count btree
From: Darrick J. Wong @ 2017-08-25 22:17 UTC
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Add a couple of functions to the refcount btree and generic btree code
that will be used to repair the refcountbt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
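The "are there more records" test added here boils down to two checks on the leaf the cursor points at: is the cursor short of the last record in this block, and if not, does the block have a right sibling? A minimal userspace model follows. The toy_* types are hypothetical, and the cursor index is assumed to be 1-based like bc_ptrs[0].

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NULLBLOCK	UINT32_MAX	/* stand-in for NULLAGBLOCK */

/* Hypothetical leaf block header: record count and right sibling. */
struct toy_leaf {
	unsigned int	numrecs;
	uint32_t	rightsib;
};

/* Hypothetical cursor: current leaf plus a 1-based record index. */
struct toy_cursor {
	const struct toy_leaf	*leaf;
	unsigned int		ptr;
};

bool
toy_has_more_records(const struct toy_cursor *cur)
{
	/* There are still records after the cursor in this block. */
	if (cur->ptr < cur->leaf->numrecs)
		return true;

	/* Otherwise more records exist only if there is another leaf. */
	return cur->leaf->rightsib != TOY_NULLBLOCK;
}
```

The real function also has to pick the long or short sibling pointer based on XFS_BTREE_LONG_PTRS; the toy keeps only the short form.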
fs/xfs/libxfs/xfs_btree.c | 21 +++++++++++++++++++++
fs/xfs/libxfs/xfs_btree.h | 1 +
fs/xfs/libxfs/xfs_refcount.c | 17 +++++++++++++++++
fs/xfs/libxfs/xfs_refcount.h | 2 ++
4 files changed, 41 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 11cf13e..9ba49dd 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4946,3 +4946,24 @@ xfs_btree_has_record(
return 0;
}
+
+/* Are there more records in this btree? */
+bool
+xfs_btree_has_more_records(
+ struct xfs_btree_cur *cur)
+{
+ struct xfs_btree_block *block;
+ struct xfs_buf *bp;
+
+ block = xfs_btree_get_block(cur, 0, &bp);
+
+ /* There are still records in this block. */
+ if (cur->bc_ptrs[0] < xfs_btree_get_numrecs(block))
+ return true;
+
+ /* There are more record blocks. */
+ if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
+ return block->bb_u.l.bb_rightsib != cpu_to_be64(NULLFSBLOCK);
+ else
+ return block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK);
+}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 89da887..8666cc6 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -524,5 +524,6 @@ void xfs_btree_get_sibling(struct xfs_btree_cur *cur,
union xfs_btree_ptr *ptr, int lr);
int xfs_btree_has_record(struct xfs_btree_cur *cur, union xfs_btree_irec *low,
union xfs_btree_irec *high, bool *exists);
+bool xfs_btree_has_more_records(struct xfs_btree_cur *cur);
#endif /* __XFS_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index dc3fbcb..5c1c188 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -87,6 +87,23 @@ xfs_refcount_lookup_ge(
return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
}
+/*
+ * Look up the first record starting at bno in the btree
+ * given by cur.
+ */
+int
+xfs_refcount_lookup_eq(
+ struct xfs_btree_cur *cur,
+ xfs_agblock_t bno,
+ int *stat)
+{
+ trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_private.a.agno, bno,
+ XFS_LOOKUP_EQ);
+ cur->bc_rec.rc.rc_startblock = bno;
+ cur->bc_rec.rc.rc_blockcount = 0;
+ return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
+}
+
/* Convert on-disk record to in-core format. */
void
xfs_refcount_btrec_to_irec(
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
index 5856abb..a92ad90 100644
--- a/fs/xfs/libxfs/xfs_refcount.h
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -24,6 +24,8 @@ extern int xfs_refcount_lookup_le(struct xfs_btree_cur *cur,
xfs_agblock_t bno, int *stat);
extern int xfs_refcount_lookup_ge(struct xfs_btree_cur *cur,
xfs_agblock_t bno, int *stat);
+extern int xfs_refcount_lookup_eq(struct xfs_btree_cur *cur,
+ xfs_agblock_t bno, int *stat);
extern int xfs_refcount_get_rec(struct xfs_btree_cur *cur,
struct xfs_refcount_irec *irec, int *stat);
* [PATCH 05/19] xfs: add BMAPI_NORMAP flag to perform block remapping without updating rmapbt
From: Darrick J. Wong @ 2017-08-25 22:17 UTC
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Add a new flag, XFS_BMAPI_NORMAP, which will perform file block
remapping without updating the rmapbt. This will be used by the repair
code to reconstruct bmbts from the rmapbt, in which case we don't want
the rmapbt update.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
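The mechanism is a plain flag test around the reverse-mapping step. A toy model of the gating is sketched below. The toy_* names are hypothetical and the counters stand in for the real bmap and rmap updates; only the bit value 0x800 is taken from the patch.

```c
#define TOY_BMAPI_NORMAP	0x800	/* same bit value as XFS_BMAPI_NORMAP */

/* Hypothetical bookkeeping so the effect of the flag can be observed. */
struct toy_state {
	int	extents_mapped;
	int	rmap_updates;
};

/* Map one extent; skip the rmap update when the caller asks for that. */
int
toy_add_extent(struct toy_state *st, int flags)
{
	st->extents_mapped++;

	/* add reverse mapping, unless NORMAP was requested */
	if (!(flags & TOY_BMAPI_NORMAP))
		st->rmap_updates++;

	return 0;
}
```

Existing callers pass 0 and keep today's behaviour, which matches how xfs_bmap_finish_one() is updated in the diff.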
fs/xfs/libxfs/xfs_bmap.c | 57 +++++++++++++++++++++++++++++-----------------
fs/xfs/libxfs/xfs_bmap.h | 10 +++++++-
2 files changed, 45 insertions(+), 22 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index c09c16b..2a58e1a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -2101,9 +2101,12 @@ xfs_bmap_add_extent_delay_real(
}
/* add reverse mapping */
- error = xfs_rmap_map_extent(mp, bma->dfops, bma->ip, whichfork, new);
- if (error)
- goto done;
+ if (!(bma->flags & XFS_BMAPI_NORMAP)) {
+ error = xfs_rmap_map_extent(mp, bma->dfops, bma->ip,
+ whichfork, new);
+ if (error)
+ goto done;
+ }
/* convert to a btree if necessary */
if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
@@ -2845,7 +2848,8 @@ xfs_bmap_add_extent_hole_real(
struct xfs_bmbt_irec *new,
xfs_fsblock_t *first,
struct xfs_defer_ops *dfops,
- int *logflagsp)
+ int *logflagsp,
+ int flags)
{
struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
struct xfs_mount *mp = ip->i_mount;
@@ -3059,9 +3063,11 @@ xfs_bmap_add_extent_hole_real(
}
/* add reverse mapping */
- error = xfs_rmap_map_extent(mp, dfops, ip, whichfork, new);
- if (error)
- goto done;
+ if (!(flags & XFS_BMAPI_NORMAP)) {
+ error = xfs_rmap_map_extent(mp, dfops, ip, whichfork, new);
+ if (error)
+ goto done;
+ }
/* convert to a btree if necessary */
if (xfs_bmap_needs_btree(ip, whichfork)) {
@@ -4293,7 +4299,8 @@ xfs_bmapi_allocate(
else
error = xfs_bmap_add_extent_hole_real(bma->tp, bma->ip,
whichfork, &bma->idx, &bma->cur, &bma->got,
- bma->firstblock, bma->dfops, &bma->logflags);
+ bma->firstblock, bma->dfops, &bma->logflags,
+ bma->flags);
bma->logflags |= tmp_logflags;
if (error)
@@ -4670,30 +4677,37 @@ xfs_bmapi_write(
return error;
}
-static int
+int
xfs_bmapi_remap(
struct xfs_trans *tp,
struct xfs_inode *ip,
xfs_fileoff_t bno,
xfs_filblks_t len,
xfs_fsblock_t startblock,
- struct xfs_defer_ops *dfops)
+ struct xfs_defer_ops *dfops,
+ int flags)
{
struct xfs_mount *mp = ip->i_mount;
- struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+ struct xfs_ifork *ifp;
struct xfs_btree_cur *cur = NULL;
xfs_fsblock_t firstblock = NULLFSBLOCK;
struct xfs_bmbt_irec got;
xfs_extnum_t idx;
+ int whichfork = xfs_bmapi_whichfork(flags);
int logflags = 0, error;
+ ifp = XFS_IFORK_PTR(ip, whichfork);
ASSERT(len > 0);
ASSERT(len <= (xfs_filblks_t)MAXEXTLEN);
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+ ASSERT(!(flags & (XFS_BMAPI_DELALLOC | XFS_BMAPI_COWFORK |
+ XFS_BMAPI_ZERO | XFS_BMAPI_CONVERT |
+ XFS_BMAPI_IGSTATE | XFS_BMAPI_METADATA |
+ XFS_BMAPI_ENTIRE)));
if (unlikely(XFS_TEST_ERROR(
- (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
+ (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
+ XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
mp, XFS_ERRTAG_BMAPIFORMAT))) {
XFS_ERROR_REPORT("xfs_bmapi_remap", XFS_ERRLEVEL_LOW, mp);
return -EFSCORRUPTED;
@@ -4703,7 +4717,7 @@ xfs_bmapi_remap(
return -EIO;
if (!(ifp->if_flags & XFS_IFEXTENTS)) {
- error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
+ error = xfs_iread_extents(tp, ip, whichfork);
if (error)
return error;
}
@@ -4718,7 +4732,7 @@ xfs_bmapi_remap(
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
if (ifp->if_flags & XFS_IFBROOT) {
- cur = xfs_bmbt_init_cursor(mp, tp, ip, XFS_DATA_FORK);
+ cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
cur->bc_private.b.firstblock = firstblock;
cur->bc_private.b.dfops = dfops;
cur->bc_private.b.flags = 0;
@@ -4727,18 +4741,19 @@ xfs_bmapi_remap(
got.br_startoff = bno;
got.br_startblock = startblock;
got.br_blockcount = len;
- got.br_state = XFS_EXT_NORM;
+ got.br_state = (flags & XFS_BMAPI_PREALLOC) ? XFS_EXT_UNWRITTEN :
+ XFS_EXT_NORM;
- error = xfs_bmap_add_extent_hole_real(tp, ip, XFS_DATA_FORK, &idx, &cur,
- &got, &firstblock, dfops, &logflags);
+ error = xfs_bmap_add_extent_hole_real(tp, ip, whichfork, &idx, &cur,
+ &got, &firstblock, dfops, &logflags, flags);
if (error)
goto error0;
- if (xfs_bmap_wants_extents(ip, XFS_DATA_FORK)) {
+ if (xfs_bmap_wants_extents(ip, whichfork)) {
int tmp_logflags = 0;
error = xfs_bmap_btree_to_extents(tp, ip, cur,
- &tmp_logflags, XFS_DATA_FORK);
+ &tmp_logflags, whichfork);
logflags |= tmp_logflags;
}
@@ -6535,7 +6550,7 @@ xfs_bmap_finish_one(
switch (type) {
case XFS_BMAP_MAP:
error = xfs_bmapi_remap(tp, ip, startoff, *blockcount,
- startblock, dfops);
+ startblock, dfops, 0);
*blockcount = 0;
break;
case XFS_BMAP_UNMAP:
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 851982a..d6cb130 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -113,6 +113,9 @@ struct xfs_extent_free_item
/* Only convert delalloc space, don't allocate entirely new extents */
#define XFS_BMAPI_DELALLOC 0x400
+/* Don't update the rmap btree. */
+#define XFS_BMAPI_NORMAP 0x800
+
#define XFS_BMAPI_FLAGS \
{ XFS_BMAPI_ENTIRE, "ENTIRE" }, \
{ XFS_BMAPI_METADATA, "METADATA" }, \
@@ -124,7 +127,8 @@ struct xfs_extent_free_item
{ XFS_BMAPI_ZERO, "ZERO" }, \
{ XFS_BMAPI_REMAP, "REMAP" }, \
{ XFS_BMAPI_COWFORK, "COWFORK" }, \
- { XFS_BMAPI_DELALLOC, "DELALLOC" }
+ { XFS_BMAPI_DELALLOC, "DELALLOC" }, \
+ { XFS_BMAPI_NORMAP, "NORMAP" }
static inline int xfs_bmapi_aflag(int w)
@@ -277,4 +281,8 @@ int xfs_bmap_map_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
int xfs_bmap_unmap_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
struct xfs_inode *ip, struct xfs_bmbt_irec *imap);
+int xfs_bmapi_remap(struct xfs_trans *tp, struct xfs_inode *ip,
+ xfs_fileoff_t bno, xfs_filblks_t len, xfs_fsblock_t startblock,
+ struct xfs_defer_ops *dfops, int flags);
+
#endif /* __XFS_BMAP_H__ */
* [PATCH 06/19] xfs: halt auto-reclamation activities while rebuilding rmap
From: Darrick J. Wong @ 2017-08-25 22:17 UTC
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Rebuilding the reverse-mapping tree requires us to quiesce all inodes in
the filesystem, so we must stop background reclamation of post-EOF and
CoW prealloc blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
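A sketch of how a caller might bracket a rebuild with the two new helpers. This is a guess at the intended usage, not code from the series: the toy_* names are hypothetical, a flag stands in for the delayed work items, and the rebuild step is passed in as a callback so both the success and the failure path can be exercised.

```c
/* Hypothetical stand-in for struct xfs_mount. */
struct toy_mount {
	int	reclaim_halted;		/* nonzero while reclaim is halted */
	int	rebuilt_while_halted;	/* did the rebuild run with reclaim off? */
};

/* Stand-in for xfs_icache_disable_reclaim(). */
static void
toy_disable_reclaim(struct toy_mount *mp)
{
	mp->reclaim_halted = 1;
}

/* Stand-in for xfs_icache_enable_reclaim(). */
static void
toy_enable_reclaim(struct toy_mount *mp)
{
	mp->reclaim_halted = 0;
}

/* Sample rebuild step that succeeds. */
int
toy_rebuild_ok(struct toy_mount *mp)
{
	mp->rebuilt_while_halted = mp->reclaim_halted;
	return 0;
}

/* Sample rebuild step that fails with an arbitrary error code. */
int
toy_rebuild_fail(struct toy_mount *mp)
{
	mp->rebuilt_while_halted = mp->reclaim_halted;
	return -5;
}

/* Halt reclaim, rebuild, and always restart reclaim afterwards. */
int
toy_repair_rmap(struct toy_mount *mp, int (*rebuild)(struct toy_mount *))
{
	int	error;

	toy_disable_reclaim(mp);
	error = rebuild(mp);
	toy_enable_reclaim(mp);		/* runs on success and on error */
	return error;
}
```

The point of the shape is that the enable call sits on every exit path, so a failed rebuild does not leave background reclamation stopped.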
fs/xfs/xfs_icache.c | 18 ++++++++++++++++++
fs/xfs/xfs_icache.h | 3 +++
2 files changed, 21 insertions(+)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 0a9e698..7a715cd 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1737,3 +1737,21 @@ xfs_inode_clear_cowblocks_tag(
return __xfs_inode_clear_eofblocks_tag(ip,
trace_xfs_perag_clear_cowblocks, XFS_ICI_COWBLOCKS_TAG);
}
+
+/* Disable post-EOF and CoW block auto-reclamation. */
+void
+xfs_icache_disable_reclaim(
+ struct xfs_mount *mp)
+{
+ cancel_delayed_work_sync(&mp->m_eofblocks_work);
+ cancel_delayed_work_sync(&mp->m_cowblocks_work);
+}
+
+/* Enable post-EOF and CoW block auto-reclamation. */
+void
+xfs_icache_enable_reclaim(
+ struct xfs_mount *mp)
+{
+ xfs_queue_eofblocks(mp);
+ xfs_queue_cowblocks(mp);
+}
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index bff4d85..55dd3c6 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -130,4 +130,7 @@ xfs_fs_eofblocks_from_user(
int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp,
xfs_ino_t ino, bool *inuse);
+void xfs_icache_disable_reclaim(struct xfs_mount *mp);
+void xfs_icache_enable_reclaim(struct xfs_mount *mp);
+
#endif
* [PATCH 07/19] xfs: create tracepoints for online repair
From: Darrick J. Wong @ 2017-08-25 22:17 UTC
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
These tracepoints will be used to debug the online repair routines.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
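Several tracepoints in this patch are stubbed out with empty variadic macros until the event classes are sorted out. A small userspace illustration of what such a stub does is below; trace_toy_repair_extent and toy_repair_extent are made-up names. The call site compiles, but the macro expands to nothing, so its arguments are never evaluated.

```c
/* Stub: every call expands to nothing, leaving just an empty statement. */
#define trace_toy_repair_extent(...)

/*
 * The increment below sits inside the stubbed-out trace call, so it
 * disappears at preprocessing time and *counter is left untouched.
 */
int
toy_repair_extent(int *counter)
{
	trace_toy_repair_extent((*counter)++, "never evaluated");
	return *counter;
}
```

That is handy for keeping call sites in place, with the usual caveat that trace arguments must not carry side effects the code depends on.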
fs/xfs/scrub/trace.h | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 166 insertions(+)
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 6769e02..f389501 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -68,6 +68,8 @@ DEFINE_EVENT(xfs_scrub_class, name, \
DEFINE_SCRUB_EVENT(xfs_scrub_start);
DEFINE_SCRUB_EVENT(xfs_scrub_done);
DEFINE_SCRUB_EVENT(xfs_scrub_deadlock_retry);
+DEFINE_SCRUB_EVENT(xfs_repair_attempt);
+DEFINE_SCRUB_EVENT(xfs_repair_done);
TRACE_EVENT(xfs_scrub_op_error,
TP_PROTO(struct xfs_scrub_context *sc, xfs_agnumber_t agno,
@@ -489,6 +491,170 @@ TRACE_EVENT(xfs_scrub_xref_error,
__entry->ret_ip)
);
+/* repair tracepoints */
+
+/* XXX sort out this mess (and the two others below) later */
+
+#define trace_xfs_repair_free_or_unmap_extent(...)
+#define trace_xfs_repair_collect_btree_extent(...)
+#if 0
+DEFINE_BUSY_EVENT(xfs_repair_free_or_unmap_extent);
+DEFINE_BUSY_EVENT(xfs_repair_collect_btree_extent);
+#endif
+
+TRACE_EVENT(xfs_repair_init_btblock,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
+ xfs_btnum_t btnum),
+ TP_ARGS(mp, agno, agbno, btnum),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agblock_t, agbno)
+ __field(uint32_t, btnum)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->agbno = agbno;
+ __entry->btnum = btnum;
+ ),
+ TP_printk("dev %d:%d agno %u agbno %u btnum %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
+ __entry->agbno, __entry->btnum)
+)
+TRACE_EVENT(xfs_repair_find_ag_btree_roots_helper,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
+ uint32_t magic, uint16_t level),
+ TP_ARGS(mp, agno, agbno, magic, level),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agblock_t, agbno)
+ __field(uint32_t, magic)
+ __field(uint16_t, level)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->agbno = agbno;
+ __entry->magic = magic;
+ __entry->level = level;
+ ),
+ TP_printk("dev %d:%d agno %u agbno %u magic 0x%x level %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
+ __entry->agbno, __entry->magic, __entry->level)
+)
+TRACE_EVENT(xfs_repair_calc_ag_resblks,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_agino_t icount, xfs_agblock_t aglen, xfs_agblock_t freelen,
+ xfs_agblock_t usedlen),
+ TP_ARGS(mp, agno, icount, aglen, freelen, usedlen),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agino_t, icount)
+ __field(xfs_agblock_t, aglen)
+ __field(xfs_agblock_t, freelen)
+ __field(xfs_agblock_t, usedlen)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->icount = icount;
+ __entry->aglen = aglen;
+ __entry->freelen = freelen;
+ __entry->usedlen = usedlen;
+ ),
+ TP_printk("dev %d:%d agno %d icount %u aglen %u freelen %u usedlen %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
+ __entry->icount, __entry->aglen, __entry->freelen,
+ __entry->usedlen)
+)
+TRACE_EVENT(xfs_repair_calc_ag_resblks_btsize,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_agblock_t bnobt_sz, xfs_agblock_t inobt_sz,
+ xfs_agblock_t rmapbt_sz, xfs_agblock_t refcbt_sz),
+ TP_ARGS(mp, agno, bnobt_sz, inobt_sz, rmapbt_sz, refcbt_sz),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agblock_t, bnobt_sz)
+ __field(xfs_agblock_t, inobt_sz)
+ __field(xfs_agblock_t, rmapbt_sz)
+ __field(xfs_agblock_t, refcbt_sz)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->bnobt_sz = bnobt_sz;
+ __entry->inobt_sz = inobt_sz;
+ __entry->rmapbt_sz = rmapbt_sz;
+ __entry->refcbt_sz = refcbt_sz;
+ ),
+ TP_printk("dev %d:%d agno %d bno %u ino %u rmap %u refcount %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
+ __entry->bnobt_sz, __entry->inobt_sz, __entry->rmapbt_sz,
+ __entry->refcbt_sz)
+)
+TRACE_EVENT(xfs_repair_reset_counters,
+ TP_PROTO(struct xfs_mount *mp),
+ TP_ARGS(mp),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ ),
+ TP_printk("dev %d:%d",
+ MAJOR(__entry->dev), MINOR(__entry->dev))
+)
+
+#define trace_xfs_repair_agfl_insert(...)
+#define trace_xfs_repair_alloc_extent_fn(...)
+#define trace_xfs_repair_ialloc_extent_fn(...)
+#if 0
+DEFINE_BUSY_EVENT(xfs_repair_agfl_insert);
+DEFINE_RMAPBT_EVENT(xfs_repair_alloc_extent_fn);
+DEFINE_RMAPBT_EVENT(xfs_repair_ialloc_extent_fn);
+#endif
+
+TRACE_EVENT(xfs_repair_ialloc_insert,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_agino_t startino, uint16_t holemask, uint8_t count,
+ uint8_t freecount, uint64_t freemask),
+ TP_ARGS(mp, agno, startino, holemask, count, freecount, freemask),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agino_t, startino)
+ __field(uint16_t, holemask)
+ __field(uint8_t, count)
+ __field(uint8_t, freecount)
+ __field(uint64_t, freemask)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->startino = startino;
+ __entry->holemask = holemask;
+ __entry->count = count;
+ __entry->freecount = freecount;
+ __entry->freemask = freemask;
+ ),
+ TP_printk("dev %d:%d agno %d startino %u holemask 0x%x count %u freecount %u freemask 0x%llx",
+ MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
+ __entry->startino, __entry->holemask, __entry->count,
+ __entry->freecount, __entry->freemask)
+)
+#define trace_xfs_repair_rmap_extent_fn(...)
+#define trace_xfs_repair_refcount_extent_fn(...)
+#define trace_xfs_repair_bmap_extent_fn(...)
+#if 0
+DEFINE_RMAPBT_EVENT(xfs_repair_rmap_extent_fn);
+DEFINE_REFCOUNT_EXTENT_EVENT(xfs_repair_refcount_extent_fn);
+DEFINE_RMAPBT_EVENT(xfs_repair_bmap_extent_fn);
+#endif
+
#endif /* _TRACE_XFS_SCRUB_TRACE_H */
#undef TRACE_INCLUDE_PATH
* [PATCH 08/19] xfs: implement the metadata repair ioctl flag
From: Darrick J. Wong @ 2017-08-25 22:17 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Plumb in the pieces necessary to make the "repair" subfunction of
the scrub ioctl actually work.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
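For anyone who wants to poke at the control flow outside the kernel,
here is a rough userspace model of the retry loop that this patch adds
to xfs_scrub_metadata(). It only mirrors the "scrub -> repair ->
commit -> re-scrub" ordering described in the new scrub.c comment.
Everything in it (struct fake_meta, the model_* functions, the FLAG_*
values) is made up for the sketch and is not the kernel API; locking
and transactions are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the XFS_SCRUB_IFLAG_ and OFLAG_ bits. */
#define FLAG_IN_REPAIR		(1u << 0)	/* caller asked for repair */
#define FLAG_OUT_CORRUPT	(1u << 1)	/* scrub found corruption */
#define FLAG_OUT_PREEN		(1u << 2)	/* scrub found something suboptimal */
#define FLAGS_OUT		(FLAG_OUT_CORRUPT | FLAG_OUT_PREEN)

/* Hypothetical metadata object that only remembers if it is damaged. */
struct fake_meta {
	bool	corrupt;	/* is the object currently damaged? */
	bool	repairable;	/* can model_repair() fix it? */
	int	scrubs;		/* number of scrub passes run */
	int	repairs;	/* number of repair attempts run */
};

/* Check the object and report any damage through the output flags. */
static void model_scrub(struct fake_meta *m, unsigned int *flags)
{
	m->scrubs++;
	if (m->corrupt)
		*flags |= FLAG_OUT_CORRUPT;
}

/* Try to fix the object; in this model the attempt itself never errors. */
static int model_repair(struct fake_meta *m)
{
	m->repairs++;
	if (m->repairable)
		m->corrupt = false;
	return 0;
}

/* Same shape as xfs_scrub_should_fix(): repair requested and needed. */
static bool model_should_fix(unsigned int flags)
{
	return (flags & FLAG_IN_REPAIR) && (flags & FLAGS_OUT);
}

/* Returns the flags that would be reported back to userspace. */
static unsigned int model_scrub_metadata(struct fake_meta *m,
		unsigned int flags)
{
	bool	already_fixed = false;

retry_op:
	model_scrub(m, &flags);
	if (!already_fixed && model_should_fix(flags)) {
		if (model_repair(m))
			return flags;
		/* Forget the first verdict; the second scrub decides. */
		flags &= ~FLAGS_OUT;
		already_fixed = true;
		goto retry_op;
	}
	return flags;
}
```

The point of the second pass is that the caller only ever sees the
result of the re-scrub, so a repair that silently failed to fix
anything still comes back flagged as corrupt.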
fs/xfs/Kconfig | 17 ++++++++
fs/xfs/scrub/scrub.c | 107 +++++++++++++++++++++++++++++++++++++++++++++++---
fs/xfs/scrub/scrub.h | 10 +++++
fs/xfs/xfs_error.c | 3 +
fs/xfs/xfs_error.h | 4 +-
5 files changed, 133 insertions(+), 8 deletions(-)
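The order of the up-front checks for a repair request matters to
userspace, since it decides which errno comes back. A small sketch of
that ordering, as I read the scrub.c hunk below; struct fake_request
and model_check_repair_request() are invented for illustration and do
not exist in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the inputs the kernel looks at. */
struct fake_request {
	bool	want_repair;		/* caller set the REPAIR input flag */
	bool	repair_supported;	/* repair built in and fs has CRCs */
	bool	have_repair_fn;		/* this metadata type has a ->repair */
	bool	readonly;		/* filesystem is mounted read-only */
};

/* Returns 0 if the request may proceed, or a negative errno. */
static int model_check_repair_request(const struct fake_request *rq)
{
	/* Plain scrub requests are not gated by any of this. */
	if (!rq->want_repair)
		return 0;

	/* "We don't know how to fix this" wins over "read-only". */
	if (!rq->repair_supported || !rq->have_repair_fn)
		return -EOPNOTSUPP;

	if (rq->readonly)
		return -EROFS;

	return 0;
}
```

So a caller that gets -EOPNOTSUPP can stop asking for repairs of that
type, whereas -EROFS only says something about the current mount.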
diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index f42fcf1..06be67d 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -88,6 +88,23 @@ config XFS_ONLINE_SCRUB
If unsure, say N.
+config XFS_ONLINE_REPAIR
+ bool "XFS online metadata repair support"
+ default n
+ depends on XFS_FS && XFS_ONLINE_SCRUB
+ help
+ If you say Y here you will be able to repair metadata on a
+ mounted XFS filesystem. This feature is intended to reduce
+ filesystem downtime even further by fixing minor problems
+ before they cause the filesystem to go down. However, it
+ requires that the filesystem be formatted with secondary
+ metadata, such as reverse mappings and inode parent pointers.
+
+ This feature is considered EXPERIMENTAL. Use with caution!
+
+ See the xfs_scrub man page in section 8 for additional information.
+
+ If unsure, say N.
config XFS_WARN
bool "XFS Verbose Warnings"
depends on XFS_FS && !XFS_DEBUG
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 5f2c71d..cdc8233 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -42,6 +42,9 @@
#include "xfs_refcount_btree.h"
#include "xfs_rmap.h"
#include "xfs_rmap_btree.h"
+#include "xfs_error.h"
+#include "xfs_log.h"
+#include "xfs_trans_priv.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
@@ -121,6 +124,24 @@
* XCORRUPT flag; btree query function errors are noted by setting the
* XFAIL flag and deleting the cursor to prevent further attempts to
* cross-reference with a defective btree.
+ *
+ * If a piece of metadata proves corrupt or suboptimal, the userspace
+ * program can ask the kernel to apply some tender loving care (TLC) to
+ * the metadata object by setting the REPAIR flag and re-calling the
+ * scrub ioctl. "Corruption" is defined by metadata violating the
+ * on-disk specification; operations cannot continue if the violation is
+ * left untreated. It is possible for XFS to continue if an object is
+ * "suboptimal"; however, performance may be degraded. Repairs are
+ * usually performed by rebuilding the metadata entirely out of
+ * redundant metadata. Optimizing, on the other hand, can sometimes be
+ * done without rebuilding entire structures.
+ *
+ * Generally speaking, the repair code has the following code structure:
+ * Lock -> scrub -> repair -> commit -> re-lock -> re-scrub -> unlock.
+ * The first check helps us figure out if we need to rebuild or simply
+ * optimize the structure so that the rebuild knows what to do. The
+ * second check evaluates the completeness of the repair; that is what
+ * is reported to userspace.
*/
/*
@@ -162,7 +183,10 @@ xfs_scrub_teardown(
{
xfs_scrub_ag_free(sc, &sc->sa);
if (sc->tp) {
- xfs_trans_cancel(sc->tp);
+ if (error == 0 && (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR))
+ error = xfs_trans_commit(sc->tp);
+ else
+ xfs_trans_cancel(sc->tp);
sc->tp = NULL;
}
if (sc->ip) {
@@ -184,6 +208,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
{ /* ioctl presence test */
.setup = xfs_scrub_setup_fs,
.scrub = xfs_scrub_tester,
+ .repair = xfs_scrub_tester,
},
{ /* superblock */
.setup = xfs_scrub_setup_ag_header,
@@ -295,6 +320,18 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
#endif
};
+#if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
+static inline bool xfs_scrub_can_repair(struct xfs_mount *mp)
+{
+ return xfs_sb_version_hascrc(&mp->m_sb);
+}
+#else
+static inline bool xfs_scrub_can_repair(struct xfs_mount *mp)
+{
+ return false;
+}
+#endif
+
/* Dispatch metadata scrubbing. */
int
xfs_scrub_metadata(
@@ -304,7 +341,10 @@ xfs_scrub_metadata(
struct xfs_scrub_context sc;
struct xfs_mount *mp = ip->i_mount;
const struct xfs_scrub_meta_ops *ops;
+ char *errstr;
bool try_harder = false;
+ bool already_fixed = false;
+ bool was_corrupt = false;
int error = 0;
trace_xfs_scrub_start(ip, sm, error);
@@ -337,10 +377,17 @@ xfs_scrub_metadata(
if (ops->has && !ops->has(&mp->m_sb))
goto out;
- /* We don't know how to repair anything yet. */
- error = -EOPNOTSUPP;
- if (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
- goto out;
+ /* Can we repair it? */
+ if (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) {
+ /* Only allow repair for metadata we know how to fix. */
+ error = -EOPNOTSUPP;
+ if (!xfs_scrub_can_repair(mp) || ops->repair == NULL)
+ goto out;
+
+ error = -EROFS;
+ if (mp->m_flags & XFS_MOUNT_RDONLY)
+ goto out;
+ }
/* This isn't a stable feature. Use with care. */
{
@@ -382,9 +429,55 @@ xfs_scrub_metadata(
} else if (error)
goto out_teardown;
+ /* Let debug users force us into the repair routines. */
+ if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed &&
+ XFS_TEST_ERROR(false, mp,
+ XFS_ERRTAG_FORCE_SCRUB_REPAIR)) {
+ sc.sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+ }
+ if (!already_fixed)
+ was_corrupt = !!(sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
+ XFS_SCRUB_OFLAG_XCORRUPT));
+
+ if (!already_fixed && xfs_scrub_should_fix(sc.sm)) {
+ xfs_scrub_ag_btcur_free(&sc.sa);
+
+ /* Ok, something's wrong. Repair it. */
+ trace_xfs_repair_attempt(ip, sc.sm, error);
+ error = sc.ops->repair(&sc);
+ trace_xfs_repair_done(ip, sc.sm, error);
+ if (!try_harder && error == -EDEADLOCK) {
+ error = xfs_scrub_teardown(&sc, ip, 0);
+ if (error)
+ goto out_dec;
+ try_harder = true;
+ goto retry_op;
+ } else if (error)
+ goto out_teardown;
+
+ /*
+ * Commit the fixes and perform a second dry-run scrub
+ * so that we can tell userspace if we fixed the problem.
+ */
+ error = xfs_scrub_teardown(&sc, ip, error);
+ if (error)
+ goto out_dec;
+ sc.sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
+ already_fixed = true;
+ goto retry_op;
+ }
+
if (sc.sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
- XFS_SCRUB_OFLAG_XCORRUPT))
- xfs_alert_ratelimited(mp, "Corruption detected during scrub.");
+ XFS_SCRUB_OFLAG_XCORRUPT)) {
+ if (sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
+ errstr = "Corruption not fixed during online repair. "
+ "Unmount and run xfs_repair.";
+ else
+ errstr = "Corruption detected during scrub.";
+ xfs_alert_ratelimited(mp, errstr);
+ } else if (already_fixed && was_corrupt) {
+ xfs_alert_ratelimited(mp, "Corruption repaired during scrub.");
+ }
out_teardown:
error = xfs_scrub_teardown(&sc, ip, error);
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 3218664..0713eda 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -26,9 +26,19 @@ struct xfs_scrub_meta_ops {
int (*setup)(struct xfs_scrub_context *,
struct xfs_inode *);
int (*scrub)(struct xfs_scrub_context *);
+ int (*repair)(struct xfs_scrub_context *);
bool (*has)(struct xfs_sb *);
};
+/* Did userspace tell us we can repair /and/ we found something to fix? */
+static inline bool xfs_scrub_should_fix(struct xfs_scrub_metadata *sm)
+{
+ return (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) &&
+ (sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
+ XFS_SCRUB_OFLAG_XCORRUPT |
+ XFS_SCRUB_OFLAG_PREEN));
+}
+
/* Buffer pointers and btree cursors for an entire AG. */
struct xfs_scrub_ag {
xfs_agnumber_t agno;
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index 8cebbaa..5ff86c4 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -57,6 +57,7 @@ static unsigned int xfs_errortag_random_default[] = {
XFS_RANDOM_AG_RESV_CRITICAL,
XFS_RANDOM_DROP_WRITES,
XFS_RANDOM_LOG_BAD_CRC,
+ XFS_RANDOM_FORCE_SCRUB_REPAIR,
};
struct xfs_errortag_attr {
@@ -161,6 +162,7 @@ XFS_ERRORTAG_ATTR_RW(bmap_finish_one, XFS_ERRTAG_BMAP_FINISH_ONE);
XFS_ERRORTAG_ATTR_RW(ag_resv_critical, XFS_ERRTAG_AG_RESV_CRITICAL);
XFS_ERRORTAG_ATTR_RW(drop_writes, XFS_ERRTAG_DROP_WRITES);
XFS_ERRORTAG_ATTR_RW(log_bad_crc, XFS_ERRTAG_LOG_BAD_CRC);
+XFS_ERRORTAG_ATTR_RW(force_repair, XFS_ERRTAG_FORCE_SCRUB_REPAIR);
static struct attribute *xfs_errortag_attrs[] = {
XFS_ERRORTAG_ATTR_LIST(noerror),
@@ -193,6 +195,7 @@ static struct attribute *xfs_errortag_attrs[] = {
XFS_ERRORTAG_ATTR_LIST(ag_resv_critical),
XFS_ERRORTAG_ATTR_LIST(drop_writes),
XFS_ERRORTAG_ATTR_LIST(log_bad_crc),
+ XFS_ERRORTAG_ATTR_LIST(force_repair),
NULL,
};
diff --git a/fs/xfs/xfs_error.h b/fs/xfs/xfs_error.h
index 7577be5..6ee23eb 100644
--- a/fs/xfs/xfs_error.h
+++ b/fs/xfs/xfs_error.h
@@ -106,7 +106,8 @@ extern void xfs_verifier_error(struct xfs_buf *bp);
*/
#define XFS_ERRTAG_DROP_WRITES 28
#define XFS_ERRTAG_LOG_BAD_CRC 29
-#define XFS_ERRTAG_MAX 30
+#define XFS_ERRTAG_FORCE_SCRUB_REPAIR 30
+#define XFS_ERRTAG_MAX 31
/*
* Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -141,6 +142,7 @@ extern void xfs_verifier_error(struct xfs_buf *bp);
#define XFS_RANDOM_AG_RESV_CRITICAL 4
#define XFS_RANDOM_DROP_WRITES 1
#define XFS_RANDOM_LOG_BAD_CRC 1
+#define XFS_RANDOM_FORCE_SCRUB_REPAIR 1
#ifdef DEBUG
extern int xfs_errortag_init(struct xfs_mount *mp);
* [PATCH 09/19] xfs: add helper routines for the repair code
From: Darrick J. Wong @ 2017-08-25 22:17 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Add some helper functions for repair functions that will help us to
allocate and initialize new metadata blocks for btrees that we're
rebuilding.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
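One of the helpers in repair.c, xfs_repair_is_block_in_agfl(), has to
cope with the AGFL being a circular array: the active entries run from
flfirst to fllast and may wrap past the end. The patch handles that
with three explicit loops. The userspace sketch below covers the same
slots with a single wrapping loop instead, purely to make the wrap
case easy to test. model_block_in_agfl() and its parameters are
hypothetical, and the entries are plain host-endian integers rather
than on-disk __be32 values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Is agbno stored in one of the active slots of a circular free list?
 * Active slots run from flfirst to fllast inclusive and wrap around at
 * agfl_size.  Callers must pass flfirst and fllast below agfl_size.
 */
static bool model_block_in_agfl(const uint32_t *agfl, size_t agfl_size,
		size_t flfirst, size_t fllast, size_t flcount,
		uint32_t agbno)
{
	size_t	i = flfirst;
	size_t	n;

	/* An empty list has no active slots, whatever first/last say. */
	if (flcount == 0)
		return false;

	/* Visit at most agfl_size slots so a bad fllast cannot spin. */
	for (n = 0; n < agfl_size; n++) {
		if (agfl[i] == agbno)
			return true;
		if (i == fllast)
			break;
		i = (i + 1) % agfl_size;
	}

	return false;
}
```

Slots outside the flfirst..fllast window are ignored on purpose, since
they can hold stale block numbers.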
fs/xfs/Makefile | 1
fs/xfs/scrub/common.c | 4
fs/xfs/scrub/common.h | 3
fs/xfs/scrub/repair.c | 911 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/repair.h | 74 ++++
fs/xfs/scrub/scrub.c | 3
fs/xfs/scrub/scrub.h | 9
7 files changed, 1004 insertions(+), 1 deletion(-)
create mode 100644 fs/xfs/scrub/repair.c
create mode 100644 fs/xfs/scrub/repair.h
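xfs_repair_subtract_extents() removes every block named in one extent
list from another, splitting extents where a hole opens up in the
middle. A minimal userspace sketch of the intended result follows. It
is not the kernel algorithm: the patch sorts both lists and walks them
in step, while this version does a naive lookup per block and writes
into a caller-supplied array. struct model_extent, model_in_list() and
model_subtract() are names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: a run of len blocks starting at start. */
struct model_extent {
	uint64_t	start;
	uint32_t	len;
};

/* Does any extent in list[] cover blk? */
static bool model_in_list(const struct model_extent *list, size_t n,
		uint64_t blk)
{
	size_t	i;

	for (i = 0; i < n; i++) {
		if (blk >= list[i].start && blk - list[i].start < list[i].len)
			return true;
	}
	return false;
}

/*
 * Write every run of blocks that is in ex[] but not in sub[] to out[].
 * Returns the number of extents written, or -1 if out[] is too small.
 */
static int model_subtract(const struct model_extent *ex, size_t nex,
		const struct model_extent *sub, size_t nsub,
		struct model_extent *out, size_t nout)
{
	size_t	used = 0;
	size_t	i;

	for (i = 0; i < nex; i++) {
		uint64_t	blk;
		bool		open = false;	/* extending out[used - 1]? */

		for (blk = ex[i].start; blk < ex[i].start + ex[i].len; blk++) {
			if (model_in_list(sub, nsub, blk)) {
				/* Subtracted block ends the current run. */
				open = false;
				continue;
			}
			if (open) {
				out[used - 1].len++;
				continue;
			}
			if (used == nout)
				return -1;
			out[used].start = blk;
			out[used].len = 1;
			used++;
			open = true;
		}
	}

	return (int)used;
}
```

For example, subtracting blocks 103-104 and 109 from the extent
100-109 should leave two extents, 100-102 and 105-108.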
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index f8b3915..2dc82d5 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -156,6 +156,7 @@ xfs-y += $(addprefix scrub/, \
inode.o \
parent.o \
refcount.o \
+ repair.o \
rmap.o \
scrub.o \
symlink.o \
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index db32c31..515bee6 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -47,6 +47,7 @@
#include "scrub/common.h"
#include "scrub/trace.h"
#include "scrub/btree.h"
+#include "scrub/repair.h"
/* Common code for the metadata scrubbers. */
@@ -699,7 +700,8 @@ xfs_scrub_setup_fs(
struct xfs_inode *ip)
{
return xfs_scrub_trans_alloc(sc->sm, sc->mp,
- &M_RES(sc->mp)->tr_itruncate, 0, 0, 0, &sc->tp);
+ &M_RES(sc->mp)->tr_itruncate,
+ xfs_repair_calc_ag_resblks(sc), 0, 0, &sc->tp);
}
/* Set us up with AG headers and btree cursors. */
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index e20fb1d..6db9cfe 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -48,6 +48,9 @@ xfs_scrub_trans_alloc(
uint flags,
struct xfs_trans **tpp)
{
+ if (sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
+ return xfs_trans_alloc(mp, resp, blocks, rtextents, flags, tpp);
+
return xfs_trans_alloc_empty(mp, tpp);
}
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
new file mode 100644
index 0000000..9df2f97
--- /dev/null
+++ b/fs/xfs/scrub/repair.c
@@ -0,0 +1,911 @@
+/*
+ * Copyright (C) 2017 Oracle. All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_extent_busy.h"
+#include "xfs_ag_resv.h"
+#include "xfs_trans_space.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+
+/*
+ * Roll a transaction, keeping the AG headers locked and reinitializing
+ * the btree cursors.
+ */
+int
+xfs_repair_roll_ag_trans(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_trans *tp;
+ int error;
+
+ /* Keep the AG header buffers locked so we can keep going. */
+ xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
+ xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
+ xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
+
+ /* Roll the transaction. */
+ tp = sc->tp;
+ error = xfs_trans_roll(&sc->tp, NULL);
+ if (error)
+ return error;
+
+ /* Join the buffers to the new transaction or release the holds. */
+ if (sc->tp != tp) {
+ xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
+ xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
+ xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
+ } else {
+ xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
+ xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
+ xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
+ }
+
+ return error;
+}
+
+/*
+ * Does the given AG have enough space to rebuild a btree? Neither AG
+ * reservation can be critical, and we must have enough space (factoring
+ * in AG reservations) to construct a whole btree.
+ */
+bool
+xfs_repair_ag_has_space(
+ struct xfs_perag *pag,
+ xfs_extlen_t nr_blocks,
+ enum xfs_ag_resv_type type)
+{
+ return !xfs_ag_resv_critical(pag, XFS_AG_RESV_AGFL) &&
+ !xfs_ag_resv_critical(pag, XFS_AG_RESV_METADATA) &&
+ pag->pagf_freeblks > xfs_ag_resv_needed(pag, type) + nr_blocks;
+}
+
+/* Allocate a block in an AG. */
+int
+xfs_repair_alloc_ag_block(
+ struct xfs_scrub_context *sc,
+ struct xfs_owner_info *oinfo,
+ xfs_fsblock_t *fsbno,
+ enum xfs_ag_resv_type resv)
+{
+ struct xfs_alloc_arg args = {0};
+ xfs_agblock_t bno;
+ int error;
+
+ if (resv == XFS_AG_RESV_AGFL) {
+ error = xfs_alloc_get_freelist(sc->tp, sc->sa.agf_bp, &bno, 1);
+ if (error)
+ return error;
+ if (bno == NULLAGBLOCK)
+ return -ENOSPC;
+ xfs_extent_busy_reuse(sc->mp, sc->sa.agno, bno,
+ 1, false);
+ *fsbno = XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, bno);
+ return 0;
+ }
+
+ args.tp = sc->tp;
+ args.mp = sc->mp;
+ args.oinfo = *oinfo;
+ args.fsbno = XFS_AGB_TO_FSB(args.mp, sc->sa.agno, 0);
+ args.minlen = 1;
+ args.maxlen = 1;
+ args.prod = 1;
+ args.type = XFS_ALLOCTYPE_NEAR_BNO;
+ args.resv = resv;
+
+ error = xfs_alloc_vextent(&args);
+ if (error)
+ return error;
+ if (args.fsbno == NULLFSBLOCK)
+ return -ENOSPC;
+ ASSERT(args.len == 1);
+ *fsbno = args.fsbno;
+
+ return 0;
+}
+
+/* Initialize an AG block to a zeroed out btree header. */
+int
+xfs_repair_init_btblock(
+ struct xfs_scrub_context *sc,
+ xfs_fsblock_t fsb,
+ struct xfs_buf **bpp,
+ xfs_btnum_t btnum,
+ const struct xfs_buf_ops *ops)
+{
+ struct xfs_trans *tp = sc->tp;
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_buf *bp;
+
+ trace_xfs_repair_init_btblock(mp, XFS_FSB_TO_AGNO(mp, fsb),
+ XFS_FSB_TO_AGBNO(mp, fsb), btnum);
+
+ ASSERT(XFS_FSB_TO_AGNO(mp, fsb) == sc->sa.agno);
+ bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, XFS_FSB_TO_DADDR(mp, fsb),
+ XFS_FSB_TO_BB(mp, 1), 0);
+ xfs_buf_zero(bp, 0, BBTOB(bp->b_length));
+ xfs_btree_init_block(mp, bp, btnum, 0, 0, sc->sa.agno,
+ XFS_BTREE_CRC_BLOCKS);
+ xfs_trans_buf_set_type(tp, bp, XFS_BLFT_BTREE_BUF);
+ xfs_trans_log_buf(tp, bp, 0, bp->b_length);
+ bp->b_ops = ops;
+ *bpp = bp;
+
+ return 0;
+}
+
+/* Ensure the freelist is full. */
+int
+xfs_repair_fix_freelist(
+ struct xfs_scrub_context *sc,
+ bool can_shrink)
+{
+ struct xfs_alloc_arg args = {0};
+ int error;
+
+ args.mp = sc->mp;
+ args.tp = sc->tp;
+ args.agno = sc->sa.agno;
+ args.alignment = 1;
+ args.pag = xfs_perag_get(args.mp, sc->sa.agno);
+ args.resv = XFS_AG_RESV_AGFL;
+
+ error = xfs_alloc_fix_freelist(&args,
+ can_shrink ? 0 : XFS_ALLOC_FLAG_NOSHRINK);
+ xfs_perag_put(args.pag);
+
+ return error;
+}
+
+/* Put a block back on the AGFL. */
+int
+xfs_repair_put_freelist(
+ struct xfs_scrub_context *sc,
+ xfs_agblock_t agbno)
+{
+ struct xfs_owner_info oinfo;
+ int error;
+
+ /*
+ * Since we're "freeing" a lost block onto the AGFL, we have to
+ * create an rmap for the block prior to merging it or else other
+ * parts will break.
+ */
+ xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+ error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno, agbno, 1,
+ &oinfo);
+ if (error)
+ return error;
+
+ /* Put the block on the AGFL. */
+ error = xfs_alloc_put_freelist(sc->tp, sc->sa.agf_bp, sc->sa.agfl_bp,
+ agbno, 0);
+ if (error)
+ return error;
+ xfs_extent_busy_insert(sc->tp, sc->sa.agno, agbno, 1,
+ XFS_EXTENT_BUSY_SKIP_DISCARD);
+
+ /* Make sure the AGFL doesn't overfill. */
+ return xfs_repair_fix_freelist(sc, true);
+}
+
+/*
+ * For a given metadata extent and owner, delete the associated rmap.
+ * If the block has no other owners, free it.
+ */
+STATIC int
+xfs_repair_free_or_unmap_extent(
+ struct xfs_scrub_context *sc,
+ xfs_fsblock_t fsbno,
+ xfs_extlen_t len,
+ struct xfs_owner_info *oinfo,
+ enum xfs_ag_resv_type resv)
+{
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_btree_cur *rmap_cur;
+ struct xfs_buf *agf_bp = NULL;
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ bool has_other_rmap;
+ int error = 0;
+
+ ASSERT(xfs_sb_version_hasrmapbt(&mp->m_sb));
+ agno = XFS_FSB_TO_AGNO(mp, fsbno);
+ agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
+
+ trace_xfs_repair_free_or_unmap_extent(mp, agno, agbno, len);
+
+ for (; len > 0 && !error; len--, agbno++, fsbno++) {
+ ASSERT(sc->ip != NULL || agno == sc->sa.agno);
+
+ /* Can we find any other rmappings? */
+ if (sc->ip) {
+ error = xfs_alloc_read_agf(mp, sc->tp, agno, 0,
+ &agf_bp);
+ if (error)
+ break;
+ if (!agf_bp) {
+ error = -ENOMEM;
+ break;
+ }
+ }
+ rmap_cur = xfs_rmapbt_init_cursor(mp, sc->tp,
+ agf_bp ? agf_bp : sc->sa.agf_bp, agno);
+ error = xfs_rmap_has_other_keys(rmap_cur, agbno, 1, oinfo,
+ &has_other_rmap);
+ if (error)
+ goto out_cur;
+ xfs_btree_del_cursor(rmap_cur, XFS_BTREE_NOERROR);
+ if (agf_bp)
+ xfs_trans_brelse(sc->tp, agf_bp);
+
+ /*
+ * If there are other rmappings, this block is cross
+ * linked and must not be freed. Remove the reverse
+ * mapping and move on. Otherwise, we were the only
+ * owner of the block, so free the extent, which will
+ * also remove the rmap.
+ */
+ if (has_other_rmap)
+ error = xfs_rmap_free(sc->tp, agf_bp, agno, agbno, 1,
+ oinfo);
+ else if (resv == XFS_AG_RESV_AGFL)
+ error = xfs_repair_put_freelist(sc, agbno);
+ else
+ error = xfs_free_extent(sc->tp, fsbno, 1, oinfo, resv);
+ if (error)
+ break;
+
+ if (sc->ip)
+ error = xfs_trans_roll(&sc->tp, sc->ip);
+ else
+ error = xfs_repair_roll_ag_trans(sc);
+ }
+
+ return error;
+out_cur:
+ xfs_btree_del_cursor(rmap_cur, XFS_BTREE_ERROR);
+ if (agf_bp)
+ xfs_trans_brelse(sc->tp, agf_bp);
+ return error;
+}
+
+/* Collect a dead btree extent for later disposal. */
+int
+xfs_repair_collect_btree_extent(
+ struct xfs_scrub_context *sc,
+ struct list_head *btlist,
+ xfs_fsblock_t fsbno,
+ xfs_extlen_t len)
+{
+ struct xfs_repair_btree_extent *rbe;
+
+ trace_xfs_repair_collect_btree_extent(sc->mp,
+ XFS_FSB_TO_AGNO(sc->mp, fsbno),
+ XFS_FSB_TO_AGBNO(sc->mp, fsbno), len);
+
+ rbe = kmem_alloc(sizeof(struct xfs_repair_btree_extent),
+ KM_MAYFAIL | KM_NOFS);
+ if (!rbe)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&rbe->list);
+ rbe->fsbno = fsbno;
+ rbe->len = len;
+ list_add_tail(&rbe->list, btlist);
+
+ return 0;
+}
+
+/* Invalidate buffers for blocks we're dumping. */
+int
+xfs_repair_invalidate_blocks(
+ struct xfs_scrub_context *sc,
+ struct list_head *btlist)
+{
+ struct xfs_repair_btree_extent *rbe;
+ struct xfs_repair_btree_extent *n;
+ struct xfs_buf *bp;
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ xfs_agblock_t i;
+
+ list_for_each_entry_safe(rbe, n, btlist, list) {
+ agno = XFS_FSB_TO_AGNO(sc->mp, rbe->fsbno);
+ agbno = XFS_FSB_TO_AGBNO(sc->mp, rbe->fsbno);
+ for (i = 0; i < rbe->len; i++) {
+ bp = xfs_btree_get_bufs(sc->mp, sc->tp, agno,
+ agbno + i, 0);
+ xfs_trans_binval(sc->tp, bp);
+ }
+ }
+
+ return 0;
+}
+
+/* Dispose of dead btree extents. If oinfo is NULL, just delete the list. */
+int
+xfs_repair_reap_btree_extents(
+ struct xfs_scrub_context *sc,
+ struct list_head *btlist,
+ struct xfs_owner_info *oinfo,
+ enum xfs_ag_resv_type type)
+{
+ struct xfs_repair_btree_extent *rbe;
+ struct xfs_repair_btree_extent *n;
+ int error = 0;
+
+ list_for_each_entry_safe(rbe, n, btlist, list) {
+ if (oinfo) {
+ error = xfs_repair_free_or_unmap_extent(sc, rbe->fsbno,
+ rbe->len, oinfo, type);
+ if (error)
+ oinfo = NULL;
+ }
+ list_del(&rbe->list);
+ kmem_free(rbe);
+ }
+
+ return error;
+}
+
+/* Errors happened, just delete the dead btree extent list. */
+void
+xfs_repair_cancel_btree_extents(
+ struct xfs_scrub_context *sc,
+ struct list_head *btlist)
+{
+ xfs_repair_reap_btree_extents(sc, btlist, NULL, XFS_AG_RESV_NONE);
+}
+
+/* Compare two btree extents. */
+static int
+xfs_repair_btree_extent_cmp(
+ void *priv,
+ struct list_head *a,
+ struct list_head *b)
+{
+ struct xfs_repair_btree_extent *ap;
+ struct xfs_repair_btree_extent *bp;
+
+ ap = container_of(a, struct xfs_repair_btree_extent, list);
+ bp = container_of(b, struct xfs_repair_btree_extent, list);
+
+ if (ap->fsbno > bp->fsbno)
+ return 1;
+ else if (ap->fsbno < bp->fsbno)
+ return -1;
+ return 0;
+}
+
+/*
+ * Process a block on the extent list. This involves advancing along the
+ * subtract list until it catches fsb, then deciding if fsb is to be
+ * preserved on the list or removed from it, and doing all the list
+ * bookkeeping as necessary.
+ */
+STATIC int
+xfs_repair_subtract_extents_block(
+ struct xfs_repair_btree_extent *sub,
+ struct xfs_repair_btree_extent **subp,
+ struct list_head *sublist,
+ struct xfs_repair_btree_extent **rbe,
+ xfs_fsblock_t *newfsb,
+ xfs_fsblock_t fsb,
+ xfs_extlen_t *newlen)
+{
+ struct xfs_repair_btree_extent *newrbe;
+
+ /*
+ * If the current location of the extent list is beyond the
+ * subtract list, move the subtract list forward by one block or
+ * by one record.
+ */
+ while (fsb > sub->fsbno || sub->len == 0) {
+ if (sub->len) {
+ sub->len--;
+ sub->fsbno++;
+ } else {
+ /*
+ * Get the next subtract extent. If there isn't
+ * one, make the current extent match the
+ * unprocessed part of that extent, and jump
+ * out.
+ */
+ if ((*subp)->list.next == sublist ||
+ (*subp)->list.next == NULL) {
+ (*rbe)->len -= fsb - (*rbe)->fsbno;
+ (*rbe)->fsbno = fsb;
+ *subp = NULL;
+ *rbe = NULL;
+ return 0;
+ }
+ *subp = list_next_entry(*subp, list);
+ memcpy(sub, *subp, sizeof(*sub));
+ }
+ }
+
+ if (fsb != sub->fsbno) {
+ /*
+ * Block not in the subtract list; stash it for later
+ * reinsertion in the list.
+ */
+ if (*newfsb == NULLFSBLOCK) {
+ *newfsb = fsb;
+ *newlen = 1;
+ } else
+ (*newlen)++;
+ } else {
+ /* Match! */
+ if (*newfsb != NULLFSBLOCK) {
+ /*
+ * Last block of the extent and we have a saved
+ * extent. Store the saved extent in this
+ * extent.
+ */
+ if (fsb == (*rbe)->fsbno + (*rbe)->len - 1) {
+ (*rbe)->fsbno = *newfsb;
+ (*rbe)->len = *newlen;
+ *newfsb = NULLFSBLOCK;
+ *rbe = NULL;
+ return 0;
+ }
+ /* Stash the new extent in the list. */
+ newrbe = kmem_alloc(
+ sizeof(struct xfs_repair_btree_extent),
+ KM_MAYFAIL | KM_NOFS);
+ if (!newrbe)
+ return -ENOMEM;
+ INIT_LIST_HEAD(&newrbe->list);
+ newrbe->fsbno = *newfsb;
+ newrbe->len = *newlen;
+ list_add_tail(&newrbe->list,
+ &(*rbe)->list);
+ }
+
+ *newfsb = NULLFSBLOCK;
+ *newlen = 0;
+ }
+
+ return 0;
+}
+
+/* Remove all the blocks in sublist from exlist. */
+int
+xfs_repair_subtract_extents(
+ struct xfs_scrub_context *sc,
+ struct list_head *exlist,
+ struct list_head *sublist)
+{
+ struct xfs_repair_btree_extent *newrbe;
+ struct xfs_repair_btree_extent *rbe;
+ struct xfs_repair_btree_extent *n;
+ struct xfs_repair_btree_extent *subp;
+ struct xfs_repair_btree_extent sub;
+ xfs_fsblock_t fsb;
+ xfs_fsblock_t newfsb;
+ xfs_extlen_t newlen;
+ int error;
+
+ list_sort(NULL, exlist, xfs_repair_btree_extent_cmp);
+ list_sort(NULL, sublist, xfs_repair_btree_extent_cmp);
+
+ subp = list_first_entry_or_null(sublist, struct xfs_repair_btree_extent, list);
+ if (subp == NULL)
+ return 0;
+
+ memcpy(&sub, subp, sizeof(sub));
+ /* For every block mentioned in exlist... */
+ list_for_each_entry_safe(rbe, n, exlist, list) {
+ newfsb = NULLFSBLOCK;
+ newlen = 0;
+ for (fsb = rbe->fsbno; fsb < rbe->fsbno + rbe->len; fsb++) {
+ error = xfs_repair_subtract_extents_block(&sub, &subp,
+ sublist, &rbe, &newfsb, fsb, &newlen);
+ if (error)
+ return error;
+ }
+
+ /* If we have an extent to add back, do that now. */
+ if (newfsb != NULLFSBLOCK) {
+ if (rbe) {
+ newrbe = rbe;
+ rbe = NULL;
+ } else {
+ newrbe = kmem_alloc(
+ sizeof(struct xfs_repair_btree_extent),
+ KM_MAYFAIL | KM_NOFS);
+ if (!newrbe)
+ return -ENOMEM;
+ INIT_LIST_HEAD(&newrbe->list);
+ list_add_tail(&newrbe->list, &rbe->list);
+ }
+ newrbe->fsbno = newfsb;
+ newrbe->len = newlen;
+ }
+ if (rbe) {
+ list_del(&rbe->list);
+ kmem_free(rbe);
+ }
+ if (subp == NULL)
+ break;
+ }
+
+ return 0;
+}
+
+struct xfs_repair_find_ag_btree_roots_info {
+ struct xfs_buf *agfl_bp;
+ struct xfs_repair_find_ag_btree *btree_info;
+};
+
+/* Is this an OWN_AG block in the AGFL? */
+STATIC bool
+xfs_repair_is_block_in_agfl(
+ struct xfs_mount *mp,
+ uint64_t rmap_owner,
+ xfs_agblock_t agbno,
+ struct xfs_buf *agf_bp,
+ struct xfs_buf *agfl_bp)
+{
+ struct xfs_agf *agf;
+ __be32 *agfl_bno;
+ unsigned int flfirst;
+ unsigned int fllast;
+ int i;
+
+ if (rmap_owner != XFS_RMAP_OWN_AG)
+ return false;
+
+ agf = XFS_BUF_TO_AGF(agf_bp);
+ agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agfl_bp);
+ flfirst = be32_to_cpu(agf->agf_flfirst);
+ fllast = be32_to_cpu(agf->agf_fllast);
+
+ /* Skip an empty AGFL. */
+ if (agf->agf_flcount == cpu_to_be32(0))
+ return false;
+
+ /* first to last is a consecutive list. */
+ if (fllast >= flfirst) {
+ for (i = flfirst; i <= fllast; i++) {
+ if (be32_to_cpu(agfl_bno[i]) == agbno)
+ return true;
+ }
+
+ return false;
+ }
+
+ /* first to the end */
+ for (i = flfirst; i < XFS_AGFL_SIZE(mp); i++) {
+ if (be32_to_cpu(agfl_bno[i]) == agbno)
+ return true;
+ }
+
+ /* the start to last. */
+ for (i = 0; i <= fllast; i++) {
+ if (be32_to_cpu(agfl_bno[i]) == agbno)
+ return true;
+ }
+
+ return false;
+}
+
+/* Find btree roots from the AGF. */
+STATIC int
+xfs_repair_find_ag_btree_roots_helper(
+ struct xfs_btree_cur *cur,
+ struct xfs_rmap_irec *rec,
+ void *priv)
+{
+ struct xfs_mount *mp = cur->bc_mp;
+ struct xfs_repair_find_ag_btree_roots_info *ri = priv;
+ struct xfs_repair_find_ag_btree *fab;
+ struct xfs_buf *bp;
+ struct xfs_btree_block *btblock;
+ xfs_daddr_t daddr;
+ xfs_agblock_t agbno;
+ int error = 0;
+
+ if (!XFS_RMAP_NON_INODE_OWNER(rec->rm_owner))
+ return 0;
+
+ for (agbno = 0; agbno < rec->rm_blockcount; agbno++) {
+ daddr = XFS_AGB_TO_DADDR(mp, cur->bc_private.a.agno,
+ rec->rm_startblock + agbno);
+ for (fab = ri->btree_info; fab->buf_ops; fab++) {
+ if (rec->rm_owner != fab->rmap_owner)
+ continue;
+
+ /*
+ * Blocks in the AGFL have stale contents that
+ * might just happen to have a matching magic
+ * and uuid. We don't want to pull these blocks
+ * in as part of a tree root, so we have to
+ * filter out the AGFL stuff here. If the AGFL
+ * looks insane we'll just refuse to repair.
+ */
+ if (xfs_repair_is_block_in_agfl(mp, rec->rm_owner,
+ rec->rm_startblock + agbno,
+ cur->bc_private.a.agbp, ri->agfl_bp))
+ continue;
+
+ error = xfs_trans_read_buf(mp, cur->bc_tp,
+ mp->m_ddev_targp, daddr, mp->m_bsize,
+ 0, &bp, NULL);
+ if (error)
+ return error;
+
+ /* Does this look like a block we want? */
+ btblock = XFS_BUF_TO_BLOCK(bp);
+ if (be32_to_cpu(btblock->bb_magic) != fab->magic)
+ goto next_fab;
+ if (xfs_sb_version_hascrc(&mp->m_sb) &&
+ !uuid_equal(&btblock->bb_u.s.bb_uuid,
+ &mp->m_sb.sb_meta_uuid))
+ goto next_fab;
+ if (fab->root != NULLAGBLOCK &&
+ xfs_btree_get_level(btblock) <= fab->level)
+ goto next_fab;
+
+ /* Make sure we pass the verifiers. */
+ bp->b_ops = fab->buf_ops;
+ bp->b_ops->verify_read(bp);
+ if (bp->b_error)
+ goto next_fab;
+ fab->root = rec->rm_startblock + agbno;
+ fab->level = xfs_btree_get_level(btblock);
+
+ trace_xfs_repair_find_ag_btree_roots_helper(mp,
+ cur->bc_private.a.agno,
+ rec->rm_startblock + agbno,
+ be32_to_cpu(btblock->bb_magic),
+ fab->level);
+next_fab:
+ xfs_trans_brelse(cur->bc_tp, bp);
+ if (be32_to_cpu(btblock->bb_magic) == fab->magic)
+ break;
+ }
+ }
+
+ return error;
+}
+
+/* Find the roots of the given btrees from the rmap info. */
+int
+xfs_repair_find_ag_btree_roots(
+ struct xfs_scrub_context *sc,
+ struct xfs_buf *agf_bp,
+ struct xfs_repair_find_ag_btree *btree_info,
+ struct xfs_buf *agfl_bp)
+{
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_repair_find_ag_btree_roots_info ri;
+ struct xfs_repair_find_ag_btree *fab;
+ struct xfs_btree_cur *cur;
+ int error;
+
+ ri.btree_info = btree_info;
+ ri.agfl_bp = agfl_bp;
+ for (fab = btree_info; fab->buf_ops; fab++) {
+ ASSERT(agfl_bp || fab->rmap_owner != XFS_RMAP_OWN_AG);
+ fab->root = NULLAGBLOCK;
+ fab->level = 0;
+ }
+
+ cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
+ error = xfs_rmap_query_all(cur, xfs_repair_find_ag_btree_roots_helper,
+ &ri);
+ xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+
+ for (fab = btree_info; !error && fab->buf_ops; fab++)
+ if (fab->root != NULLAGBLOCK)
+ fab->level++;
+
+ return error;
+}
+
+/* Reset the superblock counters from the AGF/AGI. */
+int
+xfs_repair_reset_counters(
+ struct xfs_mount *mp)
+{
+ struct xfs_trans *tp;
+ struct xfs_buf *agi_bp;
+ struct xfs_buf *agf_bp;
+ struct xfs_agi *agi;
+ struct xfs_agf *agf;
+ xfs_agnumber_t agno;
+ xfs_ino_t icount = 0;
+ xfs_ino_t ifree = 0;
+ xfs_filblks_t fdblocks = 0;
+ int64_t delta_icount;
+ int64_t delta_ifree;
+ int64_t delta_fdblocks;
+ int error;
+
+ trace_xfs_repair_reset_counters(mp);
+
+ error = xfs_trans_alloc_empty(mp, &tp);
+ if (error)
+ return error;
+
+ for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+ /* Count all the inodes... */
+ error = xfs_ialloc_read_agi(mp, tp, agno, &agi_bp);
+ if (error)
+ goto out;
+ agi = XFS_BUF_TO_AGI(agi_bp);
+ icount += be32_to_cpu(agi->agi_count);
+ ifree += be32_to_cpu(agi->agi_freecount);
+
+ /* Add up the free/freelist/bnobt/cntbt blocks... */
+ error = xfs_alloc_read_agf(mp, tp, agno, 0, &agf_bp);
+ if (error)
+ goto out;
+ if (!agf_bp) {
+ error = -ENOMEM;
+ goto out;
+ }
+ agf = XFS_BUF_TO_AGF(agf_bp);
+ fdblocks += be32_to_cpu(agf->agf_freeblks);
+ fdblocks += be32_to_cpu(agf->agf_flcount);
+ fdblocks += be32_to_cpu(agf->agf_btreeblks);
+ }
+
+ /*
+ * Reinitialize the counters. The on-disk and in-core counters
+ * differ by the number of inodes/blocks reserved by the admin,
+ * the per-AG reservation, and any transactions in progress, so
+ * we have to account for that.
+ */
+ spin_lock(&mp->m_sb_lock);
+ delta_icount = (int64_t)mp->m_sb.sb_icount - icount;
+ delta_ifree = (int64_t)mp->m_sb.sb_ifree - ifree;
+ delta_fdblocks = (int64_t)mp->m_sb.sb_fdblocks - fdblocks;
+ mp->m_sb.sb_icount = icount;
+ mp->m_sb.sb_ifree = ifree;
+ mp->m_sb.sb_fdblocks = fdblocks;
+ spin_unlock(&mp->m_sb_lock);
+
+ if (delta_icount) {
+ error = xfs_mod_icount(mp, delta_icount);
+ if (error)
+ goto out;
+ }
+ if (delta_ifree) {
+ error = xfs_mod_ifree(mp, delta_ifree);
+ if (error)
+ goto out;
+ }
+ if (delta_fdblocks) {
+ error = xfs_mod_fdblocks(mp, delta_fdblocks, false);
+ if (error)
+ goto out;
+ }
+
+out:
+ xfs_trans_cancel(tp);
+ return error;
+}
+
+/* Figure out how many blocks to reserve for an AG repair. */
+xfs_extlen_t
+xfs_repair_calc_ag_resblks(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_scrub_metadata *sm = sc->sm;
+ struct xfs_agi *agi;
+ struct xfs_agf *agf;
+ struct xfs_buf *bp;
+ xfs_agino_t icount;
+ xfs_extlen_t aglen;
+ xfs_extlen_t usedlen;
+ xfs_extlen_t freelen;
+ xfs_extlen_t bnobt_sz;
+ xfs_extlen_t inobt_sz;
+ xfs_extlen_t rmapbt_sz;
+ xfs_extlen_t refcbt_sz;
+ int error;
+
+ if (!(sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR))
+ return 0;
+
+ /*
+ * Try to get the actual counters from disk; if not, make
+ * some worst case assumptions.
+ */
+ error = xfs_read_agi(mp, NULL, sm->sm_agno, &bp);
+ if (!error) {
+ agi = XFS_BUF_TO_AGI(bp);
+ icount = be32_to_cpu(agi->agi_count);
+ xfs_trans_brelse(NULL, bp);
+ } else
+ icount = mp->m_sb.sb_agblocks / mp->m_sb.sb_inopblock;
+
+ error = xfs_alloc_read_agf(mp, NULL, sm->sm_agno, 0, &bp);
+ if (!error && bp) {
+ agf = XFS_BUF_TO_AGF(bp);
+ aglen = be32_to_cpu(agf->agf_length);
+ freelen = be32_to_cpu(agf->agf_freeblks);
+ usedlen = aglen - freelen;
+ xfs_trans_brelse(NULL, bp);
+ } else {
+ aglen = mp->m_sb.sb_agblocks;
+ freelen = aglen;
+ usedlen = aglen;
+ }
+
+ trace_xfs_repair_calc_ag_resblks(mp, sm->sm_agno, icount, aglen,
+ freelen, usedlen);
+
+ /*
+ * Figure out how many blocks we'd need worst case to rebuild
+ * each type of btree. Note that we can only rebuild the
+ * bnobt/cntbt or inobt/finobt as pairs.
+ */
+ bnobt_sz = 2 * xfs_allocbt_calc_size(mp, freelen);
+ if (xfs_sb_version_hassparseinodes(&mp->m_sb))
+ inobt_sz = xfs_iallocbt_calc_size(mp, icount /
+ XFS_INODES_PER_HOLEMASK_BIT);
+ else
+ inobt_sz = xfs_iallocbt_calc_size(mp, icount /
+ XFS_INODES_PER_CHUNK);
+ if (xfs_sb_version_hasfinobt(&mp->m_sb))
+ inobt_sz *= 2;
+ if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+ rmapbt_sz = xfs_rmapbt_calc_size(mp, aglen);
+ refcbt_sz = xfs_refcountbt_calc_size(mp, usedlen);
+ } else {
+ rmapbt_sz = xfs_rmapbt_calc_size(mp, usedlen);
+ refcbt_sz = 0;
+ }
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ rmapbt_sz = 0;
+
+ trace_xfs_repair_calc_ag_resblks_btsize(mp, sm->sm_agno, bnobt_sz,
+ inobt_sz, rmapbt_sz, refcbt_sz);
+
+ return max(max(bnobt_sz, inobt_sz), max(rmapbt_sz, refcbt_sz));
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
new file mode 100644
index 0000000..e0d3690
--- /dev/null
+++ b/fs/xfs/scrub/repair.h
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2017 Oracle. All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+#ifndef __XFS_SCRUB_REPAIR_H__
+#define __XFS_SCRUB_REPAIR_H__
+
+/* Repair helpers */
+
+struct xfs_repair_find_ag_btree {
+ uint64_t rmap_owner;
+ const struct xfs_buf_ops *buf_ops;
+ uint32_t magic;
+ xfs_agblock_t root;
+ unsigned int level;
+};
+
+struct xfs_repair_btree_extent {
+ struct list_head list;
+ xfs_fsblock_t fsbno;
+ xfs_extlen_t len;
+};
+
+int xfs_repair_roll_ag_trans(struct xfs_scrub_context *sc);
+bool xfs_repair_ag_has_space(struct xfs_perag *pag, xfs_extlen_t nr_blocks,
+ enum xfs_ag_resv_type type);
+int xfs_repair_alloc_ag_block(struct xfs_scrub_context *sc,
+ struct xfs_owner_info *oinfo,
+ xfs_fsblock_t *fsbno, enum xfs_ag_resv_type resv);
+int xfs_repair_init_btblock(struct xfs_scrub_context *sc, xfs_fsblock_t fsb,
+ struct xfs_buf **bpp, xfs_btnum_t btnum,
+ const struct xfs_buf_ops *ops);
+int xfs_repair_fix_freelist(struct xfs_scrub_context *sc, bool can_shrink);
+int xfs_repair_put_freelist(struct xfs_scrub_context *sc, xfs_agblock_t agbno);
+int xfs_repair_collect_btree_extent(struct xfs_scrub_context *sc,
+ struct list_head *btlist,
+ xfs_fsblock_t fsbno, xfs_extlen_t len);
+int xfs_repair_invalidate_blocks(struct xfs_scrub_context *sc,
+ struct list_head *btlist);
+int xfs_repair_reap_btree_extents(struct xfs_scrub_context *sc,
+ struct list_head *btlist,
+ struct xfs_owner_info *oinfo,
+ enum xfs_ag_resv_type type);
+void xfs_repair_cancel_btree_extents(struct xfs_scrub_context *sc,
+ struct list_head *btlist);
+int xfs_repair_subtract_extents(struct xfs_scrub_context *sc,
+ struct list_head *exlist,
+ struct list_head *sublist);
+int xfs_repair_find_ag_btree_roots(struct xfs_scrub_context *sc,
+ struct xfs_buf *agf_bp,
+ struct xfs_repair_find_ag_btree *btree_info,
+ struct xfs_buf *agfl_bp);
+int xfs_repair_reset_counters(struct xfs_mount *mp);
+xfs_extlen_t xfs_repair_calc_ag_resblks(struct xfs_scrub_context *sc);
+int xfs_repair_setup_btree_extent_collection(struct xfs_scrub_context *sc);
+
+/* Metadata repairers */
+
+#endif /* __XFS_SCRUB_REPAIR_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index cdc8233..9c2372e 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -51,6 +51,7 @@
#include "scrub/trace.h"
#include "scrub/scrub.h"
#include "scrub/btree.h"
+#include "scrub/repair.h"
/*
* Online Scrub and Repair
@@ -199,6 +200,8 @@ xfs_scrub_teardown(
kmem_free(sc->buf);
sc->buf = NULL;
}
+ if (sc->reset_counters && !error)
+ error = xfs_repair_reset_counters(sc->mp);
return error;
}
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 0713eda..70adf0c 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -39,6 +39,14 @@ static inline bool xfs_scrub_should_fix(struct xfs_scrub_metadata *sm)
XFS_SCRUB_OFLAG_PREEN));
}
+/* Are we here only for preening? */
+static inline bool xfs_scrub_preen_only(struct xfs_scrub_metadata *sm)
+{
+ return (sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
+ XFS_SCRUB_OFLAG_XCORRUPT |
+ XFS_SCRUB_OFLAG_PREEN)) == XFS_SCRUB_OFLAG_PREEN;
+}
+
/* Buffer pointers and btree cursors for an entire AG. */
struct xfs_scrub_ag {
xfs_agnumber_t agno;
@@ -67,6 +75,7 @@ struct xfs_scrub_context {
void *buf;
uint ilock_flags;
bool try_harder;
+ bool reset_counters;
/* State tracking for single-AG operations. */
struct xfs_scrub_ag sa;
* [PATCH 10/19] xfs: repair superblocks
From: Darrick J. Wong @ 2017-08-25 22:17 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
If one of the backup superblocks is found to differ seriously from
superblock 0, write out a fresh copy from the in-core sb.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
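Not part of the commit: a rough userspace sketch of the copy step, for
anyone skimming the diff. It assumes a toy superblock layout; struct
toy_sb, TOY_SECTSIZE and toy_repair_backup_sb() are made-up names for
illustration, not XFS ones. It only mirrors the shape of
xfs_repair_superblock(): refuse AG 0, zero the sector, serialize the
in-core copy, and force bad_features2 to match features2.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_SECTSIZE 512	/* hypothetical sector size */

/* Hypothetical, heavily trimmed stand-in for the in-core superblock. */
struct toy_sb {
	uint32_t magic;
	uint32_t features2;
	uint32_t bad_features2;
	uint64_t dblocks;
};

/*
 * Overwrite a backup superblock sector with a fresh copy of the
 * in-core superblock.  AG 0 holds the primary, so we refuse to
 * touch it here.
 */
int toy_repair_backup_sb(unsigned int agno, const struct toy_sb *incore,
			 unsigned char *sector)
{
	struct toy_sb out;

	if (agno == 0)
		return -EOPNOTSUPP;

	/* Start from a clean sector so no stale bytes survive. */
	memset(sector, 0, TOY_SECTSIZE);

	/* Serialize the in-core copy, keeping both feature fields in sync. */
	out = *incore;
	out.bad_features2 = out.features2;
	memcpy(sector, &out, sizeof(out));
	return 0;
}
```

The real function logs the buffer through the scrub transaction instead
of writing into a bare byte array; the sketch leaves that out.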
fs/xfs/scrub/agheader.c | 35 +++++++++++++++++++++++++++++++++++
fs/xfs/scrub/repair.h | 1 +
fs/xfs/scrub/scrub.c | 1 +
3 files changed, 37 insertions(+)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index 8507153..d59444c 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -472,6 +472,41 @@ xfs_scrub_superblock(
return error;
}
+/* Repair the superblock. */
+int
+xfs_repair_superblock(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_buf *bp;
+ struct xfs_dsb *sbp;
+ xfs_agnumber_t agno;
+ int error;
+
+ /* Don't try to repair AG 0's sb; let xfs_repair deal with it. */
+ agno = sc->sm->sm_agno;
+ if (agno == 0)
+ return -EOPNOTSUPP;
+
+ error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+ XFS_AG_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
+ XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
+ if (error)
+ return error;
+ bp->b_ops = &xfs_sb_buf_ops;
+
+ /* Copy AG 0's superblock to this one. */
+ sbp = XFS_BUF_TO_SBP(bp);
+ memset(sbp, 0, mp->m_sb.sb_sectsize);
+ xfs_sb_to_disk(sbp, &mp->m_sb);
+ sbp->sb_bad_features2 = sbp->sb_features2;
+
+ /* Write this to disk. */
+ xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_SB_BUF);
+ xfs_trans_log_buf(sc->tp, bp, 0, mp->m_sb.sb_sectsize - 1);
+ return error;
+}
+
/* AGF */
/* Tally freespace record lengths. */
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index e0d3690..fc5dcaa 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -70,5 +70,6 @@ xfs_extlen_t xfs_repair_calc_ag_resblks(struct xfs_scrub_context *sc);
int xfs_repair_setup_btree_extent_collection(struct xfs_scrub_context *sc);
/* Metadata repairers */
+int xfs_repair_superblock(struct xfs_scrub_context *sc);
#endif /* __XFS_SCRUB_REPAIR_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 9c2372e..401a446 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -216,6 +216,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
{ /* superblock */
.setup = xfs_scrub_setup_ag_header,
.scrub = xfs_scrub_superblock,
+ .repair = xfs_repair_superblock,
},
{ /* agf */
.setup = xfs_scrub_setup_ag_header,
* [PATCH 11/19] xfs: repair the AGF and AGFL
From: Darrick J. Wong @ 2017-08-25 22:18 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Regenerate the AGF and AGFL from the rmap data.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
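Not part of the commit: a small userspace sketch of the two AGFL steps
in this patch, in case it helps review. The first function walks the
circular free list from flfirst to fllast (wrapping at the end of the
array) and rejects obviously bad block numbers; the second refills the
list starting at slot 1 and counts whatever does not fit. All toy_*
names are hypothetical. Two things differ from the diff on purpose and
should be read as my assumptions: the sketch bounds-checks flfirst and
fllast, and it rejects bno == aglen as well as bno > aglen.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NULLAGBLOCK UINT32_MAX	/* hypothetical "no block" marker */

struct toy_extent {
	uint32_t start;		/* first AG block of the extent */
	uint32_t len;		/* number of blocks */
};

/* A block is plausible if it is set, past the AG headers, and inside the AG. */
static bool toy_agbno_ok(uint32_t bno, uint32_t first_usable, uint32_t aglen)
{
	return bno != TOY_NULLAGBLOCK && bno >= first_usable && bno < aglen;
}

/* Check every active slot of a circular AGFL, flfirst through fllast. */
bool toy_agfl_sane(const uint32_t *agfl, size_t size, size_t flfirst,
		   size_t fllast, size_t flcount, uint32_t first_usable,
		   uint32_t aglen)
{
	size_t i;

	if (flcount == 0)
		return true;
	if (flfirst >= size || fllast >= size)
		return false;

	for (i = flfirst; ; i = (i + 1) % size) {
		if (!toy_agbno_ok(agfl[i], first_usable, aglen))
			return false;
		if (i == fllast)
			break;
	}
	return true;
}

/*
 * Reset the AGFL and fill slots 1..size-1 from the given extents.
 * Returns the number of slots filled; *leftover receives the number
 * of blocks that did not fit and would have to be freed.
 */
size_t toy_agfl_fill(uint32_t *agfl, size_t size,
		     const struct toy_extent *ext, size_t nr_ext,
		     uint32_t *leftover)
{
	size_t flcount = 0;
	size_t i;
	uint32_t b;

	*leftover = 0;
	for (i = 0; i < size; i++)
		agfl[i] = TOY_NULLAGBLOCK;

	for (i = 0; i < nr_ext; i++) {
		for (b = 0; b < ext[i].len; b++) {
			if (flcount + 1 < size) {
				agfl[flcount + 1] = ext[i].start + b;
				flcount++;
			} else {
				(*leftover)++;
			}
		}
	}
	return flcount;
}
```

After a fill, the matching header values would be flfirst = 1 and
fllast = flcount, which is what xfs_repair_agfl() writes into the AGF.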
fs/xfs/scrub/agheader.c | 472 +++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/repair.h | 2
fs/xfs/scrub/scrub.c | 2
3 files changed, 476 insertions(+)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index d59444c..40c31ed 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -31,13 +31,18 @@
#include "xfs_sb.h"
#include "xfs_inode.h"
#include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/trace.h"
+#include "scrub/repair.h"
/* Set us up to check an AG header. */
int
@@ -744,6 +749,256 @@ xfs_scrub_agf(
return error;
}
+struct xfs_repair_agf_allocbt {
+ xfs_agblock_t freeblks;
+ xfs_agblock_t longest;
+};
+
+/* Record free space shape information. */
+STATIC int
+xfs_repair_agf_walk_allocbt(
+ struct xfs_btree_cur *cur,
+ struct xfs_alloc_rec_incore *rec,
+ void *priv)
+{
+ struct xfs_repair_agf_allocbt *raa = priv;
+ int error = 0;
+
+ if (xfs_scrub_should_terminate(&error))
+ return error;
+
+ raa->freeblks += rec->ar_blockcount;
+ if (rec->ar_blockcount > raa->longest)
+ raa->longest = rec->ar_blockcount;
+ return error;
+}
+
+/* Does this AGFL look sane? */
+STATIC int
+xfs_repair_agf_check_agfl(
+ struct xfs_scrub_context *sc,
+ struct xfs_agf *agf,
+ __be32 *agfl_bno)
+{
+ struct xfs_mount *mp = sc->mp;
+ xfs_agblock_t aglen;
+ xfs_agblock_t bno;
+ unsigned int flfirst;
+ unsigned int fllast;
+ int i;
+
+ if (agf->agf_flcount == cpu_to_be32(0))
+ return 0;
+
+ flfirst = be32_to_cpu(agf->agf_flfirst);
+ fllast = be32_to_cpu(agf->agf_fllast);
+ aglen = be32_to_cpu(agf->agf_length);
+
+ /* first to last is a consecutive list. */
+ if (fllast >= flfirst) {
+ for (i = flfirst; i <= fllast; i++) {
+ bno = be32_to_cpu(agfl_bno[i]);
+ if (xfs_scrub_extent_covers_ag_head(mp, bno, 1) ||
+ bno > aglen || bno == NULLAGBLOCK)
+ return -EFSCORRUPTED;
+ }
+
+ return 0;
+ }
+
+ /* first to the end */
+ for (i = flfirst; i < XFS_AGFL_SIZE(mp); i++) {
+ bno = be32_to_cpu(agfl_bno[i]);
+ if (xfs_scrub_extent_covers_ag_head(mp, bno, 1) ||
+ bno > aglen || bno == NULLAGBLOCK)
+ return -EFSCORRUPTED;
+ }
+
+ /* the start to last. */
+ for (i = 0; i <= fllast; i++) {
+ bno = be32_to_cpu(agfl_bno[i]);
+ if (xfs_scrub_extent_covers_ag_head(mp, bno, 1) ||
+ bno > aglen || bno == NULLAGBLOCK)
+ return -EFSCORRUPTED;
+ }
+ return 0;
+}
+
+/* Repair the AGF. */
+int
+xfs_repair_agf(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_repair_find_ag_btree fab[] = {
+ {XFS_RMAP_OWN_AG, &xfs_allocbt_buf_ops, XFS_ABTB_CRC_MAGIC, 0, 0},
+ {XFS_RMAP_OWN_AG, &xfs_allocbt_buf_ops, XFS_ABTC_CRC_MAGIC, 0, 0},
+ {XFS_RMAP_OWN_AG, &xfs_rmapbt_buf_ops, XFS_RMAP_CRC_MAGIC, 0, 0},
+ {XFS_RMAP_OWN_REFC, &xfs_refcountbt_buf_ops, XFS_REFC_CRC_MAGIC, 0, 0},
+ {0, NULL, 0, 0, 0},
+ };
+ struct xfs_repair_agf_allocbt raa = {0};
+ struct xfs_agf old_agf;
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_buf *agf_bp;
+ struct xfs_buf *agfl_bp;
+ struct xfs_agf *agf;
+ struct xfs_btree_cur *cur = NULL;
+ struct xfs_perag *pag;
+ xfs_agblock_t blocks;
+ xfs_agblock_t freesp_blocks;
+ int error;
+
+ /* We require the rmapbt to rebuild anything. */
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return -EOPNOTSUPP;
+
+ error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+ XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
+ XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
+ if (error)
+ return error;
+ agf_bp->b_ops = &xfs_agf_buf_ops;
+
+ /*
+ * Load the AGFL so that we can screen out OWN_AG blocks that
+ * are on the AGFL now; these blocks might have once been part
+ * of the bno/cnt/rmap btrees but are not now.
+ */
+ error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
+ if (error)
+ return error;
+ error = xfs_repair_agf_check_agfl(sc, XFS_BUF_TO_AGF(agf_bp),
+ XFS_BUF_TO_AGFL_BNO(mp, agfl_bp));
+ if (error)
+ return error;
+
+ /* Find the btree roots. */
+ error = xfs_repair_find_ag_btree_roots(sc, agf_bp, fab, agfl_bp);
+ if (error)
+ return error;
+ if (fab[0].root == NULLAGBLOCK || fab[0].level > XFS_BTREE_MAXLEVELS ||
+ fab[1].root == NULLAGBLOCK || fab[1].level > XFS_BTREE_MAXLEVELS ||
+ fab[2].root == NULLAGBLOCK || fab[2].level > XFS_BTREE_MAXLEVELS)
+ return -EFSCORRUPTED;
+ if (xfs_sb_version_hasreflink(&mp->m_sb) &&
+ (fab[3].root == NULLAGBLOCK || fab[3].level > XFS_BTREE_MAXLEVELS))
+ return -EFSCORRUPTED;
+
+ /* Start rewriting the header. */
+ agf = XFS_BUF_TO_AGF(agf_bp);
+ old_agf = *agf;
+ /*
+ * We relied on the rmapbt to reconstruct the AGF. If we get a
+ * different root then something's seriously wrong.
+ */
+ if (be32_to_cpu(old_agf.agf_roots[XFS_BTNUM_RMAPi]) != fab[2].root)
+ return -EFSCORRUPTED;
+ memset(agf, 0, mp->m_sb.sb_sectsize);
+ agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
+ agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
+ agf->agf_seqno = cpu_to_be32(sc->sa.agno);
+ agf->agf_length = cpu_to_be32(xfs_scrub_ag_blocks(mp, sc->sa.agno));
+ agf->agf_roots[XFS_BTNUM_BNOi] = cpu_to_be32(fab[0].root);
+ agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(fab[1].root);
+ agf->agf_roots[XFS_BTNUM_RMAPi] = cpu_to_be32(fab[2].root);
+ agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(fab[0].level);
+ agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(fab[1].level);
+ agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(fab[2].level);
+ agf->agf_flfirst = old_agf.agf_flfirst;
+ agf->agf_fllast = old_agf.agf_fllast;
+ agf->agf_flcount = old_agf.agf_flcount;
+ if (xfs_sb_version_hascrc(&mp->m_sb))
+ uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
+ if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+ agf->agf_refcount_root = cpu_to_be32(fab[3].root);
+ agf->agf_refcount_level = cpu_to_be32(fab[3].level);
+ }
+
+ /* Update the AGF counters from the bnobt. */
+ cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+ XFS_BTNUM_BNO);
+ error = xfs_alloc_query_all(cur, xfs_repair_agf_walk_allocbt, &raa);
+ if (error)
+ goto err;
+ error = xfs_btree_count_blocks(cur, &blocks);
+ if (error)
+ goto err;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ freesp_blocks = blocks - 1;
+ agf->agf_freeblks = cpu_to_be32(raa.freeblks);
+ agf->agf_longest = cpu_to_be32(raa.longest);
+
+ /* Update the AGF counters from the cntbt. */
+ cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+ XFS_BTNUM_CNT);
+ error = xfs_btree_count_blocks(cur, &blocks);
+ if (error)
+ goto err;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ freesp_blocks += blocks - 1;
+
+ /* Update the AGF counters from the rmapbt. */
+ cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
+ error = xfs_btree_count_blocks(cur, &blocks);
+ if (error)
+ goto err;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ agf->agf_rmap_blocks = cpu_to_be32(blocks);
+ freesp_blocks += blocks - 1;
+
+ /* Update the AGF counters from the refcountbt. */
+ if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+ cur = xfs_refcountbt_init_cursor(mp, sc->tp, agf_bp,
+ sc->sa.agno, NULL);
+ error = xfs_btree_count_blocks(cur, &blocks);
+ if (error)
+ goto err;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ agf->agf_refcount_blocks = cpu_to_be32(blocks);
+ }
+ agf->agf_btreeblks = cpu_to_be32(freesp_blocks);
+ cur = NULL;
+
+ /* Trigger reinitialization of the in-core data. */
+ if (raa.freeblks != be32_to_cpu(old_agf.agf_freeblks) ||
+ freesp_blocks != be32_to_cpu(old_agf.agf_btreeblks) ||
+ raa.longest != be32_to_cpu(old_agf.agf_longest) ||
+ fab[0].level != be32_to_cpu(old_agf.agf_levels[XFS_BTNUM_BNOi]) ||
+ fab[1].level != be32_to_cpu(old_agf.agf_levels[XFS_BTNUM_CNTi]) ||
+ fab[2].level != be32_to_cpu(old_agf.agf_levels[XFS_BTNUM_RMAPi]) ||
+ fab[3].level != be32_to_cpu(old_agf.agf_refcount_level)) {
+ pag = xfs_perag_get(mp, sc->sa.agno);
+ if (pag->pagf_init) {
+ pag->pagf_freeblks = be32_to_cpu(agf->agf_freeblks);
+ pag->pagf_btreeblks = be32_to_cpu(agf->agf_btreeblks);
+ pag->pagf_flcount = be32_to_cpu(agf->agf_flcount);
+ pag->pagf_longest = be32_to_cpu(agf->agf_longest);
+ pag->pagf_levels[XFS_BTNUM_BNOi] =
+ be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
+ pag->pagf_levels[XFS_BTNUM_CNTi] =
+ be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
+ pag->pagf_levels[XFS_BTNUM_RMAPi] =
+ be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
+ pag->pagf_refcount_level =
+ be32_to_cpu(agf->agf_refcount_level);
+ }
+ xfs_perag_put(pag);
+ sc->reset_counters = true;
+ }
+
+ /* Write this to disk. */
+ xfs_trans_buf_set_type(sc->tp, agf_bp, XFS_BLFT_AGF_BUF);
+ xfs_trans_log_buf(sc->tp, agf_bp, 0, mp->m_sb.sb_sectsize - 1);
+ return error;
+
+err:
+ if (cur)
+ xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
+ XFS_BTREE_NOERROR);
+ *agf = old_agf;
+ return error;
+}
+
/* AGFL */
struct xfs_scrub_agfl {
@@ -905,6 +1160,223 @@ xfs_scrub_agfl(
return error;
}
+/* AGFL repair. */
+
+struct xfs_repair_agfl {
+ struct list_head freesp_list;
+ struct list_head agmeta_list;
+ struct xfs_scrub_context *sc;
+};
+
+/* Record all freespace information. */
+STATIC int
+xfs_repair_agfl_rmap_fn(
+ struct xfs_btree_cur *cur,
+ struct xfs_rmap_irec *rec,
+ void *priv)
+{
+ struct xfs_repair_agfl *ra = priv;
+ struct xfs_buf *bp;
+ xfs_fsblock_t fsb;
+ int i;
+ int error = 0;
+
+ if (xfs_scrub_should_terminate(&error))
+ return error;
+
+ /* Record all the OWN_AG blocks... */
+ if (rec->rm_owner == XFS_RMAP_OWN_AG) {
+ fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+ rec->rm_startblock);
+ error = xfs_repair_collect_btree_extent(ra->sc,
+ &ra->freesp_list, fsb, rec->rm_blockcount);
+ if (error)
+ return error;
+ }
+
+ /* ...and all the rmapbt blocks... */
+ for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
+ xfs_btree_get_block(cur, i, &bp);
+ if (!bp)
+ continue;
+ fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+ error = xfs_repair_collect_btree_extent(ra->sc,
+ &ra->agmeta_list, fsb, 1);
+ if (error)
+ return error;
+ }
+
+ return 0;
+}
+
+/* Add a btree block to the agmeta list. */
+STATIC int
+xfs_repair_agfl_visit_btblock(
+ struct xfs_btree_cur *cur,
+ int level,
+ void *priv)
+{
+ struct xfs_repair_agfl *ra = priv;
+ struct xfs_buf *bp;
+ xfs_fsblock_t fsb;
+ int error = 0;
+
+ if (xfs_scrub_should_terminate(&error))
+ return error;
+
+ xfs_btree_get_block(cur, level, &bp);
+ if (!bp)
+ return 0;
+
+ fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+ return xfs_repair_collect_btree_extent(ra->sc, &ra->agmeta_list,
+ fsb, 1);
+}
+
+/* Repair the AGFL. */
+int
+xfs_repair_agfl(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_repair_agfl ra;
+ struct xfs_owner_info oinfo;
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_buf *agf_bp;
+ struct xfs_buf *agfl_bp;
+ struct xfs_agf *agf;
+ struct xfs_agfl *agfl;
+ struct xfs_btree_cur *cur = NULL;
+ struct xfs_perag *pag;
+ __be32 *agfl_bno;
+ struct xfs_repair_btree_extent *rbe;
+ struct xfs_repair_btree_extent *n;
+ xfs_agblock_t flcount;
+ xfs_agblock_t agbno;
+ xfs_agblock_t bno;
+ xfs_agblock_t old_flcount;
+ int error;
+
+ /* We require the rmapbt to rebuild anything. */
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return -EOPNOTSUPP;
+
+ INIT_LIST_HEAD(&ra.freesp_list);
+ INIT_LIST_HEAD(&ra.agmeta_list);
+ ra.sc = sc;
+
+ error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
+ if (error)
+ return error;
+ if (!agf_bp)
+ return -ENOMEM;
+
+ error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+ XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGFL_DADDR(mp)),
+ XFS_FSS_TO_BB(mp, 1), 0, &agfl_bp, NULL);
+ if (error)
+ return error;
+ agfl_bp->b_ops = &xfs_agfl_buf_ops;
+
+ /* Find all space used by the free space btrees & rmapbt. */
+ cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
+ error = xfs_rmap_query_all(cur, xfs_repair_agfl_rmap_fn, &ra);
+ if (error)
+ goto err;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+
+ /* Find all space used by bnobt. */
+ cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+ XFS_BTNUM_BNO);
+ error = xfs_btree_visit_blocks(cur, xfs_repair_agfl_visit_btblock,
+ &ra);
+ if (error)
+ goto err;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+
+ /* Find all space used by cntbt. */
+ cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
+ XFS_BTNUM_CNT);
+ error = xfs_btree_visit_blocks(cur, xfs_repair_agfl_visit_btblock,
+ &ra);
+ if (error)
+ goto err;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+
+ /*
+ * Drop the freesp meta blocks that are in use by btrees.
+ * The remaining blocks /should/ be AGFL blocks.
+ */
+ error = xfs_repair_subtract_extents(sc, &ra.freesp_list,
+ &ra.agmeta_list);
+ if (error)
+ goto err;
+ xfs_repair_cancel_btree_extents(sc, &ra.agmeta_list);
+
+ /* Start rewriting the header. */
+ agfl = XFS_BUF_TO_AGFL(agfl_bp);
+ memset(agfl, 0xFF, mp->m_sb.sb_sectsize);
+ agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
+ agfl->agfl_seqno = cpu_to_be32(sc->sa.agno);
+ uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
+
+ /* Fill the AGFL with the remaining blocks. */
+ flcount = 0;
+ agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agfl_bp);
+ list_for_each_entry_safe(rbe, n, &ra.freesp_list, list) {
+ agbno = XFS_FSB_TO_AGBNO(mp, rbe->fsbno);
+
+ trace_xfs_repair_agfl_insert(mp, sc->sa.agno, agbno, rbe->len);
+
+ for (bno = 0; bno < rbe->len; bno++) {
+ if (flcount >= XFS_AGFL_SIZE(mp) - 1)
+ break;
+ agfl_bno[flcount + 1] = cpu_to_be32(agbno + bno);
+ flcount++;
+ }
+ rbe->fsbno += bno;
+ rbe->len -= bno;
+ if (rbe->len)
+ break;
+ list_del(&rbe->list);
+ kmem_free(rbe);
+ }
+
+ /* Update the AGF counters. */
+ agf = XFS_BUF_TO_AGF(agf_bp);
+ old_flcount = be32_to_cpu(agf->agf_flcount);
+ agf->agf_flfirst = cpu_to_be32(1);
+ agf->agf_flcount = cpu_to_be32(flcount);
+ agf->agf_fllast = cpu_to_be32(flcount);
+
+ /* Trigger reinitialization of the in-core data. */
+ if (flcount != old_flcount) {
+ pag = xfs_perag_get(mp, sc->sa.agno);
+ if (pag->pagf_init)
+ pag->pagf_flcount = flcount;
+ xfs_perag_put(pag);
+ sc->reset_counters = true;
+ }
+
+ /* Write AGF and AGFL to disk. */
+ xfs_alloc_log_agf(sc->tp, agf_bp,
+ XFS_AGF_FLFIRST | XFS_AGF_FLLAST | XFS_AGF_FLCOUNT);
+ xfs_trans_buf_set_type(sc->tp, agfl_bp, XFS_BLFT_AGFL_BUF);
+ xfs_trans_log_buf(sc->tp, agfl_bp, 0, mp->m_sb.sb_sectsize - 1);
+
+ /* Dump any AGFL overflow. */
+ xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+ return xfs_repair_reap_btree_extents(sc, &ra.freesp_list, &oinfo,
+ XFS_AG_RESV_AGFL);
+err:
+ xfs_repair_cancel_btree_extents(sc, &ra.agmeta_list);
+ xfs_repair_cancel_btree_extents(sc, &ra.freesp_list);
+ if (cur)
+ xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
+ XFS_BTREE_NOERROR);
+ return error;
+}
+
/* AGI */
/* Scrub the AGI. */
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index fc5dcaa..6c544eb 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -71,5 +71,7 @@ int xfs_repair_setup_btree_extent_collection(struct xfs_scrub_context *sc);
/* Metadata repairers */
int xfs_repair_superblock(struct xfs_scrub_context *sc);
+int xfs_repair_agf(struct xfs_scrub_context *sc);
+int xfs_repair_agfl(struct xfs_scrub_context *sc);
#endif /* __XFS_SCRUB_REPAIR_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 401a446..3ddfe09 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -221,10 +221,12 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
{ /* agf */
.setup = xfs_scrub_setup_ag_header,
.scrub = xfs_scrub_agf,
+ .repair = xfs_repair_agf,
},
{ /* agfl */
.setup = xfs_scrub_setup_ag_header,
.scrub = xfs_scrub_agfl,
+ .repair = xfs_repair_agfl,
},
{ /* agi */
.setup = xfs_scrub_setup_ag_header,
* [PATCH 12/19] xfs: rebuild the AGI
From: Darrick J. Wong @ 2017-08-25 22:18 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Rebuild the AGI header items with some help from the rmapbt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
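Not part of the commit: a userspace sketch of how the "help from the
rmapbt" works, as I read xfs_repair_find_ag_btree_roots(). Every
candidate block whose magic matches is compared against the best one
seen so far; the block with the highest level wins and becomes the
root, and the tree height is that level plus one. The toy_* names and
magic values are hypothetical, and the sketch skips the uuid check, the
AGFL filter and the buffer verifiers that the real helper applies.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NULLAGBLOCK   UINT32_MAX	/* hypothetical "no block" marker */
#define TOY_MAGIC_INOBT   1U		/* made-up magic numbers */
#define TOY_MAGIC_FINOBT  2U

/* One candidate btree block found while walking the owner's extents. */
struct toy_block {
	uint32_t     agbno;	/* where the block lives in the AG */
	uint32_t     magic;	/* magic number read from the block */
	unsigned int level;	/* btree level stored in the block header */
};

/* Search key (magic) plus the result of the search (root, height). */
struct toy_root {
	uint32_t     magic;
	uint32_t     root;
	unsigned int height;
};

/* Pick the highest-level block with a matching magic as the root. */
void toy_find_root(const struct toy_block *blocks, size_t nr,
		   struct toy_root *r)
{
	unsigned int best_level = 0;
	size_t i;

	r->root = TOY_NULLAGBLOCK;
	r->height = 0;

	for (i = 0; i < nr; i++) {
		if (blocks[i].magic != r->magic)
			continue;
		/* Keep the current candidate unless this one is higher up. */
		if (r->root != TOY_NULLAGBLOCK &&
		    blocks[i].level <= best_level)
			continue;
		r->root = blocks[i].agbno;
		best_level = blocks[i].level;
	}

	/* Levels count from zero at the leaves, so height is level + 1. */
	if (r->root != TOY_NULLAGBLOCK)
		r->height = best_level + 1;
}
```

A caller would run this once per btree type (inobt, and finobt if
enabled) and treat a root of TOY_NULLAGBLOCK as "cannot repair", which
corresponds to the -EFSCORRUPTED returns in xfs_repair_agi().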
fs/xfs/scrub/agheader.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/repair.h | 1
fs/xfs/scrub/scrub.c | 1
3 files changed, 102 insertions(+)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index 40c31ed..0f83ea3 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -1536,3 +1536,103 @@ xfs_scrub_agi(
out:
return error;
}
+
+/* Repair the AGI. */
+int
+xfs_repair_agi(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_repair_find_ag_btree fab[] = {
+ {XFS_RMAP_OWN_INOBT, &xfs_inobt_buf_ops, XFS_IBT_CRC_MAGIC, 0, 0},
+ {XFS_RMAP_OWN_INOBT, &xfs_inobt_buf_ops, XFS_FIBT_CRC_MAGIC, 0, 0},
+ {0, NULL, 0, 0, 0},
+ };
+ struct xfs_agi old_agi;
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_buf *agi_bp;
+ struct xfs_buf *agf_bp;
+ struct xfs_agi *agi;
+ struct xfs_btree_cur *cur;
+ struct xfs_perag *pag;
+ xfs_agino_t old_count;
+ xfs_agino_t old_freecount;
+ xfs_agino_t count;
+ xfs_agino_t freecount;
+ int bucket;
+ int error;
+
+ /* We require the rmapbt to rebuild anything. */
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return -EOPNOTSUPP;
+
+ error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+ XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGI_DADDR(mp)),
+ XFS_FSS_TO_BB(mp, 1), 0, &agi_bp, NULL);
+ if (error)
+ return error;
+ agi_bp->b_ops = &xfs_agi_buf_ops;
+
+ error = xfs_alloc_read_agf(mp, sc->tp, sc->sa.agno, 0, &agf_bp);
+ if (error)
+ return error;
+ if (!agf_bp)
+ return -ENOMEM;
+
+ /* Find the btree roots. */
+ error = xfs_repair_find_ag_btree_roots(sc, agf_bp, fab, NULL);
+ if (error)
+ return error;
+ if (fab[0].root == NULLAGBLOCK || fab[0].level > XFS_BTREE_MAXLEVELS)
+ return -EFSCORRUPTED;
+ if (xfs_sb_version_hasfinobt(&mp->m_sb) &&
+ (fab[1].root == NULLAGBLOCK || fab[1].level > XFS_BTREE_MAXLEVELS))
+ return -EFSCORRUPTED;
+
+ /* Start rewriting the header. */
+ agi = XFS_BUF_TO_AGI(agi_bp);
+ old_agi = *agi;
+ old_count = be32_to_cpu(old_agi.agi_count);
+ old_freecount = be32_to_cpu(old_agi.agi_freecount);
+ memset(agi, 0, mp->m_sb.sb_sectsize);
+ agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC);
+ agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION);
+ agi->agi_seqno = cpu_to_be32(sc->sa.agno);
+ agi->agi_length = cpu_to_be32(xfs_scrub_ag_blocks(mp, sc->sa.agno));
+ agi->agi_newino = cpu_to_be32(NULLAGINO);
+ agi->agi_dirino = cpu_to_be32(NULLAGINO);
+ if (xfs_sb_version_hascrc(&mp->m_sb))
+ uuid_copy(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid);
+ for (bucket = 0; bucket < XFS_AGI_UNLINKED_BUCKETS; bucket++)
+ agi->agi_unlinked[bucket] = cpu_to_be32(NULLAGINO);
+ agi->agi_root = cpu_to_be32(fab[0].root);
+ agi->agi_level = cpu_to_be32(fab[0].level);
+ if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
+ agi->agi_free_root = cpu_to_be32(fab[1].root);
+ agi->agi_free_level = cpu_to_be32(fab[1].level);
+ }
+
+ /* Update the AGI counters. */
+ cur = xfs_inobt_init_cursor(mp, sc->tp, agi_bp, sc->sa.agno,
+ XFS_BTNUM_INO);
+ error = xfs_ialloc_count_inodes(cur, &count, &freecount);
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ if (error)
+ goto err;
+ agi->agi_count = cpu_to_be32(count);
+ agi->agi_freecount = cpu_to_be32(freecount);
+ if (old_count != count || old_freecount != freecount) {
+ pag = xfs_perag_get(mp, sc->sa.agno);
+ pag->pagi_init = 0;
+ xfs_perag_put(pag);
+ sc->reset_counters = true;
+ }
+
+ /* Write this to disk. */
+ xfs_trans_buf_set_type(sc->tp, agi_bp, XFS_BLFT_AGI_BUF);
+ xfs_trans_log_buf(sc->tp, agi_bp, 0, mp->m_sb.sb_sectsize - 1);
+ return error;
+
+err:
+ *agi = old_agi;
+ return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 6c544eb..e80f2e3 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -73,5 +73,6 @@ int xfs_repair_setup_btree_extent_collection(struct xfs_scrub_context *sc);
int xfs_repair_superblock(struct xfs_scrub_context *sc);
int xfs_repair_agf(struct xfs_scrub_context *sc);
int xfs_repair_agfl(struct xfs_scrub_context *sc);
+int xfs_repair_agi(struct xfs_scrub_context *sc);
#endif /* __XFS_SCRUB_REPAIR_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 3ddfe09..03da10a 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -231,6 +231,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
{ /* agi */
.setup = xfs_scrub_setup_ag_header,
.scrub = xfs_scrub_agi,
+ .repair = xfs_repair_agi,
},
{ /* bnobt */
.setup = xfs_scrub_setup_ag_allocbt,
* [PATCH 13/19] xfs: repair free space btrees
2017-08-25 22:16 [PATCH v9 00/19] xfs: online fs repair support Darrick J. Wong
` (11 preceding siblings ...)
2017-08-25 22:18 ` [PATCH 12/19] xfs: rebuild the AGI Darrick J. Wong
@ 2017-08-25 22:18 ` Darrick J. Wong
2017-08-25 22:18 ` [PATCH 14/19] xfs: repair inode btrees Darrick J. Wong
` (5 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-25 22:18 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>

Rebuild the free space btrees from the gaps in the rmap btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
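Note on the approach: the free space records are derived by walking the
rmap records in startblock order and emitting every gap between the end
of the space seen so far and the start of the next record, plus a final
gap up to the end of the AG. The following is a minimal userspace sketch
of that idea only. The struct and function names (rmap_rec, free_ext,
find_free_gaps) are hypothetical stand-ins, not the kernel types, and
the sketch fills an array where the patch builds a list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures. */
struct rmap_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
};

struct free_ext {
	uint32_t	bno;
	uint32_t	len;
};

/*
 * Walk rmap records sorted by startblock and record each gap between
 * them, plus the gap between the last record and the end of the AG.
 * Returns the number of free extents found, or -1 if out[] is full.
 */
static int
find_free_gaps(
	const struct rmap_rec	*recs,
	size_t			nr_recs,
	uint32_t		agend,
	struct free_ext		*out,
	size_t			max_out)
{
	uint32_t		next_bno = 0;
	uint32_t		end;
	size_t			i;
	size_t			n = 0;

	assert(out != NULL);

	for (i = 0; i < nr_recs; i++) {
		if (recs[i].startblock > next_bno) {
			if (n == max_out)
				return -1;
			out[n].bno = next_bno;
			out[n].len = recs[i].startblock - next_bno;
			n++;
		}
		/* Records can overlap, so never move next_bno backwards. */
		end = recs[i].startblock + recs[i].blockcount;
		if (end > next_bno)
			next_bno = end;
	}

	/* Space between the last mapping and the end of the AG. */
	if (next_bno < agend) {
		if (n == max_out)
			return -1;
		out[n].bno = next_bno;
		out[n].len = agend - next_bno;
		n++;
	}
	return (int)n;
}
```

Taking the maximum when advancing next_bno matters because rmap records
for shared blocks can overlap; without it a short overlapping record
would pull the cursor backwards and produce a bogus gap.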
fs/xfs/scrub/alloc.c | 411 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/common.c | 16 ++
fs/xfs/scrub/repair.h | 1
fs/xfs/scrub/scrub.c | 2
4 files changed, 430 insertions(+)
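Note on the fdblocks adjustment: per the comment in the patch, btreeblks
counts the non-root blocks of the free space and rmap btrees, so the
number of blocks held by the old bnobt and cntbt is btreeblks, plus the
two roots, minus the non-root rmapbt blocks. A small worked sketch of
that arithmetic, assuming rmap_blocks counts every rmapbt block including
its root (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Blocks held by the old bnobt and cntbt together.
 *
 * btreeblks:   non-root blocks of the bnobt, cntbt and rmapbt
 * rmap_blocks: all rmapbt blocks, root included (assumed >= 1)
 */
static uint32_t
old_allocbt_blocks(
	uint32_t	btreeblks,
	uint32_t	rmap_blocks)
{
	assert(rmap_blocks >= 1);
	return btreeblks + 2 - (rmap_blocks - 1);
}
```

For example, a 3-block bnobt, a 4-block cntbt and a 5-block rmapbt give
btreeblks = 2 + 3 + 4 = 9 and rmap_blocks = 5, so the old free space
btrees hold 9 + 2 - 4 = 7 blocks.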
diff --git a/fs/xfs/scrub/alloc.c b/fs/xfs/scrub/alloc.c
index 812843c..4daf78f 100644
--- a/fs/xfs/scrub/alloc.c
+++ b/fs/xfs/scrub/alloc.c
@@ -29,15 +29,19 @@
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_sb.h"
+#include "xfs_inode.h"
#include "xfs_rmap.h"
#include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
#include "xfs_ialloc.h"
+#include "xfs_rmap_btree.h"
#include "xfs_refcount.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/btree.h"
#include "scrub/trace.h"
+#include "scrub/repair.h"
/*
* Set us up to scrub free space btrees.
@@ -182,3 +186,410 @@ xfs_scrub_cntbt(
{
return xfs_scrub_allocbt(sc, XFS_BTNUM_CNT);
}
+
+/* Free space btree repair. */
+
+struct xfs_repair_alloc_extent {
+ struct list_head list;
+ xfs_agblock_t bno;
+ xfs_extlen_t len;
+};
+
+struct xfs_repair_alloc {
+ struct list_head extlist;
+ struct list_head btlist; /* OWN_AG blocks */
+ struct list_head nobtlist; /* rmapbt/agfl blocks */
+ struct xfs_scrub_context *sc;
+ xfs_agblock_t next_bno;
+ uint64_t nr_records;
+};
+
+/* Record extents that aren't in use from gaps in the rmap records. */
+STATIC int
+xfs_repair_alloc_extent_fn(
+ struct xfs_btree_cur *cur,
+ struct xfs_rmap_irec *rec,
+ void *priv)
+{
+ struct xfs_repair_alloc *ra = priv;
+ struct xfs_repair_alloc_extent *rae;
+ struct xfs_buf *bp;
+ xfs_fsblock_t fsb;
+ int i;
+ int error;
+
+ /* Record all the OWN_AG blocks... */
+ if (rec->rm_owner == XFS_RMAP_OWN_AG) {
+ fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+ rec->rm_startblock);
+ error = xfs_repair_collect_btree_extent(ra->sc,
+ &ra->btlist, fsb, rec->rm_blockcount);
+ if (error)
+ return error;
+ }
+
+ /* ...and all the rmapbt blocks... */
+ for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
+ xfs_btree_get_block(cur, i, &bp);
+ if (!bp)
+ continue;
+ fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+ error = xfs_repair_collect_btree_extent(ra->sc,
+ &ra->nobtlist, fsb, 1);
+ if (error)
+ return error;
+ }
+
+ /* ...and all the free space. */
+ if (rec->rm_startblock > ra->next_bno) {
+ trace_xfs_repair_alloc_extent_fn(cur->bc_mp, cur->bc_private.a.agno,
+ rec->rm_startblock, rec->rm_blockcount,
+ rec->rm_owner, rec->rm_offset, rec->rm_flags);
+
+ rae = kmem_alloc(sizeof(struct xfs_repair_alloc_extent),
+ KM_MAYFAIL | KM_NOFS);
+ if (!rae)
+ return -ENOMEM;
+ INIT_LIST_HEAD(&rae->list);
+ rae->bno = ra->next_bno;
+ rae->len = rec->rm_startblock - ra->next_bno;
+ list_add_tail(&rae->list, &ra->extlist);
+ ra->nr_records++;
+ }
+ ra->next_bno = max_t(xfs_agblock_t, ra->next_bno,
+ rec->rm_startblock + rec->rm_blockcount);
+ return 0;
+}
+
+/* Find the longest free extent in the list. */
+static struct xfs_repair_alloc_extent *
+xfs_repair_allocbt_get_longest(
+ struct xfs_repair_alloc *ra)
+{
+ struct xfs_repair_alloc_extent *rae;
+ struct xfs_repair_alloc_extent *longest = NULL;
+
+ list_for_each_entry(rae, &ra->extlist, list)
+ if (!longest || rae->len > longest->len)
+ longest = rae;
+ return longest;
+}
+
+/* Collect an AGFL block for the not-to-release list. */
+static int
+xfs_repair_collect_agfl_block(
+ struct xfs_scrub_context *sc,
+ xfs_agblock_t bno,
+ void *data)
+{
+ struct xfs_repair_alloc *ra = data;
+ xfs_fsblock_t fsb;
+
+ fsb = XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, bno);
+ return xfs_repair_collect_btree_extent(sc, &ra->nobtlist, fsb, 1);
+}
+
+/* Compare two btree extents. */
+static int
+xfs_repair_allocbt_extent_cmp(
+ void *priv,
+ struct list_head *a,
+ struct list_head *b)
+{
+ struct xfs_repair_alloc_extent *ap;
+ struct xfs_repair_alloc_extent *bp;
+
+ ap = container_of(a, struct xfs_repair_alloc_extent, list);
+ bp = container_of(b, struct xfs_repair_alloc_extent, list);
+
+ if (ap->bno > bp->bno)
+ return 1;
+ else if (ap->bno < bp->bno)
+ return -1;
+ return 0;
+}
+
+/* Put an extent onto the free list. */
+STATIC int
+xfs_repair_allocbt_free_extent(
+ struct xfs_scrub_context *sc,
+ xfs_fsblock_t fsbno,
+ xfs_extlen_t len,
+ struct xfs_owner_info *oinfo)
+{
+ int error;
+
+ error = xfs_free_extent(sc->tp, fsbno, len, oinfo, 0);
+ if (error)
+ return error;
+ error = xfs_repair_roll_ag_trans(sc);
+ if (error)
+ return error;
+ return xfs_mod_fdblocks(sc->mp, -(int64_t)len, false);
+}
+
+/* Allocate a block from the (cached) longest extent in the AG. */
+STATIC xfs_fsblock_t
+xfs_repair_allocbt_alloc_from_longest(
+ struct xfs_repair_alloc *ra,
+ struct xfs_repair_alloc_extent **longest)
+{
+ xfs_fsblock_t fsb;
+
+ if (*longest && (*longest)->len == 0) {
+ list_del(&(*longest)->list);
+ kmem_free(*longest);
+ *longest = NULL;
+ }
+
+ if (*longest == NULL) {
+ *longest = xfs_repair_allocbt_get_longest(ra);
+ if (*longest == NULL)
+ return NULLFSBLOCK;
+ }
+
+ fsb = XFS_AGB_TO_FSB(ra->sc->mp, ra->sc->sa.agno, (*longest)->bno);
+ (*longest)->bno++;
+ (*longest)->len--;
+ return fsb;
+}
+
+/* Insert a free space record into the allocbt. */
+static int
+xfs_repair_allocbt_insert_free_space(
+ struct xfs_scrub_context *sc,
+ struct xfs_owner_info *oinfo,
+ struct xfs_repair_alloc_extent *rae)
+{
+ int error;
+
+ error = xfs_repair_allocbt_free_extent(sc,
+ XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, rae->bno),
+ rae->len, oinfo);
+ if (error)
+ return error;
+ list_del(&rae->list);
+ kmem_free(rae);
+ return 0;
+}
+
+/* Repair the freespace btrees for some AG. */
+int
+xfs_repair_allocbt(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_repair_alloc ra;
+ struct xfs_owner_info oinfo;
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_btree_cur *cur = NULL;
+ struct xfs_repair_alloc_extent *longest = NULL;
+ struct xfs_repair_alloc_extent *rae;
+ struct xfs_repair_alloc_extent *n;
+ struct xfs_perag *pag;
+ struct xfs_agf *agf;
+ struct xfs_buf *bp;
+ xfs_fsblock_t bnofsb;
+ xfs_fsblock_t cntfsb;
+ xfs_extlen_t oldf;
+ xfs_extlen_t nr_blocks;
+ xfs_agblock_t agend;
+ int error;
+
+ /* We require the rmapbt to rebuild anything. */
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return -EOPNOTSUPP;
+
+ /*
+ * Make sure the busy extent list is clear because we can't put
+ * extents on there twice.
+ */
+ pag = xfs_perag_get(sc->mp, sc->sa.agno);
+ spin_lock(&pag->pagb_lock);
+ if (pag->pagb_tree.rb_node) {
+ spin_unlock(&pag->pagb_lock);
+ xfs_perag_put(pag);
+ return -EDEADLOCK;
+ }
+ spin_unlock(&pag->pagb_lock);
+ xfs_perag_put(pag);
+
+ /*
+ * Collect all reverse mappings for free extents, and the rmapbt
+ * blocks. We can discover the rmapbt blocks completely from a
+ * query_all handler because there are always rmapbt entries.
+ * (One cannot use query_all to visit all of a btree's blocks
+ * unless that btree is guaranteed to have at least one entry.)
+ */
+ INIT_LIST_HEAD(&ra.extlist);
+ INIT_LIST_HEAD(&ra.btlist);
+ INIT_LIST_HEAD(&ra.nobtlist);
+ ra.next_bno = 0;
+ ra.nr_records = 0;
+ ra.sc = sc;
+
+ cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
+ error = xfs_rmap_query_all(cur, xfs_repair_alloc_extent_fn, &ra);
+ if (error)
+ goto out;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+
+ /* Insert a record for space between the last rmap and EOAG. */
+ agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+ agend = be32_to_cpu(agf->agf_length);
+ if (ra.next_bno < agend) {
+ rae = kmem_alloc(sizeof(struct xfs_repair_alloc_extent),
+ KM_MAYFAIL | KM_NOFS);
+ if (!rae) {
+ error = -ENOMEM;
+ goto out;
+ }
+ INIT_LIST_HEAD(&rae->list);
+ rae->bno = ra.next_bno;
+ rae->len = agend - ra.next_bno;
+ list_add_tail(&rae->list, &ra.extlist);
+ ra.nr_records++;
+ }
+
+ /* Collect all the AGFL blocks. */
+ error = xfs_scrub_walk_agfl(sc, xfs_repair_collect_agfl_block, &ra);
+ if (error)
+ goto out;
+
+ /* Do we actually have enough space to do this? */
+ pag = xfs_perag_get(mp, sc->sa.agno);
+ nr_blocks = 2 * xfs_allocbt_calc_size(mp, ra.nr_records);
+ if (!xfs_repair_ag_has_space(pag, nr_blocks, XFS_AG_RESV_NONE)) {
+ xfs_perag_put(pag);
+ error = -ENOSPC;
+ goto out;
+ }
+ xfs_perag_put(pag);
+
+ /* Invalidate all the bnobt/cntbt blocks in btlist. */
+ error = xfs_repair_subtract_extents(sc, &ra.btlist, &ra.nobtlist);
+ if (error)
+ goto out;
+ xfs_repair_cancel_btree_extents(sc, &ra.nobtlist);
+ error = xfs_repair_invalidate_blocks(sc, &ra.btlist);
+ if (error)
+ goto out;
+
+ /* Allocate new bnobt root. */
+ bnofsb = xfs_repair_allocbt_alloc_from_longest(&ra, &longest);
+ if (bnofsb == NULLFSBLOCK) {
+ error = -ENOSPC;
+ goto out;
+ }
+
+ /* Allocate new cntbt root. */
+ cntfsb = xfs_repair_allocbt_alloc_from_longest(&ra, &longest);
+ if (cntfsb == NULLFSBLOCK) {
+ error = -ENOSPC;
+ goto out;
+ }
+
+ agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+ /* Initialize new bnobt root. */
+ error = xfs_repair_init_btblock(sc, bnofsb, &bp, XFS_BTNUM_BNO,
+ &xfs_allocbt_buf_ops);
+ if (error)
+ goto out;
+ agf->agf_roots[XFS_BTNUM_BNOi] =
+ cpu_to_be32(XFS_FSB_TO_AGBNO(mp, bnofsb));
+ agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
+
+ /* Initialize new cntbt root. */
+ error = xfs_repair_init_btblock(sc, cntfsb, &bp, XFS_BTNUM_CNT,
+ &xfs_allocbt_buf_ops);
+ if (error)
+ goto out;
+ agf->agf_roots[XFS_BTNUM_CNTi] =
+ cpu_to_be32(XFS_FSB_TO_AGBNO(mp, cntfsb));
+ agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
+
+ /*
+ * Since we're abandoning the old bnobt/cntbt, we have to
+ * decrease fdblocks by the # of blocks in those trees.
+ * btreeblks counts the non-root blocks of the free space
+ * and rmap btrees. Do this before resetting the AGF counters.
+ */
+ pag = xfs_perag_get(mp, sc->sa.agno);
+ oldf = pag->pagf_btreeblks + 2;
+ oldf -= (be32_to_cpu(agf->agf_rmap_blocks) - 1);
+ error = xfs_mod_fdblocks(mp, -(int64_t)oldf, false);
+ if (error) {
+ xfs_perag_put(pag);
+ goto out;
+ }
+
+ /* Reset the perag info. */
+ pag->pagf_btreeblks = be32_to_cpu(agf->agf_rmap_blocks) - 1;
+ pag->pagf_freeblks = 0;
+ pag->pagf_longest = 0;
+ pag->pagf_levels[XFS_BTNUM_BNOi] =
+ be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
+ pag->pagf_levels[XFS_BTNUM_CNTi] =
+ be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
+
+ /* Now reset the AGF counters. */
+ agf->agf_btreeblks = cpu_to_be32(pag->pagf_btreeblks);
+ agf->agf_freeblks = cpu_to_be32(pag->pagf_freeblks);
+ agf->agf_longest = cpu_to_be32(pag->pagf_longest);
+ xfs_perag_put(pag);
+ xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp,
+ XFS_AGF_ROOTS | XFS_AGF_LEVELS | XFS_AGF_BTREEBLKS |
+ XFS_AGF_LONGEST | XFS_AGF_FREEBLKS);
+ error = xfs_repair_roll_ag_trans(sc);
+ if (error)
+ goto out;
+
+ /*
+ * Insert the longest free extent in case it's necessary to
+ * refresh the AGFL with multiple blocks.
+ */
+ xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_UNKNOWN);
+ if (longest && longest->len == 0) {
+ error = xfs_repair_allocbt_insert_free_space(sc, &oinfo,
+ longest);
+ if (error)
+ goto out;
+ }
+
+ /* Insert records into the new btrees. */
+ list_sort(NULL, &ra.extlist, xfs_repair_allocbt_extent_cmp);
+ list_for_each_entry_safe(rae, n, &ra.extlist, list) {
+ error = xfs_repair_allocbt_insert_free_space(sc, &oinfo, rae);
+ if (error)
+ goto out;
+ }
+
+ /* Add rmap records for the btree roots */
+ xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
+ error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno,
+ XFS_FSB_TO_AGBNO(mp, bnofsb), 1, &oinfo);
+ if (error)
+ goto out;
+ error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno,
+ XFS_FSB_TO_AGBNO(mp, cntfsb), 1, &oinfo);
+ if (error)
+ goto out;
+
+ /* Free all the OWN_AG blocks that are not in the rmapbt/agfl. */
+ error = xfs_repair_reap_btree_extents(sc, &ra.btlist, &oinfo,
+ XFS_AG_RESV_NONE);
+ if (error)
+ goto out;
+
+ return 0;
+out:
+ xfs_repair_cancel_btree_extents(sc, &ra.btlist);
+ xfs_repair_cancel_btree_extents(sc, &ra.nobtlist);
+ if (cur)
+ xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+ list_for_each_entry_safe(rae, n, &ra.extlist, list) {
+ list_del(&rae->list);
+ kmem_free(rae);
+ }
+ return error;
+}
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 515bee6..8c00acb 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -42,6 +42,8 @@
#include "xfs_refcount_btree.h"
#include "xfs_rmap.h"
#include "xfs_rmap_btree.h"
+#include "xfs_log.h"
+#include "xfs_trans_priv.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
@@ -711,8 +713,22 @@ xfs_scrub_setup_ag_btree(
struct xfs_inode *ip,
bool force_log)
{
+ struct xfs_mount *mp = sc->mp;
int error;
+ /*
+ * Push everything out of the log onto disk prior to checking.
+ * Force everything in memory out to disk if we're repairing.
+ * This ensures we won't get tripped up by btree blocks sitting
+ * in memory waiting to have LSNs stamped in.
+ */
+ if (force_log || (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)) {
+ error = _xfs_log_force(mp, XFS_LOG_SYNC, NULL);
+ if (error)
+ return error;
+ xfs_ail_push_all_sync(mp->m_ail);
+ }
+
error = xfs_scrub_setup_ag_header(sc, ip);
if (error)
return error;
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index e80f2e3..5756d27 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -74,5 +74,6 @@ int xfs_repair_superblock(struct xfs_scrub_context *sc);
int xfs_repair_agf(struct xfs_scrub_context *sc);
int xfs_repair_agfl(struct xfs_scrub_context *sc);
int xfs_repair_agi(struct xfs_scrub_context *sc);
+int xfs_repair_allocbt(struct xfs_scrub_context *sc);
#endif /* __XFS_SCRUB_REPAIR_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 03da10a..b15c320 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -236,10 +236,12 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
{ /* bnobt */
.setup = xfs_scrub_setup_ag_allocbt,
.scrub = xfs_scrub_bnobt,
+ .repair = xfs_repair_allocbt,
},
{ /* cntbt */
.setup = xfs_scrub_setup_ag_allocbt,
.scrub = xfs_scrub_cntbt,
+ .repair = xfs_repair_allocbt,
},
{ /* inobt */
.setup = xfs_scrub_setup_ag_iallocbt,
* [PATCH 14/19] xfs: repair inode btrees
2017-08-25 22:16 [PATCH v9 00/19] xfs: online fs repair support Darrick J. Wong
` (12 preceding siblings ...)
2017-08-25 22:18 ` [PATCH 13/19] xfs: repair free space btrees Darrick J. Wong
@ 2017-08-25 22:18 ` Darrick J. Wong
2017-08-25 22:18 ` [PATCH 15/19] xfs: rebuild the rmapbt Darrick J. Wong
` (4 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-25 22:18 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>

Use the rmapbt to find inode chunks, query the chunks to compute
hole and free masks, and with that information rebuild the inobt
and finobt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
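Note on the mask computation: each inode cluster found via the rmapbt is
folded into a 64-inode chunk record by clearing the free bits of inodes
that are in use and clearing the hole bits of the slots that the cluster
fills. The following is a minimal userspace sketch of that folding step
only. The names (chunk_rec, chunk_init, chunk_add_cluster, maskn) are
hypothetical; the constants mirror the 64 inodes per chunk and 16
holemask bits used by the inobt record format.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64
#define INODES_PER_HOLEMASK_BIT	4	/* 64 inodes / 16 holemask bits */

/* Hypothetical stand-in for one rebuilt inobt record. */
struct chunk_rec {
	uint64_t	freemask;	/* bit set: inode is free */
	uint16_t	holemask;	/* bit set: 4-inode slot is a hole */
	unsigned int	count;		/* inodes physically present */
	unsigned int	usedcount;	/* inodes in use */
};

/* n consecutive one bits starting at bit i. */
static uint64_t
maskn(unsigned int i, unsigned int n)
{
	return ((n >= 64 ? 0 : (1ULL << n)) - 1) << i;
}

/* Start with everything free and everything a hole. */
static void
chunk_init(struct chunk_rec *rec)
{
	rec->freemask = ~0ULL;
	rec->holemask = 0xFFFF;
	rec->count = 0;
	rec->usedcount = 0;
}

/*
 * Fold one cluster into the record.  cdist is the cluster's inode
 * offset within the chunk, nr_inodes the inodes per cluster, and bit k
 * of inuse is set if the cluster's k-th inode is allocated.
 */
static void
chunk_add_cluster(
	struct chunk_rec	*rec,
	unsigned int		cdist,
	unsigned int		nr_inodes,
	uint64_t		inuse)
{
	uint64_t		usedmask;
	uint16_t		fillmask;
	unsigned int		k;

	assert(cdist + nr_inodes <= INODES_PER_CHUNK);

	usedmask = (inuse & maskn(0, nr_inodes)) << cdist;
	fillmask = (uint16_t)maskn(cdist / INODES_PER_HOLEMASK_BIT,
				   nr_inodes / INODES_PER_HOLEMASK_BIT);

	rec->freemask &= ~usedmask;
	rec->holemask &= (uint16_t)~fillmask;
	rec->count += nr_inodes;
	for (k = 0; k < INODES_PER_CHUNK; k++)
		if (usedmask & (1ULL << k))
			rec->usedcount++;
}
```

Starting from "all free, all holes" means inodes in slots that no
cluster ever fills stay marked free in freemask, which is how sparse
chunks are represented.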
fs/xfs/scrub/ialloc.c | 411 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/repair.h | 1
fs/xfs/scrub/scrub.c | 2
3 files changed, 414 insertions(+)
diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 08baab0..7503ade 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -37,12 +37,15 @@
#include "xfs_log.h"
#include "xfs_trans_priv.h"
#include "xfs_alloc.h"
+#include "xfs_rmap_btree.h"
#include "xfs_refcount.h"
+#include "xfs_error.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/btree.h"
#include "scrub/trace.h"
+#include "scrub/repair.h"
/*
* Set us up to scrub inode btrees.
@@ -463,3 +466,411 @@ xfs_scrub_finobt(
{
return xfs_scrub_iallocbt(sc, XFS_BTNUM_FINO);
}
+
+/* Inode btree repair. */
+
+struct xfs_repair_ialloc_extent {
+ struct list_head list;
+ xfs_inofree_t freemask;
+ xfs_agino_t startino;
+ unsigned int count;
+ unsigned int usedcount;
+ uint16_t holemask;
+};
+
+struct xfs_repair_ialloc {
+ struct list_head extlist;
+ struct list_head btlist;
+ struct xfs_scrub_context *sc;
+ uint64_t nr_records;
+};
+
+/* Set *inuse if the inode is in use. */
+STATIC int
+xfs_repair_ialloc_check_free(
+ struct xfs_btree_cur *cur,
+ struct xfs_buf *bp,
+ xfs_ino_t fsino,
+ xfs_agino_t bpino,
+ bool *inuse)
+{
+ struct xfs_mount *mp = cur->bc_mp;
+ struct xfs_dinode *dip;
+ int error;
+
+ /* Will the in-core inode tell us if it's in use? */
+ error = xfs_icache_inode_is_allocated(mp, cur->bc_tp, fsino, inuse);
+ if (!error)
+ return 0;
+
+ /* Inode uncached or half assembled, read disk buffer */
+ dip = xfs_buf_offset(bp, bpino * mp->m_sb.sb_inodesize);
+ if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC)
+ return -EFSCORRUPTED;
+
+ if (dip->di_version >= 3 && be64_to_cpu(dip->di_ino) != fsino)
+ return -EFSCORRUPTED;
+
+ *inuse = dip->di_mode != 0;
+ return 0;
+}
+
+/* Record extents that belong to inode btrees. */
+STATIC int
+xfs_repair_ialloc_extent_fn(
+ struct xfs_btree_cur *cur,
+ struct xfs_rmap_irec *rec,
+ void *priv)
+{
+ struct xfs_imap imap;
+ struct xfs_repair_ialloc *ri = priv;
+ struct xfs_repair_ialloc_extent *rie;
+ struct xfs_dinode *dip;
+ struct xfs_buf *bp;
+ struct xfs_mount *mp = cur->bc_mp;
+ xfs_ino_t fsino;
+ xfs_inofree_t usedmask;
+ xfs_fsblock_t fsbno;
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ xfs_agino_t cdist;
+ xfs_agino_t startino;
+ xfs_agino_t clusterino;
+ xfs_agino_t nr_inodes;
+ xfs_agino_t inoalign;
+ xfs_agino_t agino;
+ xfs_agino_t rmino;
+ uint16_t fillmask;
+ bool inuse;
+ int blks_per_cluster;
+ int usedcount;
+ int error = 0;
+
+ if (xfs_scrub_should_terminate(&error))
+ return error;
+
+ /* Fragment of the old btrees; dispose of them later. */
+ if (rec->rm_owner == XFS_RMAP_OWN_INOBT) {
+ fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+ rec->rm_startblock);
+ return xfs_repair_collect_btree_extent(ri->sc, &ri->btlist,
+ fsbno, rec->rm_blockcount);
+ }
+
+ /* Skip extents that are not owned by inode chunks. */
+ if (rec->rm_owner != XFS_RMAP_OWN_INODES)
+ return 0;
+
+ agno = cur->bc_private.a.agno;
+ blks_per_cluster = xfs_icluster_size_fsb(mp);
+ nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0);
+
+ if (rec->rm_startblock % blks_per_cluster != 0)
+ return -EFSCORRUPTED;
+
+ trace_xfs_repair_ialloc_extent_fn(mp, cur->bc_private.a.agno,
+ rec->rm_startblock, rec->rm_blockcount, rec->rm_owner,
+ rec->rm_offset, rec->rm_flags);
+
+ /*
+ * Determine the inode block alignment, and where the block
+ * ought to start if it's aligned properly. On a sparse inode
+ * system the rmap doesn't have to start on an alignment boundary,
+ * but the record does. On pre-sparse filesystems, we /must/
+ * start both rmap and inobt on an alignment boundary.
+ */
+ inoalign = xfs_ialloc_cluster_alignment(mp);
+ agbno = rec->rm_startblock;
+ agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
+ rmino = XFS_OFFBNO_TO_AGINO(mp, rounddown(agbno, inoalign), 0);
+ if (!xfs_sb_version_hassparseinodes(&mp->m_sb) && agino != rmino)
+ return -EFSCORRUPTED;
+
+ /*
+ * For each cluster in this blob of inodes, we must calculate the
+ * properly aligned startino of that cluster, then iterate each
+ * cluster to fill in used and filled masks appropriately. We
+ * then use the (startino, used, filled) information to construct
+ * the appropriate inode records.
+ */
+ for (agbno = rec->rm_startblock;
+ agbno < rec->rm_startblock + rec->rm_blockcount;
+ agbno += blks_per_cluster) {
+ /* The per-AG inum of this inode cluster. */
+ agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
+
+ /* The per-AG inum of the inobt record. */
+ startino = rmino +
+ rounddown(agino - rmino, XFS_INODES_PER_CHUNK);
+ cdist = agino - startino;
+
+ /* Every inode in this holemask slot is filled. */
+ fillmask = xfs_inobt_maskn(
+ cdist / XFS_INODES_PER_HOLEMASK_BIT,
+ nr_inodes / XFS_INODES_PER_HOLEMASK_BIT);
+
+ /* Grab the inode cluster buffer. */
+ imap.im_blkno = XFS_AGB_TO_DADDR(mp, agno, agbno);
+ imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster);
+ imap.im_boffset = 0;
+
+ error = xfs_imap_to_bp(mp, cur->bc_tp, &imap,
+ &dip, &bp, 0, XFS_IGET_UNTRUSTED);
+ if (error)
+ return error;
+
+ usedmask = 0;
+ usedcount = 0;
+ /* Which inodes within this cluster are free? */
+ for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
+ fsino = XFS_AGINO_TO_INO(mp, cur->bc_private.a.agno,
+ agino + clusterino);
+ error = xfs_repair_ialloc_check_free(cur, bp, fsino,
+ clusterino, &inuse);
+ if (error) {
+ xfs_trans_brelse(cur->bc_tp, bp);
+ return error;
+ }
+ if (inuse) {
+ usedcount++;
+ usedmask |= XFS_INOBT_MASK(cdist + clusterino);
+ }
+ }
+ xfs_trans_brelse(cur->bc_tp, bp);
+
+ /*
+ * If the last item in the list is our chunk record,
+ * update that.
+ */
+ if (!list_empty(&ri->extlist)) {
+ rie = list_last_entry(&ri->extlist,
+ struct xfs_repair_ialloc_extent, list);
+ if (rie->startino + XFS_INODES_PER_CHUNK > startino) {
+ rie->freemask &= ~usedmask;
+ rie->holemask &= ~fillmask;
+ rie->count += nr_inodes;
+ rie->usedcount += usedcount;
+ continue;
+ }
+ }
+
+ /* New inode chunk; add to the list. */
+ rie = kmem_alloc(sizeof(struct xfs_repair_ialloc_extent),
+ KM_MAYFAIL | KM_NOFS);
+ if (!rie)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&rie->list);
+ rie->startino = startino;
+ rie->freemask = XFS_INOBT_ALL_FREE & ~usedmask;
+ rie->holemask = XFS_INOBT_ALL_FREE & ~fillmask;
+ rie->count = nr_inodes;
+ rie->usedcount = usedcount;
+ list_add_tail(&rie->list, &ri->extlist);
+ ri->nr_records++;
+ }
+
+ return 0;
+}
+
+/* Compare two ialloc extents. */
+static int
+xfs_repair_ialloc_extent_cmp(
+ void *priv,
+ struct list_head *a,
+ struct list_head *b)
+{
+ struct xfs_repair_ialloc_extent *ap;
+ struct xfs_repair_ialloc_extent *bp;
+
+ ap = container_of(a, struct xfs_repair_ialloc_extent, list);
+ bp = container_of(b, struct xfs_repair_ialloc_extent, list);
+
+ if (ap->startino > bp->startino)
+ return 1;
+ else if (ap->startino < bp->startino)
+ return -1;
+ return 0;
+}
+
+/* Repair both inode btrees. */
+int
+xfs_repair_iallocbt(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_repair_ialloc ri;
+ struct xfs_owner_info oinfo;
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_buf *bp;
+ struct xfs_repair_ialloc_extent *rie;
+ struct xfs_repair_ialloc_extent *n;
+ struct xfs_agi *agi;
+ struct xfs_btree_cur *cur = NULL;
+ struct xfs_perag *pag;
+ xfs_fsblock_t inofsb;
+ xfs_fsblock_t finofsb;
+ xfs_extlen_t nr_blocks;
+ unsigned int count;
+ unsigned int usedcount;
+ int stat;
+ int logflags;
+ int error = 0;
+
+ /* We require the rmapbt to rebuild anything. */
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return -EOPNOTSUPP;
+
+ /* Collect all reverse mappings for inode blocks. */
+ xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
+ INIT_LIST_HEAD(&ri.extlist);
+ INIT_LIST_HEAD(&ri.btlist);
+ ri.nr_records = 0;
+ ri.sc = sc;
+
+ cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
+ error = xfs_rmap_query_all(cur, xfs_repair_ialloc_extent_fn, &ri);
+ if (error)
+ goto out;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+
+ /* Do we actually have enough space to do this? */
+ pag = xfs_perag_get(mp, sc->sa.agno);
+ nr_blocks = xfs_iallocbt_calc_size(mp, ri.nr_records);
+ if (xfs_sb_version_hasfinobt(&mp->m_sb))
+ nr_blocks *= 2;
+ if (!xfs_repair_ag_has_space(pag, nr_blocks, XFS_AG_RESV_NONE)) {
+ xfs_perag_put(pag);
+ error = -ENOSPC;
+ goto out;
+ }
+ xfs_perag_put(pag);
+
+ /* Invalidate all the inobt/finobt blocks in btlist. */
+ error = xfs_repair_invalidate_blocks(sc, &ri.btlist);
+ if (error)
+ goto out;
+
+ agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
+ /* Initialize new btree roots. */
+ error = xfs_repair_alloc_ag_block(sc, &oinfo, &inofsb,
+ XFS_AG_RESV_NONE);
+ if (error)
+ goto out;
+ error = xfs_repair_init_btblock(sc, inofsb, &bp, XFS_BTNUM_INO,
+ &xfs_inobt_buf_ops);
+ if (error)
+ goto out;
+ agi->agi_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, inofsb));
+ agi->agi_level = cpu_to_be32(1);
+ logflags = XFS_AGI_ROOT | XFS_AGI_LEVEL;
+
+ if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
+ error = xfs_repair_alloc_ag_block(sc, &oinfo, &finofsb,
+ mp->m_inotbt_nores ? XFS_AG_RESV_NONE :
+ XFS_AG_RESV_METADATA);
+ if (error)
+ goto out;
+ error = xfs_repair_init_btblock(sc, finofsb, &bp,
+ XFS_BTNUM_FINO, &xfs_inobt_buf_ops);
+ if (error)
+ goto out;
+ agi->agi_free_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, finofsb));
+ agi->agi_free_level = cpu_to_be32(1);
+ logflags |= XFS_AGI_FREE_ROOT | XFS_AGI_FREE_LEVEL;
+ }
+
+ xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, logflags);
+ error = xfs_repair_roll_ag_trans(sc);
+ if (error)
+ goto out;
+
+ /* Insert records into the new btrees. */
+ count = 0;
+ usedcount = 0;
+ list_sort(NULL, &ri.extlist, xfs_repair_ialloc_extent_cmp);
+ list_for_each_entry_safe(rie, n, &ri.extlist, list) {
+ count += rie->count;
+ usedcount += rie->usedcount;
+
+ trace_xfs_repair_ialloc_insert(mp, sc->sa.agno, rie->startino,
+ rie->holemask, rie->count,
+ rie->count - rie->usedcount, rie->freemask);
+
+ /* Insert into the inobt. */
+ cur = xfs_inobt_init_cursor(mp, sc->tp, sc->sa.agi_bp,
+ sc->sa.agno, XFS_BTNUM_INO);
+ error = xfs_inobt_lookup(cur, rie->startino, XFS_LOOKUP_EQ,
+ &stat);
+ if (error)
+ goto out;
+ XFS_WANT_CORRUPTED_GOTO(mp, stat == 0, out);
+ error = xfs_inobt_insert_rec(cur, rie->holemask, rie->count,
+ rie->count - rie->usedcount, rie->freemask,
+ &stat);
+ if (error)
+ goto out;
+ XFS_WANT_CORRUPTED_GOTO(mp, stat == 1, out);
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+
+ /* Insert into the finobt. */
+ if (rie->count != rie->usedcount &&
+ xfs_sb_version_hasfinobt(&mp->m_sb)) {
+ cur = xfs_inobt_init_cursor(mp, sc->tp, sc->sa.agi_bp,
+ sc->sa.agno, XFS_BTNUM_FINO);
+ error = xfs_inobt_lookup(cur, rie->startino,
+ XFS_LOOKUP_EQ, &stat);
+ if (error)
+ goto out;
+ XFS_WANT_CORRUPTED_GOTO(mp, stat == 0, out);
+ error = xfs_inobt_insert_rec(cur, rie->holemask,
+ rie->count, rie->count - rie->usedcount,
+ rie->freemask, &stat);
+ if (error)
+ goto out;
+ XFS_WANT_CORRUPTED_GOTO(mp, stat == 1, out);
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+ }
+
+ error = xfs_repair_roll_ag_trans(sc);
+ if (error)
+ goto out;
+
+ list_del(&rie->list);
+ kmem_free(rie);
+ }
+
+ /* Update the AGI counters. */
+ agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
+ if (be32_to_cpu(agi->agi_count) != count ||
+ be32_to_cpu(agi->agi_freecount) != count - usedcount) {
+ pag = xfs_perag_get(mp, sc->sa.agno);
+ pag->pagi_init = 0;
+ xfs_perag_put(pag);
+
+ agi->agi_count = cpu_to_be32(count);
+ agi->agi_freecount = cpu_to_be32(count - usedcount);
+ xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp,
+ XFS_AGI_COUNT | XFS_AGI_FREECOUNT);
+ sc->reset_counters = true;
+ }
+
+ /* Free the old inode btree blocks if they're not in use. */
+ error = xfs_repair_reap_btree_extents(sc, &ri.btlist, &oinfo,
+ XFS_AG_RESV_NONE);
+ if (error)
+ goto out;
+
+ return error;
+out:
+ if (cur)
+ xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+ xfs_repair_cancel_btree_extents(sc, &ri.btlist);
+ list_for_each_entry_safe(rie, n, &ri.extlist, list) {
+ list_del(&rie->list);
+ kmem_free(rie);
+ }
+ return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 5756d27..b8d0f4d 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -75,5 +75,6 @@ int xfs_repair_agf(struct xfs_scrub_context *sc);
int xfs_repair_agfl(struct xfs_scrub_context *sc);
int xfs_repair_agi(struct xfs_scrub_context *sc);
int xfs_repair_allocbt(struct xfs_scrub_context *sc);
+int xfs_repair_iallocbt(struct xfs_scrub_context *sc);
#endif /* __XFS_SCRUB_REPAIR_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index b15c320..7824913 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -246,10 +246,12 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
{ /* inobt */
.setup = xfs_scrub_setup_ag_iallocbt,
.scrub = xfs_scrub_inobt,
+ .repair = xfs_repair_iallocbt,
},
{ /* finobt */
.setup = xfs_scrub_setup_ag_iallocbt,
.scrub = xfs_scrub_finobt,
+ .repair = xfs_repair_iallocbt,
.has = xfs_sb_version_hasfinobt,
},
{ /* rmapbt */
* [PATCH 15/19] xfs: rebuild the rmapbt
2017-08-25 22:16 [PATCH v9 00/19] xfs: online fs repair support Darrick J. Wong
` (13 preceding siblings ...)
2017-08-25 22:18 ` [PATCH 14/19] xfs: repair inode btrees Darrick J. Wong
@ 2017-08-25 22:18 ` Darrick J. Wong
2017-08-25 22:18 ` [PATCH 16/19] xfs: repair refcount btrees Darrick J. Wong
` (3 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-25 22:18 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>

Rebuild the reverse mapping btree from all primary metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/scrub/repair.c | 75 +++++
fs/xfs/scrub/repair.h | 4
fs/xfs/scrub/rmap.c | 736 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/scrub.c | 9 +
fs/xfs/scrub/scrub.h | 1
fs/xfs/xfs_super.c | 26 ++
6 files changed, 850 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 9df2f97..935b641 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -30,6 +30,7 @@
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_inode.h"
+#include "xfs_icache.h"
#include "xfs_alloc.h"
#include "xfs_alloc_btree.h"
#include "xfs_ialloc.h"
@@ -909,3 +910,77 @@ xfs_repair_calc_ag_resblks(
return max(max(bnobt_sz, inobt_sz), max(rmapbt_sz, refcbt_sz));
}
+
+/* Freeze the FS against outside activity. */
+int
+xfs_repair_fs_freeze(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_mount *mp = sc->mp;
+ struct super_block *sb = mp->m_super;
+ int error;
+
+ xfs_icache_disable_reclaim(mp);
+
+ /* Freeze out any further writes or page faults. */
+ error = freeze_super(sb);
+ if (error)
+ return error;
+
+ /* Thaw it to the point that we can make transactions. */
+ down_write(&sb->s_umount);
+ sb->s_writers.frozen = SB_FREEZE_FS;
+ percpu_rwsem_acquire(sb->s_writers.rw_sem + SB_FREEZE_FS - 1,
+ 0, _THIS_IP_);
+ percpu_up_write(sb->s_writers.rw_sem + SB_FREEZE_FS - 1);
+ up_write(&sb->s_umount);
+ sc->fs_frozen = true;
+
+ return 0;
+}
+
+/* Unfreeze the FS. */
+int
+xfs_repair_fs_thaw(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_mount *mp = sc->mp;
+ struct super_block *sb = mp->m_super;
+ int error;
+
+ WARN_ON(sb->s_writers.frozen != SB_FREEZE_FS);
+
+ /* Re-freeze the last level of filesystem. */
+ down_write(&sb->s_umount);
+ percpu_down_write(sb->s_writers.rw_sem + SB_FREEZE_FS - 1);
+ percpu_rwsem_release(sb->s_writers.rw_sem + SB_FREEZE_FS - 1,
+ 0, _THIS_IP_);
+ sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+ up_write(&sb->s_umount);
+
+ /* Thaw everything. */
+ error = thaw_super(sb);
+ xfs_icache_enable_reclaim(mp);
+ return error;
+}
+
+/* Read all AG headers and attach to this transaction. */
+int
+xfs_repair_grab_all_ag_headers(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_buf *agi;
+ struct xfs_buf *agf;
+ struct xfs_buf *agfl;
+ xfs_agnumber_t agno;
+ int error = 0;
+
+ for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+ error = xfs_scrub_ag_read_headers(sc, agno, &agi, &agf, &agfl);
+ if (error)
+ break;
+ }
+
+ return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index b8d0f4d..43c7cd2 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -68,6 +68,9 @@ int xfs_repair_find_ag_btree_roots(struct xfs_scrub_context *sc,
int xfs_repair_reset_counters(struct xfs_mount *mp);
xfs_extlen_t xfs_repair_calc_ag_resblks(struct xfs_scrub_context *sc);
int xfs_repair_setup_btree_extent_collection(struct xfs_scrub_context *sc);
+int xfs_repair_fs_freeze(struct xfs_scrub_context *sc);
+int xfs_repair_fs_thaw(struct xfs_scrub_context *sc);
+int xfs_repair_grab_all_ag_headers(struct xfs_scrub_context *sc);
/* Metadata repairers */
int xfs_repair_superblock(struct xfs_scrub_context *sc);
@@ -76,5 +79,6 @@ int xfs_repair_agfl(struct xfs_scrub_context *sc);
int xfs_repair_agi(struct xfs_scrub_context *sc);
int xfs_repair_allocbt(struct xfs_scrub_context *sc);
int xfs_repair_iallocbt(struct xfs_scrub_context *sc);
+int xfs_repair_rmapbt(struct xfs_scrub_context *sc);
#endif /* __XFS_SCRUB_REPAIR_H__ */
diff --git a/fs/xfs/scrub/rmap.c b/fs/xfs/scrub/rmap.c
index e3129ed..9cc463b 100644
--- a/fs/xfs/scrub/rmap.c
+++ b/fs/xfs/scrub/rmap.c
@@ -32,15 +32,21 @@
#include "xfs_inode.h"
#include "xfs_icache.h"
#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
#include "xfs_alloc.h"
+#include "xfs_alloc_btree.h"
#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_bmap.h"
#include "xfs_bmap_btree.h"
#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/btree.h"
#include "scrub/trace.h"
+#include "scrub/repair.h"
/*
* Set us up to scrub reverse mapping btrees.
@@ -50,7 +56,35 @@ xfs_scrub_setup_ag_rmapbt(
struct xfs_scrub_context *sc,
struct xfs_inode *ip)
{
- return xfs_scrub_setup_ag_btree(sc, ip, false);
+ int error;
+
+ if (!(sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR))
+ return xfs_scrub_setup_ag_btree(sc, ip, false);
+
+ /*
+ * Freeze out anything that can lock an inode. We reconstruct
+ * the rmapbt by reading inode bmaps with the AGF held, which is
+ * only safe w.r.t. ABBA deadlocks if we're the only ones locking
+ * inodes.
+ */
+ error = xfs_repair_fs_freeze(sc);
+ if (error)
+ return error;
+
+ /* Check the AG number and set up the scrub context. */
+ error = xfs_scrub_setup_ag_header(sc, ip);
+ if (error)
+ return error;
+
+ /*
+ * Lock all the AG header buffers so that we can read all the
+ * per-AG metadata too.
+ */
+ error = xfs_repair_grab_all_ag_headers(sc);
+ if (error)
+ return error;
+
+ return xfs_scrub_ag_init(sc, sc->sm->sm_agno, &sc->sa);
}
/* Reverse-mapping scrubber. */
@@ -400,3 +434,703 @@ xfs_scrub_rmapbt(
return xfs_scrub_btree(sc, sc->sa.rmap_cur, xfs_scrub_rmapbt_helper,
&oinfo, NULL);
}
+
+/* Reverse-mapping repair. */
+
+struct xfs_repair_rmapbt_extent {
+ struct list_head list;
+ struct xfs_rmap_irec rmap;
+};
+
+struct xfs_repair_rmapbt {
+ struct list_head rmaplist;
+ struct list_head rmap_freelist;
+ struct list_head bno_freelist;
+ struct xfs_scrub_context *sc;
+ uint64_t owner;
+ xfs_extlen_t btblocks;
+ xfs_agblock_t next_bno;
+ uint64_t nr_records;
+};
+
+/* Initialize an rmap. */
+static inline int
+xfs_repair_rmapbt_new_rmap(
+ struct xfs_repair_rmapbt *rr,
+ xfs_agblock_t startblock,
+ xfs_extlen_t blockcount,
+ uint64_t owner,
+ uint64_t offset,
+ unsigned int flags)
+{
+ struct xfs_repair_rmapbt_extent *rre;
+ int error = 0;
+
+ trace_xfs_repair_rmap_extent_fn(rr->sc->mp, rr->sc->sa.agno,
+ startblock, blockcount, owner, offset, flags);
+
+ if (xfs_scrub_should_terminate(&error))
+ return error;
+
+ rre = kmem_alloc(sizeof(struct xfs_repair_rmapbt_extent),
+ KM_MAYFAIL | KM_NOFS);
+ if (!rre)
+ return -ENOMEM;
+ INIT_LIST_HEAD(&rre->list);
+ rre->rmap.rm_startblock = startblock;
+ rre->rmap.rm_blockcount = blockcount;
+ rre->rmap.rm_owner = owner;
+ rre->rmap.rm_offset = offset;
+ rre->rmap.rm_flags = flags;
+ list_add_tail(&rre->list, &rr->rmaplist);
+ rr->nr_records++;
+
+ return 0;
+}
+
+/* Add an AGFL block to the rmap list. */
+STATIC int
+xfs_repair_rmapbt_walk_agfl(
+ struct xfs_scrub_context *sc,
+ xfs_agblock_t bno,
+ void *priv)
+{
+ struct xfs_repair_rmapbt *rr = priv;
+
+ return xfs_repair_rmapbt_new_rmap(rr, bno, 1, XFS_RMAP_OWN_AG, 0, 0);
+}
+
+/* Add a btree block to the rmap list. */
+STATIC int
+xfs_repair_rmapbt_visit_btblock(
+ struct xfs_btree_cur *cur,
+ int level,
+ void *priv)
+{
+ struct xfs_repair_rmapbt *rr = priv;
+ struct xfs_buf *bp;
+ xfs_fsblock_t fsb;
+
+ xfs_btree_get_block(cur, level, &bp);
+ if (!bp)
+ return 0;
+
+ rr->btblocks++;
+ fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+ return xfs_repair_rmapbt_new_rmap(rr, XFS_FSB_TO_AGBNO(cur->bc_mp, fsb),
+ 1, rr->owner, 0, 0);
+}
+
+/* Record inode btree rmaps. */
+STATIC int
+xfs_repair_rmapbt_inodes(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_rec *rec,
+ void *priv)
+{
+ struct xfs_inobt_rec_incore irec;
+ struct xfs_repair_rmapbt *rr = priv;
+ struct xfs_mount *mp = cur->bc_mp;
+ struct xfs_buf *bp;
+ xfs_fsblock_t fsb;
+ xfs_agino_t agino;
+ xfs_agino_t iperhole;
+ unsigned int i;
+ int error;
+
+ /* Record the inobt blocks */
+ for (i = 0; i < cur->bc_nlevels && cur->bc_ptrs[i] == 1; i++) {
+ xfs_btree_get_block(cur, i, &bp);
+ if (!bp)
+ continue;
+ fsb = XFS_DADDR_TO_FSB(mp, bp->b_bn);
+ error = xfs_repair_rmapbt_new_rmap(rr,
+ XFS_FSB_TO_AGBNO(mp, fsb), 1,
+ XFS_RMAP_OWN_INOBT, 0, 0);
+ if (error)
+ return error;
+ }
+
+ xfs_inobt_btrec_to_irec(mp, rec, &irec);
+
+ /* Record a non-sparse inode chunk. */
+ if (irec.ir_holemask == XFS_INOBT_HOLEMASK_FULL)
+ return xfs_repair_rmapbt_new_rmap(rr,
+ XFS_AGINO_TO_AGBNO(mp, irec.ir_startino),
+ XFS_INODES_PER_CHUNK / mp->m_sb.sb_inopblock,
+ XFS_RMAP_OWN_INODES, 0, 0);
+
+ /* Iterate each chunk. */
+ iperhole = max_t(xfs_agino_t, mp->m_sb.sb_inopblock,
+ XFS_INODES_PER_HOLEMASK_BIT);
+ for (i = 0, agino = irec.ir_startino;
+ i < XFS_INOBT_HOLEMASK_BITS;
+ i += iperhole / XFS_INODES_PER_HOLEMASK_BIT, agino += iperhole) {
+ /* Skip holes. */
+ if (irec.ir_holemask & (1 << i))
+ continue;
+
+ /* Record the inode chunk otherwise. */
+ error = xfs_repair_rmapbt_new_rmap(rr,
+ XFS_AGINO_TO_AGBNO(mp, agino),
+ iperhole / mp->m_sb.sb_inopblock,
+ XFS_RMAP_OWN_INODES, 0, 0);
+ if (error)
+ return error;
+ }
+
+ return 0;
+}
+
+/* Record a CoW staging extent. */
+STATIC int
+xfs_repair_rmapbt_refcount(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_rec *rec,
+ void *priv)
+{
+ struct xfs_repair_rmapbt *rr = priv;
+ struct xfs_refcount_irec refc;
+
+ xfs_refcount_btrec_to_irec(rec, &refc);
+ if (refc.rc_refcount != 1)
+ return -EFSCORRUPTED;
+
+ return xfs_repair_rmapbt_new_rmap(rr,
+ refc.rc_startblock - XFS_REFC_COW_START,
+ refc.rc_blockcount, XFS_RMAP_OWN_COW, 0, 0);
+}
+
+/* Add a bmbt block to the rmap list. */
+STATIC int
+xfs_repair_rmapbt_visit_bmbt(
+ struct xfs_btree_cur *cur,
+ int level,
+ void *priv)
+{
+ struct xfs_repair_rmapbt *rr = priv;
+ struct xfs_buf *bp;
+ xfs_fsblock_t fsb;
+ unsigned int flags = XFS_RMAP_BMBT_BLOCK;
+
+ xfs_btree_get_block(cur, level, &bp);
+ if (!bp)
+ return 0;
+
+ fsb = XFS_DADDR_TO_FSB(cur->bc_mp, bp->b_bn);
+ if (XFS_FSB_TO_AGNO(cur->bc_mp, fsb) != rr->sc->sa.agno)
+ return 0;
+
+ if (cur->bc_private.b.whichfork == XFS_ATTR_FORK)
+ flags |= XFS_RMAP_ATTR_FORK;
+ return xfs_repair_rmapbt_new_rmap(rr,
+ XFS_FSB_TO_AGBNO(cur->bc_mp, fsb), 1,
+ cur->bc_private.b.ip->i_ino, 0, flags);
+}
+
+/* Determine rmap flags from fork and bmbt state. */
+static inline unsigned int
+xfs_repair_rmapbt_bmap_flags(
+ int whichfork,
+ xfs_exntst_t state)
+{
+ return (whichfork == XFS_ATTR_FORK ? XFS_RMAP_ATTR_FORK : 0) |
+ (state == XFS_EXT_UNWRITTEN ? XFS_RMAP_UNWRITTEN : 0);
+}
+
+/* Find all the extents from a given AG in an inode fork. */
+STATIC int
+xfs_repair_rmapbt_scan_ifork(
+ struct xfs_repair_rmapbt *rr,
+ struct xfs_inode *ip,
+ int whichfork)
+{
+ struct xfs_bmbt_irec rec;
+ struct xfs_mount *mp = rr->sc->mp;
+ struct xfs_btree_cur *cur = NULL;
+ struct xfs_ifork *ifp;
+ unsigned int rflags;
+ xfs_extnum_t idx;
+ bool found;
+ int fmt;
+ int error = 0;
+
+ /* Do we even have data mapping extents? */
+ fmt = XFS_IFORK_FORMAT(ip, whichfork);
+ switch (fmt) {
+ case XFS_DINODE_FMT_BTREE:
+ case XFS_DINODE_FMT_EXTENTS:
+ break;
+ default:
+ return 0;
+ }
+ if (!XFS_IFORK_PTR(ip, whichfork))
+ return 0;
+
+ /* Find all the BMBT blocks in the AG. */
+ if (fmt == XFS_DINODE_FMT_BTREE) {
+ cur = xfs_bmbt_init_cursor(mp, rr->sc->tp, ip, whichfork);
+ error = xfs_btree_visit_blocks(cur,
+ xfs_repair_rmapbt_visit_bmbt, rr);
+ if (error)
+ goto out;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+ }
+
+ /* We're done if this is an rt inode's data fork. */
+ if (whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip))
+ return 0;
+
+ /* Find all the extents in the AG. */
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+ for (found = xfs_iext_lookup_extent(ip, ifp, 0, &idx, &rec);
+ found;
+ found = xfs_iext_get_extent(ifp, ++idx, &rec)) {
+ if (isnullstartblock(rec.br_startblock))
+ continue;
+ /* Stash non-hole extent. */
+ if (XFS_FSB_TO_AGNO(mp, rec.br_startblock) == rr->sc->sa.agno) {
+ rflags = xfs_repair_rmapbt_bmap_flags(whichfork,
+ rec.br_state);
+ error = xfs_repair_rmapbt_new_rmap(rr,
+ XFS_FSB_TO_AGBNO(mp, rec.br_startblock),
+ rec.br_blockcount, ip->i_ino,
+ rec.br_startoff, rflags);
+ if (error)
+ goto out;
+ }
+ }
+out:
+ if (cur)
+ xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+ return error;
+}
+
+/* Iterate all the inodes in an allocation group. */
+STATIC int
+xfs_repair_rmapbt_scan_inobt(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_rec *rec,
+ void *priv)
+{
+ struct xfs_inobt_rec_incore irec;
+ struct xfs_mount *mp = cur->bc_mp;
+ struct xfs_inode *ip = NULL;
+ xfs_ino_t ino;
+ xfs_agino_t agino;
+ int chunkidx;
+ int lock_mode = 0;
+ int error = 0;
+
+ xfs_inobt_btrec_to_irec(mp, rec, &irec);
+
+ for (chunkidx = 0, agino = irec.ir_startino;
+ chunkidx < XFS_INODES_PER_CHUNK;
+ chunkidx++, agino++) {
+ /* Skip if this inode is free */
+ if (XFS_INOBT_MASK(chunkidx) & irec.ir_free)
+ continue;
+ ino = XFS_AGINO_TO_INO(mp, cur->bc_private.a.agno, agino);
+ error = xfs_iget(mp, cur->bc_tp, ino, 0, 0, &ip);
+ if (error)
+ return error;
+
+ if ((ip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
+ !(ip->i_df.if_flags & XFS_IFEXTENTS)) ||
+ (ip->i_d.di_aformat == XFS_DINODE_FMT_BTREE &&
+ !(ip->i_afp->if_flags & XFS_IFEXTENTS)))
+ lock_mode = XFS_ILOCK_EXCL;
+ else
+ lock_mode = XFS_ILOCK_SHARED;
+ if (!xfs_ilock_nowait(ip, lock_mode)) {
+ error = -EBUSY;
+ goto out_rele;
+ }
+
+ /* Check the data fork. */
+ error = xfs_repair_rmapbt_scan_ifork(priv, ip, XFS_DATA_FORK);
+ if (error)
+ goto out_unlock;
+
+ /* Check the attr fork. */
+ error = xfs_repair_rmapbt_scan_ifork(priv, ip, XFS_ATTR_FORK);
+ if (error)
+ goto out_unlock;
+
+ xfs_iunlock(ip, lock_mode);
+ iput(VFS_I(ip));
+ ip = NULL;
+ }
+
+ return error;
+out_unlock:
+ xfs_iunlock(ip, lock_mode);
+out_rele:
+ iput(VFS_I(ip));
+ return error;
+}
+
+/* Record extents that aren't in use from gaps in the rmap records. */
+STATIC int
+xfs_repair_rmapbt_record_rmap_freesp(
+ struct xfs_btree_cur *cur,
+ struct xfs_rmap_irec *rec,
+ void *priv)
+{
+ struct xfs_repair_rmapbt *rr = priv;
+ xfs_fsblock_t fsb;
+ int error;
+
+ /* Record the free space we find. */
+ if (rec->rm_startblock > rr->next_bno) {
+ fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+ rr->next_bno);
+ error = xfs_repair_collect_btree_extent(rr->sc,
+ &rr->rmap_freelist, fsb,
+ rec->rm_startblock - rr->next_bno);
+ if (error)
+ return error;
+ }
+ rr->next_bno = max_t(xfs_agblock_t, rr->next_bno,
+ rec->rm_startblock + rec->rm_blockcount);
+ return 0;
+}
+
+/* Record extents that aren't in use from the bnobt records. */
+STATIC int
+xfs_repair_rmapbt_record_bno_freesp(
+ struct xfs_btree_cur *cur,
+ struct xfs_alloc_rec_incore *rec,
+ void *priv)
+{
+ struct xfs_repair_rmapbt *rr = priv;
+ xfs_fsblock_t fsb;
+
+ /* Record the free space we find. */
+ fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+ rec->ar_startblock);
+ return xfs_repair_collect_btree_extent(rr->sc, &rr->bno_freelist,
+ fsb, rec->ar_blockcount);
+}
+
+/* Compare two rmapbt extents. */
+static int
+xfs_repair_rmapbt_extent_cmp(
+ void *priv,
+ struct list_head *a,
+ struct list_head *b)
+{
+ struct xfs_repair_rmapbt_extent *ap;
+ struct xfs_repair_rmapbt_extent *bp;
+
+ ap = container_of(a, struct xfs_repair_rmapbt_extent, list);
+ bp = container_of(b, struct xfs_repair_rmapbt_extent, list);
+ return xfs_rmap_compare(&ap->rmap, &bp->rmap);
+}
+
+#define RMAP(type, startblock, blockcount) xfs_repair_rmapbt_new_rmap( \
+ &rr, (startblock), (blockcount), \
+ XFS_RMAP_OWN_##type, 0, 0)
+/* Repair the rmap btree for some AG. */
+int
+xfs_repair_rmapbt(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_repair_rmapbt rr;
+ struct xfs_owner_info oinfo;
+ struct xfs_repair_rmapbt_extent *rre;
+ struct xfs_repair_rmapbt_extent *n;
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_btree_cur *cur = NULL;
+ struct xfs_buf *bp = NULL;
+ struct xfs_agf *agf;
+ struct xfs_agi *agi;
+ struct xfs_perag *pag;
+ xfs_fsblock_t btfsb;
+ xfs_agnumber_t ag;
+ xfs_agblock_t agend;
+ xfs_extlen_t freesp_btblocks;
+ int error;
+
+ INIT_LIST_HEAD(&rr.rmaplist);
+ INIT_LIST_HEAD(&rr.rmap_freelist);
+ INIT_LIST_HEAD(&rr.bno_freelist);
+ rr.sc = sc;
+ rr.nr_records = 0;
+
+ /* Collect rmaps for all AG headers. */
+ error = RMAP(FS, XFS_SB_BLOCK(mp), 1);
+ if (error)
+ goto out;
+ rre = list_last_entry(&rr.rmaplist, struct xfs_repair_rmapbt_extent,
+ list);
+
+ if (rre->rmap.rm_startblock != XFS_AGF_BLOCK(mp)) {
+ error = RMAP(FS, XFS_AGF_BLOCK(mp), 1);
+ if (error)
+ goto out;
+ rre = list_last_entry(&rr.rmaplist,
+ struct xfs_repair_rmapbt_extent, list);
+ }
+
+ if (rre->rmap.rm_startblock != XFS_AGI_BLOCK(mp)) {
+ error = RMAP(FS, XFS_AGI_BLOCK(mp), 1);
+ if (error)
+ goto out;
+ rre = list_last_entry(&rr.rmaplist,
+ struct xfs_repair_rmapbt_extent, list);
+ }
+
+ if (rre->rmap.rm_startblock != XFS_AGFL_BLOCK(mp)) {
+ error = RMAP(FS, XFS_AGFL_BLOCK(mp), 1);
+ if (error)
+ goto out;
+ }
+
+ error = xfs_scrub_walk_agfl(sc, xfs_repair_rmapbt_walk_agfl, &rr);
+ if (error)
+ goto out;
+
+ /* Collect rmap for the log if it's in this AG. */
+ if (mp->m_sb.sb_logstart &&
+ XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart) == sc->sa.agno) {
+ error = RMAP(LOG, XFS_FSB_TO_AGBNO(mp, mp->m_sb.sb_logstart),
+ mp->m_sb.sb_logblocks);
+ if (error)
+ goto out;
+ }
+
+ /* Collect rmaps for the free space btrees. */
+ rr.owner = XFS_RMAP_OWN_AG;
+ rr.btblocks = 0;
+ cur = xfs_allocbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno,
+ XFS_BTNUM_BNO);
+ error = xfs_btree_visit_blocks(cur, xfs_repair_rmapbt_visit_btblock,
+ &rr);
+ if (error)
+ goto out;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+
+ /* Collect rmaps for the cntbt. */
+ cur = xfs_allocbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno,
+ XFS_BTNUM_CNT);
+ error = xfs_btree_visit_blocks(cur, xfs_repair_rmapbt_visit_btblock,
+ &rr);
+ if (error)
+ goto out;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+ freesp_btblocks = rr.btblocks;
+
+ /* Collect rmaps for the inode btree. */
+ cur = xfs_inobt_init_cursor(mp, sc->tp, sc->sa.agi_bp, sc->sa.agno,
+ XFS_BTNUM_INO);
+ error = xfs_btree_query_all(cur, xfs_repair_rmapbt_inodes, &rr);
+ if (error)
+ goto out;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+ /* If there are no inodes, we have to include the inobt root. */
+ agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
+ if (agi->agi_count == cpu_to_be32(0)) {
+ error = xfs_repair_rmapbt_new_rmap(&rr,
+ be32_to_cpu(agi->agi_root), 1,
+ XFS_RMAP_OWN_INOBT, 0, 0);
+ if (error)
+ goto out;
+ }
+
+ /* Collect rmaps for the free inode btree. */
+ if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
+ rr.owner = XFS_RMAP_OWN_INOBT;
+ cur = xfs_inobt_init_cursor(mp, sc->tp, sc->sa.agi_bp,
+ sc->sa.agno, XFS_BTNUM_FINO);
+ error = xfs_btree_visit_blocks(cur,
+ xfs_repair_rmapbt_visit_btblock, &rr);
+ if (error)
+ goto out;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+ }
+
+ /* Collect rmaps for the refcount btree. */
+ if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+ union xfs_btree_irec low;
+ union xfs_btree_irec high;
+
+ rr.owner = XFS_RMAP_OWN_REFC;
+ cur = xfs_refcountbt_init_cursor(mp, sc->tp, sc->sa.agf_bp,
+ sc->sa.agno, NULL);
+ error = xfs_btree_visit_blocks(cur,
+ xfs_repair_rmapbt_visit_btblock, &rr);
+ if (error)
+ goto out;
+
+ /* Collect rmaps for CoW staging extents. */
+ memset(&low, 0, sizeof(low));
+ low.rc.rc_startblock = XFS_REFC_COW_START;
+ memset(&high, 0xFF, sizeof(high));
+ error = xfs_btree_query_range(cur, &low, &high,
+ xfs_repair_rmapbt_refcount, &rr);
+ if (error)
+ goto out;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+ }
+
+ /* Iterate all AGs for inodes. */
+ for (ag = 0; ag < mp->m_sb.sb_agcount; ag++) {
+ error = xfs_ialloc_read_agi(mp, sc->tp, ag, &bp);
+ if (error)
+ goto out;
+ cur = xfs_inobt_init_cursor(mp, sc->tp, bp, ag, XFS_BTNUM_INO);
+ error = xfs_btree_query_all(cur, xfs_repair_rmapbt_scan_inobt,
+ &rr);
+ if (error)
+ goto out;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+ xfs_trans_brelse(sc->tp, bp);
+ bp = NULL;
+ }
+
+ /* Do we actually have enough space to do this? */
+ pag = xfs_perag_get(mp, sc->sa.agno);
+ if (!xfs_repair_ag_has_space(pag,
+ xfs_rmapbt_calc_size(mp, rr.nr_records),
+ XFS_AG_RESV_AGFL)) {
+ xfs_perag_put(pag);
+ error = -ENOSPC;
+ goto out;
+ }
+
+ /* XXX: Do we need to invalidate buffers here? */
+
+ /* Initialize a new rmapbt root. */
+ xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_UNKNOWN);
+ agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+ error = xfs_repair_alloc_ag_block(sc, &oinfo, &btfsb, XFS_AG_RESV_AGFL);
+ if (error) {
+ xfs_perag_put(pag);
+ goto out;
+ }
+ error = xfs_repair_init_btblock(sc, btfsb, &bp, XFS_BTNUM_RMAP,
+ &xfs_rmapbt_buf_ops);
+ if (error) {
+ xfs_perag_put(pag);
+ goto out;
+ }
+ agf->agf_roots[XFS_BTNUM_RMAPi] = cpu_to_be32(XFS_FSB_TO_AGBNO(mp,
+ btfsb));
+ agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
+ agf->agf_rmap_blocks = cpu_to_be32(1);
+
+ /* Reset the perag info. */
+ pag->pagf_btreeblks = freesp_btblocks - 2;
+ pag->pagf_levels[XFS_BTNUM_RMAPi] =
+ be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
+
+ /* Now reset the AGF counters. */
+ agf->agf_btreeblks = cpu_to_be32(pag->pagf_btreeblks);
+ xfs_perag_put(pag);
+ xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, XFS_AGF_ROOTS |
+ XFS_AGF_LEVELS | XFS_AGF_RMAP_BLOCKS |
+ XFS_AGF_BTREEBLKS);
+ bp = NULL;
+ error = xfs_repair_roll_ag_trans(sc);
+ if (error)
+ goto out;
+
+ /* Insert all the metadata rmaps. */
+ list_sort(NULL, &rr.rmaplist, xfs_repair_rmapbt_extent_cmp);
+ list_for_each_entry_safe(rre, n, &rr.rmaplist, list) {
+ /* Add the rmap. */
+ cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp,
+ sc->sa.agno);
+ error = xfs_rmap_map_raw(cur, &rre->rmap);
+ if (error)
+ goto out;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+
+ error = xfs_repair_roll_ag_trans(sc);
+ if (error)
+ goto out;
+
+ list_del(&rre->list);
+ kmem_free(rre);
+
+ /*
+ * Ensure the freelist is full, but don't let it shrink.
+ * The rmapbt isn't fully set up yet, which means that
+ * the current AGFL blocks might not be reflected in the
+ * rmapbt, which is a problem if we want to unmap blocks
+ * from the AGFL.
+ */
+ error = xfs_repair_fix_freelist(sc, false);
+ if (error)
+ goto out;
+ }
+
+ /* Compute free space from the new rmapbt. */
+ rr.next_bno = 0;
+ cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
+ error = xfs_rmap_query_all(cur, xfs_repair_rmapbt_record_rmap_freesp,
+ &rr);
+ if (error)
+ goto out;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+
+ /* Insert a record for space between the last rmap and EOAG. */
+ agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+ agend = be32_to_cpu(agf->agf_length);
+ if (rr.next_bno < agend) {
+ btfsb = XFS_AGB_TO_FSB(mp, sc->sa.agno, rr.next_bno);
+ error = xfs_repair_collect_btree_extent(sc, &rr.rmap_freelist,
+ btfsb, agend - rr.next_bno);
+ if (error)
+ goto out;
+ }
+
+ /* Compute free space from the existing bnobt. */
+ cur = xfs_allocbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno,
+ XFS_BTNUM_BNO);
+ error = xfs_alloc_query_all(cur, xfs_repair_rmapbt_record_bno_freesp,
+ &rr);
+ if (error)
+ goto out;
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+
+ /*
+ * Free the "free" blocks that the new rmapbt knows about but
+ * the old bnobt doesn't. These are the old rmapbt blocks.
+ */
+ error = xfs_repair_subtract_extents(sc, &rr.rmap_freelist,
+ &rr.bno_freelist);
+ if (error)
+ goto out;
+ xfs_repair_cancel_btree_extents(sc, &rr.bno_freelist);
+ error = xfs_repair_reap_btree_extents(sc, &rr.rmap_freelist, &oinfo,
+ XFS_AG_RESV_AGFL);
+ if (error)
+ goto out;
+
+ return 0;
+out:
+ if (cur)
+ xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+ if (bp)
+ xfs_trans_brelse(sc->tp, bp);
+ xfs_repair_cancel_btree_extents(sc, &rr.bno_freelist);
+ xfs_repair_cancel_btree_extents(sc, &rr.rmap_freelist);
+ list_for_each_entry_safe(rre, n, &rr.rmaplist, list) {
+ list_del(&rre->list);
+ kmem_free(rre);
+ }
+ return error;
+}
+#undef RMAP
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 7824913..87b1dec 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -182,6 +182,8 @@ xfs_scrub_teardown(
struct xfs_inode *ip_in,
int error)
{
+ int err2;
+
xfs_scrub_ag_free(sc, &sc->sa);
if (sc->tp) {
if (error == 0 && (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR))
@@ -190,6 +192,12 @@ xfs_scrub_teardown(
xfs_trans_cancel(sc->tp);
sc->tp = NULL;
}
+ if (sc->fs_frozen) {
+ err2 = xfs_repair_fs_thaw(sc);
+ if (!error && err2)
+ error = err2;
+ sc->fs_frozen = false;
+ }
if (sc->ip) {
xfs_iunlock(sc->ip, sc->ilock_flags);
if (sc->ip != ip_in)
@@ -257,6 +265,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
{ /* rmapbt */
.setup = xfs_scrub_setup_ag_rmapbt,
.scrub = xfs_scrub_rmapbt,
+ .repair = xfs_repair_rmapbt,
.has = xfs_sb_version_hasrmapbt,
},
{ /* refcountbt */
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 70adf0c..41ec126 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -76,6 +76,7 @@ struct xfs_scrub_context {
uint ilock_flags;
bool try_harder;
bool reset_counters;
+ bool fs_frozen;
/* State tracking for single-AG operations. */
struct xfs_scrub_ag sa;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 664db70..5044352 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1409,6 +1409,30 @@ xfs_fs_unfreeze(
return 0;
}
+/* Don't let userspace freeze while we're scrubbing the filesystem. */
+STATIC int
+xfs_fs_freeze_super(
+ struct super_block *sb)
+{
+ struct xfs_mount *mp = XFS_M(sb);
+
+ if (atomic_read(&mp->m_scrubbers) > 0)
+ return -EBUSY;
+ return freeze_super(sb);
+}
+
+/* Don't let userspace thaw while we're scrubbing the filesystem. */
+STATIC int
+xfs_fs_thaw_super(
+ struct super_block *sb)
+{
+ struct xfs_mount *mp = XFS_M(sb);
+
+ if (atomic_read(&mp->m_scrubbers) > 0)
+ return -EBUSY;
+ return thaw_super(sb);
+}
+
STATIC int
xfs_fs_show_options(
struct seq_file *m,
@@ -1752,6 +1776,8 @@ static const struct super_operations xfs_super_operations = {
.show_options = xfs_fs_show_options,
.nr_cached_objects = xfs_fs_nr_cached_objects,
.free_cached_objects = xfs_fs_free_cached_objects,
+ .freeze_super = xfs_fs_freeze_super,
+ .thaw_super = xfs_fs_thaw_super,
};
static struct file_system_type xfs_fs_type = {
* [PATCH 16/19] xfs: repair refcount btrees
2017-08-25 22:16 [PATCH v9 00/19] xfs: online fs repair support Darrick J. Wong
` (14 preceding siblings ...)
2017-08-25 22:18 ` [PATCH 15/19] xfs: rebuild the rmapbt Darrick J. Wong
@ 2017-08-25 22:18 ` Darrick J. Wong
2017-08-25 22:18 ` [PATCH 17/19] xfs: online repair of inodes Darrick J. Wong
` (2 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-25 22:18 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Reconstruct the refcount data from the rmap btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
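A reading aid for reviewers, not part of the patch: the sweep described
in the big comment in refcount.c can be tried out in user space. The
sketch below assumes the input is an array of written data-fork rmaps
sorted by start block, each with a nonzero length, and that the caller
supplies a scratch "bag" with room for nr pointers. It records the
count that was in the bag before the change as the refcount of
[sp, np), only emits counts of 2 or more, and only advances sp when the
bag size changes. That is my reading of the algorithm rather than a
copy of the kernel code, and the names rmap, refc, next_change, and
compute_refcounts are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an rmap: physical start and length. */
struct rmap {
	uint32_t	start;
	uint32_t	len;
};

/* Hypothetical output: blocks [start, start + len) have refcount owners. */
struct refc {
	uint32_t	start;
	uint32_t	len;
	uint32_t	refcount;
};

static uint32_t
rmap_end(const struct rmap *r)
{
	return r->start + r->len;
}

/* Next block where the bag size changes: next rmap start or a bag end. */
static uint32_t
next_change(const struct rmap *rmaps, size_t nr, size_t next,
	    const struct rmap **bag, size_t bag_nr)
{
	uint32_t	np = UINT32_MAX;
	size_t		i;

	if (next < nr)
		np = rmaps[next].start;
	for (i = 0; i < bag_nr; i++)
		if (rmap_end(bag[i]) < np)
			np = rmap_end(bag[i]);
	return np;
}

/* Returns the number of records written; extras past max_out are dropped. */
static size_t
compute_refcounts(const struct rmap *rmaps, size_t nr,
		  const struct rmap **bag, struct refc *out, size_t max_out)
{
	size_t		bag_nr = 0, old_nr, next = 0, nr_out = 0, i;
	uint32_t	sp, np;

	while (next < nr) {
		sp = rmaps[next].start;
		while (next < nr && rmaps[next].start == sp)
			bag[bag_nr++] = &rmaps[next++];
		np = next_change(rmaps, nr, next, bag, bag_nr);
		old_nr = bag_nr;

		while (bag_nr > 0) {
			/* Remove rmaps that end at np. */
			for (i = 0; i < bag_nr; ) {
				if (rmap_end(bag[i]) == np)
					bag[i] = bag[--bag_nr];
				else
					i++;
			}
			/* Add rmaps that start at np. */
			while (next < nr && rmaps[next].start == np)
				bag[bag_nr++] = &rmaps[next++];

			/* Level changed: [sp, np) had old_nr owners. */
			if (bag_nr != old_nr) {
				if (old_nr > 1 && nr_out < max_out) {
					out[nr_out].start = sp;
					out[nr_out].len = np - sp;
					out[nr_out].refcount = (uint32_t)old_nr;
					nr_out++;
				}
				sp = np;
			}
			if (bag_nr == 0)
				break;
			old_nr = bag_nr;
			np = next_change(rmaps, nr, next, bag, bag_nr);
		}
	}
	return nr_out;
}
```

Leaving sp alone when one rmap ends exactly where another begins means
a run of blocks with a constant owner count comes out as one extent
instead of being split at every rmap boundary.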
fs/xfs/scrub/refcount.c | 479 +++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/repair.h | 1
fs/xfs/scrub/scrub.c | 1
3 files changed, 481 insertions(+)
diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
index 999820b..d016870 100644
--- a/fs/xfs/scrub/refcount.c
+++ b/fs/xfs/scrub/refcount.c
@@ -29,14 +29,20 @@
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_sb.h"
+#include "xfs_itable.h"
#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_refcount_btree.h"
#include "xfs_alloc.h"
#include "xfs_ialloc.h"
+#include "xfs_error.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/btree.h"
#include "scrub/trace.h"
+#include "scrub/repair.h"
/*
* Set us up to scrub reference count btrees.
@@ -370,3 +376,476 @@ xfs_scrub_refcountbt(
return error;
}
+
+/*
+ * Rebuilding the Reference Count Btree
+ *
+ * This algorithm is "borrowed" from xfs_repair. Imagine the rmap
+ * entries as rectangles representing extents of physical blocks, and
+ * that the rectangles can be laid down to allow them to overlap each
+ * other; then we know that we must emit a refcnt btree entry wherever
+ * the amount of overlap changes, i.e. the emission stimulus is
+ * level-triggered:
+ *
+ * - ---
+ * -- ----- ---- --- ------
+ * -- ---- ----------- ---- ---------
+ * -------------------------------- -----------
+ * ^ ^ ^^ ^^ ^ ^^ ^^^ ^^^^ ^ ^^ ^ ^ ^
+ * 2 1 23 21 3 43 234 2123 1 01 2 3 0
+ *
+ * For our purposes, an rmap is a tuple (startblock, len, fileoff, owner).
+ *
+ * Note that in the actual refcnt btree we don't store the refcount < 2
+ * cases because the bnobt tells us which blocks are free; single-use
+ * blocks aren't recorded in the bnobt or the refcntbt. If the rmapbt
+ * supports storing multiple entries covering a given block we could
+ * theoretically dispense with the refcntbt and simply count rmaps, but
+ * that's inefficient in the (hot) write path, so we'll take the cost of
+ * the extra tree to save time. Also there's no guarantee that rmap
+ * will be enabled.
+ *
+ * Given an array of rmaps sorted by physical block number, a starting
+ * physical block (sp), a bag to hold rmaps that cover sp, and the next
+ * physical block where the level changes (np), we can reconstruct the
+ * refcount btree as follows:
+ *
+ * While there are still unprocessed rmaps in the array,
+ * - Set sp to the physical block (pblk) of the next unprocessed rmap.
+ * - Add to the bag all rmaps in the array where startblock == sp.
+ * - Set np to the physical block where the bag size will change. This
+ * is the minimum of (the pblk of the next unprocessed rmap) and
+ * (startblock + len of each rmap in the bag).
+ * - Record the bag size as old_bag_size.
+ *
+ * - While the bag isn't empty,
+ * - Remove from the bag all rmaps where startblock + len == np.
+ * - Add to the bag all rmaps in the array where startblock == np.
+ * - If the bag size isn't old_bag_size, store the refcount entry
+ * (sp, np - sp, old_bag_size) in the refcnt btree and then
+ * set sp = np.
+ * - If the bag is empty, break out of the inner loop.
+ * - Set old_bag_size to the bag size.
+ * - Set np to the physical block where the bag size will change.
+ * This is the minimum of (the pblk of the next unprocessed rmap)
+ * and (startblock + len of each rmap in the bag).
+ *
+ * Like all the other repairers, we make a list of all the refcount
+ * records we need, then reinitialize the refcount btree root and
+ * insert all the records.
+ */
+
+struct xfs_repair_refc_rmap {
+ struct list_head list;
+ struct xfs_rmap_irec rmap;
+};
+
+struct xfs_repair_refc_extent {
+ struct list_head list;
+ struct xfs_refcount_irec refc;
+};
+
+struct xfs_repair_refc {
+ struct list_head rmap_bag; /* rmaps we're tracking */
+ struct list_head rmap_idle; /* idle rmaps */
+ struct list_head extlist; /* refcount extents */
+ struct list_head btlist; /* old refcountbt blocks */
+ struct xfs_scrub_context *sc;
+ xfs_extlen_t btblocks; /* # of refcountbt blocks */
+};
+
+/* Grab the next record from the rmapbt. */
+STATIC int
+xfs_repair_refcountbt_next_rmap(
+ struct xfs_btree_cur *cur,
+ struct xfs_repair_refc *rr,
+ struct xfs_rmap_irec *rec,
+ bool *have_rec)
+{
+ struct xfs_rmap_irec rmap;
+ struct xfs_mount *mp = cur->bc_mp;
+ struct xfs_repair_refc_extent *rre;
+ xfs_fsblock_t fsbno;
+ int have_gt;
+ int error = 0;
+
+ *have_rec = false;
+ /*
+ * Loop through the remaining rmaps. Remember CoW staging
+ * extents and the refcountbt blocks from the old tree for later
+ * disposal. We can only share written data fork extents, so
+ * keep looping until we find an rmap for one.
+ */
+ do {
+ if (xfs_scrub_should_terminate(&error))
+ goto out_error;
+
+ error = xfs_btree_increment(cur, 0, &have_gt);
+ if (error)
+ goto out_error;
+ if (!have_gt)
+ return 0;
+
+ error = xfs_rmap_get_rec(cur, &rmap, &have_gt);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(mp, have_gt == 1, out_error);
+
+ if (rmap.rm_owner == XFS_RMAP_OWN_COW) {
+ /* Pass CoW staging extents right through. */
+ rre = kmem_alloc(sizeof(struct xfs_repair_refc_extent),
+ KM_MAYFAIL | KM_NOFS);
+ if (!rre)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&rre->list);
+ rre->refc.rc_startblock = rmap.rm_startblock +
+ XFS_REFC_COW_START;
+ rre->refc.rc_blockcount = rmap.rm_blockcount;
+ rre->refc.rc_refcount = 1;
+ list_add_tail(&rre->list, &rr->extlist);
+ } else if (rmap.rm_owner == XFS_RMAP_OWN_REFC) {
+ /* refcountbt block, dump it when we're done. */
+ rr->btblocks += rmap.rm_blockcount;
+ fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
+ cur->bc_private.a.agno,
+ rmap.rm_startblock);
+ error = xfs_repair_collect_btree_extent(rr->sc,
+ &rr->btlist, fsbno, rmap.rm_blockcount);
+ if (error)
+ goto out_error;
+ }
+ } while (XFS_RMAP_NON_INODE_OWNER(rmap.rm_owner) ||
+ xfs_internal_inum(mp, rmap.rm_owner) ||
+ (rmap.rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK |
+ XFS_RMAP_UNWRITTEN)));
+
+ *rec = rmap;
+ *have_rec = true;
+ return 0;
+
+out_error:
+ return error;
+}
+
+/* Recycle an idle rmap or allocate a new one. */
+static struct xfs_repair_refc_rmap *
+xfs_repair_refcountbt_get_rmap(
+ struct xfs_repair_refc *rr)
+{
+ struct xfs_repair_refc_rmap *rrm;
+
+ if (list_empty(&rr->rmap_idle)) {
+ rrm = kmem_alloc(sizeof(struct xfs_repair_refc_rmap),
+ KM_MAYFAIL | KM_NOFS);
+ if (!rrm)
+ return NULL;
+ INIT_LIST_HEAD(&rrm->list);
+ return rrm;
+ }
+
+ rrm = list_first_entry(&rr->rmap_idle, struct xfs_repair_refc_rmap,
+ list);
+ list_del_init(&rrm->list);
+ return rrm;
+}
+
+/* Compare two refcount extents by start block, for list_sort(). */
+static int
+xfs_repair_refcount_extent_cmp(
+ void *priv,
+ struct list_head *a,
+ struct list_head *b)
+{
+ struct xfs_repair_refc_extent *ap;
+ struct xfs_repair_refc_extent *bp;
+
+ ap = container_of(a, struct xfs_repair_refc_extent, list);
+ bp = container_of(b, struct xfs_repair_refc_extent, list);
+
+ if (ap->refc.rc_startblock > bp->refc.rc_startblock)
+ return 1;
+ else if (ap->refc.rc_startblock < bp->refc.rc_startblock)
+ return -1;
+ return 0;
+}
+
+/* Record a reference count extent. */
+STATIC int
+xfs_repair_refcountbt_new_refc(
+ struct xfs_scrub_context *sc,
+ struct xfs_repair_refc *rr,
+ xfs_agblock_t agbno,
+ xfs_extlen_t len,
+ xfs_nlink_t refcount)
+{
+ struct xfs_repair_refc_extent *rre;
+ struct xfs_refcount_irec irec;
+
+ irec.rc_startblock = agbno;
+ irec.rc_blockcount = len;
+ irec.rc_refcount = refcount;
+
+ trace_xfs_repair_refcount_extent_fn(sc->mp, sc->sa.agno,
+ &irec);
+
+ rre = kmem_alloc(sizeof(struct xfs_repair_refc_extent),
+ KM_MAYFAIL | KM_NOFS);
+ if (!rre)
+ return -ENOMEM;
+ INIT_LIST_HEAD(&rre->list);
+ rre->refc = irec;
+ list_add_tail(&rre->list, &rr->extlist);
+
+ return 0;
+}
+
+/* Rebuild the refcount btree. */
+#define RMAP_END(r) ((r).rm_startblock + (r).rm_blockcount)
+int
+xfs_repair_refcountbt(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_repair_refc rr;
+ struct xfs_rmap_irec rmap;
+ struct xfs_owner_info oinfo;
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_repair_refc_rmap *rrm;
+ struct xfs_repair_refc_rmap *n;
+ struct xfs_repair_refc_extent *rre;
+ struct xfs_repair_refc_extent *o;
+ struct xfs_buf *bp = NULL;
+ struct xfs_agf *agf;
+ struct xfs_btree_cur *cur = NULL;
+ struct xfs_perag *pag;
+ uint64_t nr_records;
+ xfs_fsblock_t btfsb;
+ size_t old_stack_sz;
+ size_t stack_sz = 0;
+ xfs_agblock_t sbno;
+ xfs_agblock_t cbno;
+ xfs_agblock_t nbno;
+ bool have;
+ int have_gt;
+ int error = 0;
+
+ /* We require the rmapbt to rebuild anything. */
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return -EOPNOTSUPP;
+
+ INIT_LIST_HEAD(&rr.rmap_bag);
+ INIT_LIST_HEAD(&rr.rmap_idle);
+ INIT_LIST_HEAD(&rr.extlist);
+ INIT_LIST_HEAD(&rr.btlist);
+ rr.btblocks = 0;
+ rr.sc = sc;
+
+ nr_records = 0;
+ xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC);
+
+ /* Start the rmapbt cursor to the left of all records. */
+ cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
+ error = xfs_rmap_lookup_le(cur, 0, 0, 0, 0, 0, &have_gt);
+ if (error)
+ goto out;
+ ASSERT(have_gt == 0);
+
+ /* Process reverse mappings into refcount data. */
+ while (xfs_btree_has_more_records(cur)) {
+ /* Push all rmaps with pblk == sbno onto the stack */
+ error = xfs_repair_refcountbt_next_rmap(cur, &rr, &rmap, &have);
+ if (error)
+ goto out;
+ if (!have)
+ break;
+ sbno = cbno = rmap.rm_startblock;
+ while (have && rmap.rm_startblock == sbno) {
+ rrm = xfs_repair_refcountbt_get_rmap(&rr);
+ if (!rrm) {
+ error = -ENOMEM;
+ goto out;
+ }
+ rrm->rmap = rmap;
+ list_add_tail(&rrm->list, &rr.rmap_bag);
+ stack_sz++;
+ error = xfs_repair_refcountbt_next_rmap(cur, &rr, &rmap,
+ &have);
+ if (error)
+ goto out;
+ }
+ error = xfs_btree_decrement(cur, 0, &have_gt);
+ if (error)
+ goto out;
+ XFS_WANT_CORRUPTED_GOTO(mp, have_gt, out);
+
+ /* Set nbno to the bno of the next refcount change */
+ nbno = have ? rmap.rm_startblock : NULLAGBLOCK;
+ list_for_each_entry(rrm, &rr.rmap_bag, list)
+ nbno = min_t(xfs_agblock_t, nbno, RMAP_END(rrm->rmap));
+
+ ASSERT(nbno > sbno);
+ old_stack_sz = stack_sz;
+
+ /* While stack isn't empty... */
+ while (stack_sz) {
+ /* Pop all rmaps that end at nbno */
+ list_for_each_entry_safe(rrm, n, &rr.rmap_bag, list) {
+ if (RMAP_END(rrm->rmap) != nbno)
+ continue;
+ stack_sz--;
+ list_del_init(&rrm->list);
+ list_add(&rrm->list, &rr.rmap_idle);
+ }
+
+ /* Push all rmaps that start at nbno */
+ error = xfs_repair_refcountbt_next_rmap(cur, &rr, &rmap,
+ &have);
+ if (error)
+ goto out;
+ while (have && rmap.rm_startblock == nbno) {
+ rrm = xfs_repair_refcountbt_get_rmap(&rr);
+ if (!rrm) {
+ error = -ENOMEM;
+ goto out;
+ }
+ rrm->rmap = rmap;
+ list_add_tail(&rrm->list, &rr.rmap_bag);
+ stack_sz++;
+ error = xfs_repair_refcountbt_next_rmap(cur,
+ &rr, &rmap, &have);
+ if (error)
+ goto out;
+ }
+ error = xfs_btree_decrement(cur, 0, &have_gt);
+ if (error)
+ goto out;
+ XFS_WANT_CORRUPTED_GOTO(mp, have_gt, out);
+
+ /* Emit refcount if necessary */
+ ASSERT(nbno > cbno);
+ if (stack_sz != old_stack_sz) {
+ if (old_stack_sz > 1) {
+ error = xfs_repair_refcountbt_new_refc(
+ sc, &rr, cbno,
+ nbno - cbno,
+ old_stack_sz);
+ if (error)
+ goto out;
+ nr_records++;
+ }
+ cbno = nbno;
+ }
+
+ /* Stack empty, go find the next rmap */
+ if (stack_sz == 0)
+ break;
+ old_stack_sz = stack_sz;
+ sbno = nbno;
+
+ /* Set nbno to the bno of the next refcount change */
+ nbno = have ? rmap.rm_startblock : NULLAGBLOCK;
+ list_for_each_entry(rrm, &rr.rmap_bag, list)
+ nbno = min_t(xfs_agblock_t, nbno,
+ RMAP_END(rrm->rmap));
+
+ /* The next change point must lie beyond sbno */
+ ASSERT(nbno > sbno);
+ }
+ }
+ ASSERT(list_empty(&rr.rmap_bag));
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+
+ /* Free all the rmap records. */
+ list_for_each_entry_safe(rrm, n, &rr.rmap_idle, list) {
+ list_del(&rrm->list);
+ kmem_free(rrm);
+ }
+ list_for_each_entry_safe(rrm, n, &rr.rmap_bag, list) {
+ list_del(&rrm->list);
+ kmem_free(rrm);
+ }
+
+ /* Do we actually have enough space to do this? */
+ pag = xfs_perag_get(mp, sc->sa.agno);
+ if (!xfs_repair_ag_has_space(pag,
+ xfs_refcountbt_calc_size(mp, nr_records),
+ XFS_AG_RESV_METADATA)) {
+ xfs_perag_put(pag);
+ error = -ENOSPC;
+ goto out;
+ }
+ xfs_perag_put(pag);
+
+ /* Invalidate all the refcountbt blocks in btlist. */
+ error = xfs_repair_invalidate_blocks(sc, &rr.btlist);
+ if (error)
+ goto out;
+
+ agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
+ /* Initialize a new btree root. */
+ error = xfs_repair_alloc_ag_block(sc, &oinfo, &btfsb,
+ XFS_AG_RESV_METADATA);
+ if (error)
+ goto out;
+ error = xfs_repair_init_btblock(sc, btfsb, &bp, XFS_BTNUM_REFC,
+ &xfs_refcountbt_buf_ops);
+ if (error)
+ goto out;
+ agf->agf_refcount_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, btfsb));
+ agf->agf_refcount_level = cpu_to_be32(1);
+ agf->agf_refcount_blocks = cpu_to_be32(1);
+ xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, XFS_AGF_REFCOUNT_BLOCKS |
+ XFS_AGF_REFCOUNT_ROOT | XFS_AGF_REFCOUNT_LEVEL);
+ error = xfs_repair_roll_ag_trans(sc);
+ if (error)
+ goto out;
+
+ /* Insert records into the new btree. */
+ list_sort(NULL, &rr.extlist, xfs_repair_refcount_extent_cmp);
+ list_for_each_entry_safe(rre, o, &rr.extlist, list) {
+ /* Insert into the refcountbt. */
+ cur = xfs_refcountbt_init_cursor(mp, sc->tp, sc->sa.agf_bp,
+ sc->sa.agno, NULL);
+ error = xfs_refcount_lookup_eq(cur, rre->refc.rc_startblock,
+ &have_gt);
+ if (error)
+ goto out;
+ XFS_WANT_CORRUPTED_GOTO(mp, have_gt == 0, out);
+ error = xfs_refcount_insert(cur, &rre->refc, &have_gt);
+ if (error)
+ goto out;
+ XFS_WANT_CORRUPTED_GOTO(mp, have_gt == 1, out);
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+
+ error = xfs_repair_roll_ag_trans(sc);
+ if (error)
+ goto out;
+
+ list_del(&rre->list);
+ kmem_free(rre);
+ }
+
+ /* Free the old refcountbt blocks if they're not in use. */
+ error = xfs_repair_reap_btree_extents(sc, &rr.btlist, &oinfo,
+ XFS_AG_RESV_METADATA);
+ if (error)
+ goto out;
+
+ return error;
+
+out:
+ if (cur)
+ xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+ xfs_repair_cancel_btree_extents(sc, &rr.btlist);
+ list_for_each_entry_safe(rrm, n, &rr.rmap_idle, list) {
+ list_del(&rrm->list);
+ kmem_free(rrm);
+ }
+ list_for_each_entry_safe(rrm, n, &rr.rmap_bag, list) {
+ list_del(&rrm->list);
+ kmem_free(rrm);
+ }
+ list_for_each_entry_safe(rre, o, &rr.extlist, list) {
+ list_del(&rre->list);
+ kmem_free(rre);
+ }
+ return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 43c7cd2..303afa9 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -80,5 +80,6 @@ int xfs_repair_agi(struct xfs_scrub_context *sc);
int xfs_repair_allocbt(struct xfs_scrub_context *sc);
int xfs_repair_iallocbt(struct xfs_scrub_context *sc);
int xfs_repair_rmapbt(struct xfs_scrub_context *sc);
+int xfs_repair_refcountbt(struct xfs_scrub_context *sc);
#endif /* __XFS_SCRUB_REPAIR_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 87b1dec..2e2ed4a 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -271,6 +271,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
{ /* refcountbt */
.setup = xfs_scrub_setup_ag_refcountbt,
.scrub = xfs_scrub_refcountbt,
+ .repair = xfs_repair_refcountbt,
.has = xfs_sb_version_hasreflink,
},
{ /* inode record */
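Reading aid, not part of the patch: the sweep in xfs_repair_refcountbt()
above can be modelled in userspace roughly as follows. This is a minimal
sketch under stated assumptions: the input is already reduced to written
data fork rmaps sorted by start block, every mapping has a nonzero length,
and the list-based bag is replaced by a small fixed array. The demo_* names
and DEMO_BAG_MAX are invented for illustration and do not exist in the
kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel records; not the real structures. */
struct demo_rmap {
	uint32_t start;		/* first block of the mapping */
	uint32_t len;		/* number of blocks, must be nonzero */
};

struct demo_refc {
	uint32_t start;
	uint32_t len;
	uint32_t refcount;
};

#define DEMO_BAG_MAX	64	/* arbitrary cap on overlapping mappings */

/*
 * Sweep rmaps (sorted by start) and record every range that is owned by
 * more than one mapping. Returns the number of records written, or -1 if
 * the bag or the output array overflows.
 */
static int
demo_rmaps_to_refcounts(const struct demo_rmap *rmaps, size_t nr,
			struct demo_refc *out, size_t out_max)
{
	struct demo_rmap	bag[DEMO_BAG_MAX];
	size_t			bag_sz = 0, old_sz, i = 0, j, nout = 0;
	uint32_t		cbno, nbno, end;

	while (i < nr) {
		/* The bag is empty here; start a new run at the next rmap. */
		cbno = rmaps[i].start;
		while (i < nr && rmaps[i].start == cbno) {
			if (bag_sz == DEMO_BAG_MAX)
				return -1;
			bag[bag_sz++] = rmaps[i++];
		}

		while (bag_sz) {
			old_sz = bag_sz;

			/* Next block where the owner count can change. */
			nbno = i < nr ? rmaps[i].start : UINT32_MAX;
			for (j = 0; j < bag_sz; j++) {
				end = bag[j].start + bag[j].len;
				if (end < nbno)
					nbno = end;
			}
			/* Sorted, nonzero-length input guarantees progress. */
			assert(nbno > cbno);

			/* Pop every mapping that ends at nbno. */
			for (j = 0; j < bag_sz; ) {
				if (bag[j].start + bag[j].len == nbno)
					bag[j] = bag[--bag_sz];
				else
					j++;
			}

			/* Push every mapping that starts at nbno. */
			while (i < nr && rmaps[i].start == nbno) {
				if (bag_sz == DEMO_BAG_MAX)
					return -1;
				bag[bag_sz++] = rmaps[i++];
			}

			/* Owner count changed: close out [cbno, nbno). */
			if (bag_sz != old_sz) {
				if (old_sz > 1) {
					if (nout == out_max)
						return -1;
					out[nout].start = cbno;
					out[nout].len = nbno - cbno;
					out[nout].refcount = (uint32_t)old_sz;
					nout++;
				}
				cbno = nbno;
			}
		}
	}
	return (int)nout;
}
```

Feeding it the mappings {0,10}, {5,10}, {5,3} and {20,4} should produce two
records: (start 5, len 3, refcount 3) and (start 8, len 2, refcount 2).
Singly-owned ranges emit nothing, which mirrors the old_stack_sz > 1 check
in the patch.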
* [PATCH 17/19] xfs: online repair of inodes
2017-08-25 22:16 [PATCH v9 00/19] xfs: online fs repair support Darrick J. Wong
` (15 preceding siblings ...)
2017-08-25 22:18 ` [PATCH 16/19] xfs: repair refcount btrees Darrick J. Wong
@ 2017-08-25 22:18 ` Darrick J. Wong
2017-08-25 22:18 ` [PATCH 18/19] xfs: repair inode block maps Darrick J. Wong
2017-08-25 22:18 ` [PATCH 19/19] xfs: repair damaged symlinks Darrick J. Wong
18 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-25 22:18 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Try to reinitialize corrupt inodes, or clear the reflink flag
if it's not needed.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
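Not for the commit log: a rough userspace sketch of the flag and size
fix-ups that xfs_repair_inode() applies, in case it helps review. It only
models the rules (reflink is limited to regular files on reflink
filesystems, realtime excludes reflink, reflink excludes DAX, immutable
wins over append, realtime wins over filestream, and a size with the sign
bit set is clamped). The DEMO_* bit values and demo_* helpers are
hypothetical and should not be read as the on-disk definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Demo bit values for this sketch only; not the on-disk flag layout. */
#define DEMO_DIFLAG_REALTIME	(1u << 0)
#define DEMO_DIFLAG_IMMUTABLE	(1u << 1)
#define DEMO_DIFLAG_APPEND	(1u << 2)
#define DEMO_DIFLAG_FILESTREAM	(1u << 3)

#define DEMO_DIFLAG2_DAX	(1ull << 0)
#define DEMO_DIFLAG2_REFLINK	(1ull << 1)
#define DEMO_DIFLAG2_COWEXTSIZE	(1ull << 2)

/* Resolve contradictory di_flags combinations. */
static uint16_t
demo_fix_flags(uint16_t flags)
{
	if ((flags & DEMO_DIFLAG_IMMUTABLE) && (flags & DEMO_DIFLAG_APPEND))
		flags = (uint16_t)(flags & ~DEMO_DIFLAG_APPEND);
	if ((flags & DEMO_DIFLAG_FILESTREAM) && (flags & DEMO_DIFLAG_REALTIME))
		flags = (uint16_t)(flags & ~DEMO_DIFLAG_FILESTREAM);
	return flags;
}

/* Resolve di_flags2 so that reflink, realtime and DAX do not conflict. */
static uint64_t
demo_fix_flags2(uint16_t flags, uint64_t flags2, bool fs_has_reflink,
		bool is_regular)
{
	if (fs_has_reflink && is_regular)
		flags2 |= DEMO_DIFLAG2_REFLINK;
	else
		flags2 &= ~(DEMO_DIFLAG2_REFLINK | DEMO_DIFLAG2_COWEXTSIZE);
	if (flags & DEMO_DIFLAG_REALTIME)
		flags2 &= ~DEMO_DIFLAG2_REFLINK;
	if (flags2 & DEMO_DIFLAG2_REFLINK)
		flags2 &= ~DEMO_DIFLAG2_DAX;

	/* Postcondition: reflink and DAX are never both set. */
	assert(!(flags2 & DEMO_DIFLAG2_REFLINK) ||
	       !(flags2 & DEMO_DIFLAG2_DAX));
	return flags2;
}

/* Clamp a size whose top bit is set to the largest non-negative value. */
static uint64_t
demo_clamp_size(uint64_t size)
{
	if (size & (1ull << 63))
		return (1ull << 63) - 1;
	return size;
}
```

Note that the sketch, like the patch, sets the reflink flag optimistically
on every regular file; the patch then relies on
xfs_reflink_clear_inode_flag() to drop it again when no shared extents are
found.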
fs/xfs/scrub/inode.c | 188 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/repair.h | 1
fs/xfs/scrub/scrub.c | 1
3 files changed, 190 insertions(+)
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index 44201ab..2316c76 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -40,10 +40,13 @@
#include "xfs_rmap.h"
#include "xfs_bmap.h"
#include "xfs_bmap_util.h"
+#include "xfs_dir2.h"
+#include "xfs_quota_defs.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/trace.h"
+#include "scrub/repair.h"
/* Set us up with an inode. */
int
@@ -437,3 +440,188 @@ xfs_scrub_inode(
xfs_trans_brelse(sc->tp, bp);
return error;
}
+
+/* Repair an inode's fields. */
+int
+xfs_repair_inode(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_imap imap;
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_buf *bp;
+ struct xfs_dinode *dip;
+ struct xfs_inode *ip;
+ xfs_ino_t ino;
+ xfs_filblks_t count;
+ xfs_filblks_t acount;
+ uint64_t flags2;
+ xfs_extnum_t nextents;
+ uint16_t flags;
+ uint16_t mode;
+ bool invalidate_quota = false;
+ int error = 0;
+
+ if (!xfs_sb_version_hascrc(&mp->m_sb))
+ return -EOPNOTSUPP;
+
+ if (sc->ip && xfs_scrub_preen_only(sc->sm))
+ goto preen_only;
+
+ /* Are we fixing this thing manually? */
+ if (!sc->ip) {
+ /* Map & read inode. */
+ ino = sc->sm->sm_ino;
+ error = xfs_imap(mp, sc->tp, ino, &imap, XFS_IGET_UNTRUSTED);
+ if (error)
+ goto out;
+
+ error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+ imap.im_blkno, imap.im_len, XBF_UNMAPPED, &bp,
+ NULL);
+ if (error)
+ goto out;
+
+ /* Fix everything the verifier will complain about. */
+ bp->b_ops = &xfs_inode_buf_ops;
+ dip = xfs_buf_offset(bp, imap.im_boffset);
+ mode = be16_to_cpu(dip->di_mode);
+ if (mode && xfs_mode_to_ftype(mode) == XFS_DIR3_FT_UNKNOWN)
+ mode = S_IFREG;
+ dip->di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
+ if (!xfs_dinode_good_version(mp, dip->di_version))
+ dip->di_version = 3;
+ dip->di_ino = cpu_to_be64(ino);
+ uuid_copy(&dip->di_uuid, &mp->m_sb.sb_meta_uuid);
+ flags = be16_to_cpu(dip->di_flags);
+ flags2 = be64_to_cpu(dip->di_flags2);
+ if (xfs_sb_version_hasreflink(&mp->m_sb) && S_ISREG(mode))
+ flags2 |= XFS_DIFLAG2_REFLINK;
+ else
+ flags2 &= ~(XFS_DIFLAG2_REFLINK |
+ XFS_DIFLAG2_COWEXTSIZE);
+ if (flags & XFS_DIFLAG_REALTIME)
+ flags2 &= ~XFS_DIFLAG2_REFLINK;
+ if (flags2 & XFS_DIFLAG2_REFLINK)
+ flags2 &= ~XFS_DIFLAG2_DAX;
+ dip->di_flags = cpu_to_be16(flags);
+ dip->di_flags2 = cpu_to_be64(flags2);
+ dip->di_gen = cpu_to_be32(sc->sm->sm_gen);
+ if (be64_to_cpu(dip->di_size) & (1ULL << 63))
+ dip->di_size = cpu_to_be64((1ULL << 63) - 1);
+
+ /* Write out the inode... */
+ xfs_dinode_calc_crc(mp, dip);
+ xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_DINO_BUF);
+ xfs_trans_log_buf(sc->tp, bp, imap.im_boffset,
+ imap.im_boffset + mp->m_sb.sb_inodesize - 1);
+ error = xfs_trans_roll(&sc->tp, NULL);
+ if (error)
+ goto out;
+
+ /* ...and reload it? */
+ error = xfs_iget(mp, sc->tp, ino,
+ XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE,
+ 0, &sc->ip);
+ if (error)
+ goto out;
+ sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL |
+ XFS_ILOCK_EXCL;
+ xfs_ilock(sc->ip, sc->ilock_flags);
+ }
+
+ ip = sc->ip;
+ xfs_trans_ijoin(sc->tp, ip, 0);
+
+ /* di_size */
+ if (!S_ISDIR(VFS_I(ip)->i_mode) && !S_ISREG(VFS_I(ip)->i_mode) &&
+ !S_ISLNK(VFS_I(ip)->i_mode)) {
+ i_size_write(VFS_I(ip), 0);
+ ip->i_d.di_size = 0;
+ }
+
+ /* di_flags */
+ flags = ip->i_d.di_flags;
+ if ((flags & XFS_DIFLAG_IMMUTABLE) && (flags & XFS_DIFLAG_APPEND))
+ flags &= ~XFS_DIFLAG_APPEND;
+
+ if ((flags & XFS_DIFLAG_FILESTREAM) && (flags & XFS_DIFLAG_REALTIME))
+ flags &= ~XFS_DIFLAG_FILESTREAM;
+ ip->i_d.di_flags = flags;
+
+ /* di_nblocks/di_nextents/di_anextents */
+ error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_DATA_FORK,
+ &nextents, &count);
+ if (error)
+ goto out;
+ ip->i_d.di_nextents = nextents;
+
+ error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK,
+ &nextents, &acount);
+ if (error)
+ goto out;
+ ip->i_d.di_anextents = nextents;
+
+ ip->i_d.di_nblocks = count + acount;
+ if (ip->i_d.di_anextents != 0 && ip->i_d.di_forkoff == 0)
+ ip->i_d.di_anextents = 0;
+
+ /* Do we have prealloc blocks? */
+ if (S_ISREG(VFS_I(ip)->i_mode) && !(flags & XFS_DIFLAG_PREALLOC) &&
+ (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS ||
+ ip->i_d.di_format == XFS_DINODE_FMT_BTREE)) {
+ struct xfs_bmbt_irec got;
+ struct xfs_ifork *ifp;
+ xfs_fileoff_t lblk;
+ xfs_extnum_t idx;
+ bool found;
+
+ lblk = XFS_B_TO_FSB(mp, i_size_read(VFS_I(sc->ip)));
+ ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK);
+ found = xfs_iext_lookup_extent(sc->ip, ifp, lblk, &idx, &got);
+ while (found) {
+ if (got.br_startoff >= lblk &&
+ got.br_state == XFS_EXT_NORM) {
+ ip->i_d.di_flags |= XFS_DIFLAG_PREALLOC;
+ break;
+ }
+ lblk = got.br_startoff + got.br_blockcount;
+ found = xfs_iext_get_extent(ifp, ++idx, &got);
+ }
+ }
+
+ /* Invalid uid/gid? */
+ if (ip->i_d.di_uid == cpu_to_be32(-1U)) {
+ ip->i_d.di_uid = 0;
+ VFS_I(ip)->i_mode &= ~(S_ISUID | S_ISGID);
+ if (XFS_IS_UQUOTA_ON(mp))
+ invalidate_quota = true;
+ }
+ if (ip->i_d.di_gid == cpu_to_be32(-1U)) {
+ ip->i_d.di_gid = 0;
+ VFS_I(ip)->i_mode &= ~(S_ISUID | S_ISGID);
+ if (XFS_IS_GQUOTA_ON(mp))
+ invalidate_quota = true;
+ }
+
+ /* Commit inode core changes. */
+ xfs_trans_log_inode(sc->tp, ip, XFS_ILOG_CORE);
+ error = xfs_trans_roll(&sc->tp, ip);
+ if (error)
+ goto out;
+
+ /* We changed uid/gid, force a quotacheck. */
+ if (invalidate_quota) {
+ mp->m_qflags &= ~XFS_ALL_QUOTA_CHKD;
+ spin_lock(&mp->m_sb_lock);
+ mp->m_sb.sb_qflags = mp->m_qflags & XFS_MOUNT_QUOTA_ALL;
+ spin_unlock(&mp->m_sb_lock);
+ xfs_log_sb(sc->tp);
+ }
+
+preen_only:
+ if (xfs_is_reflink_inode(sc->ip))
+ return xfs_reflink_clear_inode_flag(sc->ip, &sc->tp);
+
+out:
+ return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 303afa9..62d0002 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -81,5 +81,6 @@ int xfs_repair_allocbt(struct xfs_scrub_context *sc);
int xfs_repair_iallocbt(struct xfs_scrub_context *sc);
int xfs_repair_rmapbt(struct xfs_scrub_context *sc);
int xfs_repair_refcountbt(struct xfs_scrub_context *sc);
+int xfs_repair_inode(struct xfs_scrub_context *sc);
#endif /* __XFS_SCRUB_REPAIR_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 2e2ed4a..47394a3 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -277,6 +277,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
{ /* inode record */
.setup = xfs_scrub_setup_inode,
.scrub = xfs_scrub_inode,
+ .repair = xfs_repair_inode,
},
{ /* inode data fork */
.setup = xfs_scrub_setup_inode_bmap_data,
* [PATCH 18/19] xfs: repair inode block maps
2017-08-25 22:16 [PATCH v9 00/19] xfs: online fs repair support Darrick J. Wong
` (16 preceding siblings ...)
2017-08-25 22:18 ` [PATCH 17/19] xfs: online repair of inodes Darrick J. Wong
@ 2017-08-25 22:18 ` Darrick J. Wong
2017-08-25 22:18 ` [PATCH 19/19] xfs: repair damaged symlinks Darrick J. Wong
18 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-25 22:18 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Use the reverse-mapping btree information to rebuild an inode fork.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
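Not for the commit log: the collection step can be pictured with a small
userspace model. The sketch assumes a flat array of reverse-mapping records
instead of a per-AG btree walk, keeps only the records owned by one inode
and fork, skips the records that describe old bmbt blocks, and sorts what
is left by file offset. The demo_* names and DEMO_RMAP_* flag values are
made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Demo flag values for this sketch only. */
#define DEMO_RMAP_ATTR_FORK	(1u << 0)
#define DEMO_RMAP_BMBT_BLOCK	(1u << 1)

/* Hypothetical, simplified reverse-mapping record. */
struct demo_rmap_rec {
	uint64_t	owner;		/* inode number */
	uint64_t	offset;		/* file offset, in blocks */
	uint32_t	startblock;
	uint32_t	blockcount;
	unsigned int	flags;
};

/* qsort() comparator: order records by file offset. */
static int
demo_cmp_offset(const void *a, const void *b)
{
	const struct demo_rmap_rec	*ra = a;
	const struct demo_rmap_rec	*rb = b;

	if (ra->offset > rb->offset)
		return 1;
	if (ra->offset < rb->offset)
		return -1;
	return 0;
}

/*
 * Copy the records that map one fork of inode @ino into @out, which must
 * have room for @nr entries, and sort them by file offset. Returns the
 * number of records kept.
 */
static size_t
demo_collect_fork(const struct demo_rmap_rec *recs, size_t nr, uint64_t ino,
		  bool attr_fork, struct demo_rmap_rec *out)
{
	size_t	i, n = 0;

	assert(recs != NULL && out != NULL);
	for (i = 0; i < nr; i++) {
		bool is_attr = (recs[i].flags & DEMO_RMAP_ATTR_FORK) != 0;

		/* Skip extents not owned by this inode and fork. */
		if (recs[i].owner != ino || is_attr != attr_fork)
			continue;
		/* Old bmbt blocks are disposed of, not remapped. */
		if (recs[i].flags & DEMO_RMAP_BMBT_BLOCK)
			continue;
		out[n++] = recs[i];
	}
	qsort(out, n, sizeof(*out), demo_cmp_offset);
	return n;
}
```

The patch stops scanning early once it has found the estimated number of
blocks; the sketch leaves that optimisation out to stay short.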
fs/xfs/scrub/bmap.c | 395 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/repair.h | 2
fs/xfs/scrub/scrub.c | 2
3 files changed, 398 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index 8377521..7858a5e 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -35,15 +35,18 @@
#include "xfs_bmap_util.h"
#include "xfs_bmap_btree.h"
#include "xfs_rmap.h"
+#include "xfs_rmap_btree.h"
#include "xfs_alloc.h"
#include "xfs_ialloc.h"
#include "xfs_refcount.h"
#include "xfs_rtalloc.h"
+#include "xfs_quota.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/btree.h"
#include "scrub/trace.h"
+#include "scrub/repair.h"
/* Set us up with an inode's bmap. */
STATIC int
@@ -53,12 +56,34 @@ __xfs_scrub_setup_inode_bmap(
bool flush_data)
{
struct xfs_mount *mp = sc->mp;
+ unsigned int resblks;
int error;
error = xfs_scrub_get_inode(sc, ip);
if (error)
return error;
+ /*
+ * Guess how many blocks we're going to need to rebuild an
+ * entire bmap. Since we're reloading the btree sequentially
+ * there should be fewer splits.
+ */
+ switch (sc->sm->sm_type) {
+ case XFS_SCRUB_TYPE_BMBTD:
+ resblks = xfs_bmbt_calc_size(mp, sc->ip->i_d.di_nextents);
+ break;
+ case XFS_SCRUB_TYPE_BMBTA:
+ resblks = xfs_bmbt_calc_size(mp, sc->ip->i_d.di_anextents);
+ break;
+ case XFS_SCRUB_TYPE_BMBTC:
+ resblks = 0;
+ break;
+ default:
+ ASSERT(0);
+ error = -EFSCORRUPTED;
+ goto out_rele;
+ }
+
sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
xfs_ilock(sc->ip, sc->ilock_flags);
@@ -79,7 +104,7 @@ __xfs_scrub_setup_inode_bmap(
/* Got the inode, lock it and we're ready to go. */
error = xfs_scrub_trans_alloc(sc->sm, mp, &M_RES(mp)->tr_itruncate,
- 0, 0, 0, &sc->tp);
+ resblks, 0, 0, &sc->tp);
if (error)
goto out_unlock;
sc->ilock_flags |= XFS_ILOCK_EXCL;
@@ -88,6 +113,7 @@ __xfs_scrub_setup_inode_bmap(
return 0;
out_unlock:
xfs_iunlock(sc->ip, sc->ilock_flags);
+out_rele:
if (sc->ip != ip)
iput(VFS_I(sc->ip));
sc->ip = NULL;
@@ -565,3 +591,370 @@ xfs_scrub_bmap_cow(
return xfs_scrub_bmap(sc, XFS_COW_FORK);
}
+
+/* Inode fork block mapping (BMBT) repair. */
+
+struct xfs_repair_bmap_extent {
+ struct list_head list;
+ struct xfs_rmap_irec rmap;
+ xfs_agnumber_t agno;
+};
+
+struct xfs_repair_bmap {
+ struct list_head extlist;
+ struct list_head btlist;
+ struct xfs_repair_bmap_extent ext; /* most files have 1 extent */
+ struct xfs_scrub_context *sc;
+ xfs_ino_t ino;
+ xfs_fileoff_t wantblks;
+ xfs_fileoff_t blocks;
+ xfs_rfsblock_t bmbt_blocks;
+ int whichfork;
+};
+
+/* Record extents that belong to this inode's fork. */
+STATIC int
+xfs_repair_bmap_extent_fn(
+ struct xfs_btree_cur *cur,
+ struct xfs_rmap_irec *rec,
+ void *priv)
+{
+ struct xfs_repair_bmap *rb = priv;
+ struct xfs_repair_bmap_extent *rbe;
+ struct xfs_mount *mp = cur->bc_mp;
+ xfs_fsblock_t fsbno;
+ int error = 0;
+
+ if (xfs_scrub_should_terminate(&error))
+ return error;
+
+ /* Skip extents which are not owned by this inode and fork. */
+ if (rec->rm_owner != rb->ino)
+ return 0;
+ else if (rb->whichfork == XFS_DATA_FORK &&
+ (rec->rm_flags & XFS_RMAP_ATTR_FORK))
+ return 0;
+ else if (rb->whichfork == XFS_ATTR_FORK &&
+ !(rec->rm_flags & XFS_RMAP_ATTR_FORK))
+ return 0;
+
+ /* Delete the old bmbt blocks later. */
+ if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) {
+ fsbno = XFS_AGB_TO_FSB(mp, cur->bc_private.a.agno,
+ rec->rm_startblock);
+ rb->bmbt_blocks += rec->rm_blockcount;
+ return xfs_repair_collect_btree_extent(rb->sc, &rb->btlist,
+ fsbno, rec->rm_blockcount);
+ }
+
+ /* Remember this rmap. */
+ trace_xfs_repair_bmap_extent_fn(mp, cur->bc_private.a.agno,
+ rec->rm_startblock, rec->rm_blockcount, rec->rm_owner,
+ rec->rm_offset, rec->rm_flags);
+
+ if (list_empty(&rb->extlist)) {
+ rbe = &rb->ext;
+ } else {
+ rbe = kmem_alloc(sizeof(struct xfs_repair_bmap_extent),
+ KM_MAYFAIL | KM_NOFS);
+ if (!rbe)
+ return -ENOMEM;
+ }
+
+ INIT_LIST_HEAD(&rbe->list);
+ rbe->rmap = *rec;
+ rbe->agno = cur->bc_private.a.agno;
+ list_add_tail(&rbe->list, &rb->extlist);
+
+ rb->blocks += rec->rm_blockcount;
+ if (rb->blocks >= rb->wantblks)
+ return XFS_BTREE_QUERY_RANGE_ABORT;
+
+ return 0;
+}
+
+/* Compare two bmap extents. */
+static int
+xfs_repair_bmap_extent_cmp(
+ void *priv,
+ struct list_head *a,
+ struct list_head *b)
+{
+ struct xfs_repair_bmap_extent *ap;
+ struct xfs_repair_bmap_extent *bp;
+
+ ap = container_of(a, struct xfs_repair_bmap_extent, list);
+ bp = container_of(b, struct xfs_repair_bmap_extent, list);
+
+ if (ap->rmap.rm_offset > bp->rmap.rm_offset)
+ return 1;
+ else if (ap->rmap.rm_offset < bp->rmap.rm_offset)
+ return -1;
+ return 0;
+}
+
+/* Scan one AG for reverse mappings that we can turn into extent maps. */
+STATIC int
+xfs_repair_bmap_scan_ag(
+ struct xfs_repair_bmap *rb,
+ xfs_agnumber_t agno)
+{
+ struct xfs_scrub_context *sc = rb->sc;
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_buf *agf_bp = NULL;
+ struct xfs_btree_cur *cur;
+ int error;
+
+ error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, &agf_bp);
+ if (error)
+ return error;
+ if (!agf_bp)
+ return -ENOMEM;
+ cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, agno);
+ error = xfs_rmap_query_all(cur, xfs_repair_bmap_extent_fn, rb);
+ if (error == XFS_BTREE_QUERY_RANGE_ABORT)
+ error = 0;
+ xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
+ XFS_BTREE_NOERROR);
+ xfs_trans_brelse(sc->tp, agf_bp);
+ return error;
+}
+
+/*
+ * Estimate how many blocks we ought to find for the fork we're rebuilding.
+ * This ought to be di_nblocks - blocks_in_other_fork, but watch for
+ * obviously bad values.
+ */
+STATIC xfs_filblks_t
+xfs_repair_bmap_estimate_blocks(
+ struct xfs_scrub_context *sc,
+ int whichfork)
+{
+ xfs_filblks_t blks;
+ xfs_extnum_t nex;
+ int otherfork;
+ int error;
+
+ if (sc->ip->i_d.di_nblocks >= sc->mp->m_sb.sb_dblocks)
+ return ULLONG_MAX;
+
+ otherfork = whichfork == XFS_DATA_FORK ? XFS_ATTR_FORK : XFS_DATA_FORK;
+ error = xfs_bmap_count_blocks(sc->tp, sc->ip, otherfork, &nex, &blks);
+ if (error)
+ return ULLONG_MAX;
+
+ if ((otherfork == XFS_ATTR_FORK && nex > USHRT_MAX) ||
+ blks > sc->mp->m_sb.sb_dblocks ||
+ blks > sc->ip->i_d.di_nblocks)
+ return ULLONG_MAX;
+
+ return sc->ip->i_d.di_nblocks - blks;
+}
+
+/* Repair an inode fork. */
+STATIC int
+xfs_repair_bmap(
+ struct xfs_scrub_context *sc,
+ int whichfork)
+{
+ struct xfs_repair_bmap rb;
+ struct xfs_bmbt_irec bmap;
+ struct xfs_defer_ops dfops;
+ struct xfs_owner_info oinfo;
+ struct xfs_inode *ip = sc->ip;
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_repair_bmap_extent *rbe;
+ struct xfs_repair_bmap_extent *n;
+ xfs_fsblock_t firstfsb;
+ xfs_agnumber_t agno;
+ xfs_agnumber_t iagno;
+ xfs_extlen_t extlen;
+ int baseflags;
+ int flags;
+ int error = 0;
+
+ ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_ATTR_FORK);
+
+ /* Don't know how to repair the other fork formats. */
+ if (XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
+ XFS_IFORK_FORMAT(sc->ip, whichfork) != XFS_DINODE_FMT_BTREE)
+ return -EOPNOTSUPP;
+
+ /* Only files, symlinks, and directories get to have data forks. */
+ if (whichfork == XFS_DATA_FORK && !S_ISREG(VFS_I(ip)->i_mode) &&
+ !S_ISDIR(VFS_I(ip)->i_mode) && !S_ISLNK(VFS_I(ip)->i_mode))
+ return -EINVAL;
+
+ /* If we somehow have delalloc extents, forget it. */
+ if (whichfork == XFS_DATA_FORK && ip->i_delayed_blks)
+ return -EBUSY;
+
+ /*
+ * If there's no attr fork area in the inode, there's
+ * no attr fork to rebuild.
+ */
+ if (whichfork == XFS_ATTR_FORK && !XFS_IFORK_Q(ip))
+ return -ENOENT;
+
+ /* We require the rmapbt to rebuild anything. */
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return -EOPNOTSUPP;
+
+ /* Don't know how to rebuild realtime data forks. */
+ if (XFS_IS_REALTIME_INODE(ip) && whichfork == XFS_DATA_FORK)
+ return -EOPNOTSUPP;
+
+ /*
+ * If this is a file data fork, wait for all pending directio to
+ * complete, then tear everything out of the page cache.
+ */
+ if (S_ISREG(VFS_I(ip)->i_mode) && whichfork == XFS_DATA_FORK) {
+ inode_dio_wait(VFS_I(ip));
+ truncate_inode_pages(VFS_I(ip)->i_mapping, 0);
+ }
+
+ /* Collect all reverse mappings for this fork's extents. */
+ memset(&rb, 0, sizeof(rb));
+ INIT_LIST_HEAD(&rb.extlist);
+ INIT_LIST_HEAD(&rb.btlist);
+ rb.ino = ip->i_ino;
+ rb.whichfork = whichfork;
+ rb.sc = sc;
+ rb.wantblks = xfs_repair_bmap_estimate_blocks(sc, whichfork);
+
+ /* Iterate the home AG for extents... */
+ if (rb.wantblks != ULLONG_MAX) {
+ iagno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+ error = xfs_repair_bmap_scan_ag(&rb, iagno);
+ if (error)
+ goto out;
+ } else {
+ iagno = NULLAGNUMBER;
+ }
+
+ /* ...then do the rest if we don't find all the blocks. */
+ for (agno = 0;
+ agno < mp->m_sb.sb_agcount && rb.blocks < rb.wantblks;
+ agno++) {
+ if (agno == iagno)
+ continue;
+ error = xfs_repair_bmap_scan_ag(&rb, agno);
+ if (error)
+ goto out;
+ }
+
+ /* Blow out the in-core fork and zero the on-disk fork. */
+ xfs_trans_ijoin(sc->tp, sc->ip, 0);
+ if (XFS_IFORK_PTR(ip, whichfork) != NULL)
+ xfs_idestroy_fork(sc->ip, whichfork);
+ XFS_IFORK_FMT_SET(sc->ip, whichfork, XFS_DINODE_FMT_EXTENTS);
+ XFS_IFORK_NEXT_SET(sc->ip, whichfork, 0);
+
+ /* Reinitialize the in-core fork. */
+ if (whichfork == XFS_DATA_FORK) {
+ memset(&ip->i_df, 0, sizeof(struct xfs_ifork));
+ ip->i_df.if_flags |= XFS_IFEXTENTS;
+ } else if (whichfork == XFS_ATTR_FORK) {
+ if (list_empty(&rb.extlist))
+ ip->i_afp = NULL;
+ else {
+ ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, KM_NOFS);
+ ip->i_afp->if_flags |= XFS_IFEXTENTS;
+ }
+ }
+ xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
+ error = xfs_trans_roll(&sc->tp, sc->ip);
+ if (error)
+ goto out;
+
+ baseflags = XFS_BMAPI_NORMAP;
+ if (whichfork == XFS_ATTR_FORK)
+ baseflags |= XFS_BMAPI_ATTRFORK;
+
+ /* Decrease nblocks to reflect the freed bmbt blocks. */
+ if (rb.bmbt_blocks) {
+ sc->ip->i_d.di_nblocks -= rb.bmbt_blocks;
+ xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
+ xfs_trans_mod_dquot_byino(sc->tp, sc->ip, XFS_TRANS_DQ_BCOUNT,
+ -rb.bmbt_blocks);
+ error = xfs_trans_roll(&sc->tp, sc->ip);
+ if (error)
+ goto out;
+ }
+
+ /* "Remap" the extents into the fork. */
+ list_sort(NULL, &rb.extlist, xfs_repair_bmap_extent_cmp);
+ list_for_each_entry_safe(rbe, n, &rb.extlist, list) {
+ /* Form the "new" mapping... */
+ bmap.br_startblock = XFS_AGB_TO_FSB(mp, rbe->agno,
+ rbe->rmap.rm_startblock);
+ bmap.br_startoff = rbe->rmap.rm_offset;
+ flags = 0;
+ if (rbe->rmap.rm_flags & XFS_RMAP_UNWRITTEN)
+ flags = XFS_BMAPI_PREALLOC;
+ while (rbe->rmap.rm_blockcount > 0) {
+ xfs_defer_init(&dfops, &firstfsb);
+ extlen = min_t(xfs_extlen_t, rbe->rmap.rm_blockcount,
+ MAXEXTLEN);
+ bmap.br_blockcount = extlen;
+
+ /* Drop the block counter... */
+ sc->ip->i_d.di_nblocks -= extlen;
+
+ /* Re-add the extent to the fork. */
+ error = xfs_bmapi_remap(sc->tp, sc->ip,
+ bmap.br_startoff, extlen,
+ bmap.br_startblock, &dfops,
+ baseflags | flags);
+ if (error)
+ goto out;
+
+ bmap.br_startblock += extlen;
+ bmap.br_startoff += extlen;
+ rbe->rmap.rm_blockcount -= extlen;
+ error = xfs_defer_finish(&sc->tp, &dfops, sc->ip);
+ if (error)
+ goto out;
+ /* Make sure we roll the transaction. */
+ error = xfs_trans_roll(&sc->tp, sc->ip);
+ if (error)
+ goto out;
+ }
+ list_del(&rbe->list);
+ if (rbe != &rb.ext)
+ kmem_free(rbe);
+ }
+
+ /* Dispose of all the old bmbt blocks. */
+ xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, whichfork);
+ error = xfs_repair_reap_btree_extents(sc, &rb.btlist, &oinfo,
+ XFS_AG_RESV_NONE);
+ if (error)
+ goto out;
+
+ return error;
+out:
+ xfs_repair_cancel_btree_extents(sc, &rb.btlist);
+ list_for_each_entry_safe(rbe, n, &rb.extlist, list) {
+ list_del(&rbe->list);
+ if (rbe != &rb.ext)
+ kmem_free(rbe);
+ }
+ return error;
+}
+
+/* Repair an inode's data fork. */
+int
+xfs_repair_bmap_data(
+ struct xfs_scrub_context *sc)
+{
+ return xfs_repair_bmap(sc, XFS_DATA_FORK);
+}
+
+/* Repair an inode's attr fork. */
+int
+xfs_repair_bmap_attr(
+ struct xfs_scrub_context *sc)
+{
+ return xfs_repair_bmap(sc, XFS_ATTR_FORK);
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 62d0002..a56f2bc 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -82,5 +82,7 @@ int xfs_repair_iallocbt(struct xfs_scrub_context *sc);
int xfs_repair_rmapbt(struct xfs_scrub_context *sc);
int xfs_repair_refcountbt(struct xfs_scrub_context *sc);
int xfs_repair_inode(struct xfs_scrub_context *sc);
+int xfs_repair_bmap_data(struct xfs_scrub_context *sc);
+int xfs_repair_bmap_attr(struct xfs_scrub_context *sc);
#endif /* __XFS_SCRUB_REPAIR_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 47394a3..79dc9f9 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -282,10 +282,12 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
{ /* inode data fork */
.setup = xfs_scrub_setup_inode_bmap_data,
.scrub = xfs_scrub_bmap_data,
+ .repair = xfs_repair_bmap_data,
},
{ /* inode attr fork */
.setup = xfs_scrub_setup_inode_bmap,
.scrub = xfs_scrub_bmap_attr,
+ .repair = xfs_repair_bmap_attr,
},
{ /* inode CoW fork */
.setup = xfs_scrub_setup_inode_bmap,
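Reading aid, not part of the patch: the remap loop in xfs_repair_bmap()
feeds each collected rmap back into the fork in pieces no longer than
MAXEXTLEN. A minimal userspace sketch of that chunking follows, assuming a
caller-supplied maximum length in place of MAXEXTLEN and a plain output
array in place of xfs_bmapi_remap(). The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified file mapping. */
struct demo_mapping {
	uint64_t	startoff;	/* file offset, in blocks */
	uint64_t	startblock;	/* disk block */
	uint64_t	blockcount;
};

/*
 * Split @whole into consecutive pieces of at most @max_len blocks each.
 * Returns the number of pieces written to @out, or -1 if @out is too small.
 */
static int
demo_split_extent(struct demo_mapping whole, uint64_t max_len,
		  struct demo_mapping *out, size_t out_max)
{
	size_t	n = 0;

	assert(max_len > 0);
	while (whole.blockcount > 0) {
		uint64_t len = whole.blockcount < max_len ?
				whole.blockcount : max_len;

		if (n == out_max)
			return -1;
		out[n].startoff = whole.startoff;
		out[n].startblock = whole.startblock;
		out[n].blockcount = len;
		n++;

		/* Advance both the file offset and the disk block. */
		whole.startoff += len;
		whole.startblock += len;
		whole.blockcount -= len;
	}
	return (int)n;
}
```

Splitting a 25-block mapping at file offset 0 and disk block 1000 with a
maximum length of 10 should give three pieces of 10, 10 and 5 blocks, the
last one starting at file offset 20 and disk block 1020.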
* [PATCH 19/19] xfs: repair damaged symlinks
2017-08-25 22:16 [PATCH v9 00/19] xfs: online fs repair support Darrick J. Wong
` (17 preceding siblings ...)
2017-08-25 22:18 ` [PATCH 18/19] xfs: repair inode block maps Darrick J. Wong
@ 2017-08-25 22:18 ` Darrick J. Wong
18 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-25 22:18 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Repair inconsistent symbolic link data.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/scrub/repair.h | 1
fs/xfs/scrub/scrub.c | 1
fs/xfs/scrub/symlink.c | 238 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 239 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index a56f2bc..aee17e0 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -84,5 +84,6 @@ int xfs_repair_refcountbt(struct xfs_scrub_context *sc);
int xfs_repair_inode(struct xfs_scrub_context *sc);
int xfs_repair_bmap_data(struct xfs_scrub_context *sc);
int xfs_repair_bmap_attr(struct xfs_scrub_context *sc);
+int xfs_repair_symlink(struct xfs_scrub_context *sc);
#endif /* __XFS_SCRUB_REPAIR_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 79dc9f9..7f9a4f8 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -304,6 +304,7 @@ static const struct xfs_scrub_meta_ops meta_scrub_ops[] = {
{ /* symbolic link */
.setup = xfs_scrub_setup_symlink,
.scrub = xfs_scrub_symlink,
+ .repair = xfs_repair_symlink,
},
{ /* parent pointers */
.setup = xfs_scrub_setup_parent,
diff --git a/fs/xfs/scrub/symlink.c b/fs/xfs/scrub/symlink.c
index e3b5d35..35c0071 100644
--- a/fs/xfs/scrub/symlink.c
+++ b/fs/xfs/scrub/symlink.c
@@ -32,10 +32,13 @@
#include "xfs_inode.h"
#include "xfs_inode_fork.h"
#include "xfs_symlink.h"
+#include "xfs_bmap.h"
+#include "xfs_quota.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/trace.h"
+#include "scrub/repair.h"
/* Set us up to scrub a symbolic link. */
int
@@ -48,7 +51,7 @@ xfs_scrub_setup_symlink(
if (!sc->buf)
return -ENOMEM;
- return xfs_scrub_setup_inode_contents(sc, ip, 0);
+ return xfs_scrub_setup_inode_contents(sc, ip, XFS_SYMLINK_MAPS);
}
/* Symbolic links. */
@@ -90,3 +93,236 @@ xfs_scrub_symlink(
out:
return error;
}
+
+/* Blow out the whole symlink; replace contents. */
+STATIC int
+xfs_repair_symlink_rewrite(
+ struct xfs_trans **tpp,
+ struct xfs_inode *ip,
+ const char *target_path,
+ int pathlen)
+{
+ struct xfs_defer_ops dfops;
+ struct xfs_bmbt_irec mval[XFS_SYMLINK_MAPS];
+ struct xfs_ifork *ifp;
+ const char *cur_chunk;
+ struct xfs_mount *mp = (*tpp)->t_mountp;
+ struct xfs_buf *bp;
+ xfs_fsblock_t first_block;
+ xfs_fileoff_t first_fsb;
+ xfs_filblks_t fs_blocks;
+ xfs_daddr_t d;
+ uint resblks = 0;
+ int byte_cnt;
+ int n;
+ int nmaps;
+ int offset;
+ int error = 0;
+
+ ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+
+ /* Truncate the whole data fork if it wasn't inline. */
+ if (!(ifp->if_flags & XFS_IFINLINE)) {
+ error = xfs_itruncate_extents(tpp, ip, XFS_DATA_FORK, 0);
+ if (error)
+ goto out;
+ }
+
+ /* Blow out the in-core fork and zero the on-disk fork. */
+ xfs_idestroy_fork(ip, XFS_DATA_FORK);
+ ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
+ ip->i_d.di_nextents = 0;
+ memset(&ip->i_df, 0, sizeof(struct xfs_ifork));
+ ip->i_df.if_flags |= XFS_IFEXTENTS;
+
+ /* Rewrite an inline symlink. */
+ if (pathlen <= XFS_IFORK_DSIZE(ip)) {
+ xfs_init_local_fork(ip, XFS_DATA_FORK, target_path, pathlen);
+
+ i_size_write(VFS_I(ip), pathlen);
+ ip->i_d.di_size = pathlen;
+ ip->i_d.di_format = XFS_DINODE_FMT_LOCAL;
+ xfs_trans_log_inode(*tpp, ip, XFS_ILOG_DDATA | XFS_ILOG_CORE);
+ goto out;
+
+ }
+
+ /* Rewrite a remote symlink. */
+ fs_blocks = xfs_symlink_blocks(mp, pathlen);
+ first_fsb = 0;
+ nmaps = XFS_SYMLINK_MAPS;
+
+ /* Reserve quota for new blocks. */
+ error = xfs_trans_reserve_quota_nblks(*tpp, ip, fs_blocks, 0,
+ XFS_QMOPT_RES_REGBLKS);
+ if (error)
+ goto out;
+
+ /* Map blocks, write symlink target. */
+ xfs_defer_init(&dfops, &first_block);
+
+ error = xfs_bmapi_write(*tpp, ip, first_fsb, fs_blocks,
+ XFS_BMAPI_METADATA, &first_block, fs_blocks,
+ mval, &nmaps, &dfops);
+ if (error)
+ goto out_bmap_cancel;
+
+ if (resblks)
+ resblks -= fs_blocks;
+ ip->i_d.di_size = pathlen;
+ i_size_write(VFS_I(ip), pathlen);
+ xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
+
+ cur_chunk = target_path;
+ offset = 0;
+ for (n = 0; n < nmaps; n++) {
+ char *buf;
+
+ d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock);
+ byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount);
+ bp = xfs_trans_get_buf(*tpp, mp->m_ddev_targp, d,
+ BTOBB(byte_cnt), 0);
+ if (!bp) {
+ error = -ENOMEM;
+ goto out_bmap_cancel;
+ }
+ bp->b_ops = &xfs_symlink_buf_ops;
+
+ byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
+ byte_cnt = min(byte_cnt, pathlen);
+
+ buf = bp->b_addr;
+ buf += xfs_symlink_hdr_set(mp, ip->i_ino, offset,
+ byte_cnt, bp);
+
+ memcpy(buf, cur_chunk, byte_cnt);
+
+ cur_chunk += byte_cnt;
+ pathlen -= byte_cnt;
+ offset += byte_cnt;
+
+ xfs_trans_buf_set_type(*tpp, bp, XFS_BLFT_SYMLINK_BUF);
+ xfs_trans_log_buf(*tpp, bp, 0, (buf + byte_cnt - 1) -
+ (char *)bp->b_addr);
+ }
+ ASSERT(pathlen == 0);
+
+ error = xfs_defer_finish(tpp, &dfops, NULL);
+ if (error)
+ goto out_bmap_cancel;
+
+ return 0;
+
+out_bmap_cancel:
+ xfs_defer_cancel(&dfops);
+out:
+ return error;
+}
+
+int
+xfs_repair_symlink(
+ struct xfs_scrub_context *sc)
+{
+ struct xfs_bmbt_irec mval[XFS_SYMLINK_MAPS];
+ struct xfs_inode *ip = sc->ip;
+ struct xfs_mount *mp = sc->mp;
+ struct xfs_ifork *ifp;
+ struct xfs_buf *bp;
+ loff_t len;
+ size_t newlen;
+ xfs_daddr_t d;
+ int fsblocks;
+ int nmaps = XFS_SYMLINK_MAPS;
+ int nr;
+ int offset;
+ int n;
+ int byte_cnt;
+ int error = 0;
+
+ ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+ len = i_size_read(VFS_I(ip));
+ xfs_trans_ijoin(sc->tp, ip, 0);
+
+ /* Truncate the inode if a NUL byte lies inside the recorded length. */
+ if (ifp->if_flags & XFS_IFINLINE) {
+ if (ifp->if_u1.if_data)
+ newlen = strnlen(ifp->if_u1.if_data,
+ XFS_IFORK_DSIZE(ip));
+ else {
+ newlen = 1;
+ ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
+ ifp->if_u1.if_data[0] = '/';
+ }
+ if (len > newlen) {
+ i_size_write(VFS_I(ip), newlen);
+ ip->i_d.di_size = newlen;
+ xfs_trans_log_inode(sc->tp, ip, XFS_ILOG_DDATA |
+ XFS_ILOG_CORE);
+ }
+ goto out;
+ }
+
+ fsblocks = xfs_symlink_blocks(mp, len);
+ error = xfs_bmapi_read(ip, 0, fsblocks, mval, &nmaps, 0);
+ if (error)
+ goto out;
+
+ /* Fix everything that fails the verifiers. */
+ offset = 0;
+ for (n = 0; n < nmaps; n++) {
+ d = XFS_FSB_TO_DADDR(mp, mval[n].br_startblock);
+ byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount);
+
+ error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
+ d, BTOBB(byte_cnt), 0, &bp, NULL);
+ if (error)
+ goto out;
+ bp->b_ops = &xfs_symlink_buf_ops;
+
+ byte_cnt = XFS_SYMLINK_BUF_SPACE(mp, byte_cnt);
+ if (len < byte_cnt)
+ byte_cnt = len;
+
+ nr = xfs_symlink_hdr_set(mp, ip->i_ino, offset, byte_cnt, bp);
+
+ len -= byte_cnt;
+ offset += byte_cnt;
+
+ xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_SYMLINK_BUF);
+ xfs_trans_log_buf(sc->tp, bp, 0, nr - 1);
+ xfs_trans_brelse(sc->tp, bp);
+ }
+ if (len != 0) {
+ error = -EFSCORRUPTED;
+ goto out;
+ }
+
+ /* Roll transaction, release buffers. */
+ error = xfs_trans_roll(&sc->tp, ip);
+ if (error)
+ goto out;
+
+ /* Read the target back to check the recorded size against it. */
+ len = i_size_read(VFS_I(ip));
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ error = xfs_readlink(ip, sc->buf);
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ if (error)
+ goto out;
+
+ /*
+ * Figure out the new target length. We can't handle zero-length
+ * symlinks, so make sure that we don't write that out.
+ */
+ newlen = strnlen(sc->buf, XFS_SYMLINK_MAXLEN);
+ if (newlen == 0) {
+ *((char *)sc->buf) = '/';
+ newlen = 1;
+ }
+
+ if (len > newlen)
+ error = xfs_repair_symlink_rewrite(&sc->tp, ip, sc->buf,
+ newlen);
+out:
+ return error;
+}
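For readers following the remote-symlink loop in
xfs_repair_symlink_rewrite above, here is a small userspace sketch of how
a target gets split across mapped extents, with each buffer giving up
some space to a header. It is a simplification under stated assumptions:
SKETCH_HDR_SIZE is a hypothetical fixed header size (the patch asks
XFS_SYMLINK_BUF_SPACE for the usable space instead), the sketch_* names
are made up, and the sketch stops once the target is consumed rather
than visiting every mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-buffer header size; the real value depends on the format. */
#define SKETCH_HDR_SIZE	56

struct sketch_chunk {
	size_t	offset;		/* byte offset of this chunk within the target */
	size_t	byte_cnt;	/* target bytes stored in this buffer */
};

/*
 * Lay out pathlen bytes of target across nexts extents whose sizes in
 * bytes are given by ext_bytes[].  Fills out[] with one entry per extent
 * used.  Returns the number of extents used, or -1 if the extents cannot
 * hold the whole target.
 */
static int
sketch_layout_target(const size_t *ext_bytes, int nexts, size_t pathlen,
		     struct sketch_chunk *out)
{
	size_t	offset = 0;
	size_t	space;
	size_t	byte_cnt;
	int	n;

	for (n = 0; n < nexts && pathlen > 0; n++) {
		/* Each buffer loses header space before any target bytes. */
		assert(ext_bytes[n] > SKETCH_HDR_SIZE);
		space = ext_bytes[n] - SKETCH_HDR_SIZE;
		byte_cnt = pathlen < space ? pathlen : space;

		out[n].offset = offset;
		out[n].byte_cnt = byte_cnt;

		offset += byte_cnt;
		pathlen -= byte_cnt;
	}

	/* Mirrors the ASSERT(pathlen == 0) after the loop in the patch. */
	return pathlen == 0 ? n : -1;
}
```

The running offset recorded for each chunk plays the role of the offset
argument passed to xfs_symlink_hdr_set in the patch, and the leftover
pathlen check is the userspace analogue of the final ASSERT.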
Thread overview: 20+ messages
2017-08-25 22:16 [PATCH v9 00/19] xfs: online fs repair support Darrick J. Wong
2017-08-25 22:16 ` [PATCH 01/19] xfs: add helpers to calculate btree size Darrick J. Wong
2017-08-25 22:17 ` [PATCH 02/19] xfs: expose various functions to repair code Darrick J. Wong
2017-08-25 22:17 ` [PATCH 03/19] xfs: add repair helpers for the reverse mapping btree Darrick J. Wong
2017-08-25 22:17 ` [PATCH 04/19] xfs: add repair helpers for the reference count btree Darrick J. Wong
2017-08-25 22:17 ` [PATCH 05/19] xfs: add BMAPI_NORMAP flag to perform block remapping without updating rmpabt Darrick J. Wong
2017-08-25 22:17 ` [PATCH 06/19] xfs: halt auto-reclamation activities while rebuilding rmap Darrick J. Wong
2017-08-25 22:17 ` [PATCH 07/19] xfs: create tracepoints for online repair Darrick J. Wong
2017-08-25 22:17 ` [PATCH 08/19] xfs: implement the metadata repair ioctl flag Darrick J. Wong
2017-08-25 22:17 ` [PATCH 09/19] xfs: add helper routines for the repair code Darrick J. Wong
2017-08-25 22:17 ` [PATCH 10/19] xfs: repair superblocks Darrick J. Wong
2017-08-25 22:18 ` [PATCH 11/19] xfs: repair the AGF and AGFL Darrick J. Wong
2017-08-25 22:18 ` [PATCH 12/19] xfs: rebuild the AGI Darrick J. Wong
2017-08-25 22:18 ` [PATCH 13/19] xfs: repair free space btrees Darrick J. Wong
2017-08-25 22:18 ` [PATCH 14/19] xfs: repair inode btrees Darrick J. Wong
2017-08-25 22:18 ` [PATCH 15/19] xfs: rebuild the rmapbt Darrick J. Wong
2017-08-25 22:18 ` [PATCH 16/19] xfs: repair refcount btrees Darrick J. Wong
2017-08-25 22:18 ` [PATCH 17/19] xfs: online repair of inodes Darrick J. Wong
2017-08-25 22:18 ` [PATCH 18/19] xfs: repair inode block maps Darrick J. Wong
2017-08-25 22:18 ` [PATCH 19/19] xfs: repair damaged symlinks Darrick J. Wong