linux-bcachefs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/10] bcachefs - semvar, forward compatibility
@ 2023-07-09 17:15 Kent Overstreet
  2023-07-09 17:15 ` [PATCH 01/10] bcachefs: Allow for unknown btree IDs Kent Overstreet
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 17:15 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet, bfoster, sandeen

So - in the upstreaming discussion, Brian mentioned code review, so now
seems like a good time to start making sure bcachefs patches hit the
list.

In the last cabal meeting, we started talking about on disk
compatibility issues for mainlining. Since (IIRC) the first release,
we've maintained backwards compatibility (support for very old versions
has been dropped, but there's always been an upgrade path) - but we
generally haven't been addressing forwards compatibility yet - we've
been doing a lot of forced incompatible version upgrades, where the old
version is no longer able to mount the filesystem after it's been
mounted by the new version.

Obviously, we can't do that anymore after we're in mainline and out of
EXPERIMENTAL. There were two main issues to address:

Major/minor version numbers
---------------------------

bcachefs started out with the traditional compatible/incompatible
feature bits in the superblock, but they are no longer my preferred
approach.

The problem with feature bits is that there was an ordering in which new
on disk format features were released, and feature bits lose that
ordering: they make it possible for users to create filesystems where x,
y, and z modern feature bits are enabled, but not feature bit a from 5
years ago - and the code was never written to expect that and you
certainly never tested that configuration, so things break in incredibly
fun ways.

Assigning every new on disk feature a distinct version number instead of
a feature bit preserves this ordering and makes it impossible for users
to create or use filesystems with features selected that historically
should not have existed. This has been the practice in bcachefs for
awhile now, and I've been quite happy with it.

The missing bit that this patch series adds is to split the version
number field into major and minor versions. Incrementing the minor
version number corresponds to adding a new compat feature flag, if we
were using feature bits: incrementing the major version number
corresponds to adding a new incompatible feature bit.

IOW, we'll allow mounting a filesystem with a version number greater
than the currently supported version as long as the major version number
is the same. As with compat feature bits, if you do so the filesystem
will be downgraded to the currently supported version, indicating those
new on disk structures may now be inconsistent.

Forwards compatibility of on disk structures:
---------------------------------------------

We need to be able to roll out new on disk data structures without
causing problems for old versions - old versions should just ignore
metadata they don't understand. This had been planned for in the past
and most of the work was done, so there wasn't much left.

Specifically, we need to be able to roll out new
 - Superblock sections: already handled, audited and cleaned up a bit
 - Journal entry types: already handled, audited
 - Btrees: addressed by this patchset
 - Bkey types: addressed by this patchset
 - New fields for existing bkeys: addressed by the patch series that
   introduced "bch2_bkey_get_val_typed()", but this is the trickiest to
   handle and likely more work will be required

With all this in place, we'll be able to roll out most of the new
features we want that require new on disk data structures as forwards
compatible changes, including everything currently in the pipeline. That
includes

 - Snapshot nodes are gaining skiplist entries soon: this will fix O(n)
   issues with bch2_snapshot_is_ancestor()

 - rebalance_work btree: Rebalance is the last operation that happens
   during normal operation that requires metadata scanning - soon I'll
   be adding a rebalance_work btree that references extents that
   rebalance will have work to do on in the future (e.g. for the
   background_compression or background_target io options). 

 - inodes_deleted btree: After unclean shutdown we still have to scan
   the entire inodes btree for deleted inodes, I'll be adding another
   bitset btree to address this - and also adding a tmpdir feature as
   well.

Things that will require incompatible changes:

 - New key types that replace existing key types, or in general new data
   structures that replace existing data structures

   Where we can maintain both the old and new data structures this isn't
   a problem - e.g. we can roll out a new bch_sb_members_v2 superblock
   section and just also keep writing out bch_sb_members for old
   versions to use; but we won't be able to roll out e.g. a new extent
   key type without an incompatible change.

 - New btree node header/journal entry headers - we'd like bigger
   nonces, so this will need to happen eventually

 - New extent_entry types: this one is a bit unfortunate, because
   extents contain a list of variable size fields (e.g. ptrs, different
   sized crc entries) and the entries themselves don't specify their
   size - the code that's reading it has to know how big every extent
   entry type is.

   This just came up with rebalance_work - rebalance_work needs a new
   extent entry type, so I rolled that out ahead of time so we can roll
   out the rest of the functionality as a compatible change.

Forced version upgrades:
------------------------

Going forward, we will still be doing forced version upgrades for awhile
- but only to forwards-compatible versions. After the next incompatible
(version 2.0) release, we likely won't be doing forced version upgrades
at all anymore.

Currently, version upgrades generally require a fsck. Another thing this
patchset addresses is enumerating all our recovery (including version
upgrade and fsck passes); this will let us specify "upgrading to this
version only requires this pass to run".

Kent Overstreet (10):
  bcachefs: Allow for unknown btree IDs
  bcachefs: Allow for unknown key types
  bcachefs: Refactor bch_sb_field_ops handling
  bcachefs: Change check for invalid key types
  bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE()
  bcachefs: version_upgrade is now an enum
  bcachefs: Kill bch2_bucket_gens_read()
  bcachefs: Stash journal replay params in bch_fs
  bcachefs: Enumerate recovery passes
  bcachefs: bcachefs_metadata_version_major_minor

 fs/bcachefs/alloc_background.c      | 129 +++++-----
 fs/bcachefs/alloc_background.h      |  18 +-
 fs/bcachefs/alloc_foreground.c      |   9 +-
 fs/bcachefs/backpointers.c          |  23 +-
 fs/bcachefs/backpointers.h          |   2 +-
 fs/bcachefs/bcachefs.h              |  62 ++++-
 fs/bcachefs/bcachefs_format.h       |  63 +++--
 fs/bcachefs/bkey_methods.c          |  81 ++++---
 fs/bcachefs/bkey_methods.h          |  20 +-
 fs/bcachefs/btree_cache.c           |  23 +-
 fs/bcachefs/btree_cache.h           |  22 +-
 fs/bcachefs/btree_gc.c              |  26 +-
 fs/bcachefs/btree_io.c              |   9 +-
 fs/bcachefs/btree_iter.c            |   4 +-
 fs/bcachefs/btree_update_interior.c |  18 +-
 fs/bcachefs/btree_update_leaf.c     |  17 +-
 fs/bcachefs/dirent.c                |   3 +-
 fs/bcachefs/dirent.h                |   4 +-
 fs/bcachefs/ec.c                    |   3 +-
 fs/bcachefs/ec.h                    |   4 +-
 fs/bcachefs/extents.c               |  12 +-
 fs/bcachefs/extents.h               |   9 +-
 fs/bcachefs/fsck.c                  |  77 +-----
 fs/bcachefs/fsck.h                  |  10 +-
 fs/bcachefs/inode.c                 |  12 +-
 fs/bcachefs/inode.h                 |  12 +-
 fs/bcachefs/journal_io.c            |  15 +-
 fs/bcachefs/lru.c                   |   3 +-
 fs/bcachefs/lru.h                   |   3 +-
 fs/bcachefs/move.c                  |  10 +-
 fs/bcachefs/opts.c                  |   5 +
 fs/bcachefs/opts.h                  |   5 +-
 fs/bcachefs/quota.c                 |   3 +-
 fs/bcachefs/quota.h                 |   4 +-
 fs/bcachefs/recovery.c              | 353 ++++++++++++++--------------
 fs/bcachefs/reflink.c               |   9 +-
 fs/bcachefs/reflink.h               |   8 +-
 fs/bcachefs/subvolume.c             |  16 +-
 fs/bcachefs/subvolume.h             |  14 +-
 fs/bcachefs/super-io.c              |  91 +++++--
 fs/bcachefs/super-io.h              |   3 +-
 fs/bcachefs/super.c                 |   1 +
 fs/bcachefs/xattr.c                 |   3 +-
 fs/bcachefs/xattr.h                 |   3 +-
 44 files changed, 700 insertions(+), 521 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 01/10] bcachefs: Allow for unknown btree IDs
  2023-07-09 17:15 [PATCH 00/10] bcachefs - semvar, forward compatibility Kent Overstreet
@ 2023-07-09 17:15 ` Kent Overstreet
  2023-07-09 17:15 ` [PATCH 02/10] bcachefs: Allow for unknown key types Kent Overstreet
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 17:15 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet, bfoster, sandeen

We need to allow filesystems with metadata from newer versions to be
mountable and usable by older versions.

This patch enables us to roll out new btrees without a new major version
number; we can now handle btree roots for unknown btree types.

The unknown btree roots will be retained, and fsck (including
backpointers) will check them, the same as other btree types.

We add a dynamic array for the extra, unknown btree roots, in addition
to the fixed size btree root array, and add new helpers for looking up
btree roots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/backpointers.c          | 14 ++++++++------
 fs/bcachefs/bcachefs.h              |  3 ++-
 fs/bcachefs/btree_cache.c           | 21 +++++++++++++--------
 fs/bcachefs/btree_cache.h           | 22 +++++++++++++++++++++-
 fs/bcachefs/btree_gc.c              | 22 ++++++++++++++++++----
 fs/bcachefs/btree_io.c              |  2 +-
 fs/bcachefs/btree_iter.c            |  4 ++--
 fs/bcachefs/btree_update_interior.c | 18 +++++++++---------
 fs/bcachefs/move.c                  | 10 ++++++++--
 fs/bcachefs/recovery.c              | 16 ++++++++--------
 fs/bcachefs/super.c                 |  1 +
 11 files changed, 91 insertions(+), 42 deletions(-)

diff --git a/fs/bcachefs/backpointers.c b/fs/bcachefs/backpointers.c
index 8d0f831550..760c4cc16a 100644
--- a/fs/bcachefs/backpointers.c
+++ b/fs/bcachefs/backpointers.c
@@ -272,6 +272,7 @@ struct bkey_s_c bch2_backpointer_get_key(struct btree_trans *trans,
 					 unsigned iter_flags)
 {
 	struct bch_fs *c = trans->c;
+	struct btree_root *r = bch2_btree_id_root(c, bp.btree_id);
 	struct bpos bucket = bp_pos_to_bucket(c, bp_pos);
 	struct bkey_s_c k;
 
@@ -279,7 +280,7 @@ struct bkey_s_c bch2_backpointer_get_key(struct btree_trans *trans,
 				  bp.btree_id,
 				  bp.pos,
 				  0,
-				  min(bp.level, c->btree_roots[bp.btree_id].level),
+				  min(bp.level, r->level),
 				  iter_flags);
 	k = bch2_btree_iter_peek_slot(iter);
 	if (bkey_err(k)) {
@@ -287,8 +288,8 @@ struct bkey_s_c bch2_backpointer_get_key(struct btree_trans *trans,
 		return k;
 	}
 
-	if (bp.level == c->btree_roots[bp.btree_id].level + 1)
-		k = bkey_i_to_s_c(&c->btree_roots[bp.btree_id].key);
+	if (bp.level == r->level + 1)
+		k = bkey_i_to_s_c(&r->key);
 
 	if (k.k && extent_matches_bp(c, bp.btree_id, bp.level, k, bucket, bp))
 		return k;
@@ -531,6 +532,7 @@ static int check_btree_root_to_backpointers(struct btree_trans *trans,
 					    struct bpos_level *last_flushed)
 {
 	struct bch_fs *c = trans->c;
+	struct btree_root *r = bch2_btree_id_root(c, btree_id);
 	struct btree_iter iter;
 	struct btree *b;
 	struct bkey_s_c k;
@@ -539,8 +541,7 @@ static int check_btree_root_to_backpointers(struct btree_trans *trans,
 	const union bch_extent_entry *entry;
 	int ret;
 
-	bch2_trans_node_iter_init(trans, &iter, btree_id, POS_MIN, 0,
-				  c->btree_roots[btree_id].level, 0);
+	bch2_trans_node_iter_init(trans, &iter, btree_id, POS_MIN, 0, r->level, 0);
 	b = bch2_btree_iter_peek_node(&iter);
 	ret = PTR_ERR_OR_ZERO(b);
 	if (ret)
@@ -640,12 +641,13 @@ static int bch2_check_extents_to_backpointers_pass(struct btree_trans *trans,
 						   struct bpos bucket_start,
 						   struct bpos bucket_end)
 {
+	struct bch_fs *c = trans->c;
 	struct btree_iter iter;
 	enum btree_id btree_id;
 	struct bpos_level last_flushed = { UINT_MAX };
 	int ret = 0;
 
-	for (btree_id = 0; btree_id < BTREE_ID_NR; btree_id++) {
+	for (btree_id = 0; btree_id < btree_id_nr_alive(c); btree_id++) {
 		unsigned depth = btree_type_has_ptrs(btree_id) ? 0 : 1;
 
 		bch2_trans_node_iter_init(trans, &iter, btree_id, POS_MIN, 0,
diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index b8d50fe64b..a8488d4e18 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -749,7 +749,8 @@ struct bch_fs {
 	struct bio_set		btree_bio;
 	struct workqueue_struct	*io_complete_wq;
 
-	struct btree_root	btree_roots[BTREE_ID_NR];
+	struct btree_root	btree_roots_known[BTREE_ID_NR];
+	DARRAY(struct btree_root) btree_roots_extra;
 	struct mutex		btree_root_lock;
 
 	struct btree_cache	btree_cache;
diff --git a/fs/bcachefs/btree_cache.c b/fs/bcachefs/btree_cache.c
index be4d22d7fc..721e2b43e2 100644
--- a/fs/bcachefs/btree_cache.c
+++ b/fs/bcachefs/btree_cache.c
@@ -32,13 +32,15 @@ void bch2_recalc_btree_reserve(struct bch_fs *c)
 {
 	unsigned i, reserve = 16;
 
-	if (!c->btree_roots[0].b)
+	if (!c->btree_roots_known[0].b)
 		reserve += 8;
 
-	for (i = 0; i < BTREE_ID_NR; i++)
-		if (c->btree_roots[i].b)
-			reserve += min_t(unsigned, 1,
-					 c->btree_roots[i].b->c.level) * 8;
+	for (i = 0; i < btree_id_nr_alive(c); i++) {
+		struct btree_root *r = bch2_btree_id_root(c, i);
+
+		if (r->b)
+			reserve += min_t(unsigned, 1, r->b->c.level) * 8;
+	}
 
 	c->btree_cache.reserve = reserve;
 }
@@ -457,9 +459,12 @@ void bch2_fs_btree_cache_exit(struct bch_fs *c)
 
 	kvpfree(c->verify_ondisk, btree_bytes(c));
 
-	for (i = 0; i < BTREE_ID_NR; i++)
-		if (c->btree_roots[i].b)
-			list_add(&c->btree_roots[i].b->list, &bc->live);
+	for (i = 0; i < btree_id_nr_alive(c); i++) {
+		struct btree_root *r = bch2_btree_id_root(c, i);
+
+		if (r->b)
+			list_add(&r->b->list, &bc->live);
+	}
 
 	list_splice(&bc->freeable, &bc->live);
 
diff --git a/fs/bcachefs/btree_cache.h b/fs/bcachefs/btree_cache.h
index 608accd1c4..00c9b92183 100644
--- a/fs/bcachefs/btree_cache.h
+++ b/fs/bcachefs/btree_cache.h
@@ -101,7 +101,27 @@ static inline unsigned btree_blocks(struct bch_fs *c)
 	(BTREE_FOREGROUND_MERGE_THRESHOLD(c) +			\
 	 (BTREE_FOREGROUND_MERGE_THRESHOLD(c) >> 2))
 
-#define btree_node_root(_c, _b)	((_c)->btree_roots[(_b)->c.btree_id].b)
+static inline unsigned btree_id_nr_alive(struct bch_fs *c)
+{
+	return BTREE_ID_NR + c->btree_roots_extra.nr;
+}
+
+static inline struct btree_root *bch2_btree_id_root(struct bch_fs *c, unsigned id)
+{
+	if (likely(id < BTREE_ID_NR)) {
+		return &c->btree_roots_known[id];
+	} else {
+		unsigned idx = id - BTREE_ID_NR;
+
+		EBUG_ON(idx >= c->btree_roots_extra.nr);
+		return &c->btree_roots_extra.data[idx];
+	}
+}
+
+static inline struct btree *btree_node_root(struct bch_fs *c, struct btree *b)
+{
+	return bch2_btree_id_root(c, b->c.btree_id)->b;
+}
 
 void bch2_btree_node_to_text(struct printbuf *, struct bch_fs *,
 			     const struct btree *);
diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c
index e64bfbb860..1fc386761d 100644
--- a/fs/bcachefs/btree_gc.c
+++ b/fs/bcachefs/btree_gc.c
@@ -529,8 +529,13 @@ static int bch2_repair_topology(struct bch_fs *c)
 
 	bch2_trans_init(&trans, c, 0, 0);
 
-	for (i = 0; i < BTREE_ID_NR && !ret; i++) {
-		b = c->btree_roots[i].b;
+	for (i = 0; i < btree_id_nr_alive(c)&& !ret; i++) {
+		struct btree_root *r = bch2_btree_id_root(c, i);
+
+		if (!r->alive)
+			continue;
+
+		b = r->b;
 		if (btree_node_fake(b))
 			continue;
 
@@ -883,7 +888,7 @@ static int bch2_gc_btree(struct btree_trans *trans, enum btree_id btree_id,
 		return ret;
 
 	mutex_lock(&c->btree_root_lock);
-	b = c->btree_roots[btree_id].b;
+	b = bch2_btree_id_root(c, btree_id)->b;
 	if (!btree_node_fake(b)) {
 		struct bkey_s_c k = bkey_i_to_s_c(&b->key);
 
@@ -1006,7 +1011,7 @@ static int bch2_gc_btree_init(struct btree_trans *trans,
 	struct printbuf buf = PRINTBUF;
 	int ret = 0;
 
-	b = c->btree_roots[btree_id].b;
+	b = bch2_btree_id_root(c, btree_id)->b;
 
 	if (btree_node_fake(b))
 		return 0;
@@ -1075,6 +1080,15 @@ static int bch2_gc_btrees(struct bch_fs *c, bool initial, bool metadata_only)
 			? bch2_gc_btree_init(&trans, ids[i], metadata_only)
 			: bch2_gc_btree(&trans, ids[i], initial, metadata_only);
 
+	for (i = BTREE_ID_NR; i < btree_id_nr_alive(c) && !ret; i++) {
+		if (!bch2_btree_id_root(c, i)->alive)
+			continue;
+
+		ret = initial
+			? bch2_gc_btree_init(&trans, i, metadata_only)
+			: bch2_gc_btree(&trans, i, initial, metadata_only);
+	}
+
 	if (ret < 0)
 		bch_err_fn(c, ret);
 
diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index cb2959bee8..a8197c5008 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -517,7 +517,7 @@ static void btree_pos_to_text(struct printbuf *out, struct bch_fs *c,
 	prt_printf(out, "%s level %u/%u\n  ",
 	       bch2_btree_ids[b->c.btree_id],
 	       b->c.level,
-	       c->btree_roots[b->c.btree_id].level);
+	       bch2_btree_id_root(c, b->c.btree_id)->level);
 	bch2_bkey_val_to_text(out, c, bkey_i_to_s_c(&b->key));
 }
 
diff --git a/fs/bcachefs/btree_iter.c b/fs/bcachefs/btree_iter.c
index bbf8f026ea..e292c5a2a8 100644
--- a/fs/bcachefs/btree_iter.c
+++ b/fs/bcachefs/btree_iter.c
@@ -238,7 +238,7 @@ static void bch2_btree_path_verify(struct btree_trans *trans,
 	for (i = 0; i < (!path->cached ? BTREE_MAX_DEPTH : 1); i++) {
 		if (!path->l[i].b) {
 			BUG_ON(!path->cached &&
-			       c->btree_roots[path->btree_id].b->c.level > i);
+			       bch2_btree_id_root(c, path->btree_id)->b->c.level > i);
 			break;
 		}
 
@@ -732,7 +732,7 @@ static inline int btree_path_lock_root(struct btree_trans *trans,
 				       unsigned long trace_ip)
 {
 	struct bch_fs *c = trans->c;
-	struct btree *b, **rootp = &c->btree_roots[path->btree_id].b;
+	struct btree *b, **rootp = &bch2_btree_id_root(c, path->btree_id)->b;
 	enum six_lock_type lock_type;
 	unsigned i;
 	int ret;
diff --git a/fs/bcachefs/btree_update_interior.c b/fs/bcachefs/btree_update_interior.c
index 22cec9a29a..5592feff79 100644
--- a/fs/bcachefs/btree_update_interior.c
+++ b/fs/bcachefs/btree_update_interior.c
@@ -1199,7 +1199,7 @@ static void bch2_btree_set_root_inmem(struct bch_fs *c, struct btree *b)
 	       (b->c.level < btree_node_root(c, b)->c.level ||
 		!btree_node_dying(btree_node_root(c, b))));
 
-	btree_node_root(c, b) = b;
+	bch2_btree_id_root(c, b->c.btree_id)->b = b;
 	mutex_unlock(&c->btree_root_lock);
 
 	bch2_recalc_btree_reserve(c);
@@ -2402,7 +2402,7 @@ bool bch2_btree_interior_updates_flush(struct bch_fs *c)
 
 void bch2_journal_entry_to_btree_root(struct bch_fs *c, struct jset_entry *entry)
 {
-	struct btree_root *r = &c->btree_roots[entry->btree_id];
+	struct btree_root *r = bch2_btree_id_root(c, entry->btree_id);
 
 	mutex_lock(&c->btree_root_lock);
 
@@ -2428,15 +2428,15 @@ bch2_btree_roots_to_journal_entries(struct bch_fs *c,
 
 	mutex_lock(&c->btree_root_lock);
 
-	for (i = 0; i < BTREE_ID_NR; i++)
-		if (c->btree_roots[i].alive && !test_bit(i, &have)) {
-			journal_entry_set(end,
-					  BCH_JSET_ENTRY_btree_root,
-					  i, c->btree_roots[i].level,
-					  &c->btree_roots[i].key,
-					  c->btree_roots[i].key.k.u64s);
+	for (i = 0; i < btree_id_nr_alive(c); i++) {
+		struct btree_root *r = bch2_btree_id_root(c, i);
+
+		if (r->alive && !test_bit(i, &have)) {
+			journal_entry_set(end, BCH_JSET_ENTRY_btree_root,
+					  i, r->level, &r->key, r->key.k.u64s);
 			end = vstruct_next(end);
 		}
+	}
 
 	mutex_unlock(&c->btree_root_lock);
 
diff --git a/fs/bcachefs/move.c b/fs/bcachefs/move.c
index 37fb3784a2..0527267390 100644
--- a/fs/bcachefs/move.c
+++ b/fs/bcachefs/move.c
@@ -632,7 +632,7 @@ int bch2_move_data(struct bch_fs *c,
 	bch2_moving_ctxt_init(&ctxt, c, rate, stats, wp, wait_on_copygc);
 
 	for (id = start_btree_id;
-	     id <= min_t(unsigned, end_btree_id, BTREE_ID_NR - 1);
+	     id <= min_t(unsigned, end_btree_id, btree_id_nr_alive(c) - 1);
 	     id++) {
 		stats->btree_id = id;
 
@@ -640,6 +640,9 @@ int bch2_move_data(struct bch_fs *c,
 		    id != BTREE_ID_reflink)
 			continue;
 
+		if (!bch2_btree_id_root(c, id)->b)
+			continue;
+
 		ret = __bch2_move_data(&ctxt,
 				       id == start_btree_id ? start_pos : POS_MIN,
 				       id == end_btree_id   ? end_pos   : POS_MAX,
@@ -861,10 +864,13 @@ static int bch2_move_btree(struct bch_fs *c,
 	stats->data_type = BCH_DATA_btree;
 
 	for (id = start_btree_id;
-	     id <= min_t(unsigned, end_btree_id, BTREE_ID_NR - 1);
+	     id <= min_t(unsigned, end_btree_id, btree_id_nr_alive(c) - 1);
 	     id++) {
 		stats->btree_id = id;
 
+		if (!bch2_btree_id_root(c, id)->b)
+			continue;
+
 		bch2_trans_node_iter_init(&trans, &iter, id, POS_MIN, 0, 0,
 					  BTREE_ITER_PREFETCH);
 retry:
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index cf8f5c0e1b..9ea85b097e 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -702,13 +702,13 @@ static int journal_replay_entry_early(struct bch_fs *c,
 	case BCH_JSET_ENTRY_btree_root: {
 		struct btree_root *r;
 
-		if (entry->btree_id >= BTREE_ID_NR) {
-			bch_err(c, "filesystem has unknown btree type %u",
-				entry->btree_id);
-			return -EINVAL;
+		while (entry->btree_id >= c->btree_roots_extra.nr + BTREE_ID_NR) {
+			ret = darray_push(&c->btree_roots_extra, (struct btree_root) { NULL });
+			if (ret)
+				return ret;
 		}
 
-		r = &c->btree_roots[entry->btree_id];
+		r = bch2_btree_id_root(c, entry->btree_id);
 
 		if (entry->u64s) {
 			r->level = entry->level;
@@ -980,8 +980,8 @@ static int read_btree_roots(struct bch_fs *c)
 	unsigned i;
 	int ret = 0;
 
-	for (i = 0; i < BTREE_ID_NR; i++) {
-		struct btree_root *r = &c->btree_roots[i];
+	for (i = 0; i < btree_id_nr_alive(c); i++) {
+		struct btree_root *r = bch2_btree_id_root(c, i);
 
 		if (!r->alive)
 			continue;
@@ -1014,7 +1014,7 @@ static int read_btree_roots(struct bch_fs *c)
 	}
 
 	for (i = 0; i < BTREE_ID_NR; i++) {
-		struct btree_root *r = &c->btree_roots[i];
+		struct btree_root *r = bch2_btree_id_root(c, i);
 
 		if (!r->b) {
 			r->alive = false;
diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
index 4d7380099e..426d2acfca 100644
--- a/fs/bcachefs/super.c
+++ b/fs/bcachefs/super.c
@@ -486,6 +486,7 @@ static void __bch2_fs_free(struct bch_fs *c)
 		for_each_possible_cpu(cpu)
 			kfree(per_cpu_ptr(c->btree_paths_bufs, cpu)->path);
 
+	darray_exit(&c->btree_roots_extra);
 	free_percpu(c->btree_paths_bufs);
 	free_percpu(c->pcpu);
 	mempool_exit(&c->large_bkey_pool);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 02/10] bcachefs: Allow for unknown key types
  2023-07-09 17:15 [PATCH 00/10] bcachefs - semvar, forward compatibility Kent Overstreet
  2023-07-09 17:15 ` [PATCH 01/10] bcachefs: Allow for unknown btree IDs Kent Overstreet
@ 2023-07-09 17:15 ` Kent Overstreet
  2023-07-09 17:15 ` [PATCH 03/10] bcachefs: Refactor bch_sb_field_ops handling Kent Overstreet
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 17:15 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet, bfoster, sandeen

This adds a new helper for lookups bkey_ops for a given key type, which
returns a null bkey_ops for unknown key types; various bkey_ops users
are tweaked as well to handle unknown key types.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/bkey_methods.c      | 38 +++++++++++++++------------------
 fs/bcachefs/bkey_methods.h      | 12 +++++++++--
 fs/bcachefs/btree_update_leaf.c | 10 +++++----
 3 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/fs/bcachefs/bkey_methods.c b/fs/bcachefs/bkey_methods.c
index ee7275c9d1..985ea2daa8 100644
--- a/fs/bcachefs/bkey_methods.c
+++ b/fs/bcachefs/bkey_methods.c
@@ -118,17 +118,14 @@ const struct bkey_ops bch2_bkey_ops[] = {
 #undef x
 };
 
+const struct bkey_ops bch2_bkey_null_ops = {
+	.min_val_size = U8_MAX,
+};
+
 int bch2_bkey_val_invalid(struct bch_fs *c, struct bkey_s_c k,
 			  unsigned flags, struct printbuf *err)
 {
-	const struct bkey_ops *ops;
-
-	if (k.k->type >= KEY_TYPE_MAX) {
-		prt_printf(err, "invalid type (%u >= %u)", k.k->type, KEY_TYPE_MAX);
-		return -BCH_ERR_invalid_bkey;
-	}
-
-	ops = &bch2_bkey_ops[k.k->type];
+	const struct bkey_ops *ops = bch2_bkey_type_ops(k.k->type);
 
 	if (bkey_val_bytes(k.k) < ops->min_val_size) {
 		prt_printf(err, "bad val size (%zu < %u)",
@@ -136,6 +133,9 @@ int bch2_bkey_val_invalid(struct bch_fs *c, struct bkey_s_c k,
 		return -BCH_ERR_invalid_bkey;
 	}
 
+	if (!ops->key_invalid)
+		return 0;
+
 	return ops->key_invalid(c, k, flags, err);
 }
 
@@ -340,14 +340,10 @@ void bch2_bkey_to_text(struct printbuf *out, const struct bkey *k)
 void bch2_val_to_text(struct printbuf *out, struct bch_fs *c,
 		      struct bkey_s_c k)
 {
-	if (k.k->type < KEY_TYPE_MAX) {
-		const struct bkey_ops *ops = &bch2_bkey_ops[k.k->type];
+	const struct bkey_ops *ops = bch2_bkey_type_ops(k.k->type);
 
-		if (likely(ops->val_to_text))
-			ops->val_to_text(out, c, k);
-	} else {
-		prt_printf(out, "(invalid type %u)", k.k->type);
-	}
+	if (likely(ops->val_to_text))
+		ops->val_to_text(out, c, k);
 }
 
 void bch2_bkey_val_to_text(struct printbuf *out, struct bch_fs *c,
@@ -363,7 +359,7 @@ void bch2_bkey_val_to_text(struct printbuf *out, struct bch_fs *c,
 
 void bch2_bkey_swab_val(struct bkey_s k)
 {
-	const struct bkey_ops *ops = &bch2_bkey_ops[k.k->type];
+	const struct bkey_ops *ops = bch2_bkey_type_ops(k.k->type);
 
 	if (ops->swab)
 		ops->swab(k);
@@ -371,7 +367,7 @@ void bch2_bkey_swab_val(struct bkey_s k)
 
 bool bch2_bkey_normalize(struct bch_fs *c, struct bkey_s k)
 {
-	const struct bkey_ops *ops = &bch2_bkey_ops[k.k->type];
+	const struct bkey_ops *ops = bch2_bkey_type_ops(k.k->type);
 
 	return ops->key_normalize
 		? ops->key_normalize(c, k)
@@ -380,11 +376,11 @@ bool bch2_bkey_normalize(struct bch_fs *c, struct bkey_s k)
 
 bool bch2_bkey_merge(struct bch_fs *c, struct bkey_s l, struct bkey_s_c r)
 {
-	const struct bkey_ops *ops = &bch2_bkey_ops[l.k->type];
+	const struct bkey_ops *ops = bch2_bkey_type_ops(l.k->type);
 
-	return bch2_bkey_maybe_mergable(l.k, r.k) &&
+	return ops->key_merge &&
+		bch2_bkey_maybe_mergable(l.k, r.k) &&
 		(u64) l.k->size + r.k->size <= KEY_SIZE_MAX &&
-		bch2_bkey_ops[l.k->type].key_merge &&
 		!bch2_key_merging_disabled &&
 		ops->key_merge(c, l, r);
 }
@@ -509,7 +505,7 @@ void __bch2_bkey_compat(unsigned level, enum btree_id btree_id,
 		if (big_endian != CPU_BIG_ENDIAN)
 			bch2_bkey_swab_val(u);
 
-		ops = &bch2_bkey_ops[k->type];
+		ops = bch2_bkey_type_ops(k->type);
 
 		if (ops->compat)
 			ops->compat(btree_id, version, big_endian, write, u);
diff --git a/fs/bcachefs/bkey_methods.h b/fs/bcachefs/bkey_methods.h
index a65756e306..32b86c74cc 100644
--- a/fs/bcachefs/bkey_methods.h
+++ b/fs/bcachefs/bkey_methods.h
@@ -11,6 +11,7 @@ struct bkey;
 enum btree_node_type;
 
 extern const char * const bch2_bkey_types[];
+extern const struct bkey_ops bch2_bkey_null_ops;
 
 /*
  * key_invalid: checks validity of @k, returns 0 if good or -EINVAL if bad. If
@@ -41,6 +42,13 @@ struct bkey_ops {
 
 extern const struct bkey_ops bch2_bkey_ops[];
 
+static inline const struct bkey_ops *bch2_bkey_type_ops(enum bch_bkey_type type)
+{
+	return likely(type < KEY_TYPE_MAX)
+		? &bch2_bkey_ops[type]
+		: &bch2_bkey_null_ops;
+}
+
 #define BKEY_INVALID_FROM_JOURNAL		(1 << 1)
 
 int bch2_bkey_val_invalid(struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
@@ -75,7 +83,7 @@ static inline int bch2_mark_key(struct btree_trans *trans,
 		struct bkey_s_c old, struct bkey_s_c new,
 		unsigned flags)
 {
-	const struct bkey_ops *ops = &bch2_bkey_ops[old.k->type ?: new.k->type];
+	const struct bkey_ops *ops = bch2_bkey_type_ops(old.k->type ?: new.k->type);
 
 	return ops->atomic_trigger
 		? ops->atomic_trigger(trans, btree, level, old, new, flags)
@@ -125,7 +133,7 @@ static inline int bch2_trans_mark_key(struct btree_trans *trans,
 				      struct bkey_s_c old, struct bkey_i *new,
 				      unsigned flags)
 {
-	const struct bkey_ops *ops = &bch2_bkey_ops[old.k->type ?: new->k.type];
+	const struct bkey_ops *ops = bch2_bkey_type_ops(old.k->type ?: new->k.type);
 
 	return ops->trans_trigger
 		? ops->trans_trigger(trans, btree_id, level, old, new, flags)
diff --git a/fs/bcachefs/btree_update_leaf.c b/fs/bcachefs/btree_update_leaf.c
index dc105a7c2e..1474dca26d 100644
--- a/fs/bcachefs/btree_update_leaf.c
+++ b/fs/bcachefs/btree_update_leaf.c
@@ -407,6 +407,8 @@ static int run_one_mem_trigger(struct btree_trans *trans,
 {
 	struct bkey_s_c old = { &i->old_k, i->old_v };
 	struct bkey_i *new = i->k;
+	const struct bkey_ops *old_ops = bch2_bkey_type_ops(old.k->type);
+	const struct bkey_ops *new_ops = bch2_bkey_type_ops(i->k->k.type);
 	int ret;
 
 	verify_update_old_key(trans, i);
@@ -417,8 +419,7 @@ static int run_one_mem_trigger(struct btree_trans *trans,
 	if (!btree_node_type_needs_gc(i->btree_id))
 		return 0;
 
-	if (bch2_bkey_ops[old.k->type].atomic_trigger ==
-	    bch2_bkey_ops[i->k->k.type].atomic_trigger &&
+	if (old_ops->atomic_trigger == new_ops->atomic_trigger &&
 	    ((1U << old.k->type) & BTREE_TRIGGER_WANTS_OLD_AND_NEW)) {
 		ret   = bch2_mark_key(trans, i->btree_id, i->level,
 				old, bkey_i_to_s_c(new),
@@ -450,6 +451,8 @@ static int run_one_trans_trigger(struct btree_trans *trans, struct btree_insert_
 	 */
 	struct bkey old_k = i->old_k;
 	struct bkey_s_c old = { &old_k, i->old_v };
+	const struct bkey_ops *old_ops = bch2_bkey_type_ops(old.k->type);
+	const struct bkey_ops *new_ops = bch2_bkey_type_ops(i->k->k.type);
 
 	verify_update_old_key(trans, i);
 
@@ -459,8 +462,7 @@ static int run_one_trans_trigger(struct btree_trans *trans, struct btree_insert_
 
 	if (!i->insert_trigger_run &&
 	    !i->overwrite_trigger_run &&
-	    bch2_bkey_ops[old.k->type].trans_trigger ==
-	    bch2_bkey_ops[i->k->k.type].trans_trigger &&
+	    old_ops->trans_trigger == new_ops->trans_trigger &&
 	    ((1U << old.k->type) & BTREE_TRIGGER_WANTS_OLD_AND_NEW)) {
 		i->overwrite_trigger_run = true;
 		i->insert_trigger_run = true;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 03/10] bcachefs: Refactor bch_sb_field_ops handling
  2023-07-09 17:15 [PATCH 00/10] bcachefs - semvar, forward compatibility Kent Overstreet
  2023-07-09 17:15 ` [PATCH 01/10] bcachefs: Allow for unknown btree IDs Kent Overstreet
  2023-07-09 17:15 ` [PATCH 02/10] bcachefs: Allow for unknown key types Kent Overstreet
@ 2023-07-09 17:15 ` Kent Overstreet
  2023-07-09 17:15 ` [PATCH 04/10] bcachefs: Change check for invalid key types Kent Overstreet
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 17:15 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet, bfoster, sandeen

This changes bch_sb_field_ops lookup to match how bkey_ops now works;
for an unknown field type we return an empty ops struct.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/super-io.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/fs/bcachefs/super-io.c b/fs/bcachefs/super-io.c
index eaf90eb137..833e78d48c 100644
--- a/fs/bcachefs/super-io.c
+++ b/fs/bcachefs/super-io.c
@@ -1405,21 +1405,29 @@ static const struct bch_sb_field_ops *bch2_sb_field_ops[] = {
 #undef x
 };
 
+static const struct bch_sb_field_ops bch2_sb_field_null_ops = {
+	NULL
+};
+
+static const struct bch_sb_field_ops *bch2_sb_field_type_ops(unsigned type)
+{
+	return likely(type < ARRAY_SIZE(bch2_sb_field_ops))
+		? bch2_sb_field_ops[type]
+		: &bch2_sb_field_null_ops;
+}
+
 static int bch2_sb_field_validate(struct bch_sb *sb, struct bch_sb_field *f,
 				  struct printbuf *err)
 {
 	unsigned type = le32_to_cpu(f->type);
 	struct printbuf field_err = PRINTBUF;
+	const struct bch_sb_field_ops *ops = bch2_sb_field_type_ops(type);
 	int ret;
 
-	if (type >= BCH_SB_FIELD_NR)
-		return 0;
-
-	ret = bch2_sb_field_ops[type]->validate(sb, f, &field_err);
+	ret = ops->validate ? ops->validate(sb, f, &field_err) : 0;
 	if (ret) {
 		prt_printf(err, "Invalid superblock section %s: %s",
-		       bch2_sb_fields[type],
-		       field_err.buf);
+			   bch2_sb_fields[type], field_err.buf);
 		prt_newline(err);
 		bch2_sb_field_to_text(err, sb, f);
 	}
@@ -1432,13 +1440,12 @@ void bch2_sb_field_to_text(struct printbuf *out, struct bch_sb *sb,
 			   struct bch_sb_field *f)
 {
 	unsigned type = le32_to_cpu(f->type);
-	const struct bch_sb_field_ops *ops = type < BCH_SB_FIELD_NR
-		? bch2_sb_field_ops[type] : NULL;
+	const struct bch_sb_field_ops *ops = bch2_sb_field_type_ops(type);
 
 	if (!out->nr_tabstops)
 		printbuf_tabstop_push(out, 32);
 
-	if (ops)
+	if (type < BCH_SB_FIELD_NR)
 		prt_printf(out, "%s", bch2_sb_fields[type]);
 	else
 		prt_printf(out, "(unknown field %u)", type);
@@ -1446,9 +1453,9 @@ void bch2_sb_field_to_text(struct printbuf *out, struct bch_sb *sb,
 	prt_printf(out, " (size %zu):", vstruct_bytes(f));
 	prt_newline(out);
 
-	if (ops && ops->to_text) {
+	if (ops->to_text) {
 		printbuf_indent_add(out, 2);
-		bch2_sb_field_ops[type]->to_text(out, sb, f);
+		ops->to_text(out, sb, f);
 		printbuf_indent_sub(out, 2);
 	}
 }
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 04/10] bcachefs: Change check for invalid key types
  2023-07-09 17:15 [PATCH 00/10] bcachefs - semvar, forward compatibility Kent Overstreet
                   ` (2 preceding siblings ...)
  2023-07-09 17:15 ` [PATCH 03/10] bcachefs: Refactor bch_sb_field_ops handling Kent Overstreet
@ 2023-07-09 17:15 ` Kent Overstreet
  2023-07-09 17:15 ` [PATCH 05/10] bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE() Kent Overstreet
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 17:15 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet, bfoster, sandeen

As part of the forward compatibility patch series, we need to allow for
new key types without complaining loudly when running an old version.

This patch changes the flags parameter of bkey_invalid to an enum, and
adds a new flag to indicate we're being called from the transaction
commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/alloc_background.c  | 14 +++++++----
 fs/bcachefs/alloc_background.h  | 17 +++++++++----
 fs/bcachefs/backpointers.c      |  3 ++-
 fs/bcachefs/backpointers.h      |  2 +-
 fs/bcachefs/bkey_methods.c      | 43 ++++++++++++++++++---------------
 fs/bcachefs/bkey_methods.h      | 10 +++++---
 fs/bcachefs/btree_update_leaf.c |  7 ++++--
 fs/bcachefs/dirent.c            |  3 ++-
 fs/bcachefs/dirent.h            |  4 ++-
 fs/bcachefs/ec.c                |  3 ++-
 fs/bcachefs/ec.h                |  4 ++-
 fs/bcachefs/extents.c           | 12 ++++++---
 fs/bcachefs/extents.h           |  9 ++++---
 fs/bcachefs/inode.c             | 12 ++++++---
 fs/bcachefs/inode.h             | 12 ++++++---
 fs/bcachefs/journal_io.c        |  3 ++-
 fs/bcachefs/lru.c               |  3 ++-
 fs/bcachefs/lru.h               |  3 ++-
 fs/bcachefs/quota.c             |  3 ++-
 fs/bcachefs/quota.h             |  4 ++-
 fs/bcachefs/reflink.c           |  9 ++++---
 fs/bcachefs/reflink.h           |  8 +++---
 fs/bcachefs/subvolume.c         |  6 +++--
 fs/bcachefs/subvolume.h         |  6 +++--
 fs/bcachefs/xattr.c             |  3 ++-
 fs/bcachefs/xattr.h             |  3 ++-
 26 files changed, 133 insertions(+), 73 deletions(-)

diff --git a/fs/bcachefs/alloc_background.c b/fs/bcachefs/alloc_background.c
index f7ecea8325..c59629bbbc 100644
--- a/fs/bcachefs/alloc_background.c
+++ b/fs/bcachefs/alloc_background.c
@@ -223,7 +223,8 @@ static unsigned bch_alloc_v1_val_u64s(const struct bch_alloc *a)
 }
 
 int bch2_alloc_v1_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			  unsigned flags, struct printbuf *err)
+			  enum bkey_invalid_flags flags,
+			  struct printbuf *err)
 {
 	struct bkey_s_c_alloc a = bkey_s_c_to_alloc(k);
 
@@ -238,7 +239,8 @@ int bch2_alloc_v1_invalid(const struct bch_fs *c, struct bkey_s_c k,
 }
 
 int bch2_alloc_v2_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			  unsigned flags, struct printbuf *err)
+			  enum bkey_invalid_flags flags,
+			  struct printbuf *err)
 {
 	struct bkey_alloc_unpacked u;
 
@@ -251,7 +253,8 @@ int bch2_alloc_v2_invalid(const struct bch_fs *c, struct bkey_s_c k,
 }
 
 int bch2_alloc_v3_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			  unsigned flags, struct printbuf *err)
+			  enum bkey_invalid_flags flags,
+			  struct printbuf *err)
 {
 	struct bkey_alloc_unpacked u;
 
@@ -282,7 +285,7 @@ int bch2_alloc_v4_invalid(const struct bch_fs *c, struct bkey_s_c k,
 	}
 
 	if (rw == WRITE &&
-	    !(flags & BKEY_INVALID_FROM_JOURNAL) &&
+	    !(flags & BKEY_INVALID_JOURNAL) &&
 	    test_bit(BCH_FS_CHECK_BACKPOINTERS_DONE, &c->flags)) {
 		unsigned i, bp_len = 0;
 
@@ -605,7 +608,8 @@ static unsigned alloc_gen(struct bkey_s_c k, unsigned offset)
 }
 
 int bch2_bucket_gens_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			     unsigned flags, struct printbuf *err)
+			     enum bkey_invalid_flags flags,
+			     struct printbuf *err)
 {
 	if (bkey_val_bytes(k.k) != sizeof(struct bch_bucket_gens)) {
 		prt_printf(err, "bad val size (%lu != %zu)",
diff --git a/fs/bcachefs/alloc_background.h b/fs/bcachefs/alloc_background.h
index 3c4d6d40b1..d1bf45a4b4 100644
--- a/fs/bcachefs/alloc_background.h
+++ b/fs/bcachefs/alloc_background.h
@@ -8,6 +8,8 @@
 #include "debug.h"
 #include "super.h"
 
+enum bkey_invalid_flags;
+
 /* How out of date a pointer gen is allowed to be: */
 #define BUCKET_GC_GEN_MAX	96U
 
@@ -147,10 +149,14 @@ struct bkey_i_alloc_v4 *bch2_alloc_to_v4_mut(struct btree_trans *, struct bkey_s
 
 int bch2_bucket_io_time_reset(struct btree_trans *, unsigned, size_t, int);
 
-int bch2_alloc_v1_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
-int bch2_alloc_v2_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
-int bch2_alloc_v3_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
-int bch2_alloc_v4_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
+int bch2_alloc_v1_invalid(const struct bch_fs *, struct bkey_s_c,
+			  enum bkey_invalid_flags, struct printbuf *);
+int bch2_alloc_v2_invalid(const struct bch_fs *, struct bkey_s_c,
+			  enum bkey_invalid_flags, struct printbuf *);
+int bch2_alloc_v3_invalid(const struct bch_fs *, struct bkey_s_c,
+			  enum bkey_invalid_flags, struct printbuf *);
+int bch2_alloc_v4_invalid(const struct bch_fs *, struct bkey_s_c,
+			  enum bkey_invalid_flags, struct printbuf *);
 void bch2_alloc_v4_swab(struct bkey_s);
 void bch2_alloc_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 
@@ -187,7 +193,8 @@ void bch2_alloc_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 	.min_val_size	= 48,				\
 })
 
-int bch2_bucket_gens_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
+int bch2_bucket_gens_invalid(const struct bch_fs *, struct bkey_s_c,
+			     enum bkey_invalid_flags, struct printbuf *);
 void bch2_bucket_gens_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 
 #define bch2_bkey_ops_bucket_gens ((struct bkey_ops) {	\
diff --git a/fs/bcachefs/backpointers.c b/fs/bcachefs/backpointers.c
index 760c4cc16a..571a7d19be 100644
--- a/fs/bcachefs/backpointers.c
+++ b/fs/bcachefs/backpointers.c
@@ -38,7 +38,8 @@ static bool extent_matches_bp(struct bch_fs *c,
 }
 
 int bch2_backpointer_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			     unsigned flags, struct printbuf *err)
+			     enum bkey_invalid_flags flags,
+			     struct printbuf *err)
 {
 	struct bkey_s_c_backpointer bp = bkey_s_c_to_backpointer(k);
 	struct bpos bucket = bp_pos_to_bucket(c, bp.k->p);
diff --git a/fs/bcachefs/backpointers.h b/fs/bcachefs/backpointers.h
index 3994bc83d6..87e31aa197 100644
--- a/fs/bcachefs/backpointers.h
+++ b/fs/bcachefs/backpointers.h
@@ -8,7 +8,7 @@
 #include "super.h"
 
 int bch2_backpointer_invalid(const struct bch_fs *, struct bkey_s_c k,
-			     unsigned, struct printbuf *);
+			     enum bkey_invalid_flags, struct printbuf *);
 void bch2_backpointer_to_text(struct printbuf *, const struct bch_backpointer *);
 void bch2_backpointer_k_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 void bch2_backpointer_swab(struct bkey_s);
diff --git a/fs/bcachefs/bkey_methods.c b/fs/bcachefs/bkey_methods.c
index 985ea2daa8..1381166bfc 100644
--- a/fs/bcachefs/bkey_methods.c
+++ b/fs/bcachefs/bkey_methods.c
@@ -123,7 +123,8 @@ const struct bkey_ops bch2_bkey_null_ops = {
 };
 
 int bch2_bkey_val_invalid(struct bch_fs *c, struct bkey_s_c k,
-			  unsigned flags, struct printbuf *err)
+			  enum bkey_invalid_flags flags,
+			  struct printbuf *err)
 {
 	const struct bkey_ops *ops = bch2_bkey_type_ops(k.k->type);
 
@@ -215,14 +216,16 @@ static unsigned bch2_key_types_allowed[] = {
 
 int __bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k,
 			enum btree_node_type type,
-			unsigned flags, struct printbuf *err)
+			enum bkey_invalid_flags flags,
+			struct printbuf *err)
 {
 	if (k.k->u64s < BKEY_U64s) {
 		prt_printf(err, "u64s too small (%u < %zu)", k.k->u64s, BKEY_U64s);
 		return -BCH_ERR_invalid_bkey;
 	}
 
-	if (!(bch2_key_types_allowed[type] & (1U << k.k->type))) {
+	if (flags & BKEY_INVALID_COMMIT	 &&
+	    !(bch2_key_types_allowed[type] & (1U << k.k->type))) {
 		prt_printf(err, "invalid key type for btree %s (%s)",
 			   bch2_btree_ids[type], bch2_bkey_types[k.k->type]);
 		return -BCH_ERR_invalid_bkey;
@@ -246,24 +249,23 @@ int __bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k,
 		}
 	}
 
-	if (type != BKEY_TYPE_btree &&
-	    !btree_type_has_snapshots(type) &&
-	    k.k->p.snapshot) {
-		prt_printf(err, "nonzero snapshot");
-		return -BCH_ERR_invalid_bkey;
-	}
+	if (type != BKEY_TYPE_btree) {
+		if (!btree_type_has_snapshots((enum btree_id) type) &&
+		    k.k->p.snapshot) {
+			prt_printf(err, "nonzero snapshot");
+			return -BCH_ERR_invalid_bkey;
+		}
 
-	if (type != BKEY_TYPE_btree &&
-	    btree_type_has_snapshots(type) &&
-	    !k.k->p.snapshot) {
-		prt_printf(err, "snapshot == 0");
-		return -BCH_ERR_invalid_bkey;
-	}
+		if (btree_type_has_snapshots((enum btree_id) type) &&
+		    !k.k->p.snapshot) {
+			prt_printf(err, "snapshot == 0");
+			return -BCH_ERR_invalid_bkey;
+		}
 
-	if (type != BKEY_TYPE_btree &&
-	    bkey_eq(k.k->p, POS_MAX)) {
-		prt_printf(err, "key at POS_MAX");
-		return -BCH_ERR_invalid_bkey;
+		if (bkey_eq(k.k->p, POS_MAX)) {
+			prt_printf(err, "key at POS_MAX");
+			return -BCH_ERR_invalid_bkey;
+		}
 	}
 
 	return 0;
@@ -271,7 +273,8 @@ int __bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k,
 
 int bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k,
 		      enum btree_node_type type,
-		      unsigned flags, struct printbuf *err)
+		      enum bkey_invalid_flags flags,
+		      struct printbuf *err)
 {
 	return __bch2_bkey_invalid(c, k, type, flags, err) ?:
 		bch2_bkey_val_invalid(c, k, flags, err);
diff --git a/fs/bcachefs/bkey_methods.h b/fs/bcachefs/bkey_methods.h
index 32b86c74cc..0f3dc156ad 100644
--- a/fs/bcachefs/bkey_methods.h
+++ b/fs/bcachefs/bkey_methods.h
@@ -13,6 +13,12 @@ enum btree_node_type;
 extern const char * const bch2_bkey_types[];
 extern const struct bkey_ops bch2_bkey_null_ops;
 
+enum bkey_invalid_flags {
+	BKEY_INVALID_WRITE		= (1U << 0),
+	BKEY_INVALID_COMMIT		= (1U << 1),
+	BKEY_INVALID_JOURNAL		= (1U << 2),
+};
+
 /*
  * key_invalid: checks validity of @k, returns 0 if good or -EINVAL if bad. If
  * invalid, entire key will be deleted.
@@ -22,7 +28,7 @@ extern const struct bkey_ops bch2_bkey_null_ops;
  */
 struct bkey_ops {
 	int		(*key_invalid)(const struct bch_fs *c, struct bkey_s_c k,
-				       unsigned flags, struct printbuf *err);
+				       enum bkey_invalid_flags flags, struct printbuf *err);
 	void		(*val_to_text)(struct printbuf *, struct bch_fs *,
 				       struct bkey_s_c);
 	void		(*swab)(struct bkey_s);
@@ -49,8 +55,6 @@ static inline const struct bkey_ops *bch2_bkey_type_ops(enum bch_bkey_type type)
 		: &bch2_bkey_null_ops;
 }
 
-#define BKEY_INVALID_FROM_JOURNAL		(1 << 1)
-
 int bch2_bkey_val_invalid(struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
 int __bch2_bkey_invalid(struct bch_fs *, struct bkey_s_c,
 			enum btree_node_type, unsigned, struct printbuf *);
diff --git a/fs/bcachefs/btree_update_leaf.c b/fs/bcachefs/btree_update_leaf.c
index 1474dca26d..2b43f02fc4 100644
--- a/fs/bcachefs/btree_update_leaf.c
+++ b/fs/bcachefs/btree_update_leaf.c
@@ -856,10 +856,13 @@ static inline int do_bch2_trans_commit(struct btree_trans *trans, unsigned flags
 	struct printbuf buf = PRINTBUF;
 
 	trans_for_each_update(trans, i) {
-		int rw = (flags & BTREE_INSERT_JOURNAL_REPLAY) ? READ : WRITE;
+		enum bkey_invalid_flags invalid_flags = 0;
+
+		if (!(flags & BTREE_INSERT_JOURNAL_REPLAY))
+			invalid_flags |= BKEY_INVALID_WRITE|BKEY_INVALID_COMMIT;
 
 		if (unlikely(bch2_bkey_invalid(c, bkey_i_to_s_c(i->k),
-					       i->bkey_type, rw, &buf)))
+					       i->bkey_type, invalid_flags, &buf)))
 			return bch2_trans_commit_bkey_invalid(trans, flags, i, &buf);
 		btree_insert_entry_checks(trans, i);
 	}
diff --git a/fs/bcachefs/dirent.c b/fs/bcachefs/dirent.c
index ef3f1f9b7e..065ea59ee9 100644
--- a/fs/bcachefs/dirent.c
+++ b/fs/bcachefs/dirent.c
@@ -85,7 +85,8 @@ const struct bch_hash_desc bch2_dirent_hash_desc = {
 };
 
 int bch2_dirent_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			unsigned flags, struct printbuf *err)
+			enum bkey_invalid_flags flags,
+			struct printbuf *err)
 {
 	struct bkey_s_c_dirent d = bkey_s_c_to_dirent(k);
 	unsigned len;
diff --git a/fs/bcachefs/dirent.h b/fs/bcachefs/dirent.h
index bf9ea2e35f..b42f4a13bc 100644
--- a/fs/bcachefs/dirent.h
+++ b/fs/bcachefs/dirent.h
@@ -4,9 +4,11 @@
 
 #include "str_hash.h"
 
+enum bkey_invalid_flags;
 extern const struct bch_hash_desc bch2_dirent_hash_desc;
 
-int bch2_dirent_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
+int bch2_dirent_invalid(const struct bch_fs *, struct bkey_s_c,
+			enum bkey_invalid_flags, struct printbuf *);
 void bch2_dirent_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 
 #define bch2_bkey_ops_dirent ((struct bkey_ops) {	\
diff --git a/fs/bcachefs/ec.c b/fs/bcachefs/ec.c
index 749be7a9f5..efbb7cf7a5 100644
--- a/fs/bcachefs/ec.c
+++ b/fs/bcachefs/ec.c
@@ -105,7 +105,8 @@ struct ec_bio {
 /* Stripes btree keys: */
 
 int bch2_stripe_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			unsigned flags, struct printbuf *err)
+			enum bkey_invalid_flags flags,
+			struct printbuf *err)
 {
 	const struct bch_stripe *s = bkey_s_c_to_stripe(k).v;
 
diff --git a/fs/bcachefs/ec.h b/fs/bcachefs/ec.h
index 64ca277ca1..1b1848e5fa 100644
--- a/fs/bcachefs/ec.h
+++ b/fs/bcachefs/ec.h
@@ -6,8 +6,10 @@
 #include "buckets_types.h"
 #include "extents_types.h"
 
+enum bkey_invalid_flags;
+
 int bch2_stripe_invalid(const struct bch_fs *, struct bkey_s_c,
-			unsigned, struct printbuf *);
+			enum bkey_invalid_flags, struct printbuf *);
 void bch2_stripe_to_text(struct printbuf *, struct bch_fs *,
 			 struct bkey_s_c);
 
diff --git a/fs/bcachefs/extents.c b/fs/bcachefs/extents.c
index bb5e17eb2a..c13e0afc66 100644
--- a/fs/bcachefs/extents.c
+++ b/fs/bcachefs/extents.c
@@ -163,7 +163,8 @@ int bch2_bkey_pick_read_device(struct bch_fs *c, struct bkey_s_c k,
 /* KEY_TYPE_btree_ptr: */
 
 int bch2_btree_ptr_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			   unsigned flags, struct printbuf *err)
+			   enum bkey_invalid_flags flags,
+			   struct printbuf *err)
 {
 	if (bkey_val_u64s(k.k) > BCH_REPLICAS_MAX) {
 		prt_printf(err, "value too big (%zu > %u)",
@@ -181,7 +182,8 @@ void bch2_btree_ptr_to_text(struct printbuf *out, struct bch_fs *c,
 }
 
 int bch2_btree_ptr_v2_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			      unsigned flags, struct printbuf *err)
+			      enum bkey_invalid_flags flags,
+			      struct printbuf *err)
 {
 	if (bkey_val_u64s(k.k) > BKEY_BTREE_PTR_VAL_U64s_MAX) {
 		prt_printf(err, "value too big (%zu > %zu)",
@@ -371,7 +373,8 @@ bool bch2_extent_merge(struct bch_fs *c, struct bkey_s l, struct bkey_s_c r)
 /* KEY_TYPE_reservation: */
 
 int bch2_reservation_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			     unsigned flags, struct printbuf *err)
+			     enum bkey_invalid_flags flags,
+			     struct printbuf *err)
 {
 	struct bkey_s_c_reservation r = bkey_s_c_to_reservation(k);
 
@@ -1103,7 +1106,8 @@ static int extent_ptr_invalid(const struct bch_fs *c,
 }
 
 int bch2_bkey_ptrs_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			   unsigned flags, struct printbuf *err)
+			   enum bkey_invalid_flags flags,
+			   struct printbuf *err)
 {
 	struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
 	const union bch_extent_entry *entry;
diff --git a/fs/bcachefs/extents.h b/fs/bcachefs/extents.h
index 00ddac1f68..d359b3fda9 100644
--- a/fs/bcachefs/extents.h
+++ b/fs/bcachefs/extents.h
@@ -8,6 +8,7 @@
 
 struct bch_fs;
 struct btree_trans;
+enum bkey_invalid_flags;
 
 /* extent entries: */
 
@@ -382,11 +383,13 @@ int bch2_bkey_pick_read_device(struct bch_fs *, struct bkey_s_c,
 
 /* KEY_TYPE_btree_ptr: */
 
-int bch2_btree_ptr_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
+int bch2_btree_ptr_invalid(const struct bch_fs *, struct bkey_s_c,
+			   enum bkey_invalid_flags, struct printbuf *);
 void bch2_btree_ptr_to_text(struct printbuf *, struct bch_fs *,
 			    struct bkey_s_c);
 
-int bch2_btree_ptr_v2_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
+int bch2_btree_ptr_v2_invalid(const struct bch_fs *, struct bkey_s_c,
+			      enum bkey_invalid_flags, struct printbuf *);
 void bch2_btree_ptr_v2_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 void bch2_btree_ptr_v2_compat(enum btree_id, unsigned, unsigned,
 			      int, struct bkey_s);
@@ -426,7 +429,7 @@ bool bch2_extent_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c);
 /* KEY_TYPE_reservation: */
 
 int bch2_reservation_invalid(const struct bch_fs *, struct bkey_s_c,
-			     unsigned, struct printbuf *);
+			     enum bkey_invalid_flags, struct printbuf *);
 void bch2_reservation_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 bool bch2_reservation_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c);
 
diff --git a/fs/bcachefs/inode.c b/fs/bcachefs/inode.c
index 64e8d1f8a2..fa435d8655 100644
--- a/fs/bcachefs/inode.c
+++ b/fs/bcachefs/inode.c
@@ -432,7 +432,8 @@ static int __bch2_inode_invalid(struct bkey_s_c k, struct printbuf *err)
 }
 
 int bch2_inode_invalid(const struct bch_fs *c, struct bkey_s_c k,
-		       unsigned flags, struct printbuf *err)
+		       enum bkey_invalid_flags flags,
+		       struct printbuf *err)
 {
 	struct bkey_s_c_inode inode = bkey_s_c_to_inode(k);
 
@@ -446,7 +447,8 @@ int bch2_inode_invalid(const struct bch_fs *c, struct bkey_s_c k,
 }
 
 int bch2_inode_v2_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			  unsigned flags, struct printbuf *err)
+			  enum bkey_invalid_flags flags,
+			  struct printbuf *err)
 {
 	struct bkey_s_c_inode_v2 inode = bkey_s_c_to_inode_v2(k);
 
@@ -460,7 +462,8 @@ int bch2_inode_v2_invalid(const struct bch_fs *c, struct bkey_s_c k,
 }
 
 int bch2_inode_v3_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			  unsigned flags, struct printbuf *err)
+			  enum bkey_invalid_flags flags,
+			  struct printbuf *err)
 {
 	struct bkey_s_c_inode_v3 inode = bkey_s_c_to_inode_v3(k);
 
@@ -517,7 +520,8 @@ void bch2_inode_to_text(struct printbuf *out, struct bch_fs *c, struct bkey_s_c
 }
 
 int bch2_inode_generation_invalid(const struct bch_fs *c, struct bkey_s_c k,
-				  unsigned flags, struct printbuf *err)
+				  enum bkey_invalid_flags flags,
+				  struct printbuf *err)
 {
 	if (k.k->p.inode) {
 		prt_printf(err, "nonzero k.p.inode");
diff --git a/fs/bcachefs/inode.h b/fs/bcachefs/inode.h
index 0c3022d3f9..8f9be5e583 100644
--- a/fs/bcachefs/inode.h
+++ b/fs/bcachefs/inode.h
@@ -5,11 +5,15 @@
 #include "bkey.h"
 #include "opts.h"
 
+enum bkey_invalid_flags;
 extern const char * const bch2_inode_opts[];
 
-int bch2_inode_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
-int bch2_inode_v2_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
-int bch2_inode_v3_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
+int bch2_inode_invalid(const struct bch_fs *, struct bkey_s_c,
+		       enum bkey_invalid_flags, struct printbuf *);
+int bch2_inode_v2_invalid(const struct bch_fs *, struct bkey_s_c,
+			  enum bkey_invalid_flags, struct printbuf *);
+int bch2_inode_v3_invalid(const struct bch_fs *, struct bkey_s_c,
+			  enum bkey_invalid_flags, struct printbuf *);
 void bch2_inode_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 
 #define bch2_bkey_ops_inode ((struct bkey_ops) {	\
@@ -44,7 +48,7 @@ static inline bool bkey_is_inode(const struct bkey *k)
 }
 
 int bch2_inode_generation_invalid(const struct bch_fs *, struct bkey_s_c,
-				  unsigned, struct printbuf *);
+				  enum bkey_invalid_flags, struct printbuf *);
 void bch2_inode_generation_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 
 #define bch2_bkey_ops_inode_generation ((struct bkey_ops) {	\
diff --git a/fs/bcachefs/journal_io.c b/fs/bcachefs/journal_io.c
index a084c6d0fe..c7c2ae326f 100644
--- a/fs/bcachefs/journal_io.c
+++ b/fs/bcachefs/journal_io.c
@@ -340,7 +340,8 @@ static int journal_entry_btree_keys_validate(struct bch_fs *c,
 		int ret = journal_validate_key(c, jset, entry,
 					       entry->level,
 					       entry->btree_id,
-					       k, version, big_endian, write|BKEY_INVALID_FROM_JOURNAL);
+					       k, version, big_endian,
+					       write|BKEY_INVALID_JOURNAL);
 		if (ret == FSCK_DELETED_KEY)
 			continue;
 
diff --git a/fs/bcachefs/lru.c b/fs/bcachefs/lru.c
index e04c037f0c..07d1929535 100644
--- a/fs/bcachefs/lru.c
+++ b/fs/bcachefs/lru.c
@@ -11,7 +11,8 @@
 
 /* KEY_TYPE_lru is obsolete: */
 int bch2_lru_invalid(const struct bch_fs *c, struct bkey_s_c k,
-		     unsigned flags, struct printbuf *err)
+		     enum bkey_invalid_flags flags,
+		     struct printbuf *err)
 {
 	if (!lru_pos_time(k.k->p)) {
 		prt_printf(err, "lru entry at time=0");
diff --git a/fs/bcachefs/lru.h b/fs/bcachefs/lru.h
index adb9842924..7a3be20a85 100644
--- a/fs/bcachefs/lru.h
+++ b/fs/bcachefs/lru.h
@@ -43,7 +43,8 @@ static inline enum bch_lru_type lru_type(struct bkey_s_c l)
 	return BCH_LRU_read;
 }
 
-int bch2_lru_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
+int bch2_lru_invalid(const struct bch_fs *, struct bkey_s_c,
+		     enum bkey_invalid_flags, struct printbuf *);
 void bch2_lru_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 
 void bch2_lru_pos_to_text(struct printbuf *, struct bpos);
diff --git a/fs/bcachefs/quota.c b/fs/bcachefs/quota.c
index 1decb7191d..d90db3fb82 100644
--- a/fs/bcachefs/quota.c
+++ b/fs/bcachefs/quota.c
@@ -60,7 +60,8 @@ const struct bch_sb_field_ops bch_sb_field_ops_quota = {
 };
 
 int bch2_quota_invalid(const struct bch_fs *c, struct bkey_s_c k,
-		       unsigned flags, struct printbuf *err)
+		       enum bkey_invalid_flags flags,
+		       struct printbuf *err)
 {
 	if (k.k->p.inode >= QTYP_NR) {
 		prt_printf(err, "invalid quota type (%llu >= %u)",
diff --git a/fs/bcachefs/quota.h b/fs/bcachefs/quota.h
index b0f7d4ee77..2f463874a3 100644
--- a/fs/bcachefs/quota.h
+++ b/fs/bcachefs/quota.h
@@ -5,9 +5,11 @@
 #include "inode.h"
 #include "quota_types.h"
 
+enum bkey_invalid_flags;
 extern const struct bch_sb_field_ops bch_sb_field_ops_quota;
 
-int bch2_quota_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
+int bch2_quota_invalid(const struct bch_fs *, struct bkey_s_c,
+		       enum bkey_invalid_flags, struct printbuf *);
 void bch2_quota_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 
 #define bch2_bkey_ops_quota ((struct bkey_ops) {	\
diff --git a/fs/bcachefs/reflink.c b/fs/bcachefs/reflink.c
index 26f0275ff0..39f711d506 100644
--- a/fs/bcachefs/reflink.c
+++ b/fs/bcachefs/reflink.c
@@ -26,7 +26,8 @@ static inline unsigned bkey_type_to_indirect(const struct bkey *k)
 /* reflink pointers */
 
 int bch2_reflink_p_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			   unsigned flags, struct printbuf *err)
+			   enum bkey_invalid_flags flags,
+			   struct printbuf *err)
 {
 	struct bkey_s_c_reflink_p p = bkey_s_c_to_reflink_p(k);
 
@@ -72,7 +73,8 @@ bool bch2_reflink_p_merge(struct bch_fs *c, struct bkey_s _l, struct bkey_s_c _r
 /* indirect extents */
 
 int bch2_reflink_v_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			   unsigned flags, struct printbuf *err)
+			   enum bkey_invalid_flags flags,
+			   struct printbuf *err)
 {
 	return bch2_bkey_ptrs_invalid(c, k, flags, err);
 }
@@ -117,7 +119,8 @@ int bch2_trans_mark_reflink_v(struct btree_trans *trans,
 /* indirect inline data */
 
 int bch2_indirect_inline_data_invalid(const struct bch_fs *c, struct bkey_s_c k,
-				      unsigned flags, struct printbuf *err)
+				      enum bkey_invalid_flags flags,
+				      struct printbuf *err)
 {
 	return 0;
 }
diff --git a/fs/bcachefs/reflink.h b/fs/bcachefs/reflink.h
index ba400188f5..fe52538efb 100644
--- a/fs/bcachefs/reflink.h
+++ b/fs/bcachefs/reflink.h
@@ -2,8 +2,10 @@
 #ifndef _BCACHEFS_REFLINK_H
 #define _BCACHEFS_REFLINK_H
 
+enum bkey_invalid_flags;
+
 int bch2_reflink_p_invalid(const struct bch_fs *, struct bkey_s_c,
-			   unsigned, struct printbuf *);
+			   enum bkey_invalid_flags, struct printbuf *);
 void bch2_reflink_p_to_text(struct printbuf *, struct bch_fs *,
 			    struct bkey_s_c);
 bool bch2_reflink_p_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c);
@@ -18,7 +20,7 @@ bool bch2_reflink_p_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c);
 })
 
 int bch2_reflink_v_invalid(const struct bch_fs *, struct bkey_s_c,
-			   unsigned, struct printbuf *);
+			   enum bkey_invalid_flags, struct printbuf *);
 void bch2_reflink_v_to_text(struct printbuf *, struct bch_fs *,
 			    struct bkey_s_c);
 int bch2_trans_mark_reflink_v(struct btree_trans *, enum btree_id, unsigned,
@@ -34,7 +36,7 @@ int bch2_trans_mark_reflink_v(struct btree_trans *, enum btree_id, unsigned,
 })
 
 int bch2_indirect_inline_data_invalid(const struct bch_fs *, struct bkey_s_c,
-				      unsigned, struct printbuf *);
+				      enum bkey_invalid_flags, struct printbuf *);
 void bch2_indirect_inline_data_to_text(struct printbuf *,
 				struct bch_fs *, struct bkey_s_c);
 int bch2_trans_mark_indirect_inline_data(struct btree_trans *,
diff --git a/fs/bcachefs/subvolume.c b/fs/bcachefs/subvolume.c
index 341c0d1b81..f26397aa2b 100644
--- a/fs/bcachefs/subvolume.c
+++ b/fs/bcachefs/subvolume.c
@@ -23,7 +23,8 @@ void bch2_snapshot_tree_to_text(struct printbuf *out, struct bch_fs *c,
 }
 
 int bch2_snapshot_tree_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			       unsigned flags, struct printbuf *err)
+			       enum bkey_invalid_flags flags,
+			       struct printbuf *err)
 {
 	if (bkey_gt(k.k->p, POS(0, U32_MAX)) ||
 	    bkey_lt(k.k->p, POS(0, 1))) {
@@ -97,7 +98,8 @@ void bch2_snapshot_to_text(struct printbuf *out, struct bch_fs *c,
 }
 
 int bch2_snapshot_invalid(const struct bch_fs *c, struct bkey_s_c k,
-			  unsigned flags, struct printbuf *err)
+			  enum bkey_invalid_flags flags,
+			  struct printbuf *err)
 {
 	struct bkey_s_c_snapshot s;
 	u32 i, id;
diff --git a/fs/bcachefs/subvolume.h b/fs/bcachefs/subvolume.h
index 1a39f713db..105410e080 100644
--- a/fs/bcachefs/subvolume.h
+++ b/fs/bcachefs/subvolume.h
@@ -5,9 +5,11 @@
 #include "darray.h"
 #include "subvolume_types.h"
 
+enum bkey_invalid_flags;
+
 void bch2_snapshot_tree_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 int bch2_snapshot_tree_invalid(const struct bch_fs *, struct bkey_s_c,
-			       unsigned, struct printbuf *);
+			       enum bkey_invalid_flags, struct printbuf *);
 
 #define bch2_bkey_ops_snapshot_tree ((struct bkey_ops) {	\
 	.key_invalid	= bch2_snapshot_tree_invalid,		\
@@ -19,7 +21,7 @@ int bch2_snapshot_tree_lookup(struct btree_trans *, u32, struct bch_snapshot_tre
 
 void bch2_snapshot_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 int bch2_snapshot_invalid(const struct bch_fs *, struct bkey_s_c,
-			  unsigned, struct printbuf *);
+			  enum bkey_invalid_flags, struct printbuf *);
 int bch2_mark_snapshot(struct btree_trans *, enum btree_id, unsigned,
 		       struct bkey_s_c, struct bkey_s_c, unsigned);
 
diff --git a/fs/bcachefs/xattr.c b/fs/bcachefs/xattr.c
index f47a085d14..867cc68782 100644
--- a/fs/bcachefs/xattr.c
+++ b/fs/bcachefs/xattr.c
@@ -70,7 +70,8 @@ const struct bch_hash_desc bch2_xattr_hash_desc = {
 };
 
 int bch2_xattr_invalid(const struct bch_fs *c, struct bkey_s_c k,
-		       unsigned flags, struct printbuf *err)
+		       enum bkey_invalid_flags flags,
+		       struct printbuf *err)
 {
 	const struct xattr_handler *handler;
 	struct bkey_s_c_xattr xattr = bkey_s_c_to_xattr(k);
diff --git a/fs/bcachefs/xattr.h b/fs/bcachefs/xattr.h
index b3e16729bc..214cbbaac3 100644
--- a/fs/bcachefs/xattr.h
+++ b/fs/bcachefs/xattr.h
@@ -6,7 +6,8 @@
 
 extern const struct bch_hash_desc bch2_xattr_hash_desc;
 
-int bch2_xattr_invalid(const struct bch_fs *, struct bkey_s_c, unsigned, struct printbuf *);
+int bch2_xattr_invalid(const struct bch_fs *, struct bkey_s_c,
+		       enum bkey_invalid_flags, struct printbuf *);
 void bch2_xattr_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 
 #define bch2_bkey_ops_xattr ((struct bkey_ops) {	\
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 05/10] bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE()
  2023-07-09 17:15 [PATCH 00/10] bcachefs - semvar, forward compatibility Kent Overstreet
                   ` (3 preceding siblings ...)
  2023-07-09 17:15 ` [PATCH 04/10] bcachefs: Change check for invalid key types Kent Overstreet
@ 2023-07-09 17:15 ` Kent Overstreet
  2023-07-13 13:42   ` Brian Foster
  2023-07-09 17:15 ` [PATCH 06/10] bcachefs: version_upgrade is now an enum Kent Overstreet
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 17:15 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet, bfoster, sandeen

Version upgrades are not atomic operations: when we do a version upgrade
we need to update the superblock before we start using new features, and
then when the upgrade completes we need to update the superblock again.
This adds a new superblock field so we can detect and handle incomplete
version upgrades.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/alloc_background.c |  3 +-
 fs/bcachefs/bcachefs.h         |  7 +++++
 fs/bcachefs/bcachefs_format.h  |  5 ++++
 fs/bcachefs/recovery.c         | 54 +++++++++++++++++++---------------
 fs/bcachefs/super-io.c         | 18 ++++++++++++
 5 files changed, 62 insertions(+), 25 deletions(-)

diff --git a/fs/bcachefs/alloc_background.c b/fs/bcachefs/alloc_background.c
index c59629bbbc..b1dfe300d9 100644
--- a/fs/bcachefs/alloc_background.c
+++ b/fs/bcachefs/alloc_background.c
@@ -1232,8 +1232,7 @@ int bch2_check_alloc_hole_bucket_gens(struct btree_trans *trans,
 	unsigned i, gens_offset, gens_end_offset;
 	int ret;
 
-	if (c->sb.version < bcachefs_metadata_version_bucket_gens &&
-	    !c->opts.version_upgrade)
+	if (c->sb.version < bcachefs_metadata_version_bucket_gens)
 		return 0;
 
 	bch2_btree_iter_set_pos(bucket_gens_iter, alloc_gens_pos(start, &gens_offset));
diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index a8488d4e18..d7f030aa30 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -712,6 +712,7 @@ struct bch_fs {
 
 		u16		version;
 		u16		version_min;
+		u16		version_upgrade_complete;
 
 		u8		nr_devices;
 		u8		clean;
@@ -1134,6 +1135,12 @@ static inline bool bch2_dev_exists2(const struct bch_fs *c, unsigned dev)
 	return dev < c->sb.nr_devices && c->devs[dev];
 }
 
+static inline bool bch2_version_upgrading_to(const struct bch_fs *c, unsigned new_version)
+{
+	return c->sb.version_upgrade_complete < new_version &&
+		c->sb.version >= new_version;
+}
+
 #define BKEY_PADDED_ONSTACK(key, pad)				\
 	struct { struct bkey_i key; __u64 key ## _pad[pad]; }
 
diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h
index 49b86bfda7..c397a3b96b 100644
--- a/fs/bcachefs/bcachefs_format.h
+++ b/fs/bcachefs/bcachefs_format.h
@@ -1748,6 +1748,11 @@ LE64_BITMASK(BCH_SB_JOURNAL_TRANSACTION_NAMES,struct bch_sb, flags[4], 32, 33);
 LE64_BITMASK(BCH_SB_NOCOW,		struct bch_sb, flags[4], 33, 34);
 LE64_BITMASK(BCH_SB_WRITE_BUFFER_SIZE,	struct bch_sb, flags[4], 34, 54);
 
+/* flags[4] 56-64 unused: */
+
+LE64_BITMASK(BCH_SB_VERSION_UPGRADE_COMPLETE,
+					struct bch_sb, flags[5],  0, 16);
+
 /*
  * Features:
  *
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index 9ea85b097e..0173707cfd 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -1107,6 +1107,31 @@ static int bch2_fs_upgrade_for_subvolumes(struct bch_fs *c)
 	return ret;
 }
 
+static void check_version_upgrade(struct bch_fs *c)
+{
+	unsigned version = c->sb.version_upgrade_complete ?: c->sb.version;
+
+	if (version < bcachefs_metadata_required_upgrade_below) {
+		struct printbuf buf = PRINTBUF;
+
+		if (version != c->sb.version)
+			prt_str(&buf, "version upgrade incomplete:\n");
+
+		prt_str(&buf, "version ");
+		bch2_version_to_text(&buf, version);
+		prt_str(&buf, " prior to ");
+		bch2_version_to_text(&buf, bcachefs_metadata_required_upgrade_below);
+		prt_str(&buf, ", upgrade and fsck required");
+
+		bch_info(c, "%s", buf.buf);
+		printbuf_exit(&buf);
+
+		c->opts.version_upgrade	= true;
+		c->opts.fsck		= true;
+		c->opts.fix_errors	= FSCK_OPT_YES;
+	}
+}
+
 int bch2_fs_recovery(struct bch_fs *c)
 {
 	struct bch_sb_field_clean *clean = NULL;
@@ -1146,23 +1171,8 @@ int bch2_fs_recovery(struct bch_fs *c)
 		goto err;
 	}
 
-	if (!c->opts.nochanges &&
-	    c->sb.version < bcachefs_metadata_required_upgrade_below) {
-		struct printbuf buf = PRINTBUF;
-
-		prt_str(&buf, "version ");
-		bch2_version_to_text(&buf, c->sb.version);
-		prt_str(&buf, " prior to ");
-		bch2_version_to_text(&buf, bcachefs_metadata_required_upgrade_below);
-		prt_str(&buf, ", upgrade and fsck required");
-
-		bch_info(c, "%s", buf.buf);
-		printbuf_exit(&buf);
-
-		c->opts.version_upgrade	= true;
-		c->opts.fsck		= true;
-		c->opts.fix_errors	= FSCK_OPT_YES;
-	}
+	if (!c->opts.nochanges)
+		check_version_upgrade(c);
 
 	if (c->opts.fsck && c->opts.norecovery) {
 		bch_err(c, "cannot select both norecovery and fsck");
@@ -1406,8 +1416,7 @@ int bch2_fs_recovery(struct bch_fs *c)
 	if (ret)
 		goto err;
 
-	if (c->sb.version < bcachefs_metadata_version_bucket_gens &&
-	    c->opts.version_upgrade) {
+	if (bch2_version_upgrading_to(c, bcachefs_metadata_version_bucket_gens)) {
 		bch_info(c, "initializing bucket_gens");
 		ret = bch2_bucket_gens_init(c);
 		if (ret)
@@ -1415,7 +1424,7 @@ int bch2_fs_recovery(struct bch_fs *c)
 		bch_verbose(c, "bucket_gens init done");
 	}
 
-	if (c->sb.version < bcachefs_metadata_version_snapshot_2) {
+	if (bch2_version_upgrading_to(c, bcachefs_metadata_version_snapshot_2)) {
 		ret = bch2_fs_upgrade_for_subvolumes(c);
 		if (ret)
 			goto err;
@@ -1443,9 +1452,8 @@ int bch2_fs_recovery(struct bch_fs *c)
 	}
 
 	mutex_lock(&c->sb_lock);
-	if (c->opts.version_upgrade) {
-		c->disk_sb.sb->version = cpu_to_le16(bcachefs_metadata_version_current);
-		c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALL);
+	if (BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb) != c->sb.version) {
+		SET_BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb, c->sb.version);
 		write_sb = true;
 	}
 
diff --git a/fs/bcachefs/super-io.c b/fs/bcachefs/super-io.c
index 833e78d48c..cb03e3f0c5 100644
--- a/fs/bcachefs/super-io.c
+++ b/fs/bcachefs/super-io.c
@@ -445,6 +445,7 @@ static void bch2_sb_update(struct bch_fs *c)
 	c->sb.user_uuid		= src->user_uuid;
 	c->sb.version		= le16_to_cpu(src->version);
 	c->sb.version_min	= le16_to_cpu(src->version_min);
+	c->sb.version_upgrade_complete = BCH_SB_VERSION_UPGRADE_COMPLETE(src) ?: c->sb.version;
 	c->sb.nr_devices	= src->nr_devices;
 	c->sb.clean		= BCH_SB_CLEAN(src);
 	c->sb.encryption_type	= BCH_SB_ENCRYPTION_TYPE(src);
@@ -1185,7 +1186,19 @@ int bch2_fs_mark_dirty(struct bch_fs *c)
 
 	mutex_lock(&c->sb_lock);
 	SET_BCH_SB_CLEAN(c->disk_sb.sb, false);
+
+	if (BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb) > bcachefs_metadata_version_current)
+		SET_BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb, bcachefs_metadata_version_current);
+
+	if (c->opts.version_upgrade ||
+	    c->sb.version > bcachefs_metadata_version_current)
+		c->disk_sb.sb->version = cpu_to_le16(bcachefs_metadata_version_current);
+
+	if (c->opts.version_upgrade)
+		c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALL);
+
 	c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALWAYS);
+
 	c->disk_sb.sb->compat[0] &= cpu_to_le64((1ULL << BCH_COMPAT_NR) - 1);
 	ret = bch2_write_super(c);
 	mutex_unlock(&c->sb_lock);
@@ -1529,6 +1542,11 @@ void bch2_sb_to_text(struct printbuf *out, struct bch_sb *sb,
 	bch2_version_to_text(out, le16_to_cpu(sb->version));
 	prt_newline(out);
 
+	prt_str(out, "Version upgrade complete:");
+	prt_tab(out);
+	bch2_version_to_text(out, BCH_SB_VERSION_UPGRADE_COMPLETE(sb));
+	prt_newline(out);
+
 	prt_printf(out, "Oldest version on disk:");
 	prt_tab(out);
 	bch2_version_to_text(out, le16_to_cpu(sb->version_min));
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 06/10] bcachefs: version_upgrade is now an enum
  2023-07-09 17:15 [PATCH 00/10] bcachefs - semvar, forward compatibility Kent Overstreet
                   ` (4 preceding siblings ...)
  2023-07-09 17:15 ` [PATCH 05/10] bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE() Kent Overstreet
@ 2023-07-09 17:15 ` Kent Overstreet
  2023-07-09 17:15 ` [PATCH 07/10] bcachefs: Kill bch2_bucket_gens_read() Kent Overstreet
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 17:15 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet, bfoster, sandeen

The version_upgrade parameter is now an enum, not a bool, and it's
persistent in the superblock:
 - compatible (default):	upgrade to the latest compatible version
 - incompatible:		upgrade to latest incompatible version
 - none

Currently all upgrades are incompatible upgrades, but the next release
will introduce major:minor versions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/bcachefs.h        |  1 +
 fs/bcachefs/bcachefs_format.h | 12 ++++++++++++
 fs/bcachefs/opts.c            |  5 +++++
 fs/bcachefs/opts.h            |  5 +++--
 fs/bcachefs/recovery.c        | 19 +++++++++++--------
 fs/bcachefs/super-io.c        |  6 +++---
 6 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index d7f030aa30..1ab32b61f0 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -573,6 +573,7 @@ enum {
 	BCH_FS_INITIAL_GC_UNFIXED,	/* kill when we enumerate fsck errors */
 	BCH_FS_NEED_ANOTHER_GC,
 
+	BCH_FS_VERSION_UPGRADE,
 	BCH_FS_HAVE_DELETED_SNAPSHOTS,
 
 	/* errors: */
diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h
index c397a3b96b..8a0f90a83d 100644
--- a/fs/bcachefs/bcachefs_format.h
+++ b/fs/bcachefs/bcachefs_format.h
@@ -1747,6 +1747,7 @@ LE64_BITMASK(BCH_SB_JOURNAL_RECLAIM_DELAY,struct bch_sb, flags[4], 0, 32);
 LE64_BITMASK(BCH_SB_JOURNAL_TRANSACTION_NAMES,struct bch_sb, flags[4], 32, 33);
 LE64_BITMASK(BCH_SB_NOCOW,		struct bch_sb, flags[4], 33, 34);
 LE64_BITMASK(BCH_SB_WRITE_BUFFER_SIZE,	struct bch_sb, flags[4], 34, 54);
+LE64_BITMASK(BCH_SB_VERSION_UPGRADE,	struct bch_sb, flags[4], 54, 56);
 
 /* flags[4] 56-64 unused: */
 
@@ -1819,6 +1820,17 @@ enum bch_sb_compat {
 
 /* options: */
 
+#define BCH_VERSION_UPGRADE_OPTS()	\
+	x(compatible,		0)	\
+	x(incompatible,		1)	\
+	x(none,			2)
+
+enum bch_version_upgrade_opts {
+#define x(t, n) BCH_VERSION_UPGRADE_##t = n,
+	BCH_VERSION_UPGRADE_OPTS()
+#undef x
+};
+
 #define BCH_REPLICAS_MAX		4U
 
 #define BCH_BKEY_PTRS_MAX		16U
diff --git a/fs/bcachefs/opts.c b/fs/bcachefs/opts.c
index a05c389830..0c0c83fa42 100644
--- a/fs/bcachefs/opts.c
+++ b/fs/bcachefs/opts.c
@@ -16,6 +16,11 @@ const char * const bch2_error_actions[] = {
 	NULL
 };
 
+const char * const bch2_version_upgrade_opts[] = {
+	BCH_VERSION_UPGRADE_OPTS()
+	NULL
+};
+
 const char * const bch2_sb_features[] = {
 	BCH_SB_FEATURES()
 	NULL
diff --git a/fs/bcachefs/opts.h b/fs/bcachefs/opts.h
index e7cf7e92f3..e105a742fd 100644
--- a/fs/bcachefs/opts.h
+++ b/fs/bcachefs/opts.h
@@ -9,6 +9,7 @@
 #include "bcachefs_format.h"
 
 extern const char * const bch2_error_actions[];
+extern const char * const bch2_version_upgrade_opts[];
 extern const char * const bch2_sb_features[];
 extern const char * const bch2_sb_compat[];
 extern const char * const bch2_btree_ids[];
@@ -388,8 +389,8 @@ enum opt_type {
 	  NULL,		"Reconstruct alloc btree")			\
 	x(version_upgrade,		u8,				\
 	  OPT_FS|OPT_MOUNT,						\
-	  OPT_BOOL(),							\
-	  BCH2_NO_SB_OPT,		false,				\
+	  OPT_STR(bch2_version_upgrade_opts),				\
+	  BCH_SB_VERSION_UPGRADE,	BCH_VERSION_UPGRADE_compatible,	\
 	  NULL,		"Set superblock to latest version,\n"		\
 			"allowing any new features to be used")		\
 	x(buckets_nouse,		u8,				\
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index 0173707cfd..c90205aa22 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -1111,11 +1111,16 @@ static void check_version_upgrade(struct bch_fs *c)
 {
 	unsigned version = c->sb.version_upgrade_complete ?: c->sb.version;
 
-	if (version < bcachefs_metadata_required_upgrade_below) {
+	if (version < bcachefs_metadata_required_upgrade_below ||
+	    (version < bcachefs_metadata_version_current &&
+	     c->opts.version_upgrade != BCH_VERSION_UPGRADE_none)) {
 		struct printbuf buf = PRINTBUF;
 
-		if (version != c->sb.version)
-			prt_str(&buf, "version upgrade incomplete:\n");
+		if (version != c->sb.version) {
+			prt_str(&buf, "version upgrade to ");
+			bch2_version_to_text(&buf, c->sb.version);
+			prt_str(&buf, " incomplete:\n");
+		}
 
 		prt_str(&buf, "version ");
 		bch2_version_to_text(&buf, version);
@@ -1126,9 +1131,9 @@ static void check_version_upgrade(struct bch_fs *c)
 		bch_info(c, "%s", buf.buf);
 		printbuf_exit(&buf);
 
-		c->opts.version_upgrade	= true;
 		c->opts.fsck		= true;
 		c->opts.fix_errors	= FSCK_OPT_YES;
+		set_bit(BCH_FS_VERSION_UPGRADE, &c->flags);
 	}
 }
 
@@ -1534,11 +1539,9 @@ int bch2_fs_initialize(struct bch_fs *c)
 	c->disk_sb.sb->compat[0] |= cpu_to_le64(1ULL << BCH_COMPAT_extents_above_btree_updates_done);
 	c->disk_sb.sb->compat[0] |= cpu_to_le64(1ULL << BCH_COMPAT_bformat_overflow_done);
 
-	if (c->sb.version < bcachefs_metadata_version_inode_v3)
-		c->opts.version_upgrade	= true;
-
-	if (c->opts.version_upgrade) {
+	if (c->opts.version_upgrade != BCH_VERSION_UPGRADE_none) {
 		c->disk_sb.sb->version = cpu_to_le16(bcachefs_metadata_version_current);
+		SET_BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb, bcachefs_metadata_version_current);
 		c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALL);
 		bch2_write_super(c);
 	}
diff --git a/fs/bcachefs/super-io.c b/fs/bcachefs/super-io.c
index cb03e3f0c5..b174003b57 100644
--- a/fs/bcachefs/super-io.c
+++ b/fs/bcachefs/super-io.c
@@ -809,7 +809,7 @@ int bch2_write_super(struct bch_fs *c)
 	closure_init_stack(cl);
 	memset(&sb_written, 0, sizeof(sb_written));
 
-	if (c->opts.version_upgrade) {
+	if (test_bit(BCH_FS_VERSION_UPGRADE, &c->flags)) {
 		c->disk_sb.sb->magic = BCHFS_MAGIC;
 		c->disk_sb.sb->layout.magic = BCHFS_MAGIC;
 	}
@@ -1190,11 +1190,11 @@ int bch2_fs_mark_dirty(struct bch_fs *c)
 	if (BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb) > bcachefs_metadata_version_current)
 		SET_BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb, bcachefs_metadata_version_current);
 
-	if (c->opts.version_upgrade ||
+	if (test_bit(BCH_FS_VERSION_UPGRADE, &c->flags) ||
 	    c->sb.version > bcachefs_metadata_version_current)
 		c->disk_sb.sb->version = cpu_to_le16(bcachefs_metadata_version_current);
 
-	if (c->opts.version_upgrade)
+	if (test_bit(BCH_FS_VERSION_UPGRADE, &c->flags))
 		c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALL);
 
 	c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALWAYS);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 07/10] bcachefs: Kill bch2_bucket_gens_read()
  2023-07-09 17:15 [PATCH 00/10] bcachefs - semvar, forward compatibility Kent Overstreet
                   ` (5 preceding siblings ...)
  2023-07-09 17:15 ` [PATCH 06/10] bcachefs: version_upgrade is now an enum Kent Overstreet
@ 2023-07-09 17:15 ` Kent Overstreet
  2023-07-09 17:15 ` [PATCH 08/10] bcachefs: Stash journal replay params in bch_fs Kent Overstreet
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 17:15 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet, bfoster, sandeen

This folds bch2_bucket_gens_read() into bch2_alloc_read(), doing the
version check there.

This is prep work for enumarating all recovery passes: we need some
cleanup first to make calling all the recovery passes consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/alloc_background.c | 100 +++++++++++++++------------------
 fs/bcachefs/alloc_background.h |   1 -
 fs/bcachefs/recovery.c         |   6 +-
 3 files changed, 45 insertions(+), 62 deletions(-)

diff --git a/fs/bcachefs/alloc_background.c b/fs/bcachefs/alloc_background.c
index b1dfe300d9..291d97f837 100644
--- a/fs/bcachefs/alloc_background.c
+++ b/fs/bcachefs/alloc_background.c
@@ -551,40 +551,6 @@ bch2_trans_start_alloc_update(struct btree_trans *trans, struct btree_iter *iter
 	return ERR_PTR(ret);
 }
 
-int bch2_alloc_read(struct bch_fs *c)
-{
-	struct btree_trans trans;
-	struct btree_iter iter;
-	struct bkey_s_c k;
-	struct bch_alloc_v4 a;
-	struct bch_dev *ca;
-	int ret;
-
-	bch2_trans_init(&trans, c, 0, 0);
-
-	for_each_btree_key(&trans, iter, BTREE_ID_alloc, POS_MIN,
-			   BTREE_ITER_PREFETCH, k, ret) {
-		/*
-		 * Not a fsck error because this is checked/repaired by
-		 * bch2_check_alloc_key() which runs later:
-		 */
-		if (!bch2_dev_bucket_exists(c, k.k->p))
-			continue;
-
-		ca = bch_dev_bkey_exists(c, k.k->p.inode);
-
-		*bucket_gen(ca, k.k->p.offset) = bch2_alloc_to_v4(k, &a)->gen;
-	}
-	bch2_trans_iter_exit(&trans, &iter);
-
-	bch2_trans_exit(&trans);
-
-	if (ret)
-		bch_err_fn(c, ret);
-
-	return ret;
-}
-
 static struct bpos alloc_gens_pos(struct bpos pos, unsigned *offset)
 {
 	*offset = pos.offset & KEY_TYPE_BUCKET_GENS_MASK;
@@ -692,45 +658,67 @@ int bch2_bucket_gens_init(struct bch_fs *c)
 	return ret;
 }
 
-int bch2_bucket_gens_read(struct bch_fs *c)
+int bch2_alloc_read(struct bch_fs *c)
 {
 	struct btree_trans trans;
 	struct btree_iter iter;
 	struct bkey_s_c k;
-	const struct bch_bucket_gens *g;
 	struct bch_dev *ca;
-	u64 b;
 	int ret;
 
+	down_read(&c->gc_lock);
 	bch2_trans_init(&trans, c, 0, 0);
 
-	for_each_btree_key(&trans, iter, BTREE_ID_bucket_gens, POS_MIN,
-			   BTREE_ITER_PREFETCH, k, ret) {
-		u64 start = bucket_gens_pos_to_alloc(k.k->p, 0).offset;
-		u64 end = bucket_gens_pos_to_alloc(bpos_nosnap_successor(k.k->p), 0).offset;
+	if (c->sb.version_upgrade_complete >= bcachefs_metadata_version_bucket_gens) {
+		const struct bch_bucket_gens *g;
+		u64 b;
 
-		if (k.k->type != KEY_TYPE_bucket_gens)
-			continue;
+		for_each_btree_key(&trans, iter, BTREE_ID_bucket_gens, POS_MIN,
+				   BTREE_ITER_PREFETCH, k, ret) {
+			u64 start = bucket_gens_pos_to_alloc(k.k->p, 0).offset;
+			u64 end = bucket_gens_pos_to_alloc(bpos_nosnap_successor(k.k->p), 0).offset;
 
-		g = bkey_s_c_to_bucket_gens(k).v;
+			if (k.k->type != KEY_TYPE_bucket_gens)
+				continue;
 
-		/*
-		 * Not a fsck error because this is checked/repaired by
-		 * bch2_check_alloc_key() which runs later:
-		 */
-		if (!bch2_dev_exists2(c, k.k->p.inode))
-			continue;
+			g = bkey_s_c_to_bucket_gens(k).v;
+
+			/*
+			 * Not a fsck error because this is checked/repaired by
+			 * bch2_check_alloc_key() which runs later:
+			 */
+			if (!bch2_dev_exists2(c, k.k->p.inode))
+				continue;
 
-		ca = bch_dev_bkey_exists(c, k.k->p.inode);
+			ca = bch_dev_bkey_exists(c, k.k->p.inode);
+
+			for (b = max_t(u64, ca->mi.first_bucket, start);
+			     b < min_t(u64, ca->mi.nbuckets, end);
+			     b++)
+				*bucket_gen(ca, b) = g->gens[b & KEY_TYPE_BUCKET_GENS_MASK];
+		}
+		bch2_trans_iter_exit(&trans, &iter);
+	} else {
+		struct bch_alloc_v4 a;
 
-		for (b = max_t(u64, ca->mi.first_bucket, start);
-		     b < min_t(u64, ca->mi.nbuckets, end);
-		     b++)
-			*bucket_gen(ca, b) = g->gens[b & KEY_TYPE_BUCKET_GENS_MASK];
+		for_each_btree_key(&trans, iter, BTREE_ID_alloc, POS_MIN,
+				   BTREE_ITER_PREFETCH, k, ret) {
+			/*
+			 * Not a fsck error because this is checked/repaired by
+			 * bch2_check_alloc_key() which runs later:
+			 */
+			if (!bch2_dev_bucket_exists(c, k.k->p))
+				continue;
+
+			ca = bch_dev_bkey_exists(c, k.k->p.inode);
+
+			*bucket_gen(ca, k.k->p.offset) = bch2_alloc_to_v4(k, &a)->gen;
+		}
+		bch2_trans_iter_exit(&trans, &iter);
 	}
-	bch2_trans_iter_exit(&trans, &iter);
 
 	bch2_trans_exit(&trans);
+	up_read(&c->gc_lock);
 
 	if (ret)
 		bch_err_fn(c, ret);
diff --git a/fs/bcachefs/alloc_background.h b/fs/bcachefs/alloc_background.h
index d1bf45a4b4..c0914feb54 100644
--- a/fs/bcachefs/alloc_background.h
+++ b/fs/bcachefs/alloc_background.h
@@ -212,7 +212,6 @@ static inline bool bkey_is_alloc(const struct bkey *k)
 }
 
 int bch2_alloc_read(struct bch_fs *);
-int bch2_bucket_gens_read(struct bch_fs *);
 
 int bch2_trans_mark_alloc(struct btree_trans *, enum btree_id, unsigned,
 			  struct bkey_s_c, struct bkey_i *, unsigned);
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index c90205aa22..4c61a28e49 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -1309,11 +1309,7 @@ int bch2_fs_recovery(struct bch_fs *c)
 		goto err;
 
 	bch_verbose(c, "starting alloc read");
-	down_read(&c->gc_lock);
-	ret = c->sb.version < bcachefs_metadata_version_bucket_gens
-		? bch2_alloc_read(c)
-		: bch2_bucket_gens_read(c);
-	up_read(&c->gc_lock);
+	ret = bch2_alloc_read(c);
 	if (ret)
 		goto err;
 	bch_verbose(c, "alloc read done");
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 08/10] bcachefs: Stash journal replay params in bch_fs
  2023-07-09 17:15 [PATCH 00/10] bcachefs - semvar, forward compatibility Kent Overstreet
                   ` (6 preceding siblings ...)
  2023-07-09 17:15 ` [PATCH 07/10] bcachefs: Kill bch2_bucket_gens_read() Kent Overstreet
@ 2023-07-09 17:15 ` Kent Overstreet
  2023-07-09 17:15 ` [PATCH 09/10] bcachefs: Enumerate recovery passes Kent Overstreet
  2023-07-09 17:15 ` [PATCH 10/10] bcachefs: bcachefs_metadata_version_major_minor Kent Overstreet
  9 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 17:15 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet, bfoster, sandeen

For the upcoming enumeration of recovery passes, we need all recovery
passes to be called the same way - including journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/bcachefs.h |  3 +++
 fs/bcachefs/recovery.c | 11 ++++++++---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index 1ab32b61f0..67ed55761a 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -993,6 +993,9 @@ struct bch_fs {
 	/* QUOTAS */
 	struct bch_memquota_type quotas[QTYP_NR];
 
+	/* RECOVERY */
+	u64			journal_replay_seq_start;
+	u64			journal_replay_seq_end;
 	/* DEBUG JUNK */
 	struct dentry		*fs_debug_dir;
 	struct dentry		*btree_debug_dir;
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index 4c61a28e49..1499efc9d2 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -624,11 +624,13 @@ static int journal_sort_seq_cmp(const void *_l, const void *_r)
 	return cmp_int(l->journal_seq, r->journal_seq);
 }
 
-static int bch2_journal_replay(struct bch_fs *c, u64 start_seq, u64 end_seq)
+static int bch2_journal_replay(struct bch_fs *c)
 {
 	struct journal_keys *keys = &c->journal_keys;
 	struct journal_key **keys_sorted, *k;
 	struct journal *j = &c->journal;
+	u64 start_seq	= c->journal_replay_seq_start;
+	u64 end_seq	= c->journal_replay_seq_start;
 	size_t i;
 	int ret;
 
@@ -1256,6 +1258,9 @@ int bch2_fs_recovery(struct bch_fs *c)
 		blacklist_seq = journal_seq = le64_to_cpu(clean->journal_seq) + 1;
 	}
 
+	c->journal_replay_seq_start	= last_seq;
+	c->journal_replay_seq_end	= blacklist_seq - 1;;
+
 	if (c->opts.reconstruct_alloc) {
 		c->sb.compat &= ~(1ULL << BCH_COMPAT_alloc_info);
 		drop_alloc_keys(&c->journal_keys);
@@ -1346,7 +1351,7 @@ int bch2_fs_recovery(struct bch_fs *c)
 		set_bit(BCH_FS_MAY_GO_RW, &c->flags);
 
 		bch_info(c, "starting journal replay, %zu keys", c->journal_keys.nr);
-		ret = bch2_journal_replay(c, last_seq, blacklist_seq - 1);
+		ret = bch2_journal_replay(c);
 		if (ret)
 			goto err;
 		if (c->opts.verbose || !c->sb.clean)
@@ -1406,7 +1411,7 @@ int bch2_fs_recovery(struct bch_fs *c)
 		set_bit(BCH_FS_MAY_GO_RW, &c->flags);
 
 		bch_verbose(c, "starting journal replay, %zu keys", c->journal_keys.nr);
-		ret = bch2_journal_replay(c, last_seq, blacklist_seq - 1);
+		ret = bch2_journal_replay(c);
 		if (ret)
 			goto err;
 		if (c->opts.verbose || !c->sb.clean)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 09/10] bcachefs: Enumerate recovery passes
  2023-07-09 17:15 [PATCH 00/10] bcachefs - semvar, forward compatibility Kent Overstreet
                   ` (7 preceding siblings ...)
  2023-07-09 17:15 ` [PATCH 08/10] bcachefs: Stash journal replay params in bch_fs Kent Overstreet
@ 2023-07-09 17:15 ` Kent Overstreet
  2023-07-09 17:15 ` [PATCH 10/10] bcachefs: bcachefs_metadata_version_major_minor Kent Overstreet
  9 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 17:15 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet, bfoster, sandeen

Recovery and fsck have many different passes/jobs to do, which always
run in the same order - but not all of them run all the time. Some are
for fsck, some for unclean shutdown, some for version upgrades.

This adds some new structure: a defined list of recovery passes that we
can run in a loop, as well as consolidating the log messages.

The main benefit is consolidating the "should run this recovery pass"
logic, as well as cleaning up the "this recovery pass has finished"
state; instead of having a bunch of ad-hoc state bits in c->flags, we've
now got c->curr_recovery_pass.

By consolidating the "should run this recovery pass" logic, in the
future on disk format upgrades will be able to say "upgrading to this
version requires x passes to run", instead of forcing all of fsck to
run.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/alloc_background.c |  12 +-
 fs/bcachefs/alloc_foreground.c |   9 +-
 fs/bcachefs/backpointers.c     |   6 +-
 fs/bcachefs/bcachefs.h         |  49 ++++++-
 fs/bcachefs/btree_cache.c      |   2 +-
 fs/bcachefs/btree_gc.c         |   4 +-
 fs/bcachefs/fsck.c             |  77 ++---------
 fs/bcachefs/fsck.h             |  10 +-
 fs/bcachefs/recovery.c         | 227 +++++++++++++--------------------
 fs/bcachefs/subvolume.c        |  10 +-
 fs/bcachefs/subvolume.h        |   8 +-
 11 files changed, 175 insertions(+), 239 deletions(-)

diff --git a/fs/bcachefs/alloc_background.c b/fs/bcachefs/alloc_background.c
index 291d97f837..8d8481fc1e 100644
--- a/fs/bcachefs/alloc_background.c
+++ b/fs/bcachefs/alloc_background.c
@@ -286,7 +286,7 @@ int bch2_alloc_v4_invalid(const struct bch_fs *c, struct bkey_s_c k,
 
 	if (rw == WRITE &&
 	    !(flags & BKEY_INVALID_JOURNAL) &&
-	    test_bit(BCH_FS_CHECK_BACKPOINTERS_DONE, &c->flags)) {
+	    c->curr_recovery_pass > BCH_RECOVERY_PASS_check_btree_backpointers) {
 		unsigned i, bp_len = 0;
 
 		for (i = 0; i < BCH_ALLOC_V4_NR_BACKPOINTERS(a.v); i++)
@@ -336,7 +336,7 @@ int bch2_alloc_v4_invalid(const struct bch_fs *c, struct bkey_s_c k,
 			}
 
 			if (!a.v->io_time[READ] &&
-			    test_bit(BCH_FS_CHECK_ALLOC_TO_LRU_REFS_DONE, &c->flags)) {
+			    c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_to_lru_refs) {
 				prt_printf(err, "cached bucket with read_time == 0");
 				return -BCH_ERR_invalid_bkey;
 			}
@@ -777,7 +777,7 @@ static int bch2_bucket_do_index(struct btree_trans *trans,
 		return ret;
 
 	if (ca->mi.freespace_initialized &&
-	    test_bit(BCH_FS_CHECK_ALLOC_DONE, &c->flags) &&
+	    c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_info &&
 	    bch2_trans_inconsistent_on(old.k->type != old_type, trans,
 			"incorrect key when %s %s:%llu:%llu:0 (got %s should be %s)\n"
 			"  for %s",
@@ -1663,7 +1663,7 @@ static int bch2_discard_one_bucket(struct btree_trans *trans,
 	}
 
 	if (a->v.journal_seq > c->journal.flushed_seq_ondisk) {
-		if (test_bit(BCH_FS_CHECK_ALLOC_DONE, &c->flags)) {
+		if (c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_info) {
 			bch2_trans_inconsistent(trans,
 				"clearing need_discard but journal_seq %llu > flushed_seq %llu\n"
 				"%s",
@@ -1676,7 +1676,7 @@ static int bch2_discard_one_bucket(struct btree_trans *trans,
 	}
 
 	if (a->v.data_type != BCH_DATA_need_discard) {
-		if (test_bit(BCH_FS_CHECK_ALLOC_DONE, &c->flags)) {
+		if (c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_info) {
 			bch2_trans_inconsistent(trans,
 				"bucket incorrectly set in need_discard btree\n"
 				"%s",
@@ -1844,7 +1844,7 @@ static int invalidate_one_bucket(struct btree_trans *trans,
 		bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&a->k_i));
 
 	bch_err(c, "%s", buf.buf);
-	if (test_bit(BCH_FS_CHECK_LRUS_DONE, &c->flags)) {
+	if (c->curr_recovery_pass > BCH_RECOVERY_PASS_check_lrus) {
 		bch2_inconsistent_error(c);
 		ret = -EINVAL;
 	}
diff --git a/fs/bcachefs/alloc_foreground.c b/fs/bcachefs/alloc_foreground.c
index 0cc5e9f8d4..06bfcc5a49 100644
--- a/fs/bcachefs/alloc_foreground.c
+++ b/fs/bcachefs/alloc_foreground.c
@@ -324,7 +324,7 @@ static struct open_bucket *try_alloc_bucket(struct btree_trans *trans, struct bc
 	a = bch2_alloc_to_v4(k, &a_convert);
 
 	if (a->data_type != BCH_DATA_free) {
-		if (!test_bit(BCH_FS_CHECK_ALLOC_DONE, &c->flags)) {
+		if (c->curr_recovery_pass <= BCH_RECOVERY_PASS_check_alloc_info) {
 			ob = NULL;
 			goto err;
 		}
@@ -340,7 +340,7 @@ static struct open_bucket *try_alloc_bucket(struct btree_trans *trans, struct bc
 	}
 
 	if (genbits != (alloc_freespace_genbits(*a) >> 56) &&
-	    test_bit(BCH_FS_CHECK_ALLOC_DONE, &c->flags)) {
+	    c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_info) {
 		prt_printf(&buf, "bucket in freespace btree with wrong genbits (got %u should be %llu)\n"
 		       "  freespace key ",
 		       genbits, alloc_freespace_genbits(*a) >> 56);
@@ -350,10 +350,9 @@ static struct open_bucket *try_alloc_bucket(struct btree_trans *trans, struct bc
 		bch2_trans_inconsistent(trans, "%s", buf.buf);
 		ob = ERR_PTR(-EIO);
 		goto err;
-
 	}
 
-	if (!test_bit(BCH_FS_CHECK_BACKPOINTERS_DONE, &c->flags)) {
+	if (c->curr_recovery_pass <= BCH_RECOVERY_PASS_check_extents_to_backpointers) {
 		struct bch_backpointer bp;
 		struct bpos bp_pos = POS_MIN;
 
@@ -556,7 +555,7 @@ static struct open_bucket *bch2_bucket_alloc_trans(struct btree_trans *trans,
 	if (s.skipped_need_journal_commit * 2 > avail)
 		bch2_journal_flush_async(&c->journal, NULL);
 
-	if (!ob && freespace && !test_bit(BCH_FS_CHECK_ALLOC_DONE, &c->flags)) {
+	if (!ob && freespace && c->curr_recovery_pass <= BCH_RECOVERY_PASS_check_alloc_info) {
 		freespace = false;
 		goto alloc;
 	}
diff --git a/fs/bcachefs/backpointers.c b/fs/bcachefs/backpointers.c
index 571a7d19be..d412bae553 100644
--- a/fs/bcachefs/backpointers.c
+++ b/fs/bcachefs/backpointers.c
@@ -104,7 +104,7 @@ static noinline int backpointer_mod_err(struct btree_trans *trans,
 		bch2_bkey_val_to_text(&buf, c, orig_k);
 
 		bch_err(c, "%s", buf.buf);
-	} else if (test_bit(BCH_FS_CHECK_BACKPOINTERS_DONE, &c->flags)) {
+	} else if (c->curr_recovery_pass > BCH_RECOVERY_PASS_check_extents_to_backpointers) {
 		prt_printf(&buf, "backpointer not found when deleting");
 		prt_newline(&buf);
 		printbuf_indent_add(&buf, 2);
@@ -125,7 +125,7 @@ static noinline int backpointer_mod_err(struct btree_trans *trans,
 
 	printbuf_exit(&buf);
 
-	if (test_bit(BCH_FS_CHECK_BACKPOINTERS_DONE, &c->flags)) {
+	if (c->curr_recovery_pass > BCH_RECOVERY_PASS_check_extents_to_backpointers) {
 		bch2_inconsistent_error(c);
 		return -EIO;
 	} else {
@@ -258,7 +258,7 @@ static void backpointer_not_found(struct btree_trans *trans,
 	bch2_backpointer_to_text(&buf, &bp);
 	prt_printf(&buf, "\n  ");
 	bch2_bkey_val_to_text(&buf, c, k);
-	if (!test_bit(BCH_FS_CHECK_BACKPOINTERS_DONE, &c->flags))
+	if (c->curr_recovery_pass >= BCH_RECOVERY_PASS_check_extents_to_backpointers)
 		bch_err_ratelimited(c, "%s", buf.buf);
 	else
 		bch2_trans_inconsistent(trans, "%s", buf.buf);
diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index 67ed55761a..cfd4a7b9e8 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -564,11 +564,6 @@ enum {
 
 	/* fsck passes: */
 	BCH_FS_TOPOLOGY_REPAIR_DONE,
-	BCH_FS_INITIAL_GC_DONE,		/* kill when we enumerate fsck passes */
-	BCH_FS_CHECK_ALLOC_DONE,
-	BCH_FS_CHECK_LRUS_DONE,
-	BCH_FS_CHECK_BACKPOINTERS_DONE,
-	BCH_FS_CHECK_ALLOC_TO_LRU_REFS_DONE,
 	BCH_FS_FSCK_DONE,
 	BCH_FS_INITIAL_GC_UNFIXED,	/* kill when we enumerate fsck errors */
 	BCH_FS_NEED_ANOTHER_GC,
@@ -662,6 +657,48 @@ enum bch_write_ref {
 	BCH_WRITE_REF_NR,
 };
 
+#define PASS_SILENT		BIT(0)
+#define PASS_FSCK		BIT(1)
+#define PASS_UNCLEAN		BIT(2)
+#define PASS_ALWAYS		BIT(3)
+#define PASS_UPGRADE(v)		((v) << 4)
+
+#define BCH_RECOVERY_PASSES()									\
+	x(alloc_read,			PASS_ALWAYS)						\
+	x(stripes_read,			PASS_ALWAYS)						\
+	x(initialize_subvolumes,	PASS_UPGRADE(bcachefs_metadata_version_snapshot_2))	\
+	x(snapshots_read,		PASS_ALWAYS)						\
+	x(check_allocations,		PASS_FSCK)						\
+	x(set_may_go_rw,		PASS_ALWAYS|PASS_SILENT)				\
+	x(journal_replay,		PASS_ALWAYS)						\
+	x(check_alloc_info,		PASS_FSCK)						\
+	x(check_lrus,			PASS_FSCK)						\
+	x(check_btree_backpointers,	PASS_FSCK)						\
+	x(check_backpointers_to_extents,PASS_FSCK)						\
+	x(check_extents_to_backpointers,PASS_FSCK)						\
+	x(check_alloc_to_lru_refs,	PASS_FSCK)						\
+	x(fs_freespace_init,		PASS_ALWAYS|PASS_SILENT)				\
+	x(bucket_gens_init,		PASS_UPGRADE(bcachefs_metadata_version_bucket_gens))	\
+	x(fs_upgrade_for_subvolumes,	PASS_UPGRADE(bcachefs_metadata_version_snapshot_2))	\
+	x(check_snapshot_trees,		PASS_FSCK)						\
+	x(check_snapshots,		PASS_FSCK)						\
+	x(check_subvols,		PASS_FSCK)						\
+	x(delete_dead_snapshots,	PASS_FSCK|PASS_UNCLEAN|PASS_SILENT)			\
+	x(check_inodes,			PASS_FSCK|PASS_UNCLEAN)					\
+	x(check_extents,		PASS_FSCK)						\
+	x(check_dirents,		PASS_FSCK)						\
+	x(check_xattrs,			PASS_FSCK)						\
+	x(check_root,			PASS_FSCK)						\
+	x(check_directory_structure,	PASS_FSCK)						\
+	x(check_nlinks,			PASS_FSCK)						\
+	x(fix_reflink_p,		PASS_UPGRADE(bcachefs_metadata_version_reflink_p_fix))	\
+
+enum bch_recovery_pass {
+#define x(n, when)	BCH_RECOVERY_PASS_##n,
+	BCH_RECOVERY_PASSES()
+#undef x
+};
+
 struct bch_fs {
 	struct closure		cl;
 
@@ -996,6 +1033,8 @@ struct bch_fs {
 	/* RECOVERY */
 	u64			journal_replay_seq_start;
 	u64			journal_replay_seq_end;
+	enum bch_recovery_pass	curr_recovery_pass;
+
 	/* DEBUG JUNK */
 	struct dentry		*fs_debug_dir;
 	struct dentry		*btree_debug_dir;
diff --git a/fs/bcachefs/btree_cache.c b/fs/bcachefs/btree_cache.c
index 721e2b43e2..13c88d9533 100644
--- a/fs/bcachefs/btree_cache.c
+++ b/fs/bcachefs/btree_cache.c
@@ -825,7 +825,7 @@ static noinline void btree_bad_header(struct bch_fs *c, struct btree *b)
 {
 	struct printbuf buf = PRINTBUF;
 
-	if (!test_bit(BCH_FS_INITIAL_GC_DONE, &c->flags))
+	if (c->curr_recovery_pass <= BCH_RECOVERY_PASS_check_allocations)
 		return;
 
 	prt_printf(&buf,
diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c
index 1fc386761d..c47d5d8caa 100644
--- a/fs/bcachefs/btree_gc.c
+++ b/fs/bcachefs/btree_gc.c
@@ -1810,7 +1810,7 @@ int bch2_gc(struct bch_fs *c, bool initial, bool metadata_only)
 
 	if (IS_ENABLED(CONFIG_BCACHEFS_DEBUG) ||
 	    (BCH_SB_HAS_TOPOLOGY_ERRORS(c->disk_sb.sb) &&
-	     !test_bit(BCH_FS_INITIAL_GC_DONE, &c->flags) &&
+	     c->curr_recovery_pass <= BCH_RECOVERY_PASS_check_allocations &&
 	     c->opts.fix_errors != FSCK_OPT_NO)) {
 		bch_info(c, "Starting topology repair pass");
 		ret = bch2_repair_topology(c);
@@ -1825,7 +1825,7 @@ int bch2_gc(struct bch_fs *c, bool initial, bool metadata_only)
 
 	if (ret == -BCH_ERR_need_topology_repair &&
 	    !test_bit(BCH_FS_TOPOLOGY_REPAIR_DONE, &c->flags) &&
-	    !test_bit(BCH_FS_INITIAL_GC_DONE, &c->flags)) {
+	    c->curr_recovery_pass <= BCH_RECOVERY_PASS_check_allocations) {
 		set_bit(BCH_FS_NEED_ANOTHER_GC, &c->flags);
 		SET_BCH_SB_HAS_TOPOLOGY_ERRORS(c->disk_sb.sb, true);
 		ret = 0;
diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c
index 98fde0bf6e..ddc2782fc5 100644
--- a/fs/bcachefs/fsck.c
+++ b/fs/bcachefs/fsck.c
@@ -350,7 +350,7 @@ static int lookup_lostfound(struct btree_trans *trans, u32 subvol,
 	}
 
 	/*
-	 * The check_dirents pass has already run, dangling dirents
+	 * The bch2_check_dirents pass has already run, dangling dirents
 	 * shouldn't exist here:
 	 */
 	return __lookup_inode(trans, inum, lostfound, &snapshot);
@@ -1008,8 +1008,9 @@ static int check_inode(struct btree_trans *trans,
 }
 
 noinline_for_stack
-static int check_inodes(struct bch_fs *c, bool full)
+int bch2_check_inodes(struct bch_fs *c)
 {
+	bool full = c->opts.fsck;
 	struct btree_trans trans;
 	struct btree_iter iter;
 	struct bch_inode_unpacked prev = { 0 };
@@ -1404,8 +1405,7 @@ static int check_extent(struct btree_trans *trans, struct btree_iter *iter,
  * Walk extents: verify that extents have a corresponding S_ISREG inode, and
  * that i_size an i_sectors are consistent
  */
-noinline_for_stack
-static int check_extents(struct bch_fs *c)
+int bch2_check_extents(struct bch_fs *c)
 {
 	struct inode_walker w = inode_walker_init();
 	struct snapshots_seen s;
@@ -1419,8 +1419,6 @@ static int check_extents(struct bch_fs *c)
 	snapshots_seen_init(&s);
 	bch2_trans_init(&trans, c, BTREE_ITER_MAX, 0);
 
-	bch_verbose(c, "checking extents");
-
 	ret = for_each_btree_key_commit(&trans, iter, BTREE_ID_extents,
 			POS(BCACHEFS_ROOT_INO, 0),
 			BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k,
@@ -1772,8 +1770,7 @@ static int check_dirent(struct btree_trans *trans, struct btree_iter *iter,
  * Walk dirents: verify that they all have a corresponding S_ISDIR inode,
  * validate d_type
  */
-noinline_for_stack
-static int check_dirents(struct bch_fs *c)
+int bch2_check_dirents(struct bch_fs *c)
 {
 	struct inode_walker dir = inode_walker_init();
 	struct inode_walker target = inode_walker_init();
@@ -1784,8 +1781,6 @@ static int check_dirents(struct bch_fs *c)
 	struct bkey_s_c k;
 	int ret = 0;
 
-	bch_verbose(c, "checking dirents");
-
 	snapshots_seen_init(&s);
 	bch2_trans_init(&trans, c, BTREE_ITER_MAX, 0);
 
@@ -1847,8 +1842,7 @@ static int check_xattr(struct btree_trans *trans, struct btree_iter *iter,
 /*
  * Walk xattrs: verify that they all have a corresponding inode
  */
-noinline_for_stack
-static int check_xattrs(struct bch_fs *c)
+int bch2_check_xattrs(struct bch_fs *c)
 {
 	struct inode_walker inode = inode_walker_init();
 	struct bch_hash_info hash_info;
@@ -1857,8 +1851,6 @@ static int check_xattrs(struct bch_fs *c)
 	struct bkey_s_c k;
 	int ret = 0;
 
-	bch_verbose(c, "checking xattrs");
-
 	bch2_trans_init(&trans, c, BTREE_ITER_MAX, 0);
 
 	ret = for_each_btree_key_commit(&trans, iter, BTREE_ID_xattrs,
@@ -1932,13 +1924,10 @@ static int check_root_trans(struct btree_trans *trans)
 }
 
 /* Get root directory, create if it doesn't exist: */
-noinline_for_stack
-static int check_root(struct bch_fs *c)
+int bch2_check_root(struct bch_fs *c)
 {
 	int ret;
 
-	bch_verbose(c, "checking root directory");
-
 	ret = bch2_trans_do(c, NULL, NULL,
 			     BTREE_INSERT_NOFAIL|
 			     BTREE_INSERT_LAZY_RW,
@@ -2089,11 +2078,10 @@ static int check_path(struct btree_trans *trans,
 
 /*
  * Check for unreachable inodes, as well as loops in the directory structure:
- * After check_dirents(), if an inode backpointer doesn't exist that means it's
+ * After bch2_check_dirents(), if an inode backpointer doesn't exist that means it's
  * unreachable:
  */
-noinline_for_stack
-static int check_directory_structure(struct bch_fs *c)
+int bch2_check_directory_structure(struct bch_fs *c)
 {
 	struct btree_trans trans;
 	struct btree_iter iter;
@@ -2376,15 +2364,12 @@ static int check_nlinks_update_hardlinks(struct bch_fs *c,
 	return 0;
 }
 
-noinline_for_stack
-static int check_nlinks(struct bch_fs *c)
+int bch2_check_nlinks(struct bch_fs *c)
 {
 	struct nlink_table links = { 0 };
 	u64 this_iter_range_start, next_iter_range_start = 0;
 	int ret = 0;
 
-	bch_verbose(c, "checking inode nlinks");
-
 	do {
 		this_iter_range_start = next_iter_range_start;
 		next_iter_range_start = U64_MAX;
@@ -2442,8 +2427,7 @@ static int fix_reflink_p_key(struct btree_trans *trans, struct btree_iter *iter,
 	return bch2_trans_update(trans, iter, &u->k_i, BTREE_TRIGGER_NORUN);
 }
 
-noinline_for_stack
-static int fix_reflink_p(struct bch_fs *c)
+int bch2_fix_reflink_p(struct bch_fs *c)
 {
 	struct btree_iter iter;
 	struct bkey_s_c k;
@@ -2452,8 +2436,6 @@ static int fix_reflink_p(struct bch_fs *c)
 	if (c->sb.version >= bcachefs_metadata_version_reflink_p_fix)
 		return 0;
 
-	bch_verbose(c, "fixing reflink_p keys");
-
 	ret = bch2_trans_run(c,
 		for_each_btree_key_commit(&trans, iter,
 				BTREE_ID_extents, POS_MIN,
@@ -2466,40 +2448,3 @@ static int fix_reflink_p(struct bch_fs *c)
 		bch_err_fn(c, ret);
 	return ret;
 }
-
-/*
- * Checks for inconsistencies that shouldn't happen, unless we have a bug.
- * Doesn't fix them yet, mainly because they haven't yet been observed:
- */
-int bch2_fsck_full(struct bch_fs *c)
-{
-	int ret;
-again:
-	ret =   bch2_fs_check_snapshot_trees(c);
-		bch2_fs_check_snapshots(c) ?:
-		bch2_fs_check_subvols(c) ?:
-		bch2_delete_dead_snapshots(c) ?:
-		check_inodes(c, true) ?:
-		check_extents(c) ?:
-		check_dirents(c) ?:
-		check_xattrs(c) ?:
-		check_root(c) ?:
-		check_directory_structure(c) ?:
-		check_nlinks(c) ?:
-		fix_reflink_p(c);
-
-	if (bch2_err_matches(ret, BCH_ERR_need_snapshot_cleanup)) {
-		set_bit(BCH_FS_HAVE_DELETED_SNAPSHOTS, &c->flags);
-		goto again;
-	}
-
-	return ret;
-}
-
-int bch2_fsck_walk_inodes_only(struct bch_fs *c)
-{
-	return  bch2_fs_check_snapshots(c) ?:
-		bch2_fs_check_subvols(c) ?:
-		bch2_delete_dead_snapshots(c) ?:
-		check_inodes(c, false);
-}
diff --git a/fs/bcachefs/fsck.h b/fs/bcachefs/fsck.h
index 264f2706b1..90c87b5089 100644
--- a/fs/bcachefs/fsck.h
+++ b/fs/bcachefs/fsck.h
@@ -2,7 +2,13 @@
 #ifndef _BCACHEFS_FSCK_H
 #define _BCACHEFS_FSCK_H
 
-int bch2_fsck_full(struct bch_fs *);
-int bch2_fsck_walk_inodes_only(struct bch_fs *);
+int bch2_check_inodes(struct bch_fs *);
+int bch2_check_extents(struct bch_fs *);
+int bch2_check_dirents(struct bch_fs *);
+int bch2_check_xattrs(struct bch_fs *);
+int bch2_check_root(struct bch_fs *);
+int bch2_check_directory_structure(struct bch_fs *);
+int bch2_check_nlinks(struct bch_fs *);
+int bch2_fix_reflink_p(struct bch_fs *);
 
 #endif /* _BCACHEFS_FSCK_H */
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index 1499efc9d2..3b9120bd36 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -1028,7 +1028,7 @@ static int read_btree_roots(struct bch_fs *c)
 	return ret;
 }
 
-static int bch2_fs_initialize_subvolumes(struct bch_fs *c)
+static int bch2_initialize_subvolumes(struct bch_fs *c)
 {
 	struct bkey_i_snapshot_tree	root_tree;
 	struct bkey_i_snapshot		root_snapshot;
@@ -1139,6 +1139,88 @@ static void check_version_upgrade(struct bch_fs *c)
 	}
 }
 
+static int bch2_check_allocations(struct bch_fs *c)
+{
+	return bch2_gc(c, true, c->opts.norecovery);
+}
+
+static int bch2_set_may_go_rw(struct bch_fs *c)
+{
+	set_bit(BCH_FS_MAY_GO_RW, &c->flags);
+	return 0;
+}
+
+struct recovery_pass_fn {
+	int		(*fn)(struct bch_fs *);
+	const char	*name;
+	unsigned	when;
+};
+
+static struct recovery_pass_fn recovery_passes[] = {
+#define x(_fn, _when)	{ .fn = bch2_##_fn, .name = #_fn, .when = _when },
+	BCH_RECOVERY_PASSES()
+#undef x
+};
+
+static bool should_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
+{
+	struct recovery_pass_fn *p = recovery_passes + c->curr_recovery_pass;
+
+	if (c->opts.norecovery && pass > BCH_RECOVERY_PASS_snapshots_read)
+		return false;
+	if ((p->when & PASS_FSCK) && c->opts.fsck)
+		return true;
+	if ((p->when & PASS_UNCLEAN) && !c->sb.clean)
+		return true;
+	if (p->when & PASS_ALWAYS)
+		return true;
+	if (p->when >= PASS_UPGRADE(0) &&
+	    bch2_version_upgrading_to(c, p->when >> 4))
+		return true;
+	return false;
+}
+
+static int bch2_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
+{
+	int ret;
+
+	c->curr_recovery_pass = pass;
+
+	if (should_run_recovery_pass(c, pass)) {
+		struct recovery_pass_fn *p = recovery_passes + pass;
+
+		if (!(p->when & PASS_SILENT))
+			printk(KERN_INFO bch2_log_msg(c, "%s..."), p->name);
+		ret = p->fn(c);
+		if (ret)
+			return ret;
+		if (!(p->when & PASS_SILENT))
+			printk(KERN_CONT " done\n");
+	}
+
+	return 0;
+}
+
+static int bch2_run_recovery_passes(struct bch_fs *c)
+{
+	int ret = 0;
+again:
+	while (c->curr_recovery_pass < ARRAY_SIZE(recovery_passes)) {
+		ret = bch2_run_recovery_pass(c, c->curr_recovery_pass);
+		if (ret)
+			break;
+		c->curr_recovery_pass++;
+	}
+
+	if (bch2_err_matches(ret, BCH_ERR_need_snapshot_cleanup)) {
+		set_bit(BCH_FS_HAVE_DELETED_SNAPSHOTS, &c->flags);
+		c->curr_recovery_pass = BCH_RECOVERY_PASS_delete_dead_snapshots;
+		goto again;
+	}
+
+	return ret;
+}
+
 int bch2_fs_recovery(struct bch_fs *c)
 {
 	struct bch_sb_field_clean *clean = NULL;
@@ -1313,141 +1395,9 @@ int bch2_fs_recovery(struct bch_fs *c)
 	if (ret)
 		goto err;
 
-	bch_verbose(c, "starting alloc read");
-	ret = bch2_alloc_read(c);
-	if (ret)
-		goto err;
-	bch_verbose(c, "alloc read done");
-
-	bch_verbose(c, "starting stripes_read");
-	ret = bch2_stripes_read(c);
+	ret = bch2_run_recovery_passes(c);
 	if (ret)
 		goto err;
-	bch_verbose(c, "stripes_read done");
-
-	if (c->sb.version < bcachefs_metadata_version_snapshot_2) {
-		ret = bch2_fs_initialize_subvolumes(c);
-		if (ret)
-			goto err;
-	}
-
-	bch_verbose(c, "reading snapshots table");
-	ret = bch2_fs_snapshots_start(c);
-	if (ret)
-		goto err;
-	bch_verbose(c, "reading snapshots done");
-
-	if (c->opts.fsck) {
-		bool metadata_only = c->opts.norecovery;
-
-		bch_info(c, "checking allocations");
-		ret = bch2_gc(c, true, metadata_only);
-		if (ret)
-			goto err;
-		bch_verbose(c, "done checking allocations");
-
-		set_bit(BCH_FS_INITIAL_GC_DONE, &c->flags);
-
-		set_bit(BCH_FS_MAY_GO_RW, &c->flags);
-
-		bch_info(c, "starting journal replay, %zu keys", c->journal_keys.nr);
-		ret = bch2_journal_replay(c);
-		if (ret)
-			goto err;
-		if (c->opts.verbose || !c->sb.clean)
-			bch_info(c, "journal replay done");
-
-		bch_info(c, "checking need_discard and freespace btrees");
-		ret = bch2_check_alloc_info(c);
-		if (ret)
-			goto err;
-		bch_verbose(c, "done checking need_discard and freespace btrees");
-
-		set_bit(BCH_FS_CHECK_ALLOC_DONE, &c->flags);
-
-		bch_info(c, "checking lrus");
-		ret = bch2_check_lrus(c);
-		if (ret)
-			goto err;
-		bch_verbose(c, "done checking lrus");
-		set_bit(BCH_FS_CHECK_LRUS_DONE, &c->flags);
-
-		bch_info(c, "checking backpointers to alloc keys");
-		ret = bch2_check_btree_backpointers(c);
-		if (ret)
-			goto err;
-		bch_verbose(c, "done checking backpointers to alloc keys");
-
-		bch_info(c, "checking backpointers to extents");
-		ret = bch2_check_backpointers_to_extents(c);
-		if (ret)
-			goto err;
-		bch_verbose(c, "done checking backpointers to extents");
-
-		bch_info(c, "checking extents to backpointers");
-		ret = bch2_check_extents_to_backpointers(c);
-		if (ret)
-			goto err;
-		bch_verbose(c, "done checking extents to backpointers");
-		set_bit(BCH_FS_CHECK_BACKPOINTERS_DONE, &c->flags);
-
-		bch_info(c, "checking alloc to lru refs");
-		ret = bch2_check_alloc_to_lru_refs(c);
-		if (ret)
-			goto err;
-		bch_verbose(c, "done checking alloc to lru refs");
-		set_bit(BCH_FS_CHECK_ALLOC_TO_LRU_REFS_DONE, &c->flags);
-	} else {
-		set_bit(BCH_FS_INITIAL_GC_DONE, &c->flags);
-		set_bit(BCH_FS_CHECK_ALLOC_DONE, &c->flags);
-		set_bit(BCH_FS_CHECK_LRUS_DONE, &c->flags);
-		set_bit(BCH_FS_CHECK_BACKPOINTERS_DONE, &c->flags);
-		set_bit(BCH_FS_CHECK_ALLOC_TO_LRU_REFS_DONE, &c->flags);
-		set_bit(BCH_FS_FSCK_DONE, &c->flags);
-
-		if (c->opts.norecovery)
-			goto out;
-
-		set_bit(BCH_FS_MAY_GO_RW, &c->flags);
-
-		bch_verbose(c, "starting journal replay, %zu keys", c->journal_keys.nr);
-		ret = bch2_journal_replay(c);
-		if (ret)
-			goto err;
-		if (c->opts.verbose || !c->sb.clean)
-			bch_info(c, "journal replay done");
-	}
-
-	ret = bch2_fs_freespace_init(c);
-	if (ret)
-		goto err;
-
-	if (bch2_version_upgrading_to(c, bcachefs_metadata_version_bucket_gens)) {
-		bch_info(c, "initializing bucket_gens");
-		ret = bch2_bucket_gens_init(c);
-		if (ret)
-			goto err;
-		bch_verbose(c, "bucket_gens init done");
-	}
-
-	if (bch2_version_upgrading_to(c, bcachefs_metadata_version_snapshot_2)) {
-		ret = bch2_fs_upgrade_for_subvolumes(c);
-		if (ret)
-			goto err;
-	}
-
-	if (c->opts.fsck) {
-		ret = bch2_fsck_full(c);
-		if (ret)
-			goto err;
-		bch_verbose(c, "fsck done");
-	} else if (!c->sb.clean) {
-		bch_verbose(c, "checking for deleted inodes");
-		ret = bch2_fsck_walk_inodes_only(c);
-		if (ret)
-			goto err;
-		bch_verbose(c, "check inodes done");
-	}
 
 	if (enabled_qtypes(c)) {
 		bch_verbose(c, "reading quotas");
@@ -1548,10 +1498,7 @@ int bch2_fs_initialize(struct bch_fs *c)
 	}
 	mutex_unlock(&c->sb_lock);
 
-	set_bit(BCH_FS_INITIAL_GC_DONE, &c->flags);
-	set_bit(BCH_FS_CHECK_LRUS_DONE, &c->flags);
-	set_bit(BCH_FS_CHECK_BACKPOINTERS_DONE, &c->flags);
-	set_bit(BCH_FS_CHECK_ALLOC_TO_LRU_REFS_DONE, &c->flags);
+	c->curr_recovery_pass = ARRAY_SIZE(recovery_passes);
 	set_bit(BCH_FS_MAY_GO_RW, &c->flags);
 	set_bit(BCH_FS_FSCK_DONE, &c->flags);
 
@@ -1599,12 +1546,12 @@ int bch2_fs_initialize(struct bch_fs *c)
 	if (ret)
 		goto err;
 
-	ret = bch2_fs_initialize_subvolumes(c);
+	ret = bch2_initialize_subvolumes(c);
 	if (ret)
 		goto err;
 
 	bch_verbose(c, "reading snapshots table");
-	ret = bch2_fs_snapshots_start(c);
+	ret = bch2_snapshots_read(c);
 	if (ret)
 		goto err;
 	bch_verbose(c, "reading snapshots done");
diff --git a/fs/bcachefs/subvolume.c b/fs/bcachefs/subvolume.c
index f26397aa2b..f3852c433c 100644
--- a/fs/bcachefs/subvolume.c
+++ b/fs/bcachefs/subvolume.c
@@ -408,7 +408,7 @@ static int check_snapshot_tree(struct btree_trans *trans,
  * And, make sure it points to a subvolume within that snapshot tree, or correct
  * it to point to the oldest subvolume within that snapshot tree.
  */
-int bch2_fs_check_snapshot_trees(struct bch_fs *c)
+int bch2_check_snapshot_trees(struct bch_fs *c)
 {
 	struct btree_iter iter;
 	struct bkey_s_c k;
@@ -612,7 +612,7 @@ static int check_snapshot(struct btree_trans *trans,
 	return ret;
 }
 
-int bch2_fs_check_snapshots(struct bch_fs *c)
+int bch2_check_snapshots(struct bch_fs *c)
 {
 	struct btree_iter iter;
 	struct bkey_s_c k;
@@ -692,7 +692,7 @@ static int check_subvol(struct btree_trans *trans,
 	return ret;
 }
 
-int bch2_fs_check_subvols(struct bch_fs *c)
+int bch2_check_subvols(struct bch_fs *c)
 {
 	struct btree_iter iter;
 	struct bkey_s_c k;
@@ -713,7 +713,7 @@ void bch2_fs_snapshots_exit(struct bch_fs *c)
 	genradix_free(&c->snapshots);
 }
 
-int bch2_fs_snapshots_start(struct bch_fs *c)
+int bch2_snapshots_read(struct bch_fs *c)
 {
 	struct btree_iter iter;
 	struct bkey_s_c k;
@@ -1151,7 +1151,7 @@ static int bch2_delete_dead_snapshots_hook(struct btree_trans *trans,
 
 	set_bit(BCH_FS_HAVE_DELETED_SNAPSHOTS, &c->flags);
 
-	if (!test_bit(BCH_FS_FSCK_DONE, &c->flags))
+	if (c->curr_recovery_pass <= BCH_RECOVERY_PASS_delete_dead_snapshots)
 		return 0;
 
 	bch2_delete_dead_snapshots_async(c);
diff --git a/fs/bcachefs/subvolume.h b/fs/bcachefs/subvolume.h
index 105410e080..daa9a6b081 100644
--- a/fs/bcachefs/subvolume.h
+++ b/fs/bcachefs/subvolume.h
@@ -130,12 +130,12 @@ static inline int snapshot_list_add(struct bch_fs *c, snapshot_id_list *s, u32 i
 	return ret;
 }
 
-int bch2_fs_check_snapshot_trees(struct bch_fs *);
-int bch2_fs_check_snapshots(struct bch_fs *);
-int bch2_fs_check_subvols(struct bch_fs *);
+int bch2_check_snapshot_trees(struct bch_fs *);
+int bch2_check_snapshots(struct bch_fs *);
+int bch2_check_subvols(struct bch_fs *);
 
 void bch2_fs_snapshots_exit(struct bch_fs *);
-int bch2_fs_snapshots_start(struct bch_fs *);
+int bch2_snapshots_read(struct bch_fs *);
 
 int bch2_subvolume_invalid(const struct bch_fs *, struct bkey_s_c,
 			   unsigned, struct printbuf *);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 10/10] bcachefs: bcachefs_metadata_version_major_minor
  2023-07-09 17:15 [PATCH 00/10] bcachefs - semvar, forward compatibility Kent Overstreet
                   ` (8 preceding siblings ...)
  2023-07-09 17:15 ` [PATCH 09/10] bcachefs: Enumerate recovery passes Kent Overstreet
@ 2023-07-09 17:15 ` Kent Overstreet
  2023-07-09 17:49   ` Thomas Weißschuh
  9 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 17:15 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet, bfoster, sandeen

This introduces major/minor versioning to the superblock version number.
Major version number changes indicate incompatible releases; we can move
forward to a new major version number, but not backwards. Minor version
numbers indicate compatible changes - these add features, but can still
be mounted and used by old versions.

With the recent patches that make it possible to roll out new btrees and
key types without breaking compatibility, we should be able to roll out
most new features without incompatible changes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/bcachefs.h        |  1 -
 fs/bcachefs/bcachefs_format.h | 46 +++++++++++++-----------
 fs/bcachefs/btree_io.c        |  7 ++--
 fs/bcachefs/journal_io.c      | 12 ++++---
 fs/bcachefs/recovery.c        | 66 +++++++++++++++++++++++++++--------
 fs/bcachefs/super-io.c        | 58 ++++++++++++++++++++----------
 fs/bcachefs/super-io.h        |  3 +-
 7 files changed, 132 insertions(+), 61 deletions(-)

diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index cfd4a7b9e8..88a1782b2a 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -568,7 +568,6 @@ enum {
 	BCH_FS_INITIAL_GC_UNFIXED,	/* kill when we enumerate fsck errors */
 	BCH_FS_NEED_ANOTHER_GC,
 
-	BCH_FS_VERSION_UPGRADE,
 	BCH_FS_HAVE_DELETED_SNAPSHOTS,
 
 	/* errors: */
diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h
index 8a0f90a83d..8c1955b6a4 100644
--- a/fs/bcachefs/bcachefs_format.h
+++ b/fs/bcachefs/bcachefs_format.h
@@ -1574,28 +1574,32 @@ struct bch_sb_field_journal_seq_blacklist {
  * One common version number for all on disk data structures - superblock, btree
  * nodes, journal entries
  */
+#define BCH_VERSION_MAJOR(_v)		((__u8) ((_v) >> 10))
+#define BCH_VERSION_MINOR(_v)		((__u8) ((_v) >> 0))
+#define BCH_VERSION(_major, _minor)	(((_major) << 10)|(_minor) << 0)
 
 #define BCH_METADATA_VERSIONS()				\
-	x(bkey_renumber,		10)		\
-	x(inode_btree_change,		11)		\
-	x(snapshot,			12)		\
-	x(inode_backpointers,		13)		\
-	x(btree_ptr_sectors_written,	14)		\
-	x(snapshot_2,			15)		\
-	x(reflink_p_fix,		16)		\
-	x(subvol_dirent,		17)		\
-	x(inode_v2,			18)		\
-	x(freespace,			19)		\
-	x(alloc_v4,			20)		\
-	x(new_data_types,		21)		\
-	x(backpointers,			22)		\
-	x(inode_v3,			23)		\
-	x(unwritten_extents,		24)		\
-	x(bucket_gens,			25)		\
-	x(lru_v2,			26)		\
-	x(fragmentation_lru,		27)		\
-	x(no_bps_in_alloc_keys,		28)		\
-	x(snapshot_trees,		29)
+	x(bkey_renumber,		BCH_VERSION(0, 10))		\
+	x(inode_btree_change,		BCH_VERSION(0, 11))		\
+	x(snapshot,			BCH_VERSION(0, 12))		\
+	x(inode_backpointers,		BCH_VERSION(0, 13))		\
+	x(btree_ptr_sectors_written,	BCH_VERSION(0, 14))		\
+	x(snapshot_2,			BCH_VERSION(0, 15))		\
+	x(reflink_p_fix,		BCH_VERSION(0, 16))		\
+	x(subvol_dirent,		BCH_VERSION(0, 17))		\
+	x(inode_v2,			BCH_VERSION(0, 18))		\
+	x(freespace,			BCH_VERSION(0, 19))		\
+	x(alloc_v4,			BCH_VERSION(0, 20))		\
+	x(new_data_types,		BCH_VERSION(0, 21))		\
+	x(backpointers,			BCH_VERSION(0, 22))		\
+	x(inode_v3,			BCH_VERSION(0, 23))		\
+	x(unwritten_extents,		BCH_VERSION(0, 24))		\
+	x(bucket_gens,			BCH_VERSION(0, 25))		\
+	x(lru_v2,			BCH_VERSION(0, 26))		\
+	x(fragmentation_lru,		BCH_VERSION(0, 27))		\
+	x(no_bps_in_alloc_keys,		BCH_VERSION(0, 28))		\
+	x(snapshot_trees,		BCH_VERSION(0, 29))		\
+	x(major_minor,			BCH_VERSION(1,  0))
 
 enum bcachefs_metadata_version {
 	bcachefs_metadata_version_min = 9,
@@ -1605,7 +1609,7 @@ enum bcachefs_metadata_version {
 	bcachefs_metadata_version_max
 };
 
-static const unsigned bcachefs_metadata_required_upgrade_below = bcachefs_metadata_version_snapshot_trees;
+static const unsigned bcachefs_metadata_required_upgrade_below = bcachefs_metadata_version_major_minor;
 
 #define bcachefs_metadata_version_current	(bcachefs_metadata_version_max - 1)
 
diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index a8197c5008..a8f7b71139 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -701,7 +701,9 @@ static int validate_bset(struct bch_fs *c, struct bch_dev *ca,
 
 	btree_err_on(!bch2_version_compatible(version),
 		     BTREE_ERR_INCOMPATIBLE, c, ca, b, i,
-		     "unsupported bset version %u", version);
+		     "unsupported bset version %u.%u",
+		     BCH_VERSION_MAJOR(version),
+		     BCH_VERSION_MINOR(version));
 
 	if (btree_err_on(version < c->sb.version_min,
 			 BTREE_ERR_FIXABLE, c, NULL, b, i,
@@ -713,7 +715,8 @@ static int validate_bset(struct bch_fs *c, struct bch_dev *ca,
 		mutex_unlock(&c->sb_lock);
 	}
 
-	if (btree_err_on(version > c->sb.version,
+	if (btree_err_on(BCH_VERSION_MAJOR(version) >
+			 BCH_VERSION_MAJOR(c->sb.version),
 			 BTREE_ERR_FIXABLE, c, NULL, b, i,
 			 "bset version %u newer than superblock version %u",
 			 version, c->sb.version)) {
diff --git a/fs/bcachefs/journal_io.c b/fs/bcachefs/journal_io.c
index c7c2ae326f..f861ae2f17 100644
--- a/fs/bcachefs/journal_io.c
+++ b/fs/bcachefs/journal_io.c
@@ -747,9 +747,11 @@ static int jset_validate(struct bch_fs *c,
 
 	version = le32_to_cpu(jset->version);
 	if (journal_entry_err_on(!bch2_version_compatible(version), c, jset, NULL,
-			"%s sector %llu seq %llu: incompatible journal entry version %u",
+			"%s sector %llu seq %llu: incompatible journal entry version %u.%u",
 			ca ? ca->name : c->name,
-			sector, le64_to_cpu(jset->seq), version)) {
+			sector, le64_to_cpu(jset->seq),
+			BCH_VERSION_MAJOR(version),
+			BCH_VERSION_MINOR(version))) {
 		/* don't try to continue: */
 		return -EINVAL;
 	}
@@ -794,9 +796,11 @@ static int jset_validate_early(struct bch_fs *c,
 
 	version = le32_to_cpu(jset->version);
 	if (journal_entry_err_on(!bch2_version_compatible(version), c, jset, NULL,
-			"%s sector %llu seq %llu: unknown journal entry version %u",
+			"%s sector %llu seq %llu: unknown journal entry version %u.%u",
 			ca ? ca->name : c->name,
-			sector, le64_to_cpu(jset->seq), version)) {
+			sector, le64_to_cpu(jset->seq),
+			BCH_VERSION_MAJOR(version),
+			BCH_VERSION_MINOR(version))) {
 		/* don't try to continue: */
 		return -EINVAL;
 	}
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index 3b9120bd36..2339b979ec 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -1111,31 +1111,69 @@ static int bch2_fs_upgrade_for_subvolumes(struct bch_fs *c)
 
 static void check_version_upgrade(struct bch_fs *c)
 {
-	unsigned version = c->sb.version_upgrade_complete ?: c->sb.version;
+	unsigned latest_compatible = bch2_version_compatible(c->sb.version);
+	unsigned latest_version	= bcachefs_metadata_version_current;
+	unsigned old_version = c->sb.version_upgrade_complete ?: c->sb.version;
+	unsigned new_version;
+
+	if (old_version < bcachefs_metadata_required_upgrade_below) {
+		if (c->opts.version_upgrade == BCH_VERSION_UPGRADE_incompatible ||
+		    latest_compatible < bcachefs_metadata_required_upgrade_below)
+			new_version = latest_version;
+		else
+			new_version = latest_compatible;
+	} else {
+		switch (c->opts.version_upgrade) {
+		case BCH_VERSION_UPGRADE_compatible:
+			new_version = latest_compatible;
+			break;
+		case BCH_VERSION_UPGRADE_incompatible:
+			new_version = latest_version;
+			break;
+		case BCH_VERSION_UPGRADE_none:
+			new_version = old_version;
+			break;
+		}
+	}
 
-	if (version < bcachefs_metadata_required_upgrade_below ||
-	    (version < bcachefs_metadata_version_current &&
-	     c->opts.version_upgrade != BCH_VERSION_UPGRADE_none)) {
+	if (new_version > old_version) {
 		struct printbuf buf = PRINTBUF;
 
-		if (version != c->sb.version) {
-			prt_str(&buf, "version upgrade to ");
+		if (old_version < bcachefs_metadata_required_upgrade_below)
+			prt_str(&buf, "Version upgrade required:\n");
+
+		if (old_version != c->sb.version) {
+			prt_str(&buf, "Version upgrade from ");
+			bch2_version_to_text(&buf, c->sb.version_upgrade_complete);
+			prt_str(&buf, " to ");
 			bch2_version_to_text(&buf, c->sb.version);
-			prt_str(&buf, " incomplete:\n");
+			prt_str(&buf, " incomplete\n");
 		}
 
-		prt_str(&buf, "version ");
-		bch2_version_to_text(&buf, version);
-		prt_str(&buf, " prior to ");
-		bch2_version_to_text(&buf, bcachefs_metadata_required_upgrade_below);
-		prt_str(&buf, ", upgrade and fsck required");
+		prt_str(&buf, "Doing ");
+		if (BCH_VERSION_MAJOR(old_version) != BCH_VERSION_MAJOR(new_version))
+			prt_str(&buf, "incompatible");
+		else
+			prt_str(&buf, "compatible");
+		prt_str(&buf, "version upgrade from ");
+		bch2_version_to_text(&buf, old_version);
+		prt_str(&buf, " to ");
+		bch2_version_to_text(&buf, new_version);
+		prt_newline(&buf);
+
+		prt_str(&buf, "fsck required");
 
 		bch_info(c, "%s", buf.buf);
-		printbuf_exit(&buf);
 
 		c->opts.fsck		= true;
 		c->opts.fix_errors	= FSCK_OPT_YES;
-		set_bit(BCH_FS_VERSION_UPGRADE, &c->flags);
+
+		mutex_lock(&c->sb_lock);
+		c->disk_sb.sb->version = cpu_to_le16(new_version);
+		c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALL);
+		mutex_unlock(&c->sb_lock);
+
+		printbuf_exit(&buf);
 	}
 }
 
diff --git a/fs/bcachefs/super-io.c b/fs/bcachefs/super-io.c
index b174003b57..b4aec5b6ae 100644
--- a/fs/bcachefs/super-io.c
+++ b/fs/bcachefs/super-io.c
@@ -23,19 +23,42 @@
 #include <linux/backing-dev.h>
 #include <linux/sort.h>
 
-static const char * const bch2_metadata_versions[] = {
-#define x(t, n) [n] = #t,
+struct bch2_metadata_version_str {
+	u16		version;
+	const char	*name;
+};
+
+static const struct bch2_metadata_version_str bch2_metadata_versions[] = {
+#define x(n, v) { .version = v, .name = #n },
 	BCH_METADATA_VERSIONS()
 #undef x
 };
 
 void bch2_version_to_text(struct printbuf *out, unsigned v)
 {
-	const char *str = v < ARRAY_SIZE(bch2_metadata_versions)
-		? bch2_metadata_versions[v]
-		: "(unknown version)";
+	const char *str = "(unknown version)";
+
+	for (unsigned i = 0; i < ARRAY_SIZE(bch2_metadata_versions); i++)
+		if (bch2_metadata_versions[i].version == v) {
+			str = bch2_metadata_versions[i].name;
+			break;
+		}
+
+	prt_printf(out, "%u.%u: %s", BCH_VERSION_MAJOR(v), BCH_VERSION_MINOR(v), str);
+}
+
+unsigned bch2_latest_compatible_version(unsigned v)
+{
+	if (!BCH_VERSION_MAJOR(v))
+		return v;
+
+	for (unsigned i = 0; i < ARRAY_SIZE(bch2_metadata_versions); i++)
+		if (bch2_metadata_versions[i].version > v &&
+		    BCH_VERSION_MAJOR(bch2_metadata_versions[i].version) ==
+		    BCH_VERSION_MAJOR(v))
+			v = bch2_metadata_versions[i].version;
 
-	prt_printf(out, "%u: %s", v, str);
+	return v;
 }
 
 const char * const bch2_sb_fields[] = {
@@ -809,10 +832,9 @@ int bch2_write_super(struct bch_fs *c)
 	closure_init_stack(cl);
 	memset(&sb_written, 0, sizeof(sb_written));
 
-	if (test_bit(BCH_FS_VERSION_UPGRADE, &c->flags)) {
-		c->disk_sb.sb->magic = BCHFS_MAGIC;
-		c->disk_sb.sb->layout.magic = BCHFS_MAGIC;
-	}
+	/* Make sure we're using the new magic numbers: */
+	c->disk_sb.sb->magic = BCHFS_MAGIC;
+	c->disk_sb.sb->layout.magic = BCHFS_MAGIC;
 
 	le64_add_cpu(&c->disk_sb.sb->seq, 1);
 
@@ -1187,19 +1209,19 @@ int bch2_fs_mark_dirty(struct bch_fs *c)
 	mutex_lock(&c->sb_lock);
 	SET_BCH_SB_CLEAN(c->disk_sb.sb, false);
 
+	/*
+	 * Downgrade, if superblock is at a higher version than currently
+	 * supported:
+	 */
 	if (BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb) > bcachefs_metadata_version_current)
 		SET_BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb, bcachefs_metadata_version_current);
-
-	if (test_bit(BCH_FS_VERSION_UPGRADE, &c->flags) ||
-	    c->sb.version > bcachefs_metadata_version_current)
+	if (c->sb.version > bcachefs_metadata_version_current)
 		c->disk_sb.sb->version = cpu_to_le16(bcachefs_metadata_version_current);
-
-	if (test_bit(BCH_FS_VERSION_UPGRADE, &c->flags))
-		c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALL);
+	if (c->sb.version_min > bcachefs_metadata_version_current)
+		c->disk_sb.sb->version_min = cpu_to_le16(bcachefs_metadata_version_current);
+	c->disk_sb.sb->compat[0] &= cpu_to_le64((1ULL << BCH_COMPAT_NR) - 1);
 
 	c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALWAYS);
-
-	c->disk_sb.sb->compat[0] &= cpu_to_le64((1ULL << BCH_COMPAT_NR) - 1);
 	ret = bch2_write_super(c);
 	mutex_unlock(&c->sb_lock);
 
diff --git a/fs/bcachefs/super-io.h b/fs/bcachefs/super-io.h
index cda71ec845..a850cc4ae6 100644
--- a/fs/bcachefs/super-io.h
+++ b/fs/bcachefs/super-io.h
@@ -11,11 +11,12 @@
 
 static inline bool bch2_version_compatible(u16 version)
 {
-	return version <= bcachefs_metadata_version_current &&
+	return BCH_VERSION_MAJOR(version) <= BCH_VERSION_MAJOR(bcachefs_metadata_version_current) &&
 		version >= bcachefs_metadata_version_min;
 }
 
 void bch2_version_to_text(struct printbuf *, unsigned);
+unsigned bch2_latest_compatible_version(unsigned);
 
 struct bch_sb_field *bch2_sb_field_get(struct bch_sb *, enum bch_sb_field_type);
 struct bch_sb_field *bch2_sb_field_resize(struct bch_sb_handle *,
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 10/10] bcachefs: bcachefs_metadata_version_major_minor
  2023-07-09 17:15 ` [PATCH 10/10] bcachefs: bcachefs_metadata_version_major_minor Kent Overstreet
@ 2023-07-09 17:49   ` Thomas Weißschuh
  2023-07-09 18:31     ` Kent Overstreet
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Weißschuh @ 2023-07-09 17:49 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs, bfoster, sandeen

Hi Kent,

the subject for this patch looks a bit like a placeholder, in contrast
with the other patches of the series.

More below.

On 2023-07-09 13:15:51-0400, Kent Overstreet wrote:
> This introduces major/minor versioning to the superblock version number.
> Major version number changes indicate incompatible releases; we can move
> forward to a new major version number, but not backwards. Minor version
> numbers indicate compatible changes - these add features, but can still
> be mounted and used by old versions.
> 
> With the recent patches that make it possible to roll out new btrees and
> key types without breaking compatibility, we should be able to roll out
> most new features without incompatible changes.
> 
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> ---
>  fs/bcachefs/bcachefs.h        |  1 -
>  fs/bcachefs/bcachefs_format.h | 46 +++++++++++++-----------
>  fs/bcachefs/btree_io.c        |  7 ++--
>  fs/bcachefs/journal_io.c      | 12 ++++---
>  fs/bcachefs/recovery.c        | 66 +++++++++++++++++++++++++++--------
>  fs/bcachefs/super-io.c        | 58 ++++++++++++++++++++----------
>  fs/bcachefs/super-io.h        |  3 +-
>  7 files changed, 132 insertions(+), 61 deletions(-)

> [..]

> diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h
> index 8a0f90a83d..8c1955b6a4 100644
> --- a/fs/bcachefs/bcachefs_format.h
> +++ b/fs/bcachefs/bcachefs_format.h
> @@ -1574,28 +1574,32 @@ struct bch_sb_field_journal_seq_blacklist {
>   * One common version number for all on disk data structures - superblock, btree
>   * nodes, journal entries
>   */
> +#define BCH_VERSION_MAJOR(_v)		((__u8) ((_v) >> 10))

This "wastes" two bits of the major version that do not fit into the
u8 value. This is problematic in two cases:

* If in the future there is a major version so high that it will
  overflow the 8 bits it would be interpreted as major version "0" and
  inadvertedly accepted by older/ancient implementations.

* If this logic is used in other derived projects (libblkid/util-linux)
  it is likely that those will miss if the in-kernel logic is expanded
  to the full ten bits.

If eight bits are supposed to be enough for the major version, why not
split it into 8 bits major, 8 bits minor?
Or make the major version an u16.

> +#define BCH_VERSION_MINOR(_v)		((__u8) ((_v) >> 0))
> +#define BCH_VERSION(_major, _minor)	(((_major) << 10)|(_minor) << 0)

This changes the user-visible data format.
For example for libblkid/util-linux parses and exposes it to users.

I don't think it's a big problem and I will adapt blkid for this format.

But I think it would be good opportunity to have some sort of official
statement about the stability of the current (pre-mainline) superblock.

Do you expect or plan any incompatibilities?

> [..]

Thomas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 10/10] bcachefs: bcachefs_metadata_version_major_minor
  2023-07-09 17:49   ` Thomas Weißschuh
@ 2023-07-09 18:31     ` Kent Overstreet
  2023-07-09 19:29       ` Thomas Weißschuh
  0 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 18:31 UTC (permalink / raw)
  To: Thomas Weißschuh; +Cc: linux-bcachefs, bfoster, sandeen

On Sun, Jul 09, 2023 at 07:49:00PM +0200, Thomas Weißschuh wrote:
> Hi Kent,
> 
> the subject for this patch looks a bit like a placeholder, in contrast
> with the other patches of the series.
> 
> More below.
> 
> On 2023-07-09 13:15:51-0400, Kent Overstreet wrote:
> > This introduces major/minor versioning to the superblock version number.
> > Major version number changes indicate incompatible releases; we can move
> > forward to a new major version number, but not backwards. Minor version
> > numbers indicate compatible changes - these add features, but can still
> > be mounted and used by old versions.
> > 
> > With the recent patches that make it possible to roll out new btrees and
> > key types without breaking compatibility, we should be able to roll out
> > most new features without incompatible changes.
> > 
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > ---
> >  fs/bcachefs/bcachefs.h        |  1 -
> >  fs/bcachefs/bcachefs_format.h | 46 +++++++++++++-----------
> >  fs/bcachefs/btree_io.c        |  7 ++--
> >  fs/bcachefs/journal_io.c      | 12 ++++---
> >  fs/bcachefs/recovery.c        | 66 +++++++++++++++++++++++++++--------
> >  fs/bcachefs/super-io.c        | 58 ++++++++++++++++++++----------
> >  fs/bcachefs/super-io.h        |  3 +-
> >  7 files changed, 132 insertions(+), 61 deletions(-)
> 
> > [..]
> 
> > diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h
> > index 8a0f90a83d..8c1955b6a4 100644
> > --- a/fs/bcachefs/bcachefs_format.h
> > +++ b/fs/bcachefs/bcachefs_format.h
> > @@ -1574,28 +1574,32 @@ struct bch_sb_field_journal_seq_blacklist {
> >   * One common version number for all on disk data structures - superblock, btree
> >   * nodes, journal entries
> >   */
> > +#define BCH_VERSION_MAJOR(_v)		((__u8) ((_v) >> 10))
> 
> This "wastes" two bits of the major version that do not fit into the
> u8 value. This is problematic in two cases:
> 
> * If in the future there is a major version so high that it will
>   overflow the 8 bits it would be interpreted as major version "0" and
>   inadvertedly accepted by older/ancient implementations.
> 
> * If this logic is used in other derived projects (libblkid/util-linux)
>   it is likely that those will miss if the in-kernel logic is expanded
>   to the full ten bits.
> 
> If eight bits are supposed to be enough for the major version, why not
> split it into 8 bits major, 8 bits minor?
> Or make the major version an u16.

Minor releases are going to be much more common than major releases:
thus, I'm using 10 bits for the minor number and 6 bits for the major.

It would've been better if the version number was a u32, and then the
major and minor could've both been u16s, but alas I was not that
foresightful :)

> > +#define BCH_VERSION_MINOR(_v)		((__u8) ((_v) >> 0))
> > +#define BCH_VERSION(_major, _minor)	(((_major) << 10)|(_minor) << 0)
> 
> This changes the user-visible data format.
> For example for libblkid/util-linux parses and exposes it to users.
> 
> I don't think it's a big problem and I will adapt blkid for this format.
> 
> But I think it would be good opportunity to have some sort of official
> statement about the stability of the current (pre-mainline) superblock.
> 
> Do you expect or plan any incompatibilities?

I think it's pretty unlikely that there'll ever be a majorly
incompatible superblock change. I recently switched to a new magic UUID,
different from the one bcache uses - that and this change are the
biggest incompatibilities there have been since I implemented the
"tagged variable-length sections" scheme years ago.

If we can't fit something in the main part of the superblock, we'll just
create new tagged sections, and those are much easier from a
compatibility point of view.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 10/10] bcachefs: bcachefs_metadata_version_major_minor
  2023-07-09 18:31     ` Kent Overstreet
@ 2023-07-09 19:29       ` Thomas Weißschuh
  2023-07-09 20:08         ` Kent Overstreet
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Weißschuh @ 2023-07-09 19:29 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs, bfoster, sandeen

On 2023-07-09 14:31:22-0400, Kent Overstreet wrote:
> On Sun, Jul 09, 2023 at 07:49:00PM +0200, Thomas Weißschuh wrote:
> > Hi Kent,
> > 
> > the subject for this patch looks a bit like a placeholder, in contrast
> > with the other patches of the series.
> > 
> > More below.
> > 
> > On 2023-07-09 13:15:51-0400, Kent Overstreet wrote:
> > > This introduces major/minor versioning to the superblock version number.
> > > Major version number changes indicate incompatible releases; we can move
> > > forward to a new major version number, but not backwards. Minor version
> > > numbers indicate compatible changes - these add features, but can still
> > > be mounted and used by old versions.
> > > 
> > > With the recent patches that make it possible to roll out new btrees and
> > > key types without breaking compatibility, we should be able to roll out
> > > most new features without incompatible changes.
> > > 
> > > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > > ---
> > >  fs/bcachefs/bcachefs.h        |  1 -
> > >  fs/bcachefs/bcachefs_format.h | 46 +++++++++++++-----------
> > >  fs/bcachefs/btree_io.c        |  7 ++--
> > >  fs/bcachefs/journal_io.c      | 12 ++++---
> > >  fs/bcachefs/recovery.c        | 66 +++++++++++++++++++++++++++--------
> > >  fs/bcachefs/super-io.c        | 58 ++++++++++++++++++++----------
> > >  fs/bcachefs/super-io.h        |  3 +-
> > >  7 files changed, 132 insertions(+), 61 deletions(-)
> > 
> > > [..]
> > 
> > > diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h
> > > index 8a0f90a83d..8c1955b6a4 100644
> > > --- a/fs/bcachefs/bcachefs_format.h
> > > +++ b/fs/bcachefs/bcachefs_format.h
> > > @@ -1574,28 +1574,32 @@ struct bch_sb_field_journal_seq_blacklist {
> > >   * One common version number for all on disk data structures - superblock, btree
> > >   * nodes, journal entries
> > >   */
> > > +#define BCH_VERSION_MAJOR(_v)		((__u8) ((_v) >> 10))
> > 
> > This "wastes" two bits of the major version that do not fit into the
> > u8 value. This is problematic in two cases:
> > 
> > * If in the future there is a major version so high that it will
> >   overflow the 8 bits it would be interpreted as major version "0" and
> >   inadvertedly accepted by older/ancient implementations.
> > 
> > * If this logic is used in other derived projects (libblkid/util-linux)
> >   it is likely that those will miss if the in-kernel logic is expanded
> >   to the full ten bits.
> > 
> > If eight bits are supposed to be enough for the major version, why not
> > split it into 8 bits major, 8 bits minor?
> > Or make the major version an u16.
> 
> Minor releases are going to be much more common than major releases:
> thus, I'm using 10 bits for the minor number and 6 bits for the major.

Sounds good. But can this be reflected in the definition of
BCH_VERSION_MINOR?
Currently it only supports 8 bits.

Something like:

#define BCH_VERSION_MINOR(_v)		((__u16) ((_v) & 0x03ff))

> It would've been better if the version number was a u32, and then the
> major and minor could've both been u16s, but alas I was not that
> foresightful :)
> 
> > > +#define BCH_VERSION_MINOR(_v)		((__u8) ((_v) >> 0))
> > > +#define BCH_VERSION(_major, _minor)	(((_major) << 10)|(_minor) << 0)
> > 
> > This changes the user-visible data format.
> > For example for libblkid/util-linux parses and exposes it to users.
> > 
> > I don't think it's a big problem and I will adapt blkid for this format.
> > 
> > But I think it would be good opportunity to have some sort of official
> > statement about the stability of the current (pre-mainline) superblock.
> > 
> > Do you expect or plan any incompatibilities?
> 
> I think it's pretty unlikely that there'll ever be a majorly
> incompatible superblock change. I recently switched to a new magic UUID,
> different from the one bcache uses - that and this change are the
> biggest incompatibilities there have been since I implemented the
> "tagged variable-length sections" scheme years ago.
> 
> If we can't fit something in the main part of the superblock, we'll just
> create new tagged sections, and those are much easier from a
> compatibility point of view.

Thanks for the clarification! Sounds good to me.

PS: Is the source for "bcachefs: Principles of Operations" available?
    While reading it I noticed some typos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 10/10] bcachefs: bcachefs_metadata_version_major_minor
  2023-07-09 19:29       ` Thomas Weißschuh
@ 2023-07-09 20:08         ` Kent Overstreet
  0 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2023-07-09 20:08 UTC (permalink / raw)
  To: Thomas Weißschuh; +Cc: linux-bcachefs, bfoster, sandeen

On Sun, Jul 09, 2023 at 09:29:00PM +0200, Thomas Weißschuh wrote:
> On 2023-07-09 14:31:22-0400, Kent Overstreet wrote:
> > Minor releases are going to be much more common than major releases:
> > thus, I'm using 10 bits for the minor number and 6 bits for the major.
> 
> Sounds good. But can this be reflected in the definition of
> BCH_VERSION_MINOR?
> Currently it only supports 8 bits.
> 
> Something like:
> 
> #define BCH_VERSION_MINOR(_v)		((__u16) ((_v) & 0x03ff))

Ah, good catch. Changing it to:
#define BCH_VERSION_MAJOR(_v)           ((__u16) ((_v) >> 10))                                                                                                                                                                                                                           
#define BCH_VERSION_MINOR(_v)           ((__u16) ((_v) & ~(~0U << 10)))

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 05/10] bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE()
  2023-07-09 17:15 ` [PATCH 05/10] bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE() Kent Overstreet
@ 2023-07-13 13:42   ` Brian Foster
  2023-07-13 15:31     ` Kent Overstreet
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2023-07-13 13:42 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs, sandeen

On Sun, Jul 09, 2023 at 01:15:46PM -0400, Kent Overstreet wrote:
> Version upgrades are not atomic operations: when we do a version upgrade
> we need to update the superblock before we start using new features, and
> then when the upgrade completes we need to update the superblock again.
> This adds a new superblock field so we can detect and handle incomplete
> version upgrades.
> 
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> ---
>  fs/bcachefs/alloc_background.c |  3 +-
>  fs/bcachefs/bcachefs.h         |  7 +++++
>  fs/bcachefs/bcachefs_format.h  |  5 ++++
>  fs/bcachefs/recovery.c         | 54 +++++++++++++++++++---------------
>  fs/bcachefs/super-io.c         | 18 ++++++++++++
>  5 files changed, 62 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/bcachefs/alloc_background.c b/fs/bcachefs/alloc_background.c
> index c59629bbbc..b1dfe300d9 100644
> --- a/fs/bcachefs/alloc_background.c
> +++ b/fs/bcachefs/alloc_background.c
> @@ -1232,8 +1232,7 @@ int bch2_check_alloc_hole_bucket_gens(struct btree_trans *trans,
>  	unsigned i, gens_offset, gens_end_offset;
>  	int ret;
>  
> -	if (c->sb.version < bcachefs_metadata_version_bucket_gens &&
> -	    !c->opts.version_upgrade)
> +	if (c->sb.version < bcachefs_metadata_version_bucket_gens)
>  		return 0;
>  
>  	bch2_btree_iter_set_pos(bucket_gens_iter, alloc_gens_pos(start, &gens_offset));
> diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
> index a8488d4e18..d7f030aa30 100644
> --- a/fs/bcachefs/bcachefs.h
> +++ b/fs/bcachefs/bcachefs.h
> @@ -712,6 +712,7 @@ struct bch_fs {
>  
>  		u16		version;
>  		u16		version_min;
> +		u16		version_upgrade_complete;
>  
>  		u8		nr_devices;
>  		u8		clean;
> @@ -1134,6 +1135,12 @@ static inline bool bch2_dev_exists2(const struct bch_fs *c, unsigned dev)
>  	return dev < c->sb.nr_devices && c->devs[dev];
>  }
>  

So the commit log implies that when an upgrade starts, sb.version refers
to the target version, and now version_upgrade_complete is updated to
match upon completion of the upgrade. Am I following that correctly?

If so, I'm still missing something.. where exactly is sb.version updated
to the target version?

Brian

> +static inline bool bch2_version_upgrading_to(const struct bch_fs *c, unsigned new_version)
> +{
> +	return c->sb.version_upgrade_complete < new_version &&
> +		c->sb.version >= new_version;
> +}
> +
>  #define BKEY_PADDED_ONSTACK(key, pad)				\
>  	struct { struct bkey_i key; __u64 key ## _pad[pad]; }
>  
> diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h
> index 49b86bfda7..c397a3b96b 100644
> --- a/fs/bcachefs/bcachefs_format.h
> +++ b/fs/bcachefs/bcachefs_format.h
> @@ -1748,6 +1748,11 @@ LE64_BITMASK(BCH_SB_JOURNAL_TRANSACTION_NAMES,struct bch_sb, flags[4], 32, 33);
>  LE64_BITMASK(BCH_SB_NOCOW,		struct bch_sb, flags[4], 33, 34);
>  LE64_BITMASK(BCH_SB_WRITE_BUFFER_SIZE,	struct bch_sb, flags[4], 34, 54);
>  
> +/* flags[4] 56-64 unused: */
> +
> +LE64_BITMASK(BCH_SB_VERSION_UPGRADE_COMPLETE,
> +					struct bch_sb, flags[5],  0, 16);
> +
>  /*
>   * Features:
>   *
> diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
> index 9ea85b097e..0173707cfd 100644
> --- a/fs/bcachefs/recovery.c
> +++ b/fs/bcachefs/recovery.c
> @@ -1107,6 +1107,31 @@ static int bch2_fs_upgrade_for_subvolumes(struct bch_fs *c)
>  	return ret;
>  }
>  
> +static void check_version_upgrade(struct bch_fs *c)
> +{
> +	unsigned version = c->sb.version_upgrade_complete ?: c->sb.version;
> +
> +	if (version < bcachefs_metadata_required_upgrade_below) {
> +		struct printbuf buf = PRINTBUF;
> +
> +		if (version != c->sb.version)
> +			prt_str(&buf, "version upgrade incomplete:\n");
> +
> +		prt_str(&buf, "version ");
> +		bch2_version_to_text(&buf, version);
> +		prt_str(&buf, " prior to ");
> +		bch2_version_to_text(&buf, bcachefs_metadata_required_upgrade_below);
> +		prt_str(&buf, ", upgrade and fsck required");
> +
> +		bch_info(c, "%s", buf.buf);
> +		printbuf_exit(&buf);
> +
> +		c->opts.version_upgrade	= true;
> +		c->opts.fsck		= true;
> +		c->opts.fix_errors	= FSCK_OPT_YES;
> +	}
> +}
> +
>  int bch2_fs_recovery(struct bch_fs *c)
>  {
>  	struct bch_sb_field_clean *clean = NULL;
> @@ -1146,23 +1171,8 @@ int bch2_fs_recovery(struct bch_fs *c)
>  		goto err;
>  	}
>  
> -	if (!c->opts.nochanges &&
> -	    c->sb.version < bcachefs_metadata_required_upgrade_below) {
> -		struct printbuf buf = PRINTBUF;
> -
> -		prt_str(&buf, "version ");
> -		bch2_version_to_text(&buf, c->sb.version);
> -		prt_str(&buf, " prior to ");
> -		bch2_version_to_text(&buf, bcachefs_metadata_required_upgrade_below);
> -		prt_str(&buf, ", upgrade and fsck required");
> -
> -		bch_info(c, "%s", buf.buf);
> -		printbuf_exit(&buf);
> -
> -		c->opts.version_upgrade	= true;
> -		c->opts.fsck		= true;
> -		c->opts.fix_errors	= FSCK_OPT_YES;
> -	}
> +	if (!c->opts.nochanges)
> +		check_version_upgrade(c);
>  
>  	if (c->opts.fsck && c->opts.norecovery) {
>  		bch_err(c, "cannot select both norecovery and fsck");
> @@ -1406,8 +1416,7 @@ int bch2_fs_recovery(struct bch_fs *c)
>  	if (ret)
>  		goto err;
>  
> -	if (c->sb.version < bcachefs_metadata_version_bucket_gens &&
> -	    c->opts.version_upgrade) {
> +	if (bch2_version_upgrading_to(c, bcachefs_metadata_version_bucket_gens)) {
>  		bch_info(c, "initializing bucket_gens");
>  		ret = bch2_bucket_gens_init(c);
>  		if (ret)
> @@ -1415,7 +1424,7 @@ int bch2_fs_recovery(struct bch_fs *c)
>  		bch_verbose(c, "bucket_gens init done");
>  	}
>  
> -	if (c->sb.version < bcachefs_metadata_version_snapshot_2) {
> +	if (bch2_version_upgrading_to(c, bcachefs_metadata_version_snapshot_2)) {
>  		ret = bch2_fs_upgrade_for_subvolumes(c);
>  		if (ret)
>  			goto err;
> @@ -1443,9 +1452,8 @@ int bch2_fs_recovery(struct bch_fs *c)
>  	}
>  
>  	mutex_lock(&c->sb_lock);
> -	if (c->opts.version_upgrade) {
> -		c->disk_sb.sb->version = cpu_to_le16(bcachefs_metadata_version_current);
> -		c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALL);
> +	if (BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb) != c->sb.version) {
> +		SET_BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb, c->sb.version);
>  		write_sb = true;
>  	}
>  
> diff --git a/fs/bcachefs/super-io.c b/fs/bcachefs/super-io.c
> index 833e78d48c..cb03e3f0c5 100644
> --- a/fs/bcachefs/super-io.c
> +++ b/fs/bcachefs/super-io.c
> @@ -445,6 +445,7 @@ static void bch2_sb_update(struct bch_fs *c)
>  	c->sb.user_uuid		= src->user_uuid;
>  	c->sb.version		= le16_to_cpu(src->version);
>  	c->sb.version_min	= le16_to_cpu(src->version_min);
> +	c->sb.version_upgrade_complete = BCH_SB_VERSION_UPGRADE_COMPLETE(src) ?: c->sb.version;
>  	c->sb.nr_devices	= src->nr_devices;
>  	c->sb.clean		= BCH_SB_CLEAN(src);
>  	c->sb.encryption_type	= BCH_SB_ENCRYPTION_TYPE(src);
> @@ -1185,7 +1186,19 @@ int bch2_fs_mark_dirty(struct bch_fs *c)
>  
>  	mutex_lock(&c->sb_lock);
>  	SET_BCH_SB_CLEAN(c->disk_sb.sb, false);
> +
> +	if (BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb) > bcachefs_metadata_version_current)
> +		SET_BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb, bcachefs_metadata_version_current);
> +
> +	if (c->opts.version_upgrade ||
> +	    c->sb.version > bcachefs_metadata_version_current)
> +		c->disk_sb.sb->version = cpu_to_le16(bcachefs_metadata_version_current);
> +
> +	if (c->opts.version_upgrade)
> +		c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALL);
> +
>  	c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALWAYS);
> +
>  	c->disk_sb.sb->compat[0] &= cpu_to_le64((1ULL << BCH_COMPAT_NR) - 1);
>  	ret = bch2_write_super(c);
>  	mutex_unlock(&c->sb_lock);
> @@ -1529,6 +1542,11 @@ void bch2_sb_to_text(struct printbuf *out, struct bch_sb *sb,
>  	bch2_version_to_text(out, le16_to_cpu(sb->version));
>  	prt_newline(out);
>  
> +	prt_str(out, "Version upgrade complete:");
> +	prt_tab(out);
> +	bch2_version_to_text(out, BCH_SB_VERSION_UPGRADE_COMPLETE(sb));
> +	prt_newline(out);
> +
>  	prt_printf(out, "Oldest version on disk:");
>  	prt_tab(out);
>  	bch2_version_to_text(out, le16_to_cpu(sb->version_min));
> -- 
> 2.40.1
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 05/10] bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE()
  2023-07-13 13:42   ` Brian Foster
@ 2023-07-13 15:31     ` Kent Overstreet
  0 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2023-07-13 15:31 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-bcachefs, sandeen

On Thu, Jul 13, 2023 at 09:42:15AM -0400, Brian Foster wrote:
> On Sun, Jul 09, 2023 at 01:15:46PM -0400, Kent Overstreet wrote:
> > Version upgrades are not atomic operations: when we do a version upgrade
> > we need to update the superblock before we start using new features, and
> > then when the upgrade completes we need to update the superblock again.
> > This adds a new superblock field so we can detect and handle incomplete
> > version upgrades.
> > 
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > ---
> >  fs/bcachefs/alloc_background.c |  3 +-
> >  fs/bcachefs/bcachefs.h         |  7 +++++
> >  fs/bcachefs/bcachefs_format.h  |  5 ++++
> >  fs/bcachefs/recovery.c         | 54 +++++++++++++++++++---------------
> >  fs/bcachefs/super-io.c         | 18 ++++++++++++
> >  5 files changed, 62 insertions(+), 25 deletions(-)
> > 
> > diff --git a/fs/bcachefs/alloc_background.c b/fs/bcachefs/alloc_background.c
> > index c59629bbbc..b1dfe300d9 100644
> > --- a/fs/bcachefs/alloc_background.c
> > +++ b/fs/bcachefs/alloc_background.c
> > @@ -1232,8 +1232,7 @@ int bch2_check_alloc_hole_bucket_gens(struct btree_trans *trans,
> >  	unsigned i, gens_offset, gens_end_offset;
> >  	int ret;
> >  
> > -	if (c->sb.version < bcachefs_metadata_version_bucket_gens &&
> > -	    !c->opts.version_upgrade)
> > +	if (c->sb.version < bcachefs_metadata_version_bucket_gens)
> >  		return 0;
> >  
> >  	bch2_btree_iter_set_pos(bucket_gens_iter, alloc_gens_pos(start, &gens_offset));
> > diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
> > index a8488d4e18..d7f030aa30 100644
> > --- a/fs/bcachefs/bcachefs.h
> > +++ b/fs/bcachefs/bcachefs.h
> > @@ -712,6 +712,7 @@ struct bch_fs {
> >  
> >  		u16		version;
> >  		u16		version_min;
> > +		u16		version_upgrade_complete;
> >  
> >  		u8		nr_devices;
> >  		u8		clean;
> > @@ -1134,6 +1135,12 @@ static inline bool bch2_dev_exists2(const struct bch_fs *c, unsigned dev)
> >  	return dev < c->sb.nr_devices && c->devs[dev];
> >  }
> >  
> 
> So the commit log implies that when an upgrade starts, sb.version refers
> to the target version, and now version_upgrade_complete is updated to
> match upon completion of the upgrade. Am I following that correctly?
> 
> If so, I'm still missing something.. where exactly is sb.version updated
> to the target version?

I don't think that was at its final destination in this patch - it's
check_version_uprade() in recovery.c post major minor versioning.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-07-13 15:33 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-09 17:15 [PATCH 00/10] bcachefs - semvar, forward compatibility Kent Overstreet
2023-07-09 17:15 ` [PATCH 01/10] bcachefs: Allow for unknown btree IDs Kent Overstreet
2023-07-09 17:15 ` [PATCH 02/10] bcachefs: Allow for unknown key types Kent Overstreet
2023-07-09 17:15 ` [PATCH 03/10] bcachefs: Refactor bch_sb_field_ops handling Kent Overstreet
2023-07-09 17:15 ` [PATCH 04/10] bcachefs: Change check for invalid key types Kent Overstreet
2023-07-09 17:15 ` [PATCH 05/10] bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE() Kent Overstreet
2023-07-13 13:42   ` Brian Foster
2023-07-13 15:31     ` Kent Overstreet
2023-07-09 17:15 ` [PATCH 06/10] bcachefs: version_upgrade is now an enum Kent Overstreet
2023-07-09 17:15 ` [PATCH 07/10] bcachefs: Kill bch2_bucket_gens_read() Kent Overstreet
2023-07-09 17:15 ` [PATCH 08/10] bcachefs: Stash journal replay params in bch_fs Kent Overstreet
2023-07-09 17:15 ` [PATCH 09/10] bcachefs: Enumerate recovery passes Kent Overstreet
2023-07-09 17:15 ` [PATCH 10/10] bcachefs: bcachefs_metadata_version_major_minor Kent Overstreet
2023-07-09 17:49   ` Thomas Weißschuh
2023-07-09 18:31     ` Kent Overstreet
2023-07-09 19:29       ` Thomas Weißschuh
2023-07-09 20:08         ` Kent Overstreet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).