All of lore.kernel.org
 help / color / mirror / Atom feed
* [[PATCH] 0/3] btrfs-check: Fix bitflipped keys from bad RAM
@ 2014-05-05 17:07 Hugo Mills
  2014-05-05 17:07 ` [PATCH 1/3] btrfs check: Fix wrong level access Hugo Mills
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Hugo Mills @ 2014-05-05 17:07 UTC (permalink / raw)
  To: linux-btrfs

If you have RAM with stuck or unreliable bits in it, and a metadata block is
stored in it, you can end up with keys with errors in. These usually show up
as "bad key order". In many cases, these out-of-order keys can be identified
and fixed with a simple heuristic. This patch series implements that
heuristic, and fixes a long-standing issue with the existing code to fix out-
of-order keys.

Hugo Mills (3):
  btrfs check: Fix wrong level access
  btrfs check: Pre-sort keys in a block while searching
  btrfs check: Attempt to fix misordered keys with bitflips in them

 cmds-check.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 ctree.c      |  86 ++++++++++++++++++++++++++++++++++++--------
 ctree.h      |   2 ++
 3 files changed, 180 insertions(+), 22 deletions(-)

-- 
1.9.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] btrfs check: Fix wrong level access
  2014-05-05 17:07 [[PATCH] 0/3] btrfs-check: Fix bitflipped keys from bad RAM Hugo Mills
@ 2014-05-05 17:07 ` Hugo Mills
  2014-05-05 17:07 ` [PATCH 2/3] btrfs check: Pre-sort keys in a block while searching Hugo Mills
  2014-05-05 17:07 ` [PATCH 3/3] btrfs check: Attempt to fix misordered keys with bitflips in them Hugo Mills
  2 siblings, 0 replies; 6+ messages in thread
From: Hugo Mills @ 2014-05-05 17:07 UTC (permalink / raw)
  To: linux-btrfs

There's no reason to assume that the bad key order is in a leaf block,
so accessing level 0 of the path is going to be an error if it's actually
a node block that's bad.

Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
---
 cmds-check.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index d195e7a..fc84ad8 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2418,6 +2418,7 @@ static int try_to_fix_bad_block(struct btrfs_trans_handle *trans,
 	struct btrfs_path *path;
 	struct btrfs_key k1, k2;
 	int i;
+	int level;
 	int ret;
 
 	if (status != BTRFS_TREE_BLOCK_BAD_KEY_ORDER)
@@ -2435,9 +2436,10 @@ static int try_to_fix_bad_block(struct btrfs_trans_handle *trans,
 	if (!path)
 		return -EIO;
 
-	path->lowest_level = btrfs_header_level(buf);
+	level = btrfs_header_level(buf);
+	path->lowest_level = level;
 	path->skip_check_block = 1;
-	if (btrfs_header_level(buf))
+	if (level)
 		btrfs_node_key_to_cpu(buf, &k1, 0);
 	else
 		btrfs_item_key_to_cpu(buf, &k1, 0);
@@ -2448,9 +2450,9 @@ static int try_to_fix_bad_block(struct btrfs_trans_handle *trans,
 		return -EIO;
 	}
 
-	buf = path->nodes[0];
+	buf = path->nodes[level];
 	for (i = 0; i < btrfs_header_nritems(buf) - 1; i++) {
-		if (btrfs_header_level(buf)) {
+		if (level) {
 			btrfs_node_key_to_cpu(buf, &k1, i);
 			btrfs_node_key_to_cpu(buf, &k2, i + 1);
 		} else {
-- 
1.9.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] btrfs check: Pre-sort keys in a block while searching
  2014-05-05 17:07 [[PATCH] 0/3] btrfs-check: Fix bitflipped keys from bad RAM Hugo Mills
  2014-05-05 17:07 ` [PATCH 1/3] btrfs check: Fix wrong level access Hugo Mills
@ 2014-05-05 17:07 ` Hugo Mills
  2014-05-05 17:07 ` [PATCH 3/3] btrfs check: Attempt to fix misordered keys with bitflips in them Hugo Mills
  2 siblings, 0 replies; 6+ messages in thread
From: Hugo Mills @ 2014-05-05 17:07 UTC (permalink / raw)
  To: linux-btrfs

When we think we might have a messed-up block with keys out of order
(e.g. during fsck), we still need to be able to find a key in the block.
To deal with this, we copy the keys, keeping track of where they came from
in the original node/leaf, sort them, and then do the binary search.

Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
---
 cmds-check.c |  1 +
 ctree.c      | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++----------
 ctree.h      |  2 ++
 3 files changed, 75 insertions(+), 14 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index fc84ad8..b2e4a46 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2439,6 +2439,7 @@ static int try_to_fix_bad_block(struct btrfs_trans_handle *trans,
 	level = btrfs_header_level(buf);
 	path->lowest_level = level;
 	path->skip_check_block = 1;
+	path->bin_search_presort = 1;
 	if (level)
 		btrfs_node_key_to_cpu(buf, &k1, 0);
 	else
diff --git a/ctree.c b/ctree.c
index 9e5b30f..30e1785 100644
--- a/ctree.c
+++ b/ctree.c
@@ -388,6 +388,16 @@ int btrfs_comp_cpu_keys(struct btrfs_key *k1, struct btrfs_key *k2)
 	return 0;
 }
 
+int btrfs_comp_disk_keys(struct btrfs_disk_key *dk1,
+			 struct btrfs_disk_key *dk2)
+{
+	struct btrfs_key k1, k2;
+
+	btrfs_disk_key_to_cpu(&k1, dk1);
+	btrfs_disk_key_to_cpu(&k2, dk2);
+	return btrfs_comp_cpu_keys(&k1, &k2);
+}
+
 /*
  * compare two keys in a memcmp fashion
  */
@@ -598,25 +608,73 @@ static int generic_bin_search(struct extent_buffer *eb, unsigned long p,
 	return 1;
 }
 
+static int cmp_disk_keys(const void *k1, const void *k2)
+{
+	return btrfs_comp_disk_keys((struct btrfs_disk_key *)k1, (struct btrfs_disk_key *)k2);
+}
+
+/* Copy the item keys and their original positions into a second
+ * extent buffer, which can be safely passed to generic_bin_search in
+ * the case where the keys might be out of order.
+ */
+static void sort_key_copy(struct extent_buffer *tgt, struct extent_buffer *src,
+			  int offset, int item_size, int nitems)
+{
+	struct btrfs_disk_key *src_item;
+	struct btrfs_item *tgt_item;
+	int i;
+
+	for (i = 0; i < nitems; i++) {
+		/* We abuse the struct btrfs_item slightly here: the key
+		 * is the key we care about; the offset field is the
+		 * original slot number */
+		src_item = (struct btrfs_disk_key *)(src->data + offset + i*item_size);
+		tgt_item = (struct btrfs_item *)(tgt->data + i*sizeof(struct btrfs_item));
+		memcpy(tgt_item, src_item, sizeof(struct btrfs_disk_key));
+		tgt_item->offset = i;
+	}
+	qsort(tgt->data, nitems, sizeof(struct btrfs_item), cmp_disk_keys);
+}
+
 /*
  * simple bin_search frontend that does the right thing for
  * leaves vs nodes
  */
 static int bin_search(struct extent_buffer *eb, struct btrfs_key *key,
-		      int level, int *slot)
+		      int level, int pre_sort, int *slot)
 {
-	if (level == 0)
-		return generic_bin_search(eb,
-					  offsetof(struct btrfs_leaf, items),
-					  sizeof(struct btrfs_item),
-					  key, btrfs_header_nritems(eb),
-					  slot);
-	else
-		return generic_bin_search(eb,
-					  offsetof(struct btrfs_node, ptrs),
-					  sizeof(struct btrfs_key_ptr),
-					  key, btrfs_header_nritems(eb),
-					  slot);
+	struct extent_buffer *sorted = NULL;
+	int ret;
+	int offset, size, nritems;
+
+	if (level == 0) {
+		offset = offsetof(struct btrfs_leaf, items);
+		size = sizeof(struct btrfs_item);
+	} else {
+		offset = offsetof(struct btrfs_node, ptrs);
+		size = sizeof(struct btrfs_key_ptr);
+	}
+	nritems = btrfs_header_nritems(eb);
+
+	if (pre_sort) {
+		sorted = alloc_extent_buffer(eb->tree, eb->dev_bytenr, eb->len);
+		sort_key_copy(sorted, eb, offset, size, nritems);
+		offset = 0;
+		size = sizeof(struct btrfs_item);
+		eb = sorted;
+	}
+
+	ret = generic_bin_search(eb, offset, size, key, nritems, slot);
+
+	if (pre_sort) {
+		/* We have the sorted slot number, which is probably unhelpful
+		   if the sort changed the order. So, return the original slot
+		   number, not the sorted position. */
+		*slot = ((struct btrfs_item *)(eb->data + (*slot)*size))->offset;
+		free_extent_buffer(sorted);
+	}
+
+	return ret;
 }
 
 struct extent_buffer *read_node_slot(struct btrfs_root *root,
@@ -1075,7 +1133,7 @@ again:
 		ret = check_block(root, p, level);
 		if (ret)
 			return -1;
-		ret = bin_search(b, key, level, &slot);
+		ret = bin_search(b, key, level, p->bin_search_presort, &slot);
 		if (level != 0) {
 			if (ret && slot > 0)
 				slot -= 1;
diff --git a/ctree.h b/ctree.h
index a4d2cd1..4668446 100644
--- a/ctree.h
+++ b/ctree.h
@@ -540,6 +540,8 @@ struct btrfs_path {
 	int reada;
 	/* keep some upper locks as we walk down */
 	int lowest_level;
+	/* For check: Sort the keys in a block before applying a binary search */
+	int bin_search_presort;
 
 	/*
 	 * set by btrfs_split_item, tells search_slot to keep all locks
-- 
1.9.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] btrfs check: Attempt to fix misordered keys with bitflips in them
  2014-05-05 17:07 [[PATCH] 0/3] btrfs-check: Fix bitflipped keys from bad RAM Hugo Mills
  2014-05-05 17:07 ` [PATCH 1/3] btrfs check: Fix wrong level access Hugo Mills
  2014-05-05 17:07 ` [PATCH 2/3] btrfs check: Pre-sort keys in a block while searching Hugo Mills
@ 2014-05-05 17:07 ` Hugo Mills
  2014-05-16 14:22   ` David Sterba
  2 siblings, 1 reply; 6+ messages in thread
From: Hugo Mills @ 2014-05-05 17:07 UTC (permalink / raw)
  To: linux-btrfs

If someone has had bad RAM which has been used to store a metadata block,
there's a chance that one or more of the keys has had a bit changed. The
block checksum doesn't help us here, because it's made on the bad data.

To fix this, if there's a block with a bad key order in it, we find out-of-
order keys by bracketing them between the good keys either side of them,
and then attempting to flip each bit of the key in turn.

If precisely one of those bitflips puts the broken key back into order
relative to its two neighbours, we probably have a fix for the bitflip,
and so we write it back to the FS.

This doesn't repair bitflipped keys at the start or end of a metadata
block, nor bitflips in any other data structure.

Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
---
 cmds-check.c | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 99 insertions(+), 4 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index b2e4a46..c93fea3 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2406,8 +2406,9 @@ static int swap_values(struct btrfs_root *root, struct btrfs_path *path,
 }
 
 /*
- * Attempt to fix basic block failures.  Currently we only handle bad key
- * orders, we will cycle through the keys and swap them if necessary.
+ * Attempt to fix basic block failures. Currently we only handle bad
+ * key orders, we will look for fixable bitflips, and also cycle
+ * through the keys and swap them if necessary.
  */
 static int try_to_fix_bad_block(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root,
@@ -2416,8 +2417,9 @@ static int try_to_fix_bad_block(struct btrfs_trans_handle *trans,
 				enum btrfs_tree_block_status status)
 {
 	struct btrfs_path *path;
-	struct btrfs_key k1, k2;
-	int i;
+	struct btrfs_key k1, k2, k3;
+	int i, j;
+	int bit, field;
 	int level;
 	int ret;
 
@@ -2452,6 +2454,99 @@ static int try_to_fix_bad_block(struct btrfs_trans_handle *trans,
 	}
 
 	buf = path->nodes[level];
+
+	/* First, look for bitflips in keys: we identify these where k1 <
+	 * k3 but k1 >= k2 or k2 >= k3. We can fix a bitflip if there's
+	 * exactly one bit that we can flip that makes k1 < k2 < k3. */
+	for (i = 0; i < btrfs_header_nritems(buf) - 2; i++) {
+		if (level) {
+			btrfs_node_key_to_cpu(buf, &k1, i);
+			btrfs_node_key_to_cpu(buf, &k2, i+1);
+			btrfs_node_key_to_cpu(buf, &k3, i+2);
+		} else {
+			btrfs_item_key_to_cpu(buf, &k1, i);
+			btrfs_item_key_to_cpu(buf, &k2, i+1);
+			btrfs_item_key_to_cpu(buf, &k3, i+2);
+		}
+
+		if (btrfs_comp_cpu_keys(&k1, &k3) >= 0)
+			continue; /* Bracketing keys compare incorrectly:
+						 we can't fix this */
+		if (btrfs_comp_cpu_keys(&k1, &k2) <= 0
+			&& btrfs_comp_cpu_keys(&k2, &k3) <= 0)
+			continue; /* All three keys are in order: nothing to do */
+
+		bit = -1;
+		field = -1;
+		for(j = 0; j < 64; j++) {
+			/* Look for flipped/fixable bits in the objectid */
+			k2.objectid ^= 0x1ULL << j;
+			if (btrfs_comp_cpu_keys(&k1, &k2) <= 0
+				&& btrfs_comp_cpu_keys(&k2, &k3) <= 0) {
+				/* Do nothing if we've already found a flippable bit:
+				 * multiple solutions means we can't know what the
+				 * right thing to do is */
+				if (field != -1) {
+					field = -1;
+					break;
+				}
+				bit = j;
+				field = 0;
+			}
+			k2.objectid ^= 0x1ULL << j;
+
+			/* Look for flipped/fixable bits in the type */
+			if (j < 8) {
+				k2.type ^= 0x1ULL << j;
+				if (btrfs_comp_cpu_keys(&k1, &k2) <= 0
+					&& btrfs_comp_cpu_keys(&k2, &k3) <= 0) {
+					if (field != -1) {
+						field = -1;
+						break;
+					}
+					bit = j;
+					field = 1;
+				}
+				k2.type ^= 0x1ULL << j;
+			}
+
+			/* Look for flipped/fixable bits in the offset */
+			k2.offset ^= 0x1ULL << j;
+			if (btrfs_comp_cpu_keys(&k1, &k2) <= 0
+				&& btrfs_comp_cpu_keys(&k2, &k3) <= 0) {
+				if (field != -1) {
+					field = -1;
+					break;
+				}
+				bit = j;
+				field = 2;
+			}
+			k2.offset ^= 0x1ULL << j;
+		}
+
+		if (field != -1) {
+			fprintf(stderr, "Fixing bitflipped key (%llx, %d, %llx) ",
+					k2.objectid, k2.type, k2.offset);
+			if (field == 0)
+				k2.objectid ^= 0x1ULL << bit;
+			else if (field == 1)
+				k2.type ^= 0x1ULL << bit;
+			else if (field == 2)
+				k2.offset ^= 0x1ULL << bit;
+
+			fprintf(stderr, "to (%llx, %d, %llx)\n",
+					k2.objectid, k2.type, k2.offset);
+
+			/* Write the key back to disk */
+			path->slots[0] = i+1;
+			btrfs_set_item_key_unsafe(root, path, &k2);
+			btrfs_mark_buffer_dirty(buf);
+			/* Go back and test all the keys again */
+			i = 0;
+		}
+	}
+
+	/* Now simply swap keys to put them in order */
 	for (i = 0; i < btrfs_header_nritems(buf) - 1; i++) {
 		if (level) {
 			btrfs_node_key_to_cpu(buf, &k1, i);
-- 
1.9.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 3/3] btrfs check: Attempt to fix misordered keys with bitflips in them
  2014-05-05 17:07 ` [PATCH 3/3] btrfs check: Attempt to fix misordered keys with bitflips in them Hugo Mills
@ 2014-05-16 14:22   ` David Sterba
  2014-05-16 14:43     ` Hugo Mills
  0 siblings, 1 reply; 6+ messages in thread
From: David Sterba @ 2014-05-16 14:22 UTC (permalink / raw)
  To: Hugo Mills; +Cc: linux-btrfs

On Mon, May 05, 2014 at 06:07:51PM +0100, Hugo Mills wrote:
> If precisely one of those bitflips puts the broken key back into order
> relative to its two neighbours, we probably have a fix for the bitflip,
> and so we write it back to the FS.

This sounds safe enough to me.  I'll add the patch to integration but
before I push it further upstream I'd really like to see the bitflip fix
in action, so if you already have testing images, please let me know.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 3/3] btrfs check: Attempt to fix misordered keys with bitflips in them
  2014-05-16 14:22   ` David Sterba
@ 2014-05-16 14:43     ` Hugo Mills
  0 siblings, 0 replies; 6+ messages in thread
From: Hugo Mills @ 2014-05-16 14:43 UTC (permalink / raw)
  To: dsterba, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1157 bytes --]

On Fri, May 16, 2014 at 04:22:36PM +0200, David Sterba wrote:
> On Mon, May 05, 2014 at 06:07:51PM +0100, Hugo Mills wrote:
> > If precisely one of those bitflips puts the broken key back into order
> > relative to its two neighbours, we probably have a fix for the bitflip,
> > and so we write it back to the FS.
> 
> This sounds safe enough to me.  I'll add the patch to integration but
> before I push it further upstream I'd really like to see the bitflip fix
> in action, so if you already have testing images, please let me know.

   Here's the one I mostly used to test with -- it's a 32 GiB sparse
full filesystem image, with a file full of zeroes in it. It has a
single bitflip in the csum tree created by hand with a hex editor, and
then the csum fixed up afterwards, again by hand.

   Hugo.

[1] http://carfax.org.uk/files/temp/testfs.img.tar.gz

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
     --- I don't care about "it works on my machine". We are not ---     
                         shipping your machine.                          

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-05-16 14:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-05 17:07 [[PATCH] 0/3] btrfs-check: Fix bitflipped keys from bad RAM Hugo Mills
2014-05-05 17:07 ` [PATCH 1/3] btrfs check: Fix wrong level access Hugo Mills
2014-05-05 17:07 ` [PATCH 2/3] btrfs check: Pre-sort keys in a block while searching Hugo Mills
2014-05-05 17:07 ` [PATCH 3/3] btrfs check: Attempt to fix misordered keys with bitflips in them Hugo Mills
2014-05-16 14:22   ` David Sterba
2014-05-16 14:43     ` Hugo Mills

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.