* [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.
@ 2015-02-04  7:16 Qu Wenruo
  2015-02-04  7:16 ` [PATCH 1/7] btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result in the same level of path->lowest_level Qu Wenruo
                   ` (10 more replies)
  0 siblings, 11 replies; 18+ messages in thread
From: Qu Wenruo @ 2015-02-04  7:16 UTC
  To: linux-btrfs

Btrfs's metadata csum is a good mechanism that keeps bit errors away
from the sensitive kernel. But the mechanism can also be too sensitive:
a bit error in the csum bytes themselves, or in the low (always zero)
bits of a nodeptr, is enough to reject otherwise intact metadata.
It trades error tolerance for stability, which is reasonable in most
cases since DUP/RAID1/5/6/10 provide redundancy.

But in some cases, whether for development purposes, for a desperate
user who can't tolerate losing all his/her inline data, or even for a
crazy QA team hoping btrfs can survive heavy random bit bombing, some
people want to get rid of the csum protection and face the raw data,
no matter what disaster may happen.

So, introduce the new '--dangerous' (or "destruction"/"debug" if you like)
option for btrfsck to reset the csum of all tree blocks.

The csum resetting has the following features:
1) Top-down, level by level
The csum resetting is done from the root level down to level 1. It
only continues to the next level once the csum of every node in the
current level has been reset and passes the read_tree_block() check.
All bytenrs in nodeptrs are also re-aligned, so a bit error in the low
12 bits (in the 4K sector size case) can be repaired without pain.
With this behavior, an error in a nodeptr has a chance of not affecting
its child.

2) No copy-on-write
COW requires a valid extent tree; if the extent tree is corrupted, COW
will only hit a BUG_ON and block us.
So all modifications in this dangerous mode use no-COW writes. That's
why we export and slightly modify write_tree_block() to do a no-COW
tree block write with the newly calculated csum.
Since the write is not COWed, if it fails, it will also destroy the
last hope for manual inspection.
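A minimal sketch of the re-alignment described in 1), assuming a
power-of-two sectorsize. align_bytenr() is a hypothetical helper, not
btrfs-progs API. It does the same masking as round_down() in the
patches. It also shows the limit of the trick: a flipped bit at or
above bit 12 still gives an aligned (but wrong) bytenr, so the trick
only repairs errors in the low 12 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring round_down(bytenr, sectorsize).
 * A valid tree block bytenr is sector aligned, so its low
 * log2(sectorsize) bits must be zero; clearing them undoes any
 * bit flip that landed there.
 */
static uint64_t align_bytenr(uint64_t bytenr, uint32_t sectorsize)
{
	/* Masking only implements round_down() for power-of-two sizes */
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
	return bytenr & ~((uint64_t)sectorsize - 1);
}
```

For example, align_bytenr(0x1C00008, 4096) gives back 0x1C00000.
0x1C01000 (bit 12 flipped) is already aligned, so it passes through
unchanged.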

Qu Wenruo (7):
  btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result
    in the same level of path->lowest_level.
  btrfs-progs: Introduce btrfs_next_slot() function to iterate to next
    slot in given level.
  btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node.
  btrfs-progs: Export write_tree_block() and allow it to do nocow write.
  btrfs-progs: Introduce new function reset_tree_block_csum() for later
    tree block csum reset.
  btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to
    reset one/all tree's csum in tree root.
  btrfs-progs: Introduce "--dangerous" option to reset all tree block
    csum.

 cmds-check.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 ctree.c      |  18 ++--
 ctree.h      |  25 +++++-
 disk-io.c    |  55 +++++++++---
 disk-io.h    |   3 +
 5 files changed, 359 insertions(+), 26 deletions(-)

-- 
2.2.2


* [PATCH 1/7] btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result in the same level of path->lowest_level.
  2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
@ 2015-02-04  7:16 ` Qu Wenruo
  2015-02-04  7:16 ` [PATCH 2/7] btrfs-progs: Introduce btrfs_next_slot() function to iterate to next slot in given level Qu Wenruo
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2015-02-04  7:16 UTC
  To: linux-btrfs

Before this patch, btrfs_(prev/next)_leaf() always searches for a leaf,
even if path->lowest_level is set.

This was OK since nobody needed such a function.

But later patches that clean up the csum of tree blocks need to iterate
tree blocks level by level, and there such a function will be very
handy.

This patch just modifies the original btrfs_(prev/next)_leaf() to stop
at path->lowest_level, and adds btrfs_(prev/next)_tree_block() as
aliases of them.
Since callers of btrfs_(prev/next)_leaf() don't set path->lowest_level
(it stays 0), nothing changes for them.
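Below is a minimal, hypothetical model of the lowest_level-bounded walk
described above. struct toy_node, struct toy_path and toy_next_block()
are made-up names, not btrfs-progs API. The sketch only shows the idea:
walk up from lowest_level + 1 until an ancestor still has a slot to the
right. Then walk back down along leftmost slots, stopping at
lowest_level instead of level 0. It leaves out the real function's
I/O, readahead and extent buffer refcounting.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LEVEL 8
#define TOY_MAX_ITEMS 4

/* Hypothetical in-memory tree node; level 0 nodes are leaves. */
struct toy_node {
	int nritems;
	struct toy_node *child[TOY_MAX_ITEMS];
};

/* Hypothetical stand-in for struct btrfs_path. */
struct toy_path {
	struct toy_node *nodes[TOY_MAX_LEVEL];
	int slots[TOY_MAX_LEVEL];
	int lowest_level;
	int root_level;
};

/*
 * Move the path to the next block at p->lowest_level.
 * Returns 0 on success, 1 if there is no block further right.
 */
static int toy_next_block(struct toy_path *p)
{
	int level = p->lowest_level + 1;

	/* Walk up until an ancestor still has a slot to the right. */
	while (level <= p->root_level &&
	       p->slots[level] + 1 >= p->nodes[level]->nritems)
		level++;
	if (level > p->root_level)
		return 1;

	p->slots[level]++;
	/* Walk back down along leftmost slots, stopping at lowest_level. */
	while (level > p->lowest_level) {
		struct toy_node *next = p->nodes[level]->child[p->slots[level]];

		level--;
		p->nodes[level] = next;
		p->slots[level] = 0;
	}
	return 0;
}
```

With lowest_level = 0 the model behaves like the old leaf-only walk.
With lowest_level = 1 it visits the level-1 nodes from left to right.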

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 ctree.c | 18 +++++++++++-------
 ctree.h | 12 ++++++++++++
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/ctree.c b/ctree.c
index 130c61f..2d1da1c 100644
--- a/ctree.c
+++ b/ctree.c
@@ -2750,13 +2750,15 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 
 /*
  * walk up the tree as far as required to find the previous leaf.
+ * result will be at the same level of path->lowest_level.
+ *
  * returns 0 if it found something or 1 if there are no lesser leaves.
  * returns < 0 on io errors.
  */
 int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path)
 {
-	int slot;
-	int level = 1;
+	int slot = 0;
+	int level = path->lowest_level + 1;
 	struct extent_buffer *c;
 	struct extent_buffer *next = NULL;
 
@@ -2792,7 +2794,7 @@ int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path)
 			slot--;
 		path->nodes[level] = next;
 		path->slots[level] = slot;
-		if (!level)
+		if (level == path->lowest_level)
 			break;
 		next = read_node_slot(root, next, slot);
 		if (!extent_buffer_uptodate(next)) {
@@ -2805,14 +2807,16 @@ int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path)
 }
 
 /*
- * walk up the tree as far as required to find the next leaf.
+ * walk up the tree as far as required to find the next node/leaf.
+ * result will be at the same level of path->lowest_level.
+ *
  * returns 0 if it found something or 1 if there are no greater leaves.
  * returns < 0 on io errors.
  */
 int btrfs_next_leaf(struct btrfs_root *root, struct btrfs_path *path)
 {
-	int slot;
-	int level = 1;
+	int slot = 0;
+	int level = path->lowest_level + 1;
 	struct extent_buffer *c;
 	struct extent_buffer *next = NULL;
 
@@ -2844,7 +2848,7 @@ int btrfs_next_leaf(struct btrfs_root *root, struct btrfs_path *path)
 		free_extent_buffer(c);
 		path->nodes[level] = next;
 		path->slots[level] = 0;
-		if (!level)
+		if (level == path->lowest_level)
 			break;
 		if (path->reada)
 			reada_for_search(root, path, level, 0, 0);
diff --git a/ctree.h b/ctree.h
index 2a678a9..c52d3de 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2334,6 +2334,12 @@ static inline int btrfs_insert_empty_item(struct btrfs_trans_handle *trans,
 }
 
 int btrfs_next_leaf(struct btrfs_root *root, struct btrfs_path *path);
+static inline int btrfs_next_tree_block(struct btrfs_root *root,
+					struct btrfs_path *path)
+{
+	return btrfs_next_leaf(root, path);
+}
+
 static inline int btrfs_next_item(struct btrfs_root *root,
 				  struct btrfs_path *p)
 {
@@ -2344,6 +2350,12 @@ static inline int btrfs_next_item(struct btrfs_root *root,
 }
 
 int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path);
+static inline int btrfs_prev_tree_block(struct btrfs_root *root,
+					struct btrfs_path *path)
+{
+	return btrfs_prev_leaf(root, path);
+}
+
 int btrfs_leaf_free_space(struct btrfs_root *root, struct extent_buffer *leaf);
 void btrfs_fixup_low_keys(struct btrfs_root *root, struct btrfs_path *path,
 			  struct btrfs_disk_key *key, int level);
-- 
2.2.2


* [PATCH 2/7] btrfs-progs: Introduce btrfs_next_slot() function to iterate to next slot in given level.
  2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
  2015-02-04  7:16 ` [PATCH 1/7] btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result in the same level of path->lowest_level Qu Wenruo
@ 2015-02-04  7:16 ` Qu Wenruo
  2015-02-04  7:16 ` [PATCH 3/7] btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node Qu Wenruo
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2015-02-04  7:16 UTC
  To: linux-btrfs

This generalizes btrfs_next_item() to an arbitrary level, which helps a
lot with level-by-level iteration.
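Below is a minimal, hypothetical model of the btrfs_next_slot() logic,
with one tree level flattened into a left-to-right list of blocks.
struct toy_level, struct toy_cursor and toy_next_slot() are
illustrative names only. The hop to the next block stands in for what
btrfs_next_tree_block() does in the patch. The model assumes no block
is empty, as in a valid btrfs node level.

```c
#include <assert.h>

#define TOY_MAX_BLOCKS 8

/* Hypothetical view of one tree level: blocks left to right. */
struct toy_level {
	int nblocks;
	int nritems[TOY_MAX_BLOCKS];	/* assumed non-zero */
};

/* Hypothetical cursor: which block in the level, and which slot in it. */
struct toy_cursor {
	int block;
	int slot;
};

/*
 * Same shape as btrfs_next_slot(): bump the slot, and only when it runs
 * past nritems hop to slot 0 of the next block at the same level.
 * Returns 0 on success, 1 when the level is exhausted.
 */
static int toy_next_slot(const struct toy_level *lv, struct toy_cursor *c)
{
	assert(c->block >= 0 && c->block < lv->nblocks);
	if (++c->slot < lv->nritems[c->block])
		return 0;
	if (c->block + 1 >= lv->nblocks)
		return 1;
	c->block++;
	c->slot = 0;
	return 0;
}
```

Starting at block 0, slot 0 and calling toy_next_slot() until it
returns 1 visits every slot of the level exactly once.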

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 ctree.h | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/ctree.h b/ctree.h
index c52d3de..0dfe733 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2340,13 +2340,18 @@ static inline int btrfs_next_tree_block(struct btrfs_root *root,
 	return btrfs_next_leaf(root, path);
 }
 
+static inline int btrfs_next_slot(struct btrfs_root *root,
+				  struct btrfs_path *p, int level)
+{
+	++p->slots[level];
+	if (p->slots[level] >= btrfs_header_nritems(p->nodes[level]))
+		return btrfs_next_tree_block(root, p);
+	return 0;
+}
 static inline int btrfs_next_item(struct btrfs_root *root,
 				  struct btrfs_path *p)
 {
-	++p->slots[0];
-	if (p->slots[0] >= btrfs_header_nritems(p->nodes[0]))
-		return btrfs_next_leaf(root, p);
-	return 0;
+	return btrfs_next_slot(root, p, 0);
 }
 
 int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path);
-- 
2.2.2


* [PATCH 3/7] btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node.
  2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
  2015-02-04  7:16 ` [PATCH 1/7] btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result in the same level of path->lowest_level Qu Wenruo
  2015-02-04  7:16 ` [PATCH 2/7] btrfs-progs: Introduce btrfs_next_slot() function to iterate to next slot in given level Qu Wenruo
@ 2015-02-04  7:16 ` Qu Wenruo
  2015-02-04  7:16 ` [PATCH 4/7] btrfs-progs: Export write_tree_block() and allow it to do nocow write Qu Wenruo
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2015-02-04  7:16 UTC
  To: linux-btrfs

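With this patch, btrfs_read_fs_root() will try to re-read the root node
of the tree root, extent, chunk, dev, csum or quota tree if that node
is not up to date.

This helps the tree block csum resetting function, which may have just
rewritten the csum of that node on disk.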

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 disk-io.c | 51 +++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index cb18002..747ef15 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -713,20 +713,47 @@ struct btrfs_root *btrfs_read_fs_root(struct btrfs_fs_info *fs_info,
 	struct btrfs_root *root;
 	struct rb_node *node;
 	int ret;
+	int found = 0;
 	u64 objectid = location->objectid;
 
-	if (location->objectid == BTRFS_ROOT_TREE_OBJECTID)
-		return fs_info->tree_root;
-	if (location->objectid == BTRFS_EXTENT_TREE_OBJECTID)
-		return fs_info->extent_root;
-	if (location->objectid == BTRFS_CHUNK_TREE_OBJECTID)
-		return fs_info->chunk_root;
-	if (location->objectid == BTRFS_DEV_TREE_OBJECTID)
-		return fs_info->dev_root;
-	if (location->objectid == BTRFS_CSUM_TREE_OBJECTID)
-		return fs_info->csum_root;
-	if (location->objectid == BTRFS_QUOTA_TREE_OBJECTID)
-		return fs_info->quota_root;
+	if (location->objectid == BTRFS_ROOT_TREE_OBJECTID) {
+		root = fs_info->tree_root;
+		found = 1;
+	}
+	if (location->objectid == BTRFS_EXTENT_TREE_OBJECTID) {
+		root = fs_info->extent_root;
+		found = 1;
+	}
+	if (location->objectid == BTRFS_CHUNK_TREE_OBJECTID) {
+		root = fs_info->chunk_root;
+		found = 1;
+	}
+	if (location->objectid == BTRFS_DEV_TREE_OBJECTID) {
+		root = fs_info->dev_root;
+		found = 1;
+	}
+	if (location->objectid == BTRFS_CSUM_TREE_OBJECTID) {
+		root = fs_info->csum_root;
+		found = 1;
+	}
+	if (location->objectid == BTRFS_QUOTA_TREE_OBJECTID) {
+		root = fs_info->quota_root;
+		found = 1;
+	}
+	/*
+	 * The specified root has corruption. We should try to reread
+	 * it since its treeblock csum may be reseted.
+	 */
+	if (found && !extent_buffer_uptodate(root->node)) {
+		u64 bytenr = btrfs_root_bytenr(&root->root_item);
+
+		root->node = read_tree_block(root, bytenr, root->nodesize, 0);
+		if (!extent_buffer_uptodate(root->node)) {
+			if (IS_ERR(root->node))
+				return ERR_PTR(PTR_ERR(root->node));
+			return ERR_PTR(-EIO);
+		}
+	}
 
 	BUG_ON(location->objectid == BTRFS_TREE_RELOC_OBJECTID ||
 	       location->offset != (u64)-1);
-- 
2.2.2


* [PATCH 4/7] btrfs-progs: Export write_tree_block() and allow it to do nocow write.
  2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
                   ` (2 preceding siblings ...)
  2015-02-04  7:16 ` [PATCH 3/7] btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node Qu Wenruo
@ 2015-02-04  7:16 ` Qu Wenruo
  2015-02-04  7:16 ` [PATCH 4/5] btrfs-progs: Introduce new function reset_tree_block_csum() for later tree block csum reset Qu Wenruo
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2015-02-04  7:16 UTC
  To: linux-btrfs

Export the write_tree_block() function, and allow it to write a tree
block to disk without a transaction (trans == NULL).

This is useful for resetting tree block csums. The reset is done level
by level, so btrfs_search_slot() does a lowest_level search, and that
search can't do COW (COW with lowest_level would screw up the extent
backrefs).

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 disk-io.c | 4 ++--
 disk-io.h | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 747ef15..755cb7c 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -371,7 +371,7 @@ int write_and_map_eb(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
-static int write_tree_block(struct btrfs_trans_handle *trans,
+int write_tree_block(struct btrfs_trans_handle *trans,
 		     struct btrfs_root *root,
 		     struct extent_buffer *eb)
 {
@@ -380,7 +380,7 @@ static int write_tree_block(struct btrfs_trans_handle *trans,
 		BUG();
 	}
 
-	if (!btrfs_buffer_uptodate(eb, trans->transid))
+	if (trans && !btrfs_buffer_uptodate(eb, trans->transid))
 		BUG();
 
 	btrfs_set_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN);
diff --git a/disk-io.h b/disk-io.h
index c3eceaa..0cbe2dc 100644
--- a/disk-io.h
+++ b/disk-io.h
@@ -59,6 +59,9 @@ struct btrfs_device;
 int read_whole_eb(struct btrfs_fs_info *info, struct extent_buffer *eb, int mirror);
 struct extent_buffer *read_tree_block(struct btrfs_root *root, u64 bytenr,
 				      u32 blocksize, u64 parent_transid);
+int write_tree_block(struct btrfs_trans_handle *trans,
+		     struct btrfs_root *root,
+		     struct extent_buffer *eb);
 void readahead_tree_block(struct btrfs_root *root, u64 bytenr, u32 blocksize,
 			  u64 parent_transid);
 struct extent_buffer *btrfs_find_create_tree_block(struct btrfs_root *root,
-- 
2.2.2


* [PATCH 4/5] btrfs-progs: Introduce new function reset_tree_block_csum() for later tree block csum reset.
  2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
                   ` (3 preceding siblings ...)
  2015-02-04  7:16 ` [PATCH 4/7] btrfs-progs: Export write_tree_block() and allow it to do nocow write Qu Wenruo
@ 2015-02-04  7:16 ` Qu Wenruo
  2015-02-04  7:16 ` [PATCH 5/5] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root Qu Wenruo
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2015-02-04  7:16 UTC
  To: linux-btrfs

The new function reset_tree_block_csum() does the black magic of
resetting the csum of a tree block in place.

This provides the basis for the whole-tree csum resetting function.
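The sketch below is a self-contained illustration of the recompute
step in reset_tree_block_csum(). It assumes the default crc32c csum
type and our reading of btrfs_csum_data()/btrfs_csum_final(). Under
that reading the stored value is the standard CRC-32C of every byte
after the BTRFS_CSUM_SIZE (32 byte) csum area, written little-endian
into the first 4 bytes. TOY_CSUM_SIZE, crc32c_update(),
reset_block_csum() and block_csum_ok() are hypothetical names. The
bitwise CRC is slow but easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CSUM_SIZE 32	/* mirrors BTRFS_CSUM_SIZE */

/* Bitwise CRC-32C update (reflected poly 0x82F63B78), no pre/post inversion. */
static uint32_t crc32c_update(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Seed with ~0, checksum everything after the csum area, then invert. */
static uint32_t block_crc(const unsigned char *buf, size_t len)
{
	assert(len > TOY_CSUM_SIZE);
	return ~crc32c_update(~0u, buf + TOY_CSUM_SIZE, len - TOY_CSUM_SIZE);
}

/* Overwrite the stored csum (little-endian, first 4 bytes) in place. */
static void reset_block_csum(unsigned char *buf, size_t len)
{
	uint32_t crc = block_crc(buf, len);

	for (int i = 0; i < 4; i++)
		buf[i] = (unsigned char)(crc >> (8 * i));
}

/* Return 1 if the stored csum matches the block contents. */
static int block_csum_ok(const unsigned char *buf, size_t len)
{
	uint32_t crc = block_crc(buf, len);

	for (int i = 0; i < 4; i++)
		if (buf[i] != (unsigned char)(crc >> (8 * i)))
			return 0;
	return 1;
}
```

Note what the reset does and does not do. A flipped bit inside the
csum bytes is "repaired" by the reset. A flipped bit in the payload is
simply blessed with a new csum. That is exactly why the mode is
dangerous.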

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 cmds-check.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 5817ecf..e4b4f4a 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9137,6 +9137,62 @@ static u16 report_root_corrupted(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
+/*
+ * Black magics to reset the csum of a tree block.
+ * The evil part is to modify block without transaction/cow.
+ *
+ * Return 0 if the csum is OK or reset the csum
+ * Return <0 if error happened
+ */
+static int reset_tree_block_csum(struct btrfs_fs_info *fs_info,
+				 u64 bytenr, u32 len)
+{
+	/*
+	 * read_tree_block just use root as ladder to reach fs_info,
+	 * so use chunk_root since it must be OK.
+	 */
+	struct btrfs_root *root = fs_info->chunk_root;
+	struct extent_buffer *eb;
+	char *buf = NULL;
+	u32 crc;
+	int ret = 0;
+
+	eb = read_tree_block(root, bytenr, len, 0);
+	/* No need to do anything since its csum is OK */
+	if (extent_buffer_uptodate(eb))
+		goto out;
+
+	buf = malloc(len);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	ret = read_data_from_disk(fs_info, buf, bytenr, len, 0);
+	if (ret < 0)
+		goto out;
+	crc = ~(u32)0;
+	crc = btrfs_csum_data(NULL, buf + BTRFS_CSUM_SIZE, crc,
+			      len - BTRFS_CSUM_SIZE);
+	btrfs_csum_final(crc, buf);
+	ret = write_data_to_disk(fs_info, buf, bytenr, len, 0);
+	if (ret < 0)
+		goto out;
+
+	/* Make sure now we can read the tree block */
+	eb = read_tree_block(root, bytenr, len, 0);
+	if (!extent_buffer_uptodate(eb)) {
+		if (IS_ERR(eb))
+			ret = PTR_ERR(eb);
+		else
+			ret = -EINVAL;
+		goto out;
+	}
+out:
+	free(buf);
+	free_extent_buffer(eb);
+	return ret;
+}
+
 const char * const cmd_check_usage[] = {
 	"btrfs check [options] <device>",
 	"Check an unmounted btrfs filesystem.",
-- 
2.2.2


* [PATCH 5/5] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root.
  2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
                   ` (4 preceding siblings ...)
  2015-02-04  7:16 ` [PATCH 4/5] btrfs-progs: Introduce new function reset_tree_block_csum() for later tree block csum reset Qu Wenruo
@ 2015-02-04  7:16 ` Qu Wenruo
  2015-02-04  7:16 ` [PATCH 5/7] btrfs-progs: Introduce new function reset_tree_block_csum() for later tree block csum reset Qu Wenruo
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2015-02-04  7:16 UTC
  To: linux-btrfs

The new function reset_one_root_csum() resets all csums in one tree,
and reset_roots_csum() resets the csums of all trees referenced by the
tree root. This provides the basis for the later dangerous option.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 cmds-check.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 157 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index e4b4f4a..2cdd3d9 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9193,6 +9193,163 @@ out:
 	return ret;
 }
 
+static int reset_one_root_csum(struct btrfs_trans_handle *trans,
+			       struct btrfs_root *root)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct extent_buffer *node;
+	u32 sectorsize = root->sectorsize;
+	int max_level = btrfs_root_level(&root->root_item);
+	int slot;
+	int i;
+	int ret = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+	path->reada = 0;
+
+	/* Iterate all levels except level 0 */
+	for (i = 0; i < max_level; i++) {
+		int cur_level = max_level - i;
+
+		path->lowest_level = cur_level;
+		key.offset = 0;
+		key.objectid = 0;
+		key.type = 0;
+
+		ret = btrfs_search_slot(trans, root, &key, path, 1, 1);
+		if (ret < 0)
+			goto out;
+
+		/* Iterate all node slots in this level */
+		while (1) {
+			u64 bytenr;
+
+			node = path->nodes[0];
+			slot = path->slots[0];
+			bytenr = btrfs_node_blockptr(node, slot);
+
+			if (bytenr != round_down(bytenr, sectorsize)) {
+				bytenr = round_down(bytenr, sectorsize);
+				btrfs_set_node_blockptr(node, slot, bytenr);
+				btrfs_mark_buffer_dirty(node);
+			}
+			ret = reset_tree_block_csum(fs_info, bytenr,
+						    root->nodesize);
+			if (ret < 0) {
+				fprintf(stderr,
+					"Fail to reset csum for tree block at %llu\n",
+					bytenr);
+				goto next_slot;
+			}
+next_slot:
+			ret = btrfs_next_slot(root, path, cur_level);
+			/*
+			 * Error should not happen since higher level iteration
+			 * has already reset the csum of this level.
+			 * Either way, goto next level should be OK.
+			 */
+			if (ret)
+				break;
+		}
+		btrfs_release_path(path);
+	}
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+static int reset_roots_csum(struct btrfs_trans_handle *trans,
+			    struct btrfs_root *tree_root)
+{
+	struct btrfs_fs_info *fs_info = tree_root->fs_info;
+	struct btrfs_key key;
+	struct btrfs_path *path;
+	struct btrfs_root_item *ri;
+	struct btrfs_root *root;
+	struct extent_buffer *node;
+	u32 nodesize = tree_root->nodesize;
+	u32 sectorsize = tree_root->sectorsize;
+	u64 bytenr;
+	int slot;
+	int ret = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = 0;
+	key.offset = 0;
+	key.type = 0;
+
+	ret = btrfs_search_slot(trans, tree_root, &key, path, 1, 0);
+	if (ret < 0)
+		goto out;
+	while (1) {
+		slot = path->slots[0];
+		node = path->nodes[0];
+		btrfs_item_key_to_cpu(node, &key, slot);
+		if (key.type != BTRFS_ROOT_ITEM_KEY)
+			goto next;
+
+		/*
+		 * skip tree reloc tree, it's not support by
+		 * btrfs_read_fs_root() yet.
+		 */
+		if (key.objectid == BTRFS_TREE_RELOC_OBJECTID)
+			goto next;
+
+		ri = btrfs_item_ptr(node, slot, struct btrfs_root_item);
+		bytenr = btrfs_disk_root_bytenr(node, ri);
+
+		/*
+		 * Check if the bytenr is aligned.
+		 * If the error bit is only in the low 12bits, no real damage
+		 * will happen since we will align it.
+		 */
+		if (round_down(bytenr, sectorsize) != bytenr) {
+			bytenr = round_down(bytenr, sectorsize);
+			btrfs_set_disk_root_bytenr(node, ri, bytenr);
+			btrfs_mark_buffer_dirty(node);
+		}
+
+		ret = reset_tree_block_csum(fs_info, bytenr, nodesize);
+		if (ret < 0) {
+			fprintf(stderr,
+				"Failed to reset root for tree %llu, skip it\n",
+				key.objectid);
+			goto next;
+		}
+		key.offset = (u64)-1;
+		root = btrfs_read_fs_root(fs_info, &key);
+		if (!root || IS_ERR(root) ||
+		    !extent_buffer_uptodate(root->node)) {
+			fprintf(stderr,
+				"Root of tree %lld is still corrupted, skip it\n",
+				key.objectid);
+			goto next;
+		}
+		ret = reset_one_root_csum(trans, root);
+		if (ret < 0)
+			goto next;
+
+next:
+		ret = btrfs_next_item(tree_root, path);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+	}
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
 const char * const cmd_check_usage[] = {
 	"btrfs check [options] <device>",
 	"Check an unmounted btrfs filesystem.",
-- 
2.2.2


* [PATCH 5/7] btrfs-progs: Introduce new function reset_tree_block_csum() for later tree block csum reset.
  2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
                   ` (5 preceding siblings ...)
  2015-02-04  7:16 ` [PATCH 5/5] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root Qu Wenruo
@ 2015-02-04  7:16 ` Qu Wenruo
  2015-02-04  7:16 ` [PATCH 6/7] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root Qu Wenruo
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2015-02-04  7:16 UTC
  To: linux-btrfs

The new function reset_tree_block_csum() does the black magic of
resetting the csum of a tree block in place.

This provides the basis for the whole-tree csum resetting function.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 cmds-check.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 5817ecf..e4b4f4a 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9137,6 +9137,62 @@ static u16 report_root_corrupted(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
+/*
+ * Black magics to reset the csum of a tree block.
+ * The evil part is to modify block without transaction/cow.
+ *
+ * Return 0 if the csum is OK or reset the csum
+ * Return <0 if error happened
+ */
+static int reset_tree_block_csum(struct btrfs_fs_info *fs_info,
+				 u64 bytenr, u32 len)
+{
+	/*
+	 * read_tree_block just use root as ladder to reach fs_info,
+	 * so use chunk_root since it must be OK.
+	 */
+	struct btrfs_root *root = fs_info->chunk_root;
+	struct extent_buffer *eb;
+	char *buf = NULL;
+	u32 crc;
+	int ret = 0;
+
+	eb = read_tree_block(root, bytenr, len, 0);
+	/* No need to do anything since its csum is OK */
+	if (extent_buffer_uptodate(eb))
+		goto out;
+
+	buf = malloc(len);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	ret = read_data_from_disk(fs_info, buf, bytenr, len, 0);
+	if (ret < 0)
+		goto out;
+	crc = ~(u32)0;
+	crc = btrfs_csum_data(NULL, buf + BTRFS_CSUM_SIZE, crc,
+			      len - BTRFS_CSUM_SIZE);
+	btrfs_csum_final(crc, buf);
+	ret = write_data_to_disk(fs_info, buf, bytenr, len, 0);
+	if (ret < 0)
+		goto out;
+
+	/* Make sure now we can read the tree block */
+	eb = read_tree_block(root, bytenr, len, 0);
+	if (!extent_buffer_uptodate(eb)) {
+		if (IS_ERR(eb))
+			ret = PTR_ERR(eb);
+		else
+			ret = -EINVAL;
+		goto out;
+	}
+out:
+	free(buf);
+	free_extent_buffer(eb);
+	return ret;
+}
+
 const char * const cmd_check_usage[] = {
 	"btrfs check [options] <device>",
 	"Check an unmounted btrfs filesystem.",
-- 
2.2.2


* [PATCH 6/7] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root.
  2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
                   ` (6 preceding siblings ...)
  2015-02-04  7:16 ` [PATCH 5/7] btrfs-progs: Introduce new function reset_tree_block_csum() for later tree block csum reset Qu Wenruo
@ 2015-02-04  7:16 ` Qu Wenruo
  2015-02-04  7:16 ` [PATCH 7/7] btrfs-progs: Introduce "--dangerous" option to reset all tree block csum Qu Wenruo
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2015-02-04  7:16 UTC
  To: linux-btrfs

The new function reset_one_root_csum() resets all csums in one tree,
and reset_roots_csum() resets the csums of all trees referenced by the
tree root. This provides the basis for the later dangerous option.
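Below is a minimal in-memory model of the top-down, level-by-level
order these functions follow. It is a sketch under stated
assumptions: struct toy_block and toy_reset_tree() are hypothetical.
The csum_ok flag stands in for "passes read_tree_block()". Children are
reached through explicit pointers instead of by reading the (aligned)
bytenr from disk. As in reset_roots_csum(), the root block is fixed
first. As in reset_one_root_csum(), every child pointer of level L is
then aligned and its child reset before any node at level L-1 is
visited.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_ITEMS 4
#define TOY_MAX_NODES 64

/* Hypothetical in-memory tree block. */
struct toy_block {
	int level;				/* 0 = leaf */
	int nritems;
	int csum_ok;				/* stands in for a good csum */
	uint64_t blockptr[TOY_MAX_ITEMS];	/* child bytenrs, may have low-bit errors */
	struct toy_block *child[TOY_MAX_ITEMS];
};

/* Reset csums top-down; returns how many blocks needed a reset. */
static int toy_reset_tree(struct toy_block *root, uint32_t sectorsize)
{
	struct toy_block *cur[TOY_MAX_NODES], *next[TOY_MAX_NODES];
	int ncur = 1, nnext, reset = 0;

	if (!root->csum_ok) {
		root->csum_ok = 1;
		reset++;
	}
	cur[0] = root;
	/* Levels from the root down to 1; leaves have no nodeptrs. */
	for (int level = root->level; level > 0; level--) {
		nnext = 0;
		for (int i = 0; i < ncur; i++) {
			struct toy_block *node = cur[i];

			for (int slot = 0; slot < node->nritems; slot++) {
				/* Re-align the nodeptr like round_down() does. */
				node->blockptr[slot] &= ~((uint64_t)sectorsize - 1);
				if (!node->child[slot]->csum_ok) {
					node->child[slot]->csum_ok = 1;
					reset++;
				}
				assert(nnext < TOY_MAX_NODES);
				next[nnext++] = node->child[slot];
			}
		}
		/* Only now descend: the whole level has been handled. */
		for (int i = 0; i < nnext; i++)
			cur[i] = next[i];
		ncur = nnext;
	}
	return reset;
}
```

Running it a second time on the same tree resets nothing. That matches
the early return in reset_tree_block_csum() for blocks whose csum
already passes.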

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 cmds-check.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index e4b4f4a..535a518 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9193,6 +9193,182 @@ out:
 	return ret;
 }
 
+static int reset_one_root_csum(struct btrfs_root *root)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct extent_buffer *node;
+	u32 sectorsize = root->sectorsize;
+	int max_level = btrfs_root_level(&root->root_item);
+	int slot;
+	int i;
+	int ret = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+	path->reada = 0;
+
+	/* Iterate all levels except level 0 */
+	for (i = 0; i < max_level; i++) {
+		int cur_level = max_level - i;
+
+		path->lowest_level = cur_level;
+		key.offset = 0;
+		key.objectid = 0;
+		key.type = 0;
+
+		/*
+		 * btrfs_search_slot() can't work with lowest_level and cow,
+		 * so use inslen=0 and cow=0 and later modify will be done
+		 * directly to disk.
+		 */
+		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+		if (ret < 0)
+			goto out;
+
+		/* Iterate all node slots in this level */
+		while (1) {
+			u64 bytenr;
+
+			node = path->nodes[0];
+			slot = path->slots[0];
+			bytenr = btrfs_node_blockptr(node, slot);
+
+			if (bytenr != round_down(bytenr, sectorsize)) {
+				bytenr = round_down(bytenr, sectorsize);
+				btrfs_set_node_blockptr(node, slot, bytenr);
+				ret = write_tree_block(NULL, root, node);
+				if (ret < 0) {
+					fprintf(stderr,
+					"Fail to write extent at %llu\n",
+						bytenr);
+					goto out;
+				}
+			}
+			ret = reset_tree_block_csum(fs_info, bytenr,
+						    root->nodesize);
+			if (ret < 0) {
+				fprintf(stderr,
+					"Fail to reset csum for tree block at %llu\n",
+					bytenr);
+				goto out;
+			}
+
+			ret = btrfs_next_slot(root, path, cur_level);
+			/*
+			 * Error should not happen since higher level iteration
+			 * has already reset the csum of this level.
+			 * Either way, goto next level should be OK.
+			 */
+			if (ret)
+				break;
+		}
+		btrfs_release_path(path);
+	}
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+static int reset_roots_csum(struct btrfs_root *tree_root)
+{
+	struct btrfs_fs_info *fs_info = tree_root->fs_info;
+	struct btrfs_key key;
+	struct btrfs_path *path;
+	struct btrfs_root_item *ri;
+	struct btrfs_root *root;
+	struct extent_buffer *node;
+	u32 nodesize = tree_root->nodesize;
+	u32 sectorsize = tree_root->sectorsize;
+	u64 bytenr;
+	int slot;
+	int ret = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = 0;
+	key.offset = 0;
+	key.type = 0;
+
+	/*
+	 * Tree root is OK and we can do cow. But in case extent tree is
+	 * corrupted, we still use the nocow method.
+	 */
+	ret = btrfs_search_slot(NULL, tree_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+	while (1) {
+		slot = path->slots[0];
+		node = path->nodes[0];
+		btrfs_item_key_to_cpu(node, &key, slot);
+		if (key.type != BTRFS_ROOT_ITEM_KEY)
+			goto next;
+
+		/*
+		 * skip tree reloc tree, it's not support by
+		 * btrfs_read_fs_root() yet.
+		 */
+		if (key.objectid == BTRFS_TREE_RELOC_OBJECTID)
+			goto next;
+
+		ri = btrfs_item_ptr(node, slot, struct btrfs_root_item);
+		bytenr = btrfs_disk_root_bytenr(node, ri);
+
+		/*
+		 * Check if the bytenr is aligned.
+		 * If the error bit is only in the low all zero bits,
+		 * no real damage will happen since we will align it.
+		 */
+		if (round_down(bytenr, sectorsize) != bytenr) {
+			bytenr = round_down(bytenr, sectorsize);
+			btrfs_set_disk_root_bytenr(node, ri, bytenr);
+			ret = write_tree_block(NULL, root, node);
+			if (ret < 0) {
+				fprintf(stderr,
+				"Fail to write extent at %llu\n",
+					bytenr);
+				goto out;
+			}
+		}
+
+		ret = reset_tree_block_csum(fs_info, bytenr, nodesize);
+		if (ret < 0) {
+			fprintf(stderr,
+				"Failed to reset root for tree %llu, skip it\n",
+				key.objectid);
+			goto next;
+		}
+		key.offset = (u64)-1;
+		root = btrfs_read_fs_root(fs_info, &key);
+		if (!root || IS_ERR(root) ||
+		    !extent_buffer_uptodate(root->node)) {
+			fprintf(stderr,
+				"Root of tree %lld is still corrupted, skip it\n",
+				key.objectid);
+			goto next;
+		}
+		ret = reset_one_root_csum(root);
+		if (ret < 0)
+			goto next;
+
+next:
+		ret = btrfs_next_item(tree_root, path);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+	}
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
 const char * const cmd_check_usage[] = {
 	"btrfs check [options] <device>",
 	"Check an unmounted btrfs filesystem.",
-- 
2.2.2


* [PATCH 7/7] btrfs-progs: Introduce "--dangerous" option to reset all tree block csum.
  2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
                   ` (7 preceding siblings ...)
  2015-02-04  7:16 ` [PATCH 6/7] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root Qu Wenruo
@ 2015-02-04  7:16 ` Qu Wenruo
  2015-02-04  9:16 ` [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Martin Steigerwald
  2015-04-22  5:55 ` [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
  10 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2015-02-04  7:16 UTC
  To: linux-btrfs

Some minor bit errors, such as a bit error in the tree root's csum, could
be repaired without much pain.
But if the metadata profile is single, such a filesystem can neither be
mounted nor repaired by btrfsck.

So add a '--dangerous' option to reset the csum of all tree blocks.

NOTE: in most cases, bit errors can cause unpredictable errors in btrfsck
or the kernel. So this is *VERY VERY* dangerous and is only designed for
developers, experienced btrfs users, or crazy guys who want to mount a
broken btrfs at any cost.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 cmds-check.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 535a518..e5cb0ea 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -56,6 +56,7 @@ static int repair = 0;
 static int no_holes = 0;
 static int init_extent_tree = 0;
 static int check_data_csum = 0;
+static int dangerous = 0;
 
 struct extent_backref {
 	struct list_head list;
@@ -9232,8 +9233,8 @@ static int reset_one_root_csum(struct btrfs_root *root)
 		while (1) {
 			u64 bytenr;
 
-			node = path->nodes[0];
-			slot = path->slots[0];
+			node = path->nodes[cur_level];
+			slot = path->slots[cur_level];
 			bytenr = btrfs_node_blockptr(node, slot);
 
 			if (bytenr != round_down(bytenr, sectorsize)) {
@@ -9326,7 +9327,7 @@ static int reset_roots_csum(struct btrfs_root *tree_root)
 		if (round_down(bytenr, sectorsize) != bytenr) {
 			bytenr = round_down(bytenr, sectorsize);
 			btrfs_set_disk_root_bytenr(node, ri, bytenr);
-			ret = write_tree_block(NULL, root, node);
+			ret = write_tree_block(NULL, tree_root, node);
 			if (ret < 0) {
 				fprintf(stderr,
 				"Fail to write extent at %llu\n",
@@ -9369,6 +9370,30 @@ out:
 	return ret;
 }
 
+static int do_dangerous_work(struct btrfs_fs_info *fs_info)
+{
+	int ret = 0;
+
+	/*
+	 * TODO: we can use sb bytenr to reset tree root without a valid tree
+	 * root. But open_ctree will use backup/search tree, so this is not so
+	 * important.
+	 */
+	if (!extent_buffer_uptodate(fs_info->tree_root->node)) {
+		fprintf(stderr,
+			"Tree root corrupted, unable to continue.\n");
+		return -EIO;
+	}
+
+	/* First reset tree root csum */
+	ret = reset_one_root_csum(fs_info->tree_root);
+	if (ret < 0)
+		return ret;
+
+	ret = reset_roots_csum(fs_info->tree_root);
+	return ret;
+}
+
 const char * const cmd_check_usage[] = {
 	"btrfs check [options] <device>",
 	"Check an unmounted btrfs filesystem.",
@@ -9378,6 +9403,7 @@ const char * const cmd_check_usage[] = {
 	"--repair                    try to repair the filesystem",
 	"--init-csum-tree            create a new CRC tree",
 	"--init-extent-tree          create a new extent tree",
+	"--dangerous                 reset all tree block csum, very dangerous",
 	"--check-data-csum           verify checkums of data blocks",
 	"--qgroup-report             print a report on qgroup consistency",
 	"--subvol-extents <subvolid> print subvolume extents and sharing state",
@@ -9407,13 +9433,14 @@ int cmd_check(int argc, char **argv)
 		int c;
 		int option_index = 0;
 		enum { OPT_REPAIR = 257, OPT_INIT_CSUM, OPT_INIT_EXTENT,
-			OPT_CHECK_CSUM, OPT_READONLY };
+			OPT_CHECK_CSUM, OPT_READONLY, OPT_DANGEROUS };
 		static const struct option long_options[] = {
 			{ "super", 1, NULL, 's' },
 			{ "repair", 0, NULL, OPT_REPAIR },
 			{ "readonly", 0, NULL, OPT_READONLY },
 			{ "init-csum-tree", 0, NULL, OPT_INIT_CSUM },
 			{ "init-extent-tree", 0, NULL, OPT_INIT_EXTENT },
+			{ "dangerous", 0, NULL, OPT_DANGEROUS},
 			{ "check-data-csum", 0, NULL, OPT_CHECK_CSUM },
 			{ "backup", 0, NULL, 'b' },
 			{ "subvol-extents", 1, NULL, 'E' },
@@ -9478,6 +9505,13 @@ int cmd_check(int argc, char **argv)
 			case OPT_CHECK_CSUM:
 				check_data_csum = 1;
 				break;
+			case OPT_DANGEROUS:
+				dangerous = 1;
+				repair = 1;
+				ctree_flags |= (OPEN_CTREE_WRITES |
+						OPEN_CTREE_PARTIAL |
+						__RETURN_CHUNK_ROOT);
+				break;
 		}
 	}
 	argc = argc - optind;
@@ -9518,6 +9552,22 @@ again:
 
 	root = info->fs_root;
 
+	if (dangerous && !in_recheck) {
+		printf("Reset all csum of tree block may cause disaster!\n");
+		printf("Only do this if you have binary backup!\n");
+		ret = 1;
+		if (ask_user("Are you sure?")) {
+			in_recheck = 1;
+			dangerous = 0;
+			ret = do_dangerous_work(info);
+			close_ctree(root);
+			ctree_flags &= ~(__RETURN_CHUNK_ROOT |
+					 OPEN_CTREE_PARTIAL);
+			goto again;
+		} else
+			goto close_out;
+	}
+
 	/*
 	 * repair mode will force us to commit transaction which
 	 * will make us fail to load log tree when mounting.
-- 
2.2.2



* Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.
  2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
                   ` (8 preceding siblings ...)
  2015-02-04  7:16 ` [PATCH 7/7] btrfs-progs: Introduce "--dangerous" option to reset all tree block csum Qu Wenruo
@ 2015-02-04  9:16 ` Martin Steigerwald
  2015-02-04 10:07   ` Paul Jones
  2015-02-05  1:35   ` Qu Wenruo
  2015-04-22  5:55 ` [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
  10 siblings, 2 replies; 18+ messages in thread
From: Martin Steigerwald @ 2015-02-04  9:16 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
> Btrfs's metadata csum is a good mechanism, keeping bit error away from
> sensitive kernel. But such mechanism will also be too sensitive, like
> bit error in csum bytes or low all zero bits in nodeptr.
> It's a trade using "error tolerance" for stable, and is reasonable for
> most cases since there is DUP/RAID1/5/6/10 duplication level.
> 
> But in some case, whatever for development purpose or despair user who
> can't tolerant all his/her inline data lost, or even crazy QA team
> hoping btrfs can survive heavy random bits bombing, there are some guys
> want to get rid of the csum protection and face the crucial raw data no
> matter what disaster may happen.
> 
> So, introduce the new '--dangerous' (or "destruction"/"debug" if you
> like) option for btrfsck to reset all csum of tree blocks.

I have often wondered about this: AFAIK, if you get a csum error, BTRFS
turns it into an input/output error. To be able to access the data in
place, how about an "iwantmycorrupteddataback" mount option where BTRFS
just logs csum errors but allows one to access the files nonetheless?
This could even work together with remount. Maybe it would be good not to
allow writing to blocks with broken csums, i.e. to fail such writes with
an input/output error.

This way, the csum would not be automatically fixed, *but* one is able to
access the broken data, *while* knowing it is broken.

If that is possible already, I missed it.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* RE: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.
  2015-02-04  9:16 ` [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Martin Steigerwald
@ 2015-02-04 10:07   ` Paul Jones
  2015-02-05  1:43     ` Qu Wenruo
  2015-02-05  1:35   ` Qu Wenruo
  1 sibling, 1 reply; 18+ messages in thread
From: Paul Jones @ 2015-02-04 10:07 UTC
  To: Martin Steigerwald, Qu Wenruo; +Cc: linux-btrfs

> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> owner@vger.kernel.org] On Behalf Of Martin Steigerwald
> Sent: Wednesday, 4 February 2015 8:16 PM
> To: Qu Wenruo
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA
> dangerous mode.
> 
> On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
> > Btrfs's metadata csum is a good mechanism, keeping bit error away from
> > sensitive kernel. But such mechanism will also be too sensitive, like
> > bit error in csum bytes or low all zero bits in nodeptr.
> > It's a trade using "error tolerance" for stable, and is reasonable for
> > most cases since there is DUP/RAID1/5/6/10 duplication level.
> >
> > But in some case, whatever for development purpose or despair user who
> > can't tolerant all his/her inline data lost, or even crazy QA team
> > hoping btrfs can survive heavy random bits bombing, there are some
> > guys want to get rid of the csum protection and face the crucial raw
> > data no matter what disaster may happen.
> >
> > So, introduce the new '--dangerous' (or "destruction"/"debug" if you
> > like) option for btrfsck to reset all csum of tree blocks.
> 
> I often wondered about this: AFAIK if you get a csum error BTRFS makes this
> an input/output error. For being able to access the data in place, how about a
> "iwantmycorrupteddataback" mount option where BTRFS just logs csum
> errors but allows one to access the files nonetheless. This could even work
> together with remount. Maybe it would be good not to allow writing to
> broken csum blocks, i.e. fail these with input/output error.
> 
> This way, the csum would not be automatically fixed, *but* one is able to
> access the broken data, *while* knowing it is broken.


I seriously could have used that yesterday: I had a raw VM image with a csum error that wouldn't go away. The VM worked fine (even across reboots), so I figured I would just copy the file to another filesystem and then copy it back. Rsync doesn't play nicely with read errors, so I used dd if=disk1 of=/elsewhere/disk1 bs=4096 conv=notrunc,noerror, but after waiting for 100G to copy twice, the VM no longer booted.
The backup was only 8 hours old, so no big deal, but on a busy day that could have been nasty! (Why I didn't press the backup button before doing the above, I don't know...)

Paul.


* Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.
  2015-02-04  9:16 ` [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Martin Steigerwald
  2015-02-04 10:07   ` Paul Jones
@ 2015-02-05  1:35   ` Qu Wenruo
  2015-02-05  8:31     ` Martin Steigerwald
  1 sibling, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2015-02-05  1:35 UTC
  To: Martin Steigerwald; +Cc: linux-btrfs


-------- Original Message --------
Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, 
AKA dangerous mode.
From: Martin Steigerwald <martin@lichtvoll.de>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2015-02-04 17:16
> On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
>> Btrfs's metadata csum is a good mechanism, keeping bit error away from
>> sensitive kernel. But such mechanism will also be too sensitive, like
>> bit error in csum bytes or low all zero bits in nodeptr.
>> It's a trade using "error tolerance" for stable, and is reasonable for
>> most cases since there is DUP/RAID1/5/6/10 duplication level.
>>
>> But in some case, whatever for development purpose or despair user who
>> can't tolerant all his/her inline data lost, or even crazy QA team
>> hoping btrfs can survive heavy random bits bombing, there are some guys
>> want to get rid of the csum protection and face the crucial raw data no
>> matter what disaster may happen.
>>
>> So, introduce the new '--dangerous' (or "destruction"/"debug" if you
>> like) option for btrfsck to reset all csum of tree blocks.
> I often wondered about this: AFAIK if you get a csum error BTRFS makes
> this an input/output error. For being able to access the data in place,
> how about a "iwantmycorrupteddataback" mount option where BTRFS just logs
> csum errors but allows one to access the files nonetheless.
The idea is good, but don't forget that we have both metadata (tree blocks) and data.
For data, this is completely OK.
But for metadata, this may be a disaster, just like the --dangerous option.
> This could even
> work together with remount. Maybe it would be good not to allow writing to
> broken csum blocks, i.e. fail these with input/output error.
Don't forget btrfs' COW writes.
So writing into data shouldn't be a problem (if COW is enabled).
>
> This way, the csum would not be automatically fixed, *but* one is able to
> access the broken data, *while* knowing it is broken.
>
> If that is possible already, I missed it.
Close to what you suggest: data csums can already be rebuilt by btrfsck
with the --init-csum-tree option.
However, not every user knows about this feature, and even fewer know
when it is the right time to use it.

Thanks,
Qu



* Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.
  2015-02-04 10:07   ` Paul Jones
@ 2015-02-05  1:43     ` Qu Wenruo
  0 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2015-02-05  1:43 UTC
  To: Paul Jones, Martin Steigerwald; +Cc: linux-btrfs


-------- Original Message --------
Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, 
AKA dangerous mode.
From: Paul Jones <paul@pauljones.id.au>
To: Martin Steigerwald <martin@lichtvoll.de>, Qu Wenruo 
<quwenruo@cn.fujitsu.com>
Date: 2015-02-04 18:07
>> -----Original Message-----
>> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
>> owner@vger.kernel.org] On Behalf Of Martin Steigerwald
>> Sent: Wednesday, 4 February 2015 8:16 PM
>> To: Qu Wenruo
>> Cc: linux-btrfs@vger.kernel.org
>> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA
>> dangerous mode.
>>
>> On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
>>> Btrfs's metadata csum is a good mechanism, keeping bit error away from
>>> sensitive kernel. But such mechanism will also be too sensitive, like
>>> bit error in csum bytes or low all zero bits in nodeptr.
>>> It's a trade using "error tolerance" for stable, and is reasonable for
>>> most cases since there is DUP/RAID1/5/6/10 duplication level.
>>>
>>> But in some case, whatever for development purpose or despair user who
>>> can't tolerant all his/her inline data lost, or even crazy QA team
>>> hoping btrfs can survive heavy random bits bombing, there are some
>>> guys want to get rid of the csum protection and face the crucial raw
>>> data no matter what disaster may happen.
>>>
>>> So, introduce the new '--dangerous' (or "destruction"/"debug" if you
>>> like) option for btrfsck to reset all csum of tree blocks.
>> I often wondered about this: AFAIK if you get a csum error BTRFS makes this
>> an input/output error. For being able to access the data in place, how about a
>> "iwantmycorrupteddataback" mount option where BTRFS just logs csum
>> errors but allows one to access the files nonetheless. This could even work
>> together with remount. Maybe it would be good not to allow writing to
>> broken csum blocks, i.e. fail these with input/output error.
>>
>> This way, the csum would not be automatically fixed, *but* one is able to
>> access the broken data, *while* knowing it is broken.
>
> I seriously could have used that yesterday - I had a raw VM image with a csum error that wouldn't go away.
Is the image stored on btrfs? And are you sure the csum error belongs to
the image?
If so, this feature will not really help, since the --dangerous option
only resets metadata csums, not data csums.

In that case, btrfsck --init-csum-tree <your btrfs device> would be a
much better choice.
> The VM worked fine (even rebooting) so I figured I would just copy the file to another filesystem and then copy it back. Rsync doesn't play nicely with errors so I used dd if=disk1 of=/elsewhere/disk1 bs=4096 conv=notrunc,noerror but after waiting for 100G to copy twice it no longer booted.
I'm not quite sure about conv=noerror. Take an input of 4K OK, 4K bad,
4K OK: if conv=noerror makes the output 4K OK, 4K OK (the bad block is
simply dropped), then that's the problem.
If conv=noerror makes the output 4K OK, 4K all zeros, 4K OK, then IMHO
the problem should not happen...

Thanks,
Qu
> The backup was only 8 hours old so no big deal, but if it was a busy day that could have been nasty! (Why I didn't press the backup button before I did the above I don't know...)
>
> Paul.



* Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.
  2015-02-05  1:35   ` Qu Wenruo
@ 2015-02-05  8:31     ` Martin Steigerwald
  2015-02-05  8:45       ` Qu Wenruo
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Steigerwald @ 2015-02-05  8:31 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Thursday, 5 February 2015, 09:35:26, Qu Wenruo wrote:
> -------- Original Message --------
> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks,
> AKA dangerous mode.
> From: Martin Steigerwald <martin@lichtvoll.de>
> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2015-02-04 17:16
> 
> > On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
> >> Btrfs's metadata csum is a good mechanism, keeping bit error away from
> >> sensitive kernel. But such mechanism will also be too sensitive, like
> >> bit error in csum bytes or low all zero bits in nodeptr.
> >> It's a trade using "error tolerance" for stable, and is reasonable for
> >> most cases since there is DUP/RAID1/5/6/10 duplication level.
> >> 
> >> But in some case, whatever for development purpose or despair user who
> >> can't tolerant all his/her inline data lost, or even crazy QA team
> >> hoping btrfs can survive heavy random bits bombing, there are some guys
> >> want to get rid of the csum protection and face the crucial raw data no
> >> matter what disaster may happen.
> >> 
> >> So, introduce the new '--dangerous' (or "destruction"/"debug" if you
> >> like) option for btrfsck to reset all csum of tree blocks.
> > 
> > I often wondered about this: AFAIK if you get a csum error BTRFS makes
> > this an input/output error. For being able to access the data in place,
> > how about a "iwantmycorrupteddataback" mount option where BTRFS just
> > logs csum errors but allows one to access the files nonetheless.
> 
> The idea is good, but don't forget we have metadata(tree block) and
> data. For data, this is completely OK.
> But for metadata, this may be a disaster just like the --dangerous
> option.

Ah yes, so probably only do this for data, or have an extra option to
skip the csum check on metadata for the really desperate. But then I'd
really force read-only mode to avoid corrupted metadata causing more
damage.

> > This could even
> > work together with remount. Maybe it would be good not to allow
> > writing to broken csum blocks, i.e. fail these with input/output
> > error.
> 
> Don't forget btrfs' COW write.
> So write into data shouldn't be a problem.(if COW is enabled).

Yes, but… it hides the corruption. Unless you have a snapshot, if an
application reads corrupted data and then writes it back, you have no
indication that the data was corrupted in the first place.

> > This way, the csum would not be automatically fixed, *but* one is able
> > to access the broken data, *while* knowing it is broken.
> > 
> > If that is possible already, I missed it.
> 
> Much as you considered, data csum can be rebuilt in btrfsck with
> --init-csum-tree option.
> Although not every user knows this feature and even less users know the
> correct timing using it.

I am thinking about making a wiki page about recovery options, in two parts:

1) Diagnosis. First find out what might be wrong.

2) Cure. Then decide which steps to try to recover.

And of course an intro on the best practice of only working on a copy of
the copy for any in-place repair attempts.

I'd be willing to make such a page, provided I get enough hints on what
to try when. I have some ideas myself, but I am not sure they are
accurate :)

Thanks,
Martin



-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.
  2015-02-05  8:31     ` Martin Steigerwald
@ 2015-02-05  8:45       ` Qu Wenruo
  2015-02-05  8:59         ` BTRFS wiki: page about recovery (was: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.) Martin Steigerwald
  0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2015-02-05  8:45 UTC
  To: Martin Steigerwald; +Cc: linux-btrfs


-------- Original Message --------
Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, 
AKA dangerous mode.
From: Martin Steigerwald <martin@lichtvoll.de>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2015-02-05 16:31
> On Thursday, 5 February 2015, 09:35:26, Qu Wenruo wrote:
>> -------- Original Message --------
>> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks,
>> AKA dangerous mode.
>> From: Martin Steigerwald <martin@lichtvoll.de>
>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> Date: 2015-02-04 17:16
>>
>>> On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
>>>> Btrfs's metadata csum is a good mechanism, keeping bit error away from
>>>> sensitive kernel. But such mechanism will also be too sensitive, like
>>>> bit error in csum bytes or low all zero bits in nodeptr.
>>>> It's a trade using "error tolerance" for stable, and is reasonable for
>>>> most cases since there is DUP/RAID1/5/6/10 duplication level.
>>>>
>>>> But in some case, whatever for development purpose or despair user who
>>>> can't tolerant all his/her inline data lost, or even crazy QA team
>>>> hoping btrfs can survive heavy random bits bombing, there are some guys
>>>> want to get rid of the csum protection and face the crucial raw data no
>>>> matter what disaster may happen.
>>>>
>>>> So, introduce the new '--dangerous' (or "destruction"/"debug" if you
>>>> like) option for btrfsck to reset all csum of tree blocks.
>>> I often wondered about this: AFAIK if you get a csum error BTRFS makes
>>> this an input/output error. For being able to access the data in place,
>>> how about a "iwantmycorrupteddataback" mount option where BTRFS just
>>> logs csum errors but allows one to access the files nonetheless.
>> The idea is good, but don't forget we have metadata(tree block) and
>> data. For data, this is completely OK.
>> But for metadata, this may be a disaster just like the --dangerous
>> option.
> Ah yes, so probably only do this for data or have an extra option for
> skipping csum on metadata for the really desperate, but then I'd really
> force read only to avoid corrupted causing more damage.
>
>>> This could even
>>> work together with remount. Maybe it would be good not to allow
>>> writing to broken csum blocks, i.e. fail these with input/output
>>> error.
>> Don't forget btrfs' COW write.
>> So write into data shouldn't be a problem.(if COW is enabled).
> Yes, but… it hides the corruption. Unless you have a snapshot if an
> application reads corrupted data and then writes it back, then you have no
> indication that the data was corrupted in the first time.
>
>>> This way, the csum would not be automatically fixed, *but* one is able
>>> to access the broken data, *while* knowing it is broken.
>>>
>>> If that is possible already, I missed it.
>> Much as you considered, data csum can be rebuilt in btrfsck with
>> --init-csum-tree option.
>> Although not every user knows this feature and even less users know the
>> correct timing using it.
> I wonder about making a wiki page about recovery options with two parts:
>
> 1) Diagnosis. First find out what might be wrong.
>
> 2) Cure. Then decide which steps to try to recover.
This seems really useful.

But I'm a little afraid of giving end users too much information:
metadata vs. data, the difference between btrfsck and scrub, and tons of
other things may confuse users.
What's more, these things should be done by btrfsck automatically...

Besides this, wiki pages about real-world btrfs recovery strategies are
very helpful.
Feel free to add one, although I'm not sure how to add pages to the btrfs
wiki; maybe you need to contact Marc or David?

Thanks,
Qu
>
> And of course an intro on best practice to only work on a copy of the copy
> for any in-place repair attempts.
>
> I'd be willing to make such a page, provided I get enough hints on what to
> try when. I have some ideas myself, but I am not sure they are accurate :)
>
> Thanks,
> Martin
>
>
>> Thanks,
>> Qu
>>
>>>> The csum reseting have the following features:
>>>> 1) Top to down level by level
>>>> The csum resetting is done from tree to level 1, and only when all
>>>> the
>>>> csum of nodes in this level is reset and can pass read_tree_block()
>>>> check, it will continue to next level.
>>>> And all bytenr in nodeptr will be re-aligned, so bit error in the low
>>>> 12 bits(4K sector size case) can also be repaired without pain.
>>>> With this behavior, error in nodeptr has a chance not affecting its
>>>> child.
>>>>
>>>> 2) No Copy-on-write
>>>> COW means we needs to have a valid extent tree, if extent tree is
>>>> corrupted COW will only be a BUG_ON blocking us.
>>>> So all the r/w in this dangerous mode will use no-cow write. That's
>>>> why
>>>> we export and slightly modified write_tree_block() to do no-cow tree
>>>> block write with newly calculated csum.
>>>> Since the write is not cowed, if it fails, it will also destroy the
>>>> last hope for manual inspection.
>>>>
>>>> Qu Wenruo (7):
>>>>    btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result
>>>>      in the same level of path->lowest_level.
>>>>    btrfs-progs: Introduce btrfs_next_slot() function to iterate to next
>>>>        slot in given level.
>>>>    btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node.
>>>>    btrfs-progs: Export write_tree_block() and allow it to do nocow write.
>>>>    btrfs-progs: Introduce new function reset_tree_block_csum() for later
>>>>         tree block csum reset.
>>>>    btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to
>>>>        reset one/all tree's csum in tree root.
>>>>    btrfs-progs: Introduce "--dangerous" option to reset all tree block
>>>>       csum.
>>>>
>>>>   cmds-check.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>   ctree.c      |  18 ++--
>>>>   ctree.h      |  25 +++++-
>>>>   disk-io.c    |  55 +++++++++---
>>>>   disk-io.h    |   3 +
>>>>   5 files changed, 359 insertions(+), 26 deletions(-)
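The bytenr re-alignment described in point 1 of the cover letter boils down to masking off the bits below the sector size. A minimal sketch, assuming a power-of-two sector size; `realign_bytenr()` and `bytenr_is_aligned()` are hypothetical helpers for illustration, not functions from this series:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch series): clear the bits of a
 * tree block pointer below sectorsize. sectorsize must be a power of two;
 * with 4096 this masks off the low 12 bits, undoing a bit flip there.
 */
static uint64_t realign_bytenr(uint64_t bytenr, uint32_t sectorsize)
{
	return bytenr & ~((uint64_t)sectorsize - 1);
}

/* Hypothetical helper: nonzero if bytenr already sits on a sector boundary. */
static int bytenr_is_aligned(uint64_t bytenr, uint32_t sectorsize)
{
	return (bytenr & ((uint64_t)sectorsize - 1)) == 0;
}
```

For example, a pointer of 0x1C00000 that picked up a flip in bit 5 reads 0x1C00020, and realign_bytenr(0x1C00020, 4096) gives back 0x1C00000. A flip at or above bit 12 cannot be caught this way, since the damaged value is still aligned; that is why the cover letter only says such errors have "a chance" of not affecting the child.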



* BTRFS wiki: page about recovery (was: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.)
  2015-02-05  8:45       ` Qu Wenruo
@ 2015-02-05  8:59         ` Martin Steigerwald
  0 siblings, 0 replies; 18+ messages in thread
From: Martin Steigerwald @ 2015-02-05  8:59 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, David Sterba

On Thursday, 5 February 2015, 16:45:17, Qu Wenruo wrote:
> -------- Original Message --------
> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks,
> AKA dangerous mode.
> From: Martin Steigerwald <martin@lichtvoll.de>
> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2015-02-05 16:31
> 
> > On Thursday, 5 February 2015, 09:35:26, Qu Wenruo wrote:
> >> -------- Original Message --------
> >> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree
> >> blocks, AKA dangerous mode.
> >> From: Martin Steigerwald <martin@lichtvoll.de>
> >> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> >> Date: 2015-02-04 17:16
> >> 
> >>> On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
> >>>> Btrfs's metadata csum is a good mechanism that keeps bit errors
> >>>> away from the sensitive kernel. But such a mechanism can also be too
> >>>> sensitive, e.g. to a bit error in the csum bytes themselves or in the
> >>>> low bits of a nodeptr, which should be all zero.
> >>>> It trades error tolerance for stability, which is reasonable in most
> >>>> cases since there are DUP/RAID1/5/6/10 duplication levels.
> >>>> 
> >>>> But in some cases, whether for development purposes, a desperate
> >>>> user who can't tolerate losing all of his/her inline data, or even a
> >>>> crazy QA team hoping btrfs can survive heavy random bit bombing, some
> >>>> people want to get rid of the csum protection and face the crucial
> >>>> raw data, no matter what disaster may happen.
> >>>> 
> >>>> So, introduce the new '--dangerous' (or "destruction"/"debug" if you
> >>>> like) option for btrfsck to reset the csum of all tree blocks.
> >>> 
> >>> I often wondered about this: AFAIK, if you get a csum error, BTRFS
> >>> turns it into an input/output error. To be able to access the data
> >>> in place, how about an "iwantmycorrupteddataback" mount option where
> >>> BTRFS just logs csum errors but allows one to access the files
> >>> nonetheless?
> >> 
> >> The idea is good, but don't forget we have metadata (tree blocks) and
> >> data. For data, this is completely OK.
> >> But for metadata, this may be a disaster just like the --dangerous
> >> option.
> > 
> > Ah yes, so probably only do this for data, or have an extra option for
> > skipping csum on metadata for the really desperate, but then I'd
> > really force read-only to avoid the corruption causing more damage.
> > 
> >>> This could even work together with remount. Maybe it would be good
> >>> not to allow writing to blocks with broken csums, i.e. fail these
> >>> with an input/output error.
> >> 
> >> Don't forget btrfs's COW writes.
> >> So writing data shouldn't be a problem (if COW is enabled).
> > 
> > Yes, but… it hides the corruption. If an application reads corrupted
> > data and then writes it back, then unless you have a snapshot, you
> > have no indication that the data was corrupted in the first place.
> > 
> >>> This way, the csum would not be automatically fixed, *but* one is
> >>> able to access the broken data, *while* knowing it is broken.
> >>> 
> >>> If that is possible already, I missed it.
> >> 
> >> Much as you suggested, the data csum can be rebuilt by btrfsck with
> >> the --init-csum-tree option, although not every user knows about this
> >> feature, and even fewer users know the right time to use it.
> > 
> > I wonder about making a wiki page about recovery options with two parts:
> > 
> > 1) Diagnosis. First find out what might be wrong.
> > 
> > 2) Cure. Then decide which steps to try to recover.
> 
> This seems really useful.
> 
> But I'm a little afraid of giving end users too much information:
> metadata vs. data, the difference between btrfsck and scrub, and tons
> of other things may confuse them.
> What's more, these things should be done by btrfsck automatically...

Sure. The page should contain a disclaimer anyway, and I think it's good
to keep things as easy as possible for the user. But for early adopters,
I also think it is really good to have some guidance available, with the
caveat to always ask here on the mailing list if unsure about the next
step.

> Besides this, wiki pages about real-world btrfs recovery strategies are
> very helpful.
> Feel free to add one, although I'm not sure how to add pages to the btrfs
> wiki; maybe you need to contact Marc or David?

David, I requested a wiki account via the page and even wrote a (not quite
serious) 50-word biography in order to pass that form.

Thanks,
Martin

> 
> Thanks,
> Qu
> 
> > And of course an intro on the best practice of only working on a copy
> > of the copy for any in-place repair attempts.
> > 
> > I'd be willing to make such a page, provided I get enough hints on
> > what to try when. I have some ideas myself, but I am not sure they
> > are accurate :)
> > 
> > Thanks,
> > Martin
> > 
> >> Thanks,
> >> Qu
> >> 
> >>>> The csum resetting has the following features:
> >>>> 1) Top-down, level by level
> >>>> The csum reset is done from the top of the tree down to level 1, and
> >>>> only when the csum of every node in a level has been reset and passes
> >>>> the read_tree_block() check does it continue to the next level.
> >>>> Also, every bytenr in a nodeptr is re-aligned, so bit errors in the
> >>>> low 12 bits (4K sector size case) can also be repaired without pain.
> >>>> With this behavior, an error in a nodeptr has a chance of not
> >>>> affecting its child.
> >>>> 
> >>>> 2) No copy-on-write
> >>>> COW requires a valid extent tree; if the extent tree is corrupted,
> >>>> COW will only hit a BUG_ON and block us.
> >>>> So all writes in this dangerous mode are done without COW. That's why
> >>>> we export and slightly modify write_tree_block() to do no-COW tree
> >>>> block writes with the newly calculated csum.
> >>>> Since the write is not COWed, if it fails, it will also destroy the
> >>>> last hope for manual inspection.
> >>>> 
> >>>> Qu Wenruo (7):
> >>>>    btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result
> >>>>      in the same level of path->lowest_level.
> >>>>    btrfs-progs: Introduce btrfs_next_slot() function to iterate to next
> >>>>        slot in given level.
> >>>>    btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node.
> >>>>    btrfs-progs: Export write_tree_block() and allow it to do nocow write.
> >>>>    btrfs-progs: Introduce new function reset_tree_block_csum() for later
> >>>>         tree block csum reset.
> >>>>    btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to
> >>>>        reset one/all tree's csum in tree root.
> >>>>    btrfs-progs: Introduce "--dangerous" option to reset all tree block
> >>>>       csum.
> >>>>
> >>>>   cmds-check.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>>>   ctree.c      |  18 ++--
> >>>>   ctree.h      |  25 +++++-
> >>>>   disk-io.c    |  55 +++++++++---
> >>>>   disk-io.h    |   3 +
> >>>>   5 files changed, 359 insertions(+), 26 deletions(-)
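Point 2 of the cover letter rewrites each block in place with a freshly computed csum. A minimal sketch of that recomputation, assuming the crc32c checksum type, a 32-byte csum area at the start of each tree block whose checksum covers the rest of the block, and a little-endian stored value; `reset_block_csum()` and `block_csum_ok()` are hypothetical names for illustration, not the series' reset_tree_block_csum():

```c
#include <stddef.h>
#include <stdint.h>

#define CSUM_AREA 32 /* assumption: bytes reserved for the csum at block start */

/* Bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78). */
static uint32_t crc32c(uint32_t crc, const uint8_t *buf, size_t len)
{
	crc = ~crc;
	while (len--) {
		crc ^= *buf++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Checksum everything after the csum area. */
static uint32_t block_crc(const uint8_t *block, size_t nodesize)
{
	return crc32c(0, block + CSUM_AREA, nodesize - CSUM_AREA);
}

/* Hypothetical: overwrite the stored csum (first 4 bytes, little-endian). */
static void reset_block_csum(uint8_t *block, size_t nodesize)
{
	uint32_t crc = block_crc(block, nodesize);

	for (int i = 0; i < 4; i++)
		block[i] = (uint8_t)(crc >> (8 * i));
}

/* Hypothetical: compare the stored csum against a fresh computation. */
static int block_csum_ok(const uint8_t *block, size_t nodesize)
{
	uint32_t crc = block_crc(block, nodesize);
	uint32_t stored = (uint32_t)block[0] | (uint32_t)block[1] << 8 |
			  (uint32_t)block[2] << 16 | (uint32_t)block[3] << 24;

	return stored == crc;
}
```

Resetting only makes a block self-consistent: whatever bytes are on disk, corrupted or not, will pass the check afterwards, and because the write is in place the old csum is gone for good.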

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.
  2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
                   ` (9 preceding siblings ...)
  2015-02-04  9:16 ` [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Martin Steigerwald
@ 2015-04-22  5:55 ` Qu Wenruo
  10 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2015-04-22  5:55 UTC (permalink / raw)
  To: linux-btrfs, David Sterba

Ping.

No new comments, and not merged yet?

Thanks,
Qu

-------- Original Message  --------
Subject: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA 
dangerous mode.
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <linux-btrfs@vger.kernel.org>
Date: 2015-02-04 15:16

> Btrfs's metadata csum is a good mechanism that keeps bit errors away
> from the sensitive kernel. But such a mechanism can also be too
> sensitive, e.g. to a bit error in the csum bytes themselves or in the
> low bits of a nodeptr, which should be all zero.
> It trades error tolerance for stability, which is reasonable in most
> cases since there are DUP/RAID1/5/6/10 duplication levels.
>
> But in some cases, whether for development purposes, a desperate user
> who can't tolerate losing all of his/her inline data, or even a crazy
> QA team hoping btrfs can survive heavy random bit bombing, some people
> want to get rid of the csum protection and face the crucial raw data,
> no matter what disaster may happen.
>
> So, introduce the new '--dangerous' (or "destruction"/"debug" if you like)
> option for btrfsck to reset the csum of all tree blocks.
>
> The csum resetting has the following features:
> 1) Top-down, level by level
> The csum reset is done from the top of the tree down to level 1, and only
> when the csum of every node in a level has been reset and passes the
> read_tree_block() check does it continue to the next level.
> Also, every bytenr in a nodeptr is re-aligned, so bit errors in the low
> 12 bits (4K sector size case) can also be repaired without pain.
> With this behavior, an error in a nodeptr has a chance of not affecting
> its child.
>
> 2) No copy-on-write
> COW requires a valid extent tree; if the extent tree is corrupted, COW
> will only hit a BUG_ON and block us.
> So all writes in this dangerous mode are done without COW. That's why
> we export and slightly modify write_tree_block() to do no-COW tree block
> writes with the newly calculated csum.
> Since the write is not COWed, if it fails, it will also destroy the last
> hope for manual inspection.
>
> Qu Wenruo (7):
>    btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result
>      in the same level of path->lowest_level.
>    btrfs-progs: Introduce btrfs_next_slot() function to iterate to next
>        slot in given level.
>    btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node.
>    btrfs-progs: Export write_tree_block() and allow it to do nocow write.
>    btrfs-progs: Introduce new function reset_tree_block_csum() for later
>         tree block csum reset.
>    btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to
>        reset one/all tree's csum in tree root.
>    btrfs-progs: Introduce "--dangerous" option to reset all tree block
>       csum.
>
>   cmds-check.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>   ctree.c      |  18 ++--
>   ctree.h      |  25 +++++-
>   disk-io.c    |  55 +++++++++---
>   disk-io.h    |   3 +
>   5 files changed, 359 insertions(+), 26 deletions(-)
>
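The level-by-level ordering in point 1 of the cover letter is essentially a breadth-first walk that refuses to descend until a whole level checks out. A toy sketch of that control flow, using a hypothetical in-memory `struct node` in place of real tree blocks, with `write_fails` simulating an I/O error on the in-place write; unlike the cover letter, which stops at level 1, this toy simply walks until no children remain:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_WIDTH 1024 /* assumption: max nodes per level in this toy model */

/* Hypothetical stand-in for a tree block; not btrfs-progs' extent_buffer. */
struct node {
	int level;           /* 0 = leaf */
	bool csum_ok;        /* stands in for passing read_tree_block() */
	bool write_fails;    /* simulates a failed in-place (no-COW) write */
	size_t nr_children;
	struct node **children;
};

/* Rewrite the block in place with a fresh csum; may fail. */
static void reset_csum(struct node *n)
{
	n->csum_ok = !n->write_fails;
}

/*
 * Reset one whole level, verify every node in it, and only then collect
 * the children for the next level. Returns 0 on success, -1 if a level
 * fails verification or is wider than MAX_WIDTH.
 */
static int reset_top_down(struct node *root)
{
	struct node *cur[MAX_WIDTH], *next[MAX_WIDTH];
	size_t ncur = 1;

	cur[0] = root;
	while (ncur > 0) {
		size_t nnext = 0;

		for (size_t i = 0; i < ncur; i++)
			reset_csum(cur[i]);
		for (size_t i = 0; i < ncur; i++)
			if (!cur[i]->csum_ok)
				return -1; /* never descend past a bad level */

		for (size_t i = 0; i < ncur; i++) {
			for (size_t j = 0; j < cur[i]->nr_children; j++) {
				if (nnext == MAX_WIDTH)
					return -1;
				next[nnext++] = cur[i]->children[j];
			}
		}
		memcpy(cur, next, nnext * sizeof(cur[0]));
		ncur = nnext;
	}
	return 0;
}
```

Verifying the whole level before descending means a child pointer is only followed once every block at the parent's level reads back cleanly, so one bad sibling stops the walk before any child is touched.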


end of thread

Thread overview: 18+ messages
2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
2015-02-04  7:16 ` [PATCH 1/7] btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result in the same level of path->lowest_level Qu Wenruo
2015-02-04  7:16 ` [PATCH 2/7] btrfs-progs: Introduce btrfs_next_slot() function to iterate to next slot in given level Qu Wenruo
2015-02-04  7:16 ` [PATCH 3/7] btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node Qu Wenruo
2015-02-04  7:16 ` [PATCH 4/7] btrfs-progs: Export write_tree_block() and allow it to do nocow write Qu Wenruo
2015-02-04  7:16 ` [PATCH 4/5] btrfs-progs: Introduce new function reset_tree_block_csum() for later tree block csum reset Qu Wenruo
2015-02-04  7:16 ` [PATCH 5/5] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root Qu Wenruo
2015-02-04  7:16 ` [PATCH 5/7] btrfs-progs: Introduce new function reset_tree_block_csum() for later tree block csum reset Qu Wenruo
2015-02-04  7:16 ` [PATCH 6/7] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root Qu Wenruo
2015-02-04  7:16 ` [PATCH 7/7] btrfs-progs: Introduce "--dangerous" option to reset all tree block csum Qu Wenruo
2015-02-04  9:16 ` [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Martin Steigerwald
2015-02-04 10:07   ` Paul Jones
2015-02-05  1:43     ` Qu Wenruo
2015-02-05  1:35   ` Qu Wenruo
2015-02-05  8:31     ` Martin Steigerwald
2015-02-05  8:45       ` Qu Wenruo
2015-02-05  8:59         ` BTRFS wiki: page about recovery (was: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.) Martin Steigerwald
2015-04-22  5:55 ` [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
