All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent
@ 2017-12-21 22:42 Liu Bo
  2017-12-21 22:42 ` [PATCH 01/10] Btrfs: add helper for em merge logic Liu Bo
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Liu Bo @ 2017-12-21 22:42 UTC (permalink / raw)
  To: linux-btrfs

Although
commit e6c4efd87ab0 ("btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map")
fixed up the negetive em->len, it has introduced several regressions, several has been fixed by

commit 32be3a1ac6d0 ("btrfs: Fix the wrong condition judgment about subset extent map"),
commit 8dff9c853410 ("Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent") and
commit 8e2bd3b7fac9 ("Btrfs: deal with existing encompassing extent map in btrfs_get_extent()").

Unfortunately, there is one more regression which is caught recently in
our test farm, more details are explained in patch 7.

While debugging the above issue, I found that all of these bugs are caused
by some racy situations, which can be very tricky to reproduce, so I made
several extent map specific test cases in btrfs's selftest framework.

Patch 1-2 are some preparatory work.
Patch 3-5 are regression tests for handling EEXIST from adding extent map.
Patch 6-7 are fixing two bugs which can be reproduced by the above test cases.
Patch 8-10 are debugging wise, so that we can know what happened easily.

Liu Bo (10):
  Btrfs: add helper for em merge logic
  Btrfs: move extent map specific code to extent_map.c
  Btrfs: add extent map selftests
  Btrfs: extent map selftest: buffered write vs dio read
  Btrfs: extent map selftest: dio write vs dio read
  Btrfs: fix incorrect block_len in merge_extent_mapping
  Btrfs: fix unexpected EEXIST from btrfs_get_extent
  Btrfs: add WARN_ONCE to detect unexpected error from
    merge_extent_mapping
  Btrfs: add tracepoint for em's EEXIST case
  Btrfs: noinline merge_extent_mapping

 fs/btrfs/Makefile                 |   2 +-
 fs/btrfs/extent_map.c             | 120 +++++++++++++
 fs/btrfs/extent_map.h             |   2 +
 fs/btrfs/inode.c                  | 108 +-----------
 fs/btrfs/tests/btrfs-tests.c      |   3 +
 fs/btrfs/tests/btrfs-tests.h      |   1 +
 fs/btrfs/tests/extent-map-tests.c | 363 ++++++++++++++++++++++++++++++++++++++
 include/trace/events/btrfs.h      |  35 ++++
 8 files changed, 526 insertions(+), 108 deletions(-)
 create mode 100644 fs/btrfs/tests/extent-map-tests.c

-- 
2.9.4


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 01/10] Btrfs: add helper for em merge logic
  2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
@ 2017-12-21 22:42 ` Liu Bo
  2017-12-22  7:23   ` Nikolay Borisov
  2017-12-21 22:42 ` [PATCH 02/10] Btrfs: move extent map specific code to extent_map.c Liu Bo
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Liu Bo @ 2017-12-21 22:42 UTC (permalink / raw)
  To: linux-btrfs

This is a prepare work for the following extent map selftest, which
runs tests against em merge logic.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/ctree.h |   2 ++
 fs/btrfs/inode.c | 101 ++++++++++++++++++++++++++++++-------------------------
 2 files changed, 58 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b2e09fe..328f40f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3148,6 +3148,8 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
 						    int delay_iput);
 void btrfs_wait_and_free_delalloc_work(struct btrfs_delalloc_work *work);
 
+int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
+			     struct extent_map **em_in, u64 start, u64 len);
 struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
 		struct page *page, size_t pg_offset, u64 start,
 		u64 len, int create);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e1a7f3c..527df6f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6911,6 +6911,61 @@ static noinline int uncompress_inline(struct btrfs_path *path,
 	return ret;
 }
 
+int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
+			     struct extent_map **em_in, u64 start, u64 len)
+{
+	int ret;
+	struct extent_map *em = *em_in;
+
+	ret = add_extent_mapping(em_tree, em, 0);
+	/* it is possible that someone inserted the extent into the tree
+	 * while we had the lock dropped.  It is also possible that
+	 * an overlapping map exists in the tree
+	 */
+	if (ret == -EEXIST) {
+		struct extent_map *existing;
+
+		ret = 0;
+
+		existing = search_extent_mapping(em_tree, start, len);
+
+		/*
+		 * existing will always be non-NULL, since there must be
+		 * extent causing the -EEXIST.
+		 */
+		if (existing->start == em->start &&
+		    extent_map_end(existing) >= extent_map_end(em) &&
+		    em->block_start == existing->block_start) {
+			/*
+			 * The existing extent map already encompasses the
+			 * entire extent map we tried to add.
+			 */
+			free_extent_map(em);
+			*em_in = existing;
+			ret = 0;
+		} else if (start >= extent_map_end(existing) ||
+		    start <= existing->start) {
+			/*
+			 * The existing extent map is the one nearest to
+			 * the [start, start + len) range which overlaps
+			 */
+			ret = merge_extent_mapping(em_tree, existing,
+						   em, start);
+			free_extent_map(existing);
+			if (ret) {
+				free_extent_map(em);
+				*em_in = NULL;
+			}
+		} else {
+			free_extent_map(em);
+			*em_in = existing;
+			ret = 0;
+		}
+	}
+	ASSERT(ret == 0 || ret == -EEXIST);
+	return ret;
+}
+
 /*
  * a bit scary, this does extent mapping from logical file offset to the disk.
  * the ugly parts come from merging extents from the disk with the in-ram
@@ -7147,51 +7202,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 
 	err = 0;
 	write_lock(&em_tree->lock);
-	ret = add_extent_mapping(em_tree, em, 0);
-	/* it is possible that someone inserted the extent into the tree
-	 * while we had the lock dropped.  It is also possible that
-	 * an overlapping map exists in the tree
-	 */
-	if (ret == -EEXIST) {
-		struct extent_map *existing;
-
-		ret = 0;
-
-		existing = search_extent_mapping(em_tree, start, len);
-		/*
-		 * existing will always be non-NULL, since there must be
-		 * extent causing the -EEXIST.
-		 */
-		if (existing->start == em->start &&
-		    extent_map_end(existing) >= extent_map_end(em) &&
-		    em->block_start == existing->block_start) {
-			/*
-			 * The existing extent map already encompasses the
-			 * entire extent map we tried to add.
-			 */
-			free_extent_map(em);
-			em = existing;
-			err = 0;
-
-		} else if (start >= extent_map_end(existing) ||
-		    start <= existing->start) {
-			/*
-			 * The existing extent map is the one nearest to
-			 * the [start, start + len) range which overlaps
-			 */
-			err = merge_extent_mapping(em_tree, existing,
-						   em, start);
-			free_extent_map(existing);
-			if (err) {
-				free_extent_map(em);
-				em = NULL;
-			}
-		} else {
-			free_extent_map(em);
-			em = existing;
-			err = 0;
-		}
-	}
+	err = btrfs_add_extent_mapping(em_tree, &em, start, len);
 	write_unlock(&em_tree->lock);
 out:
 
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 02/10] Btrfs: move extent map specific code to extent_map.c
  2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
  2017-12-21 22:42 ` [PATCH 01/10] Btrfs: add helper for em merge logic Liu Bo
@ 2017-12-21 22:42 ` Liu Bo
  2017-12-21 22:42 ` [PATCH 03/10] Btrfs: add extent map selftests Liu Bo
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Liu Bo @ 2017-12-21 22:42 UTC (permalink / raw)
  To: linux-btrfs

These helpers are extent map specific, this moves them to extent_map.c.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/ctree.h      |   2 -
 fs/btrfs/extent_map.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/extent_map.h |   2 +
 fs/btrfs/inode.c      | 117 -------------------------------------------------
 4 files changed, 120 insertions(+), 119 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 328f40f..b2e09fe 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3148,8 +3148,6 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
 						    int delay_iput);
 void btrfs_wait_and_free_delalloc_work(struct btrfs_delalloc_work *work);
 
-int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
-			     struct extent_map **em_in, u64 start, u64 len);
 struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
 		struct page *page, size_t pg_offset, u64 start,
 		u64 len, int create);
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 2e348fb..51665de 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -454,3 +454,121 @@ void replace_extent_mapping(struct extent_map_tree *tree,
 
 	setup_extent_mapping(tree, new, modified);
 }
+
+static struct extent_map *next_extent_map(struct extent_map *em)
+{
+	struct rb_node *next;
+
+	next = rb_next(&em->rb_node);
+	if (!next)
+		return NULL;
+	return container_of(next, struct extent_map, rb_node);
+}
+
+static struct extent_map *prev_extent_map(struct extent_map *em)
+{
+	struct rb_node *prev;
+
+	prev = rb_prev(&em->rb_node);
+	if (!prev)
+		return NULL;
+	return container_of(prev, struct extent_map, rb_node);
+}
+
+/*
+ * Given an existing extent in the tree, the existing extent is the nearest
+ * extent to map_start, and an extent that you want to insert, deal with overlap
+ * and insert the best fitted new extent into the tree.
+ */
+static int merge_extent_mapping(struct extent_map_tree *em_tree,
+				struct extent_map *existing,
+				struct extent_map *em,
+				u64 map_start)
+{
+	struct extent_map *prev;
+	struct extent_map *next;
+	u64 start;
+	u64 end;
+	u64 start_diff;
+
+	BUG_ON(map_start < em->start || map_start >= extent_map_end(em));
+
+	if (existing->start > map_start) {
+		next = existing;
+		prev = prev_extent_map(next);
+	} else {
+		prev = existing;
+		next = next_extent_map(prev);
+	}
+
+	start = prev ? extent_map_end(prev) : em->start;
+	start = max_t(u64, start, em->start);
+	end = next ? next->start : extent_map_end(em);
+	end = min_t(u64, end, extent_map_end(em));
+	start_diff = start - em->start;
+	em->start = start;
+	em->len = end - start;
+	if (em->block_start < EXTENT_MAP_LAST_BYTE &&
+	    !test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
+		em->block_start += start_diff;
+		em->block_len -= start_diff;
+	}
+	return add_extent_mapping(em_tree, em, 0);
+}
+
+/* This handle the EEXIST case of add_extent_mapping. */
+int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
+			     struct extent_map **em_in, u64 start, u64 len)
+{
+	int ret;
+	struct extent_map *em = *em_in;
+
+	ret = add_extent_mapping(em_tree, em, 0);
+	/*
+	 * It is possible that someone inserted the extent into the tree while
+	 * we had the lock dropped.  It is also possible that an overlapping map
+	 * exists in the tree
+	 */
+	if (ret == -EEXIST) {
+		struct extent_map *existing;
+
+		ret = 0;
+
+		existing = search_extent_mapping(em_tree, start, len);
+
+		/*
+		 * existing will always be non-NULL, since there must be
+		 * extent causing the -EEXIST.
+		 */
+		if (existing->start == em->start &&
+		    extent_map_end(existing) >= extent_map_end(em) &&
+		    em->block_start == existing->block_start) {
+			/*
+			 * The existing extent map already encompasses the
+			 * entire extent map we tried to add.
+			 */
+			free_extent_map(em);
+			*em_in = existing;
+			ret = 0;
+		} else if (start >= extent_map_end(existing) ||
+		    start <= existing->start) {
+			/*
+			 * The existing extent map is the one nearest to
+			 * the [start, start + len) range which overlaps
+			 */
+			ret = merge_extent_mapping(em_tree, existing,
+						   em, start);
+			free_extent_map(existing);
+			if (ret) {
+				free_extent_map(em);
+				*em_in = NULL;
+			}
+		} else {
+			free_extent_map(em);
+			*em_in = existing;
+			ret = 0;
+		}
+	}
+	ASSERT(ret == 0 || ret == -EEXIST);
+	return ret;
+}
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 64365bb..40f34ad 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -92,4 +92,6 @@ int unpin_extent_cache(struct extent_map_tree *tree, u64 start, u64 len, u64 gen
 void clear_em_logging(struct extent_map_tree *tree, struct extent_map *em);
 struct extent_map *search_extent_mapping(struct extent_map_tree *tree,
 					 u64 start, u64 len);
+int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
+			     struct extent_map **em_in, u64 start, u64 len);
 #endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 527df6f..b2272f9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6803,68 +6803,6 @@ static int btrfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 	goto out_fail;
 }
 
-/* Find next extent map of a given extent map, caller needs to ensure locks */
-static struct extent_map *next_extent_map(struct extent_map *em)
-{
-	struct rb_node *next;
-
-	next = rb_next(&em->rb_node);
-	if (!next)
-		return NULL;
-	return container_of(next, struct extent_map, rb_node);
-}
-
-static struct extent_map *prev_extent_map(struct extent_map *em)
-{
-	struct rb_node *prev;
-
-	prev = rb_prev(&em->rb_node);
-	if (!prev)
-		return NULL;
-	return container_of(prev, struct extent_map, rb_node);
-}
-
-/* helper for btfs_get_extent.  Given an existing extent in the tree,
- * the existing extent is the nearest extent to map_start,
- * and an extent that you want to insert, deal with overlap and insert
- * the best fitted new extent into the tree.
- */
-static int merge_extent_mapping(struct extent_map_tree *em_tree,
-				struct extent_map *existing,
-				struct extent_map *em,
-				u64 map_start)
-{
-	struct extent_map *prev;
-	struct extent_map *next;
-	u64 start;
-	u64 end;
-	u64 start_diff;
-
-	BUG_ON(map_start < em->start || map_start >= extent_map_end(em));
-
-	if (existing->start > map_start) {
-		next = existing;
-		prev = prev_extent_map(next);
-	} else {
-		prev = existing;
-		next = next_extent_map(prev);
-	}
-
-	start = prev ? extent_map_end(prev) : em->start;
-	start = max_t(u64, start, em->start);
-	end = next ? next->start : extent_map_end(em);
-	end = min_t(u64, end, extent_map_end(em));
-	start_diff = start - em->start;
-	em->start = start;
-	em->len = end - start;
-	if (em->block_start < EXTENT_MAP_LAST_BYTE &&
-	    !test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
-		em->block_start += start_diff;
-		em->block_len -= start_diff;
-	}
-	return add_extent_mapping(em_tree, em, 0);
-}
-
 static noinline int uncompress_inline(struct btrfs_path *path,
 				      struct page *page,
 				      size_t pg_offset, u64 extent_offset,
@@ -6911,61 +6849,6 @@ static noinline int uncompress_inline(struct btrfs_path *path,
 	return ret;
 }
 
-int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
-			     struct extent_map **em_in, u64 start, u64 len)
-{
-	int ret;
-	struct extent_map *em = *em_in;
-
-	ret = add_extent_mapping(em_tree, em, 0);
-	/* it is possible that someone inserted the extent into the tree
-	 * while we had the lock dropped.  It is also possible that
-	 * an overlapping map exists in the tree
-	 */
-	if (ret == -EEXIST) {
-		struct extent_map *existing;
-
-		ret = 0;
-
-		existing = search_extent_mapping(em_tree, start, len);
-
-		/*
-		 * existing will always be non-NULL, since there must be
-		 * extent causing the -EEXIST.
-		 */
-		if (existing->start == em->start &&
-		    extent_map_end(existing) >= extent_map_end(em) &&
-		    em->block_start == existing->block_start) {
-			/*
-			 * The existing extent map already encompasses the
-			 * entire extent map we tried to add.
-			 */
-			free_extent_map(em);
-			*em_in = existing;
-			ret = 0;
-		} else if (start >= extent_map_end(existing) ||
-		    start <= existing->start) {
-			/*
-			 * The existing extent map is the one nearest to
-			 * the [start, start + len) range which overlaps
-			 */
-			ret = merge_extent_mapping(em_tree, existing,
-						   em, start);
-			free_extent_map(existing);
-			if (ret) {
-				free_extent_map(em);
-				*em_in = NULL;
-			}
-		} else {
-			free_extent_map(em);
-			*em_in = existing;
-			ret = 0;
-		}
-	}
-	ASSERT(ret == 0 || ret == -EEXIST);
-	return ret;
-}
-
 /*
  * a bit scary, this does extent mapping from logical file offset to the disk.
  * the ugly parts come from merging extents from the disk with the in-ram
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 03/10] Btrfs: add extent map selftests
  2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
  2017-12-21 22:42 ` [PATCH 01/10] Btrfs: add helper for em merge logic Liu Bo
  2017-12-21 22:42 ` [PATCH 02/10] Btrfs: move extent map specific code to extent_map.c Liu Bo
@ 2017-12-21 22:42 ` Liu Bo
  2017-12-22  7:42   ` Nikolay Borisov
  2017-12-21 22:42 ` [PATCH 04/10] Btrfs: extent map selftest: buffered write vs dio read Liu Bo
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Liu Bo @ 2017-12-21 22:42 UTC (permalink / raw)
  To: linux-btrfs

We've observed that btrfs_get_extent() and merge_extent_mapping() could
return -EEXIST in several cases, and they are caused by some racy
condition, e.g dio read vs dio write, which makes the problem very tricky
to reproduce.

This adds extent map selftests in order to simulate those racy situations.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/Makefile                 |   2 +-
 fs/btrfs/tests/btrfs-tests.c      |   3 +
 fs/btrfs/tests/btrfs-tests.h      |   1 +
 fs/btrfs/tests/extent-map-tests.c | 202 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 207 insertions(+), 1 deletion(-)
 create mode 100644 fs/btrfs/tests/extent-map-tests.c

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 6fe881d..0c43736 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -19,4 +19,4 @@ btrfs-$(CONFIG_BTRFS_FS_REF_VERIFY) += ref-verify.o
 btrfs-$(CONFIG_BTRFS_FS_RUN_SANITY_TESTS) += tests/free-space-tests.o \
 	tests/extent-buffer-tests.o tests/btrfs-tests.o \
 	tests/extent-io-tests.o tests/inode-tests.o tests/qgroup-tests.o \
-	tests/free-space-tree-tests.o
+	tests/free-space-tree-tests.o tests/extent-map-tests.o
diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
index d3f2537..9786d8c 100644
--- a/fs/btrfs/tests/btrfs-tests.c
+++ b/fs/btrfs/tests/btrfs-tests.c
@@ -277,6 +277,9 @@ int btrfs_run_sanity_tests(void)
 				goto out;
 		}
 	}
+	ret = btrfs_test_extent_map();
+	if (ret)
+		goto out;
 out:
 	btrfs_destroy_test_fs();
 	return ret;
diff --git a/fs/btrfs/tests/btrfs-tests.h b/fs/btrfs/tests/btrfs-tests.h
index 266f1e3..bc0615b 100644
--- a/fs/btrfs/tests/btrfs-tests.h
+++ b/fs/btrfs/tests/btrfs-tests.h
@@ -33,6 +33,7 @@ int btrfs_test_extent_io(u32 sectorsize, u32 nodesize);
 int btrfs_test_inodes(u32 sectorsize, u32 nodesize);
 int btrfs_test_qgroups(u32 sectorsize, u32 nodesize);
 int btrfs_test_free_space_tree(u32 sectorsize, u32 nodesize);
+int btrfs_test_extent_map(void);
 struct inode *btrfs_new_test_inode(void);
 struct btrfs_fs_info *btrfs_alloc_dummy_fs_info(u32 nodesize, u32 sectorsize);
 void btrfs_free_dummy_fs_info(struct btrfs_fs_info *fs_info);
diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
new file mode 100644
index 0000000..0407396
--- /dev/null
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -0,0 +1,202 @@
+/*
+ * Copyright (C) 2017 Oracle.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <linux/types.h>
+#include "btrfs-tests.h"
+#include "../ctree.h"
+
+static void free_extent_map_tree(struct extent_map_tree *em_tree)
+{
+	struct extent_map *em;
+	struct rb_node *node;
+
+	while (!RB_EMPTY_ROOT(&em_tree->map)) {
+		node = rb_first(&em_tree->map);
+		em = rb_entry(node, struct extent_map, rb_node);
+		remove_extent_mapping(em_tree, em);
+
+#ifdef CONFIG_BTRFS_DEBUG
+		if (refcount_read(&em->refs) != 1) {
+			test_msg(
+				 "Oops we have a em leak: em (start 0x%llx len 0x%llx block_start 0x%llx block_len 0x%llx) refs %d\n",
+				 em->start, em->len, em->block_start,
+				 em->block_len, refcount_read(&em->refs));
+
+			refcount_set(&em->refs, 1);
+		}
+#endif
+		free_extent_map(em);
+	}
+}
+
+/*
+ * Test scenario:
+ *
+ * Suppose that no extent map has been loaded into memory yet, there is a file
+ * extent [0, 16K), followed by another file extent [16K, 20K), two dio reads
+ * are entering btrfs_get_extent() concurrently, t1 is reading [8K, 16K), t2 is
+ * reading [0, 8K)
+ *
+ *     t1                            t2
+ *  btrfs_get_extent()              btrfs_get_extent()
+ *    -> lookup_extent_mapping()      ->lookup_extent_mapping()
+ *    -> add_extent_mapping(0, 16K)
+ *    -> return em
+ *                                    ->add_extent_mapping(0, 16K)
+ *                                    -> #handle -EEXIST
+ */
+static void test_case_1(struct extent_map_tree *em_tree)
+{
+	struct extent_map *em;
+	u64 start = 0;
+	u64 len = SZ_8K;
+	int ret;
+
+	em = alloc_extent_map();
+	if (!em)
+		/* Skip the test on error. */
+		return;
+
+	/* Add [0, 16K) */
+	em->start = 0;
+	em->len = SZ_16K;
+	em->block_start = 0;
+	em->block_len = SZ_16K;
+	ret = add_extent_mapping(em_tree, em, 0);
+	ASSERT(ret == 0);
+	free_extent_map(em);
+
+	/* Add [16K, 20K) following [0, 16K)  */
+	em = alloc_extent_map();
+	if (!em)
+		goto out;
+
+	em->start = SZ_16K;
+	em->len = SZ_4K;
+	em->block_start = SZ_32K; /* avoid merging */
+	em->block_len = SZ_4K;
+	ret = add_extent_mapping(em_tree, em, 0);
+	ASSERT(ret == 0);
+	free_extent_map(em);
+
+	em = alloc_extent_map();
+	if (!em)
+		goto out;
+
+	/* Add [0, 8K), should return [0, 16K) instead. */
+	em->start = start;
+	em->len = len;
+	em->block_start = start;
+	em->block_len = len;
+	ret = btrfs_add_extent_mapping(em_tree, &em, em->start, em->len);
+	if (ret)
+		test_msg("case1 [%llu %llu]: ret %d\n",
+			 start, start + len, ret);
+	if (em &&
+	    (em->start != 0 || extent_map_end(em) != SZ_16K ||
+	     em->block_start != 0 || em->block_len != SZ_16K))
+		test_msg("case1 [%llu %llu]: ret %d return a wrong em (start %llu len %llu block_start %llu block_len %llu\n",
+			 start, start + len, ret, em->start, em->len,
+			 em->block_start, em->block_len);
+	free_extent_map(em);
+out:
+	/* free memory */
+	free_extent_map_tree(em_tree);
+}
+
+/*
+ * Test scenario:
+ *
+ * Reading the inline ending up with EEXIST, ie. read an inline
+ * extent and discard page cache and read it again.
+ */
+static void test_case_2(struct extent_map_tree *em_tree)
+{
+	struct extent_map *em;
+	int ret;
+
+	em = alloc_extent_map();
+	if (!em)
+		/* Skip the test on error. */
+		return;
+
+	/* Add [0, 1K) */
+	em->start = 0;
+	em->len = SZ_1K;
+	em->block_start = EXTENT_MAP_INLINE;
+	em->block_len = (u64)-1;
+	ret = add_extent_mapping(em_tree, em, 0);
+	ASSERT(ret == 0);
+	free_extent_map(em);
+
+	/* Add [4K, 4K) following [0, 1K)  */
+	em = alloc_extent_map();
+	if (!em)
+		goto out;
+
+	em->start = SZ_4K;
+	em->len = SZ_4K;
+	em->block_start = SZ_4K;
+	em->block_len = SZ_4K;
+	ret = add_extent_mapping(em_tree, em, 0);
+	ASSERT(ret == 0);
+	free_extent_map(em);
+
+	em = alloc_extent_map();
+	if (!em)
+		goto out;
+
+	/* Add [0, 1K) */
+	em->start = 0;
+	em->len = SZ_1K;
+	em->block_start = EXTENT_MAP_INLINE;
+	em->block_len = (u64)-1;
+	ret = btrfs_add_extent_mapping(em_tree, &em, em->start, em->len);
+	if (ret)
+		test_msg("case2 [0 1K]: ret %d\n", ret);
+	if (em &&
+	    (em->start != 0 || extent_map_end(em) != SZ_1K ||
+	     em->block_start != EXTENT_MAP_INLINE || em->block_len != (u64)-1))
+		test_msg("case2 [0 1K]: ret %d return a wrong em (start %llu len %llu block_start %llu block_len %llu\n",
+			 ret, em->start, em->len, em->block_start,
+			 em->block_len);
+	free_extent_map(em);
+out:
+	/* free memory */
+	free_extent_map_tree(em_tree);
+}
+
+int btrfs_test_extent_map()
+{
+	struct extent_map_tree *em_tree;
+
+	test_msg("Running extent_map tests\n");
+
+	em_tree = kzalloc(sizeof(*em_tree), GFP_KERNEL);
+	if (!em_tree)
+		/* Skip the test on error. */
+		return 0;
+
+	extent_map_tree_init(em_tree);
+
+	test_case_1(em_tree);
+	test_case_2(em_tree);
+
+	kfree(em_tree);
+	return 0;
+}
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 04/10] Btrfs: extent map selftest: buffered write vs dio read
  2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
                   ` (2 preceding siblings ...)
  2017-12-21 22:42 ` [PATCH 03/10] Btrfs: add extent map selftests Liu Bo
@ 2017-12-21 22:42 ` Liu Bo
  2017-12-22  7:51   ` Nikolay Borisov
  2017-12-21 22:42 ` [PATCH 05/10] Btrfs: extent map selftest: dio " Liu Bo
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Liu Bo @ 2017-12-21 22:42 UTC (permalink / raw)
  To: linux-btrfs

This test case simulates the racy situation of buffered write vs dio
read, and see if btrfs_get_extent() would return -EEXIST.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/tests/extent-map-tests.c | 73 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index 0407396..2adf55f 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -181,6 +181,78 @@ static void test_case_2(struct extent_map_tree *em_tree)
 	free_extent_map_tree(em_tree);
 }
 
+static void __test_case_3(struct extent_map_tree *em_tree, u64 start)
+{
+	struct extent_map *em;
+	u64 len = SZ_4K;
+	int ret;
+
+	em = alloc_extent_map();
+	if (!em)
+		/* Skip this test on error. */
+		return;
+
+	/* Add [4K, 8K) */
+	em->start = SZ_4K;
+	em->len = SZ_4K;
+	em->block_start = SZ_4K;
+	em->block_len = SZ_4K;
+	ret = add_extent_mapping(em_tree, em, 0);
+	ASSERT(ret == 0);
+	free_extent_map(em);
+
+	em = alloc_extent_map();
+	if (!em)
+		goto out;
+
+	/* Add [0, 16K) */
+	em->start = 0;
+	em->len = SZ_16K;
+	em->block_start = 0;
+	em->block_len = SZ_16K;
+	ret = btrfs_add_extent_mapping(em_tree, &em, start, len);
+	if (ret)
+		test_msg("case3 [0x%llx 0x%llx): ret %d\n",
+			 start, start + len, ret);
+	/*
+	 * Since bytes within em are contiguous, em->block_start is identical to
+	 * em->start.
+	 */
+	if (em &&
+	    (start < em->start || start + len > extent_map_end(em) ||
+	     em->start != em->block_start || em->len != em->block_len))
+		test_msg("case3 [0x%llx 0x%llx): ret %d em (start 0x%llx len 0x%llx block_start 0x%llx block_len 0x%llx)\n",
+			 start, start + len, ret, em->start, em->len,
+			 em->block_start, em->block_len);
+	free_extent_map(em);
+out:
+	/* free memory */
+	free_extent_map_tree(em_tree);
+}
+
+/*
+ * Test scenario:
+ *
+ * Suppose that no extent map has been loaded into memory yet.
+ * There is a file extent [0, 16K), two jobs are running concurrently
+ * against it, t1 is buffered writing to [4K, 8K) and t2 is doing dio
+ * read from [0, 4K) or [8K, 12K) or [12K, 16K).
+ *
+ * t1 goes ahead of t2 and adds em [4K, 8K) into tree.
+ *
+ *         t1                       t2
+ *  cow_file_range()	     btrfs_get_extent()
+ *                            -> lookup_extent_mapping()
+ *   -> add_extent_mapping()
+ *                            -> add_extent_mapping()
+ */
+static void test_case_3(struct extent_map_tree *em_tree)
+{
+	__test_case_3(em_tree, 0);
+	__test_case_3(em_tree, SZ_8K);
+	__test_case_3(em_tree, (12 * 1024ULL));
+}
+
 int btrfs_test_extent_map()
 {
 	struct extent_map_tree *em_tree;
@@ -196,6 +268,7 @@ int btrfs_test_extent_map()
 
 	test_case_1(em_tree);
 	test_case_2(em_tree);
+	test_case_3(em_tree);
 
 	kfree(em_tree);
 	return 0;
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 05/10] Btrfs: extent map selftest: dio write vs dio read
  2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
                   ` (3 preceding siblings ...)
  2017-12-21 22:42 ` [PATCH 04/10] Btrfs: extent map selftest: buffered write vs dio read Liu Bo
@ 2017-12-21 22:42 ` Liu Bo
  2017-12-21 22:42 ` [PATCH 06/10] Btrfs: fix incorrect block_len in merge_extent_mapping Liu Bo
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Liu Bo @ 2017-12-21 22:42 UTC (permalink / raw)
  To: linux-btrfs

This test case simulates the racy situation of dio write vs dio read,
and see if btrfs_get_extent() would return -EEXIST.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/tests/extent-map-tests.c | 88 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index 2adf55f..66d5523 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -253,6 +253,93 @@ static void test_case_3(struct extent_map_tree *em_tree)
 	__test_case_3(em_tree, (12 * 1024ULL));
 }
 
+static void __test_case_4(struct extent_map_tree *em_tree, u64 start)
+{
+	struct extent_map *em;
+	u64 len = SZ_4K;
+	int ret;
+
+	em = alloc_extent_map();
+	if (!em)
+		/* Skip this test on error. */
+		return;
+
+	/* Add [0K, 8K) */
+	em->start = 0;
+	em->len = SZ_8K;
+	em->block_start = 0;
+	em->block_len = SZ_8K;
+	ret = add_extent_mapping(em_tree, em, 0);
+	ASSERT(ret == 0);
+	free_extent_map(em);
+
+	em = alloc_extent_map();
+	if (!em)
+		goto out;
+
+	/* Add [8K, 24K) */
+	em->start = SZ_8K;
+	em->len = 24 * 1024ULL;
+	em->block_start = SZ_16K; /* avoid merging */
+	em->block_len = 24 * 1024ULL;
+	ret = add_extent_mapping(em_tree, em, 0);
+	ASSERT(ret == 0);
+	free_extent_map(em);
+
+	em = alloc_extent_map();
+	if (!em)
+		goto out;
+	/* Add [0K, 32K) */
+	em->start = 0;
+	em->len = SZ_32K;
+	em->block_start = 0;
+	em->block_len = SZ_32K;
+	ret = btrfs_add_extent_mapping(em_tree, &em, start, len);
+	if (ret)
+		test_msg("case4 [0x%llx 0x%llx): ret %d\n",
+			 start, len, ret);
+	if (em &&
+	    (start < em->start || start + len > extent_map_end(em)))
+		test_msg("case4 [0x%llx 0x%llx): ret %d, added wrong em (start 0x%llx len 0x%llx block_start 0x%llx block_len 0x%llx)\n",
+			 start, len, ret, em->start, em->len, em->block_start,
+			 em->block_len);
+	free_extent_map(em);
+out:
+	/* free memory */
+	free_extent_map_tree(em_tree);
+}
+
+/*
+ * Test scenario:
+ *
+ * Suppose that no extent map has been loaded into memory yet.
+ * There is a file extent [0, 32K), two jobs are running concurrently
+ * against it, t1 is doing dio write to [8K, 32K) and t2 is doing dio
+ * read from [0, 4K) or [4K, 8K).
+ *
+ * t1 goes ahead of t2 and splits em [0, 32K) to em [0K, 8K) and [8K 32K).
+ *
+ *         t1                                t2
+ *  btrfs_get_blocks_direct()	       btrfs_get_blocks_direct()
+ *   -> btrfs_get_extent()              -> btrfs_get_extent()
+ *       -> lookup_extent_mapping()
+ *       -> add_extent_mapping()            -> lookup_extent_mapping()
+ *          # load [0, 32K)
+ *   -> btrfs_new_extent_direct()
+ *       -> btrfs_drop_extent_cache()
+ *          # split [0, 32K)
+ *       -> add_extent_mapping()
+ *          # add [8K, 32K)
+ *                                          -> add_extent_mapping()
+ *                                             # handle -EEXIST when adding
+ *                                             # [0, 32K)
+ */
+static void test_case_4(struct extent_map_tree *em_tree)
+{
+	__test_case_4(em_tree, 0);
+	__test_case_4(em_tree, SZ_4K);
+}
+
 int btrfs_test_extent_map()
 {
 	struct extent_map_tree *em_tree;
@@ -269,6 +356,7 @@ int btrfs_test_extent_map()
 	test_case_1(em_tree);
 	test_case_2(em_tree);
 	test_case_3(em_tree);
+	test_case_4(em_tree);
 
 	kfree(em_tree);
 	return 0;
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 06/10] Btrfs: fix incorrect block_len in merge_extent_mapping
  2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
                   ` (4 preceding siblings ...)
  2017-12-21 22:42 ` [PATCH 05/10] Btrfs: extent map selftest: dio " Liu Bo
@ 2017-12-21 22:42 ` Liu Bo
  2017-12-21 22:42 ` [PATCH 07/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent Liu Bo
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Liu Bo @ 2017-12-21 22:42 UTC (permalink / raw)
  To: linux-btrfs

%block_len could be checked on deciding if two em are mergable.

merge_extent_mapping() has only added the front pad if the front part
of em gets truncated, but it's possible that the end part gets
truncated.

For both compressed extent and inline extent, em->block_len is not
adjusted accordingly, while for regular extent, em->block_len always
equals to em->len, hence this sets em->block_len with em->len.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_map.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 51665de..6653b08 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -511,7 +511,7 @@ static int merge_extent_mapping(struct extent_map_tree *em_tree,
 	if (em->block_start < EXTENT_MAP_LAST_BYTE &&
 	    !test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
 		em->block_start += start_diff;
-		em->block_len -= start_diff;
+		em->block_len = em->len;
 	}
 	return add_extent_mapping(em_tree, em, 0);
 }
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 07/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent
  2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
                   ` (5 preceding siblings ...)
  2017-12-21 22:42 ` [PATCH 06/10] Btrfs: fix incorrect block_len in merge_extent_mapping Liu Bo
@ 2017-12-21 22:42 ` Liu Bo
  2017-12-22 12:10   ` Nikolay Borisov
  2017-12-21 22:42 ` [PATCH 08/10] Btrfs: add WARN_ONCE to detect unexpected error from merge_extent_mapping Liu Bo
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Liu Bo @ 2017-12-21 22:42 UTC (permalink / raw)
  To: linux-btrfs

This fixes a corner case that is caused by a race of dio write vs dio
read/write.

dio write:
[0, 32k) -> [0, 8k) + [8k, 32k)

dio read/write:

While get_extent() with [0, 4k), [0, 8k) is found as existing em, even
though start == existing->start, em is [0, 32k),
extent_map_end(em) > extent_map_end(existing),
then it goes thru merge_extent_mapping() which tries to add a [8k, 8k)
(with a length 0), and get_extent ends up returning -EEXIST, and dio
read/write will get -EEXIST which is confusing applications.

Here I concluded all the possible situations,
1) start < existing->start

            +-----------+em+-----------+
+--prev---+ |     +-------------+      |
|         | |     |             |      |
+---------+ +     +---+existing++      ++
                +
                |
                +
             start

2) start == existing->start

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
            |
            |
            +
         start

3) start > existing->start && start < (existing->start + existing->len)

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
               |
               |
               +
             start

4) start >= (existing->start + existing->len)

+-----------+em+-----------+
|     +-------------+      | +--next---+
|     |             |      | |         |
+     +---+existing++      + +---------+
                      +
                      |
                      +
                   start

After going thru the above case by case, it turns out that if start is
within existing em (front inclusive), then the existing em should be
returned, otherwise, we try our best to merge candidate em with
sibling ems to form a larger em.

Reported-by: David Vallender <david.vallender@landmark.co.uk>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_map.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 6653b08..d386cfb 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -483,7 +483,7 @@ static struct extent_map *prev_extent_map(struct extent_map *em)
 static int merge_extent_mapping(struct extent_map_tree *em_tree,
 				struct extent_map *existing,
 				struct extent_map *em,
-				u64 map_start)
+				u64 map_start, u64 map_len)
 {
 	struct extent_map *prev;
 	struct extent_map *next;
@@ -496,9 +496,13 @@ static int merge_extent_mapping(struct extent_map_tree *em_tree,
 	if (existing->start > map_start) {
 		next = existing;
 		prev = prev_extent_map(next);
+		if (prev)
+			ASSERT(extent_map_end(prev) <= map_start);
 	} else {
 		prev = existing;
 		next = next_extent_map(prev);
+		if (next)
+			ASSERT(map_start + map_len <= next->start);
 	}
 
 	start = prev ? extent_map_end(prev) : em->start;
@@ -540,35 +544,26 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
 		 * existing will always be non-NULL, since there must be
 		 * extent causing the -EEXIST.
 		 */
-		if (existing->start == em->start &&
-		    extent_map_end(existing) >= extent_map_end(em) &&
-		    em->block_start == existing->block_start) {
-			/*
-			 * The existing extent map already encompasses the
-			 * entire extent map we tried to add.
-			 */
+		if (start >= existing->start &&
+		    start < extent_map_end(existing)) {
 			free_extent_map(em);
 			*em_in = existing;
 			ret = 0;
-		} else if (start >= extent_map_end(existing) ||
-		    start <= existing->start) {
+		} else {
 			/*
 			 * The existing extent map is the one nearest to
 			 * the [start, start + len) range which overlaps
 			 */
 			ret = merge_extent_mapping(em_tree, existing,
-						   em, start);
+						   em, start, len);
 			free_extent_map(existing);
 			if (ret) {
 				free_extent_map(em);
 				*em_in = NULL;
 			}
-		} else {
-			free_extent_map(em);
-			*em_in = existing;
-			ret = 0;
 		}
 	}
+
 	ASSERT(ret == 0 || ret == -EEXIST);
 	return ret;
 }
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 08/10] Btrfs: add WARN_ONCE to detect unexpected error from merge_extent_mapping
  2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
                   ` (6 preceding siblings ...)
  2017-12-21 22:42 ` [PATCH 07/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent Liu Bo
@ 2017-12-21 22:42 ` Liu Bo
  2017-12-22  8:52   ` Nikolay Borisov
  2017-12-21 22:42 ` [PATCH 09/10] Btrfs: add tracepoint for em's EEXIST case Liu Bo
  2017-12-21 22:42 ` [PATCH 10/10] Btrfs: noinline merge_extent_mapping Liu Bo
  9 siblings, 1 reply; 24+ messages in thread
From: Liu Bo @ 2017-12-21 22:42 UTC (permalink / raw)
  To: linux-btrfs

This is a subtle case, so in order to understand the problem, it'd be good to
know the content of existing and em when any error occurs.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_map.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index d386cfb..a8b7e24 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -550,17 +550,23 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
 			*em_in = existing;
 			ret = 0;
 		} else {
+			u64 orig_start = em->start;
+			u64 orig_len = em->len;
+
 			/*
 			 * The existing extent map is the one nearest to
 			 * the [start, start + len) range which overlaps
 			 */
 			ret = merge_extent_mapping(em_tree, existing,
 						   em, start, len);
-			free_extent_map(existing);
 			if (ret) {
 				free_extent_map(em);
 				*em_in = NULL;
+				WARN_ONCE(ret, KERN_INFO "Unexpected error %d: merge existing(start %llu len %llu) with em(start %llu len %llu)\n",
+					  ret, existing->start, existing->len,
+					  orig_start, orig_len);
 			}
+			free_extent_map(existing);
 		}
 	}
 
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 09/10] Btrfs: add tracepoint for em's EEXIST case
  2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
                   ` (7 preceding siblings ...)
  2017-12-21 22:42 ` [PATCH 08/10] Btrfs: add WARN_ONCE to detect unexpected error from merge_extent_mapping Liu Bo
@ 2017-12-21 22:42 ` Liu Bo
  2017-12-22  8:56   ` Nikolay Borisov
  2017-12-21 22:42 ` [PATCH 10/10] Btrfs: noinline merge_extent_mapping Liu Bo
  9 siblings, 1 reply; 24+ messages in thread
From: Liu Bo @ 2017-12-21 22:42 UTC (permalink / raw)
  To: linux-btrfs

This is adding a tracepoint 'btrfs_handle_em_exist' to help debug the
subtle bugs around merge_extent_mapping.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_map.c        |  1 +
 include/trace/events/btrfs.h | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index a8b7e24..40e4d30 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -539,6 +539,7 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
 		ret = 0;
 
 		existing = search_extent_mapping(em_tree, start, len);
+		trace_btrfs_handle_em_exist(existing, em, start, len);
 
 		/*
 		 * existing will always be non-NULL, since there must be
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 4342a32..b7ffcf7 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -249,6 +249,41 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
 		  __entry->refs, __entry->compress_type)
 );
 
+TRACE_EVENT(btrfs_handle_em_exist,
+
+	TP_PROTO(const struct extent_map *existing, const struct extent_map *map, u64 start, u64 len),
+
+	TP_ARGS(existing, map, start, len),
+
+	TP_STRUCT__entry(
+		__field(	u64,  e_start		)
+		__field(	u64,  e_len		)
+		__field(	u64,  map_start		)
+		__field(	u64,  map_len		)
+		__field(	u64,  start		)
+		__field(	u64,  len		)
+	),
+
+	TP_fast_assign(
+		__entry->e_start	= existing->start;
+		__entry->e_len		= existing->len;
+		__entry->map_start	= map->start;
+		__entry->map_len	= map->len;
+		__entry->start		= start;
+		__entry->len		= len;
+	),
+
+	TP_printk("start=%llu len=%llu "
+		  "existing(start=%llu len=%llu) "
+		  "em(start=%llu len=%llu)",
+		  (unsigned long long)__entry->start,
+		  (unsigned long long)__entry->len,
+		  (unsigned long long)__entry->e_start,
+		  (unsigned long long)__entry->e_len,
+		  (unsigned long long)__entry->map_start,
+		  (unsigned long long)__entry->map_len)
+);
+
 /* file extent item */
 DECLARE_EVENT_CLASS(btrfs__file_extent_item_regular,
 
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 10/10] Btrfs: noinline merge_extent_mapping
  2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
                   ` (8 preceding siblings ...)
  2017-12-21 22:42 ` [PATCH 09/10] Btrfs: add tracepoint for em's EEXIST case Liu Bo
@ 2017-12-21 22:42 ` Liu Bo
  9 siblings, 0 replies; 24+ messages in thread
From: Liu Bo @ 2017-12-21 22:42 UTC (permalink / raw)
  To: linux-btrfs

In order to debug subtle bugs around merge_extent_mapping(), perf probe
can be used to check the arguments, but sometimes merge_extent_mapping()
got inlined by compiler and couldn't be probed.

This is adding noinline attribute to merge_extent_mapping().

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_map.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 40e4d30..f10a6fc 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -480,10 +480,10 @@ static struct extent_map *prev_extent_map(struct extent_map *em)
  * extent to map_start, and an extent that you want to insert, deal with overlap
  * and insert the best fitted new extent into the tree.
  */
-static int merge_extent_mapping(struct extent_map_tree *em_tree,
-				struct extent_map *existing,
-				struct extent_map *em,
-				u64 map_start, u64 map_len)
+static noinline int merge_extent_mapping(struct extent_map_tree *em_tree,
+					 struct extent_map *existing,
+					 struct extent_map *em,
+					 u64 map_start, u64 map_len)
 {
 	struct extent_map *prev;
 	struct extent_map *next;
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 01/10] Btrfs: add helper for em merge logic
  2017-12-21 22:42 ` [PATCH 01/10] Btrfs: add helper for em merge logic Liu Bo
@ 2017-12-22  7:23   ` Nikolay Borisov
  2017-12-22 18:45     ` Liu Bo
  0 siblings, 1 reply; 24+ messages in thread
From: Nikolay Borisov @ 2017-12-22  7:23 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs



On 22.12.2017 00:42, Liu Bo wrote:
> This is a prepare work for the following extent map selftest, which
> runs tests against em merge logic.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/ctree.h |   2 ++
>  fs/btrfs/inode.c | 101 ++++++++++++++++++++++++++++++-------------------------
>  2 files changed, 58 insertions(+), 45 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index b2e09fe..328f40f 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3148,6 +3148,8 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
>  						    int delay_iput);
>  void btrfs_wait_and_free_delalloc_work(struct btrfs_delalloc_work *work);
>  
> +int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
> +			     struct extent_map **em_in, u64 start, u64 len);
>  struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
>  		struct page *page, size_t pg_offset, u64 start,
>  		u64 len, int create);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e1a7f3c..527df6f 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6911,6 +6911,61 @@ static noinline int uncompress_inline(struct btrfs_path *path,
>  	return ret;
>  }
>  
> +int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
> +			     struct extent_map **em_in, u64 start, u64 len)

How about adding the following comment above the function: 

/**                                                                             
 * btrfs_add_extent_mapping - try to add given extent mapping                   
 * @em_tree - the extent tree into which we want to add the mapping             
 * @em_in - extent we are inserting                                             
 * @start - the start of the logical range of the extent we are adding          
 * @len - logical length of the extent                                          
 *                                                                              
 * Insert @em_in into @em_tree. In case there is an overlapping range, handle   
 * the -EEXIST by either:                                                       
 * a) Returning the existing extent in @em_in if there is a full overlap        
 * b) Merge the extents if they are near each other.                            
 *                                                                              
 * Returns 0 on success or a negative error code                               
 *                                                                              
 */ 

Also one thing which I'm not very clear is why do we need the start/len, aren't 
those already set in em_in ?



> +{
> +	int ret;
> +	struct extent_map *em = *em_in;
> +
> +	ret = add_extent_mapping(em_tree, em, 0);
> +	/* it is possible that someone inserted the extent into the tree
> +	 * while we had the lock dropped.  It is also possible that
> +	 * an overlapping map exists in the tree
> +	 */
> +	if (ret == -EEXIST) {
> +		struct extent_map *existing;
> +
> +		ret = 0;
> +
> +		existing = search_extent_mapping(em_tree, start, len);
> +
> +		/*
> +		 * existing will always be non-NULL, since there must be
> +		 * extent causing the -EEXIST.
> +		 */
> +		if (existing->start == em->start &&
> +		    extent_map_end(existing) >= extent_map_end(em) &&
> +		    em->block_start == existing->block_start) {
> +			/*
> +			 * The existing extent map already encompasses the
> +			 * entire extent map we tried to add.
> +			 */
> +			free_extent_map(em);
> +			*em_in = existing;
> +			ret = 0;
> +		} else if (start >= extent_map_end(existing) ||
> +		    start <= existing->start) {
> +			/*
> +			 * The existing extent map is the one nearest to
> +			 * the [start, start + len) range which overlaps
> +			 */
> +			ret = merge_extent_mapping(em_tree, existing,
> +						   em, start);
> +			free_extent_map(existing);
> +			if (ret) {
> +				free_extent_map(em);
> +				*em_in = NULL;
> +			}
> +		} else {
> +			free_extent_map(em);
> +			*em_in = existing;
> +			ret = 0;
> +		}
> +	}
> +	ASSERT(ret == 0 || ret == -EEXIST);
> +	return ret;
> +}
> +
>  /*
>   * a bit scary, this does extent mapping from logical file offset to the disk.
>   * the ugly parts come from merging extents from the disk with the in-ram
> @@ -7147,51 +7202,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
>  
>  	err = 0;
>  	write_lock(&em_tree->lock);
> -	ret = add_extent_mapping(em_tree, em, 0);
> -	/* it is possible that someone inserted the extent into the tree
> -	 * while we had the lock dropped.  It is also possible that
> -	 * an overlapping map exists in the tree
> -	 */
> -	if (ret == -EEXIST) {
> -		struct extent_map *existing;
> -
> -		ret = 0;
> -
> -		existing = search_extent_mapping(em_tree, start, len);
> -		/*
> -		 * existing will always be non-NULL, since there must be
> -		 * extent causing the -EEXIST.
> -		 */
> -		if (existing->start == em->start &&
> -		    extent_map_end(existing) >= extent_map_end(em) &&
> -		    em->block_start == existing->block_start) {
> -			/*
> -			 * The existing extent map already encompasses the
> -			 * entire extent map we tried to add.
> -			 */
> -			free_extent_map(em);
> -			em = existing;
> -			err = 0;
> -
> -		} else if (start >= extent_map_end(existing) ||
> -		    start <= existing->start) {
> -			/*
> -			 * The existing extent map is the one nearest to
> -			 * the [start, start + len) range which overlaps
> -			 */
> -			err = merge_extent_mapping(em_tree, existing,
> -						   em, start);
> -			free_extent_map(existing);
> -			if (err) {
> -				free_extent_map(em);
> -				em = NULL;
> -			}
> -		} else {
> -			free_extent_map(em);
> -			em = existing;
> -			err = 0;
> -		}
> -	}
> +	err = btrfs_add_extent_mapping(em_tree, &em, start, len);
>  	write_unlock(&em_tree->lock);
>  out:
>  
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 03/10] Btrfs: add extent map selftests
  2017-12-21 22:42 ` [PATCH 03/10] Btrfs: add extent map selftests Liu Bo
@ 2017-12-22  7:42   ` Nikolay Borisov
  2017-12-22 18:53     ` Liu Bo
  0 siblings, 1 reply; 24+ messages in thread
From: Nikolay Borisov @ 2017-12-22  7:42 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs



On 22.12.2017 00:42, Liu Bo wrote:
> We've observed that btrfs_get_extent() and merge_extent_mapping() could
> return -EEXIST in several cases, and they are caused by some racy
> condition, e.g dio read vs dio write, which makes the problem very tricky
> to reproduce.
> 
> This adds extent map selftests in order to simulate those racy situations.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/Makefile                 |   2 +-
>  fs/btrfs/tests/btrfs-tests.c      |   3 +
>  fs/btrfs/tests/btrfs-tests.h      |   1 +
>  fs/btrfs/tests/extent-map-tests.c | 202 ++++++++++++++++++++++++++++++++++++++
>  4 files changed, 207 insertions(+), 1 deletion(-)
>  create mode 100644 fs/btrfs/tests/extent-map-tests.c
> 
> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> index 6fe881d..0c43736 100644
> --- a/fs/btrfs/Makefile
> +++ b/fs/btrfs/Makefile
> @@ -19,4 +19,4 @@ btrfs-$(CONFIG_BTRFS_FS_REF_VERIFY) += ref-verify.o
>  btrfs-$(CONFIG_BTRFS_FS_RUN_SANITY_TESTS) += tests/free-space-tests.o \
>  	tests/extent-buffer-tests.o tests/btrfs-tests.o \
>  	tests/extent-io-tests.o tests/inode-tests.o tests/qgroup-tests.o \
> -	tests/free-space-tree-tests.o
> +	tests/free-space-tree-tests.o tests/extent-map-tests.o
> diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
> index d3f2537..9786d8c 100644
> --- a/fs/btrfs/tests/btrfs-tests.c
> +++ b/fs/btrfs/tests/btrfs-tests.c
> @@ -277,6 +277,9 @@ int btrfs_run_sanity_tests(void)
>  				goto out;
>  		}
>  	}
> +	ret = btrfs_test_extent_map();
> +	if (ret)
> +		goto out;
>  out:
>  	btrfs_destroy_test_fs();
>  	return ret;
> diff --git a/fs/btrfs/tests/btrfs-tests.h b/fs/btrfs/tests/btrfs-tests.h
> index 266f1e3..bc0615b 100644
> --- a/fs/btrfs/tests/btrfs-tests.h
> +++ b/fs/btrfs/tests/btrfs-tests.h
> @@ -33,6 +33,7 @@ int btrfs_test_extent_io(u32 sectorsize, u32 nodesize);
>  int btrfs_test_inodes(u32 sectorsize, u32 nodesize);
>  int btrfs_test_qgroups(u32 sectorsize, u32 nodesize);
>  int btrfs_test_free_space_tree(u32 sectorsize, u32 nodesize);
> +int btrfs_test_extent_map(void);
>  struct inode *btrfs_new_test_inode(void);
>  struct btrfs_fs_info *btrfs_alloc_dummy_fs_info(u32 nodesize, u32 sectorsize);
>  void btrfs_free_dummy_fs_info(struct btrfs_fs_info *fs_info);
> diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
> new file mode 100644
> index 0000000..0407396
> --- /dev/null
> +++ b/fs/btrfs/tests/extent-map-tests.c
> @@ -0,0 +1,202 @@
> +/*
> + * Copyright (C) 2017 Oracle.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; if not, write to the
> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> + * Boston, MA 021110-1307, USA.
> + */
> +
> +#include <linux/types.h>
> +#include "btrfs-tests.h"
> +#include "../ctree.h"
> +
> +static void free_extent_map_tree(struct extent_map_tree *em_tree)
> +{
> +	struct extent_map *em;
> +	struct rb_node *node;
> +
> +	while (!RB_EMPTY_ROOT(&em_tree->map)) {
> +		node = rb_first(&em_tree->map);
> +		em = rb_entry(node, struct extent_map, rb_node);
> +		remove_extent_mapping(em_tree, em);
> +
> +#ifdef CONFIG_BTRFS_DEBUG
> +		if (refcount_read(&em->refs) != 1) {
> +			test_msg(
> +				 "Oops we have a em leak: em (start 0x%llx len 0x%llx block_start 0x%llx block_len 0x%llx) refs %d\n",
> +				 em->start, em->len, em->block_start,
> +				 em->block_len, refcount_read(&em->refs));
> +
> +			refcount_set(&em->refs, 1);
> +		}
> +#endif
> +		free_extent_map(em);
> +	}
> +}
> +
> +/*
> + * Test scenario:
> + *
> + * Suppose that no extent map has been loaded into memory yet, there is a file
> + * extent [0, 16K), followed by another file extent [16K, 20K), two dio reads
> + * are entering btrfs_get_extent() concurrently, t1 is reading [8K, 16K), t2 is
> + * reading [0, 8K)
> + *
> + *     t1                            t2
> + *  btrfs_get_extent()              btrfs_get_extent()
> + *    -> lookup_extent_mapping()      ->lookup_extent_mapping()
> + *    -> add_extent_mapping(0, 16K)
> + *    -> return em
> + *                                    ->add_extent_mapping(0, 16K)
> + *                                    -> #handle -EEXIST
> + */
> +static void test_case_1(struct extent_map_tree *em_tree)
> +{

So here you are testing the "fully overlapping" case in btrfs_add_extent_mapping. 
What about another test case about merging 2 adjacent extents i.e. this block of code: 

} else if (start >= extent_map_end(existing) || start <= existing->start) {

> +	struct extent_map *em;
> +	u64 start = 0;
> +	u64 len = SZ_8K;
> +	int ret;
> +
> +	em = alloc_extent_map();
> +	if (!em)
> +		/* Skip the test on error. */
> +		return;
> +
> +	/* Add [0, 16K) */
> +	em->start = 0;
> +	em->len = SZ_16K;
> +	em->block_start = 0;
> +	em->block_len = SZ_16K;
> +	ret = add_extent_mapping(em_tree, em, 0);
> +	ASSERT(ret == 0);
> +	free_extent_map(em);
> +
> +	/* Add [16K, 20K) following [0, 16K)  */
> +	em = alloc_extent_map();
> +	if (!em)
> +		goto out;
> +
> +	em->start = SZ_16K;
> +	em->len = SZ_4K;
> +	em->block_start = SZ_32K; /* avoid merging */
> +	em->block_len = SZ_4K;
> +	ret = add_extent_mapping(em_tree, em, 0);
> +	ASSERT(ret == 0);
> +	free_extent_map(em);
> +
> +	em = alloc_extent_map();
> +	if (!em)
> +		goto out;
> +
> +	/* Add [0, 8K), should return [0, 16K) instead. */
> +	em->start = start;
> +	em->len = len;
> +	em->block_start = start;
> +	em->block_len = len;

Any particular reason you are using start/len and not the constants as in the 
previous 2 extent insertion cases? Makes it a lot easier to see what's happening 
without reading the above variables?

> +	ret = btrfs_add_extent_mapping(em_tree, &em, em->start, em->len);
> +	if (ret)
> +		test_msg("case1 [%llu %llu]: ret %d\n",
> +			 start, start + len, ret);
> +	if (em &&
> +	    (em->start != 0 || extent_map_end(em) != SZ_16K ||
> +	     em->block_start != 0 || em->block_len != SZ_16K))
> +		test_msg("case1 [%llu %llu]: ret %d return a wrong em (start %llu len %llu block_start %llu block_len %llu\n",
> +			 start, start + len, ret, em->start, em->len,
> +			 em->block_start, em->block_len);
> +	free_extent_map(em);
> +out:
> +	/* free memory */
> +	free_extent_map_tree(em_tree);
> +}
> +
> +/*
> + * Test scenario:
> + *
> + * Reading the inline ending up with EEXIST, ie. read an inline
> + * extent and discard page cache and read it again.
> + */
> +static void test_case_2(struct extent_map_tree *em_tree)
> +{
> +	struct extent_map *em;
> +	int ret;
> +
> +	em = alloc_extent_map();
> +	if (!em)
> +		/* Skip the test on error. */
> +		return;
> +
> +	/* Add [0, 1K) */
> +	em->start = 0;
> +	em->len = SZ_1K;
> +	em->block_start = EXTENT_MAP_INLINE;
> +	em->block_len = (u64)-1;
> +	ret = add_extent_mapping(em_tree, em, 0);
> +	ASSERT(ret == 0);
> +	free_extent_map(em);
> +
> +	/* Add [4K, 4K) following [0, 1K)  */
> +	em = alloc_extent_map();
> +	if (!em)
> +		goto out;
> +
> +	em->start = SZ_4K;
> +	em->len = SZ_4K;
> +	em->block_start = SZ_4K;
> +	em->block_len = SZ_4K;
> +	ret = add_extent_mapping(em_tree, em, 0);
> +	ASSERT(ret == 0);
> +	free_extent_map(em);
> +
> +	em = alloc_extent_map();
> +	if (!em)
> +		goto out;
> +
> +	/* Add [0, 1K) */
> +	em->start = 0;
> +	em->len = SZ_1K;
> +	em->block_start = EXTENT_MAP_INLINE;
> +	em->block_len = (u64)-1;
> +	ret = btrfs_add_extent_mapping(em_tree, &em, em->start, em->len);
> +	if (ret)
> +		test_msg("case2 [0 1K]: ret %d\n", ret);
> +	if (em &&
> +	    (em->start != 0 || extent_map_end(em) != SZ_1K ||
> +	     em->block_start != EXTENT_MAP_INLINE || em->block_len != (u64)-1))
> +		test_msg("case2 [0 1K]: ret %d return a wrong em (start %llu len %llu block_start %llu block_len %llu\n",
> +			 ret, em->start, em->len, em->block_start,
> +			 em->block_len);
> +	free_extent_map(em);
> +out:
> +	/* free memory */
> +	free_extent_map_tree(em_tree);
> +}
> +
> +int btrfs_test_extent_map()
> +{
> +	struct extent_map_tree *em_tree;
> +
> +	test_msg("Running extent_map tests\n");
> +
> +	em_tree = kzalloc(sizeof(*em_tree), GFP_KERNEL);
> +	if (!em_tree)
> +		/* Skip the test on error. */
> +		return 0;
> +
> +	extent_map_tree_init(em_tree);
> +
> +	test_case_1(em_tree);
> +	test_case_2(em_tree);
> +
> +	kfree(em_tree);
> +	return 0;
> +}
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 04/10] Btrfs: extent map selftest: buffered write vs dio read
  2017-12-21 22:42 ` [PATCH 04/10] Btrfs: extent map selftest: buffered write vs dio read Liu Bo
@ 2017-12-22  7:51   ` Nikolay Borisov
  2017-12-22 18:58     ` Liu Bo
  0 siblings, 1 reply; 24+ messages in thread
From: Nikolay Borisov @ 2017-12-22  7:51 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs



On 22.12.2017 00:42, Liu Bo wrote:
> This test case simulates the racy situation of buffered write vs dio
> read, and see if btrfs_get_extent() would return -EEXIST.

Isn't mixing dio/buffered IO on the same file (range?) considered
dangerous in any case?

> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/tests/extent-map-tests.c | 73 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
> 
> diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
> index 0407396..2adf55f 100644
> --- a/fs/btrfs/tests/extent-map-tests.c
> +++ b/fs/btrfs/tests/extent-map-tests.c
> @@ -181,6 +181,78 @@ static void test_case_2(struct extent_map_tree *em_tree)
>  	free_extent_map_tree(em_tree);
>  }
>  
> +static void __test_case_3(struct extent_map_tree *em_tree, u64 start)
> +{
> +	struct extent_map *em;
> +	u64 len = SZ_4K;
> +	int ret;
> +
> +	em = alloc_extent_map();
> +	if (!em)
> +		/* Skip this test on error. */
> +		return;
> +
> +	/* Add [4K, 8K) */
> +	em->start = SZ_4K;
> +	em->len = SZ_4K;
> +	em->block_start = SZ_4K;
> +	em->block_len = SZ_4K;
> +	ret = add_extent_mapping(em_tree, em, 0);
> +	ASSERT(ret == 0);
> +	free_extent_map(em);
> +
> +	em = alloc_extent_map();
> +	if (!em)
> +		goto out;
> +
> +	/* Add [0, 16K) */
> +	em->start = 0;
> +	em->len = SZ_16K;
> +	em->block_start = 0;
> +	em->block_len = SZ_16K;
> +	ret = btrfs_add_extent_mapping(em_tree, &em, start, len);
> +	if (ret)
> +		test_msg("case3 [0x%llx 0x%llx): ret %d\n",
> +			 start, start + len, ret);
> +	/*
> +	 * Since bytes within em are contiguous, em->block_start is identical to
> +	 * em->start.
> +	 */
> +	if (em &&
> +	    (start < em->start || start + len > extent_map_end(em) ||
> +	     em->start != em->block_start || em->len != em->block_len))
> +		test_msg("case3 [0x%llx 0x%llx): ret %d em (start 0x%llx len 0x%llx block_start 0x%llx block_len 0x%llx)\n",
> +			 start, start + len, ret, em->start, em->len,
> +			 em->block_start, em->block_len);
> +	free_extent_map(em);
> +out:
> +	/* free memory */
> +	free_extent_map_tree(em_tree);
> +}
> +
> +/*
> + * Test scenario:
> + *
> + * Suppose that no extent map has been loaded into memory yet.
> + * There is a file extent [0, 16K), two jobs are running concurrently
> + * against it, t1 is buffered writing to [4K, 8K) and t2 is doing dio
> + * read from [0, 4K) or [8K, 12K) or [12K, 16K).
> + *
> + * t1 goes ahead of t2 and adds em [4K, 8K) into tree.
> + *
> + *         t1                       t2
> + *  cow_file_range()	     btrfs_get_extent()
> + *                            -> lookup_extent_mapping()
> + *   -> add_extent_mapping()
> + *                            -> add_extent_mapping()
> + */
> +static void test_case_3(struct extent_map_tree *em_tree)
> +{
> +	__test_case_3(em_tree, 0);
> +	__test_case_3(em_tree, SZ_8K);
> +	__test_case_3(em_tree, (12 * 1024ULL));
> +}
> +
>  int btrfs_test_extent_map()
>  {
>  	struct extent_map_tree *em_tree;
> @@ -196,6 +268,7 @@ int btrfs_test_extent_map()
>  
>  	test_case_1(em_tree);
>  	test_case_2(em_tree);
> +	test_case_3(em_tree);
>  
>  	kfree(em_tree);
>  	return 0;
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 08/10] Btrfs: add WARN_ONCE to detect unexpected error from merge_extent_mapping
  2017-12-21 22:42 ` [PATCH 08/10] Btrfs: add WARN_ONCE to detect unexpected error from merge_extent_mapping Liu Bo
@ 2017-12-22  8:52   ` Nikolay Borisov
  0 siblings, 0 replies; 24+ messages in thread
From: Nikolay Borisov @ 2017-12-22  8:52 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs



On 22.12.2017 00:42, Liu Bo wrote:
> This is a subtle case, so in order to understand the problem, it'd be good to
> know the content of existing and em when any error occurs.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/extent_map.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> index d386cfb..a8b7e24 100644
> --- a/fs/btrfs/extent_map.c
> +++ b/fs/btrfs/extent_map.c
> @@ -550,17 +550,23 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
>  			*em_in = existing;
>  			ret = 0;
>  		} else {
> +			u64 orig_start = em->start;
> +			u64 orig_len = em->len;
> +
>  			/*
>  			 * The existing extent map is the one nearest to
>  			 * the [start, start + len) range which overlaps
>  			 */
>  			ret = merge_extent_mapping(em_tree, existing,
>  						   em, start, len);
> -			free_extent_map(existing);
>  			if (ret) {
>  				free_extent_map(em);
>  				*em_in = NULL;
> +				WARN_ONCE(ret, KERN_INFO "Unexpected error %d: merge existing(start %llu len %llu) with em(start %llu len %llu)\n",

I think this KERN_INFO is spurious and should be removed?

> +					  ret, existing->start, existing->len,
> +					  orig_start, orig_len);
>  			}
> +			free_extent_map(existing);
>  		}
>  	}
>  
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 09/10] Btrfs: add tracepoint for em's EEXIST case
  2017-12-21 22:42 ` [PATCH 09/10] Btrfs: add tracepoint for em's EEXIST case Liu Bo
@ 2017-12-22  8:56   ` Nikolay Borisov
  2017-12-22 19:07     ` Liu Bo
  0 siblings, 1 reply; 24+ messages in thread
From: Nikolay Borisov @ 2017-12-22  8:56 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs



On 22.12.2017 00:42, Liu Bo wrote:
> This is adding a tracepoint 'btrfs_handle_em_exist' to help debug the
> subtle bugs around merge_extent_mapping.

In the next patch you are already making the function which takes all
these values noinline, meaning you can attach a kprobe so you can
interrogate the args via systemtap,perf probe or even bpf. So I'd rather
not add this tracepoint since the general sentiment seems to be that
tracepoints are ABI and so have to be maintained.

> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/extent_map.c        |  1 +
>  include/trace/events/btrfs.h | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> index a8b7e24..40e4d30 100644
> --- a/fs/btrfs/extent_map.c
> +++ b/fs/btrfs/extent_map.c
> @@ -539,6 +539,7 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
>  		ret = 0;
>  
>  		existing = search_extent_mapping(em_tree, start, len);
> +		trace_btrfs_handle_em_exist(existing, em, start, len);
>  
>  		/*
>  		 * existing will always be non-NULL, since there must be
> diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
> index 4342a32..b7ffcf7 100644
> --- a/include/trace/events/btrfs.h
> +++ b/include/trace/events/btrfs.h
> @@ -249,6 +249,41 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
>  		  __entry->refs, __entry->compress_type)
>  );
>  
> +TRACE_EVENT(btrfs_handle_em_exist,
> +
> +	TP_PROTO(const struct extent_map *existing, const struct extent_map *map, u64 start, u64 len),
> +
> +	TP_ARGS(existing, map, start, len),
> +
> +	TP_STRUCT__entry(
> +		__field(	u64,  e_start		)
> +		__field(	u64,  e_len		)
> +		__field(	u64,  map_start		)
> +		__field(	u64,  map_len		)
> +		__field(	u64,  start		)
> +		__field(	u64,  len		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->e_start	= existing->start;
> +		__entry->e_len		= existing->len;
> +		__entry->map_start	= map->start;
> +		__entry->map_len	= map->len;
> +		__entry->start		= start;
> +		__entry->len		= len;
> +	),
> +
> +	TP_printk("start=%llu len=%llu "
> +		  "existing(start=%llu len=%llu) "
> +		  "em(start=%llu len=%llu)",
> +		  (unsigned long long)__entry->start,
> +		  (unsigned long long)__entry->len,
> +		  (unsigned long long)__entry->e_start,
> +		  (unsigned long long)__entry->e_len,
> +		  (unsigned long long)__entry->map_start,
> +		  (unsigned long long)__entry->map_len)
> +);
> +
>  /* file extent item */
>  DECLARE_EVENT_CLASS(btrfs__file_extent_item_regular,
>  
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 07/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent
  2017-12-21 22:42 ` [PATCH 07/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent Liu Bo
@ 2017-12-22 12:10   ` Nikolay Borisov
  2017-12-22 19:19     ` Liu Bo
  0 siblings, 1 reply; 24+ messages in thread
From: Nikolay Borisov @ 2017-12-22 12:10 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs



On 22.12.2017 00:42, Liu Bo wrote:
> This fixes a corner case that is caused by a race of dio write vs dio
> read/write.
> 
> dio write:
> [0, 32k) -> [0, 8k) + [8k, 32k)
> 
> dio read/write:
> 
> While get_extent() with [0, 4k), [0, 8k) is found as existing em, even
> though start == existing->start, em is [0, 32k),
> extent_map_end(em) > extent_map_end(existing),
> then it goes thru merge_extent_mapping() which tries to add a [8k, 8k)
> (with a length 0), and get_extent ends up returning -EEXIST, and dio
> read/write will get -EEXIST which is confusing applications.

I think this paragraph should be expanded a bit since you've crammed a
lot of information in few words.

In the below graphs em should be extent we'd like to insert, no? So
given that it always encompasses the existing (0, 8k given the above
description) then em should be 0, 32k ?

> 
> Here I concluded all the possible situations,
> 1) start < existing->start
> 
>             +-----------+em+-----------+
> +--prev---+ |     +-------------+      |
> |         | |     |             |      |
> +---------+ +     +---+existing++      ++
>                 +
>                 |
>                 +
>              start
> 
> 2) start == existing->start
> 
>       +------------em------------+
>       |     +-------------+      |
>       |     |             |      |
>       +     +----existing-+      +
>             |
>             |
>             +
>          start
> 
> 3) start > existing->start && start < (existing->start + existing->len)
> 
>       +------------em------------+
>       |     +-------------+      |
>       |     |             |      |
>       +     +----existing-+      +
>                |
>                |
>                +
>              start
> 
> 4) start >= (existing->start + existing->len)
> 
> +-----------+em+-----------+
> |     +-------------+      | +--next---+
> |     |             |      | |         |
> +     +---+existing++      + +---------+
>                       +
>                       |
>                       +
>                    start
> 
> After going thru the above case by case, it turns out that if start is
> within existing em (front inclusive), then the existing em should be
> returned, otherwise, we try our best to merge candidate em with
> sibling ems to form a larger em.
> 
> Reported-by: David Vallender <david.vallender@landmark.co.uk>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/extent_map.c | 25 ++++++++++---------------
>  1 file changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> index 6653b08..d386cfb 100644
> --- a/fs/btrfs/extent_map.c
> +++ b/fs/btrfs/extent_map.c
> @@ -483,7 +483,7 @@ static struct extent_map *prev_extent_map(struct extent_map *em)
>  static int merge_extent_mapping(struct extent_map_tree *em_tree,
>  				struct extent_map *existing,
>  				struct extent_map *em,
> -				u64 map_start)
> +				u64 map_start, u64 map_len)
>  {
>  	struct extent_map *prev;
>  	struct extent_map *next;
> @@ -496,9 +496,13 @@ static int merge_extent_mapping(struct extent_map_tree *em_tree,
>  	if (existing->start > map_start) {
>  		next = existing;
>  		prev = prev_extent_map(next);
> +		if (prev)
> +			ASSERT(extent_map_end(prev) <= map_start);
>  	} else {
>  		prev = existing;
>  		next = next_extent_map(prev);
> +		if (next)
> +			ASSERT(map_start + map_len <= next->start);
>  	}
>  
>  	start = prev ? extent_map_end(prev) : em->start;
> @@ -540,35 +544,26 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
>  		 * existing will always be non-NULL, since there must be
>  		 * extent causing the -EEXIST.
>  		 */
> -		if (existing->start == em->start &&
> -		    extent_map_end(existing) >= extent_map_end(em) &&
> -		    em->block_start == existing->block_start) {
> -			/*
> -			 * The existing extent map already encompasses the
> -			 * entire extent map we tried to add.
> -			 */
> +		if (start >= existing->start &&
> +		    start < extent_map_end(existing)) {
>  			free_extent_map(em);
>  			*em_in = existing;
>  			ret = 0;
> -		} else if (start >= extent_map_end(existing) ||
> -		    start <= existing->start) {
> +		} else {
>  			/*
>  			 * The existing extent map is the one nearest to
>  			 * the [start, start + len) range which overlaps
>  			 */
>  			ret = merge_extent_mapping(em_tree, existing,
> -						   em, start);
> +						   em, start, len);
>  			free_extent_map(existing);
>  			if (ret) {
>  				free_extent_map(em);
>  				*em_in = NULL;
>  			}
> -		} else {
> -			free_extent_map(em);
> -			*em_in = existing;
> -			ret = 0;
>  		}
>  	}
> +
>  	ASSERT(ret == 0 || ret == -EEXIST);
>  	return ret;
>  }
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 01/10] Btrfs: add helper for em merge logic
  2017-12-22  7:23   ` Nikolay Borisov
@ 2017-12-22 18:45     ` Liu Bo
  0 siblings, 0 replies; 24+ messages in thread
From: Liu Bo @ 2017-12-22 18:45 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Fri, Dec 22, 2017 at 09:23:40AM +0200, Nikolay Borisov wrote:
> 
> 
> On 22.12.2017 00:42, Liu Bo wrote:
> > This is a prepare work for the following extent map selftest, which
> > runs tests against em merge logic.
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/ctree.h |   2 ++
> >  fs/btrfs/inode.c | 101 ++++++++++++++++++++++++++++++-------------------------
> >  2 files changed, 58 insertions(+), 45 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index b2e09fe..328f40f 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -3148,6 +3148,8 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
> >  						    int delay_iput);
> >  void btrfs_wait_and_free_delalloc_work(struct btrfs_delalloc_work *work);
> >  
> > +int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
> > +			     struct extent_map **em_in, u64 start, u64 len);
> >  struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
> >  		struct page *page, size_t pg_offset, u64 start,
> >  		u64 len, int create);
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index e1a7f3c..527df6f 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -6911,6 +6911,61 @@ static noinline int uncompress_inline(struct btrfs_path *path,
> >  	return ret;
> >  }
> >  
> > +int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
> > +			     struct extent_map **em_in, u64 start, u64 len)
> 
> How about adding the following comment above the function: 
> 
> /**                                                                             
>  * btrfs_add_extent_mapping - try to add given extent mapping                   
>  * @em_tree - the extent tree into which we want to add the mapping             
>  * @em_in - extent we are inserting                                             
>  * @start - the start of the logical range of the extent we are adding          
>  * @len - logical length of the extent                                          
>  *                                                                              
>  * Insert @em_in into @em_tree. In case there is an overlapping range, handle   
>  * the -EEXIST by either:                                                       
>  * a) Returning the existing extent in @em_in if there is a full overlap        
>  * b) Merge the extents if they are near each other.                            
>  *                                                                              
>  * Returns 0 on success or a negative error code                               
>  *                                                                              
>  */ 
> 

Appreciate your comments.

Sure, comments are always welcome.

> Also one thing which I'm not very clear is why do we need the start/len, aren't 
> those already set in em_in ?
>

[start, start+len) is within *em_in, ie.

|----*em_in--------|
   |------|
  start   start+len

What we really care about is [start, start+len), which is passed to
btrfs_get_extent().

For example, if a file extent item on disk is [0, 32k), given a tuple
(start=0, len=4k), reading [0, 4k) would end up with *em_in being
[0, 32k).

So if add_extent_mapping(*em_in) returns EEXIST, then we know the
'existing' em is overlapped with *em_in, then we need to check whether
it's overlapped with [start, start+len).

thanks,
-liubo

> 
> 
> > +{
> > +	int ret;
> > +	struct extent_map *em = *em_in;
> > +
> > +	ret = add_extent_mapping(em_tree, em, 0);
> > +	/* it is possible that someone inserted the extent into the tree
> > +	 * while we had the lock dropped.  It is also possible that
> > +	 * an overlapping map exists in the tree
> > +	 */
> > +	if (ret == -EEXIST) {
> > +		struct extent_map *existing;
> > +
> > +		ret = 0;
> > +
> > +		existing = search_extent_mapping(em_tree, start, len);
> > +
> > +		/*
> > +		 * existing will always be non-NULL, since there must be
> > +		 * extent causing the -EEXIST.
> > +		 */
> > +		if (existing->start == em->start &&
> > +		    extent_map_end(existing) >= extent_map_end(em) &&
> > +		    em->block_start == existing->block_start) {
> > +			/*
> > +			 * The existing extent map already encompasses the
> > +			 * entire extent map we tried to add.
> > +			 */
> > +			free_extent_map(em);
> > +			*em_in = existing;
> > +			ret = 0;
> > +		} else if (start >= extent_map_end(existing) ||
> > +		    start <= existing->start) {
> > +			/*
> > +			 * The existing extent map is the one nearest to
> > +			 * the [start, start + len) range which overlaps
> > +			 */
> > +			ret = merge_extent_mapping(em_tree, existing,
> > +						   em, start);
> > +			free_extent_map(existing);
> > +			if (ret) {
> > +				free_extent_map(em);
> > +				*em_in = NULL;
> > +			}
> > +		} else {
> > +			free_extent_map(em);
> > +			*em_in = existing;
> > +			ret = 0;
> > +		}
> > +	}
> > +	ASSERT(ret == 0 || ret == -EEXIST);
> > +	return ret;
> > +}
> > +
> >  /*
> >   * a bit scary, this does extent mapping from logical file offset to the disk.
> >   * the ugly parts come from merging extents from the disk with the in-ram
> > @@ -7147,51 +7202,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
> >  
> >  	err = 0;
> >  	write_lock(&em_tree->lock);
> > -	ret = add_extent_mapping(em_tree, em, 0);
> > -	/* it is possible that someone inserted the extent into the tree
> > -	 * while we had the lock dropped.  It is also possible that
> > -	 * an overlapping map exists in the tree
> > -	 */
> > -	if (ret == -EEXIST) {
> > -		struct extent_map *existing;
> > -
> > -		ret = 0;
> > -
> > -		existing = search_extent_mapping(em_tree, start, len);
> > -		/*
> > -		 * existing will always be non-NULL, since there must be
> > -		 * extent causing the -EEXIST.
> > -		 */
> > -		if (existing->start == em->start &&
> > -		    extent_map_end(existing) >= extent_map_end(em) &&
> > -		    em->block_start == existing->block_start) {
> > -			/*
> > -			 * The existing extent map already encompasses the
> > -			 * entire extent map we tried to add.
> > -			 */
> > -			free_extent_map(em);
> > -			em = existing;
> > -			err = 0;
> > -
> > -		} else if (start >= extent_map_end(existing) ||
> > -		    start <= existing->start) {
> > -			/*
> > -			 * The existing extent map is the one nearest to
> > -			 * the [start, start + len) range which overlaps
> > -			 */
> > -			err = merge_extent_mapping(em_tree, existing,
> > -						   em, start);
> > -			free_extent_map(existing);
> > -			if (err) {
> > -				free_extent_map(em);
> > -				em = NULL;
> > -			}
> > -		} else {
> > -			free_extent_map(em);
> > -			em = existing;
> > -			err = 0;
> > -		}
> > -	}
> > +	err = btrfs_add_extent_mapping(em_tree, &em, start, len);
> >  	write_unlock(&em_tree->lock);
> >  out:
> >  
> > 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 03/10] Btrfs: add extent map selftests
  2017-12-22  7:42   ` Nikolay Borisov
@ 2017-12-22 18:53     ` Liu Bo
  0 siblings, 0 replies; 24+ messages in thread
From: Liu Bo @ 2017-12-22 18:53 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Fri, Dec 22, 2017 at 09:42:17AM +0200, Nikolay Borisov wrote:
> 
> 
> On 22.12.2017 00:42, Liu Bo wrote:
> > We've observed that btrfs_get_extent() and merge_extent_mapping() could
> > return -EEXIST in several cases, and they are caused by some racy
> > condition, e.g dio read vs dio write, which makes the problem very tricky
> > to reproduce.
> > 
> > This adds extent map selftests in order to simulate those racy situations.
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/Makefile                 |   2 +-
> >  fs/btrfs/tests/btrfs-tests.c      |   3 +
> >  fs/btrfs/tests/btrfs-tests.h      |   1 +
> >  fs/btrfs/tests/extent-map-tests.c | 202 ++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 207 insertions(+), 1 deletion(-)
> >  create mode 100644 fs/btrfs/tests/extent-map-tests.c
> > 
> > diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> > index 6fe881d..0c43736 100644
> > --- a/fs/btrfs/Makefile
> > +++ b/fs/btrfs/Makefile
> > @@ -19,4 +19,4 @@ btrfs-$(CONFIG_BTRFS_FS_REF_VERIFY) += ref-verify.o
> >  btrfs-$(CONFIG_BTRFS_FS_RUN_SANITY_TESTS) += tests/free-space-tests.o \
> >  	tests/extent-buffer-tests.o tests/btrfs-tests.o \
> >  	tests/extent-io-tests.o tests/inode-tests.o tests/qgroup-tests.o \
> > -	tests/free-space-tree-tests.o
> > +	tests/free-space-tree-tests.o tests/extent-map-tests.o
> > diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
> > index d3f2537..9786d8c 100644
> > --- a/fs/btrfs/tests/btrfs-tests.c
> > +++ b/fs/btrfs/tests/btrfs-tests.c
> > @@ -277,6 +277,9 @@ int btrfs_run_sanity_tests(void)
> >  				goto out;
> >  		}
> >  	}
> > +	ret = btrfs_test_extent_map();
> > +	if (ret)
> > +		goto out;
> >  out:
> >  	btrfs_destroy_test_fs();
> >  	return ret;
> > diff --git a/fs/btrfs/tests/btrfs-tests.h b/fs/btrfs/tests/btrfs-tests.h
> > index 266f1e3..bc0615b 100644
> > --- a/fs/btrfs/tests/btrfs-tests.h
> > +++ b/fs/btrfs/tests/btrfs-tests.h
> > @@ -33,6 +33,7 @@ int btrfs_test_extent_io(u32 sectorsize, u32 nodesize);
> >  int btrfs_test_inodes(u32 sectorsize, u32 nodesize);
> >  int btrfs_test_qgroups(u32 sectorsize, u32 nodesize);
> >  int btrfs_test_free_space_tree(u32 sectorsize, u32 nodesize);
> > +int btrfs_test_extent_map(void);
> >  struct inode *btrfs_new_test_inode(void);
> >  struct btrfs_fs_info *btrfs_alloc_dummy_fs_info(u32 nodesize, u32 sectorsize);
> >  void btrfs_free_dummy_fs_info(struct btrfs_fs_info *fs_info);
> > diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
> > new file mode 100644
> > index 0000000..0407396
> > --- /dev/null
> > +++ b/fs/btrfs/tests/extent-map-tests.c
> > @@ -0,0 +1,202 @@
> > +/*
> > + * Copyright (C) 2017 Oracle.  All rights reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public
> > + * License v2 as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public
> > + * License along with this program; if not, write to the
> > + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> > + * Boston, MA 021110-1307, USA.
> > + */
> > +
> > +#include <linux/types.h>
> > +#include "btrfs-tests.h"
> > +#include "../ctree.h"
> > +
> > +static void free_extent_map_tree(struct extent_map_tree *em_tree)
> > +{
> > +	struct extent_map *em;
> > +	struct rb_node *node;
> > +
> > +	while (!RB_EMPTY_ROOT(&em_tree->map)) {
> > +		node = rb_first(&em_tree->map);
> > +		em = rb_entry(node, struct extent_map, rb_node);
> > +		remove_extent_mapping(em_tree, em);
> > +
> > +#ifdef CONFIG_BTRFS_DEBUG
> > +		if (refcount_read(&em->refs) != 1) {
> > +			test_msg(
> > +				 "Oops we have a em leak: em (start 0x%llx len 0x%llx block_start 0x%llx block_len 0x%llx) refs %d\n",
> > +				 em->start, em->len, em->block_start,
> > +				 em->block_len, refcount_read(&em->refs));
> > +
> > +			refcount_set(&em->refs, 1);
> > +		}
> > +#endif
> > +		free_extent_map(em);
> > +	}
> > +}
> > +
> > +/*
> > + * Test scenario:
> > + *
> > + * Suppose that no extent map has been loaded into memory yet, there is a file
> > + * extent [0, 16K), followed by another file extent [16K, 20K), two dio reads
> > + * are entering btrfs_get_extent() concurrently, t1 is reading [8K, 16K), t2 is
> > + * reading [0, 8K)
> > + *
> > + *     t1                            t2
> > + *  btrfs_get_extent()              btrfs_get_extent()
> > + *    -> lookup_extent_mapping()      ->lookup_extent_mapping()
> > + *    -> add_extent_mapping(0, 16K)
> > + *    -> return em
> > + *                                    ->add_extent_mapping(0, 16K)
> > + *                                    -> #handle -EEXIST
> > + */
> > +static void test_case_1(struct extent_map_tree *em_tree)
> > +{
> 
> So here you are testing the "fully overlapping" case in btrfs_add_extent_mapping. 
> What about another test case about merging 2 adjacent extents i.e. this block of code: 
> 
> } else if (start >= extent_map_end(existing) || start <= existing->start) {
>

They're in the following patches.

> > +	struct extent_map *em;
> > +	u64 start = 0;
> > +	u64 len = SZ_8K;
> > +	int ret;
> > +
> > +	em = alloc_extent_map();
> > +	if (!em)
> > +		/* Skip the test on error. */
> > +		return;
> > +
> > +	/* Add [0, 16K) */
> > +	em->start = 0;
> > +	em->len = SZ_16K;
> > +	em->block_start = 0;
> > +	em->block_len = SZ_16K;
> > +	ret = add_extent_mapping(em_tree, em, 0);
> > +	ASSERT(ret == 0);
> > +	free_extent_map(em);
> > +
> > +	/* Add [16K, 20K) following [0, 16K)  */
> > +	em = alloc_extent_map();
> > +	if (!em)
> > +		goto out;
> > +
> > +	em->start = SZ_16K;
> > +	em->len = SZ_4K;
> > +	em->block_start = SZ_32K; /* avoid merging */
> > +	em->block_len = SZ_4K;
> > +	ret = add_extent_mapping(em_tree, em, 0);
> > +	ASSERT(ret == 0);
> > +	free_extent_map(em);
> > +
> > +	em = alloc_extent_map();
> > +	if (!em)
> > +		goto out;
> > +
> > +	/* Add [0, 8K), should return [0, 16K) instead. */
> > +	em->start = start;
> > +	em->len = len;
> > +	em->block_start = start;
> > +	em->block_len = len;
> 
> Any particular reason you are using start/len and not the constants as in the 
> previous 2 extent insertion cases? Makes it a lot easier to see what's happening 
> without reading the above variables?
>

The above comment showed it, didn't it?

The following if (ret) and if (em...) also uses start and len.

thanks,
-liubo
> > +	ret = btrfs_add_extent_mapping(em_tree, &em, em->start, em->len);
> > +	if (ret)
> > +		test_msg("case1 [%llu %llu]: ret %d\n",
> > +			 start, start + len, ret);
> > +	if (em &&
> > +	    (em->start != 0 || extent_map_end(em) != SZ_16K ||
> > +	     em->block_start != 0 || em->block_len != SZ_16K))
> > +		test_msg("case1 [%llu %llu]: ret %d return a wrong em (start %llu len %llu block_start %llu block_len %llu\n",
> > +			 start, start + len, ret, em->start, em->len,
> > +			 em->block_start, em->block_len);
> > +	free_extent_map(em);
> > +out:
> > +	/* free memory */
> > +	free_extent_map_tree(em_tree);
> > +}
> > +
> > +/*
> > + * Test scenario:
> > + *
> > + * Reading the inline ending up with EEXIST, ie. read an inline
> > + * extent and discard page cache and read it again.
> > + */
> > +static void test_case_2(struct extent_map_tree *em_tree)
> > +{
> > +	struct extent_map *em;
> > +	int ret;
> > +
> > +	em = alloc_extent_map();
> > +	if (!em)
> > +		/* Skip the test on error. */
> > +		return;
> > +
> > +	/* Add [0, 1K) */
> > +	em->start = 0;
> > +	em->len = SZ_1K;
> > +	em->block_start = EXTENT_MAP_INLINE;
> > +	em->block_len = (u64)-1;
> > +	ret = add_extent_mapping(em_tree, em, 0);
> > +	ASSERT(ret == 0);
> > +	free_extent_map(em);
> > +
> > +	/* Add [4K, 4K) following [0, 1K)  */
> > +	em = alloc_extent_map();
> > +	if (!em)
> > +		goto out;
> > +
> > +	em->start = SZ_4K;
> > +	em->len = SZ_4K;
> > +	em->block_start = SZ_4K;
> > +	em->block_len = SZ_4K;
> > +	ret = add_extent_mapping(em_tree, em, 0);
> > +	ASSERT(ret == 0);
> > +	free_extent_map(em);
> > +
> > +	em = alloc_extent_map();
> > +	if (!em)
> > +		goto out;
> > +
> > +	/* Add [0, 1K) */
> > +	em->start = 0;
> > +	em->len = SZ_1K;
> > +	em->block_start = EXTENT_MAP_INLINE;
> > +	em->block_len = (u64)-1;
> > +	ret = btrfs_add_extent_mapping(em_tree, &em, em->start, em->len);
> > +	if (ret)
> > +		test_msg("case2 [0 1K]: ret %d\n", ret);
> > +	if (em &&
> > +	    (em->start != 0 || extent_map_end(em) != SZ_1K ||
> > +	     em->block_start != EXTENT_MAP_INLINE || em->block_len != (u64)-1))
> > +		test_msg("case2 [0 1K]: ret %d return a wrong em (start %llu len %llu block_start %llu block_len %llu\n",
> > +			 ret, em->start, em->len, em->block_start,
> > +			 em->block_len);
> > +	free_extent_map(em);
> > +out:
> > +	/* free memory */
> > +	free_extent_map_tree(em_tree);
> > +}
> > +
> > +int btrfs_test_extent_map()
> > +{
> > +	struct extent_map_tree *em_tree;
> > +
> > +	test_msg("Running extent_map tests\n");
> > +
> > +	em_tree = kzalloc(sizeof(*em_tree), GFP_KERNEL);
> > +	if (!em_tree)
> > +		/* Skip the test on error. */
> > +		return 0;
> > +
> > +	extent_map_tree_init(em_tree);
> > +
> > +	test_case_1(em_tree);
> > +	test_case_2(em_tree);
> > +
> > +	kfree(em_tree);
> > +	return 0;
> > +}
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 04/10] Btrfs: extent map selftest: buffered write vs dio read
  2017-12-22  7:51   ` Nikolay Borisov
@ 2017-12-22 18:58     ` Liu Bo
  0 siblings, 0 replies; 24+ messages in thread
From: Liu Bo @ 2017-12-22 18:58 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Fri, Dec 22, 2017 at 09:51:24AM +0200, Nikolay Borisov wrote:
> 
> 
> On 22.12.2017 00:42, Liu Bo wrote:
> > This test case simulates the racy situation of buffered write vs dio
> > read, and see if btrfs_get_extent() would return -EEXIST.
> 
> Isn't mixing dio/buffered IO on the same file (range?) considered
> dangerous in any case?

They are, but it is sometimes the way how applications work.

thanks,
-liubo

> 
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/tests/extent-map-tests.c | 73 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 73 insertions(+)
> > 
> > diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
> > index 0407396..2adf55f 100644
> > --- a/fs/btrfs/tests/extent-map-tests.c
> > +++ b/fs/btrfs/tests/extent-map-tests.c
> > @@ -181,6 +181,78 @@ static void test_case_2(struct extent_map_tree *em_tree)
> >  	free_extent_map_tree(em_tree);
> >  }
> >  
> > +static void __test_case_3(struct extent_map_tree *em_tree, u64 start)
> > +{
> > +	struct extent_map *em;
> > +	u64 len = SZ_4K;
> > +	int ret;
> > +
> > +	em = alloc_extent_map();
> > +	if (!em)
> > +		/* Skip this test on error. */
> > +		return;
> > +
> > +	/* Add [4K, 8K) */
> > +	em->start = SZ_4K;
> > +	em->len = SZ_4K;
> > +	em->block_start = SZ_4K;
> > +	em->block_len = SZ_4K;
> > +	ret = add_extent_mapping(em_tree, em, 0);
> > +	ASSERT(ret == 0);
> > +	free_extent_map(em);
> > +
> > +	em = alloc_extent_map();
> > +	if (!em)
> > +		goto out;
> > +
> > +	/* Add [0, 16K) */
> > +	em->start = 0;
> > +	em->len = SZ_16K;
> > +	em->block_start = 0;
> > +	em->block_len = SZ_16K;
> > +	ret = btrfs_add_extent_mapping(em_tree, &em, start, len);
> > +	if (ret)
> > +		test_msg("case3 [0x%llx 0x%llx): ret %d\n",
> > +			 start, start + len, ret);
> > +	/*
> > +	 * Since bytes within em are contiguous, em->block_start is identical to
> > +	 * em->start.
> > +	 */
> > +	if (em &&
> > +	    (start < em->start || start + len > extent_map_end(em) ||
> > +	     em->start != em->block_start || em->len != em->block_len))
> > +		test_msg("case3 [0x%llx 0x%llx): ret %d em (start 0x%llx len 0x%llx block_start 0x%llx block_len 0x%llx)\n",
> > +			 start, start + len, ret, em->start, em->len,
> > +			 em->block_start, em->block_len);
> > +	free_extent_map(em);
> > +out:
> > +	/* free memory */
> > +	free_extent_map_tree(em_tree);
> > +}
> > +
> > +/*
> > + * Test scenario:
> > + *
> > + * Suppose that no extent map has been loaded into memory yet.
> > + * There is a file extent [0, 16K), two jobs are running concurrently
> > + * against it, t1 is buffered writing to [4K, 8K) and t2 is doing dio
> > + * read from [0, 4K) or [8K, 12K) or [12K, 16K).
> > + *
> > + * t1 goes ahead of t2 and adds em [4K, 8K) into tree.
> > + *
> > + *         t1                       t2
> > + *  cow_file_range()	     btrfs_get_extent()
> > + *                            -> lookup_extent_mapping()
> > + *   -> add_extent_mapping()
> > + *                            -> add_extent_mapping()
> > + */
> > +static void test_case_3(struct extent_map_tree *em_tree)
> > +{
> > +	__test_case_3(em_tree, 0);
> > +	__test_case_3(em_tree, SZ_8K);
> > +	__test_case_3(em_tree, (12 * 1024ULL));
> > +}
> > +
> >  int btrfs_test_extent_map()
> >  {
> >  	struct extent_map_tree *em_tree;
> > @@ -196,6 +268,7 @@ int btrfs_test_extent_map()
> >  
> >  	test_case_1(em_tree);
> >  	test_case_2(em_tree);
> > +	test_case_3(em_tree);
> >  
> >  	kfree(em_tree);
> >  	return 0;
> > 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 09/10] Btrfs: add tracepoint for em's EEXIST case
  2017-12-22  8:56   ` Nikolay Borisov
@ 2017-12-22 19:07     ` Liu Bo
  2017-12-23  7:39       ` Nikolay Borisov
  0 siblings, 1 reply; 24+ messages in thread
From: Liu Bo @ 2017-12-22 19:07 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Fri, Dec 22, 2017 at 10:56:31AM +0200, Nikolay Borisov wrote:
> 
> 
> On 22.12.2017 00:42, Liu Bo wrote:
> > This is adding a tracepoint 'btrfs_handle_em_exist' to help debug the
> > subtle bugs around merge_extent_mapping.
> 
> In the next patch you are already making the function which takes all
> these values noinline, meaning you can attach a kprobe so you can
> interrogate the args via systemtap,perf probe or even bpf. So I'd rather
> not add this tracepoint since the general sentiment seems to be that
> tracepoints are ABI and so have to be maintained.
>

The following patch of noinline function is inside the else {} branch,
so kprobe would give us something back only when it runs to else{}.

Since subtle bugs may lie in this area, a simple tracepoint like this
can help us understand them more efficiently.

thanks,
-liubo

> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/extent_map.c        |  1 +
> >  include/trace/events/btrfs.h | 35 +++++++++++++++++++++++++++++++++++
> >  2 files changed, 36 insertions(+)
> > 
> > diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> > index a8b7e24..40e4d30 100644
> > --- a/fs/btrfs/extent_map.c
> > +++ b/fs/btrfs/extent_map.c
> > @@ -539,6 +539,7 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
> >  		ret = 0;
> >  
> >  		existing = search_extent_mapping(em_tree, start, len);
> > +		trace_btrfs_handle_em_exist(existing, em, start, len);
> >  
> >  		/*
> >  		 * existing will always be non-NULL, since there must be
> > diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
> > index 4342a32..b7ffcf7 100644
> > --- a/include/trace/events/btrfs.h
> > +++ b/include/trace/events/btrfs.h
> > @@ -249,6 +249,41 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
> >  		  __entry->refs, __entry->compress_type)
> >  );
> >  
> > +TRACE_EVENT(btrfs_handle_em_exist,
> > +
> > +	TP_PROTO(const struct extent_map *existing, const struct extent_map *map, u64 start, u64 len),
> > +
> > +	TP_ARGS(existing, map, start, len),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(	u64,  e_start		)
> > +		__field(	u64,  e_len		)
> > +		__field(	u64,  map_start		)
> > +		__field(	u64,  map_len		)
> > +		__field(	u64,  start		)
> > +		__field(	u64,  len		)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->e_start	= existing->start;
> > +		__entry->e_len		= existing->len;
> > +		__entry->map_start	= map->start;
> > +		__entry->map_len	= map->len;
> > +		__entry->start		= start;
> > +		__entry->len		= len;
> > +	),
> > +
> > +	TP_printk("start=%llu len=%llu "
> > +		  "existing(start=%llu len=%llu) "
> > +		  "em(start=%llu len=%llu)",
> > +		  (unsigned long long)__entry->start,
> > +		  (unsigned long long)__entry->len,
> > +		  (unsigned long long)__entry->e_start,
> > +		  (unsigned long long)__entry->e_len,
> > +		  (unsigned long long)__entry->map_start,
> > +		  (unsigned long long)__entry->map_len)
> > +);
> > +
> >  /* file extent item */
> >  DECLARE_EVENT_CLASS(btrfs__file_extent_item_regular,
> >  
> > 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 07/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent
  2017-12-22 12:10   ` Nikolay Borisov
@ 2017-12-22 19:19     ` Liu Bo
  0 siblings, 0 replies; 24+ messages in thread
From: Liu Bo @ 2017-12-22 19:19 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Fri, Dec 22, 2017 at 02:10:45PM +0200, Nikolay Borisov wrote:
> 
> 
> On 22.12.2017 00:42, Liu Bo wrote:
> > This fixes a corner case that is caused by a race of dio write vs dio
> > read/write.
> > 
> > dio write:
> > [0, 32k) -> [0, 8k) + [8k, 32k)
> > 
> > dio read/write:
> > 
> > While get_extent() with [0, 4k), [0, 8k) is found as existing em, even
> > though start == existing->start, em is [0, 32k),
> > extent_map_end(em) > extent_map_end(existing),
> > then it goes thru merge_extent_mapping() which tries to add a [8k, 8k)
> > (with a length 0), and get_extent ends up returning -EEXIST, and dio
> > read/write will get -EEXIST which is confusing applications.
> 
> I think this paragraph should be expanded a bit since you've crammed a
> lot of information in few words.
> 

OK, it's not easy to explain the problem, either.

Probably I should also attach the following as a extra explanation,

+ * Suppose that no extent map has been loaded into memory yet.
+ * There is a file extent [0, 32K), two jobs are running concurrently
+ * against it, t1 is doing dio write to [8K, 32K) and t2 is doing dio
+ * read from [0, 4K) or [4K, 8K).
+ *
+ * t1 goes ahead of t2 and splits em [0, 32K) to em [0K, 8K) and [8K 32K).
+ *
+ *         t1                                t2
+ *  btrfs_get_blocks_direct()         btrfs_get_blocks_direct()
+ *   -> btrfs_get_extent()              -> btrfs_get_extent()
+ *       -> lookup_extent_mapping()
+ *       -> add_extent_mapping()            -> lookup_extent_mapping()
+ *          # load [0, 32K)
+ *   -> btrfs_new_extent_direct()
+ *       -> btrfs_drop_extent_cache()
+ *          # split [0, 32K)
+ *       -> add_extent_mapping()
+ *          # add [8K, 32K)
+ *                                          -> add_extent_mapping()
+ *                                             # handle -EEXIST when adding
+ *                                             # [0, 32K)


> In the below graphs em should be extent we'd like to insert, no?
> So given that it always encompasses the existing (0, 8k given the
> above description) then em should be 0, 32k ?

Yes and yes.

thanks,
-liubo

> > 
> > Here I concluded all the possible situations,
> > 1) start < existing->start
> > 
> >             +-----------+em+-----------+
> > +--prev---+ |     +-------------+      |
> > |         | |     |             |      |
> > +---------+ +     +---+existing++      ++
> >                 +
> >                 |
> >                 +
> >              start
> > 
> > 2) start == existing->start
> > 
> >       +------------em------------+
> >       |     +-------------+      |
> >       |     |             |      |
> >       +     +----existing-+      +
> >             |
> >             |
> >             +
> >          start
> > 
> > 3) start > existing->start && start < (existing->start + existing->len)
> > 
> >       +------------em------------+
> >       |     +-------------+      |
> >       |     |             |      |
> >       +     +----existing-+      +
> >                |
> >                |
> >                +
> >              start
> > 
> > 4) start >= (existing->start + existing->len)
> > 
> > +-----------+em+-----------+
> > |     +-------------+      | +--next---+
> > |     |             |      | |         |
> > +     +---+existing++      + +---------+
> >                       +
> >                       |
> >                       +
> >                    start
> > 
> > After going thru the above case by case, it turns out that if start is
> > within existing em (front inclusive), then the existing em should be
> > returned, otherwise, we try our best to merge candidate em with
> > sibling ems to form a larger em.
> > 
> > Reported-by: David Vallender <david.vallender@landmark.co.uk>
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/extent_map.c | 25 ++++++++++---------------
> >  1 file changed, 10 insertions(+), 15 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> > index 6653b08..d386cfb 100644
> > --- a/fs/btrfs/extent_map.c
> > +++ b/fs/btrfs/extent_map.c
> > @@ -483,7 +483,7 @@ static struct extent_map *prev_extent_map(struct extent_map *em)
> >  static int merge_extent_mapping(struct extent_map_tree *em_tree,
> >  				struct extent_map *existing,
> >  				struct extent_map *em,
> > -				u64 map_start)
> > +				u64 map_start, u64 map_len)
> >  {
> >  	struct extent_map *prev;
> >  	struct extent_map *next;
> > @@ -496,9 +496,13 @@ static int merge_extent_mapping(struct extent_map_tree *em_tree,
> >  	if (existing->start > map_start) {
> >  		next = existing;
> >  		prev = prev_extent_map(next);
> > +		if (prev)
> > +			ASSERT(extent_map_end(prev) <= map_start);
> >  	} else {
> >  		prev = existing;
> >  		next = next_extent_map(prev);
> > +		if (next)
> > +			ASSERT(map_start + map_len <= next->start);
> >  	}
> >  
> >  	start = prev ? extent_map_end(prev) : em->start;
> > @@ -540,35 +544,26 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
> >  		 * existing will always be non-NULL, since there must be
> >  		 * extent causing the -EEXIST.
> >  		 */
> > -		if (existing->start == em->start &&
> > -		    extent_map_end(existing) >= extent_map_end(em) &&
> > -		    em->block_start == existing->block_start) {
> > -			/*
> > -			 * The existing extent map already encompasses the
> > -			 * entire extent map we tried to add.
> > -			 */
> > +		if (start >= existing->start &&
> > +		    start < extent_map_end(existing)) {
> >  			free_extent_map(em);
> >  			*em_in = existing;
> >  			ret = 0;
> > -		} else if (start >= extent_map_end(existing) ||
> > -		    start <= existing->start) {
> > +		} else {
> >  			/*
> >  			 * The existing extent map is the one nearest to
> >  			 * the [start, start + len) range which overlaps
> >  			 */
> >  			ret = merge_extent_mapping(em_tree, existing,
> > -						   em, start);
> > +						   em, start, len);
> >  			free_extent_map(existing);
> >  			if (ret) {
> >  				free_extent_map(em);
> >  				*em_in = NULL;
> >  			}
> > -		} else {
> > -			free_extent_map(em);
> > -			*em_in = existing;
> > -			ret = 0;
> >  		}
> >  	}
> > +
> >  	ASSERT(ret == 0 || ret == -EEXIST);
> >  	return ret;
> >  }
> > 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 09/10] Btrfs: add tracepoint for em's EEXIST case
  2017-12-22 19:07     ` Liu Bo
@ 2017-12-23  7:39       ` Nikolay Borisov
  2018-01-02 18:02         ` Liu Bo
  0 siblings, 1 reply; 24+ messages in thread
From: Nikolay Borisov @ 2017-12-23  7:39 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs



On 22.12.2017 21:07, Liu Bo wrote:
> On Fri, Dec 22, 2017 at 10:56:31AM +0200, Nikolay Borisov wrote:
>>
>>
>> On 22.12.2017 00:42, Liu Bo wrote:
>>> This is adding a tracepoint 'btrfs_handle_em_exist' to help debug the
>>> subtle bugs around merge_extent_mapping.
>>
>> In the next patch you are already making the function which takes all
>> these values noinline, meaning you can attach a kprobe so you can
>> interrogate the args via systemtap,perf probe or even bpf. So I'd rather
>> not add this tracepoint since the general sentiment seems to be that
>> tracepoints are ABI and so have to be maintained.
>>
> 
> The following patch of noinline function is inside the else {} branch,
> so kprobe would give us something back only when it runs to else{}.
> 
> Since subtle bugs may lie in this area, a simple tracepoint like this
> can help us understand them more efficiently.

In that case if you make search_extent_mapping and put a kprobe/ret
kprobe there you should be able to gain the same information, we should
really careful when adding trace events...
> 
> thanks,
> -liubo
> 
>>>
>>> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
>>> ---
>>>  fs/btrfs/extent_map.c        |  1 +
>>>  include/trace/events/btrfs.h | 35 +++++++++++++++++++++++++++++++++++
>>>  2 files changed, 36 insertions(+)
>>>
>>> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
>>> index a8b7e24..40e4d30 100644
>>> --- a/fs/btrfs/extent_map.c
>>> +++ b/fs/btrfs/extent_map.c
>>> @@ -539,6 +539,7 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
>>>  		ret = 0;
>>>  
>>>  		existing = search_extent_mapping(em_tree, start, len);
>>> +		trace_btrfs_handle_em_exist(existing, em, start, len);
>>>  
>>>  		/*
>>>  		 * existing will always be non-NULL, since there must be
>>> diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
>>> index 4342a32..b7ffcf7 100644
>>> --- a/include/trace/events/btrfs.h
>>> +++ b/include/trace/events/btrfs.h
>>> @@ -249,6 +249,41 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
>>>  		  __entry->refs, __entry->compress_type)
>>>  );
>>>  
>>> +TRACE_EVENT(btrfs_handle_em_exist,
>>> +
>>> +	TP_PROTO(const struct extent_map *existing, const struct extent_map *map, u64 start, u64 len),
>>> +
>>> +	TP_ARGS(existing, map, start, len),
>>> +
>>> +	TP_STRUCT__entry(
>>> +		__field(	u64,  e_start		)
>>> +		__field(	u64,  e_len		)
>>> +		__field(	u64,  map_start		)
>>> +		__field(	u64,  map_len		)
>>> +		__field(	u64,  start		)
>>> +		__field(	u64,  len		)
>>> +	),
>>> +
>>> +	TP_fast_assign(
>>> +		__entry->e_start	= existing->start;
>>> +		__entry->e_len		= existing->len;
>>> +		__entry->map_start	= map->start;
>>> +		__entry->map_len	= map->len;
>>> +		__entry->start		= start;
>>> +		__entry->len		= len;
>>> +	),
>>> +
>>> +	TP_printk("start=%llu len=%llu "
>>> +		  "existing(start=%llu len=%llu) "
>>> +		  "em(start=%llu len=%llu)",
>>> +		  (unsigned long long)__entry->start,
>>> +		  (unsigned long long)__entry->len,
>>> +		  (unsigned long long)__entry->e_start,
>>> +		  (unsigned long long)__entry->e_len,
>>> +		  (unsigned long long)__entry->map_start,
>>> +		  (unsigned long long)__entry->map_len)
>>> +);
>>> +
>>>  /* file extent item */
>>>  DECLARE_EVENT_CLASS(btrfs__file_extent_item_regular,
>>>  
>>>
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 09/10] Btrfs: add tracepoint for em's EEXIST case
  2017-12-23  7:39       ` Nikolay Borisov
@ 2018-01-02 18:02         ` Liu Bo
  0 siblings, 0 replies; 24+ messages in thread
From: Liu Bo @ 2018-01-02 18:02 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Sat, Dec 23, 2017 at 09:39:00AM +0200, Nikolay Borisov wrote:
> 
> 
> On 22.12.2017 21:07, Liu Bo wrote:
> > On Fri, Dec 22, 2017 at 10:56:31AM +0200, Nikolay Borisov wrote:
> >>
> >>
> >> On 22.12.2017 00:42, Liu Bo wrote:
> >>> This is adding a tracepoint 'btrfs_handle_em_exist' to help debug the
> >>> subtle bugs around merge_extent_mapping.
> >>
> >> In the next patch you are already making the function which takes all
> >> these values noinline, meaning you can attach a kprobe so you can
> >> interrogate the args via systemtap,perf probe or even bpf. So I'd rather
> >> not add this tracepoint since the general sentiment seems to be that
> >> tracepoints are ABI and so have to be maintained.
> >>
> > 
> > The following patch of noinline function is inside the else {} branch,
> > so kprobe would give us something back only when it runs to else{}.
> > 
> > Since subtle bugs may lie in this area, a simple tracepoint like this
> > can help us understand them more efficiently.
> 
> In that case if you make search_extent_mapping and put a kprobe/ret
> kprobe there you should be able to gain the same information, we should
> really careful when adding trace events...

Correct me if I'm wrong, using 'perf probe' can only return @retval,
it seems that we're not able to reference retval's number like
retval->var or retval.val.

As @exisiting is returned by search_extent_mapping(), I'm fine without
this tracepoint if 'perf probe' can do the referencing for retval.

Thanks,

-liubo
> > 
> >>>
> >>> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> >>> ---
> >>>  fs/btrfs/extent_map.c        |  1 +
> >>>  include/trace/events/btrfs.h | 35 +++++++++++++++++++++++++++++++++++
> >>>  2 files changed, 36 insertions(+)
> >>>
> >>> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> >>> index a8b7e24..40e4d30 100644
> >>> --- a/fs/btrfs/extent_map.c
> >>> +++ b/fs/btrfs/extent_map.c
> >>> @@ -539,6 +539,7 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
> >>>  		ret = 0;
> >>>  
> >>>  		existing = search_extent_mapping(em_tree, start, len);
> >>> +		trace_btrfs_handle_em_exist(existing, em, start, len);
> >>>  
> >>>  		/*
> >>>  		 * existing will always be non-NULL, since there must be
> >>> diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
> >>> index 4342a32..b7ffcf7 100644
> >>> --- a/include/trace/events/btrfs.h
> >>> +++ b/include/trace/events/btrfs.h
> >>> @@ -249,6 +249,41 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
> >>>  		  __entry->refs, __entry->compress_type)
> >>>  );
> >>>  
> >>> +TRACE_EVENT(btrfs_handle_em_exist,
> >>> +
> >>> +	TP_PROTO(const struct extent_map *existing, const struct extent_map *map, u64 start, u64 len),
> >>> +
> >>> +	TP_ARGS(existing, map, start, len),
> >>> +
> >>> +	TP_STRUCT__entry(
> >>> +		__field(	u64,  e_start		)
> >>> +		__field(	u64,  e_len		)
> >>> +		__field(	u64,  map_start		)
> >>> +		__field(	u64,  map_len		)
> >>> +		__field(	u64,  start		)
> >>> +		__field(	u64,  len		)
> >>> +	),
> >>> +
> >>> +	TP_fast_assign(
> >>> +		__entry->e_start	= existing->start;
> >>> +		__entry->e_len		= existing->len;
> >>> +		__entry->map_start	= map->start;
> >>> +		__entry->map_len	= map->len;
> >>> +		__entry->start		= start;
> >>> +		__entry->len		= len;
> >>> +	),
> >>> +
> >>> +	TP_printk("start=%llu len=%llu "
> >>> +		  "existing(start=%llu len=%llu) "
> >>> +		  "em(start=%llu len=%llu)",
> >>> +		  (unsigned long long)__entry->start,
> >>> +		  (unsigned long long)__entry->len,
> >>> +		  (unsigned long long)__entry->e_start,
> >>> +		  (unsigned long long)__entry->e_len,
> >>> +		  (unsigned long long)__entry->map_start,
> >>> +		  (unsigned long long)__entry->map_len)
> >>> +);
> >>> +
> >>>  /* file extent item */
> >>>  DECLARE_EVENT_CLASS(btrfs__file_extent_item_regular,
> >>>  
> >>>
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-01-02 19:05 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
2017-12-21 22:42 ` [PATCH 01/10] Btrfs: add helper for em merge logic Liu Bo
2017-12-22  7:23   ` Nikolay Borisov
2017-12-22 18:45     ` Liu Bo
2017-12-21 22:42 ` [PATCH 02/10] Btrfs: move extent map specific code to extent_map.c Liu Bo
2017-12-21 22:42 ` [PATCH 03/10] Btrfs: add extent map selftests Liu Bo
2017-12-22  7:42   ` Nikolay Borisov
2017-12-22 18:53     ` Liu Bo
2017-12-21 22:42 ` [PATCH 04/10] Btrfs: extent map selftest: buffered write vs dio read Liu Bo
2017-12-22  7:51   ` Nikolay Borisov
2017-12-22 18:58     ` Liu Bo
2017-12-21 22:42 ` [PATCH 05/10] Btrfs: extent map selftest: dio " Liu Bo
2017-12-21 22:42 ` [PATCH 06/10] Btrfs: fix incorrect block_len in merge_extent_mapping Liu Bo
2017-12-21 22:42 ` [PATCH 07/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent Liu Bo
2017-12-22 12:10   ` Nikolay Borisov
2017-12-22 19:19     ` Liu Bo
2017-12-21 22:42 ` [PATCH 08/10] Btrfs: add WARN_ONCE to detect unexpected error from merge_extent_mapping Liu Bo
2017-12-22  8:52   ` Nikolay Borisov
2017-12-21 22:42 ` [PATCH 09/10] Btrfs: add tracepoint for em's EEXIST case Liu Bo
2017-12-22  8:56   ` Nikolay Borisov
2017-12-22 19:07     ` Liu Bo
2017-12-23  7:39       ` Nikolay Borisov
2018-01-02 18:02         ` Liu Bo
2017-12-21 22:42 ` [PATCH 10/10] Btrfs: noinline merge_extent_mapping Liu Bo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.