All of lore.kernel.org
 help / color / mirror / Atom feed
From: Liu Bo <bo.li.liu@oracle.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 07/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent
Date: Thu, 21 Dec 2017 15:42:53 -0700	[thread overview]
Message-ID: <20171221224256.18196-8-bo.li.liu@oracle.com> (raw)
In-Reply-To: <20171221224256.18196-1-bo.li.liu@oracle.com>

This fixes a corner case that is caused by a race of dio write vs dio
read/write.

dio write:
[0, 32k) -> [0, 8k) + [8k, 32k)

dio read/write:

While get_extent() with [0, 4k), [0, 8k) is found as existing em, even
though start == existing->start, em is [0, 32k),
extent_map_end(em) > extent_map_end(existing),
then it goes thru merge_extent_mapping() which tries to add a [8k, 8k)
(with a length 0), and get_extent ends up returning -EEXIST, and dio
read/write will get -EEXIST which is confusing applications.

Here I concluded all the possible situations,
1) start < existing->start

            +-----------+em+-----------+
+--prev---+ |     +-------------+      |
|         | |     |             |      |
+---------+ +     +---+existing++      ++
                +
                |
                +
             start

2) start == existing->start

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
            |
            |
            +
         start

3) start > existing->start && start < (existing->start + existing->len)

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
               |
               |
               +
             start

4) start >= (existing->start + existing->len)

+-----------+em+-----------+
|     +-------------+      | +--next---+
|     |             |      | |         |
+     +---+existing++      + +---------+
                      +
                      |
                      +
                   start

After going thru the above case by case, it turns out that if start is
within existing em (front inclusive), then the existing em should be
returned, otherwise, we try our best to merge candidate em with
sibling ems to form a larger em.

Reported-by: David Vallender <david.vallender@landmark.co.uk>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_map.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 6653b08..d386cfb 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -483,7 +483,7 @@ static struct extent_map *prev_extent_map(struct extent_map *em)
 static int merge_extent_mapping(struct extent_map_tree *em_tree,
 				struct extent_map *existing,
 				struct extent_map *em,
-				u64 map_start)
+				u64 map_start, u64 map_len)
 {
 	struct extent_map *prev;
 	struct extent_map *next;
@@ -496,9 +496,13 @@ static int merge_extent_mapping(struct extent_map_tree *em_tree,
 	if (existing->start > map_start) {
 		next = existing;
 		prev = prev_extent_map(next);
+		if (prev)
+			ASSERT(extent_map_end(prev) <= map_start);
 	} else {
 		prev = existing;
 		next = next_extent_map(prev);
+		if (next)
+			ASSERT(map_start + map_len <= next->start);
 	}
 
 	start = prev ? extent_map_end(prev) : em->start;
@@ -540,35 +544,26 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
 		 * existing will always be non-NULL, since there must be
 		 * extent causing the -EEXIST.
 		 */
-		if (existing->start == em->start &&
-		    extent_map_end(existing) >= extent_map_end(em) &&
-		    em->block_start == existing->block_start) {
-			/*
-			 * The existing extent map already encompasses the
-			 * entire extent map we tried to add.
-			 */
+		if (start >= existing->start &&
+		    start < extent_map_end(existing)) {
 			free_extent_map(em);
 			*em_in = existing;
 			ret = 0;
-		} else if (start >= extent_map_end(existing) ||
-		    start <= existing->start) {
+		} else {
 			/*
 			 * The existing extent map is the one nearest to
 			 * the [start, start + len) range which overlaps
 			 */
 			ret = merge_extent_mapping(em_tree, existing,
-						   em, start);
+						   em, start, len);
 			free_extent_map(existing);
 			if (ret) {
 				free_extent_map(em);
 				*em_in = NULL;
 			}
-		} else {
-			free_extent_map(em);
-			*em_in = existing;
-			ret = 0;
 		}
 	}
+
 	ASSERT(ret == 0 || ret == -EEXIST);
 	return ret;
 }
-- 
2.9.4


  parent reply	other threads:[~2017-12-21 23:45 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
2017-12-21 22:42 ` [PATCH 01/10] Btrfs: add helper for em merge logic Liu Bo
2017-12-22  7:23   ` Nikolay Borisov
2017-12-22 18:45     ` Liu Bo
2017-12-21 22:42 ` [PATCH 02/10] Btrfs: move extent map specific code to extent_map.c Liu Bo
2017-12-21 22:42 ` [PATCH 03/10] Btrfs: add extent map selftests Liu Bo
2017-12-22  7:42   ` Nikolay Borisov
2017-12-22 18:53     ` Liu Bo
2017-12-21 22:42 ` [PATCH 04/10] Btrfs: extent map selftest: buffered write vs dio read Liu Bo
2017-12-22  7:51   ` Nikolay Borisov
2017-12-22 18:58     ` Liu Bo
2017-12-21 22:42 ` [PATCH 05/10] Btrfs: extent map selftest: dio " Liu Bo
2017-12-21 22:42 ` [PATCH 06/10] Btrfs: fix incorrect block_len in merge_extent_mapping Liu Bo
2017-12-21 22:42 ` Liu Bo [this message]
2017-12-22 12:10   ` [PATCH 07/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent Nikolay Borisov
2017-12-22 19:19     ` Liu Bo
2017-12-21 22:42 ` [PATCH 08/10] Btrfs: add WARN_ONCE to detect unexpected error from merge_extent_mapping Liu Bo
2017-12-22  8:52   ` Nikolay Borisov
2017-12-21 22:42 ` [PATCH 09/10] Btrfs: add tracepoint for em's EEXIST case Liu Bo
2017-12-22  8:56   ` Nikolay Borisov
2017-12-22 19:07     ` Liu Bo
2017-12-23  7:39       ` Nikolay Borisov
2018-01-02 18:02         ` Liu Bo
2017-12-21 22:42 ` [PATCH 10/10] Btrfs: noinline merge_extent_mapping Liu Bo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171221224256.18196-8-bo.li.liu@oracle.com \
    --to=bo.li.liu@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.