All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] update bytes_may_use timely to avoid false ENOSPC issue
@ 2016-07-25  7:51 Wang Xiaoguang
  2016-07-25  7:51 ` [PATCH v2 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Wang Xiaoguang @ 2016-07-25  7:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, jbacik, holger

Currently in btrfs, for data space reservation, it does not update
bytes_may_use in btrfs_update_reserved_bytes() and the decrease operation
will be delayed to be done in extent_clear_unlock_delalloc(), for
fallocate(2), decrease operation is even delayed to be done in end
of btrfs_fallocate(), which is too late. Obviously such delay will
cause unnecessary pressure to enospc system.

So in this patch set, we will remove RESERVE_FREE, RESERVE_ALLOC and
RESERVE_ALLOC_NO_ACCOUNT, and always update bytes_may_use timely.

I already have sent a fstests test case for this issue, and I can send
[Patch 4/4] as a independent patch, but its bug also can be revealed by
the same reproduce scripts, so I include it here.

Changelog:
v2:
  Fix a trace point issue.

Wang Xiaoguang (4):
  btrfs: use correct offset for reloc_inode in
    prealloc_file_extent_cluster()
  btrfs: divide btrfs_update_reserved_bytes() into two functions
  btrfs: update btrfs_space_info's bytes_may_use timely
  btrfs: should block unused block groups deletion work when allocating
    data space

 fs/btrfs/ctree.h       |   3 +-
 fs/btrfs/disk-io.c     |   1 +
 fs/btrfs/extent-tree.c | 171 ++++++++++++++++++++++++++++---------------------
 fs/btrfs/extent_io.h   |   1 +
 fs/btrfs/file.c        |  26 ++++----
 fs/btrfs/inode-map.c   |   3 +-
 fs/btrfs/inode.c       |  37 ++++++++---
 fs/btrfs/relocation.c  |  17 +++--
 8 files changed, 159 insertions(+), 100 deletions(-)

-- 
2.9.0




^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
  2016-07-25  7:51 [PATCH v2 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
@ 2016-07-25  7:51 ` Wang Xiaoguang
  2016-07-25  7:51 ` [PATCH v2 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions Wang Xiaoguang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Wang Xiaoguang @ 2016-07-25  7:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, jbacik, holger

In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
which indeed are extent's bytenr. The correct value should be
cluster->[start|end] minus block group's start bytenr.

start bytenr   cluster->start
|              |     extent      |   extent   | ...| extent |
|----------------------------------------------------------------|
|                block group reloc_inode                         |

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/relocation.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0477dca..a0de885 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3030,12 +3030,14 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	u64 num_bytes;
 	int nr = 0;
 	int ret = 0;
+	u64 prealloc_start = cluster->start - offset;
+	u64 prealloc_end = cluster->end - offset;
 
 	BUG_ON(cluster->start != cluster->boundary[0]);
 	inode_lock(inode);
 
-	ret = btrfs_check_data_free_space(inode, cluster->start,
-					  cluster->end + 1 - cluster->start);
+	ret = btrfs_check_data_free_space(inode, prealloc_start,
+					  prealloc_end + 1 - prealloc_start);
 	if (ret)
 		goto out;
 
@@ -3056,8 +3058,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
 			break;
 		nr++;
 	}
-	btrfs_free_reserved_data_space(inode, cluster->start,
-				       cluster->end + 1 - cluster->start);
+	btrfs_free_reserved_data_space(inode, prealloc_start,
+				       prealloc_end + 1 - prealloc_start);
 out:
 	inode_unlock(inode);
 	return ret;
-- 
2.9.0




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions
  2016-07-25  7:51 [PATCH v2 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
  2016-07-25  7:51 ` [PATCH v2 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
@ 2016-07-25  7:51 ` Wang Xiaoguang
  2016-07-25  7:51 ` [PATCH v2 3/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
  2016-07-25  7:51 ` [PATCH v2 4/4] btrfs: should block unused block groups deletion work when allocating data space Wang Xiaoguang
  3 siblings, 0 replies; 13+ messages in thread
From: Wang Xiaoguang @ 2016-07-25  7:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, jbacik, holger

This patch divides btrfs_update_reserved_bytes() into
btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes(), and
next patch will extend btrfs_add_reserved_bytes()to fix some
false ENOSPC error, please see later patch for detailed info.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/extent-tree.c | 97 +++++++++++++++++++++++++++++---------------------
 1 file changed, 57 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 82b912a..8eaac39 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -104,9 +104,10 @@ static int find_next_key(struct btrfs_path *path, int level,
 			 struct btrfs_key *key);
 static void dump_space_info(struct btrfs_space_info *info, u64 bytes,
 			    int dump_block_groups);
-static int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
-				       u64 num_bytes, int reserve,
-				       int delalloc);
+static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
+				    u64 num_bytes, int reserve, int delalloc);
+static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache *cache,
+				     u64 num_bytes, int delalloc);
 static int block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv,
 			       u64 num_bytes);
 int btrfs_pin_extent(struct btrfs_root *root,
@@ -6297,19 +6298,14 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
 }
 
 /**
- * btrfs_update_reserved_bytes - update the block_group and space info counters
+ * btrfs_add_reserved_bytes - update the block_group and space info counters
  * @cache:	The cache we are manipulating
  * @num_bytes:	The number of bytes in question
  * @reserve:	One of the reservation enums
  * @delalloc:   The blocks are allocated for the delalloc write
  *
- * This is called by the allocator when it reserves space, or by somebody who is
- * freeing space that was never actually used on disk.  For example if you
- * reserve some space for a new leaf in transaction A and before transaction A
- * commits you free that leaf, you call this with reserve set to 0 in order to
- * clear the reservation.
- *
- * Metadata reservations should be called with RESERVE_ALLOC so we do the proper
+ * This is called by the allocator when it reserves space. Metadata
+ * reservations should be called with RESERVE_ALLOC so we do the proper
  * ENOSPC accounting.  For data we handle the reservation through clearing the
  * delalloc bits in the io_tree.  We have to do this since we could end up
  * allocating less disk space for the amount of data we have reserved in the
@@ -6319,44 +6315,65 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
  * make the reservation and return -EAGAIN, otherwise this function always
  * succeeds.
  */
-static int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
-				       u64 num_bytes, int reserve, int delalloc)
+static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
+				    u64 num_bytes, int reserve, int delalloc)
 {
 	struct btrfs_space_info *space_info = cache->space_info;
 	int ret = 0;
 
 	spin_lock(&space_info->lock);
 	spin_lock(&cache->lock);
-	if (reserve != RESERVE_FREE) {
-		if (cache->ro) {
-			ret = -EAGAIN;
-		} else {
-			cache->reserved += num_bytes;
-			space_info->bytes_reserved += num_bytes;
-			if (reserve == RESERVE_ALLOC) {
-				trace_btrfs_space_reservation(cache->fs_info,
-						"space_info", space_info->flags,
-						num_bytes, 0);
-				space_info->bytes_may_use -= num_bytes;
-			}
-
-			if (delalloc)
-				cache->delalloc_bytes += num_bytes;
-		}
+	if (cache->ro) {
+		ret = -EAGAIN;
 	} else {
-		if (cache->ro)
-			space_info->bytes_readonly += num_bytes;
-		cache->reserved -= num_bytes;
-		space_info->bytes_reserved -= num_bytes;
+		cache->reserved += num_bytes;
+		space_info->bytes_reserved += num_bytes;
+		if (reserve == RESERVE_ALLOC) {
+			trace_btrfs_space_reservation(cache->fs_info,
+					"space_info", space_info->flags,
+					num_bytes, 0);
+			space_info->bytes_may_use -= num_bytes;
+		}
 
 		if (delalloc)
-			cache->delalloc_bytes -= num_bytes;
+			cache->delalloc_bytes += num_bytes;
 	}
 	spin_unlock(&cache->lock);
 	spin_unlock(&space_info->lock);
 	return ret;
 }
 
+/**
+ * btrfs_free_reserved_bytes - update the block_group and space info counters
+ * @cache:      The cache we are manipulating
+ * @num_bytes:  The number of bytes in question
+ * @delalloc:   The blocks are allocated for the delalloc write
+ *
+ * This is called by somebody who is freeing space that was never actually used
+ * on disk.  For example if you reserve some space for a new leaf in transaction
+ * A and before transaction A commits you free that leaf, you call this with
+ * reserve set to 0 in order to clear the reservation.
+ */
+
+static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache *cache,
+				     u64 num_bytes, int delalloc)
+{
+	struct btrfs_space_info *space_info = cache->space_info;
+	int ret = 0;
+
+	spin_lock(&space_info->lock);
+	spin_lock(&cache->lock);
+	if (cache->ro)
+		space_info->bytes_readonly += num_bytes;
+	cache->reserved -= num_bytes;
+	space_info->bytes_reserved -= num_bytes;
+
+	if (delalloc)
+		cache->delalloc_bytes -= num_bytes;
+	spin_unlock(&cache->lock);
+	spin_unlock(&space_info->lock);
+	return ret;
+}
 void btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root)
 {
@@ -6976,7 +6993,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 		WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, &buf->bflags));
 
 		btrfs_add_free_space(cache, buf->start, buf->len);
-		btrfs_update_reserved_bytes(cache, buf->len, RESERVE_FREE, 0);
+		btrfs_free_reserved_bytes(cache, buf->len, 0);
 		btrfs_put_block_group(cache);
 		trace_btrfs_reserved_extent_free(root, buf->start, buf->len);
 		pin = 0;
@@ -7548,8 +7565,8 @@ checks:
 					     search_start - offset);
 		BUG_ON(offset > search_start);
 
-		ret = btrfs_update_reserved_bytes(block_group, num_bytes,
-						  alloc_type, delalloc);
+		ret = btrfs_add_reserved_bytes(block_group, num_bytes,
+				alloc_type, delalloc);
 		if (ret == -EAGAIN) {
 			btrfs_add_free_space(block_group, offset, num_bytes);
 			goto loop;
@@ -7781,7 +7798,7 @@ static int __btrfs_free_reserved_extent(struct btrfs_root *root,
 		if (btrfs_test_opt(root, DISCARD))
 			ret = btrfs_discard_extent(root, start, len, NULL);
 		btrfs_add_free_space(cache, start, len);
-		btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, delalloc);
+		btrfs_free_reserved_bytes(cache, len, delalloc);
 	}
 
 	btrfs_put_block_group(cache);
@@ -8011,8 +8028,8 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 	if (!block_group)
 		return -EINVAL;
 
-	ret = btrfs_update_reserved_bytes(block_group, ins->offset,
-					  RESERVE_ALLOC_NO_ACCOUNT, 0);
+	ret = btrfs_add_reserved_bytes(block_group, ins->offset,
+			RESERVE_ALLOC_NO_ACCOUNT, 0);
 	BUG_ON(ret); /* logic error */
 	ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
 					 0, owner, offset, ins, 1);
-- 
2.9.0




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/4] btrfs: update btrfs_space_info's bytes_may_use timely
  2016-07-25  7:51 [PATCH v2 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
  2016-07-25  7:51 ` [PATCH v2 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
  2016-07-25  7:51 ` [PATCH v2 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions Wang Xiaoguang
@ 2016-07-25  7:51 ` Wang Xiaoguang
  2016-07-25 13:33   ` Josef Bacik
  2016-07-25 18:10   ` Josef Bacik
  2016-07-25  7:51 ` [PATCH v2 4/4] btrfs: should block unused block groups deletion work when allocating data space Wang Xiaoguang
  3 siblings, 2 replies; 13+ messages in thread
From: Wang Xiaoguang @ 2016-07-25  7:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, jbacik, holger

This patch can fix some false ENOSPC errors, below test script can
reproduce one false ENOSPC error:
	#!/bin/bash
	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
	dev=$(losetup --show -f fs.img)
	mkfs.btrfs -f -M $dev
	mkdir /tmp/mntpoint
	mount $dev /tmp/mntpoint
	cd /tmp/mntpoint
	xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile

Above script will fail for ENOSPC reason, but indeed fs still has free
space to satisfy this request. Please see call graph:
btrfs_fallocate()
|-> btrfs_alloc_data_chunk_ondemand()
|   bytes_may_use += 64M
|-> btrfs_prealloc_file_range()
    |-> btrfs_reserve_extent()
        |-> btrfs_add_reserved_bytes()
        |   alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
        |   change bytes_may_use, and bytes_reserved += 64M. Now
        |   bytes_may_use + bytes_reserved == 128M, which is greater
        |   than btrfs_space_info's total_bytes, false enospc occurs.
        |   Note, the bytes_may_use decrease operation will be done in
        |   end of btrfs_fallocate(), which is too late.

Here is another simple case for buffered write:
                    CPU 1              |              CPU 2
                                       |
|-> cow_file_range()                   |-> __btrfs_buffered_write()
    |-> btrfs_reserve_extent()         |   |
    |                                  |   |
    |                                  |   |
    |    .....                         |   |-> btrfs_check_data_free_space()
    |                                  |
    |                                  |
    |-> extent_clear_unlock_delalloc() |

In CPU 1, btrfs_reserve_extent()->find_free_extent()->
btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
operation will be delayed to be done in extent_clear_unlock_delalloc().
Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
btrfs_check_data_free_space() tries to reserve 100MB data space.
If
	100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
		data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
		data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
btrfs_check_data_free_space() will try to allcate new data chunk or call
btrfs_start_delalloc_roots(), or commit current transaction in order to
reserve some free space, obviously a lot of work. But indeed it's not
necessary as long as decreasing bytes_may_use timely, we still have
free space, decreasing 128M from bytes_may_use.

To fix this issue, this patch chooses to update bytes_may_use for both
data and metadata in btrfs_add_reserved_bytes(). For compress path, real
extent length may not be equal to file content length, so introduce a
ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
file content length. Then compress path can update bytes_may_use
correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
and RESERVE_FREE.

As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
PREALLOC, we also need to update bytes_may_use, but can not pass
EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so
here we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook()
to update btrfs_space_info's bytes_may_use.

Meanwhile __btrfs_prealloc_file_range() will call
btrfs_free_reserved_data_space() internally for both sucessful and failed
path, btrfs_prealloc_file_range()'s callers does not need to call
btrfs_free_reserved_data_space() any more.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/extent-tree.c | 56 +++++++++++++++++---------------------------------
 fs/btrfs/extent_io.h   |  1 +
 fs/btrfs/file.c        | 26 +++++++++++++----------
 fs/btrfs/inode-map.c   |  3 +--
 fs/btrfs/inode.c       | 37 ++++++++++++++++++++++++---------
 fs/btrfs/relocation.c  | 11 ++++++++--
 7 files changed, 73 insertions(+), 63 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4274a7b..7eb2913 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2556,7 +2556,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 				   struct btrfs_root *root,
 				   u64 root_objectid, u64 owner, u64 offset,
 				   struct btrfs_key *ins);
-int btrfs_reserve_extent(struct btrfs_root *root, u64 num_bytes,
+int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, u64 num_bytes,
 			 u64 min_alloc_size, u64 empty_size, u64 hint_byte,
 			 struct btrfs_key *ins, int is_data, int delalloc);
 int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8eaac39..df8d756 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -60,21 +60,6 @@ enum {
 	CHUNK_ALLOC_FORCE = 2,
 };
 
-/*
- * Control how reservations are dealt with.
- *
- * RESERVE_FREE - freeing a reservation.
- * RESERVE_ALLOC - allocating space and we need to update bytes_may_use for
- *   ENOSPC accounting
- * RESERVE_ALLOC_NO_ACCOUNT - allocating space and we should not update
- *   bytes_may_use as the ENOSPC accounting is done elsewhere
- */
-enum {
-	RESERVE_FREE = 0,
-	RESERVE_ALLOC = 1,
-	RESERVE_ALLOC_NO_ACCOUNT = 2,
-};
-
 static int update_block_group(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root, u64 bytenr,
 			      u64 num_bytes, int alloc);
@@ -105,7 +90,7 @@ static int find_next_key(struct btrfs_path *path, int level,
 static void dump_space_info(struct btrfs_space_info *info, u64 bytes,
 			    int dump_block_groups);
 static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
-				    u64 num_bytes, int reserve, int delalloc);
+				    u64 ram_bytes, u64 num_bytes, int delalloc);
 static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache *cache,
 				     u64 num_bytes, int delalloc);
 static int block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv,
@@ -3491,7 +3476,6 @@ again:
 		dcs = BTRFS_DC_SETUP;
 	else if (ret == -ENOSPC)
 		set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
-	btrfs_free_reserved_data_space(inode, 0, num_pages);
 
 out_put:
 	iput(inode);
@@ -6300,8 +6284,9 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
 /**
  * btrfs_add_reserved_bytes - update the block_group and space info counters
  * @cache:	The cache we are manipulating
+ * @ram_bytes:  The number of bytes of file content, and will be same to
+ *              @num_bytes except for the compress path.
  * @num_bytes:	The number of bytes in question
- * @reserve:	One of the reservation enums
  * @delalloc:   The blocks are allocated for the delalloc write
  *
  * This is called by the allocator when it reserves space. Metadata
@@ -6316,7 +6301,7 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
  * succeeds.
  */
 static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
-				    u64 num_bytes, int reserve, int delalloc)
+				    u64 ram_bytes, u64 num_bytes, int delalloc)
 {
 	struct btrfs_space_info *space_info = cache->space_info;
 	int ret = 0;
@@ -6328,13 +6313,11 @@ static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
 	} else {
 		cache->reserved += num_bytes;
 		space_info->bytes_reserved += num_bytes;
-		if (reserve == RESERVE_ALLOC) {
-			trace_btrfs_space_reservation(cache->fs_info,
-					"space_info", space_info->flags,
-					num_bytes, 0);
-			space_info->bytes_may_use -= num_bytes;
-		}
 
+		trace_btrfs_space_reservation(cache->fs_info,
+				"space_info", space_info->flags,
+				ram_bytes, 0);
+		space_info->bytes_may_use -= ram_bytes;
 		if (delalloc)
 			cache->delalloc_bytes += num_bytes;
 	}
@@ -7218,9 +7201,9 @@ btrfs_release_block_group(struct btrfs_block_group_cache *cache,
  * the free space extent currently.
  */
 static noinline int find_free_extent(struct btrfs_root *orig_root,
-				     u64 num_bytes, u64 empty_size,
-				     u64 hint_byte, struct btrfs_key *ins,
-				     u64 flags, int delalloc)
+				u64 ram_bytes, u64 num_bytes, u64 empty_size,
+				u64 hint_byte, struct btrfs_key *ins,
+				u64 flags, int delalloc)
 {
 	int ret = 0;
 	struct btrfs_root *root = orig_root->fs_info->extent_root;
@@ -7232,8 +7215,6 @@ static noinline int find_free_extent(struct btrfs_root *orig_root,
 	struct btrfs_space_info *space_info;
 	int loop = 0;
 	int index = __get_raid_index(flags);
-	int alloc_type = (flags & BTRFS_BLOCK_GROUP_DATA) ?
-		RESERVE_ALLOC_NO_ACCOUNT : RESERVE_ALLOC;
 	bool failed_cluster_refill = false;
 	bool failed_alloc = false;
 	bool use_cluster = true;
@@ -7565,8 +7546,8 @@ checks:
 					     search_start - offset);
 		BUG_ON(offset > search_start);
 
-		ret = btrfs_add_reserved_bytes(block_group, num_bytes,
-				alloc_type, delalloc);
+		ret = btrfs_add_reserved_bytes(block_group, ram_bytes,
+				num_bytes, delalloc);
 		if (ret == -EAGAIN) {
 			btrfs_add_free_space(block_group, offset, num_bytes);
 			goto loop;
@@ -7739,7 +7720,7 @@ again:
 	up_read(&info->groups_sem);
 }
 
-int btrfs_reserve_extent(struct btrfs_root *root,
+int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 			 u64 num_bytes, u64 min_alloc_size,
 			 u64 empty_size, u64 hint_byte,
 			 struct btrfs_key *ins, int is_data, int delalloc)
@@ -7751,8 +7732,8 @@ int btrfs_reserve_extent(struct btrfs_root *root,
 	flags = btrfs_get_alloc_profile(root, is_data);
 again:
 	WARN_ON(num_bytes < root->sectorsize);
-	ret = find_free_extent(root, num_bytes, empty_size, hint_byte, ins,
-			       flags, delalloc);
+	ret = find_free_extent(root, ram_bytes, num_bytes, empty_size,
+			       hint_byte, ins, flags, delalloc);
 	if (!ret && !is_data) {
 		btrfs_dec_block_group_reservations(root->fs_info,
 						   ins->objectid);
@@ -7761,6 +7742,7 @@ again:
 			num_bytes = min(num_bytes >> 1, ins->offset);
 			num_bytes = round_down(num_bytes, root->sectorsize);
 			num_bytes = max(num_bytes, min_alloc_size);
+			ram_bytes = num_bytes;
 			if (num_bytes == min_alloc_size)
 				final_tried = true;
 			goto again;
@@ -8029,7 +8011,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 		return -EINVAL;
 
 	ret = btrfs_add_reserved_bytes(block_group, ins->offset,
-			RESERVE_ALLOC_NO_ACCOUNT, 0);
+				       ins->offset, 0);
 	BUG_ON(ret); /* logic error */
 	ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
 					 0, owner, offset, ins, 1);
@@ -8171,7 +8153,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
 	if (IS_ERR(block_rsv))
 		return ERR_CAST(block_rsv);
 
-	ret = btrfs_reserve_extent(root, blocksize, blocksize,
+	ret = btrfs_reserve_extent(root, blocksize, blocksize, blocksize,
 				   empty_size, hint, &ins, 0, 0);
 	if (ret)
 		goto out_unuse;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index c0c1c4f..b52ca5d 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -20,6 +20,7 @@
 #define EXTENT_DAMAGED		(1U << 14)
 #define EXTENT_NORESERVE	(1U << 15)
 #define EXTENT_QGROUP_RESERVED	(1U << 16)
+#define EXTENT_CLEAR_DATA_RESV	(1U << 17)
 #define EXTENT_IOBITS		(EXTENT_LOCKED | EXTENT_WRITEBACK)
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 2234e88..b4d9258 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2669,6 +2669,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 
 	alloc_start = round_down(offset, blocksize);
 	alloc_end = round_up(offset + len, blocksize);
+	cur_offset = alloc_start;
 
 	/* Make sure we aren't being give some crap mode */
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
@@ -2761,7 +2762,6 @@ static long btrfs_fallocate(struct file *file, int mode,
 
 	/* First, check if we exceed the qgroup limit */
 	INIT_LIST_HEAD(&reserve_list);
-	cur_offset = alloc_start;
 	while (1) {
 		em = btrfs_get_extent(inode, NULL, 0, cur_offset,
 				      alloc_end - cur_offset, 0);
@@ -2788,6 +2788,14 @@ static long btrfs_fallocate(struct file *file, int mode,
 					last_byte - cur_offset);
 			if (ret < 0)
 				break;
+		} else {
+			/*
+			 * Do not need to reserve unwritten extent for this
+			 * range, free reserved data space first, otherwise
+			 * it'll result in false ENOSPC error.
+			 */
+			btrfs_free_reserved_data_space(inode, cur_offset,
+				last_byte - cur_offset);
 		}
 		free_extent_map(em);
 		cur_offset = last_byte;
@@ -2805,6 +2813,9 @@ static long btrfs_fallocate(struct file *file, int mode,
 					range->start,
 					range->len, 1 << inode->i_blkbits,
 					offset + len, &alloc_hint);
+		else
+			btrfs_free_reserved_data_space(inode, range->start,
+						       range->len);
 		list_del(&range->list);
 		kfree(range);
 	}
@@ -2839,18 +2850,11 @@ out_unlock:
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
 			     &cached_state, GFP_KERNEL);
 out:
-	/*
-	 * As we waited the extent range, the data_rsv_map must be empty
-	 * in the range, as written data range will be released from it.
-	 * And for prealloacted extent, it will also be released when
-	 * its metadata is written.
-	 * So this is completely used as cleanup.
-	 */
-	btrfs_qgroup_free_data(inode, alloc_start, alloc_end - alloc_start);
 	inode_unlock(inode);
 	/* Let go of our reservation. */
-	btrfs_free_reserved_data_space(inode, alloc_start,
-				       alloc_end - alloc_start);
+	if (ret != 0)
+		btrfs_free_reserved_data_space(inode, alloc_start,
+				       alloc_end - cur_offset);
 	return ret;
 }
 
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 70107f7..e59e7d6 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -495,10 +495,9 @@ again:
 	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
 					      prealloc, prealloc, &alloc_hint);
 	if (ret) {
-		btrfs_delalloc_release_space(inode, 0, prealloc);
+		btrfs_delalloc_release_metadata(inode, prealloc);
 		goto out_put;
 	}
-	btrfs_free_reserved_data_space(inode, 0, prealloc);
 
 	ret = btrfs_write_out_ino_cache(root, trans, path, inode);
 out_put:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4421954..e0cee59 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -564,6 +564,8 @@ cont:
 						     PAGE_SET_WRITEBACK |
 						     page_error_op |
 						     PAGE_END_WRITEBACK);
+			btrfs_free_reserved_data_space_noquota(inode, start,
+						end - start + 1);
 			goto free_pages_out;
 		}
 	}
@@ -739,7 +741,7 @@ retry:
 		lock_extent(io_tree, async_extent->start,
 			    async_extent->start + async_extent->ram_size - 1);
 
-		ret = btrfs_reserve_extent(root,
+		ret = btrfs_reserve_extent(root, async_extent->ram_size,
 					   async_extent->compressed_size,
 					   async_extent->compressed_size,
 					   0, alloc_hint, &ins, 1, 1);
@@ -966,7 +968,8 @@ static noinline int cow_file_range(struct inode *inode,
 				     EXTENT_DEFRAG, PAGE_UNLOCK |
 				     PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK |
 				     PAGE_END_WRITEBACK);
-
+			btrfs_free_reserved_data_space_noquota(inode, start,
+						end - start + 1);
 			*nr_written = *nr_written +
 			     (end - start + PAGE_SIZE) / PAGE_SIZE;
 			*page_started = 1;
@@ -986,7 +989,7 @@ static noinline int cow_file_range(struct inode *inode,
 		unsigned long op;
 
 		cur_alloc_size = disk_num_bytes;
-		ret = btrfs_reserve_extent(root, cur_alloc_size,
+		ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size,
 					   root->sectorsize, 0, alloc_hint,
 					   &ins, 1, 1);
 		if (ret < 0)
@@ -1485,8 +1488,10 @@ out_check:
 		extent_clear_unlock_delalloc(inode, cur_offset,
 					     cur_offset + num_bytes - 1,
 					     locked_page, EXTENT_LOCKED |
-					     EXTENT_DELALLOC, PAGE_UNLOCK |
-					     PAGE_SET_PRIVATE2);
+					     EXTENT_DELALLOC |
+					     EXTENT_CLEAR_DATA_RESV,
+					     PAGE_UNLOCK | PAGE_SET_PRIVATE2);
+
 		if (!nolock && nocow)
 			btrfs_end_write_no_snapshoting(root);
 		cur_offset = extent_end;
@@ -1803,7 +1808,9 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 			return;
 
 		if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID
-		    && do_list && !(state->state & EXTENT_NORESERVE))
+		    && do_list && !(state->state & EXTENT_NORESERVE)
+		    && (*bits & (EXTENT_DO_ACCOUNTING |
+		    EXTENT_CLEAR_DATA_RESV)))
 			btrfs_free_reserved_data_space_noquota(inode,
 					state->start, len);
 
@@ -7214,7 +7221,7 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
 	int ret;
 
 	alloc_hint = get_extent_allocation_hint(inode, start, len);
-	ret = btrfs_reserve_extent(root, len, root->sectorsize, 0,
+	ret = btrfs_reserve_extent(root, len, len, root->sectorsize, 0,
 				   alloc_hint, &ins, 1, 1);
 	if (ret)
 		return ERR_PTR(ret);
@@ -7714,6 +7721,13 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 				ret = PTR_ERR(em2);
 				goto unlock_err;
 			}
+			/*
+			 * For inode marked NODATACOW or extent marked PREALLOC,
+			 * use the existing or preallocated extent, so does not
+			 * need to adjust btrfs_space_info's bytes_may_use.
+			 */
+			btrfs_free_reserved_data_space_noquota(inode,
+					start, len);
 			goto unlock;
 		}
 	}
@@ -7748,7 +7762,6 @@ unlock:
 			i_size_write(inode, start + len);
 
 		adjust_dio_outstanding_extents(inode, dio_data, len);
-		btrfs_free_reserved_data_space(inode, start, len);
 		WARN_ON(dio_data->reserve < len);
 		dio_data->reserve -= len;
 		dio_data->unsubmitted_oe_range_end = start + len;
@@ -10269,6 +10282,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 	u64 last_alloc = (u64)-1;
 	int ret = 0;
 	bool own_trans = true;
+	u64 end = start + num_bytes - 1;
 
 	if (trans)
 		own_trans = false;
@@ -10290,8 +10304,8 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 		 * sized chunks.
 		 */
 		cur_bytes = min(cur_bytes, last_alloc);
-		ret = btrfs_reserve_extent(root, cur_bytes, min_size, 0,
-					   *alloc_hint, &ins, 1, 0);
+		ret = btrfs_reserve_extent(root, cur_bytes, cur_bytes,
+				min_size, 0, *alloc_hint, &ins, 1, 0);
 		if (ret) {
 			if (own_trans)
 				btrfs_end_transaction(trans, root);
@@ -10377,6 +10391,9 @@ next:
 		if (own_trans)
 			btrfs_end_transaction(trans, root);
 	}
+	if (cur_offset < end)
+		btrfs_free_reserved_data_space(inode, cur_offset,
+			end - cur_offset + 1);
 	return ret;
 }
 
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index a0de885..f39c4db 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3032,6 +3032,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	int ret = 0;
 	u64 prealloc_start = cluster->start - offset;
 	u64 prealloc_end = cluster->end - offset;
+	u64 cur_offset;
 
 	BUG_ON(cluster->start != cluster->boundary[0]);
 	inode_lock(inode);
@@ -3041,6 +3042,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	if (ret)
 		goto out;
 
+	cur_offset = prealloc_start;
 	while (nr < cluster->nr) {
 		start = cluster->boundary[nr] - offset;
 		if (nr + 1 < cluster->nr)
@@ -3050,16 +3052,21 @@ int prealloc_file_extent_cluster(struct inode *inode,
 
 		lock_extent(&BTRFS_I(inode)->io_tree, start, end);
 		num_bytes = end + 1 - start;
+		if (cur_offset < start)
+			btrfs_free_reserved_data_space(inode, cur_offset,
+					start - cur_offset);
 		ret = btrfs_prealloc_file_range(inode, 0, start,
 						num_bytes, num_bytes,
 						end + 1, &alloc_hint);
+		cur_offset = end + 1;
 		unlock_extent(&BTRFS_I(inode)->io_tree, start, end);
 		if (ret)
 			break;
 		nr++;
 	}
-	btrfs_free_reserved_data_space(inode, prealloc_start,
-				       prealloc_end + 1 - prealloc_start);
+	if (cur_offset < prealloc_end)
+		btrfs_free_reserved_data_space(inode, cur_offset,
+				       prealloc_end + 1 - cur_offset);
 out:
 	inode_unlock(inode);
 	return ret;
-- 
2.9.0




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 4/4] btrfs: should block unused block groups deletion work when allocating data space
  2016-07-25  7:51 [PATCH v2 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
                   ` (2 preceding siblings ...)
  2016-07-25  7:51 ` [PATCH v2 3/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
@ 2016-07-25  7:51 ` Wang Xiaoguang
  2016-07-25 13:32   ` Josef Bacik
  3 siblings, 1 reply; 13+ messages in thread
From: Wang Xiaoguang @ 2016-07-25  7:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, jbacik, holger

cleaner_kthread() may run at any time, in which it'll call btrfs_delete_unused_bgs()
to delete unused block groups. Because this work is asynchronous, it may also result
in false ENOSPC error. Please see below race window:

               CPU1                           |             CPU2
                                              |
|-> btrfs_alloc_data_chunk_ondemand()         |-> cleaner_kthread()
    |-> do_chunk_alloc()                      |   |
    |   assume it returns ENOSPC, which means |   |
    |   btrfs_space_info is full and have free|   |
    |   space to satisfy data request.        |   |
    |                                         |   |- > btrfs_delete_unused_bgs()
    |                                         |   |    it will decrease btrfs_space_info
    |                                         |   |    total_bytes and make
    |                                         |   |    btrfs_space_info is not full.
    |                                         |   |
In this case, we may get ENOSPC error, but btrfs_space_info is not full.

To fix this issue, in btrfs_alloc_data_chunk_ondemand(), if we need to call
do_chunk_alloc() to allocating new chunk, we should block btrfs_delete_unused_bgs().
So here we introduce a new struct rw_semaphore bg_delete_sem to do this job.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  1 +
 fs/btrfs/disk-io.c     |  1 +
 fs/btrfs/extent-tree.c | 40 ++++++++++++++++++++++++++++++++++------
 3 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7eb2913..bf0751d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -800,6 +800,7 @@ struct btrfs_fs_info {
 	struct mutex cleaner_mutex;
 	struct mutex chunk_mutex;
 	struct mutex volume_mutex;
+	struct rw_semaphore bg_delete_sem;
 
 	/*
 	 * this is taken to make sure we don't set block groups ro after
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60ce119..65a1465 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2676,6 +2676,7 @@ int open_ctree(struct super_block *sb,
 	mutex_init(&fs_info->ordered_operations_mutex);
 	mutex_init(&fs_info->tree_log_mutex);
 	mutex_init(&fs_info->chunk_mutex);
+	init_rwsem(&fs_info->bg_delete_sem);
 	mutex_init(&fs_info->transaction_kthread_mutex);
 	mutex_init(&fs_info->cleaner_mutex);
 	mutex_init(&fs_info->volume_mutex);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index df8d756..d1f8638 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4111,6 +4111,7 @@ int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
 	int ret = 0;
 	int need_commit = 2;
 	int have_pinned_space;
+	int have_bg_delete_sem = 0;
 
 	/* make sure bytes are sectorsize aligned */
 	bytes = ALIGN(bytes, root->sectorsize);
@@ -4121,8 +4122,11 @@ int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
 	}
 
 	data_sinfo = fs_info->data_sinfo;
-	if (!data_sinfo)
+	if (!data_sinfo) {
+		down_read(&root->fs_info->bg_delete_sem);
+		have_bg_delete_sem = 1;
 		goto alloc;
+	}
 
 again:
 	/* make sure we have enough space to handle the data first */
@@ -4134,10 +4138,21 @@ again:
 	if (used + bytes > data_sinfo->total_bytes) {
 		struct btrfs_trans_handle *trans;
 
+		spin_unlock(&data_sinfo->lock);
+		/*
+		 * We may need to allocate new chunk, so we should block
+		 * btrfs_delete_unused_bgs()
+		 */
+		if (have_bg_delete_sem == 0) {
+			down_read(&root->fs_info->bg_delete_sem);
+			have_bg_delete_sem = 1;
+		}
+
 		/*
 		 * if we don't have enough free bytes in this space then we need
 		 * to alloc a new chunk.
 		 */
+		spin_lock(&data_sinfo->lock);
 		if (!data_sinfo->full) {
 			u64 alloc_target;
 
@@ -4156,17 +4171,20 @@ alloc:
 			 * the fs.
 			 */
 			trans = btrfs_join_transaction(root);
-			if (IS_ERR(trans))
+			if (IS_ERR(trans)) {
+				up_read(&root->fs_info->bg_delete_sem);
 				return PTR_ERR(trans);
+			}
 
 			ret = do_chunk_alloc(trans, root->fs_info->extent_root,
 					     alloc_target,
 					     CHUNK_ALLOC_NO_FORCE);
 			btrfs_end_transaction(trans, root);
 			if (ret < 0) {
-				if (ret != -ENOSPC)
+				if (ret != -ENOSPC) {
+					up_read(&root->fs_info->bg_delete_sem);
 					return ret;
-				else {
+				} else {
 					have_pinned_space = 1;
 					goto commit_trans;
 				}
@@ -4200,15 +4218,19 @@ commit_trans:
 			}
 
 			trans = btrfs_join_transaction(root);
-			if (IS_ERR(trans))
+			if (IS_ERR(trans)) {
+				up_read(&root->fs_info->bg_delete_sem);
 				return PTR_ERR(trans);
+			}
 			if (have_pinned_space >= 0 ||
 			    test_bit(BTRFS_TRANS_HAVE_FREE_BGS,
 				     &trans->transaction->flags) ||
 			    need_commit > 0) {
 				ret = btrfs_commit_transaction(trans, root);
-				if (ret)
+				if (ret) {
+					up_read(&root->fs_info->bg_delete_sem);
 					return ret;
+				}
 				/*
 				 * The cleaner kthread might still be doing iput
 				 * operations. Wait for it to finish so that
@@ -4225,6 +4247,7 @@ commit_trans:
 		trace_btrfs_space_reservation(root->fs_info,
 					      "space_info:enospc",
 					      data_sinfo->flags, bytes, 1);
+		up_read(&root->fs_info->bg_delete_sem);
 		return -ENOSPC;
 	}
 	data_sinfo->bytes_may_use += bytes;
@@ -4232,6 +4255,9 @@ commit_trans:
 				      data_sinfo->flags, bytes, 1);
 	spin_unlock(&data_sinfo->lock);
 
+	if (have_bg_delete_sem == 1)
+		up_read(&root->fs_info->bg_delete_sem);
+
 	return ret;
 }
 
@@ -10594,6 +10620,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		spin_unlock(&fs_info->unused_bgs_lock);
 
 		mutex_lock(&fs_info->delete_unused_bgs_mutex);
+		down_write(&root->fs_info->bg_delete_sem);
 
 		/* Don't want to race with allocators so take the groups_sem */
 		down_write(&space_info->groups_sem);
@@ -10721,6 +10748,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 end_trans:
 		btrfs_end_transaction(trans, root);
 next:
+		up_write(&root->fs_info->bg_delete_sem);
 		mutex_unlock(&fs_info->delete_unused_bgs_mutex);
 		btrfs_put_block_group(block_group);
 		spin_lock(&fs_info->unused_bgs_lock);
-- 
2.9.0




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/4] btrfs: should block unused block groups deletion work when allocating data space
  2016-07-25  7:51 ` [PATCH v2 4/4] btrfs: should block unused block groups deletion work when allocating data space Wang Xiaoguang
@ 2016-07-25 13:32   ` Josef Bacik
  2016-07-26 12:04     ` [PATCH v3] " Wang Xiaoguang
  0 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2016-07-25 13:32 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger

On 07/25/2016 03:51 AM, Wang Xiaoguang wrote:
> cleaner_kthread() may run at any time, in which it'll call btrfs_delete_unused_bgs()
> to delete unused block groups. Because this work is asynchronous, it may also result
> in false ENOSPC error. Please see below race window:
>
>                CPU1                           |             CPU2
>                                               |
> |-> btrfs_alloc_data_chunk_ondemand()         |-> cleaner_kthread()
>     |-> do_chunk_alloc()                      |   |
>     |   assume it returns ENOSPC, which means |   |
>     |   btrfs_space_info is full and have free|   |
>     |   space to satisfy data request.        |   |
>     |                                         |   |- > btrfs_delete_unused_bgs()
>     |                                         |   |    it will decrease btrfs_space_info
>     |                                         |   |    total_bytes and make
>     |                                         |   |    btrfs_space_info is not full.
>     |                                         |   |
> In this case, we may get ENOSPC error, but btrfs_space_info is not full.
>
> To fix this issue, in btrfs_alloc_data_chunk_ondemand(), if we need to call
> do_chunk_alloc() to allocating new chunk, we should block btrfs_delete_unused_bgs().
> So here we introduce a new struct rw_semaphore bg_delete_sem to do this job.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>  fs/btrfs/ctree.h       |  1 +
>  fs/btrfs/disk-io.c     |  1 +
>  fs/btrfs/extent-tree.c | 40 ++++++++++++++++++++++++++++++++++------
>  3 files changed, 36 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 7eb2913..bf0751d 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -800,6 +800,7 @@ struct btrfs_fs_info {
>  	struct mutex cleaner_mutex;
>  	struct mutex chunk_mutex;
>  	struct mutex volume_mutex;
> +	struct rw_semaphore bg_delete_sem;
>
>  	/*
>  	 * this is taken to make sure we don't set block groups ro after
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 60ce119..65a1465 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2676,6 +2676,7 @@ int open_ctree(struct super_block *sb,
>  	mutex_init(&fs_info->ordered_operations_mutex);
>  	mutex_init(&fs_info->tree_log_mutex);
>  	mutex_init(&fs_info->chunk_mutex);
> +	init_rwsem(&fs_info->bg_delete_sem);
>  	mutex_init(&fs_info->transaction_kthread_mutex);
>  	mutex_init(&fs_info->cleaner_mutex);
>  	mutex_init(&fs_info->volume_mutex);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index df8d756..d1f8638 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4111,6 +4111,7 @@ int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
>  	int ret = 0;
>  	int need_commit = 2;
>  	int have_pinned_space;
> +	int have_bg_delete_sem = 0;
>
>  	/* make sure bytes are sectorsize aligned */
>  	bytes = ALIGN(bytes, root->sectorsize);
> @@ -4121,8 +4122,11 @@ int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
>  	}
>
>  	data_sinfo = fs_info->data_sinfo;
> -	if (!data_sinfo)
> +	if (!data_sinfo) {
> +		down_read(&root->fs_info->bg_delete_sem);
> +		have_bg_delete_sem = 1;
>  		goto alloc;
> +	}
>
>  again:
>  	/* make sure we have enough space to handle the data first */
> @@ -4134,10 +4138,21 @@ again:
>  	if (used + bytes > data_sinfo->total_bytes) {
>  		struct btrfs_trans_handle *trans;
>
> +		spin_unlock(&data_sinfo->lock);
> +		/*
> +		 * We may need to allocate new chunk, so we should block
> +		 * btrfs_delete_unused_bgs()
> +		 */
> +		if (have_bg_delete_sem == 0) {
> +			down_read(&root->fs_info->bg_delete_sem);
> +			have_bg_delete_sem = 1;
> +		}
> +

We could race with somebody else doing an allocation here, so instead do 
something like

if (!have_bg_delete_sem) {
	spin_unlock(&data_sinfo->lock);
	down_read(&root->fs_info->bg_delete_sem);
	have_bg_delete_sem = 1;
	spin_lock((&data_sinfo->lock);
	if (used + bytes <= data_sinfo->total_bytes)
		goto done; // add a done label at bytes_may_use += bytes
}

Thanks,

Josef

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/4] btrfs: update btrfs_space_info's bytes_may_use timely
  2016-07-25  7:51 ` [PATCH v2 3/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
@ 2016-07-25 13:33   ` Josef Bacik
  2016-07-25 18:10   ` Josef Bacik
  1 sibling, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2016-07-25 13:33 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger

On 07/25/2016 03:51 AM, Wang Xiaoguang wrote:
> This patch can fix some false ENOSPC errors, below test script can
> reproduce one false ENOSPC error:
> 	#!/bin/bash
> 	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
> 	dev=$(losetup --show -f fs.img)
> 	mkfs.btrfs -f -M $dev
> 	mkdir /tmp/mntpoint
> 	mount $dev /tmp/mntpoint
> 	cd /tmp/mntpoint
> 	xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile
>
> Above script will fail for ENOSPC reason, but indeed fs still has free
> space to satisfy this request. Please see call graph:
> btrfs_fallocate()
> |-> btrfs_alloc_data_chunk_ondemand()
> |   bytes_may_use += 64M
> |-> btrfs_prealloc_file_range()
>     |-> btrfs_reserve_extent()
>         |-> btrfs_add_reserved_bytes()
>         |   alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
>         |   change bytes_may_use, and bytes_reserved += 64M. Now
>         |   bytes_may_use + bytes_reserved == 128M, which is greater
>         |   than btrfs_space_info's total_bytes, false enospc occurs.
>         |   Note, the bytes_may_use decrease operation will be done in
>         |   end of btrfs_fallocate(), which is too late.
>
> Here is another simple case for buffered write:
>                     CPU 1              |              CPU 2
>                                        |
> |-> cow_file_range()                   |-> __btrfs_buffered_write()
>     |-> btrfs_reserve_extent()         |   |
>     |                                  |   |
>     |                                  |   |
>     |    .....                         |   |-> btrfs_check_data_free_space()
>     |                                  |
>     |                                  |
>     |-> extent_clear_unlock_delalloc() |
>
> In CPU 1, btrfs_reserve_extent()->find_free_extent()->
> btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
> operation will be delayed to be done in extent_clear_unlock_delalloc().
> Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
> btrfs_check_data_free_space() tries to reserve 100MB data space.
> If
> 	100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
> 		data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
> 		data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
> btrfs_check_data_free_space() will try to allcate new data chunk or call
> btrfs_start_delalloc_roots(), or commit current transaction in order to
> reserve some free space, obviously a lot of work. But indeed it's not
> necessary as long as decreasing bytes_may_use timely, we still have
> free space, decreasing 128M from bytes_may_use.
>
> To fix this issue, this patch chooses to update bytes_may_use for both
> data and metadata in btrfs_add_reserved_bytes(). For compress path, real
> extent length may not be equal to file content length, so introduce a
> ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
> btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
> file content length. Then compress path can update bytes_may_use
> correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
> and RESERVE_FREE.
>
> As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
> run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
> PREALLOC, we also need to update bytes_may_use, but can not pass
> EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so
> here we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook()
> to update btrfs_space_info's bytes_may_use.
>
> Meanwhile __btrfs_prealloc_file_range() will call
> btrfs_free_reserved_data_space() internally for both sucessful and failed
> path, btrfs_prealloc_file_range()'s callers does not need to call
> btrfs_free_reserved_data_space() any more.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Yup this is good, thanks for fixing this problem

Signed-off-by: Josef Bacik <jbacik@fb.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/4] btrfs: update btrfs_space_info's bytes_may_use timely
  2016-07-25  7:51 ` [PATCH v2 3/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
  2016-07-25 13:33   ` Josef Bacik
@ 2016-07-25 18:10   ` Josef Bacik
  1 sibling, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2016-07-25 18:10 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger

On 07/25/2016 03:51 AM, Wang Xiaoguang wrote:
> This patch can fix some false ENOSPC errors, below test script can
> reproduce one false ENOSPC error:
> 	#!/bin/bash
> 	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
> 	dev=$(losetup --show -f fs.img)
> 	mkfs.btrfs -f -M $dev
> 	mkdir /tmp/mntpoint
> 	mount $dev /tmp/mntpoint
> 	cd /tmp/mntpoint
> 	xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile
>
> Above script will fail for ENOSPC reason, but indeed fs still has free
> space to satisfy this request. Please see call graph:
> btrfs_fallocate()
> |-> btrfs_alloc_data_chunk_ondemand()
> |   bytes_may_use += 64M
> |-> btrfs_prealloc_file_range()
>     |-> btrfs_reserve_extent()
>         |-> btrfs_add_reserved_bytes()
>         |   alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
>         |   change bytes_may_use, and bytes_reserved += 64M. Now
>         |   bytes_may_use + bytes_reserved == 128M, which is greater
>         |   than btrfs_space_info's total_bytes, false enospc occurs.
>         |   Note, the bytes_may_use decrease operation will be done in
>         |   end of btrfs_fallocate(), which is too late.
>
> Here is another simple case for buffered write:
>                     CPU 1              |              CPU 2
>                                        |
> |-> cow_file_range()                   |-> __btrfs_buffered_write()
>     |-> btrfs_reserve_extent()         |   |
>     |                                  |   |
>     |                                  |   |
>     |    .....                         |   |-> btrfs_check_data_free_space()
>     |                                  |
>     |                                  |
>     |-> extent_clear_unlock_delalloc() |
>
> In CPU 1, btrfs_reserve_extent()->find_free_extent()->
> btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
> operation will be delayed to be done in extent_clear_unlock_delalloc().
> Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
> btrfs_check_data_free_space() tries to reserve 100MB data space.
> If
> 	100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
> 		data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
> 		data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
> btrfs_check_data_free_space() will try to allcate new data chunk or call
> btrfs_start_delalloc_roots(), or commit current transaction in order to
> reserve some free space, obviously a lot of work. But indeed it's not
> necessary as long as decreasing bytes_may_use timely, we still have
> free space, decreasing 128M from bytes_may_use.
>
> To fix this issue, this patch chooses to update bytes_may_use for both
> data and metadata in btrfs_add_reserved_bytes(). For compress path, real
> extent length may not be equal to file content length, so introduce a
> ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
> btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
> file content length. Then compress path can update bytes_may_use
> correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
> and RESERVE_FREE.
>
> As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
> run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
> PREALLOC, we also need to update bytes_may_use, but can not pass
> EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so
> here we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook()
> to update btrfs_space_info's bytes_may_use.
>
> Meanwhile __btrfs_prealloc_file_range() will call
> btrfs_free_reserved_data_space() internally for both sucessful and failed
> path, btrfs_prealloc_file_range()'s callers does not need to call
> btrfs_free_reserved_data_space() any more.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---

Sigh sorry, baby kept me up most of the night, that should be

Reviewed-by: Josef Bacik <jbacik@fb.com>

Thanks,

Josef


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3] btrfs: should block unused block groups deletion work when allocating data space
  2016-07-25 13:32   ` Josef Bacik
@ 2016-07-26 12:04     ` Wang Xiaoguang
  2016-07-26 15:51       ` Josef Bacik
  0 siblings, 1 reply; 13+ messages in thread
From: Wang Xiaoguang @ 2016-07-26 12:04 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jbacik

cleaner_kthread() may run at any time, in which it'll call btrfs_delete_unused_bgs()
to delete unused block groups. Because this work is asynchronous, it may also result
in false ENOSPC error. Please see below race window:

               CPU1                           |             CPU2
                                              |
|-> btrfs_alloc_data_chunk_ondemand()         |-> cleaner_kthread()
    |-> do_chunk_alloc()                      |   |
    |   assume it returns ENOSPC, which means |   |
    |   btrfs_space_info is full and have free|   |
    |   space to satisfy data request.        |   |
    |                                         |   |- > btrfs_delete_unused_bgs()
    |                                         |   |    it will decrease btrfs_space_info
    |                                         |   |    total_bytes and make
    |                                         |   |    btrfs_space_info is not full.
    |                                         |   |
In this case, we may get ENOSPC error, but btrfs_space_info is not full.

To fix this issue, in btrfs_alloc_data_chunk_ondemand(), if we need to call
do_chunk_alloc() to allocating new chunk, we should block btrfs_delete_unused_bgs().
So here we introduce a new struct rw_semaphore bg_delete_sem to do this job.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
v3: improve one code logic suggested by Josef, thanks.
---
 fs/btrfs/ctree.h       |  1 +
 fs/btrfs/disk-io.c     |  1 +
 fs/btrfs/extent-tree.c | 44 ++++++++++++++++++++++++++++++++++++++------
 3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7eb2913..bf0751d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -800,6 +800,7 @@ struct btrfs_fs_info {
 	struct mutex cleaner_mutex;
 	struct mutex chunk_mutex;
 	struct mutex volume_mutex;
+	struct rw_semaphore bg_delete_sem;
 
 	/*
 	 * this is taken to make sure we don't set block groups ro after
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60ce119..f8609fd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2683,6 +2683,7 @@ int open_ctree(struct super_block *sb,
 	init_rwsem(&fs_info->commit_root_sem);
 	init_rwsem(&fs_info->cleanup_work_sem);
 	init_rwsem(&fs_info->subvol_sem);
+	init_rwsem(&fs_info->bg_delete_sem);
 	sema_init(&fs_info->uuid_tree_rescan_sem, 1);
 
 	btrfs_init_dev_replace_locks(fs_info);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index df8d756..50b440e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4111,6 +4111,7 @@ int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
 	int ret = 0;
 	int need_commit = 2;
 	int have_pinned_space;
+	int have_bg_delete_sem = 0;
 
 	/* make sure bytes are sectorsize aligned */
 	bytes = ALIGN(bytes, root->sectorsize);
@@ -4121,8 +4122,11 @@ int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
 	}
 
 	data_sinfo = fs_info->data_sinfo;
-	if (!data_sinfo)
+	if (!data_sinfo) {
+		down_read(&root->fs_info->bg_delete_sem);
+		have_bg_delete_sem = 1;
 		goto alloc;
+	}
 
 again:
 	/* make sure we have enough space to handle the data first */
@@ -4135,6 +4139,19 @@ again:
 		struct btrfs_trans_handle *trans;
 
 		/*
+		 * We may need to allocate new chunk, so we should block
+		 * btrfs_delete_unused_bgs()
+		 */
+		if (!have_bg_delete_sem) {
+			spin_unlock(&data_sinfo->lock);
+			down_read(&root->fs_info->bg_delete_sem);
+			have_bg_delete_sem = 1;
+			spin_lock(&data_sinfo->lock);
+			if (used + bytes <= data_sinfo->total_bytes)
+				goto done;
+		}
+
+		/*
 		 * if we don't have enough free bytes in this space then we need
 		 * to alloc a new chunk.
 		 */
@@ -4156,17 +4173,20 @@ alloc:
 			 * the fs.
 			 */
 			trans = btrfs_join_transaction(root);
-			if (IS_ERR(trans))
+			if (IS_ERR(trans)) {
+				up_read(&root->fs_info->bg_delete_sem);
 				return PTR_ERR(trans);
+			}
 
 			ret = do_chunk_alloc(trans, root->fs_info->extent_root,
 					     alloc_target,
 					     CHUNK_ALLOC_NO_FORCE);
 			btrfs_end_transaction(trans, root);
 			if (ret < 0) {
-				if (ret != -ENOSPC)
+				if (ret != -ENOSPC) {
+					up_read(&root->fs_info->bg_delete_sem);
 					return ret;
-				else {
+				} else {
 					have_pinned_space = 1;
 					goto commit_trans;
 				}
@@ -4200,15 +4220,19 @@ commit_trans:
 			}
 
 			trans = btrfs_join_transaction(root);
-			if (IS_ERR(trans))
+			if (IS_ERR(trans)) {
+				up_read(&root->fs_info->bg_delete_sem);
 				return PTR_ERR(trans);
+			}
 			if (have_pinned_space >= 0 ||
 			    test_bit(BTRFS_TRANS_HAVE_FREE_BGS,
 				     &trans->transaction->flags) ||
 			    need_commit > 0) {
 				ret = btrfs_commit_transaction(trans, root);
-				if (ret)
+				if (ret) {
+					up_read(&root->fs_info->bg_delete_sem);
 					return ret;
+				}
 				/*
 				 * The cleaner kthread might still be doing iput
 				 * operations. Wait for it to finish so that
@@ -4225,13 +4249,19 @@ commit_trans:
 		trace_btrfs_space_reservation(root->fs_info,
 					      "space_info:enospc",
 					      data_sinfo->flags, bytes, 1);
+		up_read(&root->fs_info->bg_delete_sem);
 		return -ENOSPC;
 	}
+
+done:
 	data_sinfo->bytes_may_use += bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
 				      data_sinfo->flags, bytes, 1);
 	spin_unlock(&data_sinfo->lock);
 
+	if (have_bg_delete_sem)
+		up_read(&root->fs_info->bg_delete_sem);
+
 	return ret;
 }
 
@@ -10594,6 +10624,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		spin_unlock(&fs_info->unused_bgs_lock);
 
 		mutex_lock(&fs_info->delete_unused_bgs_mutex);
+		down_write(&root->fs_info->bg_delete_sem);
 
 		/* Don't want to race with allocators so take the groups_sem */
 		down_write(&space_info->groups_sem);
@@ -10721,6 +10752,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 end_trans:
 		btrfs_end_transaction(trans, root);
 next:
+		up_write(&root->fs_info->bg_delete_sem);
 		mutex_unlock(&fs_info->delete_unused_bgs_mutex);
 		btrfs_put_block_group(block_group);
 		spin_lock(&fs_info->unused_bgs_lock);
-- 
2.9.0




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] btrfs: should block unused block groups deletion work when allocating data space
  2016-07-26 12:04     ` [PATCH v3] " Wang Xiaoguang
@ 2016-07-26 15:51       ` Josef Bacik
  2016-07-26 16:25         ` David Sterba
  2016-07-27  1:26         ` Wang Xiaoguang
  0 siblings, 2 replies; 13+ messages in thread
From: Josef Bacik @ 2016-07-26 15:51 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs

On 07/26/2016 08:04 AM, Wang Xiaoguang wrote:
> cleaner_kthread() may run at any time, in which it'll call btrfs_delete_unused_bgs()
> to delete unused block groups. Because this work is asynchronous, it may also result
> in false ENOSPC error. Please see below race window:
>
>                CPU1                           |             CPU2
>                                               |
> |-> btrfs_alloc_data_chunk_ondemand()         |-> cleaner_kthread()
>     |-> do_chunk_alloc()                      |   |
>     |   assume it returns ENOSPC, which means |   |
>     |   btrfs_space_info is full and have free|   |
>     |   space to satisfy data request.        |   |
>     |                                         |   |- > btrfs_delete_unused_bgs()
>     |                                         |   |    it will decrease btrfs_space_info
>     |                                         |   |    total_bytes and make
>     |                                         |   |    btrfs_space_info is not full.
>     |                                         |   |
> In this case, we may get ENOSPC error, but btrfs_space_info is not full.
>
> To fix this issue, in btrfs_alloc_data_chunk_ondemand(), if we need to call
> do_chunk_alloc() to allocating new chunk, we should block btrfs_delete_unused_bgs().
> So here we introduce a new struct rw_semaphore bg_delete_sem to do this job.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
> v3: improve one code logic suggested by Josef, thanks.
> ---
>  fs/btrfs/ctree.h       |  1 +
>  fs/btrfs/disk-io.c     |  1 +
>  fs/btrfs/extent-tree.c | 44 ++++++++++++++++++++++++++++++++++++++------
>  3 files changed, 40 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 7eb2913..bf0751d 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -800,6 +800,7 @@ struct btrfs_fs_info {
>  	struct mutex cleaner_mutex;
>  	struct mutex chunk_mutex;
>  	struct mutex volume_mutex;
> +	struct rw_semaphore bg_delete_sem;
>
>  	/*
>  	 * this is taken to make sure we don't set block groups ro after
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 60ce119..f8609fd 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2683,6 +2683,7 @@ int open_ctree(struct super_block *sb,
>  	init_rwsem(&fs_info->commit_root_sem);
>  	init_rwsem(&fs_info->cleanup_work_sem);
>  	init_rwsem(&fs_info->subvol_sem);
> +	init_rwsem(&fs_info->bg_delete_sem);
>  	sema_init(&fs_info->uuid_tree_rescan_sem, 1);
>
>  	btrfs_init_dev_replace_locks(fs_info);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index df8d756..50b440e 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4111,6 +4111,7 @@ int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
>  	int ret = 0;
>  	int need_commit = 2;
>  	int have_pinned_space;
> +	int have_bg_delete_sem = 0;
>
>  	/* make sure bytes are sectorsize aligned */
>  	bytes = ALIGN(bytes, root->sectorsize);
> @@ -4121,8 +4122,11 @@ int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
>  	}
>
>  	data_sinfo = fs_info->data_sinfo;
> -	if (!data_sinfo)
> +	if (!data_sinfo) {
> +		down_read(&root->fs_info->bg_delete_sem);
> +		have_bg_delete_sem = 1;
>  		goto alloc;
> +	}
>
>  again:
>  	/* make sure we have enough space to handle the data first */
> @@ -4135,6 +4139,19 @@ again:
>  		struct btrfs_trans_handle *trans;
>
>  		/*
> +		 * We may need to allocate new chunk, so we should block
> +		 * btrfs_delete_unused_bgs()
> +		 */
> +		if (!have_bg_delete_sem) {
> +			spin_unlock(&data_sinfo->lock);
> +			down_read(&root->fs_info->bg_delete_sem);
> +			have_bg_delete_sem = 1;
> +			spin_lock(&data_sinfo->lock);
> +			if (used + bytes <= data_sinfo->total_bytes)

Doh sorry I forgot to mention you need to re-calculate used here.  Also I 
noticed we already have a delete_unused_bgs_mutex, can we drop that and replace 
it with bg_delete_sem as well?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] btrfs: should block unused block groups deletion work when allocating data space
  2016-07-26 15:51       ` Josef Bacik
@ 2016-07-26 16:25         ` David Sterba
  2016-07-27  1:26         ` Wang Xiaoguang
  1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2016-07-26 16:25 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Wang Xiaoguang, linux-btrfs

On Tue, Jul 26, 2016 at 11:51:39AM -0400, Josef Bacik wrote:
> Also I 
> noticed we already have a delete_unused_bgs_mutex, can we drop that and replace 
> it with bg_delete_sem as well?  Thanks,

Oh, yes please, making sense of all the mutexes and semaphores gets
harder with every new one.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] btrfs: should block unused block groups deletion work when allocating data space
  2016-07-26 15:51       ` Josef Bacik
  2016-07-26 16:25         ` David Sterba
@ 2016-07-27  1:26         ` Wang Xiaoguang
  2016-07-27  5:50           ` [PATCH v4] " Wang Xiaoguang
  1 sibling, 1 reply; 13+ messages in thread
From: Wang Xiaoguang @ 2016-07-27  1:26 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs

hello,

On 07/26/2016 11:51 PM, Josef Bacik wrote:
> On 07/26/2016 08:04 AM, Wang Xiaoguang wrote:
>> cleaner_kthread() may run at any time, in which it'll call 
>> btrfs_delete_unused_bgs()
>> to delete unused block groups. Because this work is asynchronous, it 
>> may also result
>> in false ENOSPC error. Please see below race window:
>>
>>                CPU1                           |             CPU2
>>                                               |
>> |-> btrfs_alloc_data_chunk_ondemand()         |-> cleaner_kthread()
>>     |-> do_chunk_alloc()                      |   |
>>     |   assume it returns ENOSPC, which means |   |
>>     |   btrfs_space_info is full and have free|   |
>>     |   space to satisfy data request.        |   |
>>     |                                         |   |- > 
>> btrfs_delete_unused_bgs()
>>     |                                         |   |    it will 
>> decrease btrfs_space_info
>>     |                                         |   | total_bytes and make
>>     |                                         |   | btrfs_space_info 
>> is not full.
>>     |                                         |   |
>> In this case, we may get ENOSPC error, but btrfs_space_info is not full.
>>
>> To fix this issue, in btrfs_alloc_data_chunk_ondemand(), if we need 
>> to call
>> do_chunk_alloc() to allocating new chunk, we should block 
>> btrfs_delete_unused_bgs().
>> So here we introduce a new struct rw_semaphore bg_delete_sem to do 
>> this job.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>> v3: improve one code logic suggested by Josef, thanks.
>> ---
>>  fs/btrfs/ctree.h       |  1 +
>>  fs/btrfs/disk-io.c     |  1 +
>>  fs/btrfs/extent-tree.c | 44 
>> ++++++++++++++++++++++++++++++++++++++------
>>  3 files changed, 40 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 7eb2913..bf0751d 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -800,6 +800,7 @@ struct btrfs_fs_info {
>>      struct mutex cleaner_mutex;
>>      struct mutex chunk_mutex;
>>      struct mutex volume_mutex;
>> +    struct rw_semaphore bg_delete_sem;
>>
>>      /*
>>       * this is taken to make sure we don't set block groups ro after
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 60ce119..f8609fd 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -2683,6 +2683,7 @@ int open_ctree(struct super_block *sb,
>>      init_rwsem(&fs_info->commit_root_sem);
>>      init_rwsem(&fs_info->cleanup_work_sem);
>>      init_rwsem(&fs_info->subvol_sem);
>> +    init_rwsem(&fs_info->bg_delete_sem);
>>      sema_init(&fs_info->uuid_tree_rescan_sem, 1);
>>
>>      btrfs_init_dev_replace_locks(fs_info);
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index df8d756..50b440e 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -4111,6 +4111,7 @@ int btrfs_alloc_data_chunk_ondemand(struct 
>> inode *inode, u64 bytes)
>>      int ret = 0;
>>      int need_commit = 2;
>>      int have_pinned_space;
>> +    int have_bg_delete_sem = 0;
>>
>>      /* make sure bytes are sectorsize aligned */
>>      bytes = ALIGN(bytes, root->sectorsize);
>> @@ -4121,8 +4122,11 @@ int btrfs_alloc_data_chunk_ondemand(struct 
>> inode *inode, u64 bytes)
>>      }
>>
>>      data_sinfo = fs_info->data_sinfo;
>> -    if (!data_sinfo)
>> +    if (!data_sinfo) {
>> +        down_read(&root->fs_info->bg_delete_sem);
>> +        have_bg_delete_sem = 1;
>>          goto alloc;
>> +    }
>>
>>  again:
>>      /* make sure we have enough space to handle the data first */
>> @@ -4135,6 +4139,19 @@ again:
>>          struct btrfs_trans_handle *trans;
>>
>>          /*
>> +         * We may need to allocate new chunk, so we should block
>> +         * btrfs_delete_unused_bgs()
>> +         */
>> +        if (!have_bg_delete_sem) {
>> +            spin_unlock(&data_sinfo->lock);
>> +            down_read(&root->fs_info->bg_delete_sem);
>> +            have_bg_delete_sem = 1;
>> +            spin_lock(&data_sinfo->lock);
>> +            if (used + bytes <= data_sinfo->total_bytes)
>
> Doh sorry I forgot to mention you need to re-calculate used here. Also 
> I noticed we already have a delete_unused_bgs_mutex, can we drop that 
> and replace it with bg_delete_sem as well?  Thanks,
Sorry, I also forgot to re-calculate used...
OK, I'll try to replace delete_unused_bgs_mutex with bg_delete_sem, 
please wait a while :)

Regards,
Xiaoguang Wang

>
> Josef
>
>




^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4] btrfs: should block unused block groups deletion work when allocating data space
  2016-07-27  1:26         ` Wang Xiaoguang
@ 2016-07-27  5:50           ` Wang Xiaoguang
  0 siblings, 0 replies; 13+ messages in thread
From: Wang Xiaoguang @ 2016-07-27  5:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jbacik, dsterba

cleaner_kthread() may run at any time, in which it'll call btrfs_delete_unused_bgs()
to delete unused block groups. Because this work is asynchronous, it may also result
in false ENOSPC error. Please see below race window:

               CPU1                           |             CPU2
                                              |
|-> btrfs_alloc_data_chunk_ondemand()         |-> cleaner_kthread()
    |-> do_chunk_alloc()                      |   |
    |   assume it returns ENOSPC, which means |   |
    |   btrfs_space_info is full and have free|   |
    |   space to satisfy data request.        |   |
    |                                         |   |- > btrfs_delete_unused_bgs()
    |                                         |   |    it will decrease btrfs_space_info
    |                                         |   |    total_bytes and make
    |                                         |   |    btrfs_space_info is not full.
    |                                         |   |
In this case, we may get ENOSPC error, but btrfs_space_info is not full.

To fix this issue, in btrfs_alloc_data_chunk_ondemand(), if we need to call
do_chunk_alloc() to allocating new chunk, we should block btrfs_delete_unused_bgs().
Here we introduce a new struct rw_semaphore bg_delete_sem to do this job.

Indeed there is already a "struct mutex delete_unused_bgs_mutex", but it's mutex,
we can not use it for this purpose. Of course, we can re-define it to be struct
rw_semaphore, then use it in btrfs_alloc_data_chunk_ondemand(). Either method will
work.

But given that delete_unused_bgs_mutex's name length is longer than bg_delete_sem,
I choose the first method, to create a new struct rw_semaphore bg_delete_sem and
delete delete_unused_bgs_mutex :)

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
v3: improve one code logic suggested by Josef, thanks.
v4: replace delete_unused_bgs_mutex with our new bg_delete_sem, also
    suggested by Josef.
---
 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/disk-io.c     | 13 ++++++-------
 fs/btrfs/extent-tree.c | 42 ++++++++++++++++++++++++++++++++++--------
 fs/btrfs/volumes.c     | 42 +++++++++++++++++++++---------------------
 4 files changed, 62 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7eb2913..90041a2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -800,6 +800,7 @@ struct btrfs_fs_info {
 	struct mutex cleaner_mutex;
 	struct mutex chunk_mutex;
 	struct mutex volume_mutex;
+	struct rw_semaphore bg_delete_sem;
 
 	/*
 	 * this is taken to make sure we don't set block groups ro after
@@ -1079,7 +1080,6 @@ struct btrfs_fs_info {
 	spinlock_t unused_bgs_lock;
 	struct list_head unused_bgs;
 	struct mutex unused_bg_unpin_mutex;
-	struct mutex delete_unused_bgs_mutex;
 
 	/* For btrfs to record security options */
 	struct security_mnt_opts security_opts;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60ce119..86cad9a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1839,12 +1839,11 @@ static int cleaner_kthread(void *arg)
 		btrfs_run_defrag_inodes(root->fs_info);
 
 		/*
-		 * Acquires fs_info->delete_unused_bgs_mutex to avoid racing
-		 * with relocation (btrfs_relocate_chunk) and relocation
-		 * acquires fs_info->cleaner_mutex (btrfs_relocate_block_group)
-		 * after acquiring fs_info->delete_unused_bgs_mutex. So we
-		 * can't hold, nor need to, fs_info->cleaner_mutex when deleting
-		 * unused block groups.
+		 * Acquires fs_info->bg_delete_sem to avoid racing with
+		 * relocation (btrfs_relocate_chunk) and relocation acquires
+		 * fs_info->cleaner_mutex (btrfs_relocate_block_group) after
+		 * acquiring fs_info->bg_delete_sem. So we can't hold, nor need
+		 * to, fs_info->cleaner_mutex when deleting unused block groups.
 		 */
 		btrfs_delete_unused_bgs(root->fs_info);
 sleep:
@@ -2595,7 +2594,6 @@ int open_ctree(struct super_block *sb,
 	spin_lock_init(&fs_info->unused_bgs_lock);
 	rwlock_init(&fs_info->tree_mod_log_lock);
 	mutex_init(&fs_info->unused_bg_unpin_mutex);
-	mutex_init(&fs_info->delete_unused_bgs_mutex);
 	mutex_init(&fs_info->reloc_mutex);
 	mutex_init(&fs_info->delalloc_root_mutex);
 	mutex_init(&fs_info->cleaner_delayed_iput_mutex);
@@ -2683,6 +2681,7 @@ int open_ctree(struct super_block *sb,
 	init_rwsem(&fs_info->commit_root_sem);
 	init_rwsem(&fs_info->cleanup_work_sem);
 	init_rwsem(&fs_info->subvol_sem);
+	init_rwsem(&fs_info->bg_delete_sem);
 	sema_init(&fs_info->uuid_tree_rescan_sem, 1);
 
 	btrfs_init_dev_replace_locks(fs_info);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index df8d756..19d4bb1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4111,6 +4111,7 @@ int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
 	int ret = 0;
 	int need_commit = 2;
 	int have_pinned_space;
+	int have_bg_delete_sem = 0;
 
 	/* make sure bytes are sectorsize aligned */
 	bytes = ALIGN(bytes, root->sectorsize);
@@ -4121,8 +4122,11 @@ int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
 	}
 
 	data_sinfo = fs_info->data_sinfo;
-	if (!data_sinfo)
+	if (!data_sinfo) {
+		down_read(&root->fs_info->bg_delete_sem);
+		have_bg_delete_sem = 1;
 		goto alloc;
+	}
 
 again:
 	/* make sure we have enough space to handle the data first */
@@ -4135,6 +4139,17 @@ again:
 		struct btrfs_trans_handle *trans;
 
 		/*
+		 * We may need to allocate new chunk, so we should block
+		 * btrfs_delete_unused_bgs()
+		 */
+		if (!have_bg_delete_sem) {
+			spin_unlock(&data_sinfo->lock);
+			down_read(&root->fs_info->bg_delete_sem);
+			have_bg_delete_sem = 1;
+			goto again;
+		}
+
+		/*
 		 * if we don't have enough free bytes in this space then we need
 		 * to alloc a new chunk.
 		 */
@@ -4156,17 +4171,20 @@ alloc:
 			 * the fs.
 			 */
 			trans = btrfs_join_transaction(root);
-			if (IS_ERR(trans))
+			if (IS_ERR(trans)) {
+				up_read(&root->fs_info->bg_delete_sem);
 				return PTR_ERR(trans);
+			}
 
 			ret = do_chunk_alloc(trans, root->fs_info->extent_root,
 					     alloc_target,
 					     CHUNK_ALLOC_NO_FORCE);
 			btrfs_end_transaction(trans, root);
 			if (ret < 0) {
-				if (ret != -ENOSPC)
+				if (ret != -ENOSPC) {
+					up_read(&root->fs_info->bg_delete_sem);
 					return ret;
-				else {
+				} else {
 					have_pinned_space = 1;
 					goto commit_trans;
 				}
@@ -4200,15 +4218,19 @@ commit_trans:
 			}
 
 			trans = btrfs_join_transaction(root);
-			if (IS_ERR(trans))
+			if (IS_ERR(trans)) {
+				up_read(&root->fs_info->bg_delete_sem);
 				return PTR_ERR(trans);
+			}
 			if (have_pinned_space >= 0 ||
 			    test_bit(BTRFS_TRANS_HAVE_FREE_BGS,
 				     &trans->transaction->flags) ||
 			    need_commit > 0) {
 				ret = btrfs_commit_transaction(trans, root);
-				if (ret)
+				if (ret) {
+					up_read(&root->fs_info->bg_delete_sem);
 					return ret;
+				}
 				/*
 				 * The cleaner kthread might still be doing iput
 				 * operations. Wait for it to finish so that
@@ -4225,6 +4247,7 @@ commit_trans:
 		trace_btrfs_space_reservation(root->fs_info,
 					      "space_info:enospc",
 					      data_sinfo->flags, bytes, 1);
+		up_read(&root->fs_info->bg_delete_sem);
 		return -ENOSPC;
 	}
 	data_sinfo->bytes_may_use += bytes;
@@ -4232,6 +4255,9 @@ commit_trans:
 				      data_sinfo->flags, bytes, 1);
 	spin_unlock(&data_sinfo->lock);
 
+	if (have_bg_delete_sem)
+		up_read(&root->fs_info->bg_delete_sem);
+
 	return ret;
 }
 
@@ -10593,7 +10619,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		}
 		spin_unlock(&fs_info->unused_bgs_lock);
 
-		mutex_lock(&fs_info->delete_unused_bgs_mutex);
+		down_write(&root->fs_info->bg_delete_sem);
 
 		/* Don't want to race with allocators so take the groups_sem */
 		down_write(&space_info->groups_sem);
@@ -10721,7 +10747,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 end_trans:
 		btrfs_end_transaction(trans, root);
 next:
-		mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+		up_write(&root->fs_info->bg_delete_sem);
 		btrfs_put_block_group(block_group);
 		spin_lock(&fs_info->unused_bgs_lock);
 	}
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 589f128..7dbf70a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2876,7 +2876,7 @@ static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset)
 	 * we release the path used to search the chunk/dev tree and before
 	 * the current task acquires this mutex and calls us.
 	 */
-	ASSERT(mutex_is_locked(&root->fs_info->delete_unused_bgs_mutex));
+	ASSERT(rwsem_is_locked(&root->fs_info->bg_delete_sem));
 
 	ret = btrfs_can_relocate(extent_root, chunk_offset);
 	if (ret)
@@ -2929,10 +2929,10 @@ again:
 	key.type = BTRFS_CHUNK_ITEM_KEY;
 
 	while (1) {
-		mutex_lock(&root->fs_info->delete_unused_bgs_mutex);
+		down_read(&root->fs_info->bg_delete_sem);
 		ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
 		if (ret < 0) {
-			mutex_unlock(&root->fs_info->delete_unused_bgs_mutex);
+			up_read(&root->fs_info->bg_delete_sem);
 			goto error;
 		}
 		BUG_ON(ret == 0); /* Corruption */
@@ -2940,7 +2940,7 @@ again:
 		ret = btrfs_previous_item(chunk_root, path, key.objectid,
 					  key.type);
 		if (ret)
-			mutex_unlock(&root->fs_info->delete_unused_bgs_mutex);
+			up_read(&root->fs_info->bg_delete_sem);
 		if (ret < 0)
 			goto error;
 		if (ret > 0)
@@ -2962,7 +2962,7 @@ again:
 			else
 				BUG_ON(ret);
 		}
-		mutex_unlock(&root->fs_info->delete_unused_bgs_mutex);
+		up_read(&root->fs_info->bg_delete_sem);
 
 		if (found_key.offset == 0)
 			break;
@@ -3498,10 +3498,10 @@ again:
 			goto error;
 		}
 
-		mutex_lock(&fs_info->delete_unused_bgs_mutex);
+		down_read(&fs_info->bg_delete_sem);
 		ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
 		if (ret < 0) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			up_read(&fs_info->bg_delete_sem);
 			goto error;
 		}
 
@@ -3515,7 +3515,7 @@ again:
 		ret = btrfs_previous_item(chunk_root, path, 0,
 					  BTRFS_CHUNK_ITEM_KEY);
 		if (ret) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			up_read(&fs_info->bg_delete_sem);
 			ret = 0;
 			break;
 		}
@@ -3525,7 +3525,7 @@ again:
 		btrfs_item_key_to_cpu(leaf, &found_key, slot);
 
 		if (found_key.objectid != key.objectid) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			up_read(&fs_info->bg_delete_sem);
 			break;
 		}
 
@@ -3543,12 +3543,12 @@ again:
 
 		btrfs_release_path(path);
 		if (!ret) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			up_read(&fs_info->bg_delete_sem);
 			goto loop;
 		}
 
 		if (counting) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			up_read(&fs_info->bg_delete_sem);
 			spin_lock(&fs_info->balance_lock);
 			bctl->stat.expected++;
 			spin_unlock(&fs_info->balance_lock);
@@ -3573,7 +3573,7 @@ again:
 					count_meta < bctl->meta.limit_min)
 				|| ((chunk_type & BTRFS_BLOCK_GROUP_SYSTEM) &&
 					count_sys < bctl->sys.limit_min)) {
-			mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+			up_read(&fs_info->bg_delete_sem);
 			goto loop;
 		}
 
@@ -3586,7 +3586,7 @@ again:
 		    !chunk_reserved && !bytes_used) {
 			trans = btrfs_start_transaction(chunk_root, 0);
 			if (IS_ERR(trans)) {
-				mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+				up_read(&fs_info->bg_delete_sem);
 				ret = PTR_ERR(trans);
 				goto error;
 			}
@@ -3595,7 +3595,7 @@ again:
 						      BTRFS_BLOCK_GROUP_DATA);
 			btrfs_end_transaction(trans, chunk_root);
 			if (ret < 0) {
-				mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+				up_read(&fs_info->bg_delete_sem);
 				goto error;
 			}
 			chunk_reserved = 1;
@@ -3603,7 +3603,7 @@ again:
 
 		ret = btrfs_relocate_chunk(chunk_root,
 					   found_key.offset);
-		mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+		up_read(&fs_info->bg_delete_sem);
 		if (ret && ret != -ENOSPC)
 			goto error;
 		if (ret == -ENOSPC) {
@@ -4330,16 +4330,16 @@ again:
 	key.type = BTRFS_DEV_EXTENT_KEY;
 
 	do {
-		mutex_lock(&root->fs_info->delete_unused_bgs_mutex);
+		down_read(&root->fs_info->bg_delete_sem);
 		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
 		if (ret < 0) {
-			mutex_unlock(&root->fs_info->delete_unused_bgs_mutex);
+			up_read(&root->fs_info->bg_delete_sem);
 			goto done;
 		}
 
 		ret = btrfs_previous_item(root, path, 0, key.type);
 		if (ret)
-			mutex_unlock(&root->fs_info->delete_unused_bgs_mutex);
+			up_read(&root->fs_info->bg_delete_sem);
 		if (ret < 0)
 			goto done;
 		if (ret) {
@@ -4353,7 +4353,7 @@ again:
 		btrfs_item_key_to_cpu(l, &key, path->slots[0]);
 
 		if (key.objectid != device->devid) {
-			mutex_unlock(&root->fs_info->delete_unused_bgs_mutex);
+			up_read(&root->fs_info->bg_delete_sem);
 			btrfs_release_path(path);
 			break;
 		}
@@ -4362,7 +4362,7 @@ again:
 		length = btrfs_dev_extent_length(l, dev_extent);
 
 		if (key.offset + length <= new_size) {
-			mutex_unlock(&root->fs_info->delete_unused_bgs_mutex);
+			up_read(&root->fs_info->bg_delete_sem);
 			btrfs_release_path(path);
 			break;
 		}
@@ -4371,7 +4371,7 @@ again:
 		btrfs_release_path(path);
 
 		ret = btrfs_relocate_chunk(root, chunk_offset);
-		mutex_unlock(&root->fs_info->delete_unused_bgs_mutex);
+		up_read(&root->fs_info->bg_delete_sem);
 		if (ret && ret != -ENOSPC)
 			goto done;
 		if (ret == -ENOSPC)
-- 
2.9.0




^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-07-27  5:54 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-25  7:51 [PATCH v2 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
2016-07-25  7:51 ` [PATCH v2 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
2016-07-25  7:51 ` [PATCH v2 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions Wang Xiaoguang
2016-07-25  7:51 ` [PATCH v2 3/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
2016-07-25 13:33   ` Josef Bacik
2016-07-25 18:10   ` Josef Bacik
2016-07-25  7:51 ` [PATCH v2 4/4] btrfs: should block unused block groups deletion work when allocating data space Wang Xiaoguang
2016-07-25 13:32   ` Josef Bacik
2016-07-26 12:04     ` [PATCH v3] " Wang Xiaoguang
2016-07-26 15:51       ` Josef Bacik
2016-07-26 16:25         ` David Sterba
2016-07-27  1:26         ` Wang Xiaoguang
2016-07-27  5:50           ` [PATCH v4] " Wang Xiaoguang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.