All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue
@ 2016-07-20  5:56 Wang Xiaoguang
  2016-07-20  5:56 ` [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-20  5:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, jbacik, holger

Currently in btrfs, for data space reservation, it does not update
bytes_may_use in btrfs_update_reserved_bytes() and the decrease operation
will be delayed to be done in extent_clear_unlock_delalloc(), for
fallocate(2), decrease operation is even delayed to be done in end
of btrfs_fallocate(), which is too late. Obviously such delay will
cause unnecessary pressure to enospc system, in [PATCH 4/4], there is
a simpel test script that can reveal such false ENOSPC bug.

So in this patch set, it will remove RESERVE_FREE, RESERVE_ALLOC and
RESERVE_ALLOC_NO_ACCOUNT, and we always update bytes_may_use timely.

I'll also commit a fstests test case for this issue.

Wang Xiaoguang (4):
  btrfs: use correct offset for reloc_inode in
    prealloc_file_extent_cluster()
  btrfs: divide btrfs_update_reserved_bytes() into two functions
  btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag
  btrfs: update btrfs_space_info's bytes_may_use timely

 fs/btrfs/ctree.h       |   2 +-
 fs/btrfs/extent-tree.c | 131 ++++++++++++++++++++++++-------------------------
 fs/btrfs/extent_io.h   |   1 +
 fs/btrfs/file.c        |  26 +++++-----
 fs/btrfs/inode-map.c   |   3 +-
 fs/btrfs/inode.c       |  37 ++++++++++----
 fs/btrfs/relocation.c  |  17 +++++--
 7 files changed, 123 insertions(+), 94 deletions(-)

-- 
2.9.0




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
  2016-07-20  5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
@ 2016-07-20  5:56 ` Wang Xiaoguang
  2016-07-20 13:18   ` Josef Bacik
  2016-07-20  5:56 ` [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions Wang Xiaoguang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-20  5:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, jbacik, holger

In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
which indeed are extent's bytenr. The correct value should be
cluster->[start|end] minus block group's start bytenr.

start bytenr   cluster->start
|              |     extent      |   extent   | ...| extent |
|----------------------------------------------------------------|
|                block group reloc_inode                         |

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/relocation.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0477dca..a0de885 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3030,12 +3030,14 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	u64 num_bytes;
 	int nr = 0;
 	int ret = 0;
+	u64 prealloc_start = cluster->start - offset;
+	u64 prealloc_end = cluster->end - offset;
 
 	BUG_ON(cluster->start != cluster->boundary[0]);
 	inode_lock(inode);
 
-	ret = btrfs_check_data_free_space(inode, cluster->start,
-					  cluster->end + 1 - cluster->start);
+	ret = btrfs_check_data_free_space(inode, prealloc_start,
+					  prealloc_end + 1 - prealloc_start);
 	if (ret)
 		goto out;
 
@@ -3056,8 +3058,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
 			break;
 		nr++;
 	}
-	btrfs_free_reserved_data_space(inode, cluster->start,
-				       cluster->end + 1 - cluster->start);
+	btrfs_free_reserved_data_space(inode, prealloc_start,
+				       prealloc_end + 1 - prealloc_start);
 out:
 	inode_unlock(inode);
 	return ret;
-- 
2.9.0




^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions
  2016-07-20  5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
  2016-07-20  5:56 ` [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
@ 2016-07-20  5:56 ` Wang Xiaoguang
  2016-07-20 13:21   ` Josef Bacik
  2016-07-20  5:56 ` [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag Wang Xiaoguang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-20  5:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, jbacik, holger

This patch divides btrfs_update_reserved_bytes() into
btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes(), and
next patch will extend btrfs_add_reserved_bytes()to fix some
false ENOSPC error, please see later patch for detailed info.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 97 +++++++++++++++++++++++++++++---------------------
 1 file changed, 57 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 82b912a..8eaac39 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -104,9 +104,10 @@ static int find_next_key(struct btrfs_path *path, int level,
 			 struct btrfs_key *key);
 static void dump_space_info(struct btrfs_space_info *info, u64 bytes,
 			    int dump_block_groups);
-static int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
-				       u64 num_bytes, int reserve,
-				       int delalloc);
+static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
+				    u64 num_bytes, int reserve, int delalloc);
+static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache *cache,
+				     u64 num_bytes, int delalloc);
 static int block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv,
 			       u64 num_bytes);
 int btrfs_pin_extent(struct btrfs_root *root,
@@ -6297,19 +6298,14 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
 }
 
 /**
- * btrfs_update_reserved_bytes - update the block_group and space info counters
+ * btrfs_add_reserved_bytes - update the block_group and space info counters
  * @cache:	The cache we are manipulating
  * @num_bytes:	The number of bytes in question
  * @reserve:	One of the reservation enums
  * @delalloc:   The blocks are allocated for the delalloc write
  *
- * This is called by the allocator when it reserves space, or by somebody who is
- * freeing space that was never actually used on disk.  For example if you
- * reserve some space for a new leaf in transaction A and before transaction A
- * commits you free that leaf, you call this with reserve set to 0 in order to
- * clear the reservation.
- *
- * Metadata reservations should be called with RESERVE_ALLOC so we do the proper
+ * This is called by the allocator when it reserves space. Metadata
+ * reservations should be called with RESERVE_ALLOC so we do the proper
  * ENOSPC accounting.  For data we handle the reservation through clearing the
  * delalloc bits in the io_tree.  We have to do this since we could end up
  * allocating less disk space for the amount of data we have reserved in the
@@ -6319,44 +6315,65 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
  * make the reservation and return -EAGAIN, otherwise this function always
  * succeeds.
  */
-static int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
-				       u64 num_bytes, int reserve, int delalloc)
+static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
+				    u64 num_bytes, int reserve, int delalloc)
 {
 	struct btrfs_space_info *space_info = cache->space_info;
 	int ret = 0;
 
 	spin_lock(&space_info->lock);
 	spin_lock(&cache->lock);
-	if (reserve != RESERVE_FREE) {
-		if (cache->ro) {
-			ret = -EAGAIN;
-		} else {
-			cache->reserved += num_bytes;
-			space_info->bytes_reserved += num_bytes;
-			if (reserve == RESERVE_ALLOC) {
-				trace_btrfs_space_reservation(cache->fs_info,
-						"space_info", space_info->flags,
-						num_bytes, 0);
-				space_info->bytes_may_use -= num_bytes;
-			}
-
-			if (delalloc)
-				cache->delalloc_bytes += num_bytes;
-		}
+	if (cache->ro) {
+		ret = -EAGAIN;
 	} else {
-		if (cache->ro)
-			space_info->bytes_readonly += num_bytes;
-		cache->reserved -= num_bytes;
-		space_info->bytes_reserved -= num_bytes;
+		cache->reserved += num_bytes;
+		space_info->bytes_reserved += num_bytes;
+		if (reserve == RESERVE_ALLOC) {
+			trace_btrfs_space_reservation(cache->fs_info,
+					"space_info", space_info->flags,
+					num_bytes, 0);
+			space_info->bytes_may_use -= num_bytes;
+		}
 
 		if (delalloc)
-			cache->delalloc_bytes -= num_bytes;
+			cache->delalloc_bytes += num_bytes;
 	}
 	spin_unlock(&cache->lock);
 	spin_unlock(&space_info->lock);
 	return ret;
 }
 
+/**
+ * btrfs_free_reserved_bytes - update the block_group and space info counters
+ * @cache:      The cache we are manipulating
+ * @num_bytes:  The number of bytes in question
+ * @delalloc:   The blocks are allocated for the delalloc write
+ *
+ * This is called by somebody who is freeing space that was never actually used
+ * on disk.  For example if you reserve some space for a new leaf in transaction
+ * A and before transaction A commits you free that leaf, you call this with
+ * reserve set to 0 in order to clear the reservation.
+ */
+
+static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache *cache,
+				     u64 num_bytes, int delalloc)
+{
+	struct btrfs_space_info *space_info = cache->space_info;
+	int ret = 0;
+
+	spin_lock(&space_info->lock);
+	spin_lock(&cache->lock);
+	if (cache->ro)
+		space_info->bytes_readonly += num_bytes;
+	cache->reserved -= num_bytes;
+	space_info->bytes_reserved -= num_bytes;
+
+	if (delalloc)
+		cache->delalloc_bytes -= num_bytes;
+	spin_unlock(&cache->lock);
+	spin_unlock(&space_info->lock);
+	return ret;
+}
 void btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root)
 {
@@ -6976,7 +6993,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 		WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, &buf->bflags));
 
 		btrfs_add_free_space(cache, buf->start, buf->len);
-		btrfs_update_reserved_bytes(cache, buf->len, RESERVE_FREE, 0);
+		btrfs_free_reserved_bytes(cache, buf->len, 0);
 		btrfs_put_block_group(cache);
 		trace_btrfs_reserved_extent_free(root, buf->start, buf->len);
 		pin = 0;
@@ -7548,8 +7565,8 @@ checks:
 					     search_start - offset);
 		BUG_ON(offset > search_start);
 
-		ret = btrfs_update_reserved_bytes(block_group, num_bytes,
-						  alloc_type, delalloc);
+		ret = btrfs_add_reserved_bytes(block_group, num_bytes,
+				alloc_type, delalloc);
 		if (ret == -EAGAIN) {
 			btrfs_add_free_space(block_group, offset, num_bytes);
 			goto loop;
@@ -7781,7 +7798,7 @@ static int __btrfs_free_reserved_extent(struct btrfs_root *root,
 		if (btrfs_test_opt(root, DISCARD))
 			ret = btrfs_discard_extent(root, start, len, NULL);
 		btrfs_add_free_space(cache, start, len);
-		btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, delalloc);
+		btrfs_free_reserved_bytes(cache, len, delalloc);
 	}
 
 	btrfs_put_block_group(cache);
@@ -8011,8 +8028,8 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 	if (!block_group)
 		return -EINVAL;
 
-	ret = btrfs_update_reserved_bytes(block_group, ins->offset,
-					  RESERVE_ALLOC_NO_ACCOUNT, 0);
+	ret = btrfs_add_reserved_bytes(block_group, ins->offset,
+			RESERVE_ALLOC_NO_ACCOUNT, 0);
 	BUG_ON(ret); /* logic error */
 	ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
 					 0, owner, offset, ins, 1);
-- 
2.9.0




^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag
  2016-07-20  5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
  2016-07-20  5:56 ` [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
  2016-07-20  5:56 ` [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions Wang Xiaoguang
@ 2016-07-20  5:56 ` Wang Xiaoguang
  2016-07-20 13:22   ` Josef Bacik
  2016-07-20  5:56 ` [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
  2016-07-20  8:46 ` [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Holger Hoffstätte
  4 siblings, 1 reply; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-20  5:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, jbacik, holger

In next patch, btrfs_clear_bit_hook() will not call
btrfs_free_reserved_data_space_noquota() to update btrfs_space_info's
bytes_may_use unless it has EXTENT_DO_ACCOUNTING or EXTENT_CLEAR_DATA_RESV,
as for the reason, please see the next patch for detailed info.

As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
PREALLOC, we also need to update bytes_may_use, but can not pass
EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so here
we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook()
to update btrfs_space_info's bytes_may_use.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/extent_io.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index c0c1c4f..b52ca5d 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -20,6 +20,7 @@
 #define EXTENT_DAMAGED		(1U << 14)
 #define EXTENT_NORESERVE	(1U << 15)
 #define EXTENT_QGROUP_RESERVED	(1U << 16)
+#define EXTENT_CLEAR_DATA_RESV	(1U << 17)
 #define EXTENT_IOBITS		(EXTENT_LOCKED | EXTENT_WRITEBACK)
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
 
-- 
2.9.0




^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely
  2016-07-20  5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
                   ` (2 preceding siblings ...)
  2016-07-20  5:56 ` [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag Wang Xiaoguang
@ 2016-07-20  5:56 ` Wang Xiaoguang
  2016-07-20 13:35   ` Josef Bacik
  2016-07-20  8:46 ` [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Holger Hoffstätte
  4 siblings, 1 reply; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-20  5:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, jbacik, holger

This patch can fix some false ENOSPC errors, below test script can
reproduce one false ENOSPC error:
	#!/bin/bash
	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
	dev=$(losetup --show -f fs.img)
	mkfs.btrfs -f -M $dev
	mkdir /tmp/mntpoint
	mount $dev /tmp/mntpoint
	cd /tmp/mntpoint
	xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile

Above script will fail for ENOSPC reason, but indeed fs still has free
space to satisfy this request. Please see call graph:
btrfs_fallocate()
|-> btrfs_alloc_data_chunk_ondemand()
|   bytes_may_use += 64M
|-> btrfs_prealloc_file_range()
    |-> btrfs_reserve_extent()
        |-> btrfs_add_reserved_bytes()
        |   alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
        |   change bytes_may_use, and bytes_reserved += 64M. Now
        |   bytes_may_use + bytes_reserved == 128M, which is greater
        |   than btrfs_space_info's total_bytes, false enospc occurs.
        |   Note, the bytes_may_use decrease operation will done in
        |   end of btrfs_fallocate(), which is too late.

Here is another simple case for buffered write:
                    CPU 1              |              CPU 2
                                       |
|-> cow_file_range()                   |-> __btrfs_buffered_write()
    |-> btrfs_reserve_extent()         |   |
    |                                  |   |
    |                                  |   |
    |    .....                         |   |-> btrfs_check_data_free_space()
    |                                  |
    |                                  |
    |-> extent_clear_unlock_delalloc() |

In CPU 1, btrfs_reserve_extent()->find_free_extent()->
btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
operation will be delayed to be done in extent_clear_unlock_delalloc().
Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
btrfs_check_data_free_space() tries to reserve 100MB data space.
If
	100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
		data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
		data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
btrfs_check_data_free_space() will try to allcate new data chunk or call
btrfs_start_delalloc_roots(), or commit current transaction inorder to
reserve some free space, obviously a lot of work. But indeed it's not
necessary as long as decreasing bytes_may_use timely, we still have
free space, decreasing 128M from bytes_may_use.

To fix this issue, this patch chooses to update bytes_may_use for both
data and metadata in btrfs_add_reserved_bytes(). For compress path, real
extent length may not be equal to file content length, so introduce a
ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
file content length. Then compress path can update bytes_may_use
correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
and RESERVE_FREE.

For inode marked as NODATACOW or extent marked as PREALLOC, we can
directly call btrfs_free_reserved_data_space() to adjust bytes_may_use.

Meanwhile __btrfs_prealloc_file_range() will call
btrfs_free_reserved_data_space() internally for both sucessful and failed
path, btrfs_prealloc_file_range()'s callers does not need to call
btrfs_free_reserved_data_space() any more.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/extent-tree.c | 56 +++++++++++++++++---------------------------------
 fs/btrfs/file.c        | 26 +++++++++++++----------
 fs/btrfs/inode-map.c   |  3 +--
 fs/btrfs/inode.c       | 37 ++++++++++++++++++++++++---------
 fs/btrfs/relocation.c  | 11 ++++++++--
 6 files changed, 72 insertions(+), 63 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4274a7b..7eb2913 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2556,7 +2556,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 				   struct btrfs_root *root,
 				   u64 root_objectid, u64 owner, u64 offset,
 				   struct btrfs_key *ins);
-int btrfs_reserve_extent(struct btrfs_root *root, u64 num_bytes,
+int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, u64 num_bytes,
 			 u64 min_alloc_size, u64 empty_size, u64 hint_byte,
 			 struct btrfs_key *ins, int is_data, int delalloc);
 int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8eaac39..5447973 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -60,21 +60,6 @@ enum {
 	CHUNK_ALLOC_FORCE = 2,
 };
 
-/*
- * Control how reservations are dealt with.
- *
- * RESERVE_FREE - freeing a reservation.
- * RESERVE_ALLOC - allocating space and we need to update bytes_may_use for
- *   ENOSPC accounting
- * RESERVE_ALLOC_NO_ACCOUNT - allocating space and we should not update
- *   bytes_may_use as the ENOSPC accounting is done elsewhere
- */
-enum {
-	RESERVE_FREE = 0,
-	RESERVE_ALLOC = 1,
-	RESERVE_ALLOC_NO_ACCOUNT = 2,
-};
-
 static int update_block_group(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root, u64 bytenr,
 			      u64 num_bytes, int alloc);
@@ -105,7 +90,7 @@ static int find_next_key(struct btrfs_path *path, int level,
 static void dump_space_info(struct btrfs_space_info *info, u64 bytes,
 			    int dump_block_groups);
 static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
-				    u64 num_bytes, int reserve, int delalloc);
+				    u64 ram_bytes, u64 num_bytes, int delalloc);
 static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache *cache,
 				     u64 num_bytes, int delalloc);
 static int block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv,
@@ -3491,7 +3476,6 @@ again:
 		dcs = BTRFS_DC_SETUP;
 	else if (ret == -ENOSPC)
 		set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
-	btrfs_free_reserved_data_space(inode, 0, num_pages);
 
 out_put:
 	iput(inode);
@@ -6300,8 +6284,9 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
 /**
  * btrfs_add_reserved_bytes - update the block_group and space info counters
  * @cache:	The cache we are manipulating
+ * @ram_bytes:  The number of bytes of file content, and will be same to
+ *              @num_bytes except for the compress path.
  * @num_bytes:	The number of bytes in question
- * @reserve:	One of the reservation enums
  * @delalloc:   The blocks are allocated for the delalloc write
  *
  * This is called by the allocator when it reserves space. Metadata
@@ -6316,7 +6301,7 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
  * succeeds.
  */
 static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
-				    u64 num_bytes, int reserve, int delalloc)
+				    u64 ram_bytes, u64 num_bytes, int delalloc)
 {
 	struct btrfs_space_info *space_info = cache->space_info;
 	int ret = 0;
@@ -6328,13 +6313,11 @@ static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
 	} else {
 		cache->reserved += num_bytes;
 		space_info->bytes_reserved += num_bytes;
-		if (reserve == RESERVE_ALLOC) {
-			trace_btrfs_space_reservation(cache->fs_info,
-					"space_info", space_info->flags,
-					num_bytes, 0);
-			space_info->bytes_may_use -= num_bytes;
-		}
 
+		trace_btrfs_space_reservation(cache->fs_info,
+				"space_info", space_info->flags,
+				num_bytes, 0);
+		space_info->bytes_may_use -= ram_bytes;
 		if (delalloc)
 			cache->delalloc_bytes += num_bytes;
 	}
@@ -7218,9 +7201,9 @@ btrfs_release_block_group(struct btrfs_block_group_cache *cache,
  * the free space extent currently.
  */
 static noinline int find_free_extent(struct btrfs_root *orig_root,
-				     u64 num_bytes, u64 empty_size,
-				     u64 hint_byte, struct btrfs_key *ins,
-				     u64 flags, int delalloc)
+				u64 ram_bytes, u64 num_bytes, u64 empty_size,
+				u64 hint_byte, struct btrfs_key *ins,
+				u64 flags, int delalloc)
 {
 	int ret = 0;
 	struct btrfs_root *root = orig_root->fs_info->extent_root;
@@ -7232,8 +7215,6 @@ static noinline int find_free_extent(struct btrfs_root *orig_root,
 	struct btrfs_space_info *space_info;
 	int loop = 0;
 	int index = __get_raid_index(flags);
-	int alloc_type = (flags & BTRFS_BLOCK_GROUP_DATA) ?
-		RESERVE_ALLOC_NO_ACCOUNT : RESERVE_ALLOC;
 	bool failed_cluster_refill = false;
 	bool failed_alloc = false;
 	bool use_cluster = true;
@@ -7565,8 +7546,8 @@ checks:
 					     search_start - offset);
 		BUG_ON(offset > search_start);
 
-		ret = btrfs_add_reserved_bytes(block_group, num_bytes,
-				alloc_type, delalloc);
+		ret = btrfs_add_reserved_bytes(block_group, ram_bytes,
+				num_bytes, delalloc);
 		if (ret == -EAGAIN) {
 			btrfs_add_free_space(block_group, offset, num_bytes);
 			goto loop;
@@ -7739,7 +7720,7 @@ again:
 	up_read(&info->groups_sem);
 }
 
-int btrfs_reserve_extent(struct btrfs_root *root,
+int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 			 u64 num_bytes, u64 min_alloc_size,
 			 u64 empty_size, u64 hint_byte,
 			 struct btrfs_key *ins, int is_data, int delalloc)
@@ -7751,8 +7732,8 @@ int btrfs_reserve_extent(struct btrfs_root *root,
 	flags = btrfs_get_alloc_profile(root, is_data);
 again:
 	WARN_ON(num_bytes < root->sectorsize);
-	ret = find_free_extent(root, num_bytes, empty_size, hint_byte, ins,
-			       flags, delalloc);
+	ret = find_free_extent(root, ram_bytes, num_bytes, empty_size,
+			       hint_byte, ins, flags, delalloc);
 	if (!ret && !is_data) {
 		btrfs_dec_block_group_reservations(root->fs_info,
 						   ins->objectid);
@@ -7761,6 +7742,7 @@ again:
 			num_bytes = min(num_bytes >> 1, ins->offset);
 			num_bytes = round_down(num_bytes, root->sectorsize);
 			num_bytes = max(num_bytes, min_alloc_size);
+			ram_bytes = num_bytes;
 			if (num_bytes == min_alloc_size)
 				final_tried = true;
 			goto again;
@@ -8029,7 +8011,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 		return -EINVAL;
 
 	ret = btrfs_add_reserved_bytes(block_group, ins->offset,
-			RESERVE_ALLOC_NO_ACCOUNT, 0);
+				       ins->offset, 0);
 	BUG_ON(ret); /* logic error */
 	ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
 					 0, owner, offset, ins, 1);
@@ -8171,7 +8153,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
 	if (IS_ERR(block_rsv))
 		return ERR_CAST(block_rsv);
 
-	ret = btrfs_reserve_extent(root, blocksize, blocksize,
+	ret = btrfs_reserve_extent(root, blocksize, blocksize, blocksize,
 				   empty_size, hint, &ins, 0, 0);
 	if (ret)
 		goto out_unuse;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 2234e88..b4d9258 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2669,6 +2669,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 
 	alloc_start = round_down(offset, blocksize);
 	alloc_end = round_up(offset + len, blocksize);
+	cur_offset = alloc_start;
 
 	/* Make sure we aren't being give some crap mode */
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
@@ -2761,7 +2762,6 @@ static long btrfs_fallocate(struct file *file, int mode,
 
 	/* First, check if we exceed the qgroup limit */
 	INIT_LIST_HEAD(&reserve_list);
-	cur_offset = alloc_start;
 	while (1) {
 		em = btrfs_get_extent(inode, NULL, 0, cur_offset,
 				      alloc_end - cur_offset, 0);
@@ -2788,6 +2788,14 @@ static long btrfs_fallocate(struct file *file, int mode,
 					last_byte - cur_offset);
 			if (ret < 0)
 				break;
+		} else {
+			/*
+			 * Do not need to reserve unwritten extent for this
+			 * range, free reserved data space first, otherwise
+			 * it'll result in false ENOSPC error.
+			 */
+			btrfs_free_reserved_data_space(inode, cur_offset,
+				last_byte - cur_offset);
 		}
 		free_extent_map(em);
 		cur_offset = last_byte;
@@ -2805,6 +2813,9 @@ static long btrfs_fallocate(struct file *file, int mode,
 					range->start,
 					range->len, 1 << inode->i_blkbits,
 					offset + len, &alloc_hint);
+		else
+			btrfs_free_reserved_data_space(inode, range->start,
+						       range->len);
 		list_del(&range->list);
 		kfree(range);
 	}
@@ -2839,18 +2850,11 @@ out_unlock:
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
 			     &cached_state, GFP_KERNEL);
 out:
-	/*
-	 * As we waited the extent range, the data_rsv_map must be empty
-	 * in the range, as written data range will be released from it.
-	 * And for prealloacted extent, it will also be released when
-	 * its metadata is written.
-	 * So this is completely used as cleanup.
-	 */
-	btrfs_qgroup_free_data(inode, alloc_start, alloc_end - alloc_start);
 	inode_unlock(inode);
 	/* Let go of our reservation. */
-	btrfs_free_reserved_data_space(inode, alloc_start,
-				       alloc_end - alloc_start);
+	if (ret != 0)
+		btrfs_free_reserved_data_space(inode, alloc_start,
+				       alloc_end - cur_offset);
 	return ret;
 }
 
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 70107f7..e59e7d6 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -495,10 +495,9 @@ again:
 	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
 					      prealloc, prealloc, &alloc_hint);
 	if (ret) {
-		btrfs_delalloc_release_space(inode, 0, prealloc);
+		btrfs_delalloc_release_metadata(inode, prealloc);
 		goto out_put;
 	}
-	btrfs_free_reserved_data_space(inode, 0, prealloc);
 
 	ret = btrfs_write_out_ino_cache(root, trans, path, inode);
 out_put:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4421954..e0cee59 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -564,6 +564,8 @@ cont:
 						     PAGE_SET_WRITEBACK |
 						     page_error_op |
 						     PAGE_END_WRITEBACK);
+			btrfs_free_reserved_data_space_noquota(inode, start,
+						end - start + 1);
 			goto free_pages_out;
 		}
 	}
@@ -739,7 +741,7 @@ retry:
 		lock_extent(io_tree, async_extent->start,
 			    async_extent->start + async_extent->ram_size - 1);
 
-		ret = btrfs_reserve_extent(root,
+		ret = btrfs_reserve_extent(root, async_extent->ram_size,
 					   async_extent->compressed_size,
 					   async_extent->compressed_size,
 					   0, alloc_hint, &ins, 1, 1);
@@ -966,7 +968,8 @@ static noinline int cow_file_range(struct inode *inode,
 				     EXTENT_DEFRAG, PAGE_UNLOCK |
 				     PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK |
 				     PAGE_END_WRITEBACK);
-
+			btrfs_free_reserved_data_space_noquota(inode, start,
+						end - start + 1);
 			*nr_written = *nr_written +
 			     (end - start + PAGE_SIZE) / PAGE_SIZE;
 			*page_started = 1;
@@ -986,7 +989,7 @@ static noinline int cow_file_range(struct inode *inode,
 		unsigned long op;
 
 		cur_alloc_size = disk_num_bytes;
-		ret = btrfs_reserve_extent(root, cur_alloc_size,
+		ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size,
 					   root->sectorsize, 0, alloc_hint,
 					   &ins, 1, 1);
 		if (ret < 0)
@@ -1485,8 +1488,10 @@ out_check:
 		extent_clear_unlock_delalloc(inode, cur_offset,
 					     cur_offset + num_bytes - 1,
 					     locked_page, EXTENT_LOCKED |
-					     EXTENT_DELALLOC, PAGE_UNLOCK |
-					     PAGE_SET_PRIVATE2);
+					     EXTENT_DELALLOC |
+					     EXTENT_CLEAR_DATA_RESV,
+					     PAGE_UNLOCK | PAGE_SET_PRIVATE2);
+
 		if (!nolock && nocow)
 			btrfs_end_write_no_snapshoting(root);
 		cur_offset = extent_end;
@@ -1803,7 +1808,9 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 			return;
 
 		if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID
-		    && do_list && !(state->state & EXTENT_NORESERVE))
+		    && do_list && !(state->state & EXTENT_NORESERVE)
+		    && (*bits & (EXTENT_DO_ACCOUNTING |
+		    EXTENT_CLEAR_DATA_RESV)))
 			btrfs_free_reserved_data_space_noquota(inode,
 					state->start, len);
 
@@ -7214,7 +7221,7 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
 	int ret;
 
 	alloc_hint = get_extent_allocation_hint(inode, start, len);
-	ret = btrfs_reserve_extent(root, len, root->sectorsize, 0,
+	ret = btrfs_reserve_extent(root, len, len, root->sectorsize, 0,
 				   alloc_hint, &ins, 1, 1);
 	if (ret)
 		return ERR_PTR(ret);
@@ -7714,6 +7721,13 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 				ret = PTR_ERR(em2);
 				goto unlock_err;
 			}
+			/*
+			 * For inode marked NODATACOW or extent marked PREALLOC,
+			 * use the existing or preallocated extent, so does not
+			 * need to adjust btrfs_space_info's bytes_may_use.
+			 */
+			btrfs_free_reserved_data_space_noquota(inode,
+					start, len);
 			goto unlock;
 		}
 	}
@@ -7748,7 +7762,6 @@ unlock:
 			i_size_write(inode, start + len);
 
 		adjust_dio_outstanding_extents(inode, dio_data, len);
-		btrfs_free_reserved_data_space(inode, start, len);
 		WARN_ON(dio_data->reserve < len);
 		dio_data->reserve -= len;
 		dio_data->unsubmitted_oe_range_end = start + len;
@@ -10269,6 +10282,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 	u64 last_alloc = (u64)-1;
 	int ret = 0;
 	bool own_trans = true;
+	u64 end = start + num_bytes - 1;
 
 	if (trans)
 		own_trans = false;
@@ -10290,8 +10304,8 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 		 * sized chunks.
 		 */
 		cur_bytes = min(cur_bytes, last_alloc);
-		ret = btrfs_reserve_extent(root, cur_bytes, min_size, 0,
-					   *alloc_hint, &ins, 1, 0);
+		ret = btrfs_reserve_extent(root, cur_bytes, cur_bytes,
+				min_size, 0, *alloc_hint, &ins, 1, 0);
 		if (ret) {
 			if (own_trans)
 				btrfs_end_transaction(trans, root);
@@ -10377,6 +10391,9 @@ next:
 		if (own_trans)
 			btrfs_end_transaction(trans, root);
 	}
+	if (cur_offset < end)
+		btrfs_free_reserved_data_space(inode, cur_offset,
+			end - cur_offset + 1);
 	return ret;
 }
 
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index a0de885..f39c4db 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3032,6 +3032,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	int ret = 0;
 	u64 prealloc_start = cluster->start - offset;
 	u64 prealloc_end = cluster->end - offset;
+	u64 cur_offset;
 
 	BUG_ON(cluster->start != cluster->boundary[0]);
 	inode_lock(inode);
@@ -3041,6 +3042,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	if (ret)
 		goto out;
 
+	cur_offset = prealloc_start;
 	while (nr < cluster->nr) {
 		start = cluster->boundary[nr] - offset;
 		if (nr + 1 < cluster->nr)
@@ -3050,16 +3052,21 @@ int prealloc_file_extent_cluster(struct inode *inode,
 
 		lock_extent(&BTRFS_I(inode)->io_tree, start, end);
 		num_bytes = end + 1 - start;
+		if (cur_offset < start)
+			btrfs_free_reserved_data_space(inode, cur_offset,
+					start - cur_offset);
 		ret = btrfs_prealloc_file_range(inode, 0, start,
 						num_bytes, num_bytes,
 						end + 1, &alloc_hint);
+		cur_offset = end + 1;
 		unlock_extent(&BTRFS_I(inode)->io_tree, start, end);
 		if (ret)
 			break;
 		nr++;
 	}
-	btrfs_free_reserved_data_space(inode, prealloc_start,
-				       prealloc_end + 1 - prealloc_start);
+	if (cur_offset < prealloc_end)
+		btrfs_free_reserved_data_space(inode, cur_offset,
+				       prealloc_end + 1 - cur_offset);
 out:
 	inode_unlock(inode);
 	return ret;
-- 
2.9.0




^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue
  2016-07-20  5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
                   ` (3 preceding siblings ...)
  2016-07-20  5:56 ` [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
@ 2016-07-20  8:46 ` Holger Hoffstätte
  2016-07-21  1:51   ` Wang Xiaoguang
  4 siblings, 1 reply; 15+ messages in thread
From: Holger Hoffstätte @ 2016-07-20  8:46 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, jbacik

On 07/20/16 07:56, Wang Xiaoguang wrote:
> Currently in btrfs, for data space reservation, it does not update
> bytes_may_use in btrfs_update_reserved_bytes() and the decrease operation
> will be delayed to be done in extent_clear_unlock_delalloc(), for
> fallocate(2), decrease operation is even delayed to be done in end
> of btrfs_fallocate(), which is too late. Obviously such delay will
> cause unnecessary pressure to enospc system, in [PATCH 4/4], there is
> a simpel test script that can reveal such false ENOSPC bug.
> 
> So in this patch set, it will remove RESERVE_FREE, RESERVE_ALLOC and
> RESERVE_ALLOC_NO_ACCOUNT, and we always update bytes_may_use timely.
> 
> I'll also commit a fstests test case for this issue.
> 
> Wang Xiaoguang (4):
>   btrfs: use correct offset for reloc_inode in
>     prealloc_file_extent_cluster()
>   btrfs: divide btrfs_update_reserved_bytes() into two functions
>   btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag
>   btrfs: update btrfs_space_info's bytes_may_use timely
> 

Just like the previous version, for all 4 patches:

Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>

via the reproducer script & some very large manual fallocates.

thanks!
Holger


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
  2016-07-20  5:56 ` [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
@ 2016-07-20 13:18   ` Josef Bacik
  2016-07-21  1:49     ` Wang Xiaoguang
  0 siblings, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-07-20 13:18 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger

On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
> In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
> wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
> which indeed are extent's bytenr. The correct value should be
> cluster->[start|end] minus block group's start bytenr.
>
> start bytenr   cluster->start
> |              |     extent      |   extent   | ...| extent |
> |----------------------------------------------------------------|
> |                block group reloc_inode                         |
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>  fs/btrfs/relocation.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 0477dca..a0de885 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3030,12 +3030,14 @@ int prealloc_file_extent_cluster(struct inode *inode,
>  	u64 num_bytes;
>  	int nr = 0;
>  	int ret = 0;
> +	u64 prealloc_start = cluster->start - offset;
> +	u64 prealloc_end = cluster->end - offset;
>
>  	BUG_ON(cluster->start != cluster->boundary[0]);
>  	inode_lock(inode);
>
> -	ret = btrfs_check_data_free_space(inode, cluster->start,
> -					  cluster->end + 1 - cluster->start);
> +	ret = btrfs_check_data_free_space(inode, prealloc_start,
> +					  prealloc_end + 1 - prealloc_start);
>  	if (ret)
>  		goto out;
>
> @@ -3056,8 +3058,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
>  			break;
>  		nr++;
>  	}
> -	btrfs_free_reserved_data_space(inode, cluster->start,
> -				       cluster->end + 1 - cluster->start);
> +	btrfs_free_reserved_data_space(inode, prealloc_start,
> +				       prealloc_end + 1 - prealloc_start);
>  out:
>  	inode_unlock(inode);
>  	return ret;
>

This ends up being the same amount.  Consider this scenario

bg bytenr = 4096
cluster->start = 8192
cluster->end = 12287

cluster->end + 1 - cluster->start = 4096

prealloc_start = cluster->start - offset = 0
prealloc_end = cluster->end - offset = 8191

prealloc_end + 1 - prealloc_start = 4096

You shift both by the same amount, which gives you the same answer.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions
  2016-07-20  5:56 ` [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions Wang Xiaoguang
@ 2016-07-20 13:21   ` Josef Bacik
  0 siblings, 0 replies; 15+ messages in thread
From: Josef Bacik @ 2016-07-20 13:21 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger

On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
> This patch divides btrfs_update_reserved_bytes() into
> btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes(), and
> next patch will extend btrfs_add_reserved_bytes()to fix some
> false ENOSPC error, please see later patch for detailed info.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

This appears to be functionally the same

Signed-off-by: Josef Bacik <jbacik@fb.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag
  2016-07-20  5:56 ` [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag Wang Xiaoguang
@ 2016-07-20 13:22   ` Josef Bacik
  2016-07-21  1:15     ` Wang Xiaoguang
  0 siblings, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-07-20 13:22 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger

On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
> In next patch, btrfs_clear_bit_hook() will not call
> btrfs_free_reserved_data_space_noquota() to update btrfs_space_info's
> bytes_may_use unless it has EXTENT_DO_ACCOUNTING or EXTENT_CLEAR_DATA_RESV,
> as for the reason, please see the next patch for detailed info.
>
> As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
> run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
> PREALLOC, we also need to update bytes_may_use, but can not pass
> EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so here
> we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook()
> to update btrfs_space_info's bytes_may_use.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

There's no point in introducing only a flag in one patch, collapse this into the 
patch that actually uses it.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely
  2016-07-20  5:56 ` [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
@ 2016-07-20 13:35   ` Josef Bacik
  2016-07-21  1:18     ` Wang Xiaoguang
  0 siblings, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-07-20 13:35 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger

On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
> This patch can fix some false ENOSPC errors, below test script can
> reproduce one false ENOSPC error:
> 	#!/bin/bash
> 	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
> 	dev=$(losetup --show -f fs.img)
> 	mkfs.btrfs -f -M $dev
> 	mkdir /tmp/mntpoint
> 	mount $dev /tmp/mntpoint
> 	cd /tmp/mntpoint
> 	xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile
>
> Above script will fail for ENOSPC reason, but indeed fs still has free
> space to satisfy this request. Please see call graph:
> btrfs_fallocate()
> |-> btrfs_alloc_data_chunk_ondemand()
> |   bytes_may_use += 64M
> |-> btrfs_prealloc_file_range()
>     |-> btrfs_reserve_extent()
>         |-> btrfs_add_reserved_bytes()
>         |   alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
>         |   change bytes_may_use, and bytes_reserved += 64M. Now
>         |   bytes_may_use + bytes_reserved == 128M, which is greater
>         |   than btrfs_space_info's total_bytes, false enospc occurs.
>         |   Note, the bytes_may_use decrease operation will done in
>         |   end of btrfs_fallocate(), which is too late.
>
> Here is another simple case for buffered write:
>                     CPU 1              |              CPU 2
>                                        |
> |-> cow_file_range()                   |-> __btrfs_buffered_write()
>     |-> btrfs_reserve_extent()         |   |
>     |                                  |   |
>     |                                  |   |
>     |    .....                         |   |-> btrfs_check_data_free_space()
>     |                                  |
>     |                                  |
>     |-> extent_clear_unlock_delalloc() |
>
> In CPU 1, btrfs_reserve_extent()->find_free_extent()->
> btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
> operation will be delayed to be done in extent_clear_unlock_delalloc().
> Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
> btrfs_check_data_free_space() tries to reserve 100MB data space.
> If
> 	100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
> 		data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
> 		data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
> btrfs_check_data_free_space() will try to allcate new data chunk or call
> btrfs_start_delalloc_roots(), or commit current transaction inorder to
> reserve some free space, obviously a lot of work. But indeed it's not
> necessary as long as decreasing bytes_may_use timely, we still have
> free space, decreasing 128M from bytes_may_use.
>
> To fix this issue, this patch chooses to update bytes_may_use for both
> data and metadata in btrfs_add_reserved_bytes(). For compress path, real
> extent length may not be equal to file content length, so introduce a
> ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
> btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
> file content length. Then compress path can update bytes_may_use
> correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
> and RESERVE_FREE.
>
> For inode marked as NODATACOW or extent marked as PREALLOC, we can
> directly call btrfs_free_reserved_data_space() to adjust bytes_may_use.
>
> Meanwhile __btrfs_prealloc_file_range() will call
> btrfs_free_reserved_data_space() internally for both sucessful and failed
> path, btrfs_prealloc_file_range()'s callers does not need to call
> btrfs_free_reserved_data_space() any more.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>  fs/btrfs/ctree.h       |  2 +-
>  fs/btrfs/extent-tree.c | 56 +++++++++++++++++---------------------------------
>  fs/btrfs/file.c        | 26 +++++++++++++----------
>  fs/btrfs/inode-map.c   |  3 +--
>  fs/btrfs/inode.c       | 37 ++++++++++++++++++++++++---------
>  fs/btrfs/relocation.c  | 11 ++++++++--
>  6 files changed, 72 insertions(+), 63 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 4274a7b..7eb2913 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2556,7 +2556,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
>  				   struct btrfs_root *root,
>  				   u64 root_objectid, u64 owner, u64 offset,
>  				   struct btrfs_key *ins);
> -int btrfs_reserve_extent(struct btrfs_root *root, u64 num_bytes,
> +int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, u64 num_bytes,
>  			 u64 min_alloc_size, u64 empty_size, u64 hint_byte,
>  			 struct btrfs_key *ins, int is_data, int delalloc);
>  int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 8eaac39..5447973 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -60,21 +60,6 @@ enum {
>  	CHUNK_ALLOC_FORCE = 2,
>  };
>
> -/*
> - * Control how reservations are dealt with.
> - *
> - * RESERVE_FREE - freeing a reservation.
> - * RESERVE_ALLOC - allocating space and we need to update bytes_may_use for
> - *   ENOSPC accounting
> - * RESERVE_ALLOC_NO_ACCOUNT - allocating space and we should not update
> - *   bytes_may_use as the ENOSPC accounting is done elsewhere
> - */
> -enum {
> -	RESERVE_FREE = 0,
> -	RESERVE_ALLOC = 1,
> -	RESERVE_ALLOC_NO_ACCOUNT = 2,
> -};
> -
>  static int update_block_group(struct btrfs_trans_handle *trans,
>  			      struct btrfs_root *root, u64 bytenr,
>  			      u64 num_bytes, int alloc);
> @@ -105,7 +90,7 @@ static int find_next_key(struct btrfs_path *path, int level,
>  static void dump_space_info(struct btrfs_space_info *info, u64 bytes,
>  			    int dump_block_groups);
>  static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
> -				    u64 num_bytes, int reserve, int delalloc);
> +				    u64 ram_bytes, u64 num_bytes, int delalloc);
>  static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache *cache,
>  				     u64 num_bytes, int delalloc);
>  static int block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv,
> @@ -3491,7 +3476,6 @@ again:
>  		dcs = BTRFS_DC_SETUP;
>  	else if (ret == -ENOSPC)
>  		set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
> -	btrfs_free_reserved_data_space(inode, 0, num_pages);
>
>  out_put:
>  	iput(inode);
> @@ -6300,8 +6284,9 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
>  /**
>   * btrfs_add_reserved_bytes - update the block_group and space info counters
>   * @cache:	The cache we are manipulating
> + * @ram_bytes:  The number of bytes of file content, and will be same to
> + *              @num_bytes except for the compress path.
>   * @num_bytes:	The number of bytes in question
> - * @reserve:	One of the reservation enums
>   * @delalloc:   The blocks are allocated for the delalloc write
>   *
>   * This is called by the allocator when it reserves space. Metadata
> @@ -6316,7 +6301,7 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
>   * succeeds.
>   */
>  static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
> -				    u64 num_bytes, int reserve, int delalloc)
> +				    u64 ram_bytes, u64 num_bytes, int delalloc)
>  {
>  	struct btrfs_space_info *space_info = cache->space_info;
>  	int ret = 0;
> @@ -6328,13 +6313,11 @@ static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
>  	} else {
>  		cache->reserved += num_bytes;
>  		space_info->bytes_reserved += num_bytes;
> -		if (reserve == RESERVE_ALLOC) {
> -			trace_btrfs_space_reservation(cache->fs_info,
> -					"space_info", space_info->flags,
> -					num_bytes, 0);
> -			space_info->bytes_may_use -= num_bytes;
> -		}
>
> +		trace_btrfs_space_reservation(cache->fs_info,
> +				"space_info", space_info->flags,
> +				num_bytes, 0);

This needs to be ram_bytes to keep the accounting consistent for tools that use 
these tracepoints.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag
  2016-07-20 13:22   ` Josef Bacik
@ 2016-07-21  1:15     ` Wang Xiaoguang
  0 siblings, 0 replies; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-21  1:15 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: dsterba, holger

hello,

On 07/20/2016 09:22 PM, Josef Bacik wrote:
> On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
>> In next patch, btrfs_clear_bit_hook() will not call
>> btrfs_free_reserved_data_space_noquota() to update btrfs_space_info's
>> bytes_may_use unless it has EXTENT_DO_ACCOUNTING or 
>> EXTENT_CLEAR_DATA_RESV,
>> as for the reason, please see the next patch for detailed info.
>>
>> As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
>> run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
>> PREALLOC, we also need to update bytes_may_use, but can not pass
>> EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so 
>> here
>> we introduce EXTENT_CLEAR_DATA_RESV flag to indicate 
>> btrfs_clear_bit_hook()
>> to update btrfs_space_info's bytes_may_use.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>
> There's no point in introducing only a flag in one patch, collapse 
> this into the patch that actually uses it.  Thanks,
OK

Regards,
Xiaoguang Wang
>
> Josef
>
>




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely
  2016-07-20 13:35   ` Josef Bacik
@ 2016-07-21  1:18     ` Wang Xiaoguang
  0 siblings, 0 replies; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-21  1:18 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: dsterba, holger

hello,

On 07/20/2016 09:35 PM, Josef Bacik wrote:
> On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
>> This patch can fix some false ENOSPC errors, below test script can
>> reproduce one false ENOSPC error:
>>     #!/bin/bash
>>     dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
>>     dev=$(losetup --show -f fs.img)
>>     mkfs.btrfs -f -M $dev
>>     mkdir /tmp/mntpoint
>>     mount $dev /tmp/mntpoint
>>     cd /tmp/mntpoint
>>     xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile
>>
>> Above script will fail for ENOSPC reason, but indeed fs still has free
>> space to satisfy this request. Please see call graph:
>> btrfs_fallocate()
>> |-> btrfs_alloc_data_chunk_ondemand()
>> |   bytes_may_use += 64M
>> |-> btrfs_prealloc_file_range()
>>     |-> btrfs_reserve_extent()
>>         |-> btrfs_add_reserved_bytes()
>>         |   alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
>>         |   change bytes_may_use, and bytes_reserved += 64M. Now
>>         |   bytes_may_use + bytes_reserved == 128M, which is greater
>>         |   than btrfs_space_info's total_bytes, false enospc occurs.
>>         |   Note, the bytes_may_use decrease operation will done in
>>         |   end of btrfs_fallocate(), which is too late.
>>
>> Here is another simple case for buffered write:
>>                     CPU 1              |              CPU 2
>>                                        |
>> |-> cow_file_range()                   |-> __btrfs_buffered_write()
>>     |-> btrfs_reserve_extent()         |   |
>>     |                                  |   |
>>     |                                  |   |
>>     |    .....                         |   |-> 
>> btrfs_check_data_free_space()
>>     |                                  |
>>     |                                  |
>>     |-> extent_clear_unlock_delalloc() |
>>
>> In CPU 1, btrfs_reserve_extent()->find_free_extent()->
>> btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
>> operation will be delayed to be done in extent_clear_unlock_delalloc().
>> Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
>> btrfs_check_data_free_space() tries to reserve 100MB data space.
>> If
>>     100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
>>         data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
>>         data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
>> btrfs_check_data_free_space() will try to allcate new data chunk or call
>> btrfs_start_delalloc_roots(), or commit current transaction inorder to
>> reserve some free space, obviously a lot of work. But indeed it's not
>> necessary as long as decreasing bytes_may_use timely, we still have
>> free space, decreasing 128M from bytes_may_use.
>>
>> To fix this issue, this patch chooses to update bytes_may_use for both
>> data and metadata in btrfs_add_reserved_bytes(). For compress path, real
>> extent length may not be equal to file content length, so introduce a
>> ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
>> btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
>> file content length. Then compress path can update bytes_may_use
>> correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, 
>> RESERVE_ALLOC
>> and RESERVE_FREE.
>>
>> For inode marked as NODATACOW or extent marked as PREALLOC, we can
>> directly call btrfs_free_reserved_data_space() to adjust bytes_may_use.
>>
>> Meanwhile __btrfs_prealloc_file_range() will call
>> btrfs_free_reserved_data_space() internally for both sucessful and 
>> failed
>> path, btrfs_prealloc_file_range()'s callers does not need to call
>> btrfs_free_reserved_data_space() any more.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>>  fs/btrfs/ctree.h       |  2 +-
>>  fs/btrfs/extent-tree.c | 56 
>> +++++++++++++++++---------------------------------
>>  fs/btrfs/file.c        | 26 +++++++++++++----------
>>  fs/btrfs/inode-map.c   |  3 +--
>>  fs/btrfs/inode.c       | 37 ++++++++++++++++++++++++---------
>>  fs/btrfs/relocation.c  | 11 ++++++++--
>>  6 files changed, 72 insertions(+), 63 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 4274a7b..7eb2913 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -2556,7 +2556,7 @@ int btrfs_alloc_logged_file_extent(struct 
>> btrfs_trans_handle *trans,
>>                     struct btrfs_root *root,
>>                     u64 root_objectid, u64 owner, u64 offset,
>>                     struct btrfs_key *ins);
>> -int btrfs_reserve_extent(struct btrfs_root *root, u64 num_bytes,
>> +int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, u64 
>> num_bytes,
>>               u64 min_alloc_size, u64 empty_size, u64 hint_byte,
>>               struct btrfs_key *ins, int is_data, int delalloc);
>>  int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct 
>> btrfs_root *root,
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 8eaac39..5447973 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -60,21 +60,6 @@ enum {
>>      CHUNK_ALLOC_FORCE = 2,
>>  };
>>
>> -/*
>> - * Control how reservations are dealt with.
>> - *
>> - * RESERVE_FREE - freeing a reservation.
>> - * RESERVE_ALLOC - allocating space and we need to update 
>> bytes_may_use for
>> - *   ENOSPC accounting
>> - * RESERVE_ALLOC_NO_ACCOUNT - allocating space and we should not update
>> - *   bytes_may_use as the ENOSPC accounting is done elsewhere
>> - */
>> -enum {
>> -    RESERVE_FREE = 0,
>> -    RESERVE_ALLOC = 1,
>> -    RESERVE_ALLOC_NO_ACCOUNT = 2,
>> -};
>> -
>>  static int update_block_group(struct btrfs_trans_handle *trans,
>>                    struct btrfs_root *root, u64 bytenr,
>>                    u64 num_bytes, int alloc);
>> @@ -105,7 +90,7 @@ static int find_next_key(struct btrfs_path *path, 
>> int level,
>>  static void dump_space_info(struct btrfs_space_info *info, u64 bytes,
>>                  int dump_block_groups);
>>  static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache 
>> *cache,
>> -                    u64 num_bytes, int reserve, int delalloc);
>> +                    u64 ram_bytes, u64 num_bytes, int delalloc);
>>  static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache 
>> *cache,
>>                       u64 num_bytes, int delalloc);
>>  static int block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv,
>> @@ -3491,7 +3476,6 @@ again:
>>          dcs = BTRFS_DC_SETUP;
>>      else if (ret == -ENOSPC)
>>          set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
>> -    btrfs_free_reserved_data_space(inode, 0, num_pages);
>>
>>  out_put:
>>      iput(inode);
>> @@ -6300,8 +6284,9 @@ void btrfs_wait_block_group_reservations(struct 
>> btrfs_block_group_cache *bg)
>>  /**
>>   * btrfs_add_reserved_bytes - update the block_group and space info 
>> counters
>>   * @cache:    The cache we are manipulating
>> + * @ram_bytes:  The number of bytes of file content, and will be 
>> same to
>> + *              @num_bytes except for the compress path.
>>   * @num_bytes:    The number of bytes in question
>> - * @reserve:    One of the reservation enums
>>   * @delalloc:   The blocks are allocated for the delalloc write
>>   *
>>   * This is called by the allocator when it reserves space. Metadata
>> @@ -6316,7 +6301,7 @@ void btrfs_wait_block_group_reservations(struct 
>> btrfs_block_group_cache *bg)
>>   * succeeds.
>>   */
>>  static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache 
>> *cache,
>> -                    u64 num_bytes, int reserve, int delalloc)
>> +                    u64 ram_bytes, u64 num_bytes, int delalloc)
>>  {
>>      struct btrfs_space_info *space_info = cache->space_info;
>>      int ret = 0;
>> @@ -6328,13 +6313,11 @@ static int btrfs_add_reserved_bytes(struct 
>> btrfs_block_group_cache *cache,
>>      } else {
>>          cache->reserved += num_bytes;
>>          space_info->bytes_reserved += num_bytes;
>> -        if (reserve == RESERVE_ALLOC) {
>> -            trace_btrfs_space_reservation(cache->fs_info,
>> -                    "space_info", space_info->flags,
>> -                    num_bytes, 0);
>> -            space_info->bytes_may_use -= num_bytes;
>> -        }
>>
>> +        trace_btrfs_space_reservation(cache->fs_info,
>> +                "space_info", space_info->flags,
>> +                num_bytes, 0);
>
> This needs to be ram_bytes to keep the accounting consistent for tools 
> that use these tracepoints.  Thanks,
OK, I'll fix this issue in later version.

Regards,
Xiaoguang Wang
>
> Josef
>
>




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
  2016-07-20 13:18   ` Josef Bacik
@ 2016-07-21  1:49     ` Wang Xiaoguang
  2016-07-21 13:05       ` Josef Bacik
  0 siblings, 1 reply; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-21  1:49 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: dsterba, holger

hello,

On 07/20/2016 09:18 PM, Josef Bacik wrote:
> On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
>> In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
>> wrong file offset for reloc_inode, it uses cluster->start and 
>> cluster->end,
>> which indeed are extent's bytenr. The correct value should be
>> cluster->[start|end] minus block group's start bytenr.
>>
>> start bytenr   cluster->start
>> |              |     extent      |   extent   | ...| extent |
>> |----------------------------------------------------------------|
>> |                block group reloc_inode |
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>>  fs/btrfs/relocation.c | 10 ++++++----
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index 0477dca..a0de885 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -3030,12 +3030,14 @@ int prealloc_file_extent_cluster(struct inode 
>> *inode,
>>      u64 num_bytes;
>>      int nr = 0;
>>      int ret = 0;
>> +    u64 prealloc_start = cluster->start - offset;
>> +    u64 prealloc_end = cluster->end - offset;
>>
>>      BUG_ON(cluster->start != cluster->boundary[0]);
>>      inode_lock(inode);
>>
>> -    ret = btrfs_check_data_free_space(inode, cluster->start,
>> -                      cluster->end + 1 - cluster->start);
>> +    ret = btrfs_check_data_free_space(inode, prealloc_start,
>> +                      prealloc_end + 1 - prealloc_start);
>>      if (ret)
>>          goto out;
>>
>> @@ -3056,8 +3058,8 @@ int prealloc_file_extent_cluster(struct inode 
>> *inode,
>>              break;
>>          nr++;
>>      }
>> -    btrfs_free_reserved_data_space(inode, cluster->start,
>> -                       cluster->end + 1 - cluster->start);
>> +    btrfs_free_reserved_data_space(inode, prealloc_start,
>> +                       prealloc_end + 1 - prealloc_start);
>>  out:
>>      inode_unlock(inode);
>>      return ret;
>>
>
> This ends up being the same amount.  Consider this scenario
>
> bg bytenr = 4096
> cluster->start = 8192
> cluster->end = 12287
>
> cluster->end + 1 - cluster->start = 4096
>
> prealloc_start = cluster->start - offset = 0
> prealloc_end = cluster->end - offset = 8191
>
> prealloc_end + 1 - prealloc_start = 4096
>
> You shift both by the same amount, which gives you the same answer.  
> Thanks,
Thanks for reviewing.
Yes, I know the amount of preallocated data space is the same, this patch
does not fix any bugs :)

For every block group to be balanced, we create a corresponding inode.
For this inode, the initial offset should be 0. In your above example,
before this patch, it's btrfs_free_reserved_data_space(inode, 8192, 4096);
with this patch, it's btrfs_free_reserved_data_space(inode, 4096, 4096).

I just want to make btrfs_free_reserved_data_space()'s 'start' argument
be offset inside block group, not offset inside whole fs byternr space. 
I'm not
a  English native, hope that I have expressed what I want to :)

But yes, I'm also OK with removing this patch.

Regards,
Xiaoguang Wang

>
> Josef
>
>




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue
  2016-07-20  8:46 ` [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Holger Hoffstätte
@ 2016-07-21  1:51   ` Wang Xiaoguang
  0 siblings, 0 replies; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-21  1:51 UTC (permalink / raw)
  To: Holger Hoffstätte, linux-btrfs; +Cc: dsterba, jbacik

hello,

On 07/20/2016 04:46 PM, Holger Hoffstätte wrote:
> On 07/20/16 07:56, Wang Xiaoguang wrote:
>> Currently in btrfs, for data space reservation, it does not update
>> bytes_may_use in btrfs_update_reserved_bytes() and the decrease operation
>> will be delayed to be done in extent_clear_unlock_delalloc(), for
>> fallocate(2), decrease operation is even delayed to be done in end
>> of btrfs_fallocate(), which is too late. Obviously such delay will
>> cause unnecessary pressure to enospc system, in [PATCH 4/4], there is
>> a simpel test script that can reveal such false ENOSPC bug.
>>
>> So in this patch set, it will remove RESERVE_FREE, RESERVE_ALLOC and
>> RESERVE_ALLOC_NO_ACCOUNT, and we always update bytes_may_use timely.
>>
>> I'll also commit a fstests test case for this issue.
>>
>> Wang Xiaoguang (4):
>>    btrfs: use correct offset for reloc_inode in
>>      prealloc_file_extent_cluster()
>>    btrfs: divide btrfs_update_reserved_bytes() into two functions
>>    btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag
>>    btrfs: update btrfs_space_info's bytes_may_use timely
>>
> Just like the previous version, for all 4 patches:
>
> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
>
> via the reproducer script & some very large manual fallocates.
Yes, thanks very much for your test.
I also run the whole fstests before sending this patch set :)

Regards,
Xiaoguang Wang
>
> thanks!
> Holger
>
>
>




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
  2016-07-21  1:49     ` Wang Xiaoguang
@ 2016-07-21 13:05       ` Josef Bacik
  0 siblings, 0 replies; 15+ messages in thread
From: Josef Bacik @ 2016-07-21 13:05 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger

On 07/20/2016 09:49 PM, Wang Xiaoguang wrote:
> hello,
>
> On 07/20/2016 09:18 PM, Josef Bacik wrote:
>> On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
>>> In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
>>> wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
>>> which indeed are extent's bytenr. The correct value should be
>>> cluster->[start|end] minus block group's start bytenr.
>>>
>>> start bytenr   cluster->start
>>> |              |     extent      |   extent   | ...| extent |
>>> |----------------------------------------------------------------|
>>> |                block group reloc_inode |
>>>
>>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>> ---
>>>  fs/btrfs/relocation.c | 10 ++++++----
>>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>>> index 0477dca..a0de885 100644
>>> --- a/fs/btrfs/relocation.c
>>> +++ b/fs/btrfs/relocation.c
>>> @@ -3030,12 +3030,14 @@ int prealloc_file_extent_cluster(struct inode *inode,
>>>      u64 num_bytes;
>>>      int nr = 0;
>>>      int ret = 0;
>>> +    u64 prealloc_start = cluster->start - offset;
>>> +    u64 prealloc_end = cluster->end - offset;
>>>
>>>      BUG_ON(cluster->start != cluster->boundary[0]);
>>>      inode_lock(inode);
>>>
>>> -    ret = btrfs_check_data_free_space(inode, cluster->start,
>>> -                      cluster->end + 1 - cluster->start);
>>> +    ret = btrfs_check_data_free_space(inode, prealloc_start,
>>> +                      prealloc_end + 1 - prealloc_start);
>>>      if (ret)
>>>          goto out;
>>>
>>> @@ -3056,8 +3058,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
>>>              break;
>>>          nr++;
>>>      }
>>> -    btrfs_free_reserved_data_space(inode, cluster->start,
>>> -                       cluster->end + 1 - cluster->start);
>>> +    btrfs_free_reserved_data_space(inode, prealloc_start,
>>> +                       prealloc_end + 1 - prealloc_start);
>>>  out:
>>>      inode_unlock(inode);
>>>      return ret;
>>>
>>
>> This ends up being the same amount.  Consider this scenario
>>
>> bg bytenr = 4096
>> cluster->start = 8192
>> cluster->end = 12287
>>
>> cluster->end + 1 - cluster->start = 4096
>>
>> prealloc_start = cluster->start - offset = 0
>> prealloc_end = cluster->end - offset = 8191
>>
>> prealloc_end + 1 - prealloc_start = 4096
>>
>> You shift both by the same amount, which gives you the same answer.  Thanks,
> Thanks for reviewing.
> Yes, I know the amount of preallocated data space is the same, this patch
> does not fix any bugs :)
>
> For every block group to be balanced, we create a corresponding inode.
> For this inode, the initial offset should be 0. In your above example,
> before this patch, it's btrfs_free_reserved_data_space(inode, 8192, 4096);
> with this patch, it's btrfs_free_reserved_data_space(inode, 4096, 4096).
>
> I just want to make btrfs_free_reserved_data_space()'s 'start' argument
> be offset inside block group, not offset inside whole fs byternr space. I'm not
> a  English native, hope that I have expressed what I want to :)
>
> But yes, I'm also OK with removing this patch.
>

Oh I see, ok that's fine if we're just trying to make it look sane then go for it.

Signed-off-by: Josef Bacik <jbacik@fb.com>

Thanks,

Josef


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-07-21 13:05 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-20  5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
2016-07-20  5:56 ` [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
2016-07-20 13:18   ` Josef Bacik
2016-07-21  1:49     ` Wang Xiaoguang
2016-07-21 13:05       ` Josef Bacik
2016-07-20  5:56 ` [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions Wang Xiaoguang
2016-07-20 13:21   ` Josef Bacik
2016-07-20  5:56 ` [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag Wang Xiaoguang
2016-07-20 13:22   ` Josef Bacik
2016-07-21  1:15     ` Wang Xiaoguang
2016-07-20  5:56 ` [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
2016-07-20 13:35   ` Josef Bacik
2016-07-21  1:18     ` Wang Xiaoguang
2016-07-20  8:46 ` [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Holger Hoffstätte
2016-07-21  1:51   ` Wang Xiaoguang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.