All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues
@ 2016-11-11  8:39 Wang Xiaoguang
  2016-11-11  8:39 ` [PATCH 1/3] btrfs: improve inode's outstanding_extents computation Wang Xiaoguang
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Wang Xiaoguang @ 2016-11-11  8:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, jbacik, dsterba, holger, s.priebe

When having compression enabled, Stefan Priebe ofen got enospc errors
though fs still has much free space. Qu Wenruo also has submitted a
fstests test case which can reproduce this bug steadily, please see
url: https://patchwork.kernel.org/patch/9420527

First patch[1/3] "btrfs: improve inode's outstanding_extents computation" is to
fix outstanding_extents and reserved_extents account issues. This issue was revealed
by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, When modifying
BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often gets these warnings from
btrfs_destroy_inode():
        WARN_ON(BTRFS_I(inode)->outstanding_extents);
        WARN_ON(BTRFS_I(inode)->reserved_extents);
Please see this patch's commit message for detailed info, and this patch is
necessary to patch2 and patch3.

For false enospc, the root reasson is that for compression, its max extent size will
be 128k, not 128MB. If we still use 128MB as max extent size to reserve metadata for
compression, obviously it's not appropriate. In patch "btrfs: Introduce COMPRESS
reserve type to fix false enospc for compression" commit message,
we explain why false enospc error occurs, please see it for detailed info.

To fix this issue, we introduce a new enum type:
	enum btrfs_metadata_reserve_type {
		BTRFS_RESERVE_NORMAL,
        	BTRFS_RESERVE_COMPRESS,
	};
For btrfs_delalloc_[reserve|release]_metadata() and
btrfs_delalloc_[reserve|release]_space(), we introce a new btrfs_metadata_reserve_type
argument, then if a path needs to go compression, we pass BTRFS_RESERVE_COMPRESS,
otherwise pass BTRFS_RESERVE_NORMAL.

With these patchs, Stefan no longer saw such false enospc errors, and Qu Wenruo's
fstests test case will also pass. I have also run whole fstests multiple times,
no regression occurs, thanks.

Wang Xiaoguang (3):
  btrfs: improve inode's outstanding_extents computation
  btrfs: introduce type based delalloc metadata reserve
  btrfs: Introduce COMPRESS reserve type to fix false enospc for
    compression

 fs/btrfs/ctree.h             |  36 +++++--
 fs/btrfs/extent-tree.c       |  52 ++++++---
 fs/btrfs/extent_io.c         |  61 ++++++++++-
 fs/btrfs/extent_io.h         |   5 +
 fs/btrfs/file.c              |  25 +++--
 fs/btrfs/free-space-cache.c  |   6 +-
 fs/btrfs/inode-map.c         |   6 +-
 fs/btrfs/inode.c             | 246 ++++++++++++++++++++++++++++++++++---------
 fs/btrfs/ioctl.c             |  16 +--
 fs/btrfs/relocation.c        |  14 ++-
 fs/btrfs/tests/inode-tests.c |  15 +--
 11 files changed, 381 insertions(+), 101 deletions(-)

-- 
2.7.4




^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] btrfs: improve inode's outstanding_extents computation
  2016-11-11  8:39 [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues Wang Xiaoguang
@ 2016-11-11  8:39 ` Wang Xiaoguang
  2017-01-03 21:00   ` Liu Bo
  2016-11-11  8:39 ` [PATCH 2/3] btrfs: introduce type based delalloc metadata reserve Wang Xiaoguang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Wang Xiaoguang @ 2016-11-11  8:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, jbacik, dsterba, holger, s.priebe

This issue was revealed by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB,
When modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often
gets these warnings from btrfs_destroy_inode():
	WARN_ON(BTRFS_I(inode)->outstanding_extents);
	WARN_ON(BTRFS_I(inode)->reserved_extents);

Simple test program below can reproduce this issue steadily.
Note: you need to modify BTRFS_MAX_EXTENT_SIZE to 64KB to have test,
otherwise there won't be such WARNING.
	#include <string.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <fcntl.h>

	int main(void)
	{
		int fd;
		char buf[68 *1024];

		memset(buf, 0, 68 * 1024);
		fd = open("testfile", O_CREAT | O_EXCL | O_RDWR);
		pwrite(fd, buf, 68 * 1024, 64 * 1024);
		return;
	}

When BTRFS_MAX_EXTENT_SIZE is 64KB, and buffered data range is:
64KB						128K		132KB
|-----------------------------------------------|---------------|
                         64 + 4KB

1) for above data range, btrfs_delalloc_reserve_metadata() will reserve
metadata and set BTRFS_I(inode)->outstanding_extents to 2.
(68KB + 64KB - 1) / 64KB == 2

Outstanding_extents: 2

2) then btrfs_dirty_page() will be called to dirty pages and set
EXTENT_DELALLOC flag. In this case, btrfs_set_bit_hook() will be called
twice.
The 1st set_bit_hook() call will set DEALLOC flag for the first 64K.
64KB						128KB
|-----------------------------------------------|
	64KB DELALLOC
Outstanding_extents: 2

Set_bit_hooks() uses FIRST_DELALLOC flag to avoid re-increase
outstanding_extents counter.
So for 1st set_bit_hooks() call, it won't modify outstanding_extents,
it's still 2.

Then FIRST_DELALLOC flag is *CLEARED*.

3) 2nd btrfs_set_bit_hook() call.
Because FIRST_DELALLOC have been cleared by previous set_bit_hook(),
btrfs_set_bit_hook() will increase BTRFS_I(inode)->outstanding_extents by
one, so now BTRFS_I(inode)->outstanding_extents is 3.
64KB                                            128KB            132KB
|-----------------------------------------------|----------------|
	64K DELALLOC				   4K DELALLOC
Outstanding_extents: 3

But the correct outstanding_extents number should be 2, not 3.
The 2nd btrfs_set_bit_hook() call just screwed up this, and leads to the
WARN_ON().

Normally, we can solve it by only increasing outstanding_extents in
set_bit_hook().
But the problem is for delalloc_reserve/release_metadata(), we only have
a 'length' parameter, and calculate in-accurate outstanding_extents.
If we only rely on set_bit_hook() release_metadata() will crew things up
as it will decrease inaccurate number.

So the fix we use is:
1) Increase *INACCURATE* outstanding_extents at delalloc_reserve_meta
   Just as a place holder.
2) Increase *accurate* outstanding_extents at set_bit_hooks()
   This is the real increaser.
3) Decrease *INACCURATE* outstanding_extents before returning
   This makes outstanding_extents to correct value.

For 128M BTRFS_MAX_EXTENT_SIZE, due to limitation of
__btrfs_buffered_write(), each iteration will only handle about 2MB
data.
So btrfs_dirty_pages() won't need to handle cases cross 2 extents.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/ctree.h |  2 ++
 fs/btrfs/inode.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++------
 fs/btrfs/ioctl.c |  6 ++----
 3 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9d8edcb..766d152 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3139,6 +3139,8 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput,
 			       int nr);
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
 			      struct extent_state **cached_state, int dedupe);
+int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
+			    struct extent_state **cached_state);
 int btrfs_create_subvol_root(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *new_root,
 			     struct btrfs_root *parent_root,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1f980ef..25e0083 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1601,6 +1601,9 @@ static void btrfs_split_extent_hook(struct inode *inode,
 	if (!(orig->state & EXTENT_DELALLOC))
 		return;
 
+	if (btrfs_is_free_space_inode(inode))
+		return;
+
 	size = orig->end - orig->start + 1;
 	if (size > BTRFS_MAX_EXTENT_SIZE) {
 		u64 num_extents;
@@ -1643,6 +1646,9 @@ static void btrfs_merge_extent_hook(struct inode *inode,
 	if (!(other->state & EXTENT_DELALLOC))
 		return;
 
+	if (btrfs_is_free_space_inode(inode))
+		return;
+
 	if (new->start > other->start)
 		new_size = new->end - other->start + 1;
 	else
@@ -1738,7 +1744,6 @@ static void btrfs_del_delalloc_inode(struct btrfs_root *root,
 static void btrfs_set_bit_hook(struct inode *inode,
 			       struct extent_state *state, unsigned *bits)
 {
-
 	if ((*bits & EXTENT_DEFRAG) && !(*bits & EXTENT_DELALLOC))
 		WARN_ON(1);
 	/*
@@ -1749,13 +1754,16 @@ static void btrfs_set_bit_hook(struct inode *inode,
 	if (!(state->state & EXTENT_DELALLOC) && (*bits & EXTENT_DELALLOC)) {
 		struct btrfs_root *root = BTRFS_I(inode)->root;
 		u64 len = state->end + 1 - state->start;
+		u64 num_extents = div64_u64(len + BTRFS_MAX_EXTENT_SIZE - 1,
+					    BTRFS_MAX_EXTENT_SIZE);
 		bool do_list = !btrfs_is_free_space_inode(inode);
 
-		if (*bits & EXTENT_FIRST_DELALLOC) {
+		if (*bits & EXTENT_FIRST_DELALLOC)
 			*bits &= ~EXTENT_FIRST_DELALLOC;
-		} else {
+
+		if (do_list) {
 			spin_lock(&BTRFS_I(inode)->lock);
-			BTRFS_I(inode)->outstanding_extents++;
+			BTRFS_I(inode)->outstanding_extents += num_extents;
 			spin_unlock(&BTRFS_I(inode)->lock);
 		}
 
@@ -1803,7 +1811,7 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 
 		if (*bits & EXTENT_FIRST_DELALLOC) {
 			*bits &= ~EXTENT_FIRST_DELALLOC;
-		} else if (!(*bits & EXTENT_DO_ACCOUNTING)) {
+		} else if (!(*bits & EXTENT_DO_ACCOUNTING) && do_list) {
 			spin_lock(&BTRFS_I(inode)->lock);
 			BTRFS_I(inode)->outstanding_extents -= num_extents;
 			spin_unlock(&BTRFS_I(inode)->lock);
@@ -2001,9 +2009,52 @@ static noinline int add_pending_csums(struct btrfs_trans_handle *trans,
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
 			      struct extent_state **cached_state, int dedupe)
 {
+	int ret;
+	u64 num_extents = div64_u64(end - start + BTRFS_MAX_EXTENT_SIZE,
+				    BTRFS_MAX_EXTENT_SIZE);
+
+	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
+	ret = set_extent_delalloc(&BTRFS_I(inode)->io_tree, start, end,
+				  cached_state);
+
+	/*
+	 * btrfs_delalloc_reserve_metadata() will first add number of
+	 * outstanding extents according to data length, which is inaccurate
+	 * for case like dirtying already dirty pages.
+	 * so here we will decrease such inaccurate numbers, to make
+	 * outstanding_extents only rely on the correct values added by
+	 * set_bit_hook()
+	 *
+	 * Also, we skipped the metadata space reserve for space cache inodes,
+	 * so don't modify the outstanding_extents value.
+	 */
+	if (ret == 0 && !btrfs_is_free_space_inode(inode)) {
+		spin_lock(&BTRFS_I(inode)->lock);
+		BTRFS_I(inode)->outstanding_extents -= num_extents;
+		spin_unlock(&BTRFS_I(inode)->lock);
+	}
+
+	return ret;
+}
+
+int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
+			    struct extent_state **cached_state)
+{
+	int ret;
+	u64 num_extents = div64_u64(end - start + BTRFS_MAX_EXTENT_SIZE,
+				    BTRFS_MAX_EXTENT_SIZE);
+
 	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
-	return set_extent_delalloc(&BTRFS_I(inode)->io_tree, start, end,
-				   cached_state);
+	ret = set_extent_defrag(&BTRFS_I(inode)->io_tree, start, end,
+				cached_state);
+
+	if (ret == 0 && !btrfs_is_free_space_inode(inode)) {
+		spin_lock(&BTRFS_I(inode)->lock);
+		BTRFS_I(inode)->outstanding_extents -= num_extents;
+		spin_unlock(&BTRFS_I(inode)->lock);
+	}
+
+	return ret;
 }
 
 /* see btrfs_writepage_start_hook for details on why this is required */
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7163457..eff5eae 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1235,10 +1235,8 @@ again:
 				(page_cnt - i_done) << PAGE_SHIFT);
 	}
 
-
-	set_extent_defrag(&BTRFS_I(inode)->io_tree, page_start, page_end - 1,
-			  &cached_state);
-
+	btrfs_set_extent_defrag(inode, page_start,
+				page_end - 1, &cached_state);
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 			     page_start, page_end - 1, &cached_state,
 			     GFP_NOFS);
-- 
2.7.4




^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] btrfs: introduce type based delalloc metadata reserve
  2016-11-11  8:39 [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues Wang Xiaoguang
  2016-11-11  8:39 ` [PATCH 1/3] btrfs: improve inode's outstanding_extents computation Wang Xiaoguang
@ 2016-11-11  8:39 ` Wang Xiaoguang
  2016-11-11  8:39 ` [PATCH 3/3] btrfs: Introduce COMPRESS reserve type to fix false enospc for compression Wang Xiaoguang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Wang Xiaoguang @ 2016-11-11  8:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, jbacik, dsterba, holger, s.priebe, Qu Wenruo

Introduce type based metadata reserve parameter for delalloc space
reservation/freeing function.

The problem we are going to solve is, btrfs use different max extent
size for different mount options.

For compression, the max extent size is 128K, while for non-compress write
it's 128M.
And further more, split/merge extent hook highly depends that max extent
size.

Such situation contributes to quite a lot of false ENOSPC.

So this patch introduce the facility to help solve these false ENOSPC
related to different max extent size.

Currently only normal 128M extent size is supported. More types will
follow soon.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h             |  34 ++++++++++---
 fs/btrfs/extent-tree.c       |  50 +++++++++++++------
 fs/btrfs/file.c              |  22 ++++++---
 fs/btrfs/free-space-cache.c  |   6 ++-
 fs/btrfs/inode-map.c         |   6 ++-
 fs/btrfs/inode.c             | 114 ++++++++++++++++++++++++++++---------------
 fs/btrfs/ioctl.c             |  10 ++--
 fs/btrfs/relocation.c        |  11 +++--
 fs/btrfs/tests/inode-tests.c |  15 +++---
 9 files changed, 180 insertions(+), 88 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 766d152..64c63c2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -100,6 +100,19 @@ static const int btrfs_csum_sizes[] = { 4 };
 
 #define BTRFS_MAX_EXTENT_SIZE SZ_128M
 
+/*
+ * Type based metadata reserve type
+ * This affects how btrfs reserve metadata space for buffered write.
+ *
+ * This is caused by the different max extent size for normal COW
+ * and compression, and further in-band dedupe
+ */
+enum btrfs_metadata_reserve_type {
+	BTRFS_RESERVE_NORMAL,
+};
+
+u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type);
+
 struct btrfs_mapping_tree {
 	struct extent_map_tree map_tree;
 };
@@ -2693,10 +2706,14 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 void btrfs_subvolume_release_metadata(struct btrfs_root *root,
 				      struct btrfs_block_rsv *rsv,
 				      u64 qgroup_reserved);
-int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
-void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
-void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
+int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes,
+			enum btrfs_metadata_reserve_type reserve_type);
+void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes,
+			enum btrfs_metadata_reserve_type reserve_type);
+int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len,
+			enum btrfs_metadata_reserve_type reserve_type);
+void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len,
+			enum btrfs_metadata_reserve_type reserve_type);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
 					      unsigned short type);
@@ -3138,9 +3155,11 @@ int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput);
 int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput,
 			       int nr);
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
-			      struct extent_state **cached_state, int dedupe);
+			      struct extent_state **cached_state,
+			      enum btrfs_metadata_reserve_type reserve_type);
 int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
-			    struct extent_state **cached_state);
+			    struct extent_state **cached_state,
+			    enum btrfs_metadata_reserve_type reserve_type);
 int btrfs_create_subvol_root(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *new_root,
 			     struct btrfs_root *parent_root,
@@ -3233,7 +3252,8 @@ int btrfs_release_file(struct inode *inode, struct file *file);
 int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
 		      struct page **pages, size_t num_pages,
 		      loff_t pos, size_t write_bytes,
-		      struct extent_state **cached);
+		      struct extent_state **cached,
+		      enum btrfs_metadata_reserve_type reserve_type);
 int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
 ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
 			      struct file *file_out, loff_t pos_out,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4607af3..bd99f25 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5841,15 +5841,16 @@ void btrfs_subvolume_release_metadata(struct btrfs_root *root,
  * reserved extents that need to be freed.  This must be called with
  * BTRFS_I(inode)->lock held.
  */
-static unsigned drop_outstanding_extent(struct inode *inode, u64 num_bytes)
+static unsigned drop_outstanding_extent(struct inode *inode, u64 num_bytes,
+				enum btrfs_metadata_reserve_type reserve_type)
 {
 	unsigned drop_inode_space = 0;
 	unsigned dropped_extents = 0;
 	unsigned num_extents = 0;
+	u64 max_extent_size = btrfs_max_extent_size(reserve_type);
 
-	num_extents = (unsigned)div64_u64(num_bytes +
-					  BTRFS_MAX_EXTENT_SIZE - 1,
-					  BTRFS_MAX_EXTENT_SIZE);
+	num_extents = (unsigned)div64_u64(num_bytes + max_extent_size - 1,
+					  max_extent_size);
 	ASSERT(num_extents);
 	ASSERT(BTRFS_I(inode)->outstanding_extents >= num_extents);
 	BTRFS_I(inode)->outstanding_extents -= num_extents;
@@ -5919,7 +5920,17 @@ static u64 calc_csum_metadata_size(struct inode *inode, u64 num_bytes,
 	return btrfs_calc_trans_metadata_size(root, old_csums - num_csums);
 }
 
-int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
+u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type)
+{
+	if (reserve_type == BTRFS_RESERVE_NORMAL)
+		return BTRFS_MAX_EXTENT_SIZE;
+
+	ASSERT(0);
+	return BTRFS_MAX_EXTENT_SIZE;
+}
+
+int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes,
+			enum btrfs_metadata_reserve_type reserve_type)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_block_rsv *block_rsv = &root->fs_info->delalloc_block_rsv;
@@ -5930,6 +5941,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	int ret = 0;
 	bool delalloc_lock = true;
 	u64 to_free = 0;
+	u64 max_extent_size = btrfs_max_extent_size(reserve_type);
 	unsigned dropped;
 	bool release_extra = false;
 
@@ -5958,9 +5970,8 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	num_bytes = ALIGN(num_bytes, root->sectorsize);
 
 	spin_lock(&BTRFS_I(inode)->lock);
-	nr_extents = (unsigned)div64_u64(num_bytes +
-					 BTRFS_MAX_EXTENT_SIZE - 1,
-					 BTRFS_MAX_EXTENT_SIZE);
+	nr_extents = (unsigned)div64_u64(num_bytes + max_extent_size - 1,
+					 max_extent_size);
 	BTRFS_I(inode)->outstanding_extents += nr_extents;
 
 	nr_extents = 0;
@@ -6011,7 +6022,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 
 out_fail:
 	spin_lock(&BTRFS_I(inode)->lock);
-	dropped = drop_outstanding_extent(inode, num_bytes);
+	dropped = drop_outstanding_extent(inode, num_bytes, reserve_type);
 	/*
 	 * If the inodes csum_bytes is the same as the original
 	 * csum_bytes then we know we haven't raced with any free()ers
@@ -6077,12 +6088,15 @@ out_fail:
  * btrfs_delalloc_release_metadata - release a metadata reservation for an inode
  * @inode: the inode to release the reservation for
  * @num_bytes: the number of bytes we're releasing
+ * @reserve_type: the type when we reserve delalloc space for this range.
+ *                must be the same passed to btrfs_delalloc_reserve_metadata()
  *
  * This will release the metadata reservation for an inode.  This can be called
  * once we complete IO for a given set of bytes to release their metadata
  * reservations.
  */
-void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
+void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes,
+			enum btrfs_metadata_reserve_type reserve_type)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	u64 to_free = 0;
@@ -6090,7 +6104,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
 
 	num_bytes = ALIGN(num_bytes, root->sectorsize);
 	spin_lock(&BTRFS_I(inode)->lock);
-	dropped = drop_outstanding_extent(inode, num_bytes);
+	dropped = drop_outstanding_extent(inode, num_bytes, reserve_type);
 
 	if (num_bytes)
 		to_free = calc_csum_metadata_size(inode, num_bytes, 0);
@@ -6114,6 +6128,8 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
  * @inode: inode we're writing to
  * @start: start range we are writing to
  * @len: how long the range we are writing to
+ * @reserve_type: the type of write we're reserving for.
+ *		  determine the max extent size.
  *
  * This will do the following things
  *
@@ -6131,14 +6147,15 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
  * Return 0 for success
  * Return <0 for error(-ENOSPC or -EQUOT)
  */
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
+int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len,
+				 enum btrfs_metadata_reserve_type reserve_type)
 {
 	int ret;
 
 	ret = btrfs_check_data_free_space(inode, start, len);
 	if (ret < 0)
 		return ret;
-	ret = btrfs_delalloc_reserve_metadata(inode, len);
+	ret = btrfs_delalloc_reserve_metadata(inode, len, reserve_type);
 	if (ret < 0)
 		btrfs_free_reserved_data_space(inode, start, len);
 	return ret;
@@ -6149,6 +6166,8 @@ int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
  * @inode: inode we're releasing space for
  * @start: start position of the space already reserved
  * @len: the len of the space already reserved
+ * @reserve_type: the type of write we're releasing for
+ *		  must match the type passed to btrfs_delalloc_reserve_space()
  *
  * This must be matched with a call to btrfs_delalloc_reserve_space.  This is
  * called in the case that we don't need the metadata AND data reservations
@@ -6159,9 +6178,10 @@ int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
  * list if there are no delalloc bytes left.
  * Also it will handle the qgroup reserved space.
  */
-void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len)
+void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len,
+			enum btrfs_metadata_reserve_type reserve_type)
 {
-	btrfs_delalloc_release_metadata(inode, len);
+	btrfs_delalloc_release_metadata(inode, len, reserve_type);
 	btrfs_free_reserved_data_space(inode, start, len);
 }
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 72a180d..f174381 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -488,7 +488,8 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
 int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
 			     struct page **pages, size_t num_pages,
 			     loff_t pos, size_t write_bytes,
-			     struct extent_state **cached)
+			     struct extent_state **cached,
+			     enum btrfs_metadata_reserve_type reserve_type)
 {
 	int err = 0;
 	int i;
@@ -503,7 +504,7 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
 
 	end_of_last_block = start_pos + num_bytes - 1;
 	err = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block,
-					cached, 0);
+					cached, reserve_type);
 	if (err)
 		return err;
 
@@ -1521,6 +1522,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 	bool only_release_metadata = false;
 	bool force_page_uptodate = false;
 	bool need_unlock;
+	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
 
 	nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
 			PAGE_SIZE / (sizeof(struct page *)));
@@ -1583,7 +1585,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 			}
 		}
 
-		ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes);
+		ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes,
+						      reserve_type);
 		if (ret) {
 			if (!only_release_metadata)
 				btrfs_free_reserved_data_space(inode, pos,
@@ -1666,14 +1669,16 @@ again:
 			}
 			if (only_release_metadata) {
 				btrfs_delalloc_release_metadata(inode,
-								release_bytes);
+								release_bytes,
+								reserve_type);
 			} else {
 				u64 __pos;
 
 				__pos = round_down(pos, root->sectorsize) +
 					(dirty_pages << PAGE_SHIFT);
 				btrfs_delalloc_release_space(inode, __pos,
-							     release_bytes);
+							     release_bytes,
+							     reserve_type);
 			}
 		}
 
@@ -1683,7 +1688,7 @@ again:
 		if (copied > 0)
 			ret = btrfs_dirty_pages(root, inode, pages,
 						dirty_pages, pos, copied,
-						NULL);
+						NULL, reserve_type);
 		if (need_unlock)
 			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 					     lockstart, lockend, &cached_state,
@@ -1724,11 +1729,12 @@ again:
 	if (release_bytes) {
 		if (only_release_metadata) {
 			btrfs_end_write_no_snapshoting(root);
-			btrfs_delalloc_release_metadata(inode, release_bytes);
+			btrfs_delalloc_release_metadata(inode, release_bytes,
+							reserve_type);
 		} else {
 			btrfs_delalloc_release_space(inode,
 						round_down(pos, root->sectorsize),
-						release_bytes);
+						release_bytes, reserve_type);
 		}
 	}
 
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index e4b48f3..1e433f7 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1297,7 +1297,8 @@ static int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 
 	/* Everything is written out, now we dirty the pages in the file. */
 	ret = btrfs_dirty_pages(root, inode, io_ctl->pages, io_ctl->num_pages,
-				0, i_size_read(inode), &cached_state);
+				0, i_size_read(inode), &cached_state,
+				BTRFS_RESERVE_NORMAL);
 	if (ret)
 		goto out_nospc;
 
@@ -3536,7 +3537,8 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
 
 	if (ret) {
 		if (release_metadata)
-			btrfs_delalloc_release_metadata(inode, inode->i_size);
+			btrfs_delalloc_release_metadata(inode, inode->i_size,
+							BTRFS_RESERVE_NORMAL);
 #ifdef DEBUG
 		btrfs_err(root->fs_info,
 			"failed to write free ino cache for root %llu",
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index d27014b..0335862 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -491,14 +491,16 @@ again:
 	/* Just to make sure we have enough space */
 	prealloc += 8 * PAGE_SIZE;
 
-	ret = btrfs_delalloc_reserve_space(inode, 0, prealloc);
+	ret = btrfs_delalloc_reserve_space(inode, 0, prealloc,
+					   BTRFS_RESERVE_NORMAL);
 	if (ret)
 		goto out_put;
 
 	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
 					      prealloc, prealloc, &alloc_hint);
 	if (ret) {
-		btrfs_delalloc_release_metadata(inode, prealloc);
+		btrfs_delalloc_release_metadata(inode, prealloc,
+						BTRFS_RESERVE_NORMAL);
 		goto out_put;
 	}
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 25e0083..906ab27 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -315,7 +315,8 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
 	}
 
 	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(inode)->runtime_flags);
-	btrfs_delalloc_release_metadata(inode, end + 1 - start);
+	btrfs_delalloc_release_metadata(inode, end + 1 - start,
+					BTRFS_RESERVE_NORMAL);
 	btrfs_drop_extent_cache(inode, start, aligned_end - 1, 0);
 out:
 	/*
@@ -1596,6 +1597,8 @@ static void btrfs_split_extent_hook(struct inode *inode,
 				    struct extent_state *orig, u64 split)
 {
 	u64 size;
+	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
+	u64 max_extent_size;
 
 	/* not delalloc, ignore it */
 	if (!(orig->state & EXTENT_DELALLOC))
@@ -1604,8 +1607,10 @@ static void btrfs_split_extent_hook(struct inode *inode,
 	if (btrfs_is_free_space_inode(inode))
 		return;
 
+	max_extent_size = btrfs_max_extent_size(reserve_type);
+
 	size = orig->end - orig->start + 1;
-	if (size > BTRFS_MAX_EXTENT_SIZE) {
+	if (size > max_extent_size) {
 		u64 num_extents;
 		u64 new_size;
 
@@ -1614,13 +1619,13 @@ static void btrfs_split_extent_hook(struct inode *inode,
 		 * applies here, just in reverse.
 		 */
 		new_size = orig->end - split + 1;
-		num_extents = div64_u64(new_size + BTRFS_MAX_EXTENT_SIZE - 1,
-					BTRFS_MAX_EXTENT_SIZE);
+		num_extents = div64_u64(new_size + max_extent_size - 1,
+					max_extent_size);
 		new_size = split - orig->start;
-		num_extents += div64_u64(new_size + BTRFS_MAX_EXTENT_SIZE - 1,
-					BTRFS_MAX_EXTENT_SIZE);
-		if (div64_u64(size + BTRFS_MAX_EXTENT_SIZE - 1,
-			      BTRFS_MAX_EXTENT_SIZE) >= num_extents)
+		num_extents += div64_u64(new_size + max_extent_size - 1,
+					 max_extent_size);
+		if (div64_u64(size + max_extent_size - 1,
+			      max_extent_size) >= num_extents)
 			return;
 	}
 
@@ -1641,6 +1646,8 @@ static void btrfs_merge_extent_hook(struct inode *inode,
 {
 	u64 new_size, old_size;
 	u64 num_extents;
+	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
+	u64 max_extent_size;
 
 	/* not delalloc, ignore it */
 	if (!(other->state & EXTENT_DELALLOC))
@@ -1649,13 +1656,15 @@ static void btrfs_merge_extent_hook(struct inode *inode,
 	if (btrfs_is_free_space_inode(inode))
 		return;
 
+	max_extent_size = btrfs_max_extent_size(reserve_type);
+
 	if (new->start > other->start)
 		new_size = new->end - other->start + 1;
 	else
 		new_size = other->end - new->start + 1;
 
 	/* we're not bigger than the max, unreserve the space and go */
-	if (new_size <= BTRFS_MAX_EXTENT_SIZE) {
+	if (new_size <= max_extent_size) {
 		spin_lock(&BTRFS_I(inode)->lock);
 		BTRFS_I(inode)->outstanding_extents--;
 		spin_unlock(&BTRFS_I(inode)->lock);
@@ -1681,14 +1690,14 @@ static void btrfs_merge_extent_hook(struct inode *inode,
 	 * this case.
 	 */
 	old_size = other->end - other->start + 1;
-	num_extents = div64_u64(old_size + BTRFS_MAX_EXTENT_SIZE - 1,
-				BTRFS_MAX_EXTENT_SIZE);
+	num_extents = div64_u64(old_size + max_extent_size - 1,
+				max_extent_size);
 	old_size = new->end - new->start + 1;
-	num_extents += div64_u64(old_size + BTRFS_MAX_EXTENT_SIZE - 1,
-				 BTRFS_MAX_EXTENT_SIZE);
+	num_extents += div64_u64(old_size + max_extent_size - 1,
+				 max_extent_size);
 
-	if (div64_u64(new_size + BTRFS_MAX_EXTENT_SIZE - 1,
-		      BTRFS_MAX_EXTENT_SIZE) >= num_extents)
+	if (div64_u64(new_size + max_extent_size - 1,
+		      max_extent_size) >= num_extents)
 		return;
 
 	spin_lock(&BTRFS_I(inode)->lock);
@@ -1754,10 +1763,16 @@ static void btrfs_set_bit_hook(struct inode *inode,
 	if (!(state->state & EXTENT_DELALLOC) && (*bits & EXTENT_DELALLOC)) {
 		struct btrfs_root *root = BTRFS_I(inode)->root;
 		u64 len = state->end + 1 - state->start;
-		u64 num_extents = div64_u64(len + BTRFS_MAX_EXTENT_SIZE - 1,
-					    BTRFS_MAX_EXTENT_SIZE);
+		enum btrfs_metadata_reserve_type reserve_type =
+					BTRFS_RESERVE_NORMAL;
+		u64 max_extent_size;
+		u64 num_extents;
 		bool do_list = !btrfs_is_free_space_inode(inode);
 
+		max_extent_size = btrfs_max_extent_size(reserve_type);
+		num_extents = div64_u64(len + max_extent_size - 1,
+					max_extent_size);
+
 		if (*bits & EXTENT_FIRST_DELALLOC)
 			*bits &= ~EXTENT_FIRST_DELALLOC;
 
@@ -1792,8 +1807,9 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 				 unsigned *bits)
 {
 	u64 len = state->end + 1 - state->start;
-	u64 num_extents = div64_u64(len + BTRFS_MAX_EXTENT_SIZE -1,
-				    BTRFS_MAX_EXTENT_SIZE);
+	u64 max_extent_size;
+	u64 num_extents;
+	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
 
 	spin_lock(&BTRFS_I(inode)->lock);
 	if ((state->state & EXTENT_DEFRAG) && (*bits & EXTENT_DEFRAG))
@@ -1809,6 +1825,10 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 		struct btrfs_root *root = BTRFS_I(inode)->root;
 		bool do_list = !btrfs_is_free_space_inode(inode);
 
+		max_extent_size = btrfs_max_extent_size(reserve_type);
+		num_extents = div64_u64(len + max_extent_size - 1,
+					max_extent_size);
+
 		if (*bits & EXTENT_FIRST_DELALLOC) {
 			*bits &= ~EXTENT_FIRST_DELALLOC;
 		} else if (!(*bits & EXTENT_DO_ACCOUNTING) && do_list) {
@@ -1824,7 +1844,8 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 		 */
 		if (*bits & EXTENT_DO_ACCOUNTING &&
 		    root != root->fs_info->tree_root)
-			btrfs_delalloc_release_metadata(inode, len);
+			btrfs_delalloc_release_metadata(inode, len,
+							reserve_type);
 
 		/* For sanity tests. */
 		if (btrfs_is_testing(root->fs_info))
@@ -2007,11 +2028,13 @@ static noinline int add_pending_csums(struct btrfs_trans_handle *trans,
 }
 
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
-			      struct extent_state **cached_state, int dedupe)
+			      struct extent_state **cached_state,
+			      enum btrfs_metadata_reserve_type reserve_type)
 {
 	int ret;
-	u64 num_extents = div64_u64(end - start + BTRFS_MAX_EXTENT_SIZE,
-				    BTRFS_MAX_EXTENT_SIZE);
+	u64 max_extent_size = btrfs_max_extent_size(reserve_type);
+	u64 num_extents = div64_u64(end - start + max_extent_size,
+				    max_extent_size);
 
 	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
 	ret = set_extent_delalloc(&BTRFS_I(inode)->io_tree, start, end,
@@ -2038,11 +2061,13 @@ int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
 }
 
 int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
-			    struct extent_state **cached_state)
+			    struct extent_state **cached_state,
+			    enum btrfs_metadata_reserve_type reserve_type)
 {
 	int ret;
-	u64 num_extents = div64_u64(end - start + BTRFS_MAX_EXTENT_SIZE,
-				    BTRFS_MAX_EXTENT_SIZE);
+	u64 max_extent_size = btrfs_max_extent_size(reserve_type);
+	u64 num_extents = div64_u64(end - start + max_extent_size,
+				    max_extent_size);
 
 	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
 	ret = set_extent_defrag(&BTRFS_I(inode)->io_tree, start, end,
@@ -2073,6 +2098,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 	u64 page_start;
 	u64 page_end;
 	int ret;
+	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
 
 	fixup = container_of(work, struct btrfs_writepage_fixup, work);
 	page = fixup->page;
@@ -2106,7 +2132,7 @@ again:
 	}
 
 	ret = btrfs_delalloc_reserve_space(inode, page_start,
-					   PAGE_SIZE);
+					   PAGE_SIZE, reserve_type);
 	if (ret) {
 		mapping_set_error(page->mapping, ret);
 		end_extent_writepage(page, ret, page_start, page_end);
@@ -2115,7 +2141,7 @@ again:
 	 }
 
 	btrfs_set_extent_delalloc(inode, page_start, page_end, &cached_state,
-				  0);
+				  reserve_type);
 	ClearPageChecked(page);
 	set_page_dirty(page);
 out:
@@ -2925,6 +2951,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 	u64 logical_len = ordered_extent->len;
 	bool nolock;
 	bool truncated = false;
+	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
 
 	nolock = btrfs_is_free_space_inode(inode);
 
@@ -3048,7 +3075,8 @@ out_unlock:
 			     ordered_extent->len - 1, &cached_state, GFP_NOFS);
 out:
 	if (root != root->fs_info->tree_root)
-		btrfs_delalloc_release_metadata(inode, ordered_extent->len);
+		btrfs_delalloc_release_metadata(inode, ordered_extent->len,
+						reserve_type);
 	if (trans)
 		btrfs_end_transaction(trans, root);
 
@@ -4762,13 +4790,14 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	int ret = 0;
 	u64 block_start;
 	u64 block_end;
+	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
 
 	if ((offset & (blocksize - 1)) == 0 &&
 	    (!len || ((len & (blocksize - 1)) == 0)))
 		goto out;
 
 	ret = btrfs_delalloc_reserve_space(inode,
-			round_down(from, blocksize), blocksize);
+			round_down(from, blocksize), blocksize, reserve_type);
 	if (ret)
 		goto out;
 
@@ -4777,7 +4806,7 @@ again:
 	if (!page) {
 		btrfs_delalloc_release_space(inode,
 				round_down(from, blocksize),
-				blocksize);
+				blocksize, reserve_type);
 		ret = -ENOMEM;
 		goto out;
 	}
@@ -4820,7 +4849,7 @@ again:
 			  0, 0, &cached_state, GFP_NOFS);
 
 	ret = btrfs_set_extent_delalloc(inode, block_start, block_end,
-					&cached_state, 0);
+					&cached_state, reserve_type);
 	if (ret) {
 		unlock_extent_cached(io_tree, block_start, block_end,
 				     &cached_state, GFP_NOFS);
@@ -4848,7 +4877,7 @@ again:
 out_unlock:
 	if (ret)
 		btrfs_delalloc_release_space(inode, block_start,
-					     blocksize);
+					     blocksize, reserve_type);
 	unlock_page(page);
 	put_page(page);
 out:
@@ -8743,7 +8772,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 			inode_unlock(inode);
 			relock = true;
 		}
-		ret = btrfs_delalloc_reserve_space(inode, offset, count);
+		ret = btrfs_delalloc_reserve_space(inode, offset, count,
+						   BTRFS_RESERVE_NORMAL);
 		if (ret)
 			goto out;
 		dio_data.outstanding_extents = div64_u64(count +
@@ -8775,7 +8805,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		if (ret < 0 && ret != -EIOCBQUEUED) {
 			if (dio_data.reserve)
 				btrfs_delalloc_release_space(inode, offset,
-							     dio_data.reserve);
+					dio_data.reserve, BTRFS_RESERVE_NORMAL);
 			/*
 			 * On error we might have left some ordered extents
 			 * without submitting corresponding bios for them, so
@@ -8791,7 +8821,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 					0);
 		} else if (ret >= 0 && (size_t)ret < count)
 			btrfs_delalloc_release_space(inode, offset,
-						     count - (size_t)ret);
+						     count - (size_t)ret,
+						     BTRFS_RESERVE_NORMAL);
 	}
 out:
 	if (wakeup)
@@ -9030,6 +9061,7 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
+	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
 	char *kaddr;
 	unsigned long zero_start;
 	loff_t size;
@@ -9056,7 +9088,7 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	 * being processed by btrfs_page_mkwrite() function.
 	 */
 	ret = btrfs_delalloc_reserve_space(inode, page_start,
-					   reserved_space);
+					   reserved_space, reserve_type);
 	if (!ret) {
 		ret = file_update_time(vma->vm_file);
 		reserved = 1;
@@ -9108,7 +9140,8 @@ again:
 			BTRFS_I(inode)->outstanding_extents++;
 			spin_unlock(&BTRFS_I(inode)->lock);
 			btrfs_delalloc_release_space(inode, page_start,
-						PAGE_SIZE - reserved_space);
+						PAGE_SIZE - reserved_space,
+						reserve_type);
 		}
 	}
 
@@ -9125,7 +9158,7 @@ again:
 			  0, 0, &cached_state, GFP_NOFS);
 
 	ret = btrfs_set_extent_delalloc(inode, page_start, end,
-					&cached_state, 0);
+					&cached_state, reserve_type);
 	if (ret) {
 		unlock_extent_cached(io_tree, page_start, page_end,
 				     &cached_state, GFP_NOFS);
@@ -9163,7 +9196,8 @@ out_unlock:
 	}
 	unlock_page(page);
 out:
-	btrfs_delalloc_release_space(inode, page_start, reserved_space);
+	btrfs_delalloc_release_space(inode, page_start, reserved_space,
+				     reserve_type);
 out_noreserve:
 	sb_end_pagefault(inode->i_sb);
 	return ret;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index eff5eae..870737b 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1131,6 +1131,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
 	struct extent_io_tree *tree;
+	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
 	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
 
 	file_end = (isize - 1) >> PAGE_SHIFT;
@@ -1141,7 +1142,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 
 	ret = btrfs_delalloc_reserve_space(inode,
 			start_index << PAGE_SHIFT,
-			page_cnt << PAGE_SHIFT);
+			page_cnt << PAGE_SHIFT, reserve_type);
 	if (ret)
 		return ret;
 	i_done = 0;
@@ -1232,11 +1233,12 @@ again:
 		spin_unlock(&BTRFS_I(inode)->lock);
 		btrfs_delalloc_release_space(inode,
 				start_index << PAGE_SHIFT,
-				(page_cnt - i_done) << PAGE_SHIFT);
+				(page_cnt - i_done) << PAGE_SHIFT,
+				reserve_type);
 	}
 
 	btrfs_set_extent_defrag(inode, page_start,
-				page_end - 1, &cached_state);
+				page_end - 1, &cached_state, reserve_type);
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 			     page_start, page_end - 1, &cached_state,
 			     GFP_NOFS);
@@ -1257,7 +1259,7 @@ out:
 	}
 	btrfs_delalloc_release_space(inode,
 			start_index << PAGE_SHIFT,
-			page_cnt << PAGE_SHIFT);
+			page_cnt << PAGE_SHIFT, reserve_type);
 	return ret;
 
 }
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index c4af0cd..0e4b370 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3148,6 +3148,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
 	unsigned long last_index;
 	struct page *page;
 	struct file_ra_state *ra;
+	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
 	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
 	int nr = 0;
 	int ret = 0;
@@ -3173,7 +3174,8 @@ static int relocate_file_extent_cluster(struct inode *inode,
 	index = (cluster->start - offset) >> PAGE_SHIFT;
 	last_index = (cluster->end - offset) >> PAGE_SHIFT;
 	while (index <= last_index) {
-		ret = btrfs_delalloc_reserve_metadata(inode, PAGE_SIZE);
+		ret = btrfs_delalloc_reserve_metadata(inode, PAGE_SIZE,
+						      reserve_type);
 		if (ret)
 			goto out;
 
@@ -3186,7 +3188,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
 						   mask);
 			if (!page) {
 				btrfs_delalloc_release_metadata(inode,
-							PAGE_SIZE);
+						PAGE_SIZE, reserve_type);
 				ret = -ENOMEM;
 				goto out;
 			}
@@ -3205,7 +3207,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
 				unlock_page(page);
 				put_page(page);
 				btrfs_delalloc_release_metadata(inode,
-							PAGE_SIZE);
+						PAGE_SIZE, reserve_type);
 				ret = -EIO;
 				goto out;
 			}
@@ -3226,7 +3228,8 @@ static int relocate_file_extent_cluster(struct inode *inode,
 			nr++;
 		}
 
-		btrfs_set_extent_delalloc(inode, page_start, page_end, NULL, 0);
+		btrfs_set_extent_delalloc(inode, page_start, page_end, NULL,
+					  reserve_type);
 		set_page_dirty(page);
 
 		unlock_extent(&BTRFS_I(inode)->io_tree,
diff --git a/fs/btrfs/tests/inode-tests.c b/fs/btrfs/tests/inode-tests.c
index 0bf4680..6edd44d 100644
--- a/fs/btrfs/tests/inode-tests.c
+++ b/fs/btrfs/tests/inode-tests.c
@@ -942,6 +942,7 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
 	struct btrfs_fs_info *fs_info = NULL;
 	struct inode *inode = NULL;
 	struct btrfs_root *root = NULL;
+	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
 	int ret = -ENOMEM;
 
 	inode = btrfs_new_test_inode();
@@ -968,7 +969,7 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
 	/* [BTRFS_MAX_EXTENT_SIZE] */
 	BTRFS_I(inode)->outstanding_extents++;
 	ret = btrfs_set_extent_delalloc(inode, 0, BTRFS_MAX_EXTENT_SIZE - 1,
-					NULL, 0);
+					NULL, reserve_type);
 	if (ret) {
 		test_msg("btrfs_set_extent_delalloc returned %d\n", ret);
 		goto out;
@@ -984,7 +985,7 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
 	BTRFS_I(inode)->outstanding_extents++;
 	ret = btrfs_set_extent_delalloc(inode, BTRFS_MAX_EXTENT_SIZE,
 					BTRFS_MAX_EXTENT_SIZE + sectorsize - 1,
-					NULL, 0);
+					NULL, reserve_type);
 	if (ret) {
 		test_msg("btrfs_set_extent_delalloc returned %d\n", ret);
 		goto out;
@@ -1019,7 +1020,7 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
 	ret = btrfs_set_extent_delalloc(inode, BTRFS_MAX_EXTENT_SIZE >> 1,
 					(BTRFS_MAX_EXTENT_SIZE >> 1)
 					+ sectorsize - 1,
-					NULL, 0);
+					NULL, reserve_type);
 	if (ret) {
 		test_msg("btrfs_set_extent_delalloc returned %d\n", ret);
 		goto out;
@@ -1042,7 +1043,7 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
 	ret = btrfs_set_extent_delalloc(inode,
 			BTRFS_MAX_EXTENT_SIZE + 2 * sectorsize,
 			(BTRFS_MAX_EXTENT_SIZE << 1) + 3 * sectorsize - 1,
-			NULL, 0);
+			NULL, reserve_type);
 	if (ret) {
 		test_msg("btrfs_set_extent_delalloc returned %d\n", ret);
 		goto out;
@@ -1060,7 +1061,8 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
 	BTRFS_I(inode)->outstanding_extents++;
 	ret = btrfs_set_extent_delalloc(inode,
 			BTRFS_MAX_EXTENT_SIZE + sectorsize,
-			BTRFS_MAX_EXTENT_SIZE + 2 * sectorsize - 1, NULL, 0);
+			BTRFS_MAX_EXTENT_SIZE + 2 * sectorsize - 1, NULL,
+			reserve_type);
 	if (ret) {
 		test_msg("btrfs_set_extent_delalloc returned %d\n", ret);
 		goto out;
@@ -1097,7 +1099,8 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
 	BTRFS_I(inode)->outstanding_extents++;
 	ret = btrfs_set_extent_delalloc(inode,
 			BTRFS_MAX_EXTENT_SIZE + sectorsize,
-			BTRFS_MAX_EXTENT_SIZE + 2 * sectorsize - 1, NULL, 0);
+			BTRFS_MAX_EXTENT_SIZE + 2 * sectorsize - 1, NULL,
+			reserve_type);
 	if (ret) {
 		test_msg("btrfs_set_extent_delalloc returned %d\n", ret);
 		goto out;
-- 
2.7.4




^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] btrfs: Introduce COMPRESS reserve type to fix false enospc for compression
  2016-11-11  8:39 [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues Wang Xiaoguang
  2016-11-11  8:39 ` [PATCH 1/3] btrfs: improve inode's outstanding_extents computation Wang Xiaoguang
  2016-11-11  8:39 ` [PATCH 2/3] btrfs: introduce type based delalloc metadata reserve Wang Xiaoguang
@ 2016-11-11  8:39 ` Wang Xiaoguang
  2016-11-22  9:46 ` [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues Wang Xiaoguang
  2016-12-31  7:31 ` Stefan Priebe - Profihost AG
  4 siblings, 0 replies; 17+ messages in thread
From: Wang Xiaoguang @ 2016-11-11  8:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, jbacik, dsterba, holger, s.priebe, Qu Wenruo

When testing btrfs compression, sometimes we got ENOSPC error, though fs
still has much free space, xfstests generic/171, generic/172, generic/173,
generic/174, generic/175 can reveal this bug in my test environment when
compression is enabled.

After some debuging work, we found that it's
btrfs_delalloc_reserve_metadata() which sometimes tries to reserve too
much metadata space, even for very small data range.

In btrfs_delalloc_reserve_metadata(), the number of metadata bytes to
reserve is calculated by the difference between outstanding extents and
reserved extents.
But due to bad designed drop_outstanding_extent() function, it can make
the difference too big, and cause problem.

The problem happens in the following flow with compression enabled.

1) Buffered write 128M data with 128K blocksize
   outstanding_extents = 1
   reserved_extents = 1024 (128M / 128K, one blocksize will get one
                            reserved_extent)

   Note: it's btrfs_merge_extent_hook() to merge outstanding extents.
         But reserved extents are still 1024.

2) Allocate extents for dirty range
   cow_file_range_async() split above large extent into small 128K
   extents.
   Let's assume 2 compressed extents have been split.

   So we have:
   outstanding_extents = 3
   reserved_extents = 1024

   range [0, 256K) has extents allocated

3) One ordered extent get finished
   btrfs_finish_ordered_io()
   |- btrfs_delalloc_release_metadata()
      |- drop_outstanding_extent()

   drop_outstanding_extent() will free *ALL* redundant reserved extents.
   So we have:
   outstanding_extents = 2 (One has finished)
   reserved_extents = 2

4) Continue allocating extents for dirty range
   cow_file_range_async() continue handling the remaining range.

   When the whole 128M range is done and assume no more ordered extents
   have finished.
   outstanding_extents = 1023 (One has finished in Step 3)
   reserved_extents = 2 (*ALL* freed in Step 3)

5) Another buffered write happens to the file
   btrfs_delalloc_reserve_metadata() will calculate metadata space.

   The calculation is:
   meta_to_reserve = (outstanding_extents - reserved_extents) * \
		     nodesize * max_tree_level(8) * 2

   If nodesize is 16K, it's 1021 * 16K * 8 * 2, near 256M.
   If nodesize is 64K, it's about 1G.

   That's totally insane.

The fix is to introduce new reserve type, COMPRESSION, to info outstanding
extents calculation algorithm, to get correct outstanding_extents based
extent size.

So in Step 1), outstanding_extents = 1024 reserved_extents = 1024
Step 2): outstanding_extents = 1024 reserved_extents = 1024
Step 3): outstanding_extents = 1023 reserved_extents = 1023

And in Step 5) we reserve correct amount of metadata space.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/extent-tree.c |  2 ++
 fs/btrfs/extent_io.c   | 61 ++++++++++++++++++++++++++++++++--
 fs/btrfs/extent_io.h   |  5 +++
 fs/btrfs/file.c        |  3 ++
 fs/btrfs/inode.c       | 89 ++++++++++++++++++++++++++++++++++++++++++--------
 fs/btrfs/ioctl.c       |  2 ++
 fs/btrfs/relocation.c  |  3 ++
 8 files changed, 152 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 64c63c2..6097bf0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -109,9 +109,11 @@ static const int btrfs_csum_sizes[] = { 4 };
  */
 enum btrfs_metadata_reserve_type {
 	BTRFS_RESERVE_NORMAL,
+	BTRFS_RESERVE_COMPRESS,
 };
 
 u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type);
+int inode_need_compress(struct inode *inode);
 
 struct btrfs_mapping_tree {
 	struct extent_map_tree map_tree;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index bd99f25..d5691b0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5924,6 +5924,8 @@ u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type)
 {
 	if (reserve_type == BTRFS_RESERVE_NORMAL)
 		return BTRFS_MAX_EXTENT_SIZE;
+	else if (reserve_type == BTRFS_RESERVE_COMPRESS)
+		return SZ_128K;
 
 	ASSERT(0);
 	return BTRFS_MAX_EXTENT_SIZE;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8ed05d9..854379d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -603,7 +603,7 @@ static int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	btrfs_debug_check_extent_io_range(tree, start, end);
 
 	if (bits & EXTENT_DELALLOC)
-		bits |= EXTENT_NORESERVE;
+		bits |= EXTENT_NORESERVE | EXTENT_COMPRESS;
 
 	if (delete)
 		bits |= ~EXTENT_CTLBITS;
@@ -742,6 +742,60 @@ out:
 
 }
 
+static void adjust_one_outstanding_extent(struct inode *inode, u64 len,
+				enum btrfs_metadata_reserve_type reserve_type)
+{
+	unsigned old_extents, new_extents;
+	u64 max_extent_size = btrfs_max_extent_size(reserve_type);
+
+	old_extents = div64_u64(len + max_extent_size - 1, max_extent_size);
+	new_extents = div64_u64(len + BTRFS_MAX_EXTENT_SIZE - 1,
+				BTRFS_MAX_EXTENT_SIZE);
+	if (old_extents <= new_extents)
+		return;
+
+	spin_lock(&BTRFS_I(inode)->lock);
+	BTRFS_I(inode)->outstanding_extents -= old_extents - new_extents;
+	spin_unlock(&BTRFS_I(inode)->lock);
+}
+
+/*
+ * For a extent with EXTENT_COMPRESS flag, if later it does not go through
+ * compress path, we need to adjust the number of outstanding_extents.
+ * It's because for extent with EXTENT_COMPRESS flag, its number of outstanding
+ * extents is calculated by 128KB, so here we need to adjust it.
+ */
+void adjust_outstanding_extents(struct inode *inode, u64 start, u64 end,
+				enum btrfs_metadata_reserve_type reserve_type)
+{
+	struct rb_node *node;
+	struct extent_state *state;
+	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
+
+	spin_lock(&tree->lock);
+	node = tree_search(tree, start);
+	if (!node)
+		goto out;
+
+	while (1) {
+		state = rb_entry(node, struct extent_state, rb_node);
+		if (state->start > end)
+			goto out;
+		/*
+		 * The whole range is locked, so we can safely clear
+		 * EXTENT_COMPRESS flag.
+		 */
+		state->state &= ~EXTENT_COMPRESS;
+		adjust_one_outstanding_extent(inode,
+				state->end - state->start + 1, reserve_type);
+		node = rb_next(node);
+		if (!node)
+			break;
+	}
+out:
+	spin_unlock(&tree->lock);
+}
+
 static void wait_on_state(struct extent_io_tree *tree,
 			  struct extent_state *state)
 		__releases(tree->lock)
@@ -1504,6 +1558,7 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
 	u64 cur_start = *start;
 	u64 found = 0;
 	u64 total_bytes = 0;
+	unsigned pre_state;
 
 	spin_lock(&tree->lock);
 
@@ -1521,7 +1576,8 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
 	while (1) {
 		state = rb_entry(node, struct extent_state, rb_node);
 		if (found && (state->start != cur_start ||
-			      (state->state & EXTENT_BOUNDARY))) {
+			      (state->state & EXTENT_BOUNDARY) ||
+			      (state->state ^ pre_state) & EXTENT_COMPRESS)) {
 			goto out;
 		}
 		if (!(state->state & EXTENT_DELALLOC)) {
@@ -1537,6 +1593,7 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
 		found++;
 		*end = state->end;
 		cur_start = state->end + 1;
+		pre_state = state->state;
 		node = rb_next(node);
 		total_bytes += state->end - state->start + 1;
 		if (total_bytes >= max_bytes)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index ab31d14..09dbdd7 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -21,6 +21,7 @@
 #define EXTENT_NORESERVE	(1U << 15)
 #define EXTENT_QGROUP_RESERVED	(1U << 16)
 #define EXTENT_CLEAR_DATA_RESV	(1U << 17)
+#define EXTENT_COMPRESS		(1U << 18)
 #define EXTENT_IOBITS		(EXTENT_LOCKED | EXTENT_WRITEBACK)
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
 
@@ -248,6 +249,10 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		     unsigned bits, int wake, int delete,
 		     struct extent_state **cached, gfp_t mask);
 
+enum btrfs_metadata_reserve_type;
+void adjust_outstanding_extents(struct inode *inode, u64 start, u64 end,
+		enum btrfs_metadata_reserve_type reserve_type);
+
 static inline int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end)
 {
 	return clear_extent_bit(tree, start, end, EXTENT_LOCKED, 1, 0, NULL,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f174381..4abd280 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1532,6 +1532,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 	if (!pages)
 		return -ENOMEM;
 
+	if (inode_need_compress(inode))
+		reserve_type = BTRFS_RESERVE_COMPRESS;
+
 	while (iov_iter_count(i) > 0) {
 		size_t offset = pos & (PAGE_SIZE - 1);
 		size_t sector_offset;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 906ab27..f3ea0e0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -372,7 +372,7 @@ static noinline int add_async_extent(struct async_cow *cow,
 	return 0;
 }
 
-static inline int inode_need_compress(struct inode *inode)
+int inode_need_compress(struct inode *inode)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 
@@ -711,6 +711,17 @@ retry:
 					 async_extent->start +
 					 async_extent->ram_size - 1);
 
+			/*
+			 * We use 128KB as max extent size to calculate number
+			 * of outstanding extents for this extent before, now
+			 * it'll go throuth uncompressed IO, we need to use
+			 * 128MB as max extent size to re-calculate number of
+			 * outstanding extents for this extent.
+			 */
+			adjust_outstanding_extents(inode, async_extent->start,
+						   async_extent->start +
+						   async_extent->ram_size - 1,
+						   BTRFS_RESERVE_COMPRESS);
 			/* allocate blocks */
 			ret = cow_file_range(inode, async_cow->locked_page,
 					     async_extent->start,
@@ -1153,7 +1164,8 @@ static noinline void async_cow_free(struct btrfs_work *work)
 
 static int cow_file_range_async(struct inode *inode, struct page *locked_page,
 				u64 start, u64 end, int *page_started,
-				unsigned long *nr_written)
+				unsigned long *nr_written,
+				enum btrfs_metadata_reserve_type reserve_type)
 {
 	struct async_cow *async_cow;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -1171,11 +1183,10 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
 		async_cow->locked_page = locked_page;
 		async_cow->start = start;
 
-		if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS &&
-		    !btrfs_test_opt(root->fs_info, FORCE_COMPRESS))
-			cur_end = end;
-		else
+		if (reserve_type == BTRFS_RESERVE_COMPRESS)
 			cur_end = min(end, start + SZ_512K - 1);
+		else
+			ASSERT(0);
 
 		async_cow->end = cur_end;
 		INIT_LIST_HEAD(&async_cow->extents);
@@ -1574,21 +1585,37 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page,
 {
 	int ret;
 	int force_cow = need_force_cow(inode, start, end);
+	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+	int need_compress;
+	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
+
+	need_compress = test_range_bit(io_tree, start, end,
+				       EXTENT_COMPRESS, 1, NULL);
+	if (need_compress)
+		reserve_type = BTRFS_RESERVE_COMPRESS;
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW && !force_cow) {
+		if (need_compress)
+			adjust_outstanding_extents(inode, start, end,
+						   reserve_type);
+
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 1, nr_written);
 	} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) {
+		if (need_compress)
+			adjust_outstanding_extents(inode, start, end,
+						   reserve_type);
+
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 0, nr_written);
-	} else if (!inode_need_compress(inode)) {
+	} else if (!need_compress) {
 		ret = cow_file_range(inode, locked_page, start, end, end,
 				      page_started, nr_written, 1, NULL);
 	} else {
 		set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
 			&BTRFS_I(inode)->runtime_flags);
 		ret = cow_file_range_async(inode, locked_page, start, end,
-					   page_started, nr_written);
+				page_started, nr_written, reserve_type);
 	}
 	return ret;
 }
@@ -1607,6 +1634,8 @@ static void btrfs_split_extent_hook(struct inode *inode,
 	if (btrfs_is_free_space_inode(inode))
 		return;
 
+	if (orig->state & EXTENT_COMPRESS)
+		reserve_type = BTRFS_RESERVE_COMPRESS;
 	max_extent_size = btrfs_max_extent_size(reserve_type);
 
 	size = orig->end - orig->start + 1;
@@ -1656,6 +1685,8 @@ static void btrfs_merge_extent_hook(struct inode *inode,
 	if (btrfs_is_free_space_inode(inode))
 		return;
 
+	if (other->state & EXTENT_COMPRESS)
+		reserve_type = BTRFS_RESERVE_COMPRESS;
 	max_extent_size = btrfs_max_extent_size(reserve_type);
 
 	if (new->start > other->start)
@@ -1769,6 +1800,8 @@ static void btrfs_set_bit_hook(struct inode *inode,
 		u64 num_extents;
 		bool do_list = !btrfs_is_free_space_inode(inode);
 
+		if (*bits & EXTENT_COMPRESS)
+			reserve_type = BTRFS_RESERVE_COMPRESS;
 		max_extent_size = btrfs_max_extent_size(reserve_type);
 		num_extents = div64_u64(len + max_extent_size - 1,
 					max_extent_size);
@@ -1825,6 +1858,8 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 		struct btrfs_root *root = BTRFS_I(inode)->root;
 		bool do_list = !btrfs_is_free_space_inode(inode);
 
+		if (state->state & EXTENT_COMPRESS)
+			reserve_type = BTRFS_RESERVE_COMPRESS;
 		max_extent_size = btrfs_max_extent_size(reserve_type);
 		num_extents = div64_u64(len + max_extent_size - 1,
 					max_extent_size);
@@ -2027,18 +2062,30 @@ static noinline int add_pending_csums(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
+/*
+ * Normally flag should be 0, but if a data range will go through compress path,
+ * set flag to 1. Note: here we should ensure enum btrfs_metadata_reserve_type
+ * and flag's values are consistent.
+ */
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
 			      struct extent_state **cached_state,
 			      enum btrfs_metadata_reserve_type reserve_type)
 {
 	int ret;
+	unsigned bits;
 	u64 max_extent_size = btrfs_max_extent_size(reserve_type);
 	u64 num_extents = div64_u64(end - start + max_extent_size,
 				    max_extent_size);
 
+	/* compression path */
+	if (reserve_type == BTRFS_RESERVE_COMPRESS)
+		bits = EXTENT_DELALLOC | EXTENT_COMPRESS | EXTENT_UPTODATE;
+	else
+		bits = EXTENT_DELALLOC | EXTENT_UPTODATE;
+
 	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
-	ret = set_extent_delalloc(&BTRFS_I(inode)->io_tree, start, end,
-				  cached_state);
+	ret = set_extent_bit(&BTRFS_I(inode)->io_tree, start, end,
+			     bits, NULL, cached_state, GFP_NOFS);
 
 	/*
 	 * btrfs_delalloc_reserve_metadata() will first add number of
@@ -2065,14 +2112,20 @@ int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
 			    enum btrfs_metadata_reserve_type reserve_type)
 {
 	int ret;
+	unsigned bits;
 	u64 max_extent_size = btrfs_max_extent_size(reserve_type);
 	u64 num_extents = div64_u64(end - start + max_extent_size,
 				    max_extent_size);
 
 	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
-	ret = set_extent_defrag(&BTRFS_I(inode)->io_tree, start, end,
-				cached_state);
+	if (reserve_type == BTRFS_RESERVE_COMPRESS)
+		bits = EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG |
+				EXTENT_COMPRESS;
+	else
+		bits = EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG;
 
+	ret = set_extent_bit(&BTRFS_I(inode)->io_tree, start, end,
+			     bits, NULL, cached_state, GFP_NOFS);
 	if (ret == 0 && !btrfs_is_free_space_inode(inode)) {
 		spin_lock(&BTRFS_I(inode)->lock);
 		BTRFS_I(inode)->outstanding_extents -= num_extents;
@@ -2131,6 +2184,8 @@ again:
 		goto again;
 	}
 
+	if (inode_need_compress(inode))
+		reserve_type = BTRFS_RESERVE_COMPRESS;
 	ret = btrfs_delalloc_reserve_space(inode, page_start,
 					   PAGE_SIZE, reserve_type);
 	if (ret) {
@@ -3029,8 +3084,11 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 
 	trans->block_rsv = &root->fs_info->delalloc_block_rsv;
 
-	if (test_bit(BTRFS_ORDERED_COMPRESSED, &ordered_extent->flags))
+	if (test_bit(BTRFS_ORDERED_COMPRESSED, &ordered_extent->flags)) {
 		compress_type = ordered_extent->compress_type;
+		reserve_type = BTRFS_RESERVE_COMPRESS;
+	}
+
 	if (test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) {
 		BUG_ON(compress_type);
 		ret = btrfs_mark_extent_written(trans, inode,
@@ -4792,6 +4850,9 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	u64 block_end;
 	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
 
+	if (inode_need_compress(inode))
+		reserve_type = BTRFS_RESERVE_COMPRESS;
+
 	if ((offset & (blocksize - 1)) == 0 &&
 	    (!len || ((len & (blocksize - 1)) == 0)))
 		goto out;
@@ -9079,6 +9140,8 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	page_end = page_start + PAGE_SIZE - 1;
 	end = page_end;
 
+	if (inode_need_compress(inode))
+		reserve_type = BTRFS_RESERVE_COMPRESS;
 	/*
 	 * Reserving delalloc space after obtaining the page lock can lead to
 	 * deadlock. For example, if a dirty page is locked by this function
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 870737b..492ec2e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1140,6 +1140,8 @@ static int cluster_pages_for_defrag(struct inode *inode,
 
 	page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
 
+	if (inode_need_compress(inode))
+		reserve_type = BTRFS_RESERVE_COMPRESS;
 	ret = btrfs_delalloc_reserve_space(inode,
 			start_index << PAGE_SHIFT,
 			page_cnt << PAGE_SHIFT, reserve_type);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0e4b370..08cfb47 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3156,6 +3156,9 @@ static int relocate_file_extent_cluster(struct inode *inode,
 	if (!cluster->nr)
 		return 0;
 
+	if (inode_need_compress(inode))
+		reserve_type = BTRFS_RESERVE_COMPRESS;
+
 	ra = kzalloc(sizeof(*ra), GFP_NOFS);
 	if (!ra)
 		return -ENOMEM;
-- 
2.7.4




^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues
  2016-11-11  8:39 [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues Wang Xiaoguang
                   ` (2 preceding siblings ...)
  2016-11-11  8:39 ` [PATCH 3/3] btrfs: Introduce COMPRESS reserve type to fix false enospc for compression Wang Xiaoguang
@ 2016-11-22  9:46 ` Wang Xiaoguang
  2016-12-31  7:31 ` Stefan Priebe - Profihost AG
  4 siblings, 0 replies; 17+ messages in thread
From: Wang Xiaoguang @ 2016-11-22  9:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, jbacik, dsterba, holger, s.priebe

hello,

Any comments about this patchset?

The first two patches almost don't do any functional change which I think
could be review firstly.
     btrfs: improve inode's outstanding_extents computation
     btrfs: introduce type based delalloc metadata reserve


Regards,
Xiaoguang Wang

On 11/11/2016 04:39 PM, Wang Xiaoguang wrote:
> When having compression enabled, Stefan Priebe ofen got enospc errors
> though fs still has much free space. Qu Wenruo also has submitted a
> fstests test case which can reproduce this bug steadily, please see
> url: https://patchwork.kernel.org/patch/9420527
>
> First patch[1/3] "btrfs: improve inode's outstanding_extents computation" is to
> fix outstanding_extents and reserved_extents account issues. This issue was revealed
> by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, When modifying
> BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often gets these warnings from
> btrfs_destroy_inode():
>          WARN_ON(BTRFS_I(inode)->outstanding_extents);
>          WARN_ON(BTRFS_I(inode)->reserved_extents);
> Please see this patch's commit message for detailed info, and this patch is
> necessary to patch2 and patch3.
>
> For false enospc, the root reasson is that for compression, its max extent size will
> be 128k, not 128MB. If we still use 128MB as max extent size to reserve metadata for
> compression, obviously it's not appropriate. In patch "btrfs: Introduce COMPRESS
> reserve type to fix false enospc for compression" commit message,
> we explain why false enospc error occurs, please see it for detailed info.
>
> To fix this issue, we introduce a new enum type:
> 	enum btrfs_metadata_reserve_type {
> 		BTRFS_RESERVE_NORMAL,
>          	BTRFS_RESERVE_COMPRESS,
> 	};
> For btrfs_delalloc_[reserve|release]_metadata() and
> btrfs_delalloc_[reserve|release]_space(), we introce a new btrfs_metadata_reserve_type
> argument, then if a path needs to go compression, we pass BTRFS_RESERVE_COMPRESS,
> otherwise pass BTRFS_RESERVE_NORMAL.
>
> With these patchs, Stefan no longer saw such false enospc errors, and Qu Wenruo's
> fstests test case will also pass. I have also run whole fstests multiple times,
> no regression occurs, thanks.
>
> Wang Xiaoguang (3):
>    btrfs: improve inode's outstanding_extents computation
>    btrfs: introduce type based delalloc metadata reserve
>    btrfs: Introduce COMPRESS reserve type to fix false enospc for
>      compression
>
>   fs/btrfs/ctree.h             |  36 +++++--
>   fs/btrfs/extent-tree.c       |  52 ++++++---
>   fs/btrfs/extent_io.c         |  61 ++++++++++-
>   fs/btrfs/extent_io.h         |   5 +
>   fs/btrfs/file.c              |  25 +++--
>   fs/btrfs/free-space-cache.c  |   6 +-
>   fs/btrfs/inode-map.c         |   6 +-
>   fs/btrfs/inode.c             | 246 ++++++++++++++++++++++++++++++++++---------
>   fs/btrfs/ioctl.c             |  16 +--
>   fs/btrfs/relocation.c        |  14 ++-
>   fs/btrfs/tests/inode-tests.c |  15 +--
>   11 files changed, 381 insertions(+), 101 deletions(-)
>




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues
  2016-11-11  8:39 [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues Wang Xiaoguang
                   ` (3 preceding siblings ...)
  2016-11-22  9:46 ` [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues Wang Xiaoguang
@ 2016-12-31  7:31 ` Stefan Priebe - Profihost AG
  2017-01-01  9:32   ` Qu Wenruo
  4 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-12-31  7:31 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: clm, jbacik, dsterba, holger

Any news on this series? I can't see it in 4.9 nor in 4.10-rc

Stefan

Am 11.11.2016 um 09:39 schrieb Wang Xiaoguang:
> When having compression enabled, Stefan Priebe ofen got enospc errors
> though fs still has much free space. Qu Wenruo also has submitted a
> fstests test case which can reproduce this bug steadily, please see
> url: https://patchwork.kernel.org/patch/9420527
> 
> First patch[1/3] "btrfs: improve inode's outstanding_extents computation" is to
> fix outstanding_extents and reserved_extents account issues. This issue was revealed
> by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, When modifying
> BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often gets these warnings from
> btrfs_destroy_inode():
>         WARN_ON(BTRFS_I(inode)->outstanding_extents);
>         WARN_ON(BTRFS_I(inode)->reserved_extents);
> Please see this patch's commit message for detailed info, and this patch is
> necessary to patch2 and patch3.
> 
> For false enospc, the root reasson is that for compression, its max extent size will
> be 128k, not 128MB. If we still use 128MB as max extent size to reserve metadata for
> compression, obviously it's not appropriate. In patch "btrfs: Introduce COMPRESS
> reserve type to fix false enospc for compression" commit message,
> we explain why false enospc error occurs, please see it for detailed info.
> 
> To fix this issue, we introduce a new enum type:
> 	enum btrfs_metadata_reserve_type {
> 		BTRFS_RESERVE_NORMAL,
>         	BTRFS_RESERVE_COMPRESS,
> 	};
> For btrfs_delalloc_[reserve|release]_metadata() and
> btrfs_delalloc_[reserve|release]_space(), we introce a new btrfs_metadata_reserve_type
> argument, then if a path needs to go compression, we pass BTRFS_RESERVE_COMPRESS,
> otherwise pass BTRFS_RESERVE_NORMAL.
> 
> With these patchs, Stefan no longer saw such false enospc errors, and Qu Wenruo's
> fstests test case will also pass. I have also run whole fstests multiple times,
> no regression occurs, thanks.
> 
> Wang Xiaoguang (3):
>   btrfs: improve inode's outstanding_extents computation
>   btrfs: introduce type based delalloc metadata reserve
>   btrfs: Introduce COMPRESS reserve type to fix false enospc for
>     compression
> 
>  fs/btrfs/ctree.h             |  36 +++++--
>  fs/btrfs/extent-tree.c       |  52 ++++++---
>  fs/btrfs/extent_io.c         |  61 ++++++++++-
>  fs/btrfs/extent_io.h         |   5 +
>  fs/btrfs/file.c              |  25 +++--
>  fs/btrfs/free-space-cache.c  |   6 +-
>  fs/btrfs/inode-map.c         |   6 +-
>  fs/btrfs/inode.c             | 246 ++++++++++++++++++++++++++++++++++---------
>  fs/btrfs/ioctl.c             |  16 +--
>  fs/btrfs/relocation.c        |  14 ++-
>  fs/btrfs/tests/inode-tests.c |  15 +--
>  11 files changed, 381 insertions(+), 101 deletions(-)
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues
  2016-12-31  7:31 ` Stefan Priebe - Profihost AG
@ 2017-01-01  9:32   ` Qu Wenruo
  2017-01-04 16:13     ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2017-01-01  9:32 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Wang Xiaoguang, linux-btrfs
  Cc: clm, jbacik, dsterba, holger

Hi Stefan,

I'm trying to push it to for-next (will be v4.11), but no response yet.

It would be quite nice for you to test the following git pull and give 
some feedback, so that we can merge it faster.

https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60418.html

Thanks,
Qu

On 12/31/2016 03:31 PM, Stefan Priebe - Profihost AG wrote:
> Any news on this series? I can't see it in 4.9 nor in 4.10-rc
>
> Stefan
>
> Am 11.11.2016 um 09:39 schrieb Wang Xiaoguang:
>> When having compression enabled, Stefan Priebe ofen got enospc errors
>> though fs still has much free space. Qu Wenruo also has submitted a
>> fstests test case which can reproduce this bug steadily, please see
>> url: https://patchwork.kernel.org/patch/9420527
>>
>> First patch[1/3] "btrfs: improve inode's outstanding_extents computation" is to
>> fix outstanding_extents and reserved_extents account issues. This issue was revealed
>> by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, When modifying
>> BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often gets these warnings from
>> btrfs_destroy_inode():
>>         WARN_ON(BTRFS_I(inode)->outstanding_extents);
>>         WARN_ON(BTRFS_I(inode)->reserved_extents);
>> Please see this patch's commit message for detailed info, and this patch is
>> necessary to patch2 and patch3.
>>
>> For false enospc, the root reasson is that for compression, its max extent size will
>> be 128k, not 128MB. If we still use 128MB as max extent size to reserve metadata for
>> compression, obviously it's not appropriate. In patch "btrfs: Introduce COMPRESS
>> reserve type to fix false enospc for compression" commit message,
>> we explain why false enospc error occurs, please see it for detailed info.
>>
>> To fix this issue, we introduce a new enum type:
>> 	enum btrfs_metadata_reserve_type {
>> 		BTRFS_RESERVE_NORMAL,
>>         	BTRFS_RESERVE_COMPRESS,
>> 	};
>> For btrfs_delalloc_[reserve|release]_metadata() and
>> btrfs_delalloc_[reserve|release]_space(), we introce a new btrfs_metadata_reserve_type
>> argument, then if a path needs to go compression, we pass BTRFS_RESERVE_COMPRESS,
>> otherwise pass BTRFS_RESERVE_NORMAL.
>>
>> With these patchs, Stefan no longer saw such false enospc errors, and Qu Wenruo's
>> fstests test case will also pass. I have also run whole fstests multiple times,
>> no regression occurs, thanks.
>>
>> Wang Xiaoguang (3):
>>   btrfs: improve inode's outstanding_extents computation
>>   btrfs: introduce type based delalloc metadata reserve
>>   btrfs: Introduce COMPRESS reserve type to fix false enospc for
>>     compression
>>
>>  fs/btrfs/ctree.h             |  36 +++++--
>>  fs/btrfs/extent-tree.c       |  52 ++++++---
>>  fs/btrfs/extent_io.c         |  61 ++++++++++-
>>  fs/btrfs/extent_io.h         |   5 +
>>  fs/btrfs/file.c              |  25 +++--
>>  fs/btrfs/free-space-cache.c  |   6 +-
>>  fs/btrfs/inode-map.c         |   6 +-
>>  fs/btrfs/inode.c             | 246 ++++++++++++++++++++++++++++++++++---------
>>  fs/btrfs/ioctl.c             |  16 +--
>>  fs/btrfs/relocation.c        |  14 ++-
>>  fs/btrfs/tests/inode-tests.c |  15 +--
>>  11 files changed, 381 insertions(+), 101 deletions(-)
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] btrfs: improve inode's outstanding_extents computation
  2016-11-11  8:39 ` [PATCH 1/3] btrfs: improve inode's outstanding_extents computation Wang Xiaoguang
@ 2017-01-03 21:00   ` Liu Bo
  2017-01-03 23:36     ` Liu Bo
  0 siblings, 1 reply; 17+ messages in thread
From: Liu Bo @ 2017-01-03 21:00 UTC (permalink / raw)
  To: Wang Xiaoguang; +Cc: linux-btrfs, clm, jbacik, dsterba, holger, s.priebe

On Fri, Nov 11, 2016 at 04:39:45PM +0800, Wang Xiaoguang wrote:
> This issue was revealed by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB,
> When modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often
> gets these warnings from btrfs_destroy_inode():
> 	WARN_ON(BTRFS_I(inode)->outstanding_extents);
> 	WARN_ON(BTRFS_I(inode)->reserved_extents);
> 
> Simple test program below can reproduce this issue steadily.
> Note: you need to modify BTRFS_MAX_EXTENT_SIZE to 64KB to have test,
> otherwise there won't be such WARNING.
> 	#include <string.h>
> 	#include <unistd.h>
> 	#include <sys/types.h>
> 	#include <sys/stat.h>
> 	#include <fcntl.h>
> 
> 	int main(void)
> 	{
> 		int fd;
> 		char buf[68 *1024];
> 
> 		memset(buf, 0, 68 * 1024);
> 		fd = open("testfile", O_CREAT | O_EXCL | O_RDWR);

                pwrite(fd, buf, 64 * 1024, 64 * 1024);

(During my test, I had to add the above in order to reproduce the
warning, another alternative way is to use truncate and pread.)

> 		pwrite(fd, buf, 68 * 1024, 64 * 1024);
> 		return;
> 	}
> 
> When BTRFS_MAX_EXTENT_SIZE is 64KB, and buffered data range is:
> 64KB						128K		132KB
> |-----------------------------------------------|---------------|
>                          64 + 4KB
> 
> 1) for above data range, btrfs_delalloc_reserve_metadata() will reserve
> metadata and set BTRFS_I(inode)->outstanding_extents to 2.
> (68KB + 64KB - 1) / 64KB == 2
> 
> Outstanding_extents: 2
> 
> 2) then btrfs_dirty_page() will be called to dirty pages and set
> EXTENT_DELALLOC flag. In this case, btrfs_set_bit_hook() will be called
> twice.
> The 1st set_bit_hook() call will set DEALLOC flag for the first 64K.
> 64KB						128KB
> |-----------------------------------------------|
> 	64KB DELALLOC
> Outstanding_extents: 2

I think that here the 1st extent [64k, 128k] is only set with
EXTENT_UPTODATE since other bits like DELALLOC has been cleared by
lock_and_cleanup_extent_if_need.

It would work if we check EXTENT_UPTODATE on deciding whether to clear
EXTENT_FIRST_DELALLOC, i.e.

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4175987..4eace1a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1752,7 +1752,12 @@ static void btrfs_set_bit_hook(struct inode *inode,
 		bool do_list = !btrfs_is_free_space_inode(inode);
 
 		if (*bits & EXTENT_FIRST_DELALLOC) {
-			*bits &= ~EXTENT_FIRST_DELALLOC;
+			/*
+			 * With EXTENT_UPTODATE, the state gets rewritten
+			 * before writing it back, or gets read from disk.
+			 */
+			if (!(state->state & EXTENT_UPTODATE))
+				*bits &= ~EXTENT_FIRST_DELALLOC;
 		} else {
 			spin_lock(&BTRFS_I(inode)->lock);
 			BTRFS_I(inode)->outstanding_extents++;

Thanks,

-liubo

> 
> Set_bit_hooks() uses FIRST_DELALLOC flag to avoid re-increase
> outstanding_extents counter.
> So for 1st set_bit_hooks() call, it won't modify outstanding_extents,
> it's still 2.
> 
> Then FIRST_DELALLOC flag is *CLEARED*.
> 
> 3) 2nd btrfs_set_bit_hook() call.
> Because FIRST_DELALLOC have been cleared by previous set_bit_hook(),
> btrfs_set_bit_hook() will increase BTRFS_I(inode)->outstanding_extents by
> one, so now BTRFS_I(inode)->outstanding_extents is 3.
> 64KB                                            128KB            132KB
> |-----------------------------------------------|----------------|
> 	64K DELALLOC				   4K DELALLOC
> Outstanding_extents: 3
> 
> But the correct outstanding_extents number should be 2, not 3.
> The 2nd btrfs_set_bit_hook() call just screwed up this, and leads to the
> WARN_ON().
> 
> Normally, we can solve it by only increasing outstanding_extents in
> set_bit_hook().
> But the problem is for delalloc_reserve/release_metadata(), we only have
> a 'length' parameter, and calculate in-accurate outstanding_extents.
> If we only rely on set_bit_hook() release_metadata() will crew things up
> as it will decrease inaccurate number.
> 
> So the fix we use is:
> 1) Increase *INACCURATE* outstanding_extents at delalloc_reserve_meta
>    Just as a place holder.
> 2) Increase *accurate* outstanding_extents at set_bit_hooks()
>    This is the real increaser.
> 3) Decrease *INACCURATE* outstanding_extents before returning
>    This makes outstanding_extents to correct value.
> 
> For 128M BTRFS_MAX_EXTENT_SIZE, due to limitation of
> __btrfs_buffered_write(), each iteration will only handle about 2MB
> data.
> So btrfs_dirty_pages() won't need to handle cases cross 2 extents.
> 
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>  fs/btrfs/ctree.h |  2 ++
>  fs/btrfs/inode.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++------
>  fs/btrfs/ioctl.c |  6 ++----
>  3 files changed, 62 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 9d8edcb..766d152 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3139,6 +3139,8 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput,
>  			       int nr);
>  int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
>  			      struct extent_state **cached_state, int dedupe);
> +int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
> +			    struct extent_state **cached_state);
>  int btrfs_create_subvol_root(struct btrfs_trans_handle *trans,
>  			     struct btrfs_root *new_root,
>  			     struct btrfs_root *parent_root,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 1f980ef..25e0083 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1601,6 +1601,9 @@ static void btrfs_split_extent_hook(struct inode *inode,
>  	if (!(orig->state & EXTENT_DELALLOC))
>  		return;
>  
> +	if (btrfs_is_free_space_inode(inode))
> +		return;
> +
>  	size = orig->end - orig->start + 1;
>  	if (size > BTRFS_MAX_EXTENT_SIZE) {
>  		u64 num_extents;
> @@ -1643,6 +1646,9 @@ static void btrfs_merge_extent_hook(struct inode *inode,
>  	if (!(other->state & EXTENT_DELALLOC))
>  		return;
>  
> +	if (btrfs_is_free_space_inode(inode))
> +		return;
> +
>  	if (new->start > other->start)
>  		new_size = new->end - other->start + 1;
>  	else
> @@ -1738,7 +1744,6 @@ static void btrfs_del_delalloc_inode(struct btrfs_root *root,
>  static void btrfs_set_bit_hook(struct inode *inode,
>  			       struct extent_state *state, unsigned *bits)
>  {
> -
>  	if ((*bits & EXTENT_DEFRAG) && !(*bits & EXTENT_DELALLOC))
>  		WARN_ON(1);
>  	/*
> @@ -1749,13 +1754,16 @@ static void btrfs_set_bit_hook(struct inode *inode,
>  	if (!(state->state & EXTENT_DELALLOC) && (*bits & EXTENT_DELALLOC)) {
>  		struct btrfs_root *root = BTRFS_I(inode)->root;
>  		u64 len = state->end + 1 - state->start;
> +		u64 num_extents = div64_u64(len + BTRFS_MAX_EXTENT_SIZE - 1,
> +					    BTRFS_MAX_EXTENT_SIZE);
>  		bool do_list = !btrfs_is_free_space_inode(inode);
>  
> -		if (*bits & EXTENT_FIRST_DELALLOC) {
> +		if (*bits & EXTENT_FIRST_DELALLOC)
>  			*bits &= ~EXTENT_FIRST_DELALLOC;
> -		} else {
> +
> +		if (do_list) {
>  			spin_lock(&BTRFS_I(inode)->lock);
> -			BTRFS_I(inode)->outstanding_extents++;
> +			BTRFS_I(inode)->outstanding_extents += num_extents;
>  			spin_unlock(&BTRFS_I(inode)->lock);
>  		}
>  
> @@ -1803,7 +1811,7 @@ static void btrfs_clear_bit_hook(struct inode *inode,
>  
>  		if (*bits & EXTENT_FIRST_DELALLOC) {
>  			*bits &= ~EXTENT_FIRST_DELALLOC;
> -		} else if (!(*bits & EXTENT_DO_ACCOUNTING)) {
> +		} else if (!(*bits & EXTENT_DO_ACCOUNTING) && do_list) {
>  			spin_lock(&BTRFS_I(inode)->lock);
>  			BTRFS_I(inode)->outstanding_extents -= num_extents;
>  			spin_unlock(&BTRFS_I(inode)->lock);
> @@ -2001,9 +2009,52 @@ static noinline int add_pending_csums(struct btrfs_trans_handle *trans,
>  int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
>  			      struct extent_state **cached_state, int dedupe)
>  {
> +	int ret;
> +	u64 num_extents = div64_u64(end - start + BTRFS_MAX_EXTENT_SIZE,
> +				    BTRFS_MAX_EXTENT_SIZE);
> +
> +	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
> +	ret = set_extent_delalloc(&BTRFS_I(inode)->io_tree, start, end,
> +				  cached_state);
> +
> +	/*
> +	 * btrfs_delalloc_reserve_metadata() will first add number of
> +	 * outstanding extents according to data length, which is inaccurate
> +	 * for case like dirtying already dirty pages.
> +	 * so here we will decrease such inaccurate numbers, to make
> +	 * outstanding_extents only rely on the correct values added by
> +	 * set_bit_hook()
> +	 *
> +	 * Also, we skipped the metadata space reserve for space cache inodes,
> +	 * so don't modify the outstanding_extents value.
> +	 */
> +	if (ret == 0 && !btrfs_is_free_space_inode(inode)) {
> +		spin_lock(&BTRFS_I(inode)->lock);
> +		BTRFS_I(inode)->outstanding_extents -= num_extents;
> +		spin_unlock(&BTRFS_I(inode)->lock);
> +	}
> +
> +	return ret;
> +}
> +
> +int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
> +			    struct extent_state **cached_state)
> +{
> +	int ret;
> +	u64 num_extents = div64_u64(end - start + BTRFS_MAX_EXTENT_SIZE,
> +				    BTRFS_MAX_EXTENT_SIZE);
> +
>  	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
> -	return set_extent_delalloc(&BTRFS_I(inode)->io_tree, start, end,
> -				   cached_state);
> +	ret = set_extent_defrag(&BTRFS_I(inode)->io_tree, start, end,
> +				cached_state);
> +
> +	if (ret == 0 && !btrfs_is_free_space_inode(inode)) {
> +		spin_lock(&BTRFS_I(inode)->lock);
> +		BTRFS_I(inode)->outstanding_extents -= num_extents;
> +		spin_unlock(&BTRFS_I(inode)->lock);
> +	}
> +
> +	return ret;
>  }
>  
>  /* see btrfs_writepage_start_hook for details on why this is required */
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 7163457..eff5eae 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1235,10 +1235,8 @@ again:
>  				(page_cnt - i_done) << PAGE_SHIFT);
>  	}
>  
> -
> -	set_extent_defrag(&BTRFS_I(inode)->io_tree, page_start, page_end - 1,
> -			  &cached_state);
> -
> +	btrfs_set_extent_defrag(inode, page_start,
> +				page_end - 1, &cached_state);
>  	unlock_extent_cached(&BTRFS_I(inode)->io_tree,
>  			     page_start, page_end - 1, &cached_state,
>  			     GFP_NOFS);
> -- 
> 2.7.4
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] btrfs: improve inode's outstanding_extents computation
  2017-01-03 21:00   ` Liu Bo
@ 2017-01-03 23:36     ` Liu Bo
  2017-01-23  6:16       ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Liu Bo @ 2017-01-03 23:36 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, jbacik, dsterba, holger, s.priebe, quwenruo

(Resend this reply due to a message that there is an invalid email address.)

On Tue, Jan 03, 2017 at 01:00:45PM -0800, Liu Bo wrote:
> On Fri, Nov 11, 2016 at 04:39:45PM +0800, Wang Xiaoguang wrote:
> > This issue was revealed by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB,
> > When modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often
> > gets these warnings from btrfs_destroy_inode():
> > 	WARN_ON(BTRFS_I(inode)->outstanding_extents);
> > 	WARN_ON(BTRFS_I(inode)->reserved_extents);
> > 
> > Simple test program below can reproduce this issue steadily.
> > Note: you need to modify BTRFS_MAX_EXTENT_SIZE to 64KB to have test,
> > otherwise there won't be such WARNING.
> > 	#include <string.h>
> > 	#include <unistd.h>
> > 	#include <sys/types.h>
> > 	#include <sys/stat.h>
> > 	#include <fcntl.h>
> > 
> > 	int main(void)
> > 	{
> > 		int fd;
> > 		char buf[68 *1024];
> > 
> > 		memset(buf, 0, 68 * 1024);
> > 		fd = open("testfile", O_CREAT | O_EXCL | O_RDWR);
 
	        pwrite(fd, buf, 64 * 1024, 64 * 1024);

(During my test, I had to add the above in order to reproduce the
warning, another alternative way is to use truncate and pread.)
 
> > 		pwrite(fd, buf, 68 * 1024, 64 * 1024);
> > 		return;
> > 	}
> > 
> > When BTRFS_MAX_EXTENT_SIZE is 64KB, and buffered data range is:
> > 64KB						128K		132KB
> > |-----------------------------------------------|---------------|
> >                          64 + 4KB
> > 
> > 1) for above data range, btrfs_delalloc_reserve_metadata() will reserve
> > metadata and set BTRFS_I(inode)->outstanding_extents to 2.
> > (68KB + 64KB - 1) / 64KB == 2
> > 
> > Outstanding_extents: 2
> > 
> > 2) then btrfs_dirty_page() will be called to dirty pages and set
> > EXTENT_DELALLOC flag. In this case, btrfs_set_bit_hook() will be called
> > twice.
> > The 1st set_bit_hook() call will set DEALLOC flag for the first 64K.
> > 64KB						128KB
> > |-----------------------------------------------|
> > 	64KB DELALLOC
> > Outstanding_extents: 2
> 
I think that here the 1st extent [64k, 128k] is only set with
EXTENT_UPTODATE since other bits like DELALLOC has been cleared by
lock_and_cleanup_extent_if_need.

It would work if we check EXTENT_UPTODATE on deciding whether to clear
EXTENT_FIRST_DELALLOC, i.e.

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4175987..4eace1a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1752,7 +1752,12 @@ static void btrfs_set_bit_hook(struct inode *inode,
	bool do_list = !btrfs_is_free_space_inode(inode);

	if (*bits & EXTENT_FIRST_DELALLOC) {
-			*bits &= ~EXTENT_FIRST_DELALLOC;
+			/*
+			 * With EXTENT_UPTODATE, the state gets rewritten
+			 * before writing it back, or gets read from disk.
+			 */
+			if (!(state->state & EXTENT_UPTODATE))
+				*bits &= ~EXTENT_FIRST_DELALLOC;
	} else {
		spin_lock(&BTRFS_I(inode)->lock);
		BTRFS_I(inode)->outstanding_extents++;

Thanks,

-liubo
> 
> > 
> > Set_bit_hooks() uses FIRST_DELALLOC flag to avoid re-increase
> > outstanding_extents counter.
> > So for 1st set_bit_hooks() call, it won't modify outstanding_extents,
> > it's still 2.
> > 
> > Then FIRST_DELALLOC flag is *CLEARED*.
> > 
> > 3) 2nd btrfs_set_bit_hook() call.
> > Because FIRST_DELALLOC have been cleared by previous set_bit_hook(),
> > btrfs_set_bit_hook() will increase BTRFS_I(inode)->outstanding_extents by
> > one, so now BTRFS_I(inode)->outstanding_extents is 3.
> > 64KB                                            128KB            132KB
> > |-----------------------------------------------|----------------|
> > 	64K DELALLOC				   4K DELALLOC
> > Outstanding_extents: 3
> > 
> > But the correct outstanding_extents number should be 2, not 3.
> > The 2nd btrfs_set_bit_hook() call just screwed up this, and leads to the
> > WARN_ON().
> > 
> > Normally, we can solve it by only increasing outstanding_extents in
> > set_bit_hook().
> > But the problem is for delalloc_reserve/release_metadata(), we only have
> > a 'length' parameter, and calculate in-accurate outstanding_extents.
> > If we only rely on set_bit_hook() release_metadata() will crew things up
> > as it will decrease inaccurate number.
> > 
> > So the fix we use is:
> > 1) Increase *INACCURATE* outstanding_extents at delalloc_reserve_meta
> >    Just as a place holder.
> > 2) Increase *accurate* outstanding_extents at set_bit_hooks()
> >    This is the real increaser.
> > 3) Decrease *INACCURATE* outstanding_extents before returning
> >    This makes outstanding_extents to correct value.
> > 
> > For 128M BTRFS_MAX_EXTENT_SIZE, due to limitation of
> > __btrfs_buffered_write(), each iteration will only handle about 2MB
> > data.
> > So btrfs_dirty_pages() won't need to handle cases cross 2 extents.
> > 
> > Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> > ---
> >  fs/btrfs/ctree.h |  2 ++
> >  fs/btrfs/inode.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++------
> >  fs/btrfs/ioctl.c |  6 ++----
> >  3 files changed, 62 insertions(+), 11 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 9d8edcb..766d152 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -3139,6 +3139,8 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput,
> >  			       int nr);
> >  int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
> >  			      struct extent_state **cached_state, int dedupe);
> > +int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
> > +			    struct extent_state **cached_state);
> >  int btrfs_create_subvol_root(struct btrfs_trans_handle *trans,
> >  			     struct btrfs_root *new_root,
> >  			     struct btrfs_root *parent_root,
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 1f980ef..25e0083 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -1601,6 +1601,9 @@ static void btrfs_split_extent_hook(struct inode *inode,
> >  	if (!(orig->state & EXTENT_DELALLOC))
> >  		return;
> >  
> > +	if (btrfs_is_free_space_inode(inode))
> > +		return;
> > +
> >  	size = orig->end - orig->start + 1;
> >  	if (size > BTRFS_MAX_EXTENT_SIZE) {
> >  		u64 num_extents;
> > @@ -1643,6 +1646,9 @@ static void btrfs_merge_extent_hook(struct inode *inode,
> >  	if (!(other->state & EXTENT_DELALLOC))
> >  		return;
> >  
> > +	if (btrfs_is_free_space_inode(inode))
> > +		return;
> > +
> >  	if (new->start > other->start)
> >  		new_size = new->end - other->start + 1;
> >  	else
> > @@ -1738,7 +1744,6 @@ static void btrfs_del_delalloc_inode(struct btrfs_root *root,
> >  static void btrfs_set_bit_hook(struct inode *inode,
> >  			       struct extent_state *state, unsigned *bits)
> >  {
> > -
> >  	if ((*bits & EXTENT_DEFRAG) && !(*bits & EXTENT_DELALLOC))
> >  		WARN_ON(1);
> >  	/*
> > @@ -1749,13 +1754,16 @@ static void btrfs_set_bit_hook(struct inode *inode,
> >  	if (!(state->state & EXTENT_DELALLOC) && (*bits & EXTENT_DELALLOC)) {
> >  		struct btrfs_root *root = BTRFS_I(inode)->root;
> >  		u64 len = state->end + 1 - state->start;
> > +		u64 num_extents = div64_u64(len + BTRFS_MAX_EXTENT_SIZE - 1,
> > +					    BTRFS_MAX_EXTENT_SIZE);
> >  		bool do_list = !btrfs_is_free_space_inode(inode);
> >  
> > -		if (*bits & EXTENT_FIRST_DELALLOC) {
> > +		if (*bits & EXTENT_FIRST_DELALLOC)
> >  			*bits &= ~EXTENT_FIRST_DELALLOC;
> > -		} else {
> > +
> > +		if (do_list) {
> >  			spin_lock(&BTRFS_I(inode)->lock);
> > -			BTRFS_I(inode)->outstanding_extents++;
> > +			BTRFS_I(inode)->outstanding_extents += num_extents;
> >  			spin_unlock(&BTRFS_I(inode)->lock);
> >  		}
> >  
> > @@ -1803,7 +1811,7 @@ static void btrfs_clear_bit_hook(struct inode *inode,
> >  
> >  		if (*bits & EXTENT_FIRST_DELALLOC) {
> >  			*bits &= ~EXTENT_FIRST_DELALLOC;
> > -		} else if (!(*bits & EXTENT_DO_ACCOUNTING)) {
> > +		} else if (!(*bits & EXTENT_DO_ACCOUNTING) && do_list) {
> >  			spin_lock(&BTRFS_I(inode)->lock);
> >  			BTRFS_I(inode)->outstanding_extents -= num_extents;
> >  			spin_unlock(&BTRFS_I(inode)->lock);
> > @@ -2001,9 +2009,52 @@ static noinline int add_pending_csums(struct btrfs_trans_handle *trans,
> >  int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
> >  			      struct extent_state **cached_state, int dedupe)
> >  {
> > +	int ret;
> > +	u64 num_extents = div64_u64(end - start + BTRFS_MAX_EXTENT_SIZE,
> > +				    BTRFS_MAX_EXTENT_SIZE);
> > +
> > +	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
> > +	ret = set_extent_delalloc(&BTRFS_I(inode)->io_tree, start, end,
> > +				  cached_state);
> > +
> > +	/*
> > +	 * btrfs_delalloc_reserve_metadata() will first add number of
> > +	 * outstanding extents according to data length, which is inaccurate
> > +	 * for case like dirtying already dirty pages.
> > +	 * so here we will decrease such inaccurate numbers, to make
> > +	 * outstanding_extents only rely on the correct values added by
> > +	 * set_bit_hook()
> > +	 *
> > +	 * Also, we skipped the metadata space reserve for space cache inodes,
> > +	 * so don't modify the outstanding_extents value.
> > +	 */
> > +	if (ret == 0 && !btrfs_is_free_space_inode(inode)) {
> > +		spin_lock(&BTRFS_I(inode)->lock);
> > +		BTRFS_I(inode)->outstanding_extents -= num_extents;
> > +		spin_unlock(&BTRFS_I(inode)->lock);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
> > +			    struct extent_state **cached_state)
> > +{
> > +	int ret;
> > +	u64 num_extents = div64_u64(end - start + BTRFS_MAX_EXTENT_SIZE,
> > +				    BTRFS_MAX_EXTENT_SIZE);
> > +
> >  	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
> > -	return set_extent_delalloc(&BTRFS_I(inode)->io_tree, start, end,
> > -				   cached_state);
> > +	ret = set_extent_defrag(&BTRFS_I(inode)->io_tree, start, end,
> > +				cached_state);
> > +
> > +	if (ret == 0 && !btrfs_is_free_space_inode(inode)) {
> > +		spin_lock(&BTRFS_I(inode)->lock);
> > +		BTRFS_I(inode)->outstanding_extents -= num_extents;
> > +		spin_unlock(&BTRFS_I(inode)->lock);
> > +	}
> > +
> > +	return ret;
> >  }
> >  
> >  /* see btrfs_writepage_start_hook for details on why this is required */
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 7163457..eff5eae 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -1235,10 +1235,8 @@ again:
> >  				(page_cnt - i_done) << PAGE_SHIFT);
> >  	}
> >  
> > -
> > -	set_extent_defrag(&BTRFS_I(inode)->io_tree, page_start, page_end - 1,
> > -			  &cached_state);
> > -
> > +	btrfs_set_extent_defrag(inode, page_start,
> > +				page_end - 1, &cached_state);
> >  	unlock_extent_cached(&BTRFS_I(inode)->io_tree,
> >  			     page_start, page_end - 1, &cached_state,
> >  			     GFP_NOFS);
> > -- 
> > 2.7.4
> > 
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues
  2017-01-01  9:32   ` Qu Wenruo
@ 2017-01-04 16:13     ` Stefan Priebe - Profihost AG
  2017-02-25  8:23       ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-01-04 16:13 UTC (permalink / raw)
  To: Qu Wenruo, Wang Xiaoguang, linux-btrfs; +Cc: clm, jbacik, dsterba, holger

Hi Qu,

Am 01.01.2017 um 10:32 schrieb Qu Wenruo:
> Hi Stefan,
> 
> I'm trying to push it to for-next (will be v4.11), but no response yet.
> 
> It would be quite nice for you to test the following git pull and give
> some feedback, so that we can merge it faster.
> 
> https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60418.html

I'm also getting a notifier that wang's email does not exist anymore
(wangxg.fnst@cn.fujitsu.com).

I would like to test that branch will need some time todo so. Last time
i tried 4.10-rc1 i had the same problems like this guy:
https://www.marc.info/?l=linux-btrfs&m=148338312525137&w=2

Stefan

> Thanks,
> Qu
> 
> On 12/31/2016 03:31 PM, Stefan Priebe - Profihost AG wrote:
>> Any news on this series? I can't see it in 4.9 nor in 4.10-rc
>>
>> Stefan
>>
>> Am 11.11.2016 um 09:39 schrieb Wang Xiaoguang:
>>> When having compression enabled, Stefan Priebe ofen got enospc errors
>>> though fs still has much free space. Qu Wenruo also has submitted a
>>> fstests test case which can reproduce this bug steadily, please see
>>> url: https://patchwork.kernel.org/patch/9420527
>>>
>>> First patch[1/3] "btrfs: improve inode's outstanding_extents
>>> computation" is to
>>> fix outstanding_extents and reserved_extents account issues. This
>>> issue was revealed
>>> by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, When modifying
>>> BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often gets these
>>> warnings from
>>> btrfs_destroy_inode():
>>>         WARN_ON(BTRFS_I(inode)->outstanding_extents);
>>>         WARN_ON(BTRFS_I(inode)->reserved_extents);
>>> Please see this patch's commit message for detailed info, and this
>>> patch is
>>> necessary to patch2 and patch3.
>>>
>>> For false enospc, the root reasson is that for compression, its max
>>> extent size will
>>> be 128k, not 128MB. If we still use 128MB as max extent size to
>>> reserve metadata for
>>> compression, obviously it's not appropriate. In patch "btrfs:
>>> Introduce COMPRESS
>>> reserve type to fix false enospc for compression" commit message,
>>> we explain why false enospc error occurs, please see it for detailed
>>> info.
>>>
>>> To fix this issue, we introduce a new enum type:
>>>     enum btrfs_metadata_reserve_type {
>>>         BTRFS_RESERVE_NORMAL,
>>>             BTRFS_RESERVE_COMPRESS,
>>>     };
>>> For btrfs_delalloc_[reserve|release]_metadata() and
>>> btrfs_delalloc_[reserve|release]_space(), we introce a new
>>> btrfs_metadata_reserve_type
>>> argument, then if a path needs to go compression, we pass
>>> BTRFS_RESERVE_COMPRESS,
>>> otherwise pass BTRFS_RESERVE_NORMAL.
>>>
>>> With these patchs, Stefan no longer saw such false enospc errors, and
>>> Qu Wenruo's
>>> fstests test case will also pass. I have also run whole fstests
>>> multiple times,
>>> no regression occurs, thanks.
>>>
>>> Wang Xiaoguang (3):
>>>   btrfs: improve inode's outstanding_extents computation
>>>   btrfs: introduce type based delalloc metadata reserve
>>>   btrfs: Introduce COMPRESS reserve type to fix false enospc for
>>>     compression
>>>
>>>  fs/btrfs/ctree.h             |  36 +++++--
>>>  fs/btrfs/extent-tree.c       |  52 ++++++---
>>>  fs/btrfs/extent_io.c         |  61 ++++++++++-
>>>  fs/btrfs/extent_io.h         |   5 +
>>>  fs/btrfs/file.c              |  25 +++--
>>>  fs/btrfs/free-space-cache.c  |   6 +-
>>>  fs/btrfs/inode-map.c         |   6 +-
>>>  fs/btrfs/inode.c             | 246
>>> ++++++++++++++++++++++++++++++++++---------
>>>  fs/btrfs/ioctl.c             |  16 +--
>>>  fs/btrfs/relocation.c        |  14 ++-
>>>  fs/btrfs/tests/inode-tests.c |  15 +--
>>>  11 files changed, 381 insertions(+), 101 deletions(-)
>>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] btrfs: improve inode's outstanding_extents computation
  2017-01-03 23:36     ` Liu Bo
@ 2017-01-23  6:16       ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2017-01-23  6:16 UTC (permalink / raw)
  To: bo.li.liu, linux-btrfs; +Cc: clm, jbacik, dsterba, holger, s.priebe

Hi Liu,

Sorry for the late reply.

At 01/04/2017 07:36 AM, Liu Bo wrote:
> (Resend this reply due to a message that there is an invalid email address.)
>
> On Tue, Jan 03, 2017 at 01:00:45PM -0800, Liu Bo wrote:
>> On Fri, Nov 11, 2016 at 04:39:45PM +0800, Wang Xiaoguang wrote:
>>> This issue was revealed by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB,
>>> When modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often
>>> gets these warnings from btrfs_destroy_inode():
>>> 	WARN_ON(BTRFS_I(inode)->outstanding_extents);
>>> 	WARN_ON(BTRFS_I(inode)->reserved_extents);
>>>
>>> Simple test program below can reproduce this issue steadily.
>>> Note: you need to modify BTRFS_MAX_EXTENT_SIZE to 64KB to have test,
>>> otherwise there won't be such WARNING.
>>> 	#include <string.h>
>>> 	#include <unistd.h>
>>> 	#include <sys/types.h>
>>> 	#include <sys/stat.h>
>>> 	#include <fcntl.h>
>>>
>>> 	int main(void)
>>> 	{
>>> 		int fd;
>>> 		char buf[68 *1024];
>>>
>>> 		memset(buf, 0, 68 * 1024);
>>> 		fd = open("testfile", O_CREAT | O_EXCL | O_RDWR);
>
> 	        pwrite(fd, buf, 64 * 1024, 64 * 1024);
>
> (During my test, I had to add the above in order to reproduce the
> warning, another alternative way is to use truncate and pread.)
>
>>> 		pwrite(fd, buf, 68 * 1024, 64 * 1024);
>>> 		return;
>>> 	}
>>>
>>> When BTRFS_MAX_EXTENT_SIZE is 64KB, and buffered data range is:
>>> 64KB						128K		132KB
>>> |-----------------------------------------------|---------------|
>>>                          64 + 4KB
>>>
>>> 1) for above data range, btrfs_delalloc_reserve_metadata() will reserve
>>> metadata and set BTRFS_I(inode)->outstanding_extents to 2.
>>> (68KB + 64KB - 1) / 64KB == 2
>>>
>>> Outstanding_extents: 2
>>>
>>> 2) then btrfs_dirty_page() will be called to dirty pages and set
>>> EXTENT_DELALLOC flag. In this case, btrfs_set_bit_hook() will be called
>>> twice.
>>> The 1st set_bit_hook() call will set DEALLOC flag for the first 64K.
>>> 64KB						128KB
>>> |-----------------------------------------------|
>>> 	64KB DELALLOC
>>> Outstanding_extents: 2
>>
> I think that here the 1st extent [64k, 128k] is only set with
> EXTENT_UPTODATE since other bits like DELALLOC has been cleared by
> lock_and_cleanup_extent_if_need.
>
> It would work if we check EXTENT_UPTODATE on deciding whether to clear
> EXTENT_FIRST_DELALLOC, i.e.
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 4175987..4eace1a 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1752,7 +1752,12 @@ static void btrfs_set_bit_hook(struct inode *inode,
> 	bool do_list = !btrfs_is_free_space_inode(inode);
>
> 	if (*bits & EXTENT_FIRST_DELALLOC) {
> -			*bits &= ~EXTENT_FIRST_DELALLOC;
> +			/*
> +			 * With EXTENT_UPTODATE, the state gets rewritten
> +			 * before writing it back, or gets read from disk.
> +			 */
> +			if (!(state->state & EXTENT_UPTODATE))
> +				*bits &= ~EXTENT_FIRST_DELALLOC;
> 	} else {
> 		spin_lock(&BTRFS_I(inode)->lock);
> 		BTRFS_I(inode)->outstanding_extents++;
>

This fix seems quite interesting and clean, I'll try if it can replace 
Wang's patches.

Thanks,
Qu

> Thanks,
>
> -liubo
>>
>>>
>>> Set_bit_hooks() uses FIRST_DELALLOC flag to avoid re-increase
>>> outstanding_extents counter.
>>> So for 1st set_bit_hooks() call, it won't modify outstanding_extents,
>>> it's still 2.
>>>
>>> Then FIRST_DELALLOC flag is *CLEARED*.
>>>
>>> 3) 2nd btrfs_set_bit_hook() call.
>>> Because FIRST_DELALLOC have been cleared by previous set_bit_hook(),
>>> btrfs_set_bit_hook() will increase BTRFS_I(inode)->outstanding_extents by
>>> one, so now BTRFS_I(inode)->outstanding_extents is 3.
>>> 64KB                                            128KB            132KB
>>> |-----------------------------------------------|----------------|
>>> 	64K DELALLOC				   4K DELALLOC
>>> Outstanding_extents: 3
>>>
>>> But the correct outstanding_extents number should be 2, not 3.
>>> The 2nd btrfs_set_bit_hook() call just screwed up this, and leads to the
>>> WARN_ON().
>>>
>>> Normally, we can solve it by only increasing outstanding_extents in
>>> set_bit_hook().
>>> But the problem is for delalloc_reserve/release_metadata(), we only have
>>> a 'length' parameter, and calculate in-accurate outstanding_extents.
>>> If we only rely on set_bit_hook() release_metadata() will crew things up
>>> as it will decrease inaccurate number.
>>>
>>> So the fix we use is:
>>> 1) Increase *INACCURATE* outstanding_extents at delalloc_reserve_meta
>>>    Just as a place holder.
>>> 2) Increase *accurate* outstanding_extents at set_bit_hooks()
>>>    This is the real increaser.
>>> 3) Decrease *INACCURATE* outstanding_extents before returning
>>>    This makes outstanding_extents to correct value.
>>>
>>> For 128M BTRFS_MAX_EXTENT_SIZE, due to limitation of
>>> __btrfs_buffered_write(), each iteration will only handle about 2MB
>>> data.
>>> So btrfs_dirty_pages() won't need to handle cases cross 2 extents.
>>>
>>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>> ---
>>>  fs/btrfs/ctree.h |  2 ++
>>>  fs/btrfs/inode.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++------
>>>  fs/btrfs/ioctl.c |  6 ++----
>>>  3 files changed, 62 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index 9d8edcb..766d152 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -3139,6 +3139,8 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput,
>>>  			       int nr);
>>>  int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
>>>  			      struct extent_state **cached_state, int dedupe);
>>> +int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
>>> +			    struct extent_state **cached_state);
>>>  int btrfs_create_subvol_root(struct btrfs_trans_handle *trans,
>>>  			     struct btrfs_root *new_root,
>>>  			     struct btrfs_root *parent_root,
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index 1f980ef..25e0083 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -1601,6 +1601,9 @@ static void btrfs_split_extent_hook(struct inode *inode,
>>>  	if (!(orig->state & EXTENT_DELALLOC))
>>>  		return;
>>>
>>> +	if (btrfs_is_free_space_inode(inode))
>>> +		return;
>>> +
>>>  	size = orig->end - orig->start + 1;
>>>  	if (size > BTRFS_MAX_EXTENT_SIZE) {
>>>  		u64 num_extents;
>>> @@ -1643,6 +1646,9 @@ static void btrfs_merge_extent_hook(struct inode *inode,
>>>  	if (!(other->state & EXTENT_DELALLOC))
>>>  		return;
>>>
>>> +	if (btrfs_is_free_space_inode(inode))
>>> +		return;
>>> +
>>>  	if (new->start > other->start)
>>>  		new_size = new->end - other->start + 1;
>>>  	else
>>> @@ -1738,7 +1744,6 @@ static void btrfs_del_delalloc_inode(struct btrfs_root *root,
>>>  static void btrfs_set_bit_hook(struct inode *inode,
>>>  			       struct extent_state *state, unsigned *bits)
>>>  {
>>> -
>>>  	if ((*bits & EXTENT_DEFRAG) && !(*bits & EXTENT_DELALLOC))
>>>  		WARN_ON(1);
>>>  	/*
>>> @@ -1749,13 +1754,16 @@ static void btrfs_set_bit_hook(struct inode *inode,
>>>  	if (!(state->state & EXTENT_DELALLOC) && (*bits & EXTENT_DELALLOC)) {
>>>  		struct btrfs_root *root = BTRFS_I(inode)->root;
>>>  		u64 len = state->end + 1 - state->start;
>>> +		u64 num_extents = div64_u64(len + BTRFS_MAX_EXTENT_SIZE - 1,
>>> +					    BTRFS_MAX_EXTENT_SIZE);
>>>  		bool do_list = !btrfs_is_free_space_inode(inode);
>>>
>>> -		if (*bits & EXTENT_FIRST_DELALLOC) {
>>> +		if (*bits & EXTENT_FIRST_DELALLOC)
>>>  			*bits &= ~EXTENT_FIRST_DELALLOC;
>>> -		} else {
>>> +
>>> +		if (do_list) {
>>>  			spin_lock(&BTRFS_I(inode)->lock);
>>> -			BTRFS_I(inode)->outstanding_extents++;
>>> +			BTRFS_I(inode)->outstanding_extents += num_extents;
>>>  			spin_unlock(&BTRFS_I(inode)->lock);
>>>  		}
>>>
>>> @@ -1803,7 +1811,7 @@ static void btrfs_clear_bit_hook(struct inode *inode,
>>>
>>>  		if (*bits & EXTENT_FIRST_DELALLOC) {
>>>  			*bits &= ~EXTENT_FIRST_DELALLOC;
>>> -		} else if (!(*bits & EXTENT_DO_ACCOUNTING)) {
>>> +		} else if (!(*bits & EXTENT_DO_ACCOUNTING) && do_list) {
>>>  			spin_lock(&BTRFS_I(inode)->lock);
>>>  			BTRFS_I(inode)->outstanding_extents -= num_extents;
>>>  			spin_unlock(&BTRFS_I(inode)->lock);
>>> @@ -2001,9 +2009,52 @@ static noinline int add_pending_csums(struct btrfs_trans_handle *trans,
>>>  int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
>>>  			      struct extent_state **cached_state, int dedupe)
>>>  {
>>> +	int ret;
>>> +	u64 num_extents = div64_u64(end - start + BTRFS_MAX_EXTENT_SIZE,
>>> +				    BTRFS_MAX_EXTENT_SIZE);
>>> +
>>> +	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
>>> +	ret = set_extent_delalloc(&BTRFS_I(inode)->io_tree, start, end,
>>> +				  cached_state);
>>> +
>>> +	/*
>>> +	 * btrfs_delalloc_reserve_metadata() will first add number of
>>> +	 * outstanding extents according to data length, which is inaccurate
>>> +	 * for case like dirtying already dirty pages.
>>> +	 * so here we will decrease such inaccurate numbers, to make
>>> +	 * outstanding_extents only rely on the correct values added by
>>> +	 * set_bit_hook()
>>> +	 *
>>> +	 * Also, we skipped the metadata space reserve for space cache inodes,
>>> +	 * so don't modify the outstanding_extents value.
>>> +	 */
>>> +	if (ret == 0 && !btrfs_is_free_space_inode(inode)) {
>>> +		spin_lock(&BTRFS_I(inode)->lock);
>>> +		BTRFS_I(inode)->outstanding_extents -= num_extents;
>>> +		spin_unlock(&BTRFS_I(inode)->lock);
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
>>> +			    struct extent_state **cached_state)
>>> +{
>>> +	int ret;
>>> +	u64 num_extents = div64_u64(end - start + BTRFS_MAX_EXTENT_SIZE,
>>> +				    BTRFS_MAX_EXTENT_SIZE);
>>> +
>>>  	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
>>> -	return set_extent_delalloc(&BTRFS_I(inode)->io_tree, start, end,
>>> -				   cached_state);
>>> +	ret = set_extent_defrag(&BTRFS_I(inode)->io_tree, start, end,
>>> +				cached_state);
>>> +
>>> +	if (ret == 0 && !btrfs_is_free_space_inode(inode)) {
>>> +		spin_lock(&BTRFS_I(inode)->lock);
>>> +		BTRFS_I(inode)->outstanding_extents -= num_extents;
>>> +		spin_unlock(&BTRFS_I(inode)->lock);
>>> +	}
>>> +
>>> +	return ret;
>>>  }
>>>
>>>  /* see btrfs_writepage_start_hook for details on why this is required */
>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>> index 7163457..eff5eae 100644
>>> --- a/fs/btrfs/ioctl.c
>>> +++ b/fs/btrfs/ioctl.c
>>> @@ -1235,10 +1235,8 @@ again:
>>>  				(page_cnt - i_done) << PAGE_SHIFT);
>>>  	}
>>>
>>> -
>>> -	set_extent_defrag(&BTRFS_I(inode)->io_tree, page_start, page_end - 1,
>>> -			  &cached_state);
>>> -
>>> +	btrfs_set_extent_defrag(inode, page_start,
>>> +				page_end - 1, &cached_state);
>>>  	unlock_extent_cached(&BTRFS_I(inode)->io_tree,
>>>  			     page_start, page_end - 1, &cached_state,
>>>  			     GFP_NOFS);
>>> --
>>> 2.7.4
>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues
  2017-01-04 16:13     ` Stefan Priebe - Profihost AG
@ 2017-02-25  8:23       ` Stefan Priebe - Profihost AG
  2017-02-27  0:46         ` Qu Wenruo
  2017-02-27  7:22         ` Qu Wenruo
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-02-25  8:23 UTC (permalink / raw)
  To: Qu Wenruo, Wang Xiaoguang, linux-btrfs; +Cc: clm, jbacik, dsterba, holger

Dear Qu,

any news on your branch? I still don't see it merged anywhere.

Greets,
Stefan

Am 04.01.2017 um 17:13 schrieb Stefan Priebe - Profihost AG:
> Hi Qu,
> 
> Am 01.01.2017 um 10:32 schrieb Qu Wenruo:
>> Hi Stefan,
>>
>> I'm trying to push it to for-next (will be v4.11), but no response yet.
>>
>> It would be quite nice for you to test the following git pull and give
>> some feedback, so that we can merge it faster.
>>
>> https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60418.html
> 
> I'm also getting a notifier that wang's email does not exist anymore
> (wangxg.fnst@cn.fujitsu.com).
> 
> I would like to test that branch will need some time todo so. Last time
> i tried 4.10-rc1 i had the same problems like this guy:
> https://www.marc.info/?l=linux-btrfs&m=148338312525137&w=2
> 
> Stefan
> 
>> Thanks,
>> Qu
>>
>> On 12/31/2016 03:31 PM, Stefan Priebe - Profihost AG wrote:
>>> Any news on this series? I can't see it in 4.9 nor in 4.10-rc
>>>
>>> Stefan
>>>
>>> Am 11.11.2016 um 09:39 schrieb Wang Xiaoguang:
>>>> When having compression enabled, Stefan Priebe ofen got enospc errors
>>>> though fs still has much free space. Qu Wenruo also has submitted a
>>>> fstests test case which can reproduce this bug steadily, please see
>>>> url: https://patchwork.kernel.org/patch/9420527
>>>>
>>>> First patch[1/3] "btrfs: improve inode's outstanding_extents
>>>> computation" is to
>>>> fix outstanding_extents and reserved_extents account issues. This
>>>> issue was revealed
>>>> by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, When modifying
>>>> BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often gets these
>>>> warnings from
>>>> btrfs_destroy_inode():
>>>>         WARN_ON(BTRFS_I(inode)->outstanding_extents);
>>>>         WARN_ON(BTRFS_I(inode)->reserved_extents);
>>>> Please see this patch's commit message for detailed info, and this
>>>> patch is
>>>> necessary to patch2 and patch3.
>>>>
>>>> For false enospc, the root reasson is that for compression, its max
>>>> extent size will
>>>> be 128k, not 128MB. If we still use 128MB as max extent size to
>>>> reserve metadata for
>>>> compression, obviously it's not appropriate. In patch "btrfs:
>>>> Introduce COMPRESS
>>>> reserve type to fix false enospc for compression" commit message,
>>>> we explain why false enospc error occurs, please see it for detailed
>>>> info.
>>>>
>>>> To fix this issue, we introduce a new enum type:
>>>>     enum btrfs_metadata_reserve_type {
>>>>         BTRFS_RESERVE_NORMAL,
>>>>             BTRFS_RESERVE_COMPRESS,
>>>>     };
>>>> For btrfs_delalloc_[reserve|release]_metadata() and
>>>> btrfs_delalloc_[reserve|release]_space(), we introce a new
>>>> btrfs_metadata_reserve_type
>>>> argument, then if a path needs to go compression, we pass
>>>> BTRFS_RESERVE_COMPRESS,
>>>> otherwise pass BTRFS_RESERVE_NORMAL.
>>>>
>>>> With these patchs, Stefan no longer saw such false enospc errors, and
>>>> Qu Wenruo's
>>>> fstests test case will also pass. I have also run whole fstests
>>>> multiple times,
>>>> no regression occurs, thanks.
>>>>
>>>> Wang Xiaoguang (3):
>>>>   btrfs: improve inode's outstanding_extents computation
>>>>   btrfs: introduce type based delalloc metadata reserve
>>>>   btrfs: Introduce COMPRESS reserve type to fix false enospc for
>>>>     compression
>>>>
>>>>  fs/btrfs/ctree.h             |  36 +++++--
>>>>  fs/btrfs/extent-tree.c       |  52 ++++++---
>>>>  fs/btrfs/extent_io.c         |  61 ++++++++++-
>>>>  fs/btrfs/extent_io.h         |   5 +
>>>>  fs/btrfs/file.c              |  25 +++--
>>>>  fs/btrfs/free-space-cache.c  |   6 +-
>>>>  fs/btrfs/inode-map.c         |   6 +-
>>>>  fs/btrfs/inode.c             | 246
>>>> ++++++++++++++++++++++++++++++++++---------
>>>>  fs/btrfs/ioctl.c             |  16 +--
>>>>  fs/btrfs/relocation.c        |  14 ++-
>>>>  fs/btrfs/tests/inode-tests.c |  15 +--
>>>>  11 files changed, 381 insertions(+), 101 deletions(-)
>>>>
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues
  2017-02-25  8:23       ` Stefan Priebe - Profihost AG
@ 2017-02-27  0:46         ` Qu Wenruo
  2017-02-27  7:22         ` Qu Wenruo
  1 sibling, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2017-02-27  0:46 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Qu Wenruo, Wang Xiaoguang, linux-btrfs
  Cc: clm, jbacik, dsterba, holger

I sent mail to David asking about these unmerged patchset, but no reply yet.

And to my surprise, the patch even disappeared from david's for-next branch.

Not sure what's wrong with all these unmerged patches.

Thanks,
Qu

At 02/25/2017 04:23 PM, Stefan Priebe - Profihost AG wrote:
> Dear Qu,
>
> any news on your branch? I still don't see it merged anywhere.
>
> Greets,
> Stefan
>
> Am 04.01.2017 um 17:13 schrieb Stefan Priebe - Profihost AG:
>> Hi Qu,
>>
>> Am 01.01.2017 um 10:32 schrieb Qu Wenruo:
>>> Hi Stefan,
>>>
>>> I'm trying to push it to for-next (will be v4.11), but no response yet.
>>>
>>> It would be quite nice for you to test the following git pull and give
>>> some feedback, so that we can merge it faster.
>>>
>>> https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60418.html
>>
>> I'm also getting a notifier that wang's email does not exist anymore
>> (wangxg.fnst@cn.fujitsu.com).
>>
>> I would like to test that branch will need some time todo so. Last time
>> i tried 4.10-rc1 i had the same problems like this guy:
>> https://www.marc.info/?l=linux-btrfs&m=148338312525137&w=2
>>
>> Stefan
>>
>>> Thanks,
>>> Qu
>>>
>>> On 12/31/2016 03:31 PM, Stefan Priebe - Profihost AG wrote:
>>>> Any news on this series? I can't see it in 4.9 nor in 4.10-rc
>>>>
>>>> Stefan
>>>>
>>>> Am 11.11.2016 um 09:39 schrieb Wang Xiaoguang:
>>>>> When having compression enabled, Stefan Priebe ofen got enospc errors
>>>>> though fs still has much free space. Qu Wenruo also has submitted a
>>>>> fstests test case which can reproduce this bug steadily, please see
>>>>> url: https://patchwork.kernel.org/patch/9420527
>>>>>
>>>>> First patch[1/3] "btrfs: improve inode's outstanding_extents
>>>>> computation" is to
>>>>> fix outstanding_extents and reserved_extents account issues. This
>>>>> issue was revealed
>>>>> by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, When modifying
>>>>> BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often gets these
>>>>> warnings from
>>>>> btrfs_destroy_inode():
>>>>>         WARN_ON(BTRFS_I(inode)->outstanding_extents);
>>>>>         WARN_ON(BTRFS_I(inode)->reserved_extents);
>>>>> Please see this patch's commit message for detailed info, and this
>>>>> patch is
>>>>> necessary to patch2 and patch3.
>>>>>
>>>>> For false enospc, the root reasson is that for compression, its max
>>>>> extent size will
>>>>> be 128k, not 128MB. If we still use 128MB as max extent size to
>>>>> reserve metadata for
>>>>> compression, obviously it's not appropriate. In patch "btrfs:
>>>>> Introduce COMPRESS
>>>>> reserve type to fix false enospc for compression" commit message,
>>>>> we explain why false enospc error occurs, please see it for detailed
>>>>> info.
>>>>>
>>>>> To fix this issue, we introduce a new enum type:
>>>>>     enum btrfs_metadata_reserve_type {
>>>>>         BTRFS_RESERVE_NORMAL,
>>>>>             BTRFS_RESERVE_COMPRESS,
>>>>>     };
>>>>> For btrfs_delalloc_[reserve|release]_metadata() and
>>>>> btrfs_delalloc_[reserve|release]_space(), we introce a new
>>>>> btrfs_metadata_reserve_type
>>>>> argument, then if a path needs to go compression, we pass
>>>>> BTRFS_RESERVE_COMPRESS,
>>>>> otherwise pass BTRFS_RESERVE_NORMAL.
>>>>>
>>>>> With these patchs, Stefan no longer saw such false enospc errors, and
>>>>> Qu Wenruo's
>>>>> fstests test case will also pass. I have also run whole fstests
>>>>> multiple times,
>>>>> no regression occurs, thanks.
>>>>>
>>>>> Wang Xiaoguang (3):
>>>>>   btrfs: improve inode's outstanding_extents computation
>>>>>   btrfs: introduce type based delalloc metadata reserve
>>>>>   btrfs: Introduce COMPRESS reserve type to fix false enospc for
>>>>>     compression
>>>>>
>>>>>  fs/btrfs/ctree.h             |  36 +++++--
>>>>>  fs/btrfs/extent-tree.c       |  52 ++++++---
>>>>>  fs/btrfs/extent_io.c         |  61 ++++++++++-
>>>>>  fs/btrfs/extent_io.h         |   5 +
>>>>>  fs/btrfs/file.c              |  25 +++--
>>>>>  fs/btrfs/free-space-cache.c  |   6 +-
>>>>>  fs/btrfs/inode-map.c         |   6 +-
>>>>>  fs/btrfs/inode.c             | 246
>>>>> ++++++++++++++++++++++++++++++++++---------
>>>>>  fs/btrfs/ioctl.c             |  16 +--
>>>>>  fs/btrfs/relocation.c        |  14 ++-
>>>>>  fs/btrfs/tests/inode-tests.c |  15 +--
>>>>>  11 files changed, 381 insertions(+), 101 deletions(-)
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues
  2017-02-25  8:23       ` Stefan Priebe - Profihost AG
  2017-02-27  0:46         ` Qu Wenruo
@ 2017-02-27  7:22         ` Qu Wenruo
  2017-02-27 13:43           ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2017-02-27  7:22 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Qu Wenruo, Wang Xiaoguang, linux-btrfs
  Cc: clm, jbacik, dsterba, holger



At 02/25/2017 04:23 PM, Stefan Priebe - Profihost AG wrote:
> Dear Qu,
>
> any news on your branch? I still don't see it merged anywhere.
>
> Greets,
> Stefan

I just remember that Liu Bo has commented one of the patches, I'm afraid 
I can only push these patches until I addressed his concern.

I'll start digging it as memory for this fix is quite blurred now.

Thanks,
Qu
>
> Am 04.01.2017 um 17:13 schrieb Stefan Priebe - Profihost AG:
>> Hi Qu,
>>
>> Am 01.01.2017 um 10:32 schrieb Qu Wenruo:
>>> Hi Stefan,
>>>
>>> I'm trying to push it to for-next (will be v4.11), but no response yet.
>>>
>>> It would be quite nice for you to test the following git pull and give
>>> some feedback, so that we can merge it faster.
>>>
>>> https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60418.html
>>
>> I'm also getting a notifier that wang's email does not exist anymore
>> (wangxg.fnst@cn.fujitsu.com).
>>
>> I would like to test that branch will need some time todo so. Last time
>> i tried 4.10-rc1 i had the same problems like this guy:
>> https://www.marc.info/?l=linux-btrfs&m=148338312525137&w=2
>>
>> Stefan
>>
>>> Thanks,
>>> Qu
>>>
>>> On 12/31/2016 03:31 PM, Stefan Priebe - Profihost AG wrote:
>>>> Any news on this series? I can't see it in 4.9 nor in 4.10-rc
>>>>
>>>> Stefan
>>>>
>>>> Am 11.11.2016 um 09:39 schrieb Wang Xiaoguang:
>>>>> When having compression enabled, Stefan Priebe ofen got enospc errors
>>>>> though fs still has much free space. Qu Wenruo also has submitted a
>>>>> fstests test case which can reproduce this bug steadily, please see
>>>>> url: https://patchwork.kernel.org/patch/9420527
>>>>>
>>>>> First patch[1/3] "btrfs: improve inode's outstanding_extents
>>>>> computation" is to
>>>>> fix outstanding_extents and reserved_extents account issues. This
>>>>> issue was revealed
>>>>> by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, When modifying
>>>>> BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often gets these
>>>>> warnings from
>>>>> btrfs_destroy_inode():
>>>>>         WARN_ON(BTRFS_I(inode)->outstanding_extents);
>>>>>         WARN_ON(BTRFS_I(inode)->reserved_extents);
>>>>> Please see this patch's commit message for detailed info, and this
>>>>> patch is
>>>>> necessary to patch2 and patch3.
>>>>>
>>>>> For false enospc, the root reasson is that for compression, its max
>>>>> extent size will
>>>>> be 128k, not 128MB. If we still use 128MB as max extent size to
>>>>> reserve metadata for
>>>>> compression, obviously it's not appropriate. In patch "btrfs:
>>>>> Introduce COMPRESS
>>>>> reserve type to fix false enospc for compression" commit message,
>>>>> we explain why false enospc error occurs, please see it for detailed
>>>>> info.
>>>>>
>>>>> To fix this issue, we introduce a new enum type:
>>>>>     enum btrfs_metadata_reserve_type {
>>>>>         BTRFS_RESERVE_NORMAL,
>>>>>             BTRFS_RESERVE_COMPRESS,
>>>>>     };
>>>>> For btrfs_delalloc_[reserve|release]_metadata() and
>>>>> btrfs_delalloc_[reserve|release]_space(), we introce a new
>>>>> btrfs_metadata_reserve_type
>>>>> argument, then if a path needs to go compression, we pass
>>>>> BTRFS_RESERVE_COMPRESS,
>>>>> otherwise pass BTRFS_RESERVE_NORMAL.
>>>>>
>>>>> With these patchs, Stefan no longer saw such false enospc errors, and
>>>>> Qu Wenruo's
>>>>> fstests test case will also pass. I have also run whole fstests
>>>>> multiple times,
>>>>> no regression occurs, thanks.
>>>>>
>>>>> Wang Xiaoguang (3):
>>>>>   btrfs: improve inode's outstanding_extents computation
>>>>>   btrfs: introduce type based delalloc metadata reserve
>>>>>   btrfs: Introduce COMPRESS reserve type to fix false enospc for
>>>>>     compression
>>>>>
>>>>>  fs/btrfs/ctree.h             |  36 +++++--
>>>>>  fs/btrfs/extent-tree.c       |  52 ++++++---
>>>>>  fs/btrfs/extent_io.c         |  61 ++++++++++-
>>>>>  fs/btrfs/extent_io.h         |   5 +
>>>>>  fs/btrfs/file.c              |  25 +++--
>>>>>  fs/btrfs/free-space-cache.c  |   6 +-
>>>>>  fs/btrfs/inode-map.c         |   6 +-
>>>>>  fs/btrfs/inode.c             | 246
>>>>> ++++++++++++++++++++++++++++++++++---------
>>>>>  fs/btrfs/ioctl.c             |  16 +--
>>>>>  fs/btrfs/relocation.c        |  14 ++-
>>>>>  fs/btrfs/tests/inode-tests.c |  15 +--
>>>>>  11 files changed, 381 insertions(+), 101 deletions(-)
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues
  2017-02-27  7:22         ` Qu Wenruo
@ 2017-02-27 13:43           ` Stefan Priebe - Profihost AG
  2017-04-25 19:25             ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-02-27 13:43 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, Wang Xiaoguang, linux-btrfs
  Cc: clm, jbacik, dsterba, holger

Hi,

can please anybody comment on that one? Josef? Chris? I still need those
patches to be able to let btrfs run for more than 24hours without ENOSPC
issues.

Greets,
Stefan

Am 27.02.2017 um 08:22 schrieb Qu Wenruo:
> 
> 
> At 02/25/2017 04:23 PM, Stefan Priebe - Profihost AG wrote:
>> Dear Qu,
>>
>> any news on your branch? I still don't see it merged anywhere.
>>
>> Greets,
>> Stefan
> 
> I just remember that Liu Bo has commented one of the patches, I'm afraid
> I can only push these patches until I addressed his concern.
> 
> I'll start digging it as memory for this fix is quite blurred now.
> 
> Thanks,
> Qu
>>
>> Am 04.01.2017 um 17:13 schrieb Stefan Priebe - Profihost AG:
>>> Hi Qu,
>>>
>>> Am 01.01.2017 um 10:32 schrieb Qu Wenruo:
>>>> Hi Stefan,
>>>>
>>>> I'm trying to push it to for-next (will be v4.11), but no response yet.
>>>>
>>>> It would be quite nice for you to test the following git pull and give
>>>> some feedback, so that we can merge it faster.
>>>>
>>>> https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60418.html
>>>
>>> I'm also getting a notifier that wang's email does not exist anymore
>>> (wangxg.fnst@cn.fujitsu.com).
>>>
>>> I would like to test that branch will need some time todo so. Last time
>>> i tried 4.10-rc1 i had the same problems like this guy:
>>> https://www.marc.info/?l=linux-btrfs&m=148338312525137&w=2
>>>
>>> Stefan
>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>> On 12/31/2016 03:31 PM, Stefan Priebe - Profihost AG wrote:
>>>>> Any news on this series? I can't see it in 4.9 nor in 4.10-rc
>>>>>
>>>>> Stefan
>>>>>
>>>>> Am 11.11.2016 um 09:39 schrieb Wang Xiaoguang:
>>>>>> When having compression enabled, Stefan Priebe ofen got enospc errors
>>>>>> though fs still has much free space. Qu Wenruo also has submitted a
>>>>>> fstests test case which can reproduce this bug steadily, please see
>>>>>> url: https://patchwork.kernel.org/patch/9420527
>>>>>>
>>>>>> First patch[1/3] "btrfs: improve inode's outstanding_extents
>>>>>> computation" is to
>>>>>> fix outstanding_extents and reserved_extents account issues. This
>>>>>> issue was revealed
>>>>>> by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, When modifying
>>>>>> BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often gets these
>>>>>> warnings from
>>>>>> btrfs_destroy_inode():
>>>>>>         WARN_ON(BTRFS_I(inode)->outstanding_extents);
>>>>>>         WARN_ON(BTRFS_I(inode)->reserved_extents);
>>>>>> Please see this patch's commit message for detailed info, and this
>>>>>> patch is
>>>>>> necessary to patch2 and patch3.
>>>>>>
>>>>>> For false enospc, the root reasson is that for compression, its max
>>>>>> extent size will
>>>>>> be 128k, not 128MB. If we still use 128MB as max extent size to
>>>>>> reserve metadata for
>>>>>> compression, obviously it's not appropriate. In patch "btrfs:
>>>>>> Introduce COMPRESS
>>>>>> reserve type to fix false enospc for compression" commit message,
>>>>>> we explain why false enospc error occurs, please see it for detailed
>>>>>> info.
>>>>>>
>>>>>> To fix this issue, we introduce a new enum type:
>>>>>>     enum btrfs_metadata_reserve_type {
>>>>>>         BTRFS_RESERVE_NORMAL,
>>>>>>             BTRFS_RESERVE_COMPRESS,
>>>>>>     };
>>>>>> For btrfs_delalloc_[reserve|release]_metadata() and
>>>>>> btrfs_delalloc_[reserve|release]_space(), we introce a new
>>>>>> btrfs_metadata_reserve_type
>>>>>> argument, then if a path needs to go compression, we pass
>>>>>> BTRFS_RESERVE_COMPRESS,
>>>>>> otherwise pass BTRFS_RESERVE_NORMAL.
>>>>>>
>>>>>> With these patchs, Stefan no longer saw such false enospc errors, and
>>>>>> Qu Wenruo's
>>>>>> fstests test case will also pass. I have also run whole fstests
>>>>>> multiple times,
>>>>>> no regression occurs, thanks.
>>>>>>
>>>>>> Wang Xiaoguang (3):
>>>>>>   btrfs: improve inode's outstanding_extents computation
>>>>>>   btrfs: introduce type based delalloc metadata reserve
>>>>>>   btrfs: Introduce COMPRESS reserve type to fix false enospc for
>>>>>>     compression
>>>>>>
>>>>>>  fs/btrfs/ctree.h             |  36 +++++--
>>>>>>  fs/btrfs/extent-tree.c       |  52 ++++++---
>>>>>>  fs/btrfs/extent_io.c         |  61 ++++++++++-
>>>>>>  fs/btrfs/extent_io.h         |   5 +
>>>>>>  fs/btrfs/file.c              |  25 +++--
>>>>>>  fs/btrfs/free-space-cache.c  |   6 +-
>>>>>>  fs/btrfs/inode-map.c         |   6 +-
>>>>>>  fs/btrfs/inode.c             | 246
>>>>>> ++++++++++++++++++++++++++++++++++---------
>>>>>>  fs/btrfs/ioctl.c             |  16 +--
>>>>>>  fs/btrfs/relocation.c        |  14 ++-
>>>>>>  fs/btrfs/tests/inode-tests.c |  15 +--
>>>>>>  11 files changed, 381 insertions(+), 101 deletions(-)
>>>>>>
>>>>> -- 
>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>> linux-btrfs" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues
  2017-02-27 13:43           ` Stefan Priebe - Profihost AG
@ 2017-04-25 19:25             ` Stefan Priebe - Profihost AG
  2017-04-26  0:41               ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-04-25 19:25 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, Wang Xiaoguang, linux-btrfs
  Cc: clm, jbacik, dsterba, holger

Hello Qu,

still noone on this one? Or is this one solved in another way in 4.10 or
4.11 or is compression just experimental? Haven't seen a note on this.

Thanks,
Stefan

Am 27.02.2017 um 14:43 schrieb Stefan Priebe - Profihost AG:
> Hi,
> 
> can please anybody comment on that one? Josef? Chris? I still need those
> patches to be able to let btrfs run for more than 24hours without ENOSPC
> issues.
> 
> Greets,
> Stefan
> 
> Am 27.02.2017 um 08:22 schrieb Qu Wenruo:
>>
>>
>> At 02/25/2017 04:23 PM, Stefan Priebe - Profihost AG wrote:
>>> Dear Qu,
>>>
>>> any news on your branch? I still don't see it merged anywhere.
>>>
>>> Greets,
>>> Stefan
>>
>> I just remember that Liu Bo has commented one of the patches, I'm afraid
>> I can only push these patches until I addressed his concern.
>>
>> I'll start digging it as memory for this fix is quite blurred now.
>>
>> Thanks,
>> Qu
>>>
>>> Am 04.01.2017 um 17:13 schrieb Stefan Priebe - Profihost AG:
>>>> Hi Qu,
>>>>
>>>> Am 01.01.2017 um 10:32 schrieb Qu Wenruo:
>>>>> Hi Stefan,
>>>>>
>>>>> I'm trying to push it to for-next (will be v4.11), but no response yet.
>>>>>
>>>>> It would be quite nice for you to test the following git pull and give
>>>>> some feedback, so that we can merge it faster.
>>>>>
>>>>> https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60418.html
>>>>
>>>> I'm also getting a notifier that wang's email does not exist anymore
>>>> (wangxg.fnst@cn.fujitsu.com).
>>>>
>>>> I would like to test that branch will need some time todo so. Last time
>>>> i tried 4.10-rc1 i had the same problems like this guy:
>>>> https://www.marc.info/?l=linux-btrfs&m=148338312525137&w=2
>>>>
>>>> Stefan
>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>> On 12/31/2016 03:31 PM, Stefan Priebe - Profihost AG wrote:
>>>>>> Any news on this series? I can't see it in 4.9 nor in 4.10-rc
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>> Am 11.11.2016 um 09:39 schrieb Wang Xiaoguang:
>>>>>>> When having compression enabled, Stefan Priebe ofen got enospc errors
>>>>>>> though fs still has much free space. Qu Wenruo also has submitted a
>>>>>>> fstests test case which can reproduce this bug steadily, please see
>>>>>>> url: https://patchwork.kernel.org/patch/9420527
>>>>>>>
>>>>>>> First patch[1/3] "btrfs: improve inode's outstanding_extents
>>>>>>> computation" is to
>>>>>>> fix outstanding_extents and reserved_extents account issues. This
>>>>>>> issue was revealed
>>>>>>> by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, When modifying
>>>>>>> BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often gets these
>>>>>>> warnings from
>>>>>>> btrfs_destroy_inode():
>>>>>>>         WARN_ON(BTRFS_I(inode)->outstanding_extents);
>>>>>>>         WARN_ON(BTRFS_I(inode)->reserved_extents);
>>>>>>> Please see this patch's commit message for detailed info, and this
>>>>>>> patch is
>>>>>>> necessary to patch2 and patch3.
>>>>>>>
>>>>>>> For false enospc, the root reasson is that for compression, its max
>>>>>>> extent size will
>>>>>>> be 128k, not 128MB. If we still use 128MB as max extent size to
>>>>>>> reserve metadata for
>>>>>>> compression, obviously it's not appropriate. In patch "btrfs:
>>>>>>> Introduce COMPRESS
>>>>>>> reserve type to fix false enospc for compression" commit message,
>>>>>>> we explain why false enospc error occurs, please see it for detailed
>>>>>>> info.
>>>>>>>
>>>>>>> To fix this issue, we introduce a new enum type:
>>>>>>>     enum btrfs_metadata_reserve_type {
>>>>>>>         BTRFS_RESERVE_NORMAL,
>>>>>>>             BTRFS_RESERVE_COMPRESS,
>>>>>>>     };
>>>>>>> For btrfs_delalloc_[reserve|release]_metadata() and
>>>>>>> btrfs_delalloc_[reserve|release]_space(), we introce a new
>>>>>>> btrfs_metadata_reserve_type
>>>>>>> argument, then if a path needs to go compression, we pass
>>>>>>> BTRFS_RESERVE_COMPRESS,
>>>>>>> otherwise pass BTRFS_RESERVE_NORMAL.
>>>>>>>
>>>>>>> With these patchs, Stefan no longer saw such false enospc errors, and
>>>>>>> Qu Wenruo's
>>>>>>> fstests test case will also pass. I have also run whole fstests
>>>>>>> multiple times,
>>>>>>> no regression occurs, thanks.
>>>>>>>
>>>>>>> Wang Xiaoguang (3):
>>>>>>>   btrfs: improve inode's outstanding_extents computation
>>>>>>>   btrfs: introduce type based delalloc metadata reserve
>>>>>>>   btrfs: Introduce COMPRESS reserve type to fix false enospc for
>>>>>>>     compression
>>>>>>>
>>>>>>>  fs/btrfs/ctree.h             |  36 +++++--
>>>>>>>  fs/btrfs/extent-tree.c       |  52 ++++++---
>>>>>>>  fs/btrfs/extent_io.c         |  61 ++++++++++-
>>>>>>>  fs/btrfs/extent_io.h         |   5 +
>>>>>>>  fs/btrfs/file.c              |  25 +++--
>>>>>>>  fs/btrfs/free-space-cache.c  |   6 +-
>>>>>>>  fs/btrfs/inode-map.c         |   6 +-
>>>>>>>  fs/btrfs/inode.c             | 246
>>>>>>> ++++++++++++++++++++++++++++++++++---------
>>>>>>>  fs/btrfs/ioctl.c             |  16 +--
>>>>>>>  fs/btrfs/relocation.c        |  14 ++-
>>>>>>>  fs/btrfs/tests/inode-tests.c |  15 +--
>>>>>>>  11 files changed, 381 insertions(+), 101 deletions(-)
>>>>>>>
>>>>>> -- 
>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>> linux-btrfs" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues
  2017-04-25 19:25             ` Stefan Priebe - Profihost AG
@ 2017-04-26  0:41               ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2017-04-26  0:41 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Qu Wenruo, Wang Xiaoguang, linux-btrfs
  Cc: clm, jbacik, dsterba, holger



At 04/26/2017 03:25 AM, Stefan Priebe - Profihost AG wrote:
> Hello Qu,
> 
> still noone on this one? Or is this one solved in another way in 4.10 or
> 4.11 or is compression just experimental? Haven't seen a note on this.

Still pushing the patchset in in-band dedupe patchset.

I'll re-push them in separate patchset.
But without reviewer I'm afraid the hope to merge is small though.

Thanks,
Qu

> 
> Thanks,
> Stefan
> 
> Am 27.02.2017 um 14:43 schrieb Stefan Priebe - Profihost AG:
>> Hi,
>>
>> can please anybody comment on that one? Josef? Chris? I still need those
>> patches to be able to let btrfs run for more than 24hours without ENOSPC
>> issues.
>>
>> Greets,
>> Stefan
>>
>> Am 27.02.2017 um 08:22 schrieb Qu Wenruo:
>>>
>>>
>>> At 02/25/2017 04:23 PM, Stefan Priebe - Profihost AG wrote:
>>>> Dear Qu,
>>>>
>>>> any news on your branch? I still don't see it merged anywhere.
>>>>
>>>> Greets,
>>>> Stefan
>>>
>>> I just remember that Liu Bo has commented one of the patches, I'm afraid
>>> I can only push these patches until I addressed his concern.
>>>
>>> I'll start digging it as memory for this fix is quite blurred now.
>>>
>>> Thanks,
>>> Qu
>>>>
>>>> Am 04.01.2017 um 17:13 schrieb Stefan Priebe - Profihost AG:
>>>>> Hi Qu,
>>>>>
>>>>> Am 01.01.2017 um 10:32 schrieb Qu Wenruo:
>>>>>> Hi Stefan,
>>>>>>
>>>>>> I'm trying to push it to for-next (will be v4.11), but no response yet.
>>>>>>
>>>>>> It would be quite nice for you to test the following git pull and give
>>>>>> some feedback, so that we can merge it faster.
>>>>>>
>>>>>> https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60418.html
>>>>>
>>>>> I'm also getting a notifier that wang's email does not exist anymore
>>>>> (wangxg.fnst@cn.fujitsu.com).
>>>>>
>>>>> I would like to test that branch will need some time todo so. Last time
>>>>> i tried 4.10-rc1 i had the same problems like this guy:
>>>>> https://www.marc.info/?l=linux-btrfs&m=148338312525137&w=2
>>>>>
>>>>> Stefan
>>>>>
>>>>>> Thanks,
>>>>>> Qu
>>>>>>
>>>>>> On 12/31/2016 03:31 PM, Stefan Priebe - Profihost AG wrote:
>>>>>>> Any news on this series? I can't see it in 4.9 nor in 4.10-rc
>>>>>>>
>>>>>>> Stefan
>>>>>>>
>>>>>>> Am 11.11.2016 um 09:39 schrieb Wang Xiaoguang:
>>>>>>>> When having compression enabled, Stefan Priebe ofen got enospc errors
>>>>>>>> though fs still has much free space. Qu Wenruo also has submitted a
>>>>>>>> fstests test case which can reproduce this bug steadily, please see
>>>>>>>> url: https://patchwork.kernel.org/patch/9420527
>>>>>>>>
>>>>>>>> First patch[1/3] "btrfs: improve inode's outstanding_extents
>>>>>>>> computation" is to
>>>>>>>> fix outstanding_extents and reserved_extents account issues. This
>>>>>>>> issue was revealed
>>>>>>>> by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, When modifying
>>>>>>>> BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often gets these
>>>>>>>> warnings from
>>>>>>>> btrfs_destroy_inode():
>>>>>>>>          WARN_ON(BTRFS_I(inode)->outstanding_extents);
>>>>>>>>          WARN_ON(BTRFS_I(inode)->reserved_extents);
>>>>>>>> Please see this patch's commit message for detailed info, and this
>>>>>>>> patch is
>>>>>>>> necessary to patch2 and patch3.
>>>>>>>>
>>>>>>>> For false enospc, the root reasson is that for compression, its max
>>>>>>>> extent size will
>>>>>>>> be 128k, not 128MB. If we still use 128MB as max extent size to
>>>>>>>> reserve metadata for
>>>>>>>> compression, obviously it's not appropriate. In patch "btrfs:
>>>>>>>> Introduce COMPRESS
>>>>>>>> reserve type to fix false enospc for compression" commit message,
>>>>>>>> we explain why false enospc error occurs, please see it for detailed
>>>>>>>> info.
>>>>>>>>
>>>>>>>> To fix this issue, we introduce a new enum type:
>>>>>>>>      enum btrfs_metadata_reserve_type {
>>>>>>>>          BTRFS_RESERVE_NORMAL,
>>>>>>>>              BTRFS_RESERVE_COMPRESS,
>>>>>>>>      };
>>>>>>>> For btrfs_delalloc_[reserve|release]_metadata() and
>>>>>>>> btrfs_delalloc_[reserve|release]_space(), we introce a new
>>>>>>>> btrfs_metadata_reserve_type
>>>>>>>> argument, then if a path needs to go compression, we pass
>>>>>>>> BTRFS_RESERVE_COMPRESS,
>>>>>>>> otherwise pass BTRFS_RESERVE_NORMAL.
>>>>>>>>
>>>>>>>> With these patchs, Stefan no longer saw such false enospc errors, and
>>>>>>>> Qu Wenruo's
>>>>>>>> fstests test case will also pass. I have also run whole fstests
>>>>>>>> multiple times,
>>>>>>>> no regression occurs, thanks.
>>>>>>>>
>>>>>>>> Wang Xiaoguang (3):
>>>>>>>>    btrfs: improve inode's outstanding_extents computation
>>>>>>>>    btrfs: introduce type based delalloc metadata reserve
>>>>>>>>    btrfs: Introduce COMPRESS reserve type to fix false enospc for
>>>>>>>>      compression
>>>>>>>>
>>>>>>>>   fs/btrfs/ctree.h             |  36 +++++--
>>>>>>>>   fs/btrfs/extent-tree.c       |  52 ++++++---
>>>>>>>>   fs/btrfs/extent_io.c         |  61 ++++++++++-
>>>>>>>>   fs/btrfs/extent_io.h         |   5 +
>>>>>>>>   fs/btrfs/file.c              |  25 +++--
>>>>>>>>   fs/btrfs/free-space-cache.c  |   6 +-
>>>>>>>>   fs/btrfs/inode-map.c         |   6 +-
>>>>>>>>   fs/btrfs/inode.c             | 246
>>>>>>>> ++++++++++++++++++++++++++++++++++---------
>>>>>>>>   fs/btrfs/ioctl.c             |  16 +--
>>>>>>>>   fs/btrfs/relocation.c        |  14 ++-
>>>>>>>>   fs/btrfs/tests/inode-tests.c |  15 +--
>>>>>>>>   11 files changed, 381 insertions(+), 101 deletions(-)
>>>>>>>>
>>>>>>> -- 
>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>> linux-btrfs" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>>
>>>
> 
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-04-26  0:41 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-11  8:39 [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues Wang Xiaoguang
2016-11-11  8:39 ` [PATCH 1/3] btrfs: improve inode's outstanding_extents computation Wang Xiaoguang
2017-01-03 21:00   ` Liu Bo
2017-01-03 23:36     ` Liu Bo
2017-01-23  6:16       ` Qu Wenruo
2016-11-11  8:39 ` [PATCH 2/3] btrfs: introduce type based delalloc metadata reserve Wang Xiaoguang
2016-11-11  8:39 ` [PATCH 3/3] btrfs: Introduce COMPRESS reserve type to fix false enospc for compression Wang Xiaoguang
2016-11-22  9:46 ` [PATCH 0/3] introduce type based delalloc metadata reserve to fix some false enospc issues Wang Xiaoguang
2016-12-31  7:31 ` Stefan Priebe - Profihost AG
2017-01-01  9:32   ` Qu Wenruo
2017-01-04 16:13     ` Stefan Priebe - Profihost AG
2017-02-25  8:23       ` Stefan Priebe - Profihost AG
2017-02-27  0:46         ` Qu Wenruo
2017-02-27  7:22         ` Qu Wenruo
2017-02-27 13:43           ` Stefan Priebe - Profihost AG
2017-04-25 19:25             ` Stefan Priebe - Profihost AG
2017-04-26  0:41               ` Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.