* [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue
@ 2016-07-20 5:56 Wang Xiaoguang
2016-07-20 5:56 ` [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
` (4 more replies)
0 siblings, 5 replies; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-20 5:56 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, jbacik, holger
Currently in btrfs, for data space reservation, it does not update
bytes_may_use in btrfs_update_reserved_bytes() and the decrease operation
will be delayed to be done in extent_clear_unlock_delalloc(), for
fallocate(2), decrease operation is even delayed to be done in end
of btrfs_fallocate(), which is too late. Obviously such delay will
cause unnecessary pressure to enospc system, in [PATCH 4/4], there is
a simpel test script that can reveal such false ENOSPC bug.
So in this patch set, it will remove RESERVE_FREE, RESERVE_ALLOC and
RESERVE_ALLOC_NO_ACCOUNT, and we always update bytes_may_use timely.
I'll also commit a fstests test case for this issue.
Wang Xiaoguang (4):
btrfs: use correct offset for reloc_inode in
prealloc_file_extent_cluster()
btrfs: divide btrfs_update_reserved_bytes() into two functions
btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag
btrfs: update btrfs_space_info's bytes_may_use timely
fs/btrfs/ctree.h | 2 +-
fs/btrfs/extent-tree.c | 131 ++++++++++++++++++++++++-------------------------
fs/btrfs/extent_io.h | 1 +
fs/btrfs/file.c | 26 +++++-----
fs/btrfs/inode-map.c | 3 +-
fs/btrfs/inode.c | 37 ++++++++++----
fs/btrfs/relocation.c | 17 +++++--
7 files changed, 123 insertions(+), 94 deletions(-)
--
2.9.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
2016-07-20 5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
@ 2016-07-20 5:56 ` Wang Xiaoguang
2016-07-20 13:18 ` Josef Bacik
2016-07-20 5:56 ` [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions Wang Xiaoguang
` (3 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-20 5:56 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, jbacik, holger
In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
which indeed are extent's bytenr. The correct value should be
cluster->[start|end] minus block group's start bytenr.
start bytenr cluster->start
| | extent | extent | ...| extent |
|----------------------------------------------------------------|
| block group reloc_inode |
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
fs/btrfs/relocation.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0477dca..a0de885 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3030,12 +3030,14 @@ int prealloc_file_extent_cluster(struct inode *inode,
u64 num_bytes;
int nr = 0;
int ret = 0;
+ u64 prealloc_start = cluster->start - offset;
+ u64 prealloc_end = cluster->end - offset;
BUG_ON(cluster->start != cluster->boundary[0]);
inode_lock(inode);
- ret = btrfs_check_data_free_space(inode, cluster->start,
- cluster->end + 1 - cluster->start);
+ ret = btrfs_check_data_free_space(inode, prealloc_start,
+ prealloc_end + 1 - prealloc_start);
if (ret)
goto out;
@@ -3056,8 +3058,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
break;
nr++;
}
- btrfs_free_reserved_data_space(inode, cluster->start,
- cluster->end + 1 - cluster->start);
+ btrfs_free_reserved_data_space(inode, prealloc_start,
+ prealloc_end + 1 - prealloc_start);
out:
inode_unlock(inode);
return ret;
--
2.9.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions
2016-07-20 5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
2016-07-20 5:56 ` [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
@ 2016-07-20 5:56 ` Wang Xiaoguang
2016-07-20 13:21 ` Josef Bacik
2016-07-20 5:56 ` [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag Wang Xiaoguang
` (2 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-20 5:56 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, jbacik, holger
This patch divides btrfs_update_reserved_bytes() into
btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes(), and
next patch will extend btrfs_add_reserved_bytes()to fix some
false ENOSPC error, please see later patch for detailed info.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
fs/btrfs/extent-tree.c | 97 +++++++++++++++++++++++++++++---------------------
1 file changed, 57 insertions(+), 40 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 82b912a..8eaac39 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -104,9 +104,10 @@ static int find_next_key(struct btrfs_path *path, int level,
struct btrfs_key *key);
static void dump_space_info(struct btrfs_space_info *info, u64 bytes,
int dump_block_groups);
-static int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
- u64 num_bytes, int reserve,
- int delalloc);
+static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
+ u64 num_bytes, int reserve, int delalloc);
+static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache *cache,
+ u64 num_bytes, int delalloc);
static int block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv,
u64 num_bytes);
int btrfs_pin_extent(struct btrfs_root *root,
@@ -6297,19 +6298,14 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
}
/**
- * btrfs_update_reserved_bytes - update the block_group and space info counters
+ * btrfs_add_reserved_bytes - update the block_group and space info counters
* @cache: The cache we are manipulating
* @num_bytes: The number of bytes in question
* @reserve: One of the reservation enums
* @delalloc: The blocks are allocated for the delalloc write
*
- * This is called by the allocator when it reserves space, or by somebody who is
- * freeing space that was never actually used on disk. For example if you
- * reserve some space for a new leaf in transaction A and before transaction A
- * commits you free that leaf, you call this with reserve set to 0 in order to
- * clear the reservation.
- *
- * Metadata reservations should be called with RESERVE_ALLOC so we do the proper
+ * This is called by the allocator when it reserves space. Metadata
+ * reservations should be called with RESERVE_ALLOC so we do the proper
* ENOSPC accounting. For data we handle the reservation through clearing the
* delalloc bits in the io_tree. We have to do this since we could end up
* allocating less disk space for the amount of data we have reserved in the
@@ -6319,44 +6315,65 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
* make the reservation and return -EAGAIN, otherwise this function always
* succeeds.
*/
-static int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
- u64 num_bytes, int reserve, int delalloc)
+static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
+ u64 num_bytes, int reserve, int delalloc)
{
struct btrfs_space_info *space_info = cache->space_info;
int ret = 0;
spin_lock(&space_info->lock);
spin_lock(&cache->lock);
- if (reserve != RESERVE_FREE) {
- if (cache->ro) {
- ret = -EAGAIN;
- } else {
- cache->reserved += num_bytes;
- space_info->bytes_reserved += num_bytes;
- if (reserve == RESERVE_ALLOC) {
- trace_btrfs_space_reservation(cache->fs_info,
- "space_info", space_info->flags,
- num_bytes, 0);
- space_info->bytes_may_use -= num_bytes;
- }
-
- if (delalloc)
- cache->delalloc_bytes += num_bytes;
- }
+ if (cache->ro) {
+ ret = -EAGAIN;
} else {
- if (cache->ro)
- space_info->bytes_readonly += num_bytes;
- cache->reserved -= num_bytes;
- space_info->bytes_reserved -= num_bytes;
+ cache->reserved += num_bytes;
+ space_info->bytes_reserved += num_bytes;
+ if (reserve == RESERVE_ALLOC) {
+ trace_btrfs_space_reservation(cache->fs_info,
+ "space_info", space_info->flags,
+ num_bytes, 0);
+ space_info->bytes_may_use -= num_bytes;
+ }
if (delalloc)
- cache->delalloc_bytes -= num_bytes;
+ cache->delalloc_bytes += num_bytes;
}
spin_unlock(&cache->lock);
spin_unlock(&space_info->lock);
return ret;
}
+/**
+ * btrfs_free_reserved_bytes - update the block_group and space info counters
+ * @cache: The cache we are manipulating
+ * @num_bytes: The number of bytes in question
+ * @delalloc: The blocks are allocated for the delalloc write
+ *
+ * This is called by somebody who is freeing space that was never actually used
+ * on disk. For example if you reserve some space for a new leaf in transaction
+ * A and before transaction A commits you free that leaf, you call this with
+ * reserve set to 0 in order to clear the reservation.
+ */
+
+static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache *cache,
+ u64 num_bytes, int delalloc)
+{
+ struct btrfs_space_info *space_info = cache->space_info;
+ int ret = 0;
+
+ spin_lock(&space_info->lock);
+ spin_lock(&cache->lock);
+ if (cache->ro)
+ space_info->bytes_readonly += num_bytes;
+ cache->reserved -= num_bytes;
+ space_info->bytes_reserved -= num_bytes;
+
+ if (delalloc)
+ cache->delalloc_bytes -= num_bytes;
+ spin_unlock(&cache->lock);
+ spin_unlock(&space_info->lock);
+ return ret;
+}
void btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans,
struct btrfs_root *root)
{
@@ -6976,7 +6993,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, &buf->bflags));
btrfs_add_free_space(cache, buf->start, buf->len);
- btrfs_update_reserved_bytes(cache, buf->len, RESERVE_FREE, 0);
+ btrfs_free_reserved_bytes(cache, buf->len, 0);
btrfs_put_block_group(cache);
trace_btrfs_reserved_extent_free(root, buf->start, buf->len);
pin = 0;
@@ -7548,8 +7565,8 @@ checks:
search_start - offset);
BUG_ON(offset > search_start);
- ret = btrfs_update_reserved_bytes(block_group, num_bytes,
- alloc_type, delalloc);
+ ret = btrfs_add_reserved_bytes(block_group, num_bytes,
+ alloc_type, delalloc);
if (ret == -EAGAIN) {
btrfs_add_free_space(block_group, offset, num_bytes);
goto loop;
@@ -7781,7 +7798,7 @@ static int __btrfs_free_reserved_extent(struct btrfs_root *root,
if (btrfs_test_opt(root, DISCARD))
ret = btrfs_discard_extent(root, start, len, NULL);
btrfs_add_free_space(cache, start, len);
- btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, delalloc);
+ btrfs_free_reserved_bytes(cache, len, delalloc);
}
btrfs_put_block_group(cache);
@@ -8011,8 +8028,8 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
if (!block_group)
return -EINVAL;
- ret = btrfs_update_reserved_bytes(block_group, ins->offset,
- RESERVE_ALLOC_NO_ACCOUNT, 0);
+ ret = btrfs_add_reserved_bytes(block_group, ins->offset,
+ RESERVE_ALLOC_NO_ACCOUNT, 0);
BUG_ON(ret); /* logic error */
ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
0, owner, offset, ins, 1);
--
2.9.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag
2016-07-20 5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
2016-07-20 5:56 ` [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
2016-07-20 5:56 ` [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions Wang Xiaoguang
@ 2016-07-20 5:56 ` Wang Xiaoguang
2016-07-20 13:22 ` Josef Bacik
2016-07-20 5:56 ` [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
2016-07-20 8:46 ` [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Holger Hoffstätte
4 siblings, 1 reply; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-20 5:56 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, jbacik, holger
In next patch, btrfs_clear_bit_hook() will not call
btrfs_free_reserved_data_space_noquota() to update btrfs_space_info's
bytes_may_use unless it has EXTENT_DO_ACCOUNTING or EXTENT_CLEAR_DATA_RESV,
as for the reason, please see the next patch for detailed info.
As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
PREALLOC, we also need to update bytes_may_use, but can not pass
EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so here
we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook()
to update btrfs_space_info's bytes_may_use.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
fs/btrfs/extent_io.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index c0c1c4f..b52ca5d 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -20,6 +20,7 @@
#define EXTENT_DAMAGED (1U << 14)
#define EXTENT_NORESERVE (1U << 15)
#define EXTENT_QGROUP_RESERVED (1U << 16)
+#define EXTENT_CLEAR_DATA_RESV (1U << 17)
#define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK)
#define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
--
2.9.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely
2016-07-20 5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
` (2 preceding siblings ...)
2016-07-20 5:56 ` [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag Wang Xiaoguang
@ 2016-07-20 5:56 ` Wang Xiaoguang
2016-07-20 13:35 ` Josef Bacik
2016-07-20 8:46 ` [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Holger Hoffstätte
4 siblings, 1 reply; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-20 5:56 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, jbacik, holger
This patch can fix some false ENOSPC errors, below test script can
reproduce one false ENOSPC error:
#!/bin/bash
dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
dev=$(losetup --show -f fs.img)
mkfs.btrfs -f -M $dev
mkdir /tmp/mntpoint
mount $dev /tmp/mntpoint
cd /tmp/mntpoint
xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile
Above script will fail for ENOSPC reason, but indeed fs still has free
space to satisfy this request. Please see call graph:
btrfs_fallocate()
|-> btrfs_alloc_data_chunk_ondemand()
| bytes_may_use += 64M
|-> btrfs_prealloc_file_range()
|-> btrfs_reserve_extent()
|-> btrfs_add_reserved_bytes()
| alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
| change bytes_may_use, and bytes_reserved += 64M. Now
| bytes_may_use + bytes_reserved == 128M, which is greater
| than btrfs_space_info's total_bytes, false enospc occurs.
| Note, the bytes_may_use decrease operation will done in
| end of btrfs_fallocate(), which is too late.
Here is another simple case for buffered write:
CPU 1 | CPU 2
|
|-> cow_file_range() |-> __btrfs_buffered_write()
|-> btrfs_reserve_extent() | |
| | |
| | |
| ..... | |-> btrfs_check_data_free_space()
| |
| |
|-> extent_clear_unlock_delalloc() |
In CPU 1, btrfs_reserve_extent()->find_free_extent()->
btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
operation will be delayed to be done in extent_clear_unlock_delalloc().
Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
btrfs_check_data_free_space() tries to reserve 100MB data space.
If
100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
btrfs_check_data_free_space() will try to allcate new data chunk or call
btrfs_start_delalloc_roots(), or commit current transaction inorder to
reserve some free space, obviously a lot of work. But indeed it's not
necessary as long as decreasing bytes_may_use timely, we still have
free space, decreasing 128M from bytes_may_use.
To fix this issue, this patch chooses to update bytes_may_use for both
data and metadata in btrfs_add_reserved_bytes(). For compress path, real
extent length may not be equal to file content length, so introduce a
ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
file content length. Then compress path can update bytes_may_use
correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
and RESERVE_FREE.
For inode marked as NODATACOW or extent marked as PREALLOC, we can
directly call btrfs_free_reserved_data_space() to adjust bytes_may_use.
Meanwhile __btrfs_prealloc_file_range() will call
btrfs_free_reserved_data_space() internally for both sucessful and failed
path, btrfs_prealloc_file_range()'s callers does not need to call
btrfs_free_reserved_data_space() any more.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
fs/btrfs/ctree.h | 2 +-
fs/btrfs/extent-tree.c | 56 +++++++++++++++++---------------------------------
fs/btrfs/file.c | 26 +++++++++++++----------
fs/btrfs/inode-map.c | 3 +--
fs/btrfs/inode.c | 37 ++++++++++++++++++++++++---------
fs/btrfs/relocation.c | 11 ++++++++--
6 files changed, 72 insertions(+), 63 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4274a7b..7eb2913 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2556,7 +2556,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
u64 root_objectid, u64 owner, u64 offset,
struct btrfs_key *ins);
-int btrfs_reserve_extent(struct btrfs_root *root, u64 num_bytes,
+int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, u64 num_bytes,
u64 min_alloc_size, u64 empty_size, u64 hint_byte,
struct btrfs_key *ins, int is_data, int delalloc);
int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8eaac39..5447973 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -60,21 +60,6 @@ enum {
CHUNK_ALLOC_FORCE = 2,
};
-/*
- * Control how reservations are dealt with.
- *
- * RESERVE_FREE - freeing a reservation.
- * RESERVE_ALLOC - allocating space and we need to update bytes_may_use for
- * ENOSPC accounting
- * RESERVE_ALLOC_NO_ACCOUNT - allocating space and we should not update
- * bytes_may_use as the ENOSPC accounting is done elsewhere
- */
-enum {
- RESERVE_FREE = 0,
- RESERVE_ALLOC = 1,
- RESERVE_ALLOC_NO_ACCOUNT = 2,
-};
-
static int update_block_group(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr,
u64 num_bytes, int alloc);
@@ -105,7 +90,7 @@ static int find_next_key(struct btrfs_path *path, int level,
static void dump_space_info(struct btrfs_space_info *info, u64 bytes,
int dump_block_groups);
static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
- u64 num_bytes, int reserve, int delalloc);
+ u64 ram_bytes, u64 num_bytes, int delalloc);
static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache *cache,
u64 num_bytes, int delalloc);
static int block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv,
@@ -3491,7 +3476,6 @@ again:
dcs = BTRFS_DC_SETUP;
else if (ret == -ENOSPC)
set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
- btrfs_free_reserved_data_space(inode, 0, num_pages);
out_put:
iput(inode);
@@ -6300,8 +6284,9 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
/**
* btrfs_add_reserved_bytes - update the block_group and space info counters
* @cache: The cache we are manipulating
+ * @ram_bytes: The number of bytes of file content, and will be same to
+ * @num_bytes except for the compress path.
* @num_bytes: The number of bytes in question
- * @reserve: One of the reservation enums
* @delalloc: The blocks are allocated for the delalloc write
*
* This is called by the allocator when it reserves space. Metadata
@@ -6316,7 +6301,7 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
* succeeds.
*/
static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
- u64 num_bytes, int reserve, int delalloc)
+ u64 ram_bytes, u64 num_bytes, int delalloc)
{
struct btrfs_space_info *space_info = cache->space_info;
int ret = 0;
@@ -6328,13 +6313,11 @@ static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
} else {
cache->reserved += num_bytes;
space_info->bytes_reserved += num_bytes;
- if (reserve == RESERVE_ALLOC) {
- trace_btrfs_space_reservation(cache->fs_info,
- "space_info", space_info->flags,
- num_bytes, 0);
- space_info->bytes_may_use -= num_bytes;
- }
+ trace_btrfs_space_reservation(cache->fs_info,
+ "space_info", space_info->flags,
+ num_bytes, 0);
+ space_info->bytes_may_use -= ram_bytes;
if (delalloc)
cache->delalloc_bytes += num_bytes;
}
@@ -7218,9 +7201,9 @@ btrfs_release_block_group(struct btrfs_block_group_cache *cache,
* the free space extent currently.
*/
static noinline int find_free_extent(struct btrfs_root *orig_root,
- u64 num_bytes, u64 empty_size,
- u64 hint_byte, struct btrfs_key *ins,
- u64 flags, int delalloc)
+ u64 ram_bytes, u64 num_bytes, u64 empty_size,
+ u64 hint_byte, struct btrfs_key *ins,
+ u64 flags, int delalloc)
{
int ret = 0;
struct btrfs_root *root = orig_root->fs_info->extent_root;
@@ -7232,8 +7215,6 @@ static noinline int find_free_extent(struct btrfs_root *orig_root,
struct btrfs_space_info *space_info;
int loop = 0;
int index = __get_raid_index(flags);
- int alloc_type = (flags & BTRFS_BLOCK_GROUP_DATA) ?
- RESERVE_ALLOC_NO_ACCOUNT : RESERVE_ALLOC;
bool failed_cluster_refill = false;
bool failed_alloc = false;
bool use_cluster = true;
@@ -7565,8 +7546,8 @@ checks:
search_start - offset);
BUG_ON(offset > search_start);
- ret = btrfs_add_reserved_bytes(block_group, num_bytes,
- alloc_type, delalloc);
+ ret = btrfs_add_reserved_bytes(block_group, ram_bytes,
+ num_bytes, delalloc);
if (ret == -EAGAIN) {
btrfs_add_free_space(block_group, offset, num_bytes);
goto loop;
@@ -7739,7 +7720,7 @@ again:
up_read(&info->groups_sem);
}
-int btrfs_reserve_extent(struct btrfs_root *root,
+int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
u64 num_bytes, u64 min_alloc_size,
u64 empty_size, u64 hint_byte,
struct btrfs_key *ins, int is_data, int delalloc)
@@ -7751,8 +7732,8 @@ int btrfs_reserve_extent(struct btrfs_root *root,
flags = btrfs_get_alloc_profile(root, is_data);
again:
WARN_ON(num_bytes < root->sectorsize);
- ret = find_free_extent(root, num_bytes, empty_size, hint_byte, ins,
- flags, delalloc);
+ ret = find_free_extent(root, ram_bytes, num_bytes, empty_size,
+ hint_byte, ins, flags, delalloc);
if (!ret && !is_data) {
btrfs_dec_block_group_reservations(root->fs_info,
ins->objectid);
@@ -7761,6 +7742,7 @@ again:
num_bytes = min(num_bytes >> 1, ins->offset);
num_bytes = round_down(num_bytes, root->sectorsize);
num_bytes = max(num_bytes, min_alloc_size);
+ ram_bytes = num_bytes;
if (num_bytes == min_alloc_size)
final_tried = true;
goto again;
@@ -8029,7 +8011,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
return -EINVAL;
ret = btrfs_add_reserved_bytes(block_group, ins->offset,
- RESERVE_ALLOC_NO_ACCOUNT, 0);
+ ins->offset, 0);
BUG_ON(ret); /* logic error */
ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
0, owner, offset, ins, 1);
@@ -8171,7 +8153,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
if (IS_ERR(block_rsv))
return ERR_CAST(block_rsv);
- ret = btrfs_reserve_extent(root, blocksize, blocksize,
+ ret = btrfs_reserve_extent(root, blocksize, blocksize, blocksize,
empty_size, hint, &ins, 0, 0);
if (ret)
goto out_unuse;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 2234e88..b4d9258 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2669,6 +2669,7 @@ static long btrfs_fallocate(struct file *file, int mode,
alloc_start = round_down(offset, blocksize);
alloc_end = round_up(offset + len, blocksize);
+ cur_offset = alloc_start;
/* Make sure we aren't being give some crap mode */
if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
@@ -2761,7 +2762,6 @@ static long btrfs_fallocate(struct file *file, int mode,
/* First, check if we exceed the qgroup limit */
INIT_LIST_HEAD(&reserve_list);
- cur_offset = alloc_start;
while (1) {
em = btrfs_get_extent(inode, NULL, 0, cur_offset,
alloc_end - cur_offset, 0);
@@ -2788,6 +2788,14 @@ static long btrfs_fallocate(struct file *file, int mode,
last_byte - cur_offset);
if (ret < 0)
break;
+ } else {
+ /*
+ * Do not need to reserve unwritten extent for this
+ * range, free reserved data space first, otherwise
+ * it'll result in false ENOSPC error.
+ */
+ btrfs_free_reserved_data_space(inode, cur_offset,
+ last_byte - cur_offset);
}
free_extent_map(em);
cur_offset = last_byte;
@@ -2805,6 +2813,9 @@ static long btrfs_fallocate(struct file *file, int mode,
range->start,
range->len, 1 << inode->i_blkbits,
offset + len, &alloc_hint);
+ else
+ btrfs_free_reserved_data_space(inode, range->start,
+ range->len);
list_del(&range->list);
kfree(range);
}
@@ -2839,18 +2850,11 @@ out_unlock:
unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
&cached_state, GFP_KERNEL);
out:
- /*
- * As we waited the extent range, the data_rsv_map must be empty
- * in the range, as written data range will be released from it.
- * And for prealloacted extent, it will also be released when
- * its metadata is written.
- * So this is completely used as cleanup.
- */
- btrfs_qgroup_free_data(inode, alloc_start, alloc_end - alloc_start);
inode_unlock(inode);
/* Let go of our reservation. */
- btrfs_free_reserved_data_space(inode, alloc_start,
- alloc_end - alloc_start);
+ if (ret != 0)
+ btrfs_free_reserved_data_space(inode, alloc_start,
+ alloc_end - cur_offset);
return ret;
}
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 70107f7..e59e7d6 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -495,10 +495,9 @@ again:
ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
prealloc, prealloc, &alloc_hint);
if (ret) {
- btrfs_delalloc_release_space(inode, 0, prealloc);
+ btrfs_delalloc_release_metadata(inode, prealloc);
goto out_put;
}
- btrfs_free_reserved_data_space(inode, 0, prealloc);
ret = btrfs_write_out_ino_cache(root, trans, path, inode);
out_put:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4421954..e0cee59 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -564,6 +564,8 @@ cont:
PAGE_SET_WRITEBACK |
page_error_op |
PAGE_END_WRITEBACK);
+ btrfs_free_reserved_data_space_noquota(inode, start,
+ end - start + 1);
goto free_pages_out;
}
}
@@ -739,7 +741,7 @@ retry:
lock_extent(io_tree, async_extent->start,
async_extent->start + async_extent->ram_size - 1);
- ret = btrfs_reserve_extent(root,
+ ret = btrfs_reserve_extent(root, async_extent->ram_size,
async_extent->compressed_size,
async_extent->compressed_size,
0, alloc_hint, &ins, 1, 1);
@@ -966,7 +968,8 @@ static noinline int cow_file_range(struct inode *inode,
EXTENT_DEFRAG, PAGE_UNLOCK |
PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK |
PAGE_END_WRITEBACK);
-
+ btrfs_free_reserved_data_space_noquota(inode, start,
+ end - start + 1);
*nr_written = *nr_written +
(end - start + PAGE_SIZE) / PAGE_SIZE;
*page_started = 1;
@@ -986,7 +989,7 @@ static noinline int cow_file_range(struct inode *inode,
unsigned long op;
cur_alloc_size = disk_num_bytes;
- ret = btrfs_reserve_extent(root, cur_alloc_size,
+ ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size,
root->sectorsize, 0, alloc_hint,
&ins, 1, 1);
if (ret < 0)
@@ -1485,8 +1488,10 @@ out_check:
extent_clear_unlock_delalloc(inode, cur_offset,
cur_offset + num_bytes - 1,
locked_page, EXTENT_LOCKED |
- EXTENT_DELALLOC, PAGE_UNLOCK |
- PAGE_SET_PRIVATE2);
+ EXTENT_DELALLOC |
+ EXTENT_CLEAR_DATA_RESV,
+ PAGE_UNLOCK | PAGE_SET_PRIVATE2);
+
if (!nolock && nocow)
btrfs_end_write_no_snapshoting(root);
cur_offset = extent_end;
@@ -1803,7 +1808,9 @@ static void btrfs_clear_bit_hook(struct inode *inode,
return;
if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID
- && do_list && !(state->state & EXTENT_NORESERVE))
+ && do_list && !(state->state & EXTENT_NORESERVE)
+ && (*bits & (EXTENT_DO_ACCOUNTING |
+ EXTENT_CLEAR_DATA_RESV)))
btrfs_free_reserved_data_space_noquota(inode,
state->start, len);
@@ -7214,7 +7221,7 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
int ret;
alloc_hint = get_extent_allocation_hint(inode, start, len);
- ret = btrfs_reserve_extent(root, len, root->sectorsize, 0,
+ ret = btrfs_reserve_extent(root, len, len, root->sectorsize, 0,
alloc_hint, &ins, 1, 1);
if (ret)
return ERR_PTR(ret);
@@ -7714,6 +7721,13 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
ret = PTR_ERR(em2);
goto unlock_err;
}
+ /*
+ * For inode marked NODATACOW or extent marked PREALLOC,
+ * use the existing or preallocated extent, so does not
+ * need to adjust btrfs_space_info's bytes_may_use.
+ */
+ btrfs_free_reserved_data_space_noquota(inode,
+ start, len);
goto unlock;
}
}
@@ -7748,7 +7762,6 @@ unlock:
i_size_write(inode, start + len);
adjust_dio_outstanding_extents(inode, dio_data, len);
- btrfs_free_reserved_data_space(inode, start, len);
WARN_ON(dio_data->reserve < len);
dio_data->reserve -= len;
dio_data->unsubmitted_oe_range_end = start + len;
@@ -10269,6 +10282,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
u64 last_alloc = (u64)-1;
int ret = 0;
bool own_trans = true;
+ u64 end = start + num_bytes - 1;
if (trans)
own_trans = false;
@@ -10290,8 +10304,8 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
* sized chunks.
*/
cur_bytes = min(cur_bytes, last_alloc);
- ret = btrfs_reserve_extent(root, cur_bytes, min_size, 0,
- *alloc_hint, &ins, 1, 0);
+ ret = btrfs_reserve_extent(root, cur_bytes, cur_bytes,
+ min_size, 0, *alloc_hint, &ins, 1, 0);
if (ret) {
if (own_trans)
btrfs_end_transaction(trans, root);
@@ -10377,6 +10391,9 @@ next:
if (own_trans)
btrfs_end_transaction(trans, root);
}
+ if (cur_offset < end)
+ btrfs_free_reserved_data_space(inode, cur_offset,
+ end - cur_offset + 1);
return ret;
}
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index a0de885..f39c4db 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3032,6 +3032,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
int ret = 0;
u64 prealloc_start = cluster->start - offset;
u64 prealloc_end = cluster->end - offset;
+ u64 cur_offset;
BUG_ON(cluster->start != cluster->boundary[0]);
inode_lock(inode);
@@ -3041,6 +3042,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
if (ret)
goto out;
+ cur_offset = prealloc_start;
while (nr < cluster->nr) {
start = cluster->boundary[nr] - offset;
if (nr + 1 < cluster->nr)
@@ -3050,16 +3052,21 @@ int prealloc_file_extent_cluster(struct inode *inode,
lock_extent(&BTRFS_I(inode)->io_tree, start, end);
num_bytes = end + 1 - start;
+ if (cur_offset < start)
+ btrfs_free_reserved_data_space(inode, cur_offset,
+ start - cur_offset);
ret = btrfs_prealloc_file_range(inode, 0, start,
num_bytes, num_bytes,
end + 1, &alloc_hint);
+ cur_offset = end + 1;
unlock_extent(&BTRFS_I(inode)->io_tree, start, end);
if (ret)
break;
nr++;
}
- btrfs_free_reserved_data_space(inode, prealloc_start,
- prealloc_end + 1 - prealloc_start);
+ if (cur_offset < prealloc_end)
+ btrfs_free_reserved_data_space(inode, cur_offset,
+ prealloc_end + 1 - cur_offset);
out:
inode_unlock(inode);
return ret;
--
2.9.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue
2016-07-20 5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
` (3 preceding siblings ...)
2016-07-20 5:56 ` [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
@ 2016-07-20 8:46 ` Holger Hoffstätte
2016-07-21 1:51 ` Wang Xiaoguang
4 siblings, 1 reply; 15+ messages in thread
From: Holger Hoffstätte @ 2016-07-20 8:46 UTC (permalink / raw)
To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, jbacik
On 07/20/16 07:56, Wang Xiaoguang wrote:
> Currently in btrfs, for data space reservation, it does not update
> bytes_may_use in btrfs_update_reserved_bytes() and the decrease operation
> will be delayed to be done in extent_clear_unlock_delalloc(), for
> fallocate(2), decrease operation is even delayed to be done in end
> of btrfs_fallocate(), which is too late. Obviously such delay will
> cause unnecessary pressure to enospc system, in [PATCH 4/4], there is
> a simpel test script that can reveal such false ENOSPC bug.
>
> So in this patch set, it will remove RESERVE_FREE, RESERVE_ALLOC and
> RESERVE_ALLOC_NO_ACCOUNT, and we always update bytes_may_use timely.
>
> I'll also commit a fstests test case for this issue.
>
> Wang Xiaoguang (4):
> btrfs: use correct offset for reloc_inode in
> prealloc_file_extent_cluster()
> btrfs: divide btrfs_update_reserved_bytes() into two functions
> btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag
> btrfs: update btrfs_space_info's bytes_may_use timely
>
Just like the previous version, for all 4 patches:
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
via the reproducer script & some very large manual fallocates.
thanks!
Holger
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
2016-07-20 5:56 ` [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
@ 2016-07-20 13:18 ` Josef Bacik
2016-07-21 1:49 ` Wang Xiaoguang
0 siblings, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-07-20 13:18 UTC (permalink / raw)
To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger
On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
> In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
> wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
> which indeed are extent's bytenr. The correct value should be
> cluster->[start|end] minus block group's start bytenr.
>
> start bytenr cluster->start
> | | extent | extent | ...| extent |
> |----------------------------------------------------------------|
> | block group reloc_inode |
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
> fs/btrfs/relocation.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 0477dca..a0de885 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3030,12 +3030,14 @@ int prealloc_file_extent_cluster(struct inode *inode,
> u64 num_bytes;
> int nr = 0;
> int ret = 0;
> + u64 prealloc_start = cluster->start - offset;
> + u64 prealloc_end = cluster->end - offset;
>
> BUG_ON(cluster->start != cluster->boundary[0]);
> inode_lock(inode);
>
> - ret = btrfs_check_data_free_space(inode, cluster->start,
> - cluster->end + 1 - cluster->start);
> + ret = btrfs_check_data_free_space(inode, prealloc_start,
> + prealloc_end + 1 - prealloc_start);
> if (ret)
> goto out;
>
> @@ -3056,8 +3058,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
> break;
> nr++;
> }
> - btrfs_free_reserved_data_space(inode, cluster->start,
> - cluster->end + 1 - cluster->start);
> + btrfs_free_reserved_data_space(inode, prealloc_start,
> + prealloc_end + 1 - prealloc_start);
> out:
> inode_unlock(inode);
> return ret;
>
This ends up being the same amount. Consider this scenario
bg bytenr = 4096
cluster->start = 8192
cluster->end = 12287
cluster->end + 1 - cluster->start = 4096
prealloc_start = cluster->start - offset = 0
prealloc_end = cluster->end - offset = 8191
prealloc_end + 1 - prealloc_start = 4096
You shift both by the same amount, which gives you the same answer. Thanks,
Josef
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions
2016-07-20 5:56 ` [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions Wang Xiaoguang
@ 2016-07-20 13:21 ` Josef Bacik
0 siblings, 0 replies; 15+ messages in thread
From: Josef Bacik @ 2016-07-20 13:21 UTC (permalink / raw)
To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger
On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
> This patch divides btrfs_update_reserved_bytes() into
> btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes(), and
> next patch will extend btrfs_add_reserved_bytes()to fix some
> false ENOSPC error, please see later patch for detailed info.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
This appears to be functionally the same
Signed-off-by: Josef Bacik <jbacik@fb.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag
2016-07-20 5:56 ` [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag Wang Xiaoguang
@ 2016-07-20 13:22 ` Josef Bacik
2016-07-21 1:15 ` Wang Xiaoguang
0 siblings, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-07-20 13:22 UTC (permalink / raw)
To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger
On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
> In next patch, btrfs_clear_bit_hook() will not call
> btrfs_free_reserved_data_space_noquota() to update btrfs_space_info's
> bytes_may_use unless it has EXTENT_DO_ACCOUNTING or EXTENT_CLEAR_DATA_RESV,
> as for the reason, please see the next patch for detailed info.
>
> As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
> run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
> PREALLOC, we also need to update bytes_may_use, but can not pass
> EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so here
> we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook()
> to update btrfs_space_info's bytes_may_use.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
There's no point in introducing only a flag in one patch, collapse this into the
patch that actually uses it. Thanks,
Josef
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely
2016-07-20 5:56 ` [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
@ 2016-07-20 13:35 ` Josef Bacik
2016-07-21 1:18 ` Wang Xiaoguang
0 siblings, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-07-20 13:35 UTC (permalink / raw)
To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger
On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
> This patch can fix some false ENOSPC errors, below test script can
> reproduce one false ENOSPC error:
> #!/bin/bash
> dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
> dev=$(losetup --show -f fs.img)
> mkfs.btrfs -f -M $dev
> mkdir /tmp/mntpoint
> mount $dev /tmp/mntpoint
> cd /tmp/mntpoint
> xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile
>
> Above script will fail for ENOSPC reason, but indeed fs still has free
> space to satisfy this request. Please see call graph:
> btrfs_fallocate()
> |-> btrfs_alloc_data_chunk_ondemand()
> | bytes_may_use += 64M
> |-> btrfs_prealloc_file_range()
> |-> btrfs_reserve_extent()
> |-> btrfs_add_reserved_bytes()
> | alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
> | change bytes_may_use, and bytes_reserved += 64M. Now
> | bytes_may_use + bytes_reserved == 128M, which is greater
> | than btrfs_space_info's total_bytes, false enospc occurs.
> | Note, the bytes_may_use decrease operation will done in
> | end of btrfs_fallocate(), which is too late.
>
> Here is another simple case for buffered write:
> CPU 1 | CPU 2
> |
> |-> cow_file_range() |-> __btrfs_buffered_write()
> |-> btrfs_reserve_extent() | |
> | | |
> | | |
> | ..... | |-> btrfs_check_data_free_space()
> | |
> | |
> |-> extent_clear_unlock_delalloc() |
>
> In CPU 1, btrfs_reserve_extent()->find_free_extent()->
> btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
> operation will be delayed to be done in extent_clear_unlock_delalloc().
> Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
> btrfs_check_data_free_space() tries to reserve 100MB data space.
> If
> 100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
> data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
> data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
> btrfs_check_data_free_space() will try to allcate new data chunk or call
> btrfs_start_delalloc_roots(), or commit current transaction inorder to
> reserve some free space, obviously a lot of work. But indeed it's not
> necessary as long as decreasing bytes_may_use timely, we still have
> free space, decreasing 128M from bytes_may_use.
>
> To fix this issue, this patch chooses to update bytes_may_use for both
> data and metadata in btrfs_add_reserved_bytes(). For compress path, real
> extent length may not be equal to file content length, so introduce a
> ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
> btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
> file content length. Then compress path can update bytes_may_use
> correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
> and RESERVE_FREE.
>
> For inode marked as NODATACOW or extent marked as PREALLOC, we can
> directly call btrfs_free_reserved_data_space() to adjust bytes_may_use.
>
> Meanwhile __btrfs_prealloc_file_range() will call
> btrfs_free_reserved_data_space() internally for both sucessful and failed
> path, btrfs_prealloc_file_range()'s callers does not need to call
> btrfs_free_reserved_data_space() any more.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
> fs/btrfs/ctree.h | 2 +-
> fs/btrfs/extent-tree.c | 56 +++++++++++++++++---------------------------------
> fs/btrfs/file.c | 26 +++++++++++++----------
> fs/btrfs/inode-map.c | 3 +--
> fs/btrfs/inode.c | 37 ++++++++++++++++++++++++---------
> fs/btrfs/relocation.c | 11 ++++++++--
> 6 files changed, 72 insertions(+), 63 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 4274a7b..7eb2913 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2556,7 +2556,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
> struct btrfs_root *root,
> u64 root_objectid, u64 owner, u64 offset,
> struct btrfs_key *ins);
> -int btrfs_reserve_extent(struct btrfs_root *root, u64 num_bytes,
> +int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, u64 num_bytes,
> u64 min_alloc_size, u64 empty_size, u64 hint_byte,
> struct btrfs_key *ins, int is_data, int delalloc);
> int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 8eaac39..5447973 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -60,21 +60,6 @@ enum {
> CHUNK_ALLOC_FORCE = 2,
> };
>
> -/*
> - * Control how reservations are dealt with.
> - *
> - * RESERVE_FREE - freeing a reservation.
> - * RESERVE_ALLOC - allocating space and we need to update bytes_may_use for
> - * ENOSPC accounting
> - * RESERVE_ALLOC_NO_ACCOUNT - allocating space and we should not update
> - * bytes_may_use as the ENOSPC accounting is done elsewhere
> - */
> -enum {
> - RESERVE_FREE = 0,
> - RESERVE_ALLOC = 1,
> - RESERVE_ALLOC_NO_ACCOUNT = 2,
> -};
> -
> static int update_block_group(struct btrfs_trans_handle *trans,
> struct btrfs_root *root, u64 bytenr,
> u64 num_bytes, int alloc);
> @@ -105,7 +90,7 @@ static int find_next_key(struct btrfs_path *path, int level,
> static void dump_space_info(struct btrfs_space_info *info, u64 bytes,
> int dump_block_groups);
> static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
> - u64 num_bytes, int reserve, int delalloc);
> + u64 ram_bytes, u64 num_bytes, int delalloc);
> static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache *cache,
> u64 num_bytes, int delalloc);
> static int block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv,
> @@ -3491,7 +3476,6 @@ again:
> dcs = BTRFS_DC_SETUP;
> else if (ret == -ENOSPC)
> set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
> - btrfs_free_reserved_data_space(inode, 0, num_pages);
>
> out_put:
> iput(inode);
> @@ -6300,8 +6284,9 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
> /**
> * btrfs_add_reserved_bytes - update the block_group and space info counters
> * @cache: The cache we are manipulating
> + * @ram_bytes: The number of bytes of file content, and will be same to
> + * @num_bytes except for the compress path.
> * @num_bytes: The number of bytes in question
> - * @reserve: One of the reservation enums
> * @delalloc: The blocks are allocated for the delalloc write
> *
> * This is called by the allocator when it reserves space. Metadata
> @@ -6316,7 +6301,7 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
> * succeeds.
> */
> static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
> - u64 num_bytes, int reserve, int delalloc)
> + u64 ram_bytes, u64 num_bytes, int delalloc)
> {
> struct btrfs_space_info *space_info = cache->space_info;
> int ret = 0;
> @@ -6328,13 +6313,11 @@ static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
> } else {
> cache->reserved += num_bytes;
> space_info->bytes_reserved += num_bytes;
> - if (reserve == RESERVE_ALLOC) {
> - trace_btrfs_space_reservation(cache->fs_info,
> - "space_info", space_info->flags,
> - num_bytes, 0);
> - space_info->bytes_may_use -= num_bytes;
> - }
>
> + trace_btrfs_space_reservation(cache->fs_info,
> + "space_info", space_info->flags,
> + num_bytes, 0);
This needs to be ram_bytes to keep the accounting consistent for tools that use
these tracepoints. Thanks,
Josef
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag
2016-07-20 13:22 ` Josef Bacik
@ 2016-07-21 1:15 ` Wang Xiaoguang
0 siblings, 0 replies; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-21 1:15 UTC (permalink / raw)
To: Josef Bacik, linux-btrfs; +Cc: dsterba, holger
hello,
On 07/20/2016 09:22 PM, Josef Bacik wrote:
> On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
>> In next patch, btrfs_clear_bit_hook() will not call
>> btrfs_free_reserved_data_space_noquota() to update btrfs_space_info's
>> bytes_may_use unless it has EXTENT_DO_ACCOUNTING or
>> EXTENT_CLEAR_DATA_RESV,
>> as for the reason, please see the next patch for detailed info.
>>
>> As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
>> run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
>> PREALLOC, we also need to update bytes_may_use, but can not pass
>> EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so
>> here
>> we introduce EXTENT_CLEAR_DATA_RESV flag to indicate
>> btrfs_clear_bit_hook()
>> to update btrfs_space_info's bytes_may_use.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>
> There's no point in introducing only a flag in one patch, collapse
> this into the patch that actually uses it. Thanks,
OK
Regards,
Xiaoguang Wang
>
> Josef
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely
2016-07-20 13:35 ` Josef Bacik
@ 2016-07-21 1:18 ` Wang Xiaoguang
0 siblings, 0 replies; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-21 1:18 UTC (permalink / raw)
To: Josef Bacik, linux-btrfs; +Cc: dsterba, holger
hello,
On 07/20/2016 09:35 PM, Josef Bacik wrote:
> On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
>> This patch can fix some false ENOSPC errors, below test script can
>> reproduce one false ENOSPC error:
>> #!/bin/bash
>> dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
>> dev=$(losetup --show -f fs.img)
>> mkfs.btrfs -f -M $dev
>> mkdir /tmp/mntpoint
>> mount $dev /tmp/mntpoint
>> cd /tmp/mntpoint
>> xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile
>>
>> Above script will fail for ENOSPC reason, but indeed fs still has free
>> space to satisfy this request. Please see call graph:
>> btrfs_fallocate()
>> |-> btrfs_alloc_data_chunk_ondemand()
>> | bytes_may_use += 64M
>> |-> btrfs_prealloc_file_range()
>> |-> btrfs_reserve_extent()
>> |-> btrfs_add_reserved_bytes()
>> | alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
>> | change bytes_may_use, and bytes_reserved += 64M. Now
>> | bytes_may_use + bytes_reserved == 128M, which is greater
>> | than btrfs_space_info's total_bytes, false enospc occurs.
>> | Note, the bytes_may_use decrease operation will done in
>> | end of btrfs_fallocate(), which is too late.
>>
>> Here is another simple case for buffered write:
>> CPU 1 | CPU 2
>> |
>> |-> cow_file_range() |-> __btrfs_buffered_write()
>> |-> btrfs_reserve_extent() | |
>> | | |
>> | | |
>> | ..... | |->
>> btrfs_check_data_free_space()
>> | |
>> | |
>> |-> extent_clear_unlock_delalloc() |
>>
>> In CPU 1, btrfs_reserve_extent()->find_free_extent()->
>> btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
>> operation will be delayed to be done in extent_clear_unlock_delalloc().
>> Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
>> btrfs_check_data_free_space() tries to reserve 100MB data space.
>> If
>> 100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
>> data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
>> data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
>> btrfs_check_data_free_space() will try to allcate new data chunk or call
>> btrfs_start_delalloc_roots(), or commit current transaction inorder to
>> reserve some free space, obviously a lot of work. But indeed it's not
>> necessary as long as decreasing bytes_may_use timely, we still have
>> free space, decreasing 128M from bytes_may_use.
>>
>> To fix this issue, this patch chooses to update bytes_may_use for both
>> data and metadata in btrfs_add_reserved_bytes(). For compress path, real
>> extent length may not be equal to file content length, so introduce a
>> ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
>> btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
>> file content length. Then compress path can update bytes_may_use
>> correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT,
>> RESERVE_ALLOC
>> and RESERVE_FREE.
>>
>> For inode marked as NODATACOW or extent marked as PREALLOC, we can
>> directly call btrfs_free_reserved_data_space() to adjust bytes_may_use.
>>
>> Meanwhile __btrfs_prealloc_file_range() will call
>> btrfs_free_reserved_data_space() internally for both sucessful and
>> failed
>> path, btrfs_prealloc_file_range()'s callers does not need to call
>> btrfs_free_reserved_data_space() any more.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>> fs/btrfs/ctree.h | 2 +-
>> fs/btrfs/extent-tree.c | 56
>> +++++++++++++++++---------------------------------
>> fs/btrfs/file.c | 26 +++++++++++++----------
>> fs/btrfs/inode-map.c | 3 +--
>> fs/btrfs/inode.c | 37 ++++++++++++++++++++++++---------
>> fs/btrfs/relocation.c | 11 ++++++++--
>> 6 files changed, 72 insertions(+), 63 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 4274a7b..7eb2913 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -2556,7 +2556,7 @@ int btrfs_alloc_logged_file_extent(struct
>> btrfs_trans_handle *trans,
>> struct btrfs_root *root,
>> u64 root_objectid, u64 owner, u64 offset,
>> struct btrfs_key *ins);
>> -int btrfs_reserve_extent(struct btrfs_root *root, u64 num_bytes,
>> +int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, u64
>> num_bytes,
>> u64 min_alloc_size, u64 empty_size, u64 hint_byte,
>> struct btrfs_key *ins, int is_data, int delalloc);
>> int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct
>> btrfs_root *root,
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 8eaac39..5447973 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -60,21 +60,6 @@ enum {
>> CHUNK_ALLOC_FORCE = 2,
>> };
>>
>> -/*
>> - * Control how reservations are dealt with.
>> - *
>> - * RESERVE_FREE - freeing a reservation.
>> - * RESERVE_ALLOC - allocating space and we need to update
>> bytes_may_use for
>> - * ENOSPC accounting
>> - * RESERVE_ALLOC_NO_ACCOUNT - allocating space and we should not update
>> - * bytes_may_use as the ENOSPC accounting is done elsewhere
>> - */
>> -enum {
>> - RESERVE_FREE = 0,
>> - RESERVE_ALLOC = 1,
>> - RESERVE_ALLOC_NO_ACCOUNT = 2,
>> -};
>> -
>> static int update_block_group(struct btrfs_trans_handle *trans,
>> struct btrfs_root *root, u64 bytenr,
>> u64 num_bytes, int alloc);
>> @@ -105,7 +90,7 @@ static int find_next_key(struct btrfs_path *path,
>> int level,
>> static void dump_space_info(struct btrfs_space_info *info, u64 bytes,
>> int dump_block_groups);
>> static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache
>> *cache,
>> - u64 num_bytes, int reserve, int delalloc);
>> + u64 ram_bytes, u64 num_bytes, int delalloc);
>> static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache
>> *cache,
>> u64 num_bytes, int delalloc);
>> static int block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv,
>> @@ -3491,7 +3476,6 @@ again:
>> dcs = BTRFS_DC_SETUP;
>> else if (ret == -ENOSPC)
>> set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
>> - btrfs_free_reserved_data_space(inode, 0, num_pages);
>>
>> out_put:
>> iput(inode);
>> @@ -6300,8 +6284,9 @@ void btrfs_wait_block_group_reservations(struct
>> btrfs_block_group_cache *bg)
>> /**
>> * btrfs_add_reserved_bytes - update the block_group and space info
>> counters
>> * @cache: The cache we are manipulating
>> + * @ram_bytes: The number of bytes of file content, and will be
>> same to
>> + * @num_bytes except for the compress path.
>> * @num_bytes: The number of bytes in question
>> - * @reserve: One of the reservation enums
>> * @delalloc: The blocks are allocated for the delalloc write
>> *
>> * This is called by the allocator when it reserves space. Metadata
>> @@ -6316,7 +6301,7 @@ void btrfs_wait_block_group_reservations(struct
>> btrfs_block_group_cache *bg)
>> * succeeds.
>> */
>> static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache
>> *cache,
>> - u64 num_bytes, int reserve, int delalloc)
>> + u64 ram_bytes, u64 num_bytes, int delalloc)
>> {
>> struct btrfs_space_info *space_info = cache->space_info;
>> int ret = 0;
>> @@ -6328,13 +6313,11 @@ static int btrfs_add_reserved_bytes(struct
>> btrfs_block_group_cache *cache,
>> } else {
>> cache->reserved += num_bytes;
>> space_info->bytes_reserved += num_bytes;
>> - if (reserve == RESERVE_ALLOC) {
>> - trace_btrfs_space_reservation(cache->fs_info,
>> - "space_info", space_info->flags,
>> - num_bytes, 0);
>> - space_info->bytes_may_use -= num_bytes;
>> - }
>>
>> + trace_btrfs_space_reservation(cache->fs_info,
>> + "space_info", space_info->flags,
>> + num_bytes, 0);
>
> This needs to be ram_bytes to keep the accounting consistent for tools
> that use these tracepoints. Thanks,
OK, I'll fix this issue in later version.
Regards,
Xiaoguang Wang
>
> Josef
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
2016-07-20 13:18 ` Josef Bacik
@ 2016-07-21 1:49 ` Wang Xiaoguang
2016-07-21 13:05 ` Josef Bacik
0 siblings, 1 reply; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-21 1:49 UTC (permalink / raw)
To: Josef Bacik, linux-btrfs; +Cc: dsterba, holger
hello,
On 07/20/2016 09:18 PM, Josef Bacik wrote:
> On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
>> In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
>> wrong file offset for reloc_inode, it uses cluster->start and
>> cluster->end,
>> which indeed are extent's bytenr. The correct value should be
>> cluster->[start|end] minus block group's start bytenr.
>>
>> start bytenr cluster->start
>> | | extent | extent | ...| extent |
>> |----------------------------------------------------------------|
>> | block group reloc_inode |
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>> fs/btrfs/relocation.c | 10 ++++++----
>> 1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index 0477dca..a0de885 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -3030,12 +3030,14 @@ int prealloc_file_extent_cluster(struct inode
>> *inode,
>> u64 num_bytes;
>> int nr = 0;
>> int ret = 0;
>> + u64 prealloc_start = cluster->start - offset;
>> + u64 prealloc_end = cluster->end - offset;
>>
>> BUG_ON(cluster->start != cluster->boundary[0]);
>> inode_lock(inode);
>>
>> - ret = btrfs_check_data_free_space(inode, cluster->start,
>> - cluster->end + 1 - cluster->start);
>> + ret = btrfs_check_data_free_space(inode, prealloc_start,
>> + prealloc_end + 1 - prealloc_start);
>> if (ret)
>> goto out;
>>
>> @@ -3056,8 +3058,8 @@ int prealloc_file_extent_cluster(struct inode
>> *inode,
>> break;
>> nr++;
>> }
>> - btrfs_free_reserved_data_space(inode, cluster->start,
>> - cluster->end + 1 - cluster->start);
>> + btrfs_free_reserved_data_space(inode, prealloc_start,
>> + prealloc_end + 1 - prealloc_start);
>> out:
>> inode_unlock(inode);
>> return ret;
>>
>
> This ends up being the same amount. Consider this scenario
>
> bg bytenr = 4096
> cluster->start = 8192
> cluster->end = 12287
>
> cluster->end + 1 - cluster->start = 4096
>
> prealloc_start = cluster->start - offset = 0
> prealloc_end = cluster->end - offset = 8191
>
> prealloc_end + 1 - prealloc_start = 4096
>
> You shift both by the same amount, which gives you the same answer.
> Thanks,
Thanks for reviewing.
Yes, I know the amount of preallocated data space is the same, this patch
does not fix any bugs :)
For every block group to be balanced, we create a corresponding inode.
For this inode, the initial offset should be 0. In your above example,
before this patch, it's btrfs_free_reserved_data_space(inode, 8192, 4096);
with this patch, it's btrfs_free_reserved_data_space(inode, 4096, 4096).
I just want to make btrfs_free_reserved_data_space()'s 'start' argument
be offset inside block group, not offset inside whole fs byternr space.
I'm not
a English native, hope that I have expressed what I want to :)
But yes, I'm also OK with removing this patch.
Regards,
Xiaoguang Wang
>
> Josef
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue
2016-07-20 8:46 ` [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Holger Hoffstätte
@ 2016-07-21 1:51 ` Wang Xiaoguang
0 siblings, 0 replies; 15+ messages in thread
From: Wang Xiaoguang @ 2016-07-21 1:51 UTC (permalink / raw)
To: Holger Hoffstätte, linux-btrfs; +Cc: dsterba, jbacik
hello,
On 07/20/2016 04:46 PM, Holger Hoffstätte wrote:
> On 07/20/16 07:56, Wang Xiaoguang wrote:
>> Currently in btrfs, for data space reservation, it does not update
>> bytes_may_use in btrfs_update_reserved_bytes() and the decrease operation
>> will be delayed to be done in extent_clear_unlock_delalloc(), for
>> fallocate(2), decrease operation is even delayed to be done in end
>> of btrfs_fallocate(), which is too late. Obviously such delay will
>> cause unnecessary pressure to enospc system, in [PATCH 4/4], there is
>> a simpel test script that can reveal such false ENOSPC bug.
>>
>> So in this patch set, it will remove RESERVE_FREE, RESERVE_ALLOC and
>> RESERVE_ALLOC_NO_ACCOUNT, and we always update bytes_may_use timely.
>>
>> I'll also commit a fstests test case for this issue.
>>
>> Wang Xiaoguang (4):
>> btrfs: use correct offset for reloc_inode in
>> prealloc_file_extent_cluster()
>> btrfs: divide btrfs_update_reserved_bytes() into two functions
>> btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag
>> btrfs: update btrfs_space_info's bytes_may_use timely
>>
> Just like the previous version, for all 4 patches:
>
> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
>
> via the reproducer script & some very large manual fallocates.
Yes, thanks very much for your test.
I also run the whole fstests before sending this patch set :)
Regards,
Xiaoguang Wang
>
> thanks!
> Holger
>
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
2016-07-21 1:49 ` Wang Xiaoguang
@ 2016-07-21 13:05 ` Josef Bacik
0 siblings, 0 replies; 15+ messages in thread
From: Josef Bacik @ 2016-07-21 13:05 UTC (permalink / raw)
To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba, holger
On 07/20/2016 09:49 PM, Wang Xiaoguang wrote:
> hello,
>
> On 07/20/2016 09:18 PM, Josef Bacik wrote:
>> On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
>>> In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
>>> wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
>>> which indeed are extent's bytenr. The correct value should be
>>> cluster->[start|end] minus block group's start bytenr.
>>>
>>> start bytenr cluster->start
>>> | | extent | extent | ...| extent |
>>> |----------------------------------------------------------------|
>>> | block group reloc_inode |
>>>
>>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>> ---
>>> fs/btrfs/relocation.c | 10 ++++++----
>>> 1 file changed, 6 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>>> index 0477dca..a0de885 100644
>>> --- a/fs/btrfs/relocation.c
>>> +++ b/fs/btrfs/relocation.c
>>> @@ -3030,12 +3030,14 @@ int prealloc_file_extent_cluster(struct inode *inode,
>>> u64 num_bytes;
>>> int nr = 0;
>>> int ret = 0;
>>> + u64 prealloc_start = cluster->start - offset;
>>> + u64 prealloc_end = cluster->end - offset;
>>>
>>> BUG_ON(cluster->start != cluster->boundary[0]);
>>> inode_lock(inode);
>>>
>>> - ret = btrfs_check_data_free_space(inode, cluster->start,
>>> - cluster->end + 1 - cluster->start);
>>> + ret = btrfs_check_data_free_space(inode, prealloc_start,
>>> + prealloc_end + 1 - prealloc_start);
>>> if (ret)
>>> goto out;
>>>
>>> @@ -3056,8 +3058,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
>>> break;
>>> nr++;
>>> }
>>> - btrfs_free_reserved_data_space(inode, cluster->start,
>>> - cluster->end + 1 - cluster->start);
>>> + btrfs_free_reserved_data_space(inode, prealloc_start,
>>> + prealloc_end + 1 - prealloc_start);
>>> out:
>>> inode_unlock(inode);
>>> return ret;
>>>
>>
>> This ends up being the same amount. Consider this scenario
>>
>> bg bytenr = 4096
>> cluster->start = 8192
>> cluster->end = 12287
>>
>> cluster->end + 1 - cluster->start = 4096
>>
>> prealloc_start = cluster->start - offset = 0
>> prealloc_end = cluster->end - offset = 8191
>>
>> prealloc_end + 1 - prealloc_start = 4096
>>
>> You shift both by the same amount, which gives you the same answer. Thanks,
> Thanks for reviewing.
> Yes, I know the amount of preallocated data space is the same, this patch
> does not fix any bugs :)
>
> For every block group to be balanced, we create a corresponding inode.
> For this inode, the initial offset should be 0. In your above example,
> before this patch, it's btrfs_free_reserved_data_space(inode, 8192, 4096);
> with this patch, it's btrfs_free_reserved_data_space(inode, 4096, 4096).
>
> I just want to make btrfs_free_reserved_data_space()'s 'start' argument
> be offset inside block group, not offset inside whole fs byternr space. I'm not
> a English native, hope that I have expressed what I want to :)
>
> But yes, I'm also OK with removing this patch.
>
Oh I see, ok that's fine if we're just trying to make it look sane then go for it.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2016-07-21 13:05 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-20 5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
2016-07-20 5:56 ` [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
2016-07-20 13:18 ` Josef Bacik
2016-07-21 1:49 ` Wang Xiaoguang
2016-07-21 13:05 ` Josef Bacik
2016-07-20 5:56 ` [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions Wang Xiaoguang
2016-07-20 13:21 ` Josef Bacik
2016-07-20 5:56 ` [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag Wang Xiaoguang
2016-07-20 13:22 ` Josef Bacik
2016-07-21 1:15 ` Wang Xiaoguang
2016-07-20 5:56 ` [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
2016-07-20 13:35 ` Josef Bacik
2016-07-21 1:18 ` Wang Xiaoguang
2016-07-20 8:46 ` [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Holger Hoffstätte
2016-07-21 1:51 ` Wang Xiaoguang
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.