* [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier
@ 2013-11-21 13:43 Miao Xie
  2013-11-21 13:43 ` [PATCH 2/5] Btrfs: just do dirty page flush for the inode with compression before direct IO Miao Xie
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Miao Xie @ 2013-11-21 13:43 UTC (permalink / raw)
  To: linux-btrfs

The tasks that wait for the IO_DONE flag only care about the I/O of the dirty
pages, so it is better to wake them up immediately after all the pages are
written, rather than after the whole I/O process completes.
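
A minimal sketch of the waiter pattern in question (illustrative only, not
quoted from the tree; the exact in-tree waiters may differ): such a task
sleeps on the ordered extent's wait queue until the IO_DONE bit (or an error
bit) shows up, so waking it as soon as the last page is written lets it
continue without waiting for the rest of the ordered-extent completion work:

	/* Illustrative waiter sketch, not the exact in-tree code. */
	wait_event(entry->wait,
		   test_bit(BTRFS_ORDERED_IO_DONE, &entry->flags) ||
		   test_bit(BTRFS_ORDERED_IOERR, &entry->flags));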

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/ordered-data.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index eb5bac4..1bd7002 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -348,10 +348,13 @@ int btrfs_dec_test_first_ordered_pending(struct inode *inode,
 	if (!uptodate)
 		set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
 
-	if (entry->bytes_left == 0)
+	if (entry->bytes_left == 0) {
 		ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
-	else
+		if (waitqueue_active(&entry->wait))
+			wake_up(&entry->wait);
+	} else {
 		ret = 1;
+	}
 out:
 	if (!ret && cached && entry) {
 		*cached = entry;
@@ -408,10 +411,13 @@ have_entry:
 	if (!uptodate)
 		set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
 
-	if (entry->bytes_left == 0)
+	if (entry->bytes_left == 0) {
 		ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
-	else
+		if (waitqueue_active(&entry->wait))
+			wake_up(&entry->wait);
+	} else {
 		ret = 1;
+	}
 out:
 	if (!ret && cached && entry) {
 		*cached = entry;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/5] Btrfs: just do dirty page flush for the inode with compression before direct IO
  2013-11-21 13:43 [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier Miao Xie
@ 2013-11-21 13:43 ` Miao Xie
  2013-11-21 13:43 ` [PATCH 3/5] Btrfs: remove the unnecessary flush when preparing the pages Miao Xie
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Miao Xie @ 2013-11-21 13:43 UTC (permalink / raw)
  To: linux-btrfs

As the comment in btrfs_direct_IO says, only the compressed pages need to be
flushed again to make sure they are on disk; the ordinary pages do not. So we
add an if statement to check whether the inode has compressed pages, and if
not, we skip the flush.

In order to prevent the write ranges from intersecting, we need to wait for
the running ordered extents. But the current code waits for them twice: once
before the direct IO starts (in btrfs_wait_ordered_range()), and again before
we get the blocks. Waiting twice is unnecessary: since we can do the direct
IO without holding i_mutex, intersecting ordered extents may still appear
during the direct IO, so the first wait cannot avoid the problem anyway. So
we use filemap_fdatawrite_range() instead of btrfs_wait_ordered_range() to
remove the first wait.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/inode.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index da8d2f6..a407242 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7229,15 +7229,15 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 	smp_mb__after_atomic_inc();
 
 	/*
-	 * The generic stuff only does filemap_write_and_wait_range, which isn't
-	 * enough if we've written compressed pages to this area, so we need to
-	 * call btrfs_wait_ordered_range to make absolutely sure that any
-	 * outstanding dirty pages are on disk.
+	 * The generic stuff only does filemap_write_and_wait_range, which
+	 * isn't enough if we've written compressed pages to this area, so
+	 * we need to flush the dirty pages again to make absolutely sure
+	 * that any outstanding dirty pages are on disk.
 	 */
 	count = iov_length(iov, nr_segs);
-	ret = btrfs_wait_ordered_range(inode, offset, count);
-	if (ret)
-		return ret;
+	if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
+		     &BTRFS_I(inode)->runtime_flags))
+		filemap_fdatawrite_range(inode->i_mapping, offset, count);
 
 	if (rw & WRITE) {
 		/*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/5] Btrfs: remove the unnecessary flush when preparing the pages
  2013-11-21 13:43 [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier Miao Xie
  2013-11-21 13:43 ` [PATCH 2/5] Btrfs: just do dirty page flush for the inode with compression before direct IO Miao Xie
@ 2013-11-21 13:43 ` Miao Xie
  2013-11-21 13:43 ` [PATCH 4/5] Btrfs: remove unnecessary lock in may_commit_transaction() Miao Xie
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Miao Xie @ 2013-11-21 13:43 UTC (permalink / raw)
  To: linux-btrfs

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/file.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 82d0342..27f65e4 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1286,12 +1286,11 @@ again:
 		struct btrfs_ordered_extent *ordered;
 		lock_extent_bits(&BTRFS_I(inode)->io_tree,
 				 start_pos, last_pos - 1, 0, &cached_state);
-		ordered = btrfs_lookup_first_ordered_extent(inode,
-							    last_pos - 1);
+		ordered = btrfs_lookup_ordered_range(inode, start_pos,
+						     last_pos - start_pos);
 		if (ordered &&
 		    ordered->file_offset + ordered->len > start_pos &&
 		    ordered->file_offset < last_pos) {
-			btrfs_put_ordered_extent(ordered);
 			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 					     start_pos, last_pos - 1,
 					     &cached_state, GFP_NOFS);
@@ -1299,10 +1298,8 @@ again:
 				unlock_page(pages[i]);
 				page_cache_release(pages[i]);
 			}
-			err = btrfs_wait_ordered_range(inode, start_pos,
-						       last_pos - start_pos);
-			if (err)
-				goto fail;
+			btrfs_start_ordered_extent(inode, ordered, 1);
+			btrfs_put_ordered_extent(ordered);
 			goto again;
 		}
 		if (ordered)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/5] Btrfs: remove unnecessary lock in may_commit_transaction()
  2013-11-21 13:43 [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier Miao Xie
  2013-11-21 13:43 ` [PATCH 2/5] Btrfs: just do dirty page flush for the inode with compression before direct IO Miao Xie
  2013-11-21 13:43 ` [PATCH 3/5] Btrfs: remove the unnecessary flush when preparing the pages Miao Xie
@ 2013-11-21 13:43 ` Miao Xie
  2013-11-21 13:43 ` [PATCH 5/5] Btrfs: reclaim the reserved metadata space at background Miao Xie
  2013-11-22  4:30 ` [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier Liu Bo
  4 siblings, 0 replies; 9+ messages in thread
From: Miao Xie @ 2013-11-21 13:43 UTC (permalink / raw)
  To: linux-btrfs

The reasons are:
- The per-cpu counter has its own lock to protect itself.
- Here we need to get an exact value.
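
For readers unfamiliar with the per-cpu counter API, here is a small sketch
of the call the patch relies on (illustrative only, not code added by the
patch): as I understand the lib/percpu_counter.c implementation,
percpu_counter_compare() needs no caller-side locking, because it falls back
to an exact, internally locked sum whenever the approximate value is too
close to the target to give a reliable answer.

	/* Illustrative sketch, mirroring the call left in place by the patch. */
	if (percpu_counter_compare(&space_info->total_bytes_pinned, bytes) >= 0)
		goto commit;	/* enough pinned space; no space_info->lock taken */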

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 45d98d0..12a5b6d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4142,13 +4142,9 @@ static int may_commit_transaction(struct btrfs_root *root,
 		goto commit;
 
 	/* See if there is enough pinned space to make this reservation */
-	spin_lock(&space_info->lock);
 	if (percpu_counter_compare(&space_info->total_bytes_pinned,
-				   bytes) >= 0) {
-		spin_unlock(&space_info->lock);
+				   bytes) >= 0)
 		goto commit;
-	}
-	spin_unlock(&space_info->lock);
 
 	/*
 	 * See if there is some space in the delayed insertion reservation for
@@ -4157,16 +4153,13 @@ static int may_commit_transaction(struct btrfs_root *root,
 	if (space_info != delayed_rsv->space_info)
 		return -ENOSPC;
 
-	spin_lock(&space_info->lock);
 	spin_lock(&delayed_rsv->lock);
 	if (percpu_counter_compare(&space_info->total_bytes_pinned,
 				   bytes - delayed_rsv->size) >= 0) {
 		spin_unlock(&delayed_rsv->lock);
-		spin_unlock(&space_info->lock);
 		return -ENOSPC;
 	}
 	spin_unlock(&delayed_rsv->lock);
-	spin_unlock(&space_info->lock);
 
 commit:
 	trans = btrfs_join_transaction(root);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 5/5] Btrfs: reclaim the reserved metadata space at background
  2013-11-21 13:43 [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier Miao Xie
                   ` (2 preceding siblings ...)
  2013-11-21 13:43 ` [PATCH 4/5] Btrfs: remove unnecessary lock in may_commit_transaction() Miao Xie
@ 2013-11-21 13:43 ` Miao Xie
  2013-11-22 17:48   ` Josef Bacik
  2013-11-22  4:30 ` [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier Liu Bo
  4 siblings, 1 reply; 9+ messages in thread
From: Miao Xie @ 2013-11-21 13:43 UTC (permalink / raw)
  To: linux-btrfs

Before this patch, a task had to reclaim the metadata space by itself if
the metadata space was not enough. And when the task started the space
reclamation, all the other tasks which wanted to reserve metadata space
were blocked. In some cases they would be blocked for a long time, which
made the performance fluctuate wildly.

So we introduce background metadata space reclamation: when the space is
about to be exhausted, we queue a reclaim work item, and the workqueue
worker reclaims the reserved space in the background. This way, in most
cases the tasks need not reclaim the space themselves, and even if a task
does have to reclaim the space or is blocked waiting for the reclamation,
it will get enough space more quickly.

Here are my test results (each test was run 3 times with sysbench):
 Memory:	2GB
 CPU:		2Cores * 1CPU
 Partition:	20GB(SSD)
				w/o patch		w/ patch
 seqwr-512KB-1Thread-2GB	180.08MB/s		178.00MB/s
 seqwr-512KB-8Threads-2GB	179.14MB/s		179.25MB/s
 seqwr-128KB-1Thread-2GB	186.3MB/s		186.57MB/s
 seqwr-128KB-8Threads-2GB	176.05MB/s		183.05MB/s

 rndwr-512KB-1Thread-2GB	114.20MB/s		117.42MB/s
 rndwr-512KB-8Threads-2GB	129.18MB/s		132.51MB/s
 rndwr-128KB-1Thread-2GB	963.94MB/s		1.4058GB/s
 rndwr-128KB-8Threads-2GB	1.1950GB/s		1.3787GB/s

The above numbers are the averages.

rndwr: random write test
seqwr: sequential write test
512KB,128KB: the size of each request
1Thread, 8Threads: the number of the test threads
2GB: The total file size

For the last two cases, the random write test only executed 10000 requests
each time we ran it, so most of the data was cached in the page cache, which
is why the random write test was faster than the sequential write test.
Besides that, according to our test results, the random write performance on
the kernel without the patch was not stable; it fluctuated between 840MB/s
and 1.25GB/s. After applying the patch, the random write performance was
stable.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  6 +++
 fs/btrfs/disk-io.c     |  3 ++
 fs/btrfs/extent-tree.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/super.c       |  1 +
 4 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f9aeb27..e0d2a3b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -33,6 +33,7 @@
 #include <asm/kmap_types.h>
 #include <linux/pagemap.h>
 #include <linux/btrfs.h>
+#include <linux/workqueue.h>
 #include "extent_io.h"
 #include "extent_map.h"
 #include "async-thread.h"
@@ -1289,6 +1290,8 @@ struct btrfs_stripe_hash_table {
 
 #define BTRFS_STRIPE_HASH_TABLE_BITS 11
 
+void btrfs_init_async_reclaim_work(struct work_struct *work);
+
 /* fs_info */
 struct reloc_control;
 struct btrfs_device;
@@ -1655,6 +1658,9 @@ struct btrfs_fs_info {
 
 	struct semaphore uuid_tree_rescan_sem;
 	unsigned int update_uuid_tree_gen:1;
+
+	/* Used to reclaim the metadata space in the background. */
+	struct work_struct async_reclaim_work;
 };
 
 /*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4c4ed0b..2b49923 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2234,6 +2234,7 @@ int open_ctree(struct super_block *sb,
 	atomic_set(&fs_info->balance_cancel_req, 0);
 	fs_info->balance_ctl = NULL;
 	init_waitqueue_head(&fs_info->balance_wait_q);
+	btrfs_init_async_reclaim_work(&fs_info->async_reclaim_work);
 
 	sb->s_blocksize = 4096;
 	sb->s_blocksize_bits = blksize_bits(4096);
@@ -3579,6 +3580,8 @@ int close_ctree(struct btrfs_root *root)
 	/* clear out the rbtree of defraggable inodes */
 	btrfs_cleanup_defrag_inodes(fs_info);
 
+	cancel_work_sync(&fs_info->async_reclaim_work);
+
 	if (!(fs_info->sb->s_flags & MS_RDONLY)) {
 		ret = btrfs_commit_super(root);
 		if (ret)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 12a5b6d..c122dc7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4230,6 +4230,98 @@ static int flush_space(struct btrfs_root *root,
 
 	return ret;
 }
+
+static inline u64
+btrfs_calc_reclaim_metadata_size(struct btrfs_root *root,
+				 struct btrfs_space_info *space_info)
+{
+	u64 used;
+	u64 expected;
+	u64 to_reclaim;
+
+	to_reclaim = min_t(u64, num_online_cpus() * 1024 * 1024,
+				32 * 1024 * 1024);
+	spin_lock(&space_info->lock);
+	if (can_overcommit(root, space_info, to_reclaim,
+			   BTRFS_RESERVE_FLUSH_ALL)) {
+		to_reclaim = 0;
+		goto out;
+	}
+
+	used = space_info->bytes_used + space_info->bytes_reserved +
+	       space_info->bytes_pinned + space_info->bytes_readonly +
+	       space_info->bytes_may_use;
+	if (can_overcommit(root, space_info, 1024 * 1024,
+			   BTRFS_RESERVE_FLUSH_ALL))
+		expected = div_factor_fine(space_info->total_bytes, 95);
+	else
+		expected = div_factor_fine(space_info->total_bytes, 90);
+	to_reclaim = used - expected;
+out:
+	spin_unlock(&space_info->lock);
+
+	return to_reclaim;
+}
+
+static inline int need_do_async_reclaim(struct btrfs_space_info *space_info,
+					struct btrfs_fs_info *fs_info, u64 used)
+{
+	return (used >= div_factor_fine(space_info->total_bytes, 95) &&
+		!btrfs_fs_closing(fs_info) &&
+		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
+}
+
+static int btrfs_need_do_async_reclaim(struct btrfs_space_info *space_info,
+				       struct btrfs_fs_info *fs_info)
+{
+	u64 used;
+
+	spin_lock(&space_info->lock);
+	used = space_info->bytes_used + space_info->bytes_reserved +
+	       space_info->bytes_pinned + space_info->bytes_readonly +
+	       space_info->bytes_may_use;
+	if (need_do_async_reclaim(space_info, fs_info, used)) {
+		spin_unlock(&space_info->lock);
+		return 1;
+	}
+	spin_unlock(&space_info->lock);
+
+	return 0;
+}
+
+static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
+{
+	struct btrfs_fs_info *fs_info;
+	struct btrfs_space_info *space_info;
+	u64 to_reclaim;
+	int flush_state;
+
+	fs_info = container_of(work, struct btrfs_fs_info, async_reclaim_work);
+	space_info = __find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
+
+	to_reclaim = btrfs_calc_reclaim_metadata_size(fs_info->fs_root,
+						      space_info);
+	if (!to_reclaim)
+		return;
+
+	flush_state = FLUSH_DELAYED_ITEMS_NR;
+	do {
+		flush_space(fs_info->fs_root, space_info, to_reclaim,
+			    to_reclaim, flush_state);
+		flush_state++;
+		if (!btrfs_need_do_async_reclaim(space_info, fs_info))
+			return;
+	} while (flush_state <= COMMIT_TRANS);
+
+	if (btrfs_need_do_async_reclaim(space_info, fs_info))
+		queue_work(system_unbound_wq, work);
+}
+
+void btrfs_init_async_reclaim_work(struct work_struct *work)
+{
+	INIT_WORK(work, btrfs_async_reclaim_metadata_space);
+}
+
 /**
  * reserve_metadata_bytes - try to reserve bytes from the block_rsv's space
  * @root - the root we're allocating for
@@ -4337,8 +4429,13 @@ again:
 	if (ret && flush != BTRFS_RESERVE_NO_FLUSH) {
 		flushing = true;
 		space_info->flush = 1;
+	} else if (!ret && space_info->flags & BTRFS_BLOCK_GROUP_METADATA) {
+		used += orig_bytes;
+		if (need_do_async_reclaim(space_info, root->fs_info, used) &&
+		    !work_busy(&root->fs_info->async_reclaim_work))
+			queue_work(system_unbound_wq,
+				   &root->fs_info->async_reclaim_work);
 	}
-
 	spin_unlock(&space_info->lock);
 
 	if (!ret || flush == BTRFS_RESERVE_NO_FLUSH)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 2d8ac1b..fb62c45 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1329,6 +1329,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 		 * this also happens on 'umount -rf' or on shutdown, when
 		 * the filesystem is busy.
 		 */
+		cancel_work_sync(&fs_info->async_reclaim_work);
 
 		/* wait for the uuid_scan task to finish */
 		down(&fs_info->uuid_tree_rescan_sem);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier
  2013-11-21 13:43 [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier Miao Xie
                   ` (3 preceding siblings ...)
  2013-11-21 13:43 ` [PATCH 5/5] Btrfs: reclaim the reserved metadata space at background Miao Xie
@ 2013-11-22  4:30 ` Liu Bo
  2013-11-22  7:28   ` Miao Xie
  4 siblings, 1 reply; 9+ messages in thread
From: Liu Bo @ 2013-11-22  4:30 UTC (permalink / raw)
  To: Miao Xie; +Cc: linux-btrfs

On Thu, Nov 21, 2013 at 09:43:14PM +0800, Miao Xie wrote:
> The tasks that wait for the IO_DONE flag only care about the I/O of the dirty
> pages, so it is better to wake them up immediately after all the pages are
> written, rather than after the whole I/O process completes.

This doesn't seem to make sense, the waiters still go to wait and schedule since
IO_DONE is not set there yet.

-liubo

> 
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
>  fs/btrfs/ordered-data.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index eb5bac4..1bd7002 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -348,10 +348,13 @@ int btrfs_dec_test_first_ordered_pending(struct inode *inode,
>  	if (!uptodate)
>  		set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
>  
> -	if (entry->bytes_left == 0)
> +	if (entry->bytes_left == 0) {
>  		ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
> -	else
> +		if (waitqueue_active(&entry->wait))
> +			wake_up(&entry->wait);
> +	} else {
>  		ret = 1;
> +	}
>  out:
>  	if (!ret && cached && entry) {
>  		*cached = entry;
> @@ -408,10 +411,13 @@ have_entry:
>  	if (!uptodate)
>  		set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
>  
> -	if (entry->bytes_left == 0)
> +	if (entry->bytes_left == 0) {
>  		ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
> -	else
> +		if (waitqueue_active(&entry->wait))
> +			wake_up(&entry->wait);
> +	} else {
>  		ret = 1;
> +	}
>  out:
>  	if (!ret && cached && entry) {
>  		*cached = entry;
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier
  2013-11-22  4:30 ` [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier Liu Bo
@ 2013-11-22  7:28   ` Miao Xie
  2013-11-22  8:47     ` Liu Bo
  0 siblings, 1 reply; 9+ messages in thread
From: Miao Xie @ 2013-11-22  7:28 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs

On Fri, 22 Nov 2013 12:30:40 +0800, Liu Bo wrote:
> On Thu, Nov 21, 2013 at 09:43:14PM +0800, Miao Xie wrote:
>> The tasks that wait for the IO_DONE flag only care about the I/O of the dirty
>> pages, so it is better to wake them up immediately after all the pages are
>> written, rather than after the whole I/O process completes.
> 
> This doesn't seem to make sense, the waiters still go to wait and schedule since
> IO_DONE is not set there yet.

I cannot understand what you said. We wake up the waiters after IO_DONE is set,
so the waiters who wait for the IO_DONE flag will not go to sleep.

Miao

> 
> -liubo
> 
>>
>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
>> ---
>>  fs/btrfs/ordered-data.c | 14 ++++++++++----
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
>> index eb5bac4..1bd7002 100644
>> --- a/fs/btrfs/ordered-data.c
>> +++ b/fs/btrfs/ordered-data.c
>> @@ -348,10 +348,13 @@ int btrfs_dec_test_first_ordered_pending(struct inode *inode,
>>  	if (!uptodate)
>>  		set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
>>  
>> -	if (entry->bytes_left == 0)
>> +	if (entry->bytes_left == 0) {
>>  		ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
>> -	else
>> +		if (waitqueue_active(&entry->wait))
>> +			wake_up(&entry->wait);
>> +	} else {
>>  		ret = 1;
>> +	}
>>  out:
>>  	if (!ret && cached && entry) {
>>  		*cached = entry;
>> @@ -408,10 +411,13 @@ have_entry:
>>  	if (!uptodate)
>>  		set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
>>  
>> -	if (entry->bytes_left == 0)
>> +	if (entry->bytes_left == 0) {
>>  		ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
>> -	else
>> +		if (waitqueue_active(&entry->wait))
>> +			wake_up(&entry->wait);
>> +	} else {
>>  		ret = 1;
>> +	}
>>  out:
>>  	if (!ret && cached && entry) {
>>  		*cached = entry;
>> -- 
>> 1.8.3.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier
  2013-11-22  7:28   ` Miao Xie
@ 2013-11-22  8:47     ` Liu Bo
  0 siblings, 0 replies; 9+ messages in thread
From: Liu Bo @ 2013-11-22  8:47 UTC (permalink / raw)
  To: Miao Xie; +Cc: linux-btrfs

On Fri, Nov 22, 2013 at 03:28:32PM +0800, Miao Xie wrote:
> On Fri, 22 Nov 2013 12:30:40 +0800, Liu Bo wrote:
> > On Thu, Nov 21, 2013 at 09:43:14PM +0800, Miao Xie wrote:
> >> The tasks that wait for the IO_DONE flag only care about the I/O of the dirty
> >> pages, so it is better to wake them up immediately after all the pages are
> >> written, rather than after the whole I/O process completes.
> > 
> > This doesn't seem to make sense, the waiters still go to wait and schedule since
> > IO_DONE is not set there yet.
> 
> I cannot understand what you said. We wake up the waiters after IO_DONE is set,
> so the waiters who wait for the IO_DONE flag will not go to sleep.
> 
> Miao
> 
> > 
> > -liubo
> > 
> >>
> >> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> >> ---
> >>  fs/btrfs/ordered-data.c | 14 ++++++++++----
> >>  1 file changed, 10 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> >> index eb5bac4..1bd7002 100644
> >> --- a/fs/btrfs/ordered-data.c
> >> +++ b/fs/btrfs/ordered-data.c
> >> @@ -348,10 +348,13 @@ int btrfs_dec_test_first_ordered_pending(struct inode *inode,
> >>  	if (!uptodate)
> >>  		set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
> >>  
> >> -	if (entry->bytes_left == 0)
> >> +	if (entry->bytes_left == 0) {
> >>  		ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
> >> -	else

My bad, something got in my eye, I was thinking the 'else' was kept.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

thanks,
-liubo

> >> +		if (waitqueue_active(&entry->wait))
> >> +			wake_up(&entry->wait);
> >> +	} else {
> >>  		ret = 1;
> >> +	}
> >>  out:
> >>  	if (!ret && cached && entry) {
> >>  		*cached = entry;
> >> @@ -408,10 +411,13 @@ have_entry:
> >>  	if (!uptodate)
> >>  		set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
> >>  
> >> -	if (entry->bytes_left == 0)
> >> +	if (entry->bytes_left == 0) {
> >>  		ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
> >> -	else
> >> +		if (waitqueue_active(&entry->wait))
> >> +			wake_up(&entry->wait);
> >> +	} else {
> >>  		ret = 1;
> >> +	}
> >>  out:
> >>  	if (!ret && cached && entry) {
> >>  		*cached = entry;
> >> -- 
> >> 1.8.3.1
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 5/5] Btrfs: reclaim the reserved metadata space at background
  2013-11-21 13:43 ` [PATCH 5/5] Btrfs: reclaim the reserved metadata space at background Miao Xie
@ 2013-11-22 17:48   ` Josef Bacik
  0 siblings, 0 replies; 9+ messages in thread
From: Josef Bacik @ 2013-11-22 17:48 UTC (permalink / raw)
  To: Miao Xie; +Cc: linux-btrfs

On Thu, Nov 21, 2013 at 09:43:18PM +0800, Miao Xie wrote:
> Before this patch, a task had to reclaim the metadata space by itself if
> the metadata space was not enough. And when the task started the space
> reclamation, all the other tasks which wanted to reserve metadata space
> were blocked. In some cases they would be blocked for a long time, which
> made the performance fluctuate wildly.
> 

So the reason the flushing is done this way is because of this level of hell
called "early enospc."  Basically we'd get people flushing randomly and other
users would come in and use the reclaimed space, so whoever was flushing would
often ENOSPC because they thought they did everything they could to flush and
still couldn't make allocations.  This approach is a nice balance between
keeping the old "one at a time" flushers and adding a background flusher, but
I still worry about people competing with the background flushing.

Consider the case where the background flusher has started and taken all of the
ordered extents on the system to flush (and let's assume that we only have
reservations tied up in ordered extents, which is very possible).  Then a task
comes in to make a reservation but it can't because it doesn't have space, so it
tries to flush.  But the inline flushing stuff doesn't find any ordered extents
to flush because they've been spliced off the list by the background flusher.
So we bail out and do -ENOSPC even though there is plenty of space.

What I would like to see is some way for a flusher who has to flush inline to be
able to see that there is a background flusher and wait for it to finish its
work before doing its own flushing.  If I have to start tracking down early
ENOSPC problems again I may very well quit doing file system work forever.
Thanks,

Josef

^ permalink raw reply	[flat|nested] 9+ messages in thread

Thread overview: 9 messages
2013-11-21 13:43 [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier Miao Xie
2013-11-21 13:43 ` [PATCH 2/5] Btrfs: just do dirty page flush for the inode with compression before direct IO Miao Xie
2013-11-21 13:43 ` [PATCH 3/5] Btrfs: remove the unnecessary flush when preparing the pages Miao Xie
2013-11-21 13:43 ` [PATCH 4/5] Btrfs: remove unnecessary lock in may_commit_transaction() Miao Xie
2013-11-21 13:43 ` [PATCH 5/5] Btrfs: reclaim the reserved metadata space at background Miao Xie
2013-11-22 17:48   ` Josef Bacik
2013-11-22  4:30 ` [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier Liu Bo
2013-11-22  7:28   ` Miao Xie
2013-11-22  8:47     ` Liu Bo
