linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Naohiro Aota <naohiro.aota@wdc.com>,
	linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>
Subject: Re: [PATCH v13 13/42] btrfs: track unusable bytes for zones
Date: Fri, 22 Jan 2021 10:11:53 -0500	[thread overview]
Message-ID: <7f676b7d-ab80-5dc1-6fbf-ed29e4bf4512@toxicpanda.com> (raw)
In-Reply-To: <b949ad399801a5c5c5a07cafcb259b6151e66e48.1611295439.git.naohiro.aota@wdc.com>

On 1/22/21 1:21 AM, Naohiro Aota wrote:
> In zoned btrfs a region that was once written then freed is not usable
> until we reset the underlying zone. So we need to distinguish such
> unusable space from usable free space.
> 
> Therefore we need to introduce the "zone_unusable" field  to the block
> group structure, and "bytes_zone_unusable" to the space_info structure to
> track the unusable space.
> 
> Pinned bytes are always reclaimed to the unusable space. But, when an
> allocated region is returned before using e.g., the block group becomes
> read-only between allocation time and reservation time, we can safely
> return the region to the block group. For the situation, this commit
> introduces "btrfs_add_free_space_unused". This behaves the same as
> btrfs_add_free_space() on regular btrfs. On zoned btrfs, it rewinds the
> allocation offset.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>   fs/btrfs/block-group.c      | 39 ++++++++++++++-------
>   fs/btrfs/block-group.h      |  1 +
>   fs/btrfs/extent-tree.c      | 10 +++++-
>   fs/btrfs/free-space-cache.c | 67 +++++++++++++++++++++++++++++++++++++
>   fs/btrfs/free-space-cache.h |  2 ++
>   fs/btrfs/space-info.c       | 13 ++++---
>   fs/btrfs/space-info.h       |  4 ++-
>   fs/btrfs/sysfs.c            |  2 ++
>   fs/btrfs/zoned.c            | 24 +++++++++++++
>   fs/btrfs/zoned.h            |  3 ++
>   10 files changed, 146 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 7c210aa5f25f..487511e3f000 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1001,12 +1001,17 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>   		WARN_ON(block_group->space_info->total_bytes
>   			< block_group->length);
>   		WARN_ON(block_group->space_info->bytes_readonly
> -			< block_group->length);
> +			< block_group->length - block_group->zone_unusable);
> +		WARN_ON(block_group->space_info->bytes_zone_unusable
> +			< block_group->zone_unusable);
>   		WARN_ON(block_group->space_info->disk_total
>   			< block_group->length * factor);
>   	}
>   	block_group->space_info->total_bytes -= block_group->length;
> -	block_group->space_info->bytes_readonly -= block_group->length;
> +	block_group->space_info->bytes_readonly -=
> +		(block_group->length - block_group->zone_unusable);
> +	block_group->space_info->bytes_zone_unusable -=
> +		block_group->zone_unusable;
>   	block_group->space_info->disk_total -= block_group->length * factor;
>   
>   	spin_unlock(&block_group->space_info->lock);
> @@ -1150,7 +1155,7 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force)
>   	}
>   
>   	num_bytes = cache->length - cache->reserved - cache->pinned -
> -		    cache->bytes_super - cache->used;
> +		    cache->bytes_super - cache->zone_unusable - cache->used;
>   
>   	/*
>   	 * Data never overcommits, even in mixed mode, so do just the straight
> @@ -1863,12 +1868,20 @@ static int read_one_block_group(struct btrfs_fs_info *info,
>   	}
>   
>   	/*
> -	 * Check for two cases, either we are full, and therefore don't need
> -	 * to bother with the caching work since we won't find any space, or we
> -	 * are empty, and we can just add all the space in and be done with it.
> -	 * This saves us _a_lot_ of time, particularly in the full case.
> +	 * For zoned btrfs, space after the allocation offset is the only
> +	 * free space for a block group. So, we don't need any caching
> +	 * work. btrfs_calc_zone_unusable() will set the amount of free
> +	 * space and zone_unusable space.
> +	 *
> +	 * For regular btrfs, check for two cases, either we are full, and
> +	 * therefore don't need to bother with the caching work since we
> +	 * won't find any space, or we are empty, and we can just add all
> +	 * the space in and be done with it.  This saves us _a_lot_ of
> +	 * time, particularly in the full case.
>   	 */
> -	if (cache->length == cache->used) {
> +	if (btrfs_is_zoned(info)) {
> +		btrfs_calc_zone_unusable(cache);
> +	} else if (cache->length == cache->used) {
>   		cache->last_byte_to_unpin = (u64)-1;
>   		cache->cached = BTRFS_CACHE_FINISHED;
>   		btrfs_free_excluded_extents(cache);
> @@ -1887,7 +1900,8 @@ static int read_one_block_group(struct btrfs_fs_info *info,
>   	}
>   	trace_btrfs_add_block_group(info, cache, 0);
>   	btrfs_update_space_info(info, cache->flags, cache->length,
> -				cache->used, cache->bytes_super, &space_info);
> +				cache->used, cache->bytes_super,
> +				cache->zone_unusable, &space_info);
>   
>   	cache->space_info = space_info;
>   
> @@ -1943,7 +1957,7 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info)
>   			break;
>   		}
>   		btrfs_update_space_info(fs_info, bg->flags, em->len, em->len,
> -					0, &space_info);
> +					0, 0, &space_info);
>   		bg->space_info = space_info;
>   		link_block_group(bg);
>   
> @@ -2185,7 +2199,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
>   	 */
>   	trace_btrfs_add_block_group(fs_info, cache, 1);
>   	btrfs_update_space_info(fs_info, cache->flags, size, bytes_used,
> -				cache->bytes_super, &cache->space_info);
> +				cache->bytes_super, 0, &cache->space_info);
>   	btrfs_update_global_block_rsv(fs_info);
>   
>   	link_block_group(cache);
> @@ -2293,7 +2307,8 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group *cache)
>   	spin_lock(&cache->lock);
>   	if (!--cache->ro) {
>   		num_bytes = cache->length - cache->reserved -
> -			    cache->pinned - cache->bytes_super - cache->used;
> +			    cache->pinned - cache->bytes_super -
> +			    cache->zone_unusable - cache->used;
>   		sinfo->bytes_readonly -= num_bytes;
>   		list_del_init(&cache->ro_list);
>   	}
> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> index 9d026ab1768d..0f3c62c561bc 100644
> --- a/fs/btrfs/block-group.h
> +++ b/fs/btrfs/block-group.h
> @@ -189,6 +189,7 @@ struct btrfs_block_group {
>   	 * allocation. This is used only with ZONED mode enabled.
>   	 */
>   	u64 alloc_offset;
> +	u64 zone_unusable;
>   };
>   
>   static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group)
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 30b1a630dc2f..071a521927e6 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -34,6 +34,7 @@
>   #include "block-group.h"
>   #include "discard.h"
>   #include "rcu-string.h"
> +#include "zoned.h"
>   
>   #undef SCRAMBLE_DELAYED_REFS
>   
> @@ -2725,6 +2726,9 @@ fetch_cluster_info(struct btrfs_fs_info *fs_info,
>   {
>   	struct btrfs_free_cluster *ret = NULL;
>   
> +	if (btrfs_is_zoned(fs_info))
> +		return NULL;
> +

This is unrelated to the rest of the changes, seems like something that was just 
missed?  Should probably be in its own patch.

>   	*empty_cluster = 0;
>   	if (btrfs_mixed_space_info(space_info))
>   		return ret;
> @@ -2808,7 +2812,11 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
>   		space_info->max_extent_size = 0;
>   		percpu_counter_add_batch(&space_info->total_bytes_pinned,
>   			    -len, BTRFS_TOTAL_BYTES_PINNED_BATCH);
> -		if (cache->ro) {
> +		if (btrfs_is_zoned(fs_info)) {
> +			/* Need reset before reusing in a zoned block group */
> +			space_info->bytes_zone_unusable += len;
> +			readonly = true;
> +		} else if (cache->ro) {
>   			space_info->bytes_readonly += len;
>   			readonly = true;
>   		}

Is this right?  If we're balancing a block group then it could be marked ro and 
be zoned, so don't we want to account for this in ->bytes_readonly if it's read 
only?  So probably more correct to do

if (cache->ro) {
	/* stuff */
} else if (btrfs_is_zoned(fs_info) {
	/* other stuff */
}

right?  Thanks,

Josef

  reply	other threads:[~2021-01-22 15:13 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-22  6:21 [PATCH v13 00/42] btrfs: zoned block device support Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 01/42] block: add bio_add_zone_append_page Naohiro Aota
2021-01-23  2:50   ` Chaitanya Kulkarni
2021-01-23  3:08   ` Chaitanya Kulkarni
2021-01-25 17:31   ` Johannes Thumshirn
2021-01-22  6:21 ` [PATCH v13 02/42] iomap: support REQ_OP_ZONE_APPEND Naohiro Aota
2021-01-23  2:56   ` Chaitanya Kulkarni
2021-01-22  6:21 ` [PATCH v13 03/42] btrfs: defer loading zone info after opening trees Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 04/42] btrfs: use regular SB location on emulated zoned mode Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 05/42] btrfs: release path before calling into btrfs_load_block_group_zone_info Naohiro Aota
2021-01-22 14:57   ` Josef Bacik
2021-01-22  6:21 ` [PATCH v13 06/42] btrfs: do not load fs_info->zoned from incompat flag Naohiro Aota
2021-01-22 14:58   ` Josef Bacik
2021-01-22  6:21 ` [PATCH v13 07/42] btrfs: disallow fitrim in ZONED mode Naohiro Aota
2021-01-22 14:59   ` Josef Bacik
2021-01-22  6:21 ` [PATCH v13 08/42] btrfs: allow zoned mode on non-zoned block devices Naohiro Aota
2021-01-22 15:03   ` Josef Bacik
2021-01-22  6:21 ` [PATCH v13 09/42] btrfs: implement zoned chunk allocator Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 10/42] btrfs: verify device extent is aligned to zone Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 11/42] btrfs: load zone's allocation offset Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 12/42] btrfs: calculate allocation offset for conventional zones Naohiro Aota
2021-01-22 15:07   ` Josef Bacik
2021-01-22  6:21 ` [PATCH v13 13/42] btrfs: track unusable bytes for zones Naohiro Aota
2021-01-22 15:11   ` Josef Bacik [this message]
2021-01-25 10:37     ` Johannes Thumshirn
2021-01-25 12:08       ` Johannes Thumshirn
2021-01-22  6:21 ` [PATCH v13 14/42] btrfs: do sequential extent allocation in ZONED mode Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 15/42] btrfs: redirty released extent buffers " Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 16/42] btrfs: advance allocation pointer after tree log node Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 17/42] btrfs: enable to mount ZONED incompat flag Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 18/42] btrfs: reset zones of unused block groups Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 19/42] btrfs: extract page adding function Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 20/42] btrfs: use bio_add_zone_append_page for zoned btrfs Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 21/42] btrfs: handle REQ_OP_ZONE_APPEND as writing Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 22/42] btrfs: split ordered extent when bio is sent Naohiro Aota
2021-01-22 15:22   ` Josef Bacik
2021-01-25  8:56     ` Johannes Thumshirn
2021-01-25  9:02     ` Johannes Thumshirn
2021-01-22  6:21 ` [PATCH v13 23/42] btrfs: check if bio spans across an ordered extent Naohiro Aota
2021-01-22 15:23   ` Josef Bacik
2021-01-22  6:21 ` [PATCH v13 24/42] btrfs: extend btrfs_rmap_block for specifying a device Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 25/42] btrfs: cache if block-group is on a sequential zone Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 26/42] btrfs: save irq flags when looking up an ordered extent Naohiro Aota
2021-01-22 15:24   ` Josef Bacik
2021-01-22  6:21 ` [PATCH v13 27/42] btrfs: use ZONE_APPEND write for ZONED btrfs Naohiro Aota
2021-01-22 15:29   ` Josef Bacik
2021-01-22  6:21 ` [PATCH v13 28/42] btrfs: enable zone append writing for direct IO Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 29/42] btrfs: introduce dedicated data write path for ZONED mode Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 30/42] btrfs: serialize meta IOs on " Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 31/42] btrfs: wait existing extents before truncating Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 32/42] btrfs: avoid async metadata checksum on ZONED mode Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 33/42] btrfs: mark block groups to copy for device-replace Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 34/42] btrfs: implement cloning for ZONED device-replace Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 35/42] btrfs: implement copying " Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 36/42] btrfs: support dev-replace in ZONED mode Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 37/42] btrfs: enable relocation " Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 38/42] btrfs: relocate block group to repair IO failure in ZONED Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 39/42] btrfs: split alloc_log_tree() Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 40/42] btrfs: extend zoned allocator to use dedicated tree-log block group Naohiro Aota
2021-01-22  6:21 ` [PATCH v13 41/42] btrfs: serialize log transaction on ZONED mode Naohiro Aota
2021-01-22 15:37   ` Josef Bacik
2021-01-22  6:21 ` [PATCH v13 42/42] btrfs: reorder log node allocation Naohiro Aota

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7f676b7d-ab80-5dc1-6fbf-ed29e4bf4512@toxicpanda.com \
    --to=josef@toxicpanda.com \
    --cc=axboe@kernel.dk \
    --cc=darrick.wong@oracle.com \
    --cc=dsterba@suse.com \
    --cc=hare@suse.com \
    --cc=hch@infradead.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=naohiro.aota@wdc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).