* [PATCH 00/17] ZNS Support for Btrfs
@ 2021-08-11 14:16 Naohiro Aota
From: Naohiro Aota @ 2021-08-11 14:16 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, Naohiro Aota

This series extends zoned support for Zoned Namespace (ZNS) SSDs [1].

[1] https://zonedstorage.io/introduction/zns/

This series is available on GitHub at
v1    https://github.com/naota/linux/tree/btrfs-zns-v1
HEAD  https://github.com/naota/linux/tree/btrfs-zns

The ZNS specification introduces the additional features listed below.

- No conventional zones
- Zone Append write command
- Zone Capacity
- Active Zones

The first two features are already addressed by the current zoned
support in btrfs. We do not rely on conventional zones, and we use
the zone append write command to write data IOs.

This series implements support for the remaining two: zone capacity
and active zones.

The userland tools need some tweaks to be precise (e.g. using the
capacity instead of the length), but they still work fine as they are.

* Zone Capacity Support

A zone capacity is an additional per-zone attribute that indicates the
number of usable logical blocks within each zone, starting from the
first logical block of each zone. It is always smaller than or equal
to the zone size.

We can naturally map the capacity to the newly introduced
"zone_capacity" of a block group. Allocations are limited by the
zone capacity instead of the block group's length.
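
As a rough userspace illustration of that limit (not the code in this
series), the sketch below allocates sequentially and refuses anything
past the capacity. The struct bg_sketch type and bg_sketch_alloc() are
hypothetical names made up for this example; only the idea of a
per-block-group zone_capacity comes from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a zoned block group. */
struct bg_sketch {
	uint64_t length;        /* zone size in bytes */
	uint64_t zone_capacity; /* usable bytes, always <= length */
	uint64_t alloc_offset;  /* next sequential write position */
};

/*
 * Sequentially allocate num_bytes. On success, store the offset of the
 * allocation within the block group in *offset and return true.
 */
static bool bg_sketch_alloc(struct bg_sketch *bg, uint64_t num_bytes,
			    uint64_t *offset)
{
	/* The limit is the zone capacity, not the zone length. */
	uint64_t avail = bg->zone_capacity - bg->alloc_offset;

	if (num_bytes > avail)
		return false;

	*offset = bg->alloc_offset;
	bg->alloc_offset += num_bytes;
	return true;
}
```

With length = 100 and zone_capacity = 80, a request that would end at
offset 90 is rejected even though it would still fit in the zone.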

* Active Zones Tracking

The ZNS specification defines a limit on the number of zones that can
be in the implicit open, explicit open or closed conditions. Any zone
in one of these conditions is defined as an active zone, which
corresponds to a zone that is being written or that has been only
partially written. If the maximum number of active zones is reached,
we must either reset or finish some active zones before being able to
choose other zones for storing data.

To avoid exceeding the maximum number of active zones, we need to
track which zones are active and how the active zones are related to
the block groups. We mark a block group as "active" if the
corresponding device zones are all active. Allocating an extent will
activate a block group, and allocation from an inactive block group is
prohibited. Such active block groups are tracked in a list. Once a
block group is fully written, we deactivate it and remove it from the
list.
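
The bookkeeping described above can be sketched in plain C as follows.
This is a simplified model under stated assumptions, not the
implementation in this series: it assumes one device zone per block
group, replaces the list with a counter, and omits all locking. The
names dev_sketch, abg_sketch, abg_activate(), abg_alloc() and
abg_finish_if_full() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model; 0 in max_active_zones means "no limit". */
struct dev_sketch {
	unsigned int max_active_zones;
	unsigned int nr_active;
};

/* Hypothetical block group backed by exactly one zone on dev. */
struct abg_sketch {
	struct dev_sketch *dev;
	uint64_t zone_capacity;
	uint64_t alloc_offset;
	bool zone_is_active;
};

/* Activate the block group if the device still has a free active slot. */
static bool abg_activate(struct abg_sketch *bg)
{
	struct dev_sketch *dev = bg->dev;

	if (bg->zone_is_active)
		return true;
	if (dev->max_active_zones && dev->nr_active >= dev->max_active_zones)
		return false;

	dev->nr_active++;
	bg->zone_is_active = true;
	return true;
}

/* Allocation activates the block group; inactive ones cannot allocate. */
static bool abg_alloc(struct abg_sketch *bg, uint64_t num_bytes)
{
	if (num_bytes > bg->zone_capacity - bg->alloc_offset)
		return false;
	if (!abg_activate(bg))
		return false;

	bg->alloc_offset += num_bytes;
	return true;
}

/* Once the block group is fully written, give the active slot back. */
static void abg_finish_if_full(struct abg_sketch *bg)
{
	if (!bg->zone_is_active || bg->alloc_offset < bg->zone_capacity)
		return;

	bg->zone_is_active = false;
	bg->dev->nr_active--;
}
```

With max_active_zones = 1, a second block group cannot allocate until
the first one has been fully written and finished.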
  
* Active Zone Aware Sequential Allocator

Handling active zones makes the allocator more complex. Here is a
summary of how find_free_extent_update_loop() behaves.
  
1. If enough space is available in an active block group
   - Allocate from it (end, success)
2. If we can activate another zone on a device
   2.1 Try to allocate a new block group and activate it
   2.2 If the activation succeeds
      - The allocation will be satisfied from it in the next iteration
   2.3 If the activation fails
      - Try the next cycle. Some writes may free up an active block group
3. If we cannot activate any zones
   3.1 Try to allocate a smaller size by checking min_alloc_size
      - btrfs_reserve_extent() will halve the allocation size and
        restart the loop
   3.2 If nothing more can be done, give up with ENOSPC
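
The decision order in the list above can be written down as a small
pure function. This is only a sketch of the ordering, not the kernel
code: the enum, the ffe_sketch_state struct and both helpers are
hypothetical names, and the halving rule (divide by two, never go
below min_alloc_size) is a simplifying assumption about what
btrfs_reserve_extent() does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of one pass through the loop. */
enum ffe_sketch_action {
	FFE_ALLOC_FROM_ACTIVE, /* step 1 */
	FFE_ACTIVATE_NEW_BG,   /* step 2 */
	FFE_RETRY_SMALLER,     /* step 3.1 */
	FFE_ENOSPC,            /* step 3.2 */
};

/* Hypothetical snapshot of what the allocator knows at this point. */
struct ffe_sketch_state {
	bool active_bg_has_space; /* some active block group fits num_bytes */
	bool can_activate_zone;   /* device is below its active zone limit */
	uint64_t num_bytes;       /* size currently being requested */
	uint64_t min_alloc_size;  /* smallest size the caller accepts */
};

/* Pick the next action in the order given by the summary. */
static enum ffe_sketch_action
ffe_sketch_next(const struct ffe_sketch_state *s)
{
	if (s->active_bg_has_space)
		return FFE_ALLOC_FROM_ACTIVE;
	if (s->can_activate_zone)
		return FFE_ACTIVATE_NEW_BG;
	if (s->num_bytes > s->min_alloc_size)
		return FFE_RETRY_SMALLER;
	return FFE_ENOSPC;
}

/* Assumed halving rule for step 3.1: halve, clamped to min_alloc_size. */
static uint64_t ffe_sketch_halve(uint64_t num_bytes, uint64_t min_alloc_size)
{
	uint64_t half = num_bytes / 2;

	return half < min_alloc_size ? min_alloc_size : half;
}
```

A caller would loop: on FFE_RETRY_SMALLER it replaces num_bytes with
ffe_sketch_halve(num_bytes, min_alloc_size), refreshes the state and
asks again, until it gets an allocation or FFE_ENOSPC.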

* Patch series organization

Note: patches 2 and 14 are preparation patches and can be merged
independently.

Patches 1-6 implement zone capacity support.

Patch 7 implements finishing a superblock zone once there is no space
left for a new superblock.

Patches 8-13 implement the activation side of the active zone
tracking.

Patches 14 and 15 tweak the allocator to retry with a smaller size if
possible (step 3.1 in the above list).

Patches 16 and 17 implement the deactivation side of the active zone
tracking.

Naohiro Aota (17):
  btrfs: zoned: load zone capacity information from devices
  btrfs: zoned: move btrfs_free_excluded_extents out from
    btrfs_calc_zone_unusable
  btrfs: zoned: calculate free space from zone capacity
  btrfs: zoned: tweak reclaim threshold for zone capacity
  btrfs: zoned: consider zone as full when no more SB can be written
  btrfs: zoned: locate superblock position using zone capacity
  btrfs: zoned: finish superblock zone once no space left for new SB
  btrfs: zoned: load active zone information from devices
  btrfs: zoned: introduce physical_map to btrfs_block_group
  btrfs: zoned: implement active zone tracking
  btrfs: zoned: load active zone info for block group
  btrfs: zoned: activate block group on allocation
  btrfs: zoned: activate new block group
  btrfs: move ffe_ctl one level up
  btrfs: zoned: avoid chunk allocation if active block group has enough
    space
  btrfs: zoned: finish fully written block group
  btrfs: zoned: finish relocating block group

 fs/btrfs/block-group.c      |  29 ++-
 fs/btrfs/block-group.h      |   4 +
 fs/btrfs/ctree.h            |   3 +
 fs/btrfs/disk-io.c          |   6 +-
 fs/btrfs/extent-tree.c      | 204 +++++++++------
 fs/btrfs/extent_io.c        |  11 +-
 fs/btrfs/extent_io.h        |   1 +
 fs/btrfs/free-space-cache.c |  19 +-
 fs/btrfs/inode.c            |   6 +-
 fs/btrfs/relocation.c       |   4 +
 fs/btrfs/zoned.c            | 495 +++++++++++++++++++++++++++++++++---
 fs/btrfs/zoned.h            |  39 ++-
 12 files changed, 692 insertions(+), 129 deletions(-)

-- 
2.32.0


* Re: [PATCH 16/17] btrfs: zoned: finish fully written block group
@ 2021-08-11 22:09 kernel test robot
From: kernel test robot @ 2021-08-11 22:09 UTC (permalink / raw)
  To: kbuild


CC: kbuild-all(a)lists.01.org
In-Reply-To: <59c069e3890f3cbc7fa425cdcf756d241a8bfc92.1628690222.git.naohiro.aota@wdc.com>
References: <59c069e3890f3cbc7fa425cdcf756d241a8bfc92.1628690222.git.naohiro.aota@wdc.com>
TO: Naohiro Aota <naohiro.aota@wdc.com>
TO: Josef Bacik <josef@toxicpanda.com>
TO: David Sterba <dsterba@suse.com>
CC: linux-btrfs(a)vger.kernel.org
CC: Naohiro Aota <naohiro.aota@wdc.com>

Hi Naohiro,

I love your patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[cannot apply to v5.14-rc5 next-20210811]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Naohiro-Aota/ZNS-Support-for-Btrfs/20210811-222302
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
:::::: branch date: 8 hours ago
:::::: commit date: 8 hours ago
config: i386-randconfig-m021-20210810 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add the following tags as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/btrfs/zoned.c:1956 btrfs_zone_finish_endio() error: uninitialized symbol 'ret'.

Old smatch warnings:
fs/btrfs/zoned.c:165 sb_zone_number() error: uninitialized symbol 'zone'.
fs/btrfs/zoned.c:1406 btrfs_load_block_group_zone_info() error: uninitialized symbol 'ret'.

vim +/ret +1956 fs/btrfs/zoned.c

ccecd271dc2436 Naohiro Aota 2021-08-11  1900  
ccecd271dc2436 Naohiro Aota 2021-08-11  1901  int btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
ccecd271dc2436 Naohiro Aota 2021-08-11  1902  			    u64 length)
ccecd271dc2436 Naohiro Aota 2021-08-11  1903  {
ccecd271dc2436 Naohiro Aota 2021-08-11  1904  	struct btrfs_block_group *block_group;
ccecd271dc2436 Naohiro Aota 2021-08-11  1905  	struct map_lookup *map;
ccecd271dc2436 Naohiro Aota 2021-08-11  1906  	struct btrfs_device *device;
ccecd271dc2436 Naohiro Aota 2021-08-11  1907  	u64 physical;
ccecd271dc2436 Naohiro Aota 2021-08-11  1908  	int ret;
ccecd271dc2436 Naohiro Aota 2021-08-11  1909  
ccecd271dc2436 Naohiro Aota 2021-08-11  1910  	if (!btrfs_is_zoned(fs_info))
ccecd271dc2436 Naohiro Aota 2021-08-11  1911  		return 0;
ccecd271dc2436 Naohiro Aota 2021-08-11  1912  
ccecd271dc2436 Naohiro Aota 2021-08-11  1913  	block_group = btrfs_lookup_block_group(fs_info, logical);
ccecd271dc2436 Naohiro Aota 2021-08-11  1914  	ASSERT(block_group);
ccecd271dc2436 Naohiro Aota 2021-08-11  1915  
ccecd271dc2436 Naohiro Aota 2021-08-11  1916  	if (logical + length < block_group->start + block_group->zone_capacity) {
ccecd271dc2436 Naohiro Aota 2021-08-11  1917  		ret = 0;
ccecd271dc2436 Naohiro Aota 2021-08-11  1918  		goto out;
ccecd271dc2436 Naohiro Aota 2021-08-11  1919  	}
ccecd271dc2436 Naohiro Aota 2021-08-11  1920  
ccecd271dc2436 Naohiro Aota 2021-08-11  1921  	spin_lock(&block_group->lock);
ccecd271dc2436 Naohiro Aota 2021-08-11  1922  
ccecd271dc2436 Naohiro Aota 2021-08-11  1923  	if (!block_group->zone_is_active) {
ccecd271dc2436 Naohiro Aota 2021-08-11  1924  		spin_unlock(&block_group->lock);
ccecd271dc2436 Naohiro Aota 2021-08-11  1925  		ret = 0;
ccecd271dc2436 Naohiro Aota 2021-08-11  1926  		goto out;
ccecd271dc2436 Naohiro Aota 2021-08-11  1927  	}
ccecd271dc2436 Naohiro Aota 2021-08-11  1928  
ccecd271dc2436 Naohiro Aota 2021-08-11  1929  	block_group->zone_is_active = 0;
ccecd271dc2436 Naohiro Aota 2021-08-11  1930  	/* We should have consumed all the free space */
ccecd271dc2436 Naohiro Aota 2021-08-11  1931  	ASSERT(block_group->alloc_offset == block_group->zone_capacity);
ccecd271dc2436 Naohiro Aota 2021-08-11  1932  	ASSERT(block_group->free_space_ctl->free_space == 0);
ccecd271dc2436 Naohiro Aota 2021-08-11  1933  	btrfs_clear_treelog_bg(block_group);
ccecd271dc2436 Naohiro Aota 2021-08-11  1934  	spin_unlock(&block_group->lock);
ccecd271dc2436 Naohiro Aota 2021-08-11  1935  
ccecd271dc2436 Naohiro Aota 2021-08-11  1936  	map = block_group->physical_map;
ccecd271dc2436 Naohiro Aota 2021-08-11  1937  	device = map->stripes[0].dev;
ccecd271dc2436 Naohiro Aota 2021-08-11  1938  	physical = map->stripes[0].physical;
ccecd271dc2436 Naohiro Aota 2021-08-11  1939  
ccecd271dc2436 Naohiro Aota 2021-08-11  1940  	if (!device->zone_info->max_active_zones) {
ccecd271dc2436 Naohiro Aota 2021-08-11  1941  		ret = 0;
ccecd271dc2436 Naohiro Aota 2021-08-11  1942  		goto out;
ccecd271dc2436 Naohiro Aota 2021-08-11  1943  	}
ccecd271dc2436 Naohiro Aota 2021-08-11  1944  
ccecd271dc2436 Naohiro Aota 2021-08-11  1945  	btrfs_dev_clear_active_zone(device, physical);
ccecd271dc2436 Naohiro Aota 2021-08-11  1946  
ccecd271dc2436 Naohiro Aota 2021-08-11  1947  	spin_lock(&fs_info->zone_active_bgs_lock);
ccecd271dc2436 Naohiro Aota 2021-08-11  1948  	ASSERT(!list_empty(&block_group->active_bg_list));
ccecd271dc2436 Naohiro Aota 2021-08-11  1949  	list_del_init(&block_group->active_bg_list);
ccecd271dc2436 Naohiro Aota 2021-08-11  1950  	spin_unlock(&fs_info->zone_active_bgs_lock);
ccecd271dc2436 Naohiro Aota 2021-08-11  1951  
ccecd271dc2436 Naohiro Aota 2021-08-11  1952  	btrfs_put_block_group(block_group);
ccecd271dc2436 Naohiro Aota 2021-08-11  1953  
ccecd271dc2436 Naohiro Aota 2021-08-11  1954  out:
ccecd271dc2436 Naohiro Aota 2021-08-11  1955  	btrfs_put_block_group(block_group);
ccecd271dc2436 Naohiro Aota 2021-08-11 @1956  	return ret;
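
In the listing above, the path that runs through line 1952 reaches the
"out" label without ever assigning ret, which is what the warning
points at. The userspace sketch below reproduces the control flow in
miniature with ret initialized at its declaration, one common way to
make every path return a defined value. It is an illustration only and
not necessarily the fix adopted for this patch; struct finish_sketch,
its fields and finish_endio_sketch() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the real function inspects. */
struct finish_sketch {
	bool zoned;         /* filesystem is in zoned mode */
	bool fully_written; /* write reached the end of the capacity */
	bool active;        /* block group is currently active */
	int refs;           /* reference count on the block group */
};

static int finish_endio_sketch(struct finish_sketch *bg)
{
	/* Initialized here, so every goto and the fall-through are defined. */
	int ret = 0;

	if (!bg->zoned)
		return 0;

	bg->refs++; /* models the lookup taking a reference */

	if (!bg->fully_written)
		goto out;
	if (!bg->active)
		goto out;

	/* Deactivate, then fall through to the common exit. */
	bg->active = false;
out:
	bg->refs--; /* drop the lookup reference exactly once */
	return ret;
}
```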

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org



Thread overview: 25+ messages
2021-08-11 14:16 [PATCH 00/17] ZNS Support for Btrfs Naohiro Aota
2021-08-11 14:16 ` [PATCH 01/17] btrfs: zoned: load zone capacity information from devices Naohiro Aota
2021-08-11 14:16 ` [PATCH 02/17] btrfs: zoned: move btrfs_free_excluded_extents out from btrfs_calc_zone_unusable Naohiro Aota
2021-08-11 14:16 ` [PATCH 03/17] btrfs: zoned: calculate free space from zone capacity Naohiro Aota
2021-08-11 14:16 ` [PATCH 04/17] btrfs: zoned: tweak reclaim threshold for " Naohiro Aota
2021-08-11 14:16 ` [PATCH 05/17] btrfs: zoned: consider zone as full when no more SB can be written Naohiro Aota
2021-08-11 14:16 ` [PATCH 06/17] btrfs: zoned: locate superblock position using zone capacity Naohiro Aota
2021-08-11 14:16 ` [PATCH 07/17] btrfs: zoned: finish superblock zone once no space left for new SB Naohiro Aota
2021-08-11 14:16 ` [PATCH 08/17] btrfs: zoned: load active zone information from devices Naohiro Aota
2021-08-11 14:16 ` [PATCH 09/17] btrfs: zoned: introduce physical_map to btrfs_block_group Naohiro Aota
2021-08-11 14:16 ` [PATCH 10/17] btrfs: zoned: implement active zone tracking Naohiro Aota
2021-08-11 14:16 ` [PATCH 11/17] btrfs: zoned: load active zone info for block group Naohiro Aota
2021-08-11 14:16 ` [PATCH 12/17] btrfs: zoned: activate block group on allocation Naohiro Aota
2021-08-11 14:16 ` [PATCH 13/17] btrfs: zoned: activate new block group Naohiro Aota
2021-08-11 14:16 ` [PATCH 14/17] btrfs: move ffe_ctl one level up Naohiro Aota
2021-08-11 14:16 ` [PATCH 15/17] btrfs: zoned: avoid chunk allocation if active block group has enough space Naohiro Aota
2021-08-11 14:16 ` [PATCH 16/17] btrfs: zoned: finish fully written block group Naohiro Aota
2021-08-11 17:26   ` kernel test robot
2021-08-11 17:26     ` kernel test robot
2021-08-16  4:34     ` Naohiro Aota
2021-08-16  4:34       ` Naohiro Aota
2021-08-11 18:37   ` kernel test robot
2021-08-11 18:37     ` kernel test robot
2021-08-11 14:16 ` [PATCH 17/17] btrfs: zoned: finish relocating " Naohiro Aota
2021-08-11 22:09 [PATCH 16/17] btrfs: zoned: finish fully written " kernel test robot
