linux-f2fs-devel.lists.sourceforge.net archive mirror
* [f2fs-dev] [PATCH 0/2] f2fs: zns zone-capacity support
@ 2020-07-02 15:53 Aravind Ramesh
  2020-07-02 15:54 ` [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size Aravind Ramesh
  2020-07-02 15:54 ` [f2fs-dev] [PATCH 2/2] f2fs: manage zone capacity during writes and gc Aravind Ramesh
  0 siblings, 2 replies; 16+ messages in thread
From: Aravind Ramesh @ 2020-07-02 15:53 UTC (permalink / raw)
  To: jaegeuk, yuchao0, linux-fsdevel, linux-f2fs-devel, hch
  Cc: niklas.cassel, Damien.LeMoal, Aravind Ramesh, matias.bjorling

The NVM Express Zoned Namespace (ZNS) command set specification allows host
software to communicate with an NVM subsystem using zones. ZNS defines a
host-managed zoned block device model for NVMe devices. It divides the
logical address space of a namespace into zones. Each zone provides an LBA
range that must be written sequentially. An explicit reset of a zone is
needed before the zone can be written again.

ZNS defines a per-zone capacity, which can be equal to or less than the
zone-size. Zone-capacity is the number of usable blocks in the zone.
This patchset implements support for ZNS devices with a zone-capacity
that is less than the device zone-size.

The first patch checks whether zone-capacity is less than zone-size; if it
is, then any segment which starts after the zone-capacity is marked as
not-free in the free segment bitmap at initial mount time. These segments
are marked as permanently used, so they are never allocated for writes and
consequently never need to be garbage collected. If the zone-capacity is
not aligned to the default segment size (2 MB), then a segment can start
before the zone-capacity and span across the zone-capacity boundary.
Such spanning segments are also considered usable segments.

The second patch tracks the usable blocks in a spanning segment, so that
during writes and GC the usable blocks in the spanning segment are
calculated to ensure writes/reads do not cross the zone-capacity boundary.

This series is based on the git tree
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git branch dev
and requires the below patch in order to build.
https://lore.kernel.org/linux-nvme/20200701063720.GA28954@lst.de/T/#m19e0197ae1837b7fe959b13fbc2a859b1f2abc1e

The above patch has been merged to the nvme-5.9 branch in the git tree:
git://git.infradead.org/nvme.git

Jaegeuk, perhaps you can carry this patch through your tree as well?


Aravind Ramesh (2):
  f2fs: support zone capacity less than zone size
  f2fs: manage zone capacity during writes and gc

 fs/f2fs/f2fs.h    |   5 ++
 fs/f2fs/gc.c      |  27 +++++---
 fs/f2fs/gc.h      |  42 +++++++++++--
 fs/f2fs/segment.c | 154 ++++++++++++++++++++++++++++++++++++++++++----
 fs/f2fs/segment.h |  12 ++--
 fs/f2fs/super.c   |  41 ++++++++++--
 6 files changed, 247 insertions(+), 34 deletions(-)

-- 
2.19.1



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


* [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-02 15:53 [f2fs-dev] [PATCH 0/2] f2fs: zns zone-capacity support Aravind Ramesh
@ 2020-07-02 15:54 ` Aravind Ramesh
  2020-07-07  0:07   ` Jaegeuk Kim
  2020-07-07 12:18   ` Chao Yu
  2020-07-02 15:54 ` [f2fs-dev] [PATCH 2/2] f2fs: manage zone capacity during writes and gc Aravind Ramesh
  1 sibling, 2 replies; 16+ messages in thread
From: Aravind Ramesh @ 2020-07-02 15:54 UTC (permalink / raw)
  To: jaegeuk, yuchao0, linux-fsdevel, linux-f2fs-devel, hch
  Cc: niklas.cassel, Damien Le Moal, Aravind Ramesh, matias.bjorling

NVMe Zoned Namespace devices can have a zone-capacity less than the
zone-size. Zone-capacity indicates the maximum number of sectors that are
usable in a zone, beginning from the first sector of the zone. This makes
the sectors after the zone-capacity, up to the zone-size, unusable.
This patch tracks zone-size and zone-capacity in zoned devices and
calculates the usable blocks per segment and usable segments per section.

If zone-capacity is less than zone-size, mark only those segments which
start before the zone-capacity as free segments. All segments at and beyond
the zone-capacity are treated as permanently used segments. In cases where
the zone-capacity does not align with the segment size, the last segment
will start before the zone-capacity and end beyond it. For such spanning
segments, only sectors within the zone-capacity are used.

Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 fs/f2fs/f2fs.h    |   5 ++
 fs/f2fs/segment.c | 136 ++++++++++++++++++++++++++++++++++++++++++++--
 fs/f2fs/segment.h |   6 +-
 fs/f2fs/super.c   |  41 ++++++++++++--
 4 files changed, 176 insertions(+), 12 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e6e47618a357..73219e4e1ba4 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1232,6 +1232,7 @@ struct f2fs_dev_info {
 #ifdef CONFIG_BLK_DEV_ZONED
 	unsigned int nr_blkz;		/* Total number of zones */
 	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
+	block_t *zone_capacity_blocks;  /* Array of zone capacity in blks */
 #endif
 };
 
@@ -3395,6 +3396,10 @@ void f2fs_destroy_segment_manager_caches(void);
 int f2fs_rw_hint_to_seg_type(enum rw_hint hint);
 enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
 			enum page_type type, enum temp_type temp);
+unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
+			unsigned int segno);
+unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
+			unsigned int segno);
 
 /*
  * checkpoint.c
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c35614d255e1..d2156f3f56a5 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -4294,9 +4294,12 @@ static void init_free_segmap(struct f2fs_sb_info *sbi)
 {
 	unsigned int start;
 	int type;
+	struct seg_entry *sentry;
 
 	for (start = 0; start < MAIN_SEGS(sbi); start++) {
-		struct seg_entry *sentry = get_seg_entry(sbi, start);
+		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
+			continue;
+		sentry = get_seg_entry(sbi, start);
 		if (!sentry->valid_blocks)
 			__set_free(sbi, start);
 		else
@@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
 	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
 	struct free_segmap_info *free_i = FREE_I(sbi);
 	unsigned int segno = 0, offset = 0, secno;
-	unsigned short valid_blocks;
+	unsigned short valid_blocks, usable_blks_in_seg;
 	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
 
 	while (1) {
@@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
 			break;
 		offset = segno + 1;
 		valid_blocks = get_valid_blocks(sbi, segno, false);
-		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
+		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
+		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
 			continue;
-		if (valid_blocks > sbi->blocks_per_seg) {
+		if (valid_blocks > usable_blks_in_seg) {
 			f2fs_bug_on(sbi, 1);
 			continue;
 		}
@@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
 
 	return 0;
 }
+
+static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
+						unsigned int dev_idx)
+{
+	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
+		return true;
+	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq);
+}
+
+/* Return the zone index in the given device */
+static unsigned int get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
+					int dev_idx)
+{
+	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
+
+	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
+						sbi->log_blocks_per_blkz;
+}
+
+/*
+ * Return the usable segments in a section based on the zone's
+ * corresponding zone capacity. Zone is equal to a section.
+ */
+static inline unsigned int f2fs_usable_zone_segs_in_sec(
+		struct f2fs_sb_info *sbi, unsigned int segno)
+{
+	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
+
+	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
+	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno), dev_idx);
+
+	/* Conventional zone's capacity is always equal to zone size */
+	if (is_conv_zone(sbi, zone_idx, dev_idx))
+		return sbi->segs_per_sec;
+
+	/*
+	 * If the zone_capacity_blocks array is NULL, then zone capacity
+	 * is equal to the zone size for all zones
+	 */
+	if (!FDEV(dev_idx).zone_capacity_blocks)
+		return sbi->segs_per_sec;
+
+	/* Get the segment count beyond zone capacity block */
+	unusable_segs_in_sec = (sbi->blocks_per_blkz -
+				FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
+				sbi->log_blocks_per_seg;
+	return sbi->segs_per_sec - unusable_segs_in_sec;
+}
+
+/*
+ * Return the number of usable blocks in a segment. The number of blocks
+ * returned is always equal to the number of blocks in a segment for
+ * segments fully contained within a sequential zone capacity or a
+ * conventional zone. For segments partially contained in a sequential
+ * zone capacity, the number of usable blocks up to the zone capacity
+ * is returned. 0 is returned in all other cases.
+ */
+static inline unsigned int f2fs_usable_zone_blks_in_seg(
+			struct f2fs_sb_info *sbi, unsigned int segno)
+{
+	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
+	unsigned int zone_idx, dev_idx, secno;
+
+	secno = GET_SEC_FROM_SEG(sbi, segno);
+	seg_start = START_BLOCK(sbi, segno);
+	dev_idx = f2fs_target_device_index(sbi, seg_start);
+	zone_idx = get_zone_idx(sbi, secno, dev_idx);
+
+	/*
+	 * Conventional zone's capacity is always equal to zone size,
+	 * so, blocks per segment is unchanged.
+	 */
+	if (is_conv_zone(sbi, zone_idx, dev_idx))
+		return sbi->blocks_per_seg;
+
+	if (!FDEV(dev_idx).zone_capacity_blocks)
+		return sbi->blocks_per_seg;
+
+	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
+	sec_cap_blkaddr = sec_start_blkaddr +
+				FDEV(dev_idx).zone_capacity_blocks[zone_idx];
+
+	/*
+	 * If segment starts before zone capacity and spans beyond
+	 * zone capacity, then usable blocks are from seg start to
+	 * zone capacity. If the segment starts after the zone capacity,
+	 * then there are no usable blocks.
+	 */
+	if (seg_start >= sec_cap_blkaddr)
+		return 0;
+	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
+		return sec_cap_blkaddr - seg_start;
+
+	return sbi->blocks_per_seg;
+}
 #else
 int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
 {
@@ -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
 {
 	return 0;
 }
+
+static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
+							unsigned int segno)
+{
+	return 0;
+}
+
+static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
+							unsigned int segno)
+{
+	return 0;
+}
 #endif
+unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
+					unsigned int segno)
+{
+	if (f2fs_sb_has_blkzoned(sbi))
+		return f2fs_usable_zone_blks_in_seg(sbi, segno);
+
+	return sbi->blocks_per_seg;
+}
+
+unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
+					unsigned int segno)
+{
+	if (f2fs_sb_has_blkzoned(sbi))
+		return f2fs_usable_zone_segs_in_sec(sbi, segno);
+
+	return sbi->segs_per_sec;
+}
 
 /*
  * Update min, max modified time for cost-benefit GC algorithm
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index f261e3e6a69b..79b0dc33feaf 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
 	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
 	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
 	unsigned int next;
+	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
 
 	spin_lock(&free_i->segmap_lock);
 	clear_bit(segno, free_i->free_segmap);
@@ -418,7 +419,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
 
 	next = find_next_bit(free_i->free_segmap,
 			start_segno + sbi->segs_per_sec, start_segno);
-	if (next >= start_segno + sbi->segs_per_sec) {
+	if (next >= start_segno + usable_segs) {
 		clear_bit(secno, free_i->free_secmap);
 		free_i->free_sections++;
 	}
@@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
 	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
 	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
 	unsigned int next;
+	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
 
 	spin_lock(&free_i->segmap_lock);
 	if (test_and_clear_bit(segno, free_i->free_segmap)) {
@@ -453,7 +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
 			goto skip_free;
 		next = find_next_bit(free_i->free_segmap,
 				start_segno + sbi->segs_per_sec, start_segno);
-		if (next >= start_segno + sbi->segs_per_sec) {
+		if (next >= start_segno + usable_segs) {
 			if (test_and_clear_bit(secno, free_i->free_secmap))
 				free_i->free_sections++;
 		}
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 80cb7cd358f8..2686b07ae7eb 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
 		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
 #ifdef CONFIG_BLK_DEV_ZONED
 		kvfree(FDEV(i).blkz_seq);
+		kvfree(FDEV(i).zone_capacity_blocks);
 #endif
 	}
 	kvfree(sbi->devs);
@@ -3039,13 +3040,26 @@ static int init_percpu_info(struct f2fs_sb_info *sbi)
 }
 
 #ifdef CONFIG_BLK_DEV_ZONED
+
+struct f2fs_report_zones_args {
+	struct f2fs_dev_info *dev;
+	bool zone_cap_mismatch;
+};
+
 static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
-			       void *data)
+			      void *data)
 {
-	struct f2fs_dev_info *dev = data;
+	struct f2fs_report_zones_args *rz_args = data;
+
+	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
+		return 0;
+
+	set_bit(idx, rz_args->dev->blkz_seq);
+	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
+						F2FS_LOG_SECTORS_PER_BLOCK;
+	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
+		rz_args->zone_cap_mismatch = true;
 
-	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
-		set_bit(idx, dev->blkz_seq);
 	return 0;
 }
 
@@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
 {
 	struct block_device *bdev = FDEV(devi).bdev;
 	sector_t nr_sectors = bdev->bd_part->nr_sects;
+	struct f2fs_report_zones_args rep_zone_arg;
 	int ret;
 
 	if (!f2fs_sb_has_blkzoned(sbi))
@@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
 	if (!FDEV(devi).blkz_seq)
 		return -ENOMEM;
 
-	/* Get block zones type */
+	/* Get block zones type and zone-capacity */
+	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
+					FDEV(devi).nr_blkz * sizeof(block_t),
+					GFP_KERNEL);
+	if (!FDEV(devi).zone_capacity_blocks)
+		return -ENOMEM;
+
+	rep_zone_arg.dev = &FDEV(devi);
+	rep_zone_arg.zone_cap_mismatch = false;
+
 	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
-				  &FDEV(devi));
+				  &rep_zone_arg);
 	if (ret < 0)
 		return ret;
 
+	if (!rep_zone_arg.zone_cap_mismatch) {
+		kvfree(FDEV(devi).zone_capacity_blocks);
+		FDEV(devi).zone_capacity_blocks = NULL;
+	}
+
 	return 0;
 }
 #endif
-- 
2.19.1





* [f2fs-dev] [PATCH 2/2] f2fs: manage zone capacity during writes and gc
  2020-07-02 15:53 [f2fs-dev] [PATCH 0/2] f2fs: zns zone-capacity support Aravind Ramesh
  2020-07-02 15:54 ` [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size Aravind Ramesh
@ 2020-07-02 15:54 ` Aravind Ramesh
  1 sibling, 0 replies; 16+ messages in thread
From: Aravind Ramesh @ 2020-07-02 15:54 UTC (permalink / raw)
  To: jaegeuk, yuchao0, linux-fsdevel, linux-f2fs-devel, hch
  Cc: niklas.cassel, Damien Le Moal, Aravind Ramesh, matias.bjorling

Manage the usable segments in a section and the usable blocks per segment
during writes and GC. Segments which are beyond the zone-capacity are never
allocated and do not need to be garbage collected; only the segments before
the zone-capacity need to be garbage collected.
For spanning segments, writes are limited to the usable blocks in the
segment, i.e. only up to the zone-capacity.

Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 fs/f2fs/gc.c      | 27 ++++++++++++++++++++-------
 fs/f2fs/gc.h      | 42 ++++++++++++++++++++++++++++++++++++++----
 fs/f2fs/segment.c | 18 ++++++++++--------
 fs/f2fs/segment.h |  6 +++---
 4 files changed, 71 insertions(+), 22 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 9a40761445d3..dfa6d91cffcb 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -266,13 +266,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno)
 	unsigned char age = 0;
 	unsigned char u;
 	unsigned int i;
+	unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno);
 
-	for (i = 0; i < sbi->segs_per_sec; i++)
+	for (i = 0; i < usable_segs_per_sec; i++)
 		mtime += get_seg_entry(sbi, start + i)->mtime;
 	vblocks = get_valid_blocks(sbi, segno, true);
 
-	mtime = div_u64(mtime, sbi->segs_per_sec);
-	vblocks = div_u64(vblocks, sbi->segs_per_sec);
+	mtime = div_u64(mtime, usable_segs_per_sec);
+	vblocks = div_u64(vblocks, usable_segs_per_sec);
 
 	u = (vblocks * 100) >> sbi->log_blocks_per_seg;
 
@@ -536,6 +537,7 @@ static int gc_node_segment(struct f2fs_sb_info *sbi,
 	int phase = 0;
 	bool fggc = (gc_type == FG_GC);
 	int submitted = 0;
+	unsigned int usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
 
 	start_addr = START_BLOCK(sbi, segno);
 
@@ -545,7 +547,7 @@ static int gc_node_segment(struct f2fs_sb_info *sbi,
 	if (fggc && phase == 2)
 		atomic_inc(&sbi->wb_sync_req[NODE]);
 
-	for (off = 0; off < sbi->blocks_per_seg; off++, entry++) {
+	for (off = 0; off < usable_blks_in_seg; off++, entry++) {
 		nid_t nid = le32_to_cpu(entry->nid);
 		struct page *node_page;
 		struct node_info ni;
@@ -1033,13 +1035,14 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 	int off;
 	int phase = 0;
 	int submitted = 0;
+	unsigned int usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
 
 	start_addr = START_BLOCK(sbi, segno);
 
 next_step:
 	entry = sum;
 
-	for (off = 0; off < sbi->blocks_per_seg; off++, entry++) {
+	for (off = 0; off < usable_blks_in_seg; off++, entry++) {
 		struct page *data_page;
 		struct inode *inode;
 		struct node_info dni; /* dnode info for the data */
@@ -1201,7 +1204,16 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
 						SUM_TYPE_DATA : SUM_TYPE_NODE;
 	int submitted = 0;
 
-	if (__is_large_section(sbi))
+       /*
+	* zone-capacity can be less than zone-size in zoned devices,
+	* resulting in less than expected usable segments in the zone,
+	* calculate the end segno in the zone which can be garbage collected
+	*/
+	if (f2fs_sb_has_blkzoned(sbi))
+		end_segno -= sbi->segs_per_sec -
+					f2fs_usable_segs_in_sec(sbi, segno);
+
+	else if (__is_large_section(sbi))
 		end_segno = rounddown(end_segno, sbi->segs_per_sec);
 
 	/* readahead multi ssa blocks those have contiguous address */
@@ -1356,7 +1368,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
 		goto stop;
 
 	seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type);
-	if (gc_type == FG_GC && seg_freed == sbi->segs_per_sec)
+	if (gc_type == FG_GC &&
+		seg_freed == f2fs_usable_segs_in_sec(sbi, segno))
 		sec_freed++;
 	total_freed += seg_freed;
 
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
index db3c61046aa4..463b4e38b864 100644
--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -44,13 +44,47 @@ struct gc_inode_list {
 /*
  * inline functions
  */
+
+/*
+ * On a Zoned device zone-capacity can be less than zone-size and if
+ * zone-capacity is not aligned to f2fs segment size(2MB), then the segment
+ * starting just before zone-capacity has some blocks spanning across the
+ * zone-capacity, these blocks are not usable.
+ * Such spanning segments can be in free list so calculate the sum of usable
+ * blocks in currently free segments including normal and spanning segments.
+ */
+static inline block_t free_segs_blk_count_zoned(struct f2fs_sb_info *sbi)
+{
+	block_t free_seg_blks = 0;
+	struct free_segmap_info *free_i = FREE_I(sbi);
+	int j;
+
+	for (j = 0; j < MAIN_SEGS(sbi); j++)
+		if (!test_bit(j, free_i->free_segmap))
+			free_seg_blks += f2fs_usable_blks_in_seg(sbi, j);
+
+	return free_seg_blks;
+}
+
+static inline block_t free_segs_blk_count(struct f2fs_sb_info *sbi)
+{
+	if (f2fs_sb_has_blkzoned(sbi))
+		return free_segs_blk_count_zoned(sbi);
+
+	return free_segments(sbi) << sbi->log_blocks_per_seg;
+}
+
 static inline block_t free_user_blocks(struct f2fs_sb_info *sbi)
 {
-	if (free_segments(sbi) < overprovision_segments(sbi))
+	block_t free_blks, ovp_blks;
+
+	free_blks = free_segs_blk_count(sbi);
+	ovp_blks = overprovision_segments(sbi) << sbi->log_blocks_per_seg;
+
+	if (free_blks < ovp_blks)
 		return 0;
-	else
-		return (free_segments(sbi) - overprovision_segments(sbi))
-			<< sbi->log_blocks_per_seg;
+
+	return free_blks - ovp_blks;
 }
 
 static inline block_t limit_invalid_user_blocks(struct f2fs_sb_info *sbi)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index d2156f3f56a5..d75c1849dc83 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -869,10 +869,10 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno)
 	ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno);
 
 	if (valid_blocks == 0 && (!is_sbi_flag_set(sbi, SBI_CP_DISABLED) ||
-				ckpt_valid_blocks == sbi->blocks_per_seg)) {
+		ckpt_valid_blocks == f2fs_usable_blks_in_seg(sbi, segno))) {
 		__locate_dirty_segment(sbi, segno, PRE);
 		__remove_dirty_segment(sbi, segno, DIRTY);
-	} else if (valid_blocks < sbi->blocks_per_seg) {
+	} else if (valid_blocks < f2fs_usable_blks_in_seg(sbi, segno)) {
 		__locate_dirty_segment(sbi, segno, DIRTY);
 	} else {
 		/* Recovery routine with SSR needs this */
@@ -915,9 +915,11 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi)
 	for_each_set_bit(segno, dirty_i->dirty_segmap[DIRTY], MAIN_SEGS(sbi)) {
 		se = get_seg_entry(sbi, segno);
 		if (IS_NODESEG(se->type))
-			holes[NODE] += sbi->blocks_per_seg - se->valid_blocks;
+			holes[NODE] += f2fs_usable_blks_in_seg(sbi, segno) -
+							se->valid_blocks;
 		else
-			holes[DATA] += sbi->blocks_per_seg - se->valid_blocks;
+			holes[DATA] += f2fs_usable_blks_in_seg(sbi, segno) -
+							se->valid_blocks;
 	}
 	mutex_unlock(&dirty_i->seglist_lock);
 
@@ -2167,7 +2169,7 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
 	offset = GET_BLKOFF_FROM_SEG0(sbi, blkaddr);
 
 	f2fs_bug_on(sbi, (new_vblocks >> (sizeof(unsigned short) << 3) ||
-				(new_vblocks > sbi->blocks_per_seg)));
+			(new_vblocks > f2fs_usable_blks_in_seg(sbi, segno))));
 
 	se->valid_blocks = new_vblocks;
 	se->mtime = get_mtime(sbi, false);
@@ -2933,9 +2935,9 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range)
 static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type)
 {
 	struct curseg_info *curseg = CURSEG_I(sbi, type);
-	if (curseg->next_blkoff < sbi->blocks_per_seg)
-		return true;
-	return false;
+
+	return curseg->next_blkoff < f2fs_usable_blks_in_seg(sbi,
+							curseg->segno);
 }
 
 int f2fs_rw_hint_to_seg_type(enum rw_hint hint)
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 79b0dc33feaf..170df8c84f75 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -548,8 +548,8 @@ static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi)
 	/* check current node segment */
 	for (i = CURSEG_HOT_NODE; i <= CURSEG_COLD_NODE; i++) {
 		segno = CURSEG_I(sbi, i)->segno;
-		left_blocks = sbi->blocks_per_seg -
-			get_seg_entry(sbi, segno)->ckpt_valid_blocks;
+		left_blocks = f2fs_usable_blks_in_seg(sbi, segno) -
+				get_seg_entry(sbi, segno)->ckpt_valid_blocks;
 
 		if (node_blocks > left_blocks)
 			return false;
@@ -557,7 +557,7 @@ static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi)
 
 	/* check current data segment */
 	segno = CURSEG_I(sbi, CURSEG_HOT_DATA)->segno;
-	left_blocks = sbi->blocks_per_seg -
+	left_blocks = f2fs_usable_blks_in_seg(sbi, segno) -
 			get_seg_entry(sbi, segno)->ckpt_valid_blocks;
 	if (dent_blocks > left_blocks)
 		return false;
-- 
2.19.1





* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-02 15:54 ` [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size Aravind Ramesh
@ 2020-07-07  0:07   ` Jaegeuk Kim
  2020-07-07  3:27     ` Aravind Ramesh
  2020-07-07 12:18   ` Chao Yu
  1 sibling, 1 reply; 16+ messages in thread
From: Jaegeuk Kim @ 2020-07-07  0:07 UTC (permalink / raw)
  To: Aravind Ramesh
  Cc: niklas.cassel, Damien.LeMoal, linux-f2fs-devel, linux-fsdevel,
	hch, matias.bjorling

Hi,

Is there any dependency on the patch? And could you please run the
checkpatch script?

Thanks,

On 07/02, Aravind Ramesh wrote:
> NVMe Zoned Namespace devices can have a zone-capacity less than the
> zone-size. Zone-capacity indicates the maximum number of sectors that are
> usable in a zone, beginning from the first sector of the zone. This makes
> the sectors after the zone-capacity, up to the zone-size, unusable.
> This patch tracks zone-size and zone-capacity in zoned devices and
> calculates the usable blocks per segment and usable segments per section.
> 
> If zone-capacity is less than zone-size, mark only those segments which
> start before the zone-capacity as free segments. All segments at and beyond
> the zone-capacity are treated as permanently used segments. In cases where
> the zone-capacity does not align with the segment size, the last segment
> will start before the zone-capacity and end beyond it. For such spanning
> segments, only sectors within the zone-capacity are used.
> 
> Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>  fs/f2fs/f2fs.h    |   5 ++
>  fs/f2fs/segment.c | 136 ++++++++++++++++++++++++++++++++++++++++++++--
>  fs/f2fs/segment.h |   6 +-
>  fs/f2fs/super.c   |  41 ++++++++++++--
>  4 files changed, 176 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index e6e47618a357..73219e4e1ba4 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1232,6 +1232,7 @@ struct f2fs_dev_info {
>  #ifdef CONFIG_BLK_DEV_ZONED
>  	unsigned int nr_blkz;		/* Total number of zones */
>  	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
> +	block_t *zone_capacity_blocks;  /* Array of zone capacity in blks */
>  #endif
>  };
>  
> @@ -3395,6 +3396,10 @@ void f2fs_destroy_segment_manager_caches(void);
>  int f2fs_rw_hint_to_seg_type(enum rw_hint hint);
>  enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
>  			enum page_type type, enum temp_type temp);
> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> +			unsigned int segno);
> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> +			unsigned int segno);
>  
>  /*
>   * checkpoint.c
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index c35614d255e1..d2156f3f56a5 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct f2fs_sb_info *sbi)
>  {
>  	unsigned int start;
>  	int type;
> +	struct seg_entry *sentry;
>  
>  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
> -		struct seg_entry *sentry = get_seg_entry(sbi, start);
> +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
> +			continue;
> +		sentry = get_seg_entry(sbi, start);
>  		if (!sentry->valid_blocks)
>  			__set_free(sbi, start);
>  		else
> @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
>  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
>  	struct free_segmap_info *free_i = FREE_I(sbi);
>  	unsigned int segno = 0, offset = 0, secno;
> -	unsigned short valid_blocks;
> +	unsigned short valid_blocks, usable_blks_in_seg;
>  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
>  
>  	while (1) {
> @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
>  			break;
>  		offset = segno + 1;
>  		valid_blocks = get_valid_blocks(sbi, segno, false);
> -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
> +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
> +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
>  			continue;
> -		if (valid_blocks > sbi->blocks_per_seg) {
> +		if (valid_blocks > usable_blks_in_seg) {
>  			f2fs_bug_on(sbi, 1);
>  			continue;
>  		}
> @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
>  
>  	return 0;
>  }
> +
> +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
> +						unsigned int dev_idx)
> +{
> +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
> +		return true;
> +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq);
> +}
> +
> +/* Return the zone index in the given device */
> +static unsigned int get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
> +					int dev_idx)
> +{
> +	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> +
> +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
> +						sbi->log_blocks_per_blkz;
> +}
> +
> +/*
> + * Return the usable segments in a section based on the zone's
> + * corresponding zone capacity. Zone is equal to a section.
> + */
> +static inline unsigned int f2fs_usable_zone_segs_in_sec(
> +		struct f2fs_sb_info *sbi, unsigned int segno)
> +{
> +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
> +
> +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
> +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno), dev_idx);
> +
> +	/* Conventional zone's capacity is always equal to zone size */
> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> +		return sbi->segs_per_sec;
> +
> +	/*
> +	 * If the zone_capacity_blocks array is NULL, then zone capacity
> +	 * is equal to the zone size for all zones
> +	 */
> +	if (!FDEV(dev_idx).zone_capacity_blocks)
> +		return sbi->segs_per_sec;
> +
> +	/* Get the segment count beyond zone capacity block */
> +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
> +				FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
> +				sbi->log_blocks_per_seg;
> +	return sbi->segs_per_sec - unusable_segs_in_sec;
> +}
> +
> +/*
> + * Return the number of usable blocks in a segment. The number of blocks
> + * returned is always equal to the number of blocks in a segment for
> + * segments fully contained within a sequential zone capacity or a
> + * conventional zone. For segments partially contained in a sequential
> + * zone capacity, the number of usable blocks up to the zone capacity
> + * is returned. 0 is returned in all other cases.
> + */
> +static inline unsigned int f2fs_usable_zone_blks_in_seg(
> +			struct f2fs_sb_info *sbi, unsigned int segno)
> +{
> +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
> +	unsigned int zone_idx, dev_idx, secno;
> +
> +	secno = GET_SEC_FROM_SEG(sbi, segno);
> +	seg_start = START_BLOCK(sbi, segno);
> +	dev_idx = f2fs_target_device_index(sbi, seg_start);
> +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
> +
> +	/*
> +	 * Conventional zone's capacity is always equal to zone size,
> +	 * so, blocks per segment is unchanged.
> +	 */
> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> +		return sbi->blocks_per_seg;
> +
> +	if (!FDEV(dev_idx).zone_capacity_blocks)
> +		return sbi->blocks_per_seg;
> +
> +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> +	sec_cap_blkaddr = sec_start_blkaddr +
> +				FDEV(dev_idx).zone_capacity_blocks[zone_idx];
> +
> +	/*
> +	 * If segment starts before zone capacity and spans beyond
> +	 * zone capacity, then usable blocks are from seg start to
> +	 * zone capacity. If the segment starts after the zone capacity,
> +	 * then there are no usable blocks.
> +	 */
> +	if (seg_start >= sec_cap_blkaddr)
> +		return 0;
> +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
> +		return sec_cap_blkaddr - seg_start;
> +
> +	return sbi->blocks_per_seg;
> +}
>  #else
>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
>  {
> @@ -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
>  {
>  	return 0;
>  }
> +
> +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
> +							unsigned int segno)
> +{
> +	return 0;
> +}
> +
> +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
> +							unsigned int segno)
> +{
> +	return 0;
> +}
>  #endif
> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> +					unsigned int segno)
> +{
> +	if (f2fs_sb_has_blkzoned(sbi))
> +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
> +
> +	return sbi->blocks_per_seg;
> +}
> +
> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> +					unsigned int segno)
> +{
> +	if (f2fs_sb_has_blkzoned(sbi))
> +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
> +
> +	return sbi->segs_per_sec;
> +}
>  
>  /*
>   * Update min, max modified time for cost-benefit GC algorithm
> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> index f261e3e6a69b..79b0dc33feaf 100644
> --- a/fs/f2fs/segment.h
> +++ b/fs/f2fs/segment.h
> @@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
>  	unsigned int next;
> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
>  
>  	spin_lock(&free_i->segmap_lock);
>  	clear_bit(segno, free_i->free_segmap);
> @@ -418,7 +419,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
>  
>  	next = find_next_bit(free_i->free_segmap,
>  			start_segno + sbi->segs_per_sec, start_segno);
> -	if (next >= start_segno + sbi->segs_per_sec) {
> +	if (next >= start_segno + usable_segs) {
>  		clear_bit(secno, free_i->free_secmap);
>  		free_i->free_sections++;
>  	}
> @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
>  	unsigned int next;
> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
>  
>  	spin_lock(&free_i->segmap_lock);
>  	if (test_and_clear_bit(segno, free_i->free_segmap)) {
> @@ -453,7 +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
>  			goto skip_free;
>  		next = find_next_bit(free_i->free_segmap,
>  				start_segno + sbi->segs_per_sec, start_segno);
> -		if (next >= start_segno + sbi->segs_per_sec) {
> +		if (next >= start_segno + usable_segs) {
>  			if (test_and_clear_bit(secno, free_i->free_secmap))
>  				free_i->free_sections++;
>  		}
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 80cb7cd358f8..2686b07ae7eb 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
>  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
>  #ifdef CONFIG_BLK_DEV_ZONED
>  		kvfree(FDEV(i).blkz_seq);
> +		kvfree(FDEV(i).zone_capacity_blocks);
>  #endif
>  	}
>  	kvfree(sbi->devs);
> @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct f2fs_sb_info *sbi)
>  }
>  
>  #ifdef CONFIG_BLK_DEV_ZONED
> +
> +struct f2fs_report_zones_args {
> +	struct f2fs_dev_info *dev;
> +	bool zone_cap_mismatch;
> +};
> +
>  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> -			       void *data)
> +			      void *data)
>  {
> -	struct f2fs_dev_info *dev = data;
> +	struct f2fs_report_zones_args *rz_args = data;
> +
> +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> +		return 0;
> +
> +	set_bit(idx, rz_args->dev->blkz_seq);
> +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
> +						F2FS_LOG_SECTORS_PER_BLOCK;
> +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
> +		rz_args->zone_cap_mismatch = true;
>  
> -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
> -		set_bit(idx, dev->blkz_seq);
>  	return 0;
>  }
>  
> @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>  {
>  	struct block_device *bdev = FDEV(devi).bdev;
>  	sector_t nr_sectors = bdev->bd_part->nr_sects;
> +	struct f2fs_report_zones_args rep_zone_arg;
>  	int ret;
>  
>  	if (!f2fs_sb_has_blkzoned(sbi))
> @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>  	if (!FDEV(devi).blkz_seq)
>  		return -ENOMEM;
>  
> -	/* Get block zones type */
> +	/* Get block zones type and zone-capacity */
> +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
> +					FDEV(devi).nr_blkz * sizeof(block_t),
> +					GFP_KERNEL);
> +	if (!FDEV(devi).zone_capacity_blocks)
> +		return -ENOMEM;
> +
> +	rep_zone_arg.dev = &FDEV(devi);
> +	rep_zone_arg.zone_cap_mismatch = false;
> +
>  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
> -				  &FDEV(devi));
> +				  &rep_zone_arg);
>  	if (ret < 0)
>  		return ret;
>  
> +	if (!rep_zone_arg.zone_cap_mismatch) {
> +		kvfree(FDEV(devi).zone_capacity_blocks);
> +		FDEV(devi).zone_capacity_blocks = NULL;
> +	}
> +
>  	return 0;
>  }
>  #endif
> -- 
> 2.19.1


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-07  0:07   ` Jaegeuk Kim
@ 2020-07-07  3:27     ` Aravind Ramesh
  2020-07-07  3:49       ` Jaegeuk Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Aravind Ramesh @ 2020-07-07  3:27 UTC (permalink / raw)
  To: Jaegeuk Kim
  Cc: Niklas Cassel, Damien Le Moal, linux-f2fs-devel, linux-fsdevel,
	hch, Matias Bjorling

Hello Jaegeuk,

I had mentioned the dependency in the cover letter for this patch, as below.

This series is based on the git tree
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git branch dev

and requires the below patch in order to build.

https://lore.kernel.org/linux-nvme/20200701063720.GA28954@lst.de/T/#m19e0197ae1837b7fe959b13fbc2a859b1f2abc1e

The above patch has been merged to the nvme-5.9 branch in the git tree:
git://git.infradead.org/nvme.git

Could you consider picking up that patch in your tree?

I have run checkpatch before sending this, it was ok. Ran it again.

f2fs$ scripts/checkpatch.pl ./0001-f2fs-support-zone-capacity-less-than-zone-size.patch
total: 0 errors, 0 warnings, 289 lines checked

./0001-f2fs-support-zone-capacity-less-than-zone-size.patch has no obvious style problems and is ready for submission.

Thanks,
Aravind

> -----Original Message-----
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Sent: Tuesday, July 7, 2020 5:37 AM
> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>
> Cc: yuchao0@huawei.com; linux-fsdevel@vger.kernel.org; linux-f2fs-
> devel@lists.sourceforge.net; hch@lst.de; Damien Le Moal
> <Damien.LeMoal@wdc.com>; Niklas Cassel <Niklas.Cassel@wdc.com>; Matias
> Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone size
> 
> Hi,
> 
> Is there any dependency to the patch? And, could you please run checkpatch script?
> 
> Thanks,
> 
> On 07/02, Aravind Ramesh wrote:
> > NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
> > Zone-capacity indicates the maximum number of sectors that are usable
> > in a zone, beginning from the first sector of the zone. This makes the
> > sectors after the zone-capacity, up to the zone-size, unusable.
> > This patch set tracks zone-size and zone-capacity in zoned devices and
> > calculates the usable blocks per segment and usable segments per section.
> >
> > If zone-capacity is less than zone-size, mark only those segments which
> > start before zone-capacity as free segments. All segments at and
> > beyond zone-capacity are treated as permanently used segments. In
> > cases where zone-capacity does not align with segment size, the last
> > segment will start before zone-capacity and end beyond the
> > zone-capacity of the zone. For such spanning segments, only sectors
> > within the zone-capacity are used.
> >
> > Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
> > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > ---
> >  fs/f2fs/f2fs.h    |   5 ++
> >  fs/f2fs/segment.c | 136 ++++++++++++++++++++++++++++++++++++++++++++--
> >  fs/f2fs/segment.h |   6 +-
> >  fs/f2fs/super.c   |  41 ++++++++++++--
> >  4 files changed, 176 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index e6e47618a357..73219e4e1ba4 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1232,6 +1232,7 @@ struct f2fs_dev_info {
> >  #ifdef CONFIG_BLK_DEV_ZONED
> >  	unsigned int nr_blkz;		/* Total number of zones */
> >  	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
> > +	block_t *zone_capacity_blocks;  /* Array of zone capacity in blks */
> >  #endif
> >  };
> >
> > @@ -3395,6 +3396,10 @@ void f2fs_destroy_segment_manager_caches(void);
> >  int f2fs_rw_hint_to_seg_type(enum rw_hint hint);
> >  enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
> >  			enum page_type type, enum temp_type temp);
> > +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> > +			unsigned int segno);
> > +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> > +			unsigned int segno);
> >
> >  /*
> >   * checkpoint.c
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index c35614d255e1..d2156f3f56a5 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct f2fs_sb_info *sbi)
> >  {
> >  	unsigned int start;
> >  	int type;
> > +	struct seg_entry *sentry;
> >
> >  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
> > -		struct seg_entry *sentry = get_seg_entry(sbi, start);
> > +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
> > +			continue;
> > +		sentry = get_seg_entry(sbi, start);
> >  		if (!sentry->valid_blocks)
> >  			__set_free(sbi, start);
> >  		else
> > @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
> >  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
> >  	struct free_segmap_info *free_i = FREE_I(sbi);
> >  	unsigned int segno = 0, offset = 0, secno;
> > -	unsigned short valid_blocks;
> > +	unsigned short valid_blocks, usable_blks_in_seg;
> >  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
> >
> >  	while (1) {
> > @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
> >  			break;
> >  		offset = segno + 1;
> >  		valid_blocks = get_valid_blocks(sbi, segno, false);
> > -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
> > +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
> > +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
> >  			continue;
> > -		if (valid_blocks > sbi->blocks_per_seg) {
> > +		if (valid_blocks > usable_blks_in_seg) {
> >  			f2fs_bug_on(sbi, 1);
> >  			continue;
> >  		}
> > @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> >
> >  	return 0;
> >  }
> > +
> > +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
> > +						unsigned int dev_idx)
> > +{
> > +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
> > +		return true;
> > +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq);
> > +}
> > +
> > +/* Return the zone index in the given device */
> > +static unsigned int get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
> > +					int dev_idx)
> > +{
> > +	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> > +
> > +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
> > +						sbi->log_blocks_per_blkz;
> > +}
> > +
> > +/*
> > + * Return the usable segments in a section based on the zone's
> > + * corresponding zone capacity. Zone is equal to a section.
> > + */
> > +static inline unsigned int f2fs_usable_zone_segs_in_sec(
> > +		struct f2fs_sb_info *sbi, unsigned int segno)
> > +{
> > +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
> > +
> > +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
> > +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno), dev_idx);
> > +
> > +	/* Conventional zone's capacity is always equal to zone size */
> > +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> > +		return sbi->segs_per_sec;
> > +
> > +	/*
> > +	 * If the zone_capacity_blocks array is NULL, then zone capacity
> > +	 * is equal to the zone size for all zones
> > +	 */
> > +	if (!FDEV(dev_idx).zone_capacity_blocks)
> > +		return sbi->segs_per_sec;
> > +
> > +	/* Get the segment count beyond zone capacity block */
> > +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
> > +				FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
> > +				sbi->log_blocks_per_seg;
> > +	return sbi->segs_per_sec - unusable_segs_in_sec;
> > +}
> > +
> > +/*
> > + * Return the number of usable blocks in a segment. The number of blocks
> > + * returned is always equal to the number of blocks in a segment for
> > + * segments fully contained within a sequential zone capacity or a
> > + * conventional zone. For segments partially contained in a sequential
> > + * zone capacity, the number of usable blocks up to the zone capacity
> > + * is returned. 0 is returned in all other cases.
> > + */
> > +static inline unsigned int f2fs_usable_zone_blks_in_seg(
> > +			struct f2fs_sb_info *sbi, unsigned int segno)
> > +{
> > +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
> > +	unsigned int zone_idx, dev_idx, secno;
> > +
> > +	secno = GET_SEC_FROM_SEG(sbi, segno);
> > +	seg_start = START_BLOCK(sbi, segno);
> > +	dev_idx = f2fs_target_device_index(sbi, seg_start);
> > +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
> > +
> > +	/*
> > +	 * Conventional zone's capacity is always equal to zone size,
> > +	 * so, blocks per segment is unchanged.
> > +	 */
> > +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> > +		return sbi->blocks_per_seg;
> > +
> > +	if (!FDEV(dev_idx).zone_capacity_blocks)
> > +		return sbi->blocks_per_seg;
> > +
> > +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> > +	sec_cap_blkaddr = sec_start_blkaddr +
> > +				FDEV(dev_idx).zone_capacity_blocks[zone_idx];
> > +
> > +	/*
> > +	 * If segment starts before zone capacity and spans beyond
> > +	 * zone capacity, then usable blocks are from seg start to
> > +	 * zone capacity. If the segment starts after the zone capacity,
> > +	 * then there are no usable blocks.
> > +	 */
> > +	if (seg_start >= sec_cap_blkaddr)
> > +		return 0;
> > +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
> > +		return sec_cap_blkaddr - seg_start;
> > +
> > +	return sbi->blocks_per_seg;
> > +}
> >  #else
> >  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> >  {
> > @@ -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> >  {
> >  	return 0;
> >  }
> > +
> > +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
> > +							unsigned int segno)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
> > +							unsigned int segno)
> > +{
> > +	return 0;
> > +}
> >  #endif
> > +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> > +					unsigned int segno)
> > +{
> > +	if (f2fs_sb_has_blkzoned(sbi))
> > +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
> > +
> > +	return sbi->blocks_per_seg;
> > +}
> > +
> > +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> > +					unsigned int segno)
> > +{
> > +	if (f2fs_sb_has_blkzoned(sbi))
> > +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
> > +
> > +	return sbi->segs_per_sec;
> > +}
> >
> >  /*
> >   * Update min, max modified time for cost-benefit GC algorithm
> > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> > index f261e3e6a69b..79b0dc33feaf 100644
> > --- a/fs/f2fs/segment.h
> > +++ b/fs/f2fs/segment.h
> > @@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> >  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> >  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> >  	unsigned int next;
> > +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> >
> >  	spin_lock(&free_i->segmap_lock);
> >  	clear_bit(segno, free_i->free_segmap);
> > @@ -418,7 +419,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> >
> >  	next = find_next_bit(free_i->free_segmap,
> >  			start_segno + sbi->segs_per_sec, start_segno);
> > -	if (next >= start_segno + sbi->segs_per_sec) {
> > +	if (next >= start_segno + usable_segs) {
> >  		clear_bit(secno, free_i->free_secmap);
> >  		free_i->free_sections++;
> >  	}
> > @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> >  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> >  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> >  	unsigned int next;
> > +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> >
> >  	spin_lock(&free_i->segmap_lock);
> >  	if (test_and_clear_bit(segno, free_i->free_segmap)) {
> > @@ -453,7 +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> >  			goto skip_free;
> >  		next = find_next_bit(free_i->free_segmap,
> >  				start_segno + sbi->segs_per_sec, start_segno);
> > -		if (next >= start_segno + sbi->segs_per_sec) {
> > +		if (next >= start_segno + usable_segs) {
> >  			if (test_and_clear_bit(secno, free_i->free_secmap))
> >  				free_i->free_sections++;
> >  		}
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 80cb7cd358f8..2686b07ae7eb 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
> >  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
> >  #ifdef CONFIG_BLK_DEV_ZONED
> >  		kvfree(FDEV(i).blkz_seq);
> > +		kvfree(FDEV(i).zone_capacity_blocks);
> >  #endif
> >  	}
> >  	kvfree(sbi->devs);
> > @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct f2fs_sb_info *sbi)
> >  }
> >
> >  #ifdef CONFIG_BLK_DEV_ZONED
> > +
> > +struct f2fs_report_zones_args {
> > +	struct f2fs_dev_info *dev;
> > +	bool zone_cap_mismatch;
> > +};
> > +
> >  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> > -			       void *data)
> > +			      void *data)
> >  {
> > -	struct f2fs_dev_info *dev = data;
> > +	struct f2fs_report_zones_args *rz_args = data;
> > +
> > +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> > +		return 0;
> > +
> > +	set_bit(idx, rz_args->dev->blkz_seq);
> > +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
> > +						F2FS_LOG_SECTORS_PER_BLOCK;
> > +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
> > +		rz_args->zone_cap_mismatch = true;
> >
> > -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
> > -		set_bit(idx, dev->blkz_seq);
> >  	return 0;
> >  }
> >
> > @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> >  {
> >  	struct block_device *bdev = FDEV(devi).bdev;
> >  	sector_t nr_sectors = bdev->bd_part->nr_sects;
> > +	struct f2fs_report_zones_args rep_zone_arg;
> >  	int ret;
> >
> >  	if (!f2fs_sb_has_blkzoned(sbi))
> > @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> >  	if (!FDEV(devi).blkz_seq)
> >  		return -ENOMEM;
> >
> > -	/* Get block zones type */
> > +	/* Get block zones type and zone-capacity */
> > +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
> > +					FDEV(devi).nr_blkz * sizeof(block_t),
> > +					GFP_KERNEL);
> > +	if (!FDEV(devi).zone_capacity_blocks)
> > +		return -ENOMEM;
> > +
> > +	rep_zone_arg.dev = &FDEV(devi);
> > +	rep_zone_arg.zone_cap_mismatch = false;
> > +
> >  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
> > -				  &FDEV(devi));
> > +				  &rep_zone_arg);
> >  	if (ret < 0)
> >  		return ret;
> >
> > +	if (!rep_zone_arg.zone_cap_mismatch) {
> > +		kvfree(FDEV(devi).zone_capacity_blocks);
> > +		FDEV(devi).zone_capacity_blocks = NULL;
> > +	}
> > +
> >  	return 0;
> >  }
> >  #endif
> > --
> > 2.19.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-07  3:27     ` Aravind Ramesh
@ 2020-07-07  3:49       ` Jaegeuk Kim
  2020-07-07  5:18         ` Aravind Ramesh
  0 siblings, 1 reply; 16+ messages in thread
From: Jaegeuk Kim @ 2020-07-07  3:49 UTC (permalink / raw)
  To: Aravind Ramesh
  Cc: Niklas Cassel, Damien Le Moal, linux-f2fs-devel, linux-fsdevel,
	hch, Matias Bjorling

On 07/07, Aravind Ramesh wrote:
> Hello Jaegeuk,
> 
> I had mentioned the dependency in the cover letter for this patch, as below.
> 
> This series is based on the git tree
> git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git branch dev
> 
> and requires the below patch in order to build.
> 
> https://lore.kernel.org/linux-nvme/20200701063720.GA28954@lst.de/T/#m19e0197ae1837b7fe959b13fbc2a859b1f2abc1e
> 
> The above patch has been merged to the nvme-5.9 branch in the git tree:
> git://git.infradead.org/nvme.git
> 
> Could you consider picking up that patch in your tree?
> 
> I have run checkpatch before sending this, it was ok. Ran it again.

I see. I don't have any device, so have to rely on your tests as usual.
Thank you for posting the patch, and will check any regression.

> 
> f2fs$ scripts/checkpatch.pl ./0001-f2fs-support-zone-capacity-less-than-zone-size.patch
> total: 0 errors, 0 warnings, 289 lines checked
> 
> ./0001-f2fs-support-zone-capacity-less-than-zone-size.patch has no obvious style problems and is ready for submission.
> 
> Thanks,
> Aravind
> 
> > -----Original Message-----
> > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > Sent: Tuesday, July 7, 2020 5:37 AM
> > To: Aravind Ramesh <Aravind.Ramesh@wdc.com>
> > Cc: yuchao0@huawei.com; linux-fsdevel@vger.kernel.org; linux-f2fs-
> > devel@lists.sourceforge.net; hch@lst.de; Damien Le Moal
> > <Damien.LeMoal@wdc.com>; Niklas Cassel <Niklas.Cassel@wdc.com>; Matias
> > Bjorling <Matias.Bjorling@wdc.com>
> > Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone size
> > 
> > Hi,
> > 
> > Is there any dependency to the patch? And, could you please run checkpatch script?
> > 
> > Thanks,
> > 
> > On 07/02, Aravind Ramesh wrote:
> > > NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
> > > Zone-capacity indicates the maximum number of sectors that are usable
> > > in a zone, beginning from the first sector of the zone. This makes the
> > > sectors after the zone-capacity, up to the zone-size, unusable.
> > > This patch set tracks zone-size and zone-capacity in zoned devices and
> > > calculates the usable blocks per segment and usable segments per section.
> > >
> > > If zone-capacity is less than zone-size, mark only those segments which
> > > start before zone-capacity as free segments. All segments at and
> > > beyond zone-capacity are treated as permanently used segments. In
> > > cases where zone-capacity does not align with segment size, the last
> > > segment will start before zone-capacity and end beyond the
> > > zone-capacity of the zone. For such spanning segments, only sectors
> > > within the zone-capacity are used.
> > >
> > > Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
> > > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> > > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > > ---
> > >  fs/f2fs/f2fs.h    |   5 ++
> > >  fs/f2fs/segment.c | 136 ++++++++++++++++++++++++++++++++++++++++++++--
> > >  fs/f2fs/segment.h |   6 +-
> > >  fs/f2fs/super.c   |  41 ++++++++++++--
> > >  4 files changed, 176 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > > index e6e47618a357..73219e4e1ba4 100644
> > > --- a/fs/f2fs/f2fs.h
> > > +++ b/fs/f2fs/f2fs.h
> > > @@ -1232,6 +1232,7 @@ struct f2fs_dev_info {
> > >  #ifdef CONFIG_BLK_DEV_ZONED
> > >  	unsigned int nr_blkz;		/* Total number of zones */
> > >  	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
> > > +	block_t *zone_capacity_blocks;  /* Array of zone capacity in blks */
> > >  #endif
> > >  };
> > >
> > > @@ -3395,6 +3396,10 @@ void f2fs_destroy_segment_manager_caches(void);
> > >  int f2fs_rw_hint_to_seg_type(enum rw_hint hint);
> > >  enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
> > >  			enum page_type type, enum temp_type temp);
> > > +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> > > +			unsigned int segno);
> > > +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> > > +			unsigned int segno);
> > >
> > >  /*
> > >   * checkpoint.c
> > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > index c35614d255e1..d2156f3f56a5 100644
> > > --- a/fs/f2fs/segment.c
> > > +++ b/fs/f2fs/segment.c
> > > @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct f2fs_sb_info *sbi)
> > >  {
> > >  	unsigned int start;
> > >  	int type;
> > > +	struct seg_entry *sentry;
> > >
> > >  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
> > > -		struct seg_entry *sentry = get_seg_entry(sbi, start);
> > > +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
> > > +			continue;
> > > +		sentry = get_seg_entry(sbi, start);
> > >  		if (!sentry->valid_blocks)
> > >  			__set_free(sbi, start);
> > >  		else
> > > @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
> > >  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
> > >  	struct free_segmap_info *free_i = FREE_I(sbi);
> > >  	unsigned int segno = 0, offset = 0, secno;
> > > -	unsigned short valid_blocks;
> > > +	unsigned short valid_blocks, usable_blks_in_seg;
> > >  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
> > >
> > >  	while (1) {
> > > @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
> > >  			break;
> > >  		offset = segno + 1;
> > >  		valid_blocks = get_valid_blocks(sbi, segno, false);
> > > -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
> > > +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
> > > +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
> > >  			continue;
> > > -		if (valid_blocks > sbi->blocks_per_seg) {
> > > +		if (valid_blocks > usable_blks_in_seg) {
> > >  			f2fs_bug_on(sbi, 1);
> > >  			continue;
> > >  		}
> > > @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> > >
> > >  	return 0;
> > >  }
> > > +
> > > +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
> > > +						unsigned int dev_idx)
> > > +{
> > > +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
> > > +		return true;
> > > +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq);
> > > +}
> > > +
> > > +/* Return the zone index in the given device */
> > > +static unsigned int get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
> > > +					int dev_idx)
> > > +{
> > > +	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> > > +
> > > +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
> > > +						sbi->log_blocks_per_blkz;
> > > +}
> > > +
> > > +/*
> > > + * Return the usable segments in a section based on the zone's
> > > + * corresponding zone capacity. Zone is equal to a section.
> > > + */
> > > +static inline unsigned int f2fs_usable_zone_segs_in_sec(
> > > +		struct f2fs_sb_info *sbi, unsigned int segno)
> > > +{
> > > +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
> > > +
> > > +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
> > > +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno), dev_idx);
> > > +
> > > +	/* Conventional zone's capacity is always equal to zone size */
> > > +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> > > +		return sbi->segs_per_sec;
> > > +
> > > +	/*
> > > +	 * If the zone_capacity_blocks array is NULL, then zone capacity
> > > +	 * is equal to the zone size for all zones
> > > +	 */
> > > +	if (!FDEV(dev_idx).zone_capacity_blocks)
> > > +		return sbi->segs_per_sec;
> > > +
> > > +	/* Get the segment count beyond zone capacity block */
> > > +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
> > > +				FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
> > > +				sbi->log_blocks_per_seg;
> > > +	return sbi->segs_per_sec - unusable_segs_in_sec;
> > > +}
> > > +
> > > +/*
> > > + * Return the number of usable blocks in a segment. The number of blocks
> > > + * returned is always equal to the number of blocks in a segment for
> > > + * segments fully contained within a sequential zone capacity or a
> > > + * conventional zone. For segments partially contained in a sequential
> > > + * zone capacity, the number of usable blocks up to the zone capacity
> > > + * is returned. 0 is returned in all other cases.
> > > + */
> > > +static inline unsigned int f2fs_usable_zone_blks_in_seg(
> > > +			struct f2fs_sb_info *sbi, unsigned int segno)
> > > +{
> > > +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
> > > +	unsigned int zone_idx, dev_idx, secno;
> > > +
> > > +	secno = GET_SEC_FROM_SEG(sbi, segno);
> > > +	seg_start = START_BLOCK(sbi, segno);
> > > +	dev_idx = f2fs_target_device_index(sbi, seg_start);
> > > +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
> > > +
> > > +	/*
> > > +	 * Conventional zone's capacity is always equal to zone size,
> > > +	 * so, blocks per segment is unchanged.
> > > +	 */
> > > +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> > > +		return sbi->blocks_per_seg;
> > > +
> > > +	if (!FDEV(dev_idx).zone_capacity_blocks)
> > > +		return sbi->blocks_per_seg;
> > > +
> > > +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> > > +	sec_cap_blkaddr = sec_start_blkaddr +
> > > +				FDEV(dev_idx).zone_capacity_blocks[zone_idx];
> > > +
> > > +	/*
> > > +	 * If segment starts before zone capacity and spans beyond
> > > +	 * zone capacity, then usable blocks are from seg start to
> > > +	 * zone capacity. If the segment starts after the zone capacity,
> > > +	 * then there are no usable blocks.
> > > +	 */
> > > +	if (seg_start >= sec_cap_blkaddr)
> > > +		return 0;
> > > +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
> > > +		return sec_cap_blkaddr - seg_start;
> > > +
> > > +	return sbi->blocks_per_seg;
> > > +}
> > >  #else
> > >  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> > >  {
> > > @@ -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> > >  {
> > >  	return 0;
> > >  }
> > > +
> > > +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
> > > +							unsigned int segno)
> > > +{
> > > +	return 0;
> > > +}
> > > +
> > > +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
> > > +							unsigned int segno)
> > > +{
> > > +	return 0;
> > > +}
> > >  #endif
> > > +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> > > +					unsigned int segno)
> > > +{
> > > +	if (f2fs_sb_has_blkzoned(sbi))
> > > +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
> > > +
> > > +	return sbi->blocks_per_seg;
> > > +}
> > > +
> > > +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> > > +					unsigned int segno)
> > > +{
> > > +	if (f2fs_sb_has_blkzoned(sbi))
> > > +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
> > > +
> > > +	return sbi->segs_per_sec;
> > > +}
> > >
> > >  /*
> > >   * Update min, max modified time for cost-benefit GC algorithm
> > > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> > > index f261e3e6a69b..79b0dc33feaf 100644
> > > --- a/fs/f2fs/segment.h
> > > +++ b/fs/f2fs/segment.h
> > > @@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> > >  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> > >  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> > >  	unsigned int next;
> > > +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> > >
> > >  	spin_lock(&free_i->segmap_lock);
> > >  	clear_bit(segno, free_i->free_segmap);
> > > @@ -418,7 +419,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> > >
> > >  	next = find_next_bit(free_i->free_segmap,
> > >  			start_segno + sbi->segs_per_sec, start_segno);
> > > -	if (next >= start_segno + sbi->segs_per_sec) {
> > > +	if (next >= start_segno + usable_segs) {
> > >  		clear_bit(secno, free_i->free_secmap);
> > >  		free_i->free_sections++;
> > >  	}
> > > @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> > >  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> > >  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> > >  	unsigned int next;
> > > +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> > >
> > >  	spin_lock(&free_i->segmap_lock);
> > >  	if (test_and_clear_bit(segno, free_i->free_segmap)) {
> > > @@ -453,7 +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> > >  			goto skip_free;
> > >  		next = find_next_bit(free_i->free_segmap,
> > >  				start_segno + sbi->segs_per_sec, start_segno);
> > > -		if (next >= start_segno + sbi->segs_per_sec) {
> > > +		if (next >= start_segno + usable_segs) {
> > >  			if (test_and_clear_bit(secno, free_i->free_secmap))
> > >  				free_i->free_sections++;
> > >  		}
> > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > > index 80cb7cd358f8..2686b07ae7eb 100644
> > > --- a/fs/f2fs/super.c
> > > +++ b/fs/f2fs/super.c
> > > @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
> > >  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
> > >  #ifdef CONFIG_BLK_DEV_ZONED
> > >  		kvfree(FDEV(i).blkz_seq);
> > > +		kvfree(FDEV(i).zone_capacity_blocks);
> > >  #endif
> > >  	}
> > >  	kvfree(sbi->devs);
> > > @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct f2fs_sb_info *sbi)
> > >  }
> > >
> > >  #ifdef CONFIG_BLK_DEV_ZONED
> > > +
> > > +struct f2fs_report_zones_args {
> > > +	struct f2fs_dev_info *dev;
> > > +	bool zone_cap_mismatch;
> > > +};
> > > +
> > >  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> > > -			       void *data)
> > > +			      void *data)
> > >  {
> > > -	struct f2fs_dev_info *dev = data;
> > > +	struct f2fs_report_zones_args *rz_args = data;
> > > +
> > > +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> > > +		return 0;
> > > +
> > > +	set_bit(idx, rz_args->dev->blkz_seq);
> > > +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
> > > +						F2FS_LOG_SECTORS_PER_BLOCK;
> > > +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
> > > +		rz_args->zone_cap_mismatch = true;
> > >
> > > -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
> > > -		set_bit(idx, dev->blkz_seq);
> > >  	return 0;
> > >  }
> > >
> > > @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> > >  {
> > >  	struct block_device *bdev = FDEV(devi).bdev;
> > >  	sector_t nr_sectors = bdev->bd_part->nr_sects;
> > > +	struct f2fs_report_zones_args rep_zone_arg;
> > >  	int ret;
> > >
> > >  	if (!f2fs_sb_has_blkzoned(sbi))
> > > @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> > >  	if (!FDEV(devi).blkz_seq)
> > >  		return -ENOMEM;
> > >
> > > -	/* Get block zones type */
> > > +	/* Get block zones type and zone-capacity */
> > > +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
> > > +					FDEV(devi).nr_blkz * sizeof(block_t),
> > > +					GFP_KERNEL);
> > > +	if (!FDEV(devi).zone_capacity_blocks)
> > > +		return -ENOMEM;
> > > +
> > > +	rep_zone_arg.dev = &FDEV(devi);
> > > +	rep_zone_arg.zone_cap_mismatch = false;
> > > +
> > >  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
> > > -				  &FDEV(devi));
> > > +				  &rep_zone_arg);
> > >  	if (ret < 0)
> > >  		return ret;
> > >
> > > +	if (!rep_zone_arg.zone_cap_mismatch) {
> > > +		kvfree(FDEV(devi).zone_capacity_blocks);
> > > +		FDEV(devi).zone_capacity_blocks = NULL;
> > > +	}
> > > +
> > >  	return 0;
> > >  }
> > >  #endif
> > > --
> > > 2.19.1


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-07  3:49       ` Jaegeuk Kim
@ 2020-07-07  5:18         ` Aravind Ramesh
  0 siblings, 0 replies; 16+ messages in thread
From: Aravind Ramesh @ 2020-07-07  5:18 UTC (permalink / raw)
  To: Jaegeuk Kim
  Cc: Niklas Cassel, Damien Le Moal, linux-f2fs-devel, linux-fsdevel,
	hch, Matias Bjorling

Thanks Jaegeuk.
The qemu patches to emulate ZNS devices are in the process of being merged.
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg05390.html
We can use this emulated ZNS drive along with a null-block device to create an f2fs volume if a physical ZNS device is not available.
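For anyone wanting to try this without hardware, the setup could look roughly like the following. This is a sketch, not a verified recipe: the `zoned.*` nvme-ns properties reflect the ZNS support as it was later merged into QEMU (they may differ from the patch series linked above), and the mkfs.f2fs flags and device names are assumptions.

```shell
# Conventional device for f2fs metadata (null_blk, 4 GiB).
modprobe null_blk nr_devices=1 gb=4

# Backing image for the emulated ZNS namespace.
qemu-img create -f raw zns.img 4G

# Emulated NVMe controller with one zoned namespace:
# 64 MiB zones with a 48 MiB zone capacity.
qemu-system-x86_64 ... \
    -drive file=zns.img,id=nvmezns0,format=raw,if=none \
    -device nvme,id=nvme0,serial=zns0 \
    -device nvme-ns,drive=nvmezns0,bus=nvme0,nsid=1,zoned=true,zoned.zone_size=64M,zoned.zone_capacity=48M

# In the guest: the conventional device is the main device and the
# zoned namespace is added with -c; -m enables zoned block device support.
mkfs.f2fs -f -m -c /dev/nvme0n1 /dev/nullb0
mount -t f2fs /dev/nullb0 /mnt
```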

Aravind

> -----Original Message-----
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Sent: Tuesday, July 7, 2020 9:20 AM
> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>
> Cc: yuchao0@huawei.com; linux-fsdevel@vger.kernel.org; linux-f2fs-
> devel@lists.sourceforge.net; hch@lst.de; Damien Le Moal
> <Damien.LeMoal@wdc.com>; Niklas Cassel <Niklas.Cassel@wdc.com>; Matias
> Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone size
> 
> On 07/07, Aravind Ramesh wrote:
> > Hello Jaegeuk,
> >
> > I had mentioned the dependency in the cover letter for this patch, as below.
> >
> > This series is based on the git tree
> > git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git branch
> > dev
> >
> > and requires the below patch in order to build.
> >
> > https://lore.kernel.org/linux-nvme/20200701063720.GA28954@lst.de/T/#m1
> > 9e0197ae1837b7fe959b13fbc2a859b1f2abc1e
> >
> > The above patch has been merged to the nvme-5.9 branch in the git tree:
> > git://git.infradead.org/nvme.git
> >
> > Could you consider picking up that patch in your tree ?
> >
> > I have run checkpatch before sending this, and it was OK. I ran it again:
> 
> I see. I don't have any device, so I have to rely on your tests as usual.
> Thank you for posting the patch; I will check for any regressions.
> 
> >
> > f2fs$ scripts/checkpatch.pl
> > ./0001-f2fs-support-zone-capacity-less-than-zone-size.patch
> > total: 0 errors, 0 warnings, 289 lines checked
> >
> > ./0001-f2fs-support-zone-capacity-less-than-zone-size.patch has no obvious style
> problems and is ready for submission.
> >
> > Thanks,
> > Aravind
> >
> > > -----Original Message-----
> > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > Sent: Tuesday, July 7, 2020 5:37 AM
> > > To: Aravind Ramesh <Aravind.Ramesh@wdc.com>
> > > Cc: yuchao0@huawei.com; linux-fsdevel@vger.kernel.org; linux-f2fs-
> > > devel@lists.sourceforge.net; hch@lst.de; Damien Le Moal
> > > <Damien.LeMoal@wdc.com>; Niklas Cassel <Niklas.Cassel@wdc.com>;
> > > Matias Bjorling <Matias.Bjorling@wdc.com>
> > > Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone
> > > size
> > >
> > > Hi,
> > >
> > > Is there any dependency to the patch? And, could you please run checkpatch
> script?
> > >
> > > Thanks,
> > >
> > > On 07/02, Aravind Ramesh wrote:
> > > > NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
> > > > Zone-capacity indicates the maximum number of sectors that are
> > > > usable in a zone beginning from the first sector of the zone. This
> > > > makes the sectors after the zone-capacity up to the zone-size unusable.
> > > > This patch set tracks zone-size and zone-capacity in zoned devices
> > > > and calculates the usable blocks per segment and usable segments per section.
> > > >
> > > > If zone-capacity is less than zone-size, mark only those segments
> > > > which start before zone-capacity as free segments. All segments at
> > > > and beyond zone-capacity are treated as permanently used segments.
> > > > In cases where zone-capacity does not align with the segment size, the
> > > > last segment will start before zone-capacity and end beyond the
> > > > zone-capacity of the zone. For such spanning segments only sectors
> > > > within the
> > > zone-capacity are used.
> > > >
> > > > Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
> > > > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> > > > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > > > ---
> > > >  fs/f2fs/f2fs.h    |   5 ++
> > > >  fs/f2fs/segment.c | 136
> > > ++++++++++++++++++++++++++++++++++++++++++++--
> > > >  fs/f2fs/segment.h |   6 +-
> > > >  fs/f2fs/super.c   |  41 ++++++++++++--
> > > >  4 files changed, 176 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > > > index e6e47618a357..73219e4e1ba4 100644
> > > > --- a/fs/f2fs/f2fs.h
> > > > +++ b/fs/f2fs/f2fs.h
> > > > @@ -1232,6 +1232,7 @@ struct f2fs_dev_info {
> > > >  #ifdef CONFIG_BLK_DEV_ZONED
> > > >  	unsigned int nr_blkz;		/* Total number of zones */
> > > >  	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
> > > > +	block_t *zone_capacity_blocks;  /* Array of zone capacity in blks */
> > > >  #endif
> > > >  };
> > > >
> > > > @@ -3395,6 +3396,10 @@ void f2fs_destroy_segment_manager_caches(void);
> > > >  int f2fs_rw_hint_to_seg_type(enum rw_hint hint);
> > > >  enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
> > > >  			enum page_type type, enum temp_type temp);
> > > > +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> > > > +			unsigned int segno);
> > > > +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> > > > +			unsigned int segno);
> > > >
> > > >  /*
> > > >   * checkpoint.c
> > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > index c35614d255e1..d2156f3f56a5 100644
> > > > --- a/fs/f2fs/segment.c
> > > > +++ b/fs/f2fs/segment.c
> > > > @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct f2fs_sb_info *sbi)
> > > >  {
> > > >  	unsigned int start;
> > > >  	int type;
> > > > +	struct seg_entry *sentry;
> > > >
> > > >  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
> > > > -		struct seg_entry *sentry = get_seg_entry(sbi, start);
> > > > +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
> > > > +			continue;
> > > > +		sentry = get_seg_entry(sbi, start);
> > > >  		if (!sentry->valid_blocks)
> > > >  			__set_free(sbi, start);
> > > >  		else
> > > > @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
> > > >  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
> > > >  	struct free_segmap_info *free_i = FREE_I(sbi);
> > > >  	unsigned int segno = 0, offset = 0, secno;
> > > > -	unsigned short valid_blocks;
> > > > +	unsigned short valid_blocks, usable_blks_in_seg;
> > > >  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
> > > >
> > > >  	while (1) {
> > > > @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
> > > >  			break;
> > > >  		offset = segno + 1;
> > > >  		valid_blocks = get_valid_blocks(sbi, segno, false);
> > > > -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
> > > > +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
> > > > +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
> > > >  			continue;
> > > > -		if (valid_blocks > sbi->blocks_per_seg) {
> > > > +		if (valid_blocks > usable_blks_in_seg) {
> > > >  			f2fs_bug_on(sbi, 1);
> > > >  			continue;
> > > >  		}
> > > > @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> > > >
> > > >  	return 0;
> > > >  }
> > > > +
> > > > +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
> > > > +						unsigned int dev_idx)
> > > > +{
> > > > +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
> > > > +		return true;
> > > > +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq);
> > > > +}
> > > > +
> > > > +/* Return the zone index in the given device */
> > > > +static unsigned int get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
> > > > +					int dev_idx)
> > > > +{
> > > > +	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> > > > +
> > > > +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
> > > > +						sbi->log_blocks_per_blkz;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Return the usable segments in a section based on the zone's
> > > > + * corresponding zone capacity. Zone is equal to a section.
> > > > + */
> > > > +static inline unsigned int f2fs_usable_zone_segs_in_sec(
> > > > +		struct f2fs_sb_info *sbi, unsigned int segno)
> > > > +{
> > > > +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
> > > > +
> > > > +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
> > > > +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno), dev_idx);
> > > > +
> > > > +	/* Conventional zone's capacity is always equal to zone size */
> > > > +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> > > > +		return sbi->segs_per_sec;
> > > > +
> > > > +	/*
> > > > +	 * If the zone_capacity_blocks array is NULL, then zone capacity
> > > > +	 * is equal to the zone size for all zones
> > > > +	 */
> > > > +	if (!FDEV(dev_idx).zone_capacity_blocks)
> > > > +		return sbi->segs_per_sec;
> > > > +
> > > > +	/* Get the segment count beyond zone capacity block */
> > > > +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
> > > > +				FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
> > > > +				sbi->log_blocks_per_seg;
> > > > +	return sbi->segs_per_sec - unusable_segs_in_sec;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Return the number of usable blocks in a segment. The number of blocks
> > > > + * returned is always equal to the number of blocks in a segment for
> > > > + * segments fully contained within a sequential zone capacity or a
> > > > + * conventional zone. For segments partially contained in a sequential
> > > > + * zone capacity, the number of usable blocks up to the zone capacity
> > > > + * is returned. 0 is returned in all other cases.
> > > > + */
> > > > +static inline unsigned int f2fs_usable_zone_blks_in_seg(
> > > > +			struct f2fs_sb_info *sbi, unsigned int segno)
> > > > +{
> > > > +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
> > > > +	unsigned int zone_idx, dev_idx, secno;
> > > > +
> > > > +	secno = GET_SEC_FROM_SEG(sbi, segno);
> > > > +	seg_start = START_BLOCK(sbi, segno);
> > > > +	dev_idx = f2fs_target_device_index(sbi, seg_start);
> > > > +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
> > > > +
> > > > +	/*
> > > > +	 * Conventional zone's capacity is always equal to zone size,
> > > > +	 * so, blocks per segment is unchanged.
> > > > +	 */
> > > > +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> > > > +		return sbi->blocks_per_seg;
> > > > +
> > > > +	if (!FDEV(dev_idx).zone_capacity_blocks)
> > > > +		return sbi->blocks_per_seg;
> > > > +
> > > > +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> > > > +	sec_cap_blkaddr = sec_start_blkaddr +
> > > > +				FDEV(dev_idx).zone_capacity_blocks[zone_idx];
> > > > +
> > > > +	/*
> > > > +	 * If segment starts before zone capacity and spans beyond
> > > > +	 * zone capacity, then usable blocks are from seg start to
> > > > +	 * zone capacity. If the segment starts after the zone capacity,
> > > > +	 * then there are no usable blocks.
> > > > +	 */
> > > > +	if (seg_start >= sec_cap_blkaddr)
> > > > +		return 0;
> > > > +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
> > > > +		return sec_cap_blkaddr - seg_start;
> > > > +
> > > > +	return sbi->blocks_per_seg;
> > > > +}
> > > >  #else
> > > >  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> > > >  {
> > > > @@ -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> > > >  {
> > > >  	return 0;
> > > >  }
> > > > +
> > > > +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
> > > > +							unsigned int segno)
> > > > +{
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
> > > > +							unsigned int segno)
> > > > +{
> > > > +	return 0;
> > > > +}
> > > >  #endif
> > > > +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> > > > +					unsigned int segno)
> > > > +{
> > > > +	if (f2fs_sb_has_blkzoned(sbi))
> > > > +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
> > > > +
> > > > +	return sbi->blocks_per_seg;
> > > > +}
> > > > +
> > > > +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> > > > +					unsigned int segno)
> > > > +{
> > > > +	if (f2fs_sb_has_blkzoned(sbi))
> > > > +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
> > > > +
> > > > +	return sbi->segs_per_sec;
> > > > +}
> > > >
> > > >  /*
> > > >   * Update min, max modified time for cost-benefit GC algorithm
> > > > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> > > > index f261e3e6a69b..79b0dc33feaf 100644
> > > > --- a/fs/f2fs/segment.h
> > > > +++ b/fs/f2fs/segment.h
> > > > @@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> > > >  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> > > >  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> > > >  	unsigned int next;
> > > > +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> > > >
> > > >  	spin_lock(&free_i->segmap_lock);
> > > >  	clear_bit(segno, free_i->free_segmap);
> > > > @@ -418,7 +419,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> > > >
> > > >  	next = find_next_bit(free_i->free_segmap,
> > > >  			start_segno + sbi->segs_per_sec, start_segno);
> > > > -	if (next >= start_segno + sbi->segs_per_sec) {
> > > > +	if (next >= start_segno + usable_segs) {
> > > >  		clear_bit(secno, free_i->free_secmap);
> > > >  		free_i->free_sections++;
> > > >  	}
> > > > @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> > > >  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> > > >  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> > > >  	unsigned int next;
> > > > +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> > > >
> > > >  	spin_lock(&free_i->segmap_lock);
> > > >  	if (test_and_clear_bit(segno, free_i->free_segmap)) {
> > > > @@ -453,7 +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> > > >  			goto skip_free;
> > > >  		next = find_next_bit(free_i->free_segmap,
> > > >  				start_segno + sbi->segs_per_sec, start_segno);
> > > > -		if (next >= start_segno + sbi->segs_per_sec) {
> > > > +		if (next >= start_segno + usable_segs) {
> > > >  			if (test_and_clear_bit(secno, free_i->free_secmap))
> > > >  				free_i->free_sections++;
> > > >  		}
> > > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > > > index 80cb7cd358f8..2686b07ae7eb 100644
> > > > --- a/fs/f2fs/super.c
> > > > +++ b/fs/f2fs/super.c
> > > > @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
> > > >  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
> > > >  #ifdef CONFIG_BLK_DEV_ZONED
> > > >  		kvfree(FDEV(i).blkz_seq);
> > > > +		kvfree(FDEV(i).zone_capacity_blocks);
> > > >  #endif
> > > >  	}
> > > >  	kvfree(sbi->devs);
> > > > @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct f2fs_sb_info *sbi)
> > > >  }
> > > >
> > > >  #ifdef CONFIG_BLK_DEV_ZONED
> > > > +
> > > > +struct f2fs_report_zones_args {
> > > > +	struct f2fs_dev_info *dev;
> > > > +	bool zone_cap_mismatch;
> > > > +};
> > > > +
> > > >  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> > > > -			       void *data)
> > > > +			      void *data)
> > > >  {
> > > > -	struct f2fs_dev_info *dev = data;
> > > > +	struct f2fs_report_zones_args *rz_args = data;
> > > > +
> > > > +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> > > > +		return 0;
> > > > +
> > > > +	set_bit(idx, rz_args->dev->blkz_seq);
> > > > +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
> > > > +						F2FS_LOG_SECTORS_PER_BLOCK;
> > > > +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
> > > > +		rz_args->zone_cap_mismatch = true;
> > > >
> > > > -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
> > > > -		set_bit(idx, dev->blkz_seq);
> > > >  	return 0;
> > > >  }
> > > >
> > > > @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> > > >  {
> > > >  	struct block_device *bdev = FDEV(devi).bdev;
> > > >  	sector_t nr_sectors = bdev->bd_part->nr_sects;
> > > > +	struct f2fs_report_zones_args rep_zone_arg;
> > > >  	int ret;
> > > >
> > > >  	if (!f2fs_sb_has_blkzoned(sbi))
> > > > @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> > > >  	if (!FDEV(devi).blkz_seq)
> > > >  		return -ENOMEM;
> > > >
> > > > -	/* Get block zones type */
> > > > +	/* Get block zones type and zone-capacity */
> > > > +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
> > > > +					FDEV(devi).nr_blkz * sizeof(block_t),
> > > > +					GFP_KERNEL);
> > > > +	if (!FDEV(devi).zone_capacity_blocks)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	rep_zone_arg.dev = &FDEV(devi);
> > > > +	rep_zone_arg.zone_cap_mismatch = false;
> > > > +
> > > >  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
> > > > -				  &FDEV(devi));
> > > > +				  &rep_zone_arg);
> > > >  	if (ret < 0)
> > > >  		return ret;
> > > >
> > > > +	if (!rep_zone_arg.zone_cap_mismatch) {
> > > > +		kvfree(FDEV(devi).zone_capacity_blocks);
> > > > +		FDEV(devi).zone_capacity_blocks = NULL;
> > > > +	}
> > > > +
> > > >  	return 0;
> > > >  }
> > > >  #endif
> > > > --
> > > > 2.19.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-02 15:54 ` [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size Aravind Ramesh
  2020-07-07  0:07   ` Jaegeuk Kim
@ 2020-07-07 12:18   ` Chao Yu
  2020-07-07 18:23     ` Aravind Ramesh
  1 sibling, 1 reply; 16+ messages in thread
From: Chao Yu @ 2020-07-07 12:18 UTC (permalink / raw)
  To: Aravind Ramesh, jaegeuk, linux-fsdevel, linux-f2fs-devel, hch
  Cc: niklas.cassel, Damien.LeMoal, matias.bjorling

On 2020/7/2 23:54, Aravind Ramesh wrote:
> NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
> Zone-capacity indicates the maximum number of sectors that are usable in
> a zone beginning from the first sector of the zone. This makes the sectors
> after the zone-capacity up to the zone-size unusable.
> This patch set tracks zone-size and zone-capacity in zoned devices and
> calculates the usable blocks per segment and usable segments per section.
> 
> If zone-capacity is less than zone-size, mark only those segments which
> start before zone-capacity as free segments. All segments at and beyond
> zone-capacity are treated as permanently used segments. In cases where
> zone-capacity does not align with the segment size, the last segment will start
> before zone-capacity and end beyond the zone-capacity of the zone. For
> such spanning segments only sectors within the zone-capacity are used.
> 
> Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>  fs/f2fs/f2fs.h    |   5 ++
>  fs/f2fs/segment.c | 136 ++++++++++++++++++++++++++++++++++++++++++++--
>  fs/f2fs/segment.h |   6 +-
>  fs/f2fs/super.c   |  41 ++++++++++++--
>  4 files changed, 176 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index e6e47618a357..73219e4e1ba4 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1232,6 +1232,7 @@ struct f2fs_dev_info {
>  #ifdef CONFIG_BLK_DEV_ZONED
>  	unsigned int nr_blkz;		/* Total number of zones */
>  	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
> +	block_t *zone_capacity_blocks;  /* Array of zone capacity in blks */
>  #endif
>  };
>  
> @@ -3395,6 +3396,10 @@ void f2fs_destroy_segment_manager_caches(void);
>  int f2fs_rw_hint_to_seg_type(enum rw_hint hint);
>  enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
>  			enum page_type type, enum temp_type temp);
> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> +			unsigned int segno);
> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> +			unsigned int segno);
>  
>  /*
>   * checkpoint.c
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index c35614d255e1..d2156f3f56a5 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct f2fs_sb_info *sbi)
>  {
>  	unsigned int start;
>  	int type;
> +	struct seg_entry *sentry;
>  
>  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
> -		struct seg_entry *sentry = get_seg_entry(sbi, start);
> +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)

If the usable blocks count is zero, shouldn't we update SIT_I(sbi)->written_valid_blocks
as we do when there are partially usable blocks in the current segment?

> +			continue;
> +		sentry = get_seg_entry(sbi, start);
>  		if (!sentry->valid_blocks)
>  			__set_free(sbi, start);
>  		else
> @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
>  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
>  	struct free_segmap_info *free_i = FREE_I(sbi);
>  	unsigned int segno = 0, offset = 0, secno;
> -	unsigned short valid_blocks;
> +	unsigned short valid_blocks, usable_blks_in_seg;
>  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
>  
>  	while (1) {
> @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
>  			break;
>  		offset = segno + 1;
>  		valid_blocks = get_valid_blocks(sbi, segno, false);
> -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
> +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
> +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)

It needs to traverse the .cur_valid_map bitmap to check whether the blocks in the
range [0, usable_blks_in_seg] are all valid or not; if there is at least one usable
block in the range, the segment should be dirty.

One question: if we select a dirty segment which crosses the zone-capacity as an
opened segment (in curseg), how can we avoid allocating a usable block beyond the
zone-capacity in such a segment via .cur_valid_map?

>  			continue;
> -		if (valid_blocks > sbi->blocks_per_seg) {
> +		if (valid_blocks > usable_blks_in_seg) {
>  			f2fs_bug_on(sbi, 1);
>  			continue;
>  		}
> @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
>  
>  	return 0;
>  }
> +
> +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
> +						unsigned int dev_idx)
> +{
> +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
> +		return true;
> +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq);
> +}
> +
> +/* Return the zone index in the given device */
> +static unsigned int get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
> +					int dev_idx)
> +{
> +	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> +
> +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
> +						sbi->log_blocks_per_blkz;
> +}
> +
> +/*
> + * Return the usable segments in a section based on the zone's
> + * corresponding zone capacity. Zone is equal to a section.
> + */
> +static inline unsigned int f2fs_usable_zone_segs_in_sec(
> +		struct f2fs_sb_info *sbi, unsigned int segno)
> +{
> +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
> +
> +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
> +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno), dev_idx);
> +
> +	/* Conventional zone's capacity is always equal to zone size */
> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> +		return sbi->segs_per_sec;
> +
> +	/*
> +	 * If the zone_capacity_blocks array is NULL, then zone capacity
> +	 * is equal to the zone size for all zones
> +	 */
> +	if (!FDEV(dev_idx).zone_capacity_blocks)
> +		return sbi->segs_per_sec;
> +
> +	/* Get the segment count beyond zone capacity block */
> +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
> +				FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
> +				sbi->log_blocks_per_seg;
> +	return sbi->segs_per_sec - unusable_segs_in_sec;
> +}
> +
> +/*
> + * Return the number of usable blocks in a segment. The number of blocks
> + * returned is always equal to the number of blocks in a segment for
> + * segments fully contained within a sequential zone capacity or a
> + * conventional zone. For segments partially contained in a sequential
> + * zone capacity, the number of usable blocks up to the zone capacity
> + * is returned. 0 is returned in all other cases.
> + */
> +static inline unsigned int f2fs_usable_zone_blks_in_seg(
> +			struct f2fs_sb_info *sbi, unsigned int segno)
> +{
> +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
> +	unsigned int zone_idx, dev_idx, secno;
> +
> +	secno = GET_SEC_FROM_SEG(sbi, segno);
> +	seg_start = START_BLOCK(sbi, segno);
> +	dev_idx = f2fs_target_device_index(sbi, seg_start);
> +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
> +
> +	/*
> +	 * Conventional zone's capacity is always equal to zone size,
> +	 * so, blocks per segment is unchanged.
> +	 */
> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> +		return sbi->blocks_per_seg;
> +
> +	if (!FDEV(dev_idx).zone_capacity_blocks)
> +		return sbi->blocks_per_seg;
> +
> +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> +	sec_cap_blkaddr = sec_start_blkaddr +
> +				FDEV(dev_idx).zone_capacity_blocks[zone_idx];
> +
> +	/*
> +	 * If segment starts before zone capacity and spans beyond
> +	 * zone capacity, then usable blocks are from seg start to
> +	 * zone capacity. If the segment starts after the zone capacity,
> +	 * then there are no usable blocks.
> +	 */
> +	if (seg_start >= sec_cap_blkaddr)
> +		return 0;
> +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
> +		return sec_cap_blkaddr - seg_start;
> +
> +	return sbi->blocks_per_seg;
> +}
>  #else
>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
>  {
> @@ -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
>  {
>  	return 0;
>  }
> +
> +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
> +							unsigned int segno)
> +{
> +	return 0;
> +}
> +
> +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
> +							unsigned int segno)
> +{
> +	return 0;
> +}
>  #endif
> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> +					unsigned int segno)
> +{
> +	if (f2fs_sb_has_blkzoned(sbi))
> +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
> +
> +	return sbi->blocks_per_seg;
> +}
> +
> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> +					unsigned int segno)
> +{
> +	if (f2fs_sb_has_blkzoned(sbi))
> +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
> +
> +	return sbi->segs_per_sec;
> +}
>  
>  /*
>   * Update min, max modified time for cost-benefit GC algorithm
> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> index f261e3e6a69b..79b0dc33feaf 100644
> --- a/fs/f2fs/segment.h
> +++ b/fs/f2fs/segment.h
> @@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
>  	unsigned int next;
> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
>  
>  	spin_lock(&free_i->segmap_lock);
>  	clear_bit(segno, free_i->free_segmap);
> @@ -418,7 +419,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
>  
>  	next = find_next_bit(free_i->free_segmap,
>  			start_segno + sbi->segs_per_sec, start_segno);
> -	if (next >= start_segno + sbi->segs_per_sec) {
> +	if (next >= start_segno + usable_segs) {
>  		clear_bit(secno, free_i->free_secmap);
>  		free_i->free_sections++;
>  	}
> @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
>  	unsigned int next;
> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
>  
>  	spin_lock(&free_i->segmap_lock);
>  	if (test_and_clear_bit(segno, free_i->free_segmap)) {
> @@ -453,7 +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
>  			goto skip_free;
>  		next = find_next_bit(free_i->free_segmap,
>  				start_segno + sbi->segs_per_sec, start_segno);
> -		if (next >= start_segno + sbi->segs_per_sec) {
> +		if (next >= start_segno + usable_segs) {
>  			if (test_and_clear_bit(secno, free_i->free_secmap))
>  				free_i->free_sections++;
>  		}
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 80cb7cd358f8..2686b07ae7eb 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
>  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
>  #ifdef CONFIG_BLK_DEV_ZONED
>  		kvfree(FDEV(i).blkz_seq);
> +		kvfree(FDEV(i).zone_capacity_blocks);

Now, f2fs_kzalloc won't allocate vmalloc's memory, so it's safe to use kfree().

>  #endif
>  	}
>  	kvfree(sbi->devs);
> @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct f2fs_sb_info *sbi)
>  }
>  
>  #ifdef CONFIG_BLK_DEV_ZONED
> +
> +struct f2fs_report_zones_args {
> +	struct f2fs_dev_info *dev;
> +	bool zone_cap_mismatch;
> +};
> +
>  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> -			       void *data)
> +			      void *data)
>  {
> -	struct f2fs_dev_info *dev = data;
> +	struct f2fs_report_zones_args *rz_args = data;
> +
> +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> +		return 0;
> +
> +	set_bit(idx, rz_args->dev->blkz_seq);
> +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
> +						F2FS_LOG_SECTORS_PER_BLOCK;
> +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
> +		rz_args->zone_cap_mismatch = true;
>  
> -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
> -		set_bit(idx, dev->blkz_seq);
>  	return 0;
>  }
>  
> @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>  {
>  	struct block_device *bdev = FDEV(devi).bdev;
>  	sector_t nr_sectors = bdev->bd_part->nr_sects;
> +	struct f2fs_report_zones_args rep_zone_arg;
>  	int ret;
>  
>  	if (!f2fs_sb_has_blkzoned(sbi))
> @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>  	if (!FDEV(devi).blkz_seq)
>  		return -ENOMEM;
>  
> -	/* Get block zones type */
> +	/* Get block zones type and zone-capacity */
> +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
> +					FDEV(devi).nr_blkz * sizeof(block_t),
> +					GFP_KERNEL);
> +	if (!FDEV(devi).zone_capacity_blocks)
> +		return -ENOMEM;
> +
> +	rep_zone_arg.dev = &FDEV(devi);
> +	rep_zone_arg.zone_cap_mismatch = false;
> +
>  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
> -				  &FDEV(devi));
> +				  &rep_zone_arg);
>  	if (ret < 0)
>  		return ret;

Missed to call kfree(FDEV(devi).zone_capacity_blocks)?

>  
> +	if (!rep_zone_arg.zone_cap_mismatch) {
> +		kvfree(FDEV(devi).zone_capacity_blocks);

Ditto, kfree().

Thanks,

> +		FDEV(devi).zone_capacity_blocks = NULL;
> +	}
> +
>  	return 0;
>  }
>  #endif
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-07 12:18   ` Chao Yu
@ 2020-07-07 18:23     ` Aravind Ramesh
  2020-07-08  2:33       ` Chao Yu
  0 siblings, 1 reply; 16+ messages in thread
From: Aravind Ramesh @ 2020-07-07 18:23 UTC (permalink / raw)
  To: Chao Yu, jaegeuk, linux-fsdevel, linux-f2fs-devel, hch
  Cc: Niklas Cassel, Damien Le Moal, Matias Bjorling

Thanks for the review, Chao Yu.
Please find my responses inline.
I will send a v2 after incorporating your comments.

Regards,
Aravind

> -----Original Message-----
> From: Chao Yu <yuchao0@huawei.com>
> Sent: Tuesday, July 7, 2020 5:49 PM
> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org; linux-
> fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone size
> 
> On 2020/7/2 23:54, Aravind Ramesh wrote:
> > NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
> > Zone-capacity indicates the maximum number of sectors that are usable
> > in a zone beginning from the first sector of the zone. This makes the
> > sectors after the zone-capacity and up to the zone-size unusable.
> > This patch set tracks zone-size and zone-capacity in zoned devices and
> > calculates the usable blocks per segment and usable segments per section.
> >
> > If zone-capacity is less than zone-size mark only those segments which
> > start before zone-capacity as free segments. All segments at and
> > beyond zone-capacity are treated as permanently used segments. In
> > cases where zone-capacity does not align with segment size the last
> > segment will start before zone-capacity and end beyond the
> > zone-capacity of the zone. For such spanning segments only sectors within the
> zone-capacity are used.
> >
> > Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
> > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > ---
> >  fs/f2fs/f2fs.h    |   5 ++
> >  fs/f2fs/segment.c | 136
> ++++++++++++++++++++++++++++++++++++++++++++--
> >  fs/f2fs/segment.h |   6 +-
> >  fs/f2fs/super.c   |  41 ++++++++++++--
> >  4 files changed, 176 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index
> > e6e47618a357..73219e4e1ba4 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1232,6 +1232,7 @@ struct f2fs_dev_info {  #ifdef
> > CONFIG_BLK_DEV_ZONED
> >  	unsigned int nr_blkz;		/* Total number of zones */
> >  	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
> > +	block_t *zone_capacity_blocks;  /* Array of zone capacity in blks */
> >  #endif
> >  };
> >
> > @@ -3395,6 +3396,10 @@ void f2fs_destroy_segment_manager_caches(void);
> >  int f2fs_rw_hint_to_seg_type(enum rw_hint hint);  enum rw_hint
> > f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
> >  			enum page_type type, enum temp_type temp);
> > +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> > +			unsigned int segno);
> > +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> > +			unsigned int segno);
> >
> >  /*
> >   * checkpoint.c
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index
> > c35614d255e1..d2156f3f56a5 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct
> > f2fs_sb_info *sbi)  {
> >  	unsigned int start;
> >  	int type;
> > +	struct seg_entry *sentry;
> >
> >  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
> > -		struct seg_entry *sentry = get_seg_entry(sbi, start);
> > +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
> 
> If usable blocks count is zero, shouldn't we update SIT_I(sbi)->written_valid_blocks
> as we did when there is partial usable block in current segment?
If usable_block_count is zero, the segment is effectively dead: all of its blocks lie after the
zone-capacity in the zone. There can never be valid written content in such segments, hence the counter is not updated.
In the other case, when a segment starts before the zone-capacity and ends beyond it, there are
some blocks before the zone-capacity which can be used, so they are accounted for.
> 
> > +			continue;
> > +		sentry = get_seg_entry(sbi, start);
> >  		if (!sentry->valid_blocks)
> >  			__set_free(sbi, start);
> >  		else
> > @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
> >  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
> >  	struct free_segmap_info *free_i = FREE_I(sbi);
> >  	unsigned int segno = 0, offset = 0, secno;
> > -	unsigned short valid_blocks;
> > +	unsigned short valid_blocks, usable_blks_in_seg;
> >  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
> >
> >  	while (1) {
> > @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
> >  			break;
> >  		offset = segno + 1;
> >  		valid_blocks = get_valid_blocks(sbi, segno, false);
> > -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
> > +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
> > +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
> 
> It needs to traverse .cur_valid_map bitmap to check whether blocks in range of [0,
> usable_blks_in_seg] are all valid or not, if there is at least one usable block in the
> range, segment should be dirty.
Segments that start and end before the zone-capacity behave just like normal segments.
Segments that start after the zone-capacity are fully unusable and are marked as used in the free_seg_bitmap, so they are never allocated.
Segments that span the zone-capacity boundary have some unusable blocks. Even when blocks from these segments are allocated or deallocated, the valid_blocks counter is incremented or decremented, so it always reflects the current valid-block count.
Comparing the valid_blocks count with the usable-block count of the segment therefore indicates whether the segment is dirty or fully used.
Sorry, but could you please share why cur_valid_map needs to be traversed?

> 
> One question, if we select dirty segment which across zone-capacity as opened
> segment (in curseg), how can we avoid allocating usable block beyong zone-capacity
> in such segment via .cur_valid_map?
For zoned devices, blocks have to be allocated sequentially, so allocation is always done in the LFS manner.
__has_curseg_space() checks against the usable-block count and stops allocating blocks beyond the zone-capacity.
> 
> >  			continue;
> > -		if (valid_blocks > sbi->blocks_per_seg) {
> > +		if (valid_blocks > usable_blks_in_seg) {
> >  			f2fs_bug_on(sbi, 1);
> >  			continue;
> >  		}
> > @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct
> > f2fs_sb_info *sbi)
> >
> >  	return 0;
> >  }
> > +
> > +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
> > +						unsigned int dev_idx)
> > +{
> > +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
> > +		return true;
> > +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq);
> > +}
> > +
> > +/* Return the zone index in the given device */
> > +static unsigned int
> > +get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
> > +					int dev_idx)
> > +{
> > +	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> > +
> > +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
> > +						sbi->log_blocks_per_blkz;
> > +}
> > +
> > +/*
> > + * Return the usable segments in a section based on the zone's
> > + * corresponding zone capacity. Zone is equal to a section.
> > + */
> > +static inline unsigned int f2fs_usable_zone_segs_in_sec(
> > +		struct f2fs_sb_info *sbi, unsigned int segno)
> > +{
> > +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
> > +
> > +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
> > +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno), dev_idx);
> > +
> > +	/* Conventional zone's capacity is always equal to zone size */
> > +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> > +		return sbi->segs_per_sec;
> > +
> > +	/*
> > +	 * If the zone_capacity_blocks array is NULL, then zone capacity
> > +	 * is equal to the zone size for all zones
> > +	 */
> > +	if (!FDEV(dev_idx).zone_capacity_blocks)
> > +		return sbi->segs_per_sec;
> > +
> > +	/* Get the segment count beyond zone capacity block */
> > +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
> > +				FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
> > +				sbi->log_blocks_per_seg;
> > +	return sbi->segs_per_sec - unusable_segs_in_sec;
> > +}
> > +
> > +/*
> > + * Return the number of usable blocks in a segment. The number of
> > +blocks
> > + * returned is always equal to the number of blocks in a segment for
> > + * segments fully contained within a sequential zone capacity or a
> > + * conventional zone. For segments partially contained in a
> > +sequential
> > + * zone capacity, the number of usable blocks up to the zone capacity
> > + * is returned. 0 is returned in all other cases.
> > + */
> > +static inline unsigned int f2fs_usable_zone_blks_in_seg(
> > +			struct f2fs_sb_info *sbi, unsigned int segno)
> > +{
> > +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
> > +	unsigned int zone_idx, dev_idx, secno;
> > +
> > +	secno = GET_SEC_FROM_SEG(sbi, segno);
> > +	seg_start = START_BLOCK(sbi, segno);
> > +	dev_idx = f2fs_target_device_index(sbi, seg_start);
> > +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
> > +
> > +	/*
> > +	 * Conventional zone's capacity is always equal to zone size,
> > +	 * so, blocks per segment is unchanged.
> > +	 */
> > +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> > +		return sbi->blocks_per_seg;
> > +
> > +	if (!FDEV(dev_idx).zone_capacity_blocks)
> > +		return sbi->blocks_per_seg;
> > +
> > +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> > +	sec_cap_blkaddr = sec_start_blkaddr +
> > +				FDEV(dev_idx).zone_capacity_blocks[zone_idx];
> > +
> > +	/*
> > +	 * If segment starts before zone capacity and spans beyond
> > +	 * zone capacity, then usable blocks are from seg start to
> > +	 * zone capacity. If the segment starts after the zone capacity,
> > +	 * then there are no usable blocks.
> > +	 */
> > +	if (seg_start >= sec_cap_blkaddr)
> > +		return 0;
> > +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
> > +		return sec_cap_blkaddr - seg_start;
> > +
> > +	return sbi->blocks_per_seg;
> > +}
> >  #else
> >  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> >  {
> > @@ -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> >  {
> >  	return 0;
> >  }
> > +
> > +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
> > +							unsigned int segno)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
> > +							unsigned int segno)
> > +{
> > +	return 0;
> > +}
> >  #endif
> > +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> > +					unsigned int segno)
> > +{
> > +	if (f2fs_sb_has_blkzoned(sbi))
> > +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
> > +
> > +	return sbi->blocks_per_seg;
> > +}
> > +
> > +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> > +					unsigned int segno)
> > +{
> > +	if (f2fs_sb_has_blkzoned(sbi))
> > +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
> > +
> > +	return sbi->segs_per_sec;
> > +}
> >
> >  /*
> >   * Update min, max modified time for cost-benefit GC algorithm
> > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> > index f261e3e6a69b..79b0dc33feaf 100644
> > --- a/fs/f2fs/segment.h
> > +++ b/fs/f2fs/segment.h
> > @@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi,
> unsigned int segno)
> >  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> >  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> >  	unsigned int next;
> > +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> >
> >  	spin_lock(&free_i->segmap_lock);
> >  	clear_bit(segno, free_i->free_segmap);
> > @@ -418,7 +419,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> >
> >  	next = find_next_bit(free_i->free_segmap,
> >  			start_segno + sbi->segs_per_sec, start_segno);
> > -	if (next >= start_segno + sbi->segs_per_sec) {
> > +	if (next >= start_segno + usable_segs) {
> >  		clear_bit(secno, free_i->free_secmap);
> >  		free_i->free_sections++;
> >  	}
> > @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info
> *sbi,
> >  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> >  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> >  	unsigned int next;
> > +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> >
> >  	spin_lock(&free_i->segmap_lock);
> >  	if (test_and_clear_bit(segno, free_i->free_segmap)) {
> > @@ -453,7 +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> >  			goto skip_free;
> >  		next = find_next_bit(free_i->free_segmap,
> >  				start_segno + sbi->segs_per_sec, start_segno);
> > -		if (next >= start_segno + sbi->segs_per_sec) {
> > +		if (next >= start_segno + usable_segs) {
> >  			if (test_and_clear_bit(secno, free_i->free_secmap))
> >  				free_i->free_sections++;
> >  		}
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 80cb7cd358f8..2686b07ae7eb 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
> >  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
> > #ifdef CONFIG_BLK_DEV_ZONED
> >  		kvfree(FDEV(i).blkz_seq);
> > +		kvfree(FDEV(i).zone_capacity_blocks);
> 
> Now, f2fs_kzalloc won't allocate vmalloc's memory, so it's safe to use kfree().
Ok
> 
> >  #endif
> >  	}
> >  	kvfree(sbi->devs);
> > @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct
> > f2fs_sb_info *sbi)  }
> >
> >  #ifdef CONFIG_BLK_DEV_ZONED
> > +
> > +struct f2fs_report_zones_args {
> > +	struct f2fs_dev_info *dev;
> > +	bool zone_cap_mismatch;
> > +};
> > +
> >  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> > -			       void *data)
> > +			      void *data)
> >  {
> > -	struct f2fs_dev_info *dev = data;
> > +	struct f2fs_report_zones_args *rz_args = data;
> > +
> > +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> > +		return 0;
> > +
> > +	set_bit(idx, rz_args->dev->blkz_seq);
> > +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
> > +						F2FS_LOG_SECTORS_PER_BLOCK;
> > +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
> > +		rz_args->zone_cap_mismatch = true;
> >
> > -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
> > -		set_bit(idx, dev->blkz_seq);
> >  	return 0;
> >  }
> >
> > @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info
> > *sbi, int devi)  {
> >  	struct block_device *bdev = FDEV(devi).bdev;
> >  	sector_t nr_sectors = bdev->bd_part->nr_sects;
> > +	struct f2fs_report_zones_args rep_zone_arg;
> >  	int ret;
> >
> >  	if (!f2fs_sb_has_blkzoned(sbi))
> > @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int
> devi)
> >  	if (!FDEV(devi).blkz_seq)
> >  		return -ENOMEM;
> >
> > -	/* Get block zones type */
> > +	/* Get block zones type and zone-capacity */
> > +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
> > +					FDEV(devi).nr_blkz * sizeof(block_t),
> > +					GFP_KERNEL);
> > +	if (!FDEV(devi).zone_capacity_blocks)
> > +		return -ENOMEM;
> > +
> > +	rep_zone_arg.dev = &FDEV(devi);
> > +	rep_zone_arg.zone_cap_mismatch = false;
> > +
> >  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
> > -				  &FDEV(devi));
> > +				  &rep_zone_arg);
> >  	if (ret < 0)
> >  		return ret;
> 
> Missed to call kfree(FDEV(devi).zone_capacity_blocks)?
Thanks for catching it. Will free it here also.
> 
> >
> > +	if (!rep_zone_arg.zone_cap_mismatch) {
> > +		kvfree(FDEV(devi).zone_capacity_blocks);
> 
> Ditto, kfree().
Ok.
> 
> Thanks,
> 
> > +		FDEV(devi).zone_capacity_blocks = NULL;
> > +	}
> > +
> >  	return 0;
> >  }
> >  #endif
> >


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-07 18:23     ` Aravind Ramesh
@ 2020-07-08  2:33       ` Chao Yu
  2020-07-08 13:04         ` Aravind Ramesh
  0 siblings, 1 reply; 16+ messages in thread
From: Chao Yu @ 2020-07-08  2:33 UTC (permalink / raw)
  To: Aravind Ramesh, jaegeuk, linux-fsdevel, linux-f2fs-devel, hch
  Cc: Niklas Cassel, Damien Le Moal, Matias Bjorling

On 2020/7/8 2:23, Aravind Ramesh wrote:
> Thanks for the review, Chao Yu.
> Please find my responses inline.
> I will send a v2 after incorporating your comments.
> 
> Regards,
> Aravind
> 
>> -----Original Message-----
>> From: Chao Yu <yuchao0@huawei.com>
>> Sent: Tuesday, July 7, 2020 5:49 PM
>> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org; linux-
>> fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
>> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
>> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
>> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone size
>>
>> On 2020/7/2 23:54, Aravind Ramesh wrote:
>>> NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
>>> Zone-capacity indicates the maximum number of sectors that are usable
>>> in a zone beginning from the first sector of the zone. This makes the
>>> sectors after the zone-capacity and up to the zone-size unusable.
>>> This patch set tracks zone-size and zone-capacity in zoned devices and
>>> calculates the usable blocks per segment and usable segments per section.
>>>
>>> If zone-capacity is less than zone-size mark only those segments which
>>> start before zone-capacity as free segments. All segments at and
>>> beyond zone-capacity are treated as permanently used segments. In
>>> cases where zone-capacity does not align with segment size the last
>>> segment will start before zone-capacity and end beyond the
>>> zone-capacity of the zone. For such spanning segments only sectors within the
>> zone-capacity are used.
>>>
>>> Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>>> ---
>>>  fs/f2fs/f2fs.h    |   5 ++
>>>  fs/f2fs/segment.c | 136
>> ++++++++++++++++++++++++++++++++++++++++++++--
>>>  fs/f2fs/segment.h |   6 +-
>>>  fs/f2fs/super.c   |  41 ++++++++++++--
>>>  4 files changed, 176 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index
>>> e6e47618a357..73219e4e1ba4 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -1232,6 +1232,7 @@ struct f2fs_dev_info {  #ifdef
>>> CONFIG_BLK_DEV_ZONED
>>>  	unsigned int nr_blkz;		/* Total number of zones */
>>>  	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
>>> +	block_t *zone_capacity_blocks;  /* Array of zone capacity in blks */
>>>  #endif
>>>  };
>>>
>>> @@ -3395,6 +3396,10 @@ void f2fs_destroy_segment_manager_caches(void);
>>>  int f2fs_rw_hint_to_seg_type(enum rw_hint hint);  enum rw_hint
>>> f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
>>>  			enum page_type type, enum temp_type temp);
>>> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
>>> +			unsigned int segno);
>>> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
>>> +			unsigned int segno);
>>>
>>>  /*
>>>   * checkpoint.c
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index
>>> c35614d255e1..d2156f3f56a5 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct
>>> f2fs_sb_info *sbi)  {
>>>  	unsigned int start;
>>>  	int type;
>>> +	struct seg_entry *sentry;
>>>
>>>  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
>>> -		struct seg_entry *sentry = get_seg_entry(sbi, start);
>>> +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
>>
>> If usable blocks count is zero, shouldn't we update SIT_I(sbi)->written_valid_blocks
>> as we did when there is partial usable block in current segment?
> If usable_block_count is zero, the segment is effectively dead: all of its blocks lie after the
> zone-capacity in the zone. There can never be valid written content in such segments, hence the counter is not updated.
> In the other case, when a segment starts before the zone-capacity and ends beyond it, there are
> some blocks before the zone-capacity which can be used, so they are accounted for.

I'm thinking that limit_free_user_blocks() treats all unwritten blocks as
potentially reclaimable; however, segments after zone-capacity can never be
used or reclaimed, so the calculation looks incorrect here.

static inline block_t limit_free_user_blocks(struct f2fs_sb_info *sbi)
{
	block_t reclaimable_user_blocks = sbi->user_block_count -
		written_block_count(sbi);
	return (long)(reclaimable_user_blocks * LIMIT_FREE_BLOCK) / 100;
}

static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi)
{
	block_t invalid_user_blocks = sbi->user_block_count -
					written_block_count(sbi);
	/*
	 * Background GC is triggered with the following conditions.
	 * 1. There are a number of invalid blocks.
	 * 2. There is not enough free space.
	 */
	if (invalid_user_blocks > limit_invalid_user_blocks(sbi) &&
			free_user_blocks(sbi) < limit_free_user_blocks(sbi))

-- In this condition, free_user_blocks() doesn't include segments after
zone-capacity, however limit_free_user_blocks() includes them.

		return true;
	return false;
}


>>
>>> +			continue;
>>> +		sentry = get_seg_entry(sbi, start);
>>>  		if (!sentry->valid_blocks)
>>>  			__set_free(sbi, start);
>>>  		else
>>> @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
>>>  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
>>>  	struct free_segmap_info *free_i = FREE_I(sbi);
>>>  	unsigned int segno = 0, offset = 0, secno;
>>> -	unsigned short valid_blocks;
>>> +	unsigned short valid_blocks, usable_blks_in_seg;
>>>  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
>>>
>>>  	while (1) {
>>> @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
>>>  			break;
>>>  		offset = segno + 1;
>>>  		valid_blocks = get_valid_blocks(sbi, segno, false);
>>> -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
>>> +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
>>> +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
>>
>> It needs to traverse .cur_valid_map bitmap to check whether blocks in range of [0,
>> usable_blks_in_seg] are all valid or not, if there is at least one usable block in the
>> range, segment should be dirty.
> Segments that start and end before the zone-capacity behave just like normal segments.
> Segments that start after the zone-capacity are fully unusable and are marked as used in the free_seg_bitmap, so they are never allocated.
> Segments that span the zone-capacity boundary have some unusable blocks. Even when blocks from these segments are allocated or deallocated, the valid_blocks counter is incremented or decremented, so it always reflects the current valid-block count.
> Comparing the valid_blocks count with the usable-block count of the segment therefore indicates whether the segment is dirty or fully used.

I thought that if one valid block were located in the range
[usable_blks_in_seg, blks_per_seg] (i.e. after zone-capacity), the condition
would be incorrect. That should never happen, right?

If so, how about adjusting check_block_count() to do a sanity check on the
bitmap area after zone-capacity, to make sure there are no free slots there?

> Sorry, but could you please share why cur_valid_map needs to be traversed?
> 
>>
>> One question, if we select dirty segment which across zone-capacity as opened
>> segment (in curseg), how can we avoid allocating usable block beyong zone-capacity
>> in such segment via .cur_valid_map?
> For zoned devices, blocks have to be allocated sequentially, so allocation is always done in the LFS manner.
> __has_curseg_space() checks against the usable-block count and stops allocating blocks beyond the zone-capacity.

Oh, that was implemented in patch 2; I haven't checked that patch yet, sorry.
However, IMO, each patch should be able to apply independently. What happens if
allocation is done with only patch 1 applied? Do we need to merge the two into one?

>>
>>>  			continue;
>>> -		if (valid_blocks > sbi->blocks_per_seg) {
>>> +		if (valid_blocks > usable_blks_in_seg) {
>>>  			f2fs_bug_on(sbi, 1);
>>>  			continue;
>>>  		}
>>> @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct
>>> f2fs_sb_info *sbi)
>>>
>>>  	return 0;
>>>  }
>>> +
>>> +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
>>> +						unsigned int dev_idx)
>>> +{
>>> +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
>>> +		return true;
>>> +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq);
>>> +}
>>> +
>>> +/* Return the zone index in the given device */
>>> +static unsigned int
>>> +get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
>>> +					int dev_idx)
>>> +{
>>> +	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
>>> +
>>> +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
>>> +						sbi->log_blocks_per_blkz;
>>> +}
>>> +
>>> +/*
>>> + * Return the usable segments in a section based on the zone's
>>> + * corresponding zone capacity. Zone is equal to a section.
>>> + */
>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(
>>> +		struct f2fs_sb_info *sbi, unsigned int segno)
>>> +{
>>> +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
>>> +
>>> +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
>>> +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno), dev_idx);
>>> +
>>> +	/* Conventional zone's capacity is always equal to zone size */
>>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
>>> +		return sbi->segs_per_sec;
>>> +
>>> +	/*
>>> +	 * If the zone_capacity_blocks array is NULL, then zone capacity
>>> +	 * is equal to the zone size for all zones
>>> +	 */
>>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
>>> +		return sbi->segs_per_sec;
>>> +
>>> +	/* Get the segment count beyond zone capacity block */
>>> +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
>>> +				FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
>>> +				sbi->log_blocks_per_seg;
>>> +	return sbi->segs_per_sec - unusable_segs_in_sec; }
>>> +
>>> +/*
>>> + * Return the number of usable blocks in a segment. The number of
>>> +blocks
>>> + * returned is always equal to the number of blocks in a segment for
>>> + * segments fully contained within a sequential zone capacity or a
>>> + * conventional zone. For segments partially contained in a
>>> +sequential
>>> + * zone capacity, the number of usable blocks up to the zone capacity
>>> + * is returned. 0 is returned in all other cases.
>>> + */
>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(
>>> +			struct f2fs_sb_info *sbi, unsigned int segno) {
>>> +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
>>> +	unsigned int zone_idx, dev_idx, secno;
>>> +
>>> +	secno = GET_SEC_FROM_SEG(sbi, segno);
>>> +	seg_start = START_BLOCK(sbi, segno);
>>> +	dev_idx = f2fs_target_device_index(sbi, seg_start);
>>> +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
>>> +
>>> +	/*
>>> +	 * Conventional zone's capacity is always equal to zone size,
>>> +	 * so, blocks per segment is unchanged.
>>> +	 */
>>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
>>> +		return sbi->blocks_per_seg;
>>> +
>>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
>>> +		return sbi->blocks_per_seg;
>>> +
>>> +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
>>> +	sec_cap_blkaddr = sec_start_blkaddr +
>>> +				FDEV(dev_idx).zone_capacity_blocks[zone_idx];
>>> +
>>> +	/*
>>> +	 * If segment starts before zone capacity and spans beyond
>>> +	 * zone capacity, then usable blocks are from seg start to
>>> +	 * zone capacity. If the segment starts after the zone capacity,
>>> +	 * then there are no usable blocks.
>>> +	 */
>>> +	if (seg_start >= sec_cap_blkaddr)
>>> +		return 0;
>>> +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
>>> +		return sec_cap_blkaddr - seg_start;
>>> +
>>> +	return sbi->blocks_per_seg;
>>> +}
>>>  #else
>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)  { @@
>>> -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info
>>> *sbi)  {
>>>  	return 0;
>>>  }
>>> +
>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
>>> +							unsigned int segno)
>>> +{
>>> +	return 0;
>>> +}
>>> +
>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
>>> +							unsigned int segno)
>>> +{
>>> +	return 0;
>>> +}
>>>  #endif
>>> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
>>> +					unsigned int segno)
>>> +{
>>> +	if (f2fs_sb_has_blkzoned(sbi))
>>> +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
>>> +
>>> +	return sbi->blocks_per_seg;
>>> +}
>>> +
>>> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
>>> +					unsigned int segno)
>>> +{
>>> +	if (f2fs_sb_has_blkzoned(sbi))
>>> +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
>>> +
>>> +	return sbi->segs_per_sec;
>>> +}
>>>
>>>  /*
>>>   * Update min, max modified time for cost-benefit GC algorithm diff
>>> --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index
>>> f261e3e6a69b..79b0dc33feaf 100644
>>> --- a/fs/f2fs/segment.h
>>> +++ b/fs/f2fs/segment.h
>>> @@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi,
>> unsigned int segno)
>>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
>>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
>>>  	unsigned int next;
>>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
>>>
>>>  	spin_lock(&free_i->segmap_lock);
>>>  	clear_bit(segno, free_i->free_segmap); @@ -418,7 +419,7 @@ static
>>> inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
>>>
>>>  	next = find_next_bit(free_i->free_segmap,
>>>  			start_segno + sbi->segs_per_sec, start_segno);
>>> -	if (next >= start_segno + sbi->segs_per_sec) {
>>> +	if (next >= start_segno + usable_segs) {
>>>  		clear_bit(secno, free_i->free_secmap);
>>>  		free_i->free_sections++;
>>>  	}
>>> @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info
>> *sbi,
>>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
>>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
>>>  	unsigned int next;
>>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
>>>
>>>  	spin_lock(&free_i->segmap_lock);
>>>  	if (test_and_clear_bit(segno, free_i->free_segmap)) { @@ -453,7
>>> +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
>>>  			goto skip_free;
>>>  		next = find_next_bit(free_i->free_segmap,
>>>  				start_segno + sbi->segs_per_sec, start_segno);
>>> -		if (next >= start_segno + sbi->segs_per_sec) {
>>> +		if (next >= start_segno + usable_segs) {
>>>  			if (test_and_clear_bit(secno, free_i->free_secmap))
>>>  				free_i->free_sections++;
>>>  		}
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index
>>> 80cb7cd358f8..2686b07ae7eb 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
>>>  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);  #ifdef
>> CONFIG_BLK_DEV_ZONED
>>>  		kvfree(FDEV(i).blkz_seq);
>>> +		kvfree(FDEV(i).zone_capacity_blocks);
>>
>> Now, f2fs_kzalloc won't allocate vmalloc's memory, so it's safe to use kfree().
> Ok
>>
>>>  #endif
>>>  	}
>>>  	kvfree(sbi->devs);
>>> @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct
>>> f2fs_sb_info *sbi)  }
>>>
>>>  #ifdef CONFIG_BLK_DEV_ZONED
>>> +
>>> +struct f2fs_report_zones_args {
>>> +	struct f2fs_dev_info *dev;
>>> +	bool zone_cap_mismatch;
>>> +};
>>> +
>>>  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
>>> -			       void *data)
>>> +			      void *data)
>>>  {
>>> -	struct f2fs_dev_info *dev = data;
>>> +	struct f2fs_report_zones_args *rz_args = data;
>>> +
>>> +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
>>> +		return 0;
>>> +
>>> +	set_bit(idx, rz_args->dev->blkz_seq);
>>> +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
>>> +						F2FS_LOG_SECTORS_PER_BLOCK;
>>> +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
>>> +		rz_args->zone_cap_mismatch = true;
>>>
>>> -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
>>> -		set_bit(idx, dev->blkz_seq);
>>>  	return 0;
>>>  }
>>>
>>> @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info
>>> *sbi, int devi)  {
>>>  	struct block_device *bdev = FDEV(devi).bdev;
>>>  	sector_t nr_sectors = bdev->bd_part->nr_sects;
>>> +	struct f2fs_report_zones_args rep_zone_arg;
>>>  	int ret;
>>>
>>>  	if (!f2fs_sb_has_blkzoned(sbi))
>>> @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int
>> devi)
>>>  	if (!FDEV(devi).blkz_seq)
>>>  		return -ENOMEM;
>>>
>>> -	/* Get block zones type */
>>> +	/* Get block zones type and zone-capacity */
>>> +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
>>> +					FDEV(devi).nr_blkz * sizeof(block_t),
>>> +					GFP_KERNEL);
>>> +	if (!FDEV(devi).zone_capacity_blocks)
>>> +		return -ENOMEM;
>>> +
>>> +	rep_zone_arg.dev = &FDEV(devi);
>>> +	rep_zone_arg.zone_cap_mismatch = false;
>>> +
>>>  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
>>> -				  &FDEV(devi));
>>> +				  &rep_zone_arg);
>>>  	if (ret < 0)
>>>  		return ret;
>>
>> Missed to call kfree(FDEV(devi).zone_capacity_blocks)?
> Thanks for catching it. Will free it here also.
>>
>>>
>>> +	if (!rep_zone_arg.zone_cap_mismatch) {
>>> +		kvfree(FDEV(devi).zone_capacity_blocks);
>>
>> Ditto, kfree().
> Ok.
>>
>> Thanks,
>>
>>> +		FDEV(devi).zone_capacity_blocks = NULL;
>>> +	}
>>> +
>>>  	return 0;
>>>  }
>>>  #endif
>>>
> .
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-08  2:33       ` Chao Yu
@ 2020-07-08 13:04         ` Aravind Ramesh
  2020-07-09  2:55           ` Chao Yu
  0 siblings, 1 reply; 16+ messages in thread
From: Aravind Ramesh @ 2020-07-08 13:04 UTC (permalink / raw)
  To: Chao Yu, jaegeuk, linux-fsdevel, linux-f2fs-devel, hch
  Cc: Niklas Cassel, Damien Le Moal, Matias Bjorling

Please find my response inline.

Thanks,
Aravind

> -----Original Message-----
> From: Chao Yu <yuchao0@huawei.com>
> Sent: Wednesday, July 8, 2020 8:04 AM
> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org; linux-
> fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone size
> 
> On 2020/7/8 2:23, Aravind Ramesh wrote:
> > Thanks for review Chao Yu.
> > Please find my response inline.
> > I will re-send a V2 after incorporating your comments.
> >
> > Regards,
> > Aravind
> >
> >> -----Original Message-----
> >> From: Chao Yu <yuchao0@huawei.com>
> >> Sent: Tuesday, July 7, 2020 5:49 PM
> >> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org;
> >> linux- fsdevel@vger.kernel.org;
> >> linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
> >> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
> >> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> >> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone
> >> size
> >>
> >> On 2020/7/2 23:54, Aravind Ramesh wrote:
> >>> NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
> >>> Zone-capacity indicates the maximum number of sectors that are
> >>> usable in a zone beginning from the first sector of the zone. This
> >>> makes the sectors sectors after the zone-capacity till zone-size to be unusable.
> >>> This patch set tracks zone-size and zone-capacity in zoned devices
> >>> and calculate the usable blocks per segment and usable segments per section.
> >>>
> >>> If zone-capacity is less than zone-size mark only those segments
> >>> which start before zone-capacity as free segments. All segments at
> >>> and beyond zone-capacity are treated as permanently used segments.
> >>> In cases where zone-capacity does not align with segment size the
> >>> last segment will start before zone-capacity and end beyond the
> >>> zone-capacity of the zone. For such spanning segments only sectors
> >>> within the
> >> zone-capacity are used.
> >>>
> >>> Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
> >>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> >>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> >>> ---
> >>>  fs/f2fs/f2fs.h    |   5 ++
> >>>  fs/f2fs/segment.c | 136
> >> ++++++++++++++++++++++++++++++++++++++++++++--
> >>>  fs/f2fs/segment.h |   6 +-
> >>>  fs/f2fs/super.c   |  41 ++++++++++++--
> >>>  4 files changed, 176 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index
> >>> e6e47618a357..73219e4e1ba4 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -1232,6 +1232,7 @@ struct f2fs_dev_info {  #ifdef
> >>> CONFIG_BLK_DEV_ZONED
> >>>  	unsigned int nr_blkz;		/* Total number of zones */
> >>>  	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
> >>> +	block_t *zone_capacity_blocks;  /* Array of zone capacity in blks
> >>> +*/
> >>>  #endif
> >>>  };
> >>>
> >>> @@ -3395,6 +3396,10 @@ void
> >>> f2fs_destroy_segment_manager_caches(void);
> >>>  int f2fs_rw_hint_to_seg_type(enum rw_hint hint);  enum rw_hint
> >>> f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
> >>>  			enum page_type type, enum temp_type temp);
> >>> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> >>> +			unsigned int segno);
> >>> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> >>> +			unsigned int segno);
> >>>
> >>>  /*
> >>>   * checkpoint.c
> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index
> >>> c35614d255e1..d2156f3f56a5 100644
> >>> --- a/fs/f2fs/segment.c
> >>> +++ b/fs/f2fs/segment.c
> >>> @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct
> >>> f2fs_sb_info *sbi)  {
> >>>  	unsigned int start;
> >>>  	int type;
> >>> +	struct seg_entry *sentry;
> >>>
> >>>  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
> >>> -		struct seg_entry *sentry = get_seg_entry(sbi, start);
> >>> +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
> >>
> >> If usable blocks count is zero, shouldn't we update
> >> SIT_I(sbi)->written_valid_blocks as we did when there is partial usable block in
> current segment?
> > If usable_block_count is zero, then it is like a dead segment, all
> > blocks in the segment lie after the zone-capacity in the zone. So there can never be
> a valid written content on these segments, hence it is not updated.
> > In the other case, when a segment start before the zone-capacity and
> > it ends beyond zone-capacity, then there are some blocks before zone-capacity
> which can be used, so they are accounted for.
> 
> I'm thinking that for limit_free_user_blocks() function, it assumes all unwritten
> blocks as potential reclaimable blocks, however segment after zone-capacity should
> never be used or reclaimable, it looks calculation could be not correct here.
> 
sbi->user_block_count is set to the total number of usable blocks in the
file system when it is formatted with mkfs.f2fs. Please see the f2fs-tools
patch series that I have submitted along with this patch set.

So sbi->user_block_count reflects the actual number of usable blocks (i.e. total blocks - unusable blocks).
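
As an aside, the accounting described above can be modeled with a small
standalone sketch (illustrative only; this is not the actual mkfs.f2fs code,
and the function and parameter names are made up):

```c
#include <assert.h>

/*
 * Illustrative model: the file system's usable block count is the sum of
 * each zone's capacity in blocks, not nr_zones * zone_size, so zones with
 * capacity < size contribute fewer usable blocks.
 */
static unsigned long long total_usable_blocks(const unsigned int *zone_cap_blks,
					      unsigned int nr_zones)
{
	unsigned long long total = 0;
	unsigned int i;

	for (i = 0; i < nr_zones; i++)
		total += zone_cap_blks[i];
	return total;
}
```

With, say, two zones of 1000 usable blocks and one of 800, user_block_count
would be 2800 rather than the 3000 implied by zone size alone.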

> static inline block_t limit_free_user_blocks(struct f2fs_sb_info *sbi) {
> 	block_t reclaimable_user_blocks = sbi->user_block_count -
> 		written_block_count(sbi);
> 	return (long)(reclaimable_user_blocks * LIMIT_FREE_BLOCK) / 100; }
> 
> static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi) {
> 	block_t invalid_user_blocks = sbi->user_block_count -
> 					written_block_count(sbi);
> 	/*
> 	 * Background GC is triggered with the following conditions.
> 	 * 1. There are a number of invalid blocks.
> 	 * 2. There is not enough free space.
> 	 */
> 	if (invalid_user_blocks > limit_invalid_user_blocks(sbi) &&
> 			free_user_blocks(sbi) < limit_free_user_blocks(sbi))
> 
> -- In this condition, free_user_blocks() doesn't include segments after zone-capacity,
> however limit_free_user_blocks() includes them.
In the second patch of this patch set, free_user_blocks() is updated to account for the segments after zone-capacity.
It takes the block count of the free segments (segments that start before zone-capacity and are free) and deducts the
overprovision segment block count. It also takes the block count of the spanning segments into account.


> 
> 		return true;
> 	return false;
> }
> 
> 
> >>
> >>> +			continue;
> >>> +		sentry = get_seg_entry(sbi, start);
> >>>  		if (!sentry->valid_blocks)
> >>>  			__set_free(sbi, start);
> >>>  		else
> >>> @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct f2fs_sb_info
> *sbi)
> >>>  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
> >>>  	struct free_segmap_info *free_i = FREE_I(sbi);
> >>>  	unsigned int segno = 0, offset = 0, secno;
> >>> -	unsigned short valid_blocks;
> >>> +	unsigned short valid_blocks, usable_blks_in_seg;
> >>>  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
> >>>
> >>>  	while (1) {
> >>> @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct f2fs_sb_info
> *sbi)
> >>>  			break;
> >>>  		offset = segno + 1;
> >>>  		valid_blocks = get_valid_blocks(sbi, segno, false);
> >>> -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
> >>> +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
> >>> +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
> >>
> >> It needs to traverse .cur_valid_map bitmap to check whether blocks in
> >> range of [0, usable_blks_in_seg] are all valid or not, if there is at
> >> least one usable block in the range, segment should be dirty.
> > For the segments which start and end before zone-capacity are just like any
> normal segments.
> > Segments which start after the zone-capacity are fully unusable and are marked as
> used in the free_seg_bitmap, so these segments are never used.
> > Segments which span across the zone-capacity have some unusable blocks. Even
> when blocks from these segments are allocated/deallocated the valid_blocks
> counter is incremented/decremented, reflecting the current valid_blocks count.
> > Comparing valid_blocks count with usable_blocks count in the segment can
> indicate if the segment is dirty or fully used.
> 
> I thought that if there is one valid block locates in range of [usable_blks_in_seg,
> blks_per_seg] (after zone-capacity), the condition will be incorrect. That should
> never happen, right?
Yes, this will never happen. All blocks after zone-capacity are never usable.
> 
> If so, how about adjusting check_block_count() to do sanity check on bitmap locates
> after zone-capacity to make sure there is no free slots there.

Ok, I will add this check in check_block_count. It makes sense.
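
A sketch of what such a check could look like; the in-kernel version would
presumably run find_next_bit_le() over the SIT valid_map inside
check_block_count(), whereas this standalone model just scans a byte array
(a set bit meaning the block is valid):

```c
#include <assert.h>
#include <limits.h>

/*
 * Model of the proposed sanity check: no block at or beyond the usable
 * region of a segment may be marked valid, since blocks past the zone
 * capacity can never be written. Returns 1 on corruption, 0 if clean.
 */
static int valid_block_after_capacity(const unsigned char *valid_map,
				      unsigned int usable_blks,
				      unsigned int blks_per_seg)
{
	unsigned int b;

	for (b = usable_blks; b < blks_per_seg; b++)
		if (valid_map[b / CHAR_BIT] & (1u << (b % CHAR_BIT)))
			return 1;
	return 0;
}
```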

> 
> > Sorry, but could you please share why cur_valid_map needs to be traversed ?
> >
> >>
> >> One question, if we select dirty segment which across zone-capacity
> >> as opened segment (in curseg), how can we avoid allocating usable
> >> block beyong zone-capacity in such segment via .cur_valid_map?
> > For zoned devices, we have to allocate blocks sequentially, so it's always in LFS
> manner it is allocated.
> > The __has_curseg_space() checks for the usable blocks and stops allocating blocks
> after zone-capacity.
> 
> Oh, that was implemented in patch 2, I haven't checked that patch...sorry, however,
> IMO, patch should be made to apply independently, what if do allocation only after
> applying patch 1..., do we need to merge them into one?
The patches were split keeping in mind that all data-structure and initialization
changes would go into patch 1, and the IO-path and GC-related changes into patch 2.
But if you think merging them into a single patch will make review easier,
then I shall merge them and send a single patch in V2, along with the other suggestions incorporated.

Please let me know.
> 
> >>
> >>>  			continue;
> >>> -		if (valid_blocks > sbi->blocks_per_seg) {
> >>> +		if (valid_blocks > usable_blks_in_seg) {
> >>>  			f2fs_bug_on(sbi, 1);
> >>>  			continue;
> >>>  		}
> >>> @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct
> >>> f2fs_sb_info *sbi)
> >>>
> >>>  	return 0;
> >>>  }
> >>> +
> >>> +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
> >>> +						unsigned int dev_idx)
> >>> +{
> >>> +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
> >>> +		return true;
> >>> +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq); }
> >>> +
> >>> +/* Return the zone index in the given device */ static unsigned int
> >>> +get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
> >>> +					int dev_idx)
> >>> +{
> >>> +	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi,
> >>> +secno));
> >>> +
> >>> +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
> >>> +						sbi->log_blocks_per_blkz;
> >>> +}
> >>> +
> >>> +/*
> >>> + * Return the usable segments in a section based on the zone's
> >>> + * corresponding zone capacity. Zone is equal to a section.
> >>> + */
> >>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(
> >>> +		struct f2fs_sb_info *sbi, unsigned int segno) {
> >>> +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
> >>> +
> >>> +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
> >>> +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno),
> >>> +dev_idx);
> >>> +
> >>> +	/* Conventional zone's capacity is always equal to zone size */
> >>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> >>> +		return sbi->segs_per_sec;
> >>> +
> >>> +	/*
> >>> +	 * If the zone_capacity_blocks array is NULL, then zone capacity
> >>> +	 * is equal to the zone size for all zones
> >>> +	 */
> >>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
> >>> +		return sbi->segs_per_sec;
> >>> +
> >>> +	/* Get the segment count beyond zone capacity block */
> >>> +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
> >>> +				FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
> >>> +				sbi->log_blocks_per_seg;
> >>> +	return sbi->segs_per_sec - unusable_segs_in_sec; }
> >>> +
> >>> +/*
> >>> + * Return the number of usable blocks in a segment. The number of
> >>> +blocks
> >>> + * returned is always equal to the number of blocks in a segment
> >>> +for
> >>> + * segments fully contained within a sequential zone capacity or a
> >>> + * conventional zone. For segments partially contained in a
> >>> +sequential
> >>> + * zone capacity, the number of usable blocks up to the zone
> >>> +capacity
> >>> + * is returned. 0 is returned in all other cases.
> >>> + */
> >>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(
> >>> +			struct f2fs_sb_info *sbi, unsigned int segno) {
> >>> +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
> >>> +	unsigned int zone_idx, dev_idx, secno;
> >>> +
> >>> +	secno = GET_SEC_FROM_SEG(sbi, segno);
> >>> +	seg_start = START_BLOCK(sbi, segno);
> >>> +	dev_idx = f2fs_target_device_index(sbi, seg_start);
> >>> +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
> >>> +
> >>> +	/*
> >>> +	 * Conventional zone's capacity is always equal to zone size,
> >>> +	 * so, blocks per segment is unchanged.
> >>> +	 */
> >>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> >>> +		return sbi->blocks_per_seg;
> >>> +
> >>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
> >>> +		return sbi->blocks_per_seg;
> >>> +
> >>> +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> >>> +	sec_cap_blkaddr = sec_start_blkaddr +
> >>> +				FDEV(dev_idx).zone_capacity_blocks[zone_idx];
> >>> +
> >>> +	/*
> >>> +	 * If segment starts before zone capacity and spans beyond
> >>> +	 * zone capacity, then usable blocks are from seg start to
> >>> +	 * zone capacity. If the segment starts after the zone capacity,
> >>> +	 * then there are no usable blocks.
> >>> +	 */
> >>> +	if (seg_start >= sec_cap_blkaddr)
> >>> +		return 0;
> >>> +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
> >>> +		return sec_cap_blkaddr - seg_start;
> >>> +
> >>> +	return sbi->blocks_per_seg;
> >>> +}
> >>>  #else
> >>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)  { @@
> >>> -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info
> >>> *sbi)  {
> >>>  	return 0;
> >>>  }
> >>> +
> >>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info
> *sbi,
> >>> +							unsigned int segno)
> >>> +{
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info
> *sbi,
> >>> +							unsigned int segno)
> >>> +{
> >>> +	return 0;
> >>> +}
> >>>  #endif
> >>> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> >>> +					unsigned int segno)
> >>> +{
> >>> +	if (f2fs_sb_has_blkzoned(sbi))
> >>> +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
> >>> +
> >>> +	return sbi->blocks_per_seg;
> >>> +}
> >>> +
> >>> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> >>> +					unsigned int segno)
> >>> +{
> >>> +	if (f2fs_sb_has_blkzoned(sbi))
> >>> +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
> >>> +
> >>> +	return sbi->segs_per_sec;
> >>> +}
> >>>
> >>>  /*
> >>>   * Update min, max modified time for cost-benefit GC algorithm diff
> >>> --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index
> >>> f261e3e6a69b..79b0dc33feaf 100644
> >>> --- a/fs/f2fs/segment.h
> >>> +++ b/fs/f2fs/segment.h
> >>> @@ -411,6 +411,7 @@ static inline void __set_free(struct
> >>> f2fs_sb_info *sbi,
> >> unsigned int segno)
> >>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> >>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> >>>  	unsigned int next;
> >>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> >>>
> >>>  	spin_lock(&free_i->segmap_lock);
> >>>  	clear_bit(segno, free_i->free_segmap); @@ -418,7 +419,7 @@ static
> >>> inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> >>>
> >>>  	next = find_next_bit(free_i->free_segmap,
> >>>  			start_segno + sbi->segs_per_sec, start_segno);
> >>> -	if (next >= start_segno + sbi->segs_per_sec) {
> >>> +	if (next >= start_segno + usable_segs) {
> >>>  		clear_bit(secno, free_i->free_secmap);
> >>>  		free_i->free_sections++;
> >>>  	}
> >>> @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct
> >>> f2fs_sb_info
> >> *sbi,
> >>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> >>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> >>>  	unsigned int next;
> >>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> >>>
> >>>  	spin_lock(&free_i->segmap_lock);
> >>>  	if (test_and_clear_bit(segno, free_i->free_segmap)) { @@ -453,7
> >>> +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info
> >>> +*sbi,
> >>>  			goto skip_free;
> >>>  		next = find_next_bit(free_i->free_segmap,
> >>>  				start_segno + sbi->segs_per_sec, start_segno);
> >>> -		if (next >= start_segno + sbi->segs_per_sec) {
> >>> +		if (next >= start_segno + usable_segs) {
> >>>  			if (test_and_clear_bit(secno, free_i->free_secmap))
> >>>  				free_i->free_sections++;
> >>>  		}
> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index
> >>> 80cb7cd358f8..2686b07ae7eb 100644
> >>> --- a/fs/f2fs/super.c
> >>> +++ b/fs/f2fs/super.c
> >>> @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info
> *sbi)
> >>>  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);  #ifdef
> >> CONFIG_BLK_DEV_ZONED
> >>>  		kvfree(FDEV(i).blkz_seq);
> >>> +		kvfree(FDEV(i).zone_capacity_blocks);
> >>
> >> Now, f2fs_kzalloc won't allocate vmalloc's memory, so it's safe to use kfree().
> > Ok
> >>
> >>>  #endif
> >>>  	}
> >>>  	kvfree(sbi->devs);
> >>> @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct
> >>> f2fs_sb_info *sbi)  }
> >>>
> >>>  #ifdef CONFIG_BLK_DEV_ZONED
> >>> +
> >>> +struct f2fs_report_zones_args {
> >>> +	struct f2fs_dev_info *dev;
> >>> +	bool zone_cap_mismatch;
> >>> +};
> >>> +
> >>>  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> >>> -			       void *data)
> >>> +			      void *data)
> >>>  {
> >>> -	struct f2fs_dev_info *dev = data;
> >>> +	struct f2fs_report_zones_args *rz_args = data;
> >>> +
> >>> +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> >>> +		return 0;
> >>> +
> >>> +	set_bit(idx, rz_args->dev->blkz_seq);
> >>> +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
> >>> +						F2FS_LOG_SECTORS_PER_BLOCK;
> >>> +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
> >>> +		rz_args->zone_cap_mismatch = true;
> >>>
> >>> -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
> >>> -		set_bit(idx, dev->blkz_seq);
> >>>  	return 0;
> >>>  }
> >>>
> >>> @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info
> >>> *sbi, int devi)  {
> >>>  	struct block_device *bdev = FDEV(devi).bdev;
> >>>  	sector_t nr_sectors = bdev->bd_part->nr_sects;
> >>> +	struct f2fs_report_zones_args rep_zone_arg;
> >>>  	int ret;
> >>>
> >>>  	if (!f2fs_sb_has_blkzoned(sbi))
> >>> @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct
> >>> f2fs_sb_info *sbi, int
> >> devi)
> >>>  	if (!FDEV(devi).blkz_seq)
> >>>  		return -ENOMEM;
> >>>
> >>> -	/* Get block zones type */
> >>> +	/* Get block zones type and zone-capacity */
> >>> +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
> >>> +					FDEV(devi).nr_blkz * sizeof(block_t),
> >>> +					GFP_KERNEL);
> >>> +	if (!FDEV(devi).zone_capacity_blocks)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	rep_zone_arg.dev = &FDEV(devi);
> >>> +	rep_zone_arg.zone_cap_mismatch = false;
> >>> +
> >>>  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
> >>> -				  &FDEV(devi));
> >>> +				  &rep_zone_arg);
> >>>  	if (ret < 0)
> >>>  		return ret;
> >>
> >> Missed to call kfree(FDEV(devi).zone_capacity_blocks)?
> > Thanks for catching it. Will free it here also.
> >>
> >>>
> >>> +	if (!rep_zone_arg.zone_cap_mismatch) {
> >>> +		kvfree(FDEV(devi).zone_capacity_blocks);
> >>
> >> Ditto, kfree().
> > Ok.
> >>
> >> Thanks,
> >>
> >>> +		FDEV(devi).zone_capacity_blocks = NULL;
> >>> +	}
> >>> +
> >>>  	return 0;
> >>>  }
> >>>  #endif
> >>>
> > .
> >



* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-08 13:04         ` Aravind Ramesh
@ 2020-07-09  2:55           ` Chao Yu
  2020-07-09  5:31             ` Aravind Ramesh
  0 siblings, 1 reply; 16+ messages in thread
From: Chao Yu @ 2020-07-09  2:55 UTC (permalink / raw)
  To: Aravind Ramesh, jaegeuk, linux-fsdevel, linux-f2fs-devel, hch
  Cc: Niklas Cassel, Damien Le Moal, Matias Bjorling

On 2020/7/8 21:04, Aravind Ramesh wrote:
> Please find my response inline.
> 
> Thanks,
> Aravind
> 
>> -----Original Message-----
>> From: Chao Yu <yuchao0@huawei.com>
>> Sent: Wednesday, July 8, 2020 8:04 AM
>> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org; linux-
>> fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
>> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
>> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
>> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone size
>>
>> On 2020/7/8 2:23, Aravind Ramesh wrote:
>>> Thanks for review Chao Yu.
>>> Please find my response inline.
>>> I will re-send a V2 after incorporating your comments.
>>>
>>> Regards,
>>> Aravind
>>>
>>>> -----Original Message-----
>>>> From: Chao Yu <yuchao0@huawei.com>
>>>> Sent: Tuesday, July 7, 2020 5:49 PM
>>>> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org;
>>>> linux- fsdevel@vger.kernel.org;
>>>> linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
>>>> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
>>>> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
>>>> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone
>>>> size
>>>>
>>>> On 2020/7/2 23:54, Aravind Ramesh wrote:
>>>>> NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
>>>>> Zone-capacity indicates the maximum number of sectors that are
>>>>> usable in a zone beginning from the first sector of the zone. This
>>>>> makes the sectors sectors after the zone-capacity till zone-size to be unusable.
>>>>> This patch set tracks zone-size and zone-capacity in zoned devices
>>>>> and calculate the usable blocks per segment and usable segments per section.
>>>>>
>>>>> If zone-capacity is less than zone-size mark only those segments
>>>>> which start before zone-capacity as free segments. All segments at
>>>>> and beyond zone-capacity are treated as permanently used segments.
>>>>> In cases where zone-capacity does not align with segment size the
>>>>> last segment will start before zone-capacity and end beyond the
>>>>> zone-capacity of the zone. For such spanning segments only sectors
>>>>> within the
>>>> zone-capacity are used.
>>>>>
>>>>> Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
>>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>>>>> ---
>>>>>  fs/f2fs/f2fs.h    |   5 ++
>>>>>  fs/f2fs/segment.c | 136
>>>> ++++++++++++++++++++++++++++++++++++++++++++--
>>>>>  fs/f2fs/segment.h |   6 +-
>>>>>  fs/f2fs/super.c   |  41 ++++++++++++--
>>>>>  4 files changed, 176 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index
>>>>> e6e47618a357..73219e4e1ba4 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -1232,6 +1232,7 @@ struct f2fs_dev_info {  #ifdef
>>>>> CONFIG_BLK_DEV_ZONED
>>>>>  	unsigned int nr_blkz;		/* Total number of zones */
>>>>>  	unsigned long *blkz_seq;	/* Bitmap indicating sequential zones */
>>>>> +	block_t *zone_capacity_blocks;	/* Array of zone capacity in blks */
>>>>>  #endif
>>>>>  };
>>>>>
>>>>> @@ -3395,6 +3396,10 @@ void
>>>>> f2fs_destroy_segment_manager_caches(void);
>>>>>  int f2fs_rw_hint_to_seg_type(enum rw_hint hint);  enum rw_hint
>>>>> f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
>>>>>  			enum page_type type, enum temp_type temp);
>>>>> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
>>>>> +			unsigned int segno);
>>>>> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
>>>>> +			unsigned int segno);
>>>>>
>>>>>  /*
>>>>>   * checkpoint.c
>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index
>>>>> c35614d255e1..d2156f3f56a5 100644
>>>>> --- a/fs/f2fs/segment.c
>>>>> +++ b/fs/f2fs/segment.c
>>>>> @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct
>>>>> f2fs_sb_info *sbi)  {
>>>>>  	unsigned int start;
>>>>>  	int type;
>>>>> +	struct seg_entry *sentry;
>>>>>
>>>>>  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
>>>>> -		struct seg_entry *sentry = get_seg_entry(sbi, start);
>>>>> +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
>>>>
>>>> If usable blocks count is zero, shouldn't we update
>>>> SIT_I(sbi)->written_valid_blocks as we did when there is partial usable block in
>> current segment?
>>> If usable_block_count is zero, then it is like a dead segment: all
>>> blocks in the segment lie after the zone-capacity in the zone. So there can never be
>> valid written content on these segments; hence it is not updated.
>>> In the other case, when a segment start before the zone-capacity and
>>> it ends beyond zone-capacity, then there are some blocks before zone-capacity
>> which can be used, so they are accounted for.
>>
>> I'm thinking that for limit_free_user_blocks() function, it assumes all unwritten
>> blocks as potential reclaimable blocks, however segment after zone-capacity should
>> never be used or reclaimable, it looks calculation could be not correct here.
>>
> The sbi->user_block_count is updated with the total usable_blocks in the full 
> file system during the formatting of the file system using mkfs.f2fs. Please see the f2fs-tools
> patch series that I have submitted along with this patch set. 
> 
> So sbi->user_block_count reflects the actual number of usable blocks (i.e. total blocks - unusable blocks).

Alright, will check both kernel and f2fs-tools change again later. :)

> 
>> static inline block_t limit_free_user_blocks(struct f2fs_sb_info *sbi) {
>> 	block_t reclaimable_user_blocks = sbi->user_block_count -
>> 		written_block_count(sbi);
>> 	return (long)(reclaimable_user_blocks * LIMIT_FREE_BLOCK) / 100; }
>>
>> static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi) {
>> 	block_t invalid_user_blocks = sbi->user_block_count -
>> 					written_block_count(sbi);
>> 	/*
>> 	 * Background GC is triggered with the following conditions.
>> 	 * 1. There are a number of invalid blocks.
>> 	 * 2. There is not enough free space.
>> 	 */
>> 	if (invalid_user_blocks > limit_invalid_user_blocks(sbi) &&
>> 			free_user_blocks(sbi) < limit_free_user_blocks(sbi))
>>
>> -- In this condition, free_user_blocks() doesn't include segments after zone-capacity,
>> however limit_free_user_blocks() includes them.
> In the second patch of this patch set, free_user_blocks is updated to account for the segments after zone-capacity.
> It basically gets the block count of free segments (segments before zone-capacity and free) and deducts the
> overprovision segment block count. It also takes the spanning segments' block count into account.

Okay.

> 
> 
>>
>> 		return true;
>> 	return false;
>> }
>>
>>
>>>>
>>>>> +			continue;
>>>>> +		sentry = get_seg_entry(sbi, start);
>>>>>  		if (!sentry->valid_blocks)
>>>>>  			__set_free(sbi, start);
>>>>>  		else
>>>>> @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct f2fs_sb_info
>> *sbi)
>>>>>  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
>>>>>  	struct free_segmap_info *free_i = FREE_I(sbi);
>>>>>  	unsigned int segno = 0, offset = 0, secno;
>>>>> -	unsigned short valid_blocks;
>>>>> +	unsigned short valid_blocks, usable_blks_in_seg;
>>>>>  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
>>>>>
>>>>>  	while (1) {
>>>>> @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct f2fs_sb_info
>> *sbi)
>>>>>  			break;
>>>>>  		offset = segno + 1;
>>>>>  		valid_blocks = get_valid_blocks(sbi, segno, false);
>>>>> -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
>>>>> +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
>>>>> +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
>>>>
>>>> It needs to traverse .cur_valid_map bitmap to check whether blocks in
>>>> range of [0, usable_blks_in_seg] are all valid or not, if there is at
>>>> least one usable block in the range, segment should be dirty.
>>> Segments which start and end before zone-capacity are just like any
>> normal segments.
>>> Segments which start after the zone-capacity are fully unusable and are marked as
>> used in the free_seg_bitmap, so these segments are never used.
>>> Segments which span across the zone-capacity have some unusable blocks. Even
>> when blocks from these segments are allocated/deallocated the valid_blocks
>> counter is incremented/decremented, reflecting the current valid_blocks count.
>>> Comparing valid_blocks count with usable_blocks count in the segment can
>> indicate if the segment is dirty or fully used.
>>
>> I thought that if there is one valid block locates in range of [usable_blks_in_seg,
>> blks_per_seg] (after zone-capacity), the condition will be incorrect. That should
>> never happen, right?
> Yes, this will never happen. All blocks after zone-capacity are never usable.
>>
>> If so, how about adjusting check_block_count() to do sanity check on bitmap locates
>> after zone-capacity to make sure there is no free slots there.
> 
> Ok, I will add this check in check_block_count. It makes sense.
> 
>>
>>> Sorry, but could you please share why cur_valid_map needs to be traversed ?
>>>
>>>>
>>>> One question, if we select dirty segment which across zone-capacity
>>>> as opened segment (in curseg), how can we avoid allocating usable
>>>> block beyong zone-capacity in such segment via .cur_valid_map?
>>> For zoned devices, we have to allocate blocks sequentially, so allocation is
>> always done in LFS manner.
>>> The __has_curseg_space() checks for the usable blocks and stops allocating blocks
>> after zone-capacity.
>>
>> Oh, that was implemented in patch 2, I haven't checked that patch...sorry, however,
>> IMO, patch should be made to apply independently, what if do allocation only after
>> applying patch 1..., do we need to merge them into one?
> The patches were split keeping in mind that all data structure related and initialization
> changes would go into patch 1, and IO path and GC related changes into patch 2.
> But if you think merging them into a single patch will be easier to review, 

Yes, please, it's not only about easier review, but also for better maintenance
of patches in upstream, otherwise, it's not possible to apply, backport, revert
one of two patches independently.

I still didn't get the full picture of using such a zns device which has
a configured zone-capacity. Is it like this?
1. configure zone-capacity in zns device
2. mkfs.f2fs zns device
3. mount zns device

Can we change zone-capacity dynamically after step 2? Or should we run
mkfs.f2fs again whenever the zone-capacity is updated?

Thanks,

> then I shall merge it and send it as one patch in V2, along with other suggestions incorporated.
> 
> Please let me know.
>>
>>>>
>>>>>  			continue;
>>>>> -		if (valid_blocks > sbi->blocks_per_seg) {
>>>>> +		if (valid_blocks > usable_blks_in_seg) {
>>>>>  			f2fs_bug_on(sbi, 1);
>>>>>  			continue;
>>>>>  		}
>>>>> @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct
>>>>> f2fs_sb_info *sbi)
>>>>>
>>>>>  	return 0;
>>>>>  }
>>>>> +
>>>>> +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
>>>>> +						unsigned int dev_idx)
>>>>> +{
>>>>> +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
>>>>> +		return true;
>>>>> +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq); }
>>>>> +
>>>>> +/* Return the zone index in the given device */ static unsigned int
>>>>> +get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
>>>>> +					int dev_idx)
>>>>> +{
>>>>> +	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi,
>>>>> +secno));
>>>>> +
>>>>> +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
>>>>> +						sbi->log_blocks_per_blkz;
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Return the usable segments in a section based on the zone's
>>>>> + * corresponding zone capacity. Zone is equal to a section.
>>>>> + */
>>>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(
>>>>> +		struct f2fs_sb_info *sbi, unsigned int segno) {
>>>>> +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
>>>>> +
>>>>> +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
>>>>> +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno),
>>>>> +dev_idx);
>>>>> +
>>>>> +	/* Conventional zone's capacity is always equal to zone size */
>>>>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
>>>>> +		return sbi->segs_per_sec;
>>>>> +
>>>>> +	/*
>>>>> +	 * If the zone_capacity_blocks array is NULL, then zone capacity
>>>>> +	 * is equal to the zone size for all zones
>>>>> +	 */
>>>>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
>>>>> +		return sbi->segs_per_sec;
>>>>> +
>>>>> +	/* Get the segment count beyond zone capacity block */
>>>>> +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
>>>>> +				FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
>>>>> +				sbi->log_blocks_per_seg;
>>>>> +	return sbi->segs_per_sec - unusable_segs_in_sec; }
>>>>> +
>>>>> +/*
>>>>> + * Return the number of usable blocks in a segment. The number of
>>>>> +blocks
>>>>> + * returned is always equal to the number of blocks in a segment
>>>>> +for
>>>>> + * segments fully contained within a sequential zone capacity or a
>>>>> + * conventional zone. For segments partially contained in a
>>>>> +sequential
>>>>> + * zone capacity, the number of usable blocks up to the zone
>>>>> +capacity
>>>>> + * is returned. 0 is returned in all other cases.
>>>>> + */
>>>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(
>>>>> +			struct f2fs_sb_info *sbi, unsigned int segno) {
>>>>> +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
>>>>> +	unsigned int zone_idx, dev_idx, secno;
>>>>> +
>>>>> +	secno = GET_SEC_FROM_SEG(sbi, segno);
>>>>> +	seg_start = START_BLOCK(sbi, segno);
>>>>> +	dev_idx = f2fs_target_device_index(sbi, seg_start);
>>>>> +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
>>>>> +
>>>>> +	/*
>>>>> +	 * Conventional zone's capacity is always equal to zone size,
>>>>> +	 * so, blocks per segment is unchanged.
>>>>> +	 */
>>>>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
>>>>> +		return sbi->blocks_per_seg;
>>>>> +
>>>>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
>>>>> +		return sbi->blocks_per_seg;
>>>>> +
>>>>> +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
>>>>> +	sec_cap_blkaddr = sec_start_blkaddr +
>>>>> +				FDEV(dev_idx).zone_capacity_blocks[zone_idx];
>>>>> +
>>>>> +	/*
>>>>> +	 * If segment starts before zone capacity and spans beyond
>>>>> +	 * zone capacity, then usable blocks are from seg start to
>>>>> +	 * zone capacity. If the segment starts after the zone capacity,
>>>>> +	 * then there are no usable blocks.
>>>>> +	 */
>>>>> +	if (seg_start >= sec_cap_blkaddr)
>>>>> +		return 0;
>>>>> +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
>>>>> +		return sec_cap_blkaddr - seg_start;
>>>>> +
>>>>> +	return sbi->blocks_per_seg;
>>>>> +}
>>>>>  #else
>>>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)  { @@
>>>>> -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info
>>>>> *sbi)  {
>>>>>  	return 0;
>>>>>  }
>>>>> +
>>>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
>>>>> +							unsigned int segno)
>>>>> +{
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
>>>>> +							unsigned int segno)
>>>>> +{
>>>>> +	return 0;
>>>>> +}
>>>>>  #endif
>>>>> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
>>>>> +					unsigned int segno)
>>>>> +{
>>>>> +	if (f2fs_sb_has_blkzoned(sbi))
>>>>> +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
>>>>> +
>>>>> +	return sbi->blocks_per_seg;
>>>>> +}
>>>>> +
>>>>> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
>>>>> +					unsigned int segno)
>>>>> +{
>>>>> +	if (f2fs_sb_has_blkzoned(sbi))
>>>>> +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
>>>>> +
>>>>> +	return sbi->segs_per_sec;
>>>>> +}
>>>>>
>>>>>  /*
>>>>>   * Update min, max modified time for cost-benefit GC algorithm diff
>>>>> --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index
>>>>> f261e3e6a69b..79b0dc33feaf 100644
>>>>> --- a/fs/f2fs/segment.h
>>>>> +++ b/fs/f2fs/segment.h
>>>>> @@ -411,6 +411,7 @@ static inline void __set_free(struct
>>>>> f2fs_sb_info *sbi,
>>>> unsigned int segno)
>>>>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
>>>>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
>>>>>  	unsigned int next;
>>>>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
>>>>>
>>>>>  	spin_lock(&free_i->segmap_lock);
>>>>>  	clear_bit(segno, free_i->free_segmap); @@ -418,7 +419,7 @@ static
>>>>> inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
>>>>>
>>>>>  	next = find_next_bit(free_i->free_segmap,
>>>>>  			start_segno + sbi->segs_per_sec, start_segno);
>>>>> -	if (next >= start_segno + sbi->segs_per_sec) {
>>>>> +	if (next >= start_segno + usable_segs) {
>>>>>  		clear_bit(secno, free_i->free_secmap);
>>>>>  		free_i->free_sections++;
>>>>>  	}
>>>>> @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct
>>>>> f2fs_sb_info
>>>> *sbi,
>>>>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
>>>>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
>>>>>  	unsigned int next;
>>>>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
>>>>>
>>>>>  	spin_lock(&free_i->segmap_lock);
>>>>>  	if (test_and_clear_bit(segno, free_i->free_segmap)) { @@ -453,7
>>>>> +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info
>>>>> +*sbi,
>>>>>  			goto skip_free;
>>>>>  		next = find_next_bit(free_i->free_segmap,
>>>>>  				start_segno + sbi->segs_per_sec, start_segno);
>>>>> -		if (next >= start_segno + sbi->segs_per_sec) {
>>>>> +		if (next >= start_segno + usable_segs) {
>>>>>  			if (test_and_clear_bit(secno, free_i->free_secmap))
>>>>>  				free_i->free_sections++;
>>>>>  		}
>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index
>>>>> 80cb7cd358f8..2686b07ae7eb 100644
>>>>> --- a/fs/f2fs/super.c
>>>>> +++ b/fs/f2fs/super.c
>>>>> @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info
>> *sbi)
>>>>>  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);  #ifdef
>>>> CONFIG_BLK_DEV_ZONED
>>>>>  		kvfree(FDEV(i).blkz_seq);
>>>>> +		kvfree(FDEV(i).zone_capacity_blocks);
>>>>
>>>> Now, f2fs_kzalloc won't allocate vmalloc's memory, so it's safe to use kfree().
>>> Ok
>>>>
>>>>>  #endif
>>>>>  	}
>>>>>  	kvfree(sbi->devs);
>>>>> @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct
>>>>> f2fs_sb_info *sbi)  }
>>>>>
>>>>>  #ifdef CONFIG_BLK_DEV_ZONED
>>>>> +
>>>>> +struct f2fs_report_zones_args {
>>>>> +	struct f2fs_dev_info *dev;
>>>>> +	bool zone_cap_mismatch;
>>>>> +};
>>>>> +
>>>>>  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
>>>>> -			       void *data)
>>>>> +			      void *data)
>>>>>  {
>>>>> -	struct f2fs_dev_info *dev = data;
>>>>> +	struct f2fs_report_zones_args *rz_args = data;
>>>>> +
>>>>> +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
>>>>> +		return 0;
>>>>> +
>>>>> +	set_bit(idx, rz_args->dev->blkz_seq);
>>>>> +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
>>>>> +						F2FS_LOG_SECTORS_PER_BLOCK;
>>>>> +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
>>>>> +		rz_args->zone_cap_mismatch = true;
>>>>>
>>>>> -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
>>>>> -		set_bit(idx, dev->blkz_seq);
>>>>>  	return 0;
>>>>>  }
>>>>>
>>>>> @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info
>>>>> *sbi, int devi)  {
>>>>>  	struct block_device *bdev = FDEV(devi).bdev;
>>>>>  	sector_t nr_sectors = bdev->bd_part->nr_sects;
>>>>> +	struct f2fs_report_zones_args rep_zone_arg;
>>>>>  	int ret;
>>>>>
>>>>>  	if (!f2fs_sb_has_blkzoned(sbi))
>>>>> @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct
>>>>> f2fs_sb_info *sbi, int
>>>> devi)
>>>>>  	if (!FDEV(devi).blkz_seq)
>>>>>  		return -ENOMEM;
>>>>>
>>>>> -	/* Get block zones type */
>>>>> +	/* Get block zones type and zone-capacity */
>>>>> +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
>>>>> +					FDEV(devi).nr_blkz * sizeof(block_t),
>>>>> +					GFP_KERNEL);
>>>>> +	if (!FDEV(devi).zone_capacity_blocks)
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	rep_zone_arg.dev = &FDEV(devi);
>>>>> +	rep_zone_arg.zone_cap_mismatch = false;
>>>>> +
>>>>>  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
>>>>> -				  &FDEV(devi));
>>>>> +				  &rep_zone_arg);
>>>>>  	if (ret < 0)
>>>>>  		return ret;
>>>>
>>>> Missed to call kfree(FDEV(devi).zone_capacity_blocks)?
>>> Thanks for catching it. Will free it here also.
>>>>
>>>>>
>>>>> +	if (!rep_zone_arg.zone_cap_mismatch) {
>>>>> +		kvfree(FDEV(devi).zone_capacity_blocks);
>>>>
>>>> Ditto, kfree().
>>> Ok.
>>>>
>>>> Thanks,
>>>>
>>>>> +		FDEV(devi).zone_capacity_blocks = NULL;
>>>>> +	}
>>>>> +
>>>>>  	return 0;
>>>>>  }
>>>>>  #endif
>>>>>
>>> .
>>>
> .
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-09  2:55           ` Chao Yu
@ 2020-07-09  5:31             ` Aravind Ramesh
  2020-07-09  7:05               ` Chao Yu
  0 siblings, 1 reply; 16+ messages in thread
From: Aravind Ramesh @ 2020-07-09  5:31 UTC (permalink / raw)
  To: Chao Yu, jaegeuk, linux-fsdevel, linux-f2fs-devel, hch
  Cc: Niklas Cassel, Damien Le Moal, Matias Bjorling

Please find my response inline.

Thanks,
Aravind

> -----Original Message-----
> From: Chao Yu <yuchao0@huawei.com>
> Sent: Thursday, July 9, 2020 8:26 AM
> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org; linux-
> fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone size
> 
> On 2020/7/8 21:04, Aravind Ramesh wrote:
> > Please find my response inline.
> >
> > Thanks,
> > Aravind
[snip..]
> >>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index
> >>>>> c35614d255e1..d2156f3f56a5 100644
> >>>>> --- a/fs/f2fs/segment.c
> >>>>> +++ b/fs/f2fs/segment.c
> >>>>> @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct
> >>>>> f2fs_sb_info *sbi)  {
> >>>>>  	unsigned int start;
> >>>>>  	int type;
> >>>>> +	struct seg_entry *sentry;
> >>>>>
> >>>>>  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
> >>>>> -		struct seg_entry *sentry = get_seg_entry(sbi, start);
> >>>>> +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
> >>>>
> >>>> If usable blocks count is zero, shouldn't we update
> >>>> SIT_I(sbi)->written_valid_blocks as we did when there is partial
> >>>> usable block in
> >> current segment?
> >>> If usable_block_count is zero, then it is like a dead segment: all
> >>> blocks in the segment lie after the zone-capacity in the zone. So
> >>> there can never be
> >> valid written content on these segments; hence it is not updated.
> >>> In the other case, when a segment start before the zone-capacity and
> >>> it ends beyond zone-capacity, then there are some blocks before
> >>> zone-capacity
> >> which can be used, so they are accounted for.
> >>
> >> I'm thinking that for limit_free_user_blocks() function, it assumes
> >> all unwritten blocks as potential reclaimable blocks, however segment
> >> after zone-capacity should never be used or reclaimable, it looks calculation could
> be not correct here.
> >>
> > The sbi->user_block_count is updated with the total usable_blocks in
> > the full file system during the formatting of the file system using
> > mkfs.f2fs. Please see the f2fs-tools patch series that I have submitted along with
> this patch set.
> >
> > So sbi->user_block_count reflects the actual number of usable blocks (i.e. total
> blocks - unusable blocks).
> 
> Alright, will check both kernel and f2fs-tools change again later. :)
> 
> >
> >> static inline block_t limit_free_user_blocks(struct f2fs_sb_info *sbi) {
> >> 	block_t reclaimable_user_blocks = sbi->user_block_count -
> >> 		written_block_count(sbi);
> >> 	return (long)(reclaimable_user_blocks * LIMIT_FREE_BLOCK) / 100; }
> >>
> >> static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi) {
> >> 	block_t invalid_user_blocks = sbi->user_block_count -
> >> 					written_block_count(sbi);
> >> 	/*
> >> 	 * Background GC is triggered with the following conditions.
> >> 	 * 1. There are a number of invalid blocks.
> >> 	 * 2. There is not enough free space.
> >> 	 */
> >> 	if (invalid_user_blocks > limit_invalid_user_blocks(sbi) &&
> >> 			free_user_blocks(sbi) < limit_free_user_blocks(sbi))
> >>
> >> -- In this condition, free_user_blocks() doesn't include segments
> >> after zone-capacity, however limit_free_user_blocks() includes them.
> > In the second patch of this patch set, free_user_blocks is updated to account for
> the segments after zone-capacity.
> > It basically gets the free segment(segments before zone capacity and
> > free) block count and deducts the overprovision segment block count. It also
> considers the spanning segments block count into account.
> 
> Okay.
> 
> >
> >
> >>
> >> 		return true;
> >> 	return false;
> >> }
> >>
> >>
> >>>>
> >>>>> +			continue;
> >>>>> +		sentry = get_seg_entry(sbi, start);
> >>>>>  		if (!sentry->valid_blocks)
> >>>>>  			__set_free(sbi, start);
> >>>>>  		else
> >>>>> @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct
> >>>>> f2fs_sb_info
> >> *sbi)
> >>>>>  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
> >>>>>  	struct free_segmap_info *free_i = FREE_I(sbi);
> >>>>>  	unsigned int segno = 0, offset = 0, secno;
> >>>>> -	unsigned short valid_blocks;
> >>>>> +	unsigned short valid_blocks, usable_blks_in_seg;
> >>>>>  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
> >>>>>
> >>>>>  	while (1) {
> >>>>> @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct
> >>>>> f2fs_sb_info
> >> *sbi)
> >>>>>  			break;
> >>>>>  		offset = segno + 1;
> >>>>>  		valid_blocks = get_valid_blocks(sbi, segno, false);
> >>>>> -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
> >>>>> +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
> >>>>> +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
> >>>>
> >>>> It needs to traverse .cur_valid_map bitmap to check whether blocks
> >>>> in range of [0, usable_blks_in_seg] are all valid or not, if there
> >>>> is at least one usable block in the range, segment should be dirty.
> >>> Segments which start and end before zone-capacity are just
> >>> like any
> >> normal segments.
> >>> Segments which start after the zone-capacity are fully unusable and
> >>> are marked as
> >> used in the free_seg_bitmap, so these segments are never used.
> >>> Segments which span across the zone-capacity have some unusable
> >>> blocks. Even
> >> when blocks from these segments are allocated/deallocated the
> >> valid_blocks counter is incremented/decremented, reflecting the current
> valid_blocks count.
> >>> Comparing valid_blocks count with usable_blocks count in the segment
> >>> can
> >> indicate if the segment is dirty or fully used.
> >>
> >> I thought that if there is one valid block locates in range of
> >> [usable_blks_in_seg, blks_per_seg] (after zone-capacity), the
> >> condition will be incorrect. That should never happen, right?
> > Yes, this will never happen. All blocks after zone-capacity are never usable.
> >>
> >> If so, how about adjusting check_block_count() to do sanity check on
> >> bitmap locates after zone-capacity to make sure there is no free slots there.
> >
> > Ok, I will add this check in check_block_count. It makes sense.
> >
> >>
> >>> Sorry, but could you please share why cur_valid_map needs to be traversed ?
> >>>
> >>>>
> >>>> One question, if we select dirty segment which across zone-capacity
> >>>> as opened segment (in curseg), how can we avoid allocating usable
> >>>> block beyong zone-capacity in such segment via .cur_valid_map?
> >>> For zoned devices, we have to allocate blocks sequentially, so allocation
> >>> is always done
> >> in LFS manner.
> >>> The __has_curseg_space() checks for the usable blocks and stops
> >>> allocating blocks
> >> after zone-capacity.
> >>
> >> Oh, that was implemented in patch 2, I haven't checked that
> >> patch...sorry, however, IMO, patch should be made to apply
> >> independently, what if do allocation only after applying patch 1..., do we need to
> merge them into one?
> > The patches were split keeping in mind that all data structure related
> > and initialization changes would go into patch 1, and IO path and GC related
> changes into patch 2.
> > But if you think, merging them to a single patch will be easier to
> > review,
> 
> Yes, please, it's not only about easier review, but also for better maintenance of
> patches in upstream, otherwise, it's not possible to apply, backport, revert one of
> two patches independently.
> 
> I still didn't get the full picture of using such a zns device which has a configured
> zone-capacity. Is it like this?
> 1. configure zone-capacity in zns device
> 2. mkfs.f2fs zns device
> 3. mount zns device

Zone-capacity is set by the device vendor. It could be the same as the zone-size or less than the zone-size,
depending on the vendor. It cannot be configured by the user, so step 1 is not possible.
Since NVMe ZNS device zones are sequential-write-only, we need another zoned device with
conventional zones, or any normal block device, for the metadata operations of f2fs.
I have provided some more explanation on this in the cover letter of the kernel patch set.
Step 2 is mkfs.f2fs zns device + block device (mkfs.f2fs -m -c /dev/nvme0n1 /dev/nullb1)

A typical nvme-cli output of a zoned device shows zone start and capacity and write pointer as below:

SLBA: 0x0             WP: 0x0             Cap: 0x18800    State: EMPTY        Type: SEQWRITE_REQ   Attrs: 0x0
SLBA: 0x20000    WP: 0x20000    Cap: 0x18800    State: EMPTY        Type: SEQWRITE_REQ   Attrs: 0x0
SLBA: 0x40000    WP: 0x40000    Cap: 0x18800    State: EMPTY        Type: SEQWRITE_REQ   Attrs: 0x0

Here the zone size is 64MB, the capacity is 49MB, and the WP is at the zone start since the zone is empty. For each zone,
only zone start + 49MB is usable area; any lba/sector after 49MB cannot be read or written to, and the drive will
fail any attempt to read/write it. So the second zone starts at 64MB and is usable till 113MB (64 + 49), and the
range between 113MB and 128MB is again unusable. The next zone starts at 128MB, and so on.

> 
> Can we change zone-capacity dynamically after step 2? Or should we run mkfs.f2fs
> again whenever the zone-capacity is updated?
User cannot change zone-capacity dynamically. It is device dependent.
> 
> Thanks,
> 
> > then I shall merge it and send it as one patch in V2, along with other suggestions
> incorporated.
> >
> > Please let me know.
> >>
> >>>>
> >>>>>  			continue;
> >>>>> -		if (valid_blocks > sbi->blocks_per_seg) {
> >>>>> +		if (valid_blocks > usable_blks_in_seg) {
> >>>>>  			f2fs_bug_on(sbi, 1);
> >>>>>  			continue;
> >>>>>  		}
> >>>>> @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct
> >>>>> f2fs_sb_info *sbi)
> >>>>>
> >>>>>  	return 0;
> >>>>>  }
> >>>>> +
> >>>>> +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
> >>>>> +						unsigned int dev_idx)
> >>>>> +{
> >>>>> +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
> >>>>> +		return true;
> >>>>> +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq); }
> >>>>> +
> >>>>> +/* Return the zone index in the given device */ static unsigned
> >>>>> +int get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
> >>>>> +					int dev_idx)
> >>>>> +{
> >>>>> +	block_t sec_start_blkaddr = START_BLOCK(sbi,
> >>>>> +GET_SEG_FROM_SEC(sbi, secno));
> >>>>> +
> >>>>> +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
> >>>>> +						sbi->log_blocks_per_blkz;
> >>>>> +}
> >>>>> +
> >>>>> +/*
> >>>>> + * Return the usable segments in a section based on the zone's
> >>>>> + * corresponding zone capacity. Zone is equal to a section.
> >>>>> + */
> >>>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(
> >>>>> +		struct f2fs_sb_info *sbi, unsigned int segno) {
> >>>>> +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
> >>>>> +
> >>>>> +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
> >>>>> +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno),
> >>>>> +dev_idx);
> >>>>> +
> >>>>> +	/* Conventional zone's capacity is always equal to zone size */
> >>>>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> >>>>> +		return sbi->segs_per_sec;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * If the zone_capacity_blocks array is NULL, then zone capacity
> >>>>> +	 * is equal to the zone size for all zones
> >>>>> +	 */
> >>>>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
> >>>>> +		return sbi->segs_per_sec;
> >>>>> +
> >>>>> +	/* Get the segment count beyond zone capacity block */
> >>>>> +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
> >>>>> +				FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
> >>>>> +				sbi->log_blocks_per_seg;
> >>>>> +	return sbi->segs_per_sec - unusable_segs_in_sec; }
> >>>>> +
> >>>>> +/*
> >>>>> + * Return the number of usable blocks in a segment. The number of
> >>>>> +blocks
> >>>>> + * returned is always equal to the number of blocks in a segment
> >>>>> +for
> >>>>> + * segments fully contained within a sequential zone capacity or
> >>>>> +a
> >>>>> + * conventional zone. For segments partially contained in a
> >>>>> +sequential
> >>>>> + * zone capacity, the number of usable blocks up to the zone
> >>>>> +capacity
> >>>>> + * is returned. 0 is returned in all other cases.
> >>>>> + */
> >>>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(
> >>>>> +			struct f2fs_sb_info *sbi, unsigned int segno) {
> >>>>> +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
> >>>>> +	unsigned int zone_idx, dev_idx, secno;
> >>>>> +
> >>>>> +	secno = GET_SEC_FROM_SEG(sbi, segno);
> >>>>> +	seg_start = START_BLOCK(sbi, segno);
> >>>>> +	dev_idx = f2fs_target_device_index(sbi, seg_start);
> >>>>> +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * Conventional zone's capacity is always equal to zone size,
> >>>>> +	 * so, blocks per segment is unchanged.
> >>>>> +	 */
> >>>>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> >>>>> +		return sbi->blocks_per_seg;
> >>>>> +
> >>>>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
> >>>>> +		return sbi->blocks_per_seg;
> >>>>> +
> >>>>> +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> >>>>> +	sec_cap_blkaddr = sec_start_blkaddr +
> >>>>> +			FDEV(dev_idx).zone_capacity_blocks[zone_idx];
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * If segment starts before zone capacity and spans beyond
> >>>>> +	 * zone capacity, then usable blocks are from seg start to
> >>>>> +	 * zone capacity. If the segment starts after the zone capacity,
> >>>>> +	 * then there are no usable blocks.
> >>>>> +	 */
> >>>>> +	if (seg_start >= sec_cap_blkaddr)
> >>>>> +		return 0;
> >>>>> +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
> >>>>> +		return sec_cap_blkaddr - seg_start;
> >>>>> +
> >>>>> +	return sbi->blocks_per_seg;
> >>>>> +}
> >>>>>  #else
> >>>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> >>>>>  {
> >>>>> @@ -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> >>>>>  {
> >>>>>  	return 0;
> >>>>>  }
> >>>>> +
> >>>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
> >>>>> +							unsigned int segno)
> >>>>> +{
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
> >>>>> +							unsigned int segno)
> >>>>> +{
> >>>>> +	return 0;
> >>>>> +}
> >>>>>  #endif
> >>>>> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> >>>>> +					unsigned int segno)
> >>>>> +{
> >>>>> +	if (f2fs_sb_has_blkzoned(sbi))
> >>>>> +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
> >>>>> +
> >>>>> +	return sbi->blocks_per_seg;
> >>>>> +}
> >>>>> +
> >>>>> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> >>>>> +					unsigned int segno)
> >>>>> +{
> >>>>> +	if (f2fs_sb_has_blkzoned(sbi))
> >>>>> +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
> >>>>> +
> >>>>> +	return sbi->segs_per_sec;
> >>>>> +}
> >>>>>
> >>>>>  /*
> >>>>>   * Update min, max modified time for cost-benefit GC algorithm
> >>>>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index
> >>>>> f261e3e6a69b..79b0dc33feaf 100644
> >>>>> --- a/fs/f2fs/segment.h
> >>>>> +++ b/fs/f2fs/segment.h
> >>>>> @@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> >>>>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> >>>>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> >>>>>  	unsigned int next;
> >>>>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> >>>>>
> >>>>>  	spin_lock(&free_i->segmap_lock);
> >>>>>  	clear_bit(segno, free_i->free_segmap);
> >>>>> @@ -418,7 +419,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> >>>>>
> >>>>>  	next = find_next_bit(free_i->free_segmap,
> >>>>>  			start_segno + sbi->segs_per_sec, start_segno);
> >>>>> -	if (next >= start_segno + sbi->segs_per_sec) {
> >>>>> +	if (next >= start_segno + usable_segs) {
> >>>>>  		clear_bit(secno, free_i->free_secmap);
> >>>>>  		free_i->free_sections++;
> >>>>>  	}
> >>>>> @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> >>>>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> >>>>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> >>>>>  	unsigned int next;
> >>>>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> >>>>>
> >>>>>  	spin_lock(&free_i->segmap_lock);
> >>>>>  	if (test_and_clear_bit(segno, free_i->free_segmap)) {
> >>>>> @@ -453,7 +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> >>>>>  			goto skip_free;
> >>>>>  		next = find_next_bit(free_i->free_segmap,
> >>>>>  				start_segno + sbi->segs_per_sec, start_segno);
> >>>>> -		if (next >= start_segno + sbi->segs_per_sec) {
> >>>>> +		if (next >= start_segno + usable_segs) {
> >>>>>  			if (test_and_clear_bit(secno, free_i->free_secmap))
> >>>>>  				free_i->free_sections++;
> >>>>>  		}
> >>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index
> >>>>> 80cb7cd358f8..2686b07ae7eb 100644
> >>>>> --- a/fs/f2fs/super.c
> >>>>> +++ b/fs/f2fs/super.c
> >>>>> @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
> >>>>>  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
> >>>>>  #ifdef CONFIG_BLK_DEV_ZONED
> >>>>>  		kvfree(FDEV(i).blkz_seq);
> >>>>> +		kvfree(FDEV(i).zone_capacity_blocks);
> >>>>
> >>>> Now, f2fs_kzalloc won't allocate vmalloc's memory, so it's safe to use kfree().
> >>> Ok
> >>>>
> >>>>>  #endif
> >>>>>  	}
> >>>>>  	kvfree(sbi->devs);
> >>>>> @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct f2fs_sb_info *sbi)
> >>>>>  }
> >>>>>
> >>>>>  #ifdef CONFIG_BLK_DEV_ZONED
> >>>>> +
> >>>>> +struct f2fs_report_zones_args {
> >>>>> +	struct f2fs_dev_info *dev;
> >>>>> +	bool zone_cap_mismatch;
> >>>>> +};
> >>>>> +
> >>>>>  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> >>>>> -			       void *data)
> >>>>> +			      void *data)
> >>>>>  {
> >>>>> -	struct f2fs_dev_info *dev = data;
> >>>>> +	struct f2fs_report_zones_args *rz_args = data;
> >>>>> +
> >>>>> +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> >>>>> +		return 0;
> >>>>> +
> >>>>> +	set_bit(idx, rz_args->dev->blkz_seq);
> >>>>> +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
> >>>>> +					F2FS_LOG_SECTORS_PER_BLOCK;
> >>>>> +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
> >>>>> +		rz_args->zone_cap_mismatch = true;
> >>>>>
> >>>>> -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
> >>>>> -		set_bit(idx, dev->blkz_seq);
> >>>>>  	return 0;
> >>>>>  }
> >>>>>
> >>>>> @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> >>>>>  {
> >>>>>  	struct block_device *bdev = FDEV(devi).bdev;
> >>>>>  	sector_t nr_sectors = bdev->bd_part->nr_sects;
> >>>>> +	struct f2fs_report_zones_args rep_zone_arg;
> >>>>>  	int ret;
> >>>>>
> >>>>>  	if (!f2fs_sb_has_blkzoned(sbi))
> >>>>> @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> >>>>>  	if (!FDEV(devi).blkz_seq)
> >>>>>  		return -ENOMEM;
> >>>>>
> >>>>> -	/* Get block zones type */
> >>>>> +	/* Get block zones type and zone-capacity */
> >>>>> +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
> >>>>> +					FDEV(devi).nr_blkz * sizeof(block_t),
> >>>>> +					GFP_KERNEL);
> >>>>> +	if (!FDEV(devi).zone_capacity_blocks)
> >>>>> +		return -ENOMEM;
> >>>>> +
> >>>>> +	rep_zone_arg.dev = &FDEV(devi);
> >>>>> +	rep_zone_arg.zone_cap_mismatch = false;
> >>>>> +
> >>>>>  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
> >>>>> -				  &FDEV(devi));
> >>>>> +				  &rep_zone_arg);
> >>>>>  	if (ret < 0)
> >>>>>  		return ret;
> >>>>
> >>>> Missed to call kfree(FDEV(devi).zone_capacity_blocks)?
> >>> Thanks for catching it. Will free it here also.
> >>>>
> >>>>>
> >>>>> +	if (!rep_zone_arg.zone_cap_mismatch) {
> >>>>> +		kvfree(FDEV(devi).zone_capacity_blocks);
> >>>>
> >>>> Ditto, kfree().
> >>> Ok.
> >>>>
> >>>> Thanks,
> >>>>
> >>>>> +		FDEV(devi).zone_capacity_blocks = NULL;
> >>>>> +	}
> >>>>> +
> >>>>>  	return 0;
> >>>>>  }
> >>>>>  #endif
> >>>>>
> >>> .
> >>>
> > .
> >


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-09  5:31             ` Aravind Ramesh
@ 2020-07-09  7:05               ` Chao Yu
  2020-07-09  7:11                 ` Aravind Ramesh
  0 siblings, 1 reply; 16+ messages in thread
From: Chao Yu @ 2020-07-09  7:05 UTC (permalink / raw)
  To: Aravind Ramesh, jaegeuk, linux-fsdevel, linux-f2fs-devel, hch
  Cc: Niklas Cassel, Damien Le Moal, Matias Bjorling

On 2020/7/9 13:31, Aravind Ramesh wrote:
> Please find my response inline.
> 
> Thanks,
> Aravind
> 
>> -----Original Message-----
>> From: Chao Yu <yuchao0@huawei.com>
>> Sent: Thursday, July 9, 2020 8:26 AM
>> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org; linux-
>> fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
>> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
>> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
>> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone size
>>
>> On 2020/7/8 21:04, Aravind Ramesh wrote:
>>> Please find my response inline.
>>>
>>> Thanks,
>>> Aravind
> [snip..]
>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index
>>>>>>> c35614d255e1..d2156f3f56a5 100644
>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>> @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct f2fs_sb_info *sbi)
>>>>>>>  {
>>>>>>>  	unsigned int start;
>>>>>>>  	int type;
>>>>>>> +	struct seg_entry *sentry;
>>>>>>>
>>>>>>>  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
>>>>>>> -		struct seg_entry *sentry = get_seg_entry(sbi, start);
>>>>>>> +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
>>>>>>
>>>>>> If usable blocks count is zero, shouldn't we update
>>>>>> SIT_I(sbi)->written_valid_blocks as we did when there is partial
>>>>>> usable block in
>>>> current segment?
>>>>> If usable_block_count is zero, then it is like a dead segment, all
>>>>> blocks in the segment lie after the zone-capacity in the zone. So
>>>>> there can never be
>>>> a valid written content on these segments, hence it is not updated.
>>>>> In the other case, when a segment start before the zone-capacity and
>>>>> it ends beyond zone-capacity, then there are some blocks before
>>>>> zone-capacity
>>>> which can be used, so they are accounted for.
>>>>
>>>> I'm thinking that for limit_free_user_blocks() function, it assumes
>>>> all unwritten blocks as potential reclaimable blocks, however segment
>>>> after zone-capacity should never be used or reclaimable, it looks calculation could
>> be not correct here.
>>>>
>>> The sbi->user_block_count is updated with the total usable_blocks in
>>> the full file system during the formatting of the file system using
>>> mkfs.f2fs. Please see the f2fs-tools patch series that I have submitted along with
>> this patch set.
>>>
>>> So sbi->user_block_count reflects the actual number of usable blocks (i.e. total
>> blocks - unusable blocks).
>>
>> Alright, will check both kernel and f2fs-tools change again later. :)
>>
>>>
>>>> static inline block_t limit_free_user_blocks(struct f2fs_sb_info *sbi) {
>>>> 	block_t reclaimable_user_blocks = sbi->user_block_count -
>>>> 		written_block_count(sbi);
>>>> 	return (long)(reclaimable_user_blocks * LIMIT_FREE_BLOCK) / 100; }
>>>>
>>>> static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi) {
>>>> 	block_t invalid_user_blocks = sbi->user_block_count -
>>>> 					written_block_count(sbi);
>>>> 	/*
>>>> 	 * Background GC is triggered with the following conditions.
>>>> 	 * 1. There are a number of invalid blocks.
>>>> 	 * 2. There is not enough free space.
>>>> 	 */
>>>> 	if (invalid_user_blocks > limit_invalid_user_blocks(sbi) &&
>>>> 			free_user_blocks(sbi) < limit_free_user_blocks(sbi))
>>>>
>>>> -- In this condition, free_user_blocks() doesn't include segments
>>>> after zone-capacity, however limit_free_user_blocks() includes them.
>>> In the second patch of this patch set, free_user_blocks is updated to account for
>> the segments after zone-capacity.
>>> It basically gets the free segment(segments before zone capacity and
>>> free) block count and deducts the overprovision segment block count. It also
>> considers the spanning segments block count into account.
>>
>> Okay.
>>
>>>
>>>
>>>>
>>>> 		return true;
>>>> 	return false;
>>>> }
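The quoted limit_free_user_blocks()/has_enough_invalid_blocks() logic can be modeled standalone as below. This is a sketch only: the names and types mirror the quoted kernel code, the 40% threshold is an assumption (f2fs's LIMIT_FREE_BLOCK), and user_blocks is taken to already exclude blocks beyond zone-capacity, as Aravind explains further down.

```c
#include <assert.h>

typedef unsigned long long block_t;

/* Assumed percentage threshold, standing in for f2fs's LIMIT_FREE_BLOCK. */
#define LIMIT_FREE_BLOCK 40

/* Unwritten blocks treated as potentially reclaimable, scaled by the limit. */
static block_t limit_free_user_blocks(block_t user_blocks, block_t written)
{
	block_t reclaimable_user_blocks = user_blocks - written;

	return reclaimable_user_blocks * LIMIT_FREE_BLOCK / 100;
}

/*
 * Background GC trigger, as in the quoted code: enough invalid blocks and
 * not enough free space. limit_invalid and free_blocks stand in for
 * limit_invalid_user_blocks(sbi) and free_user_blocks(sbi).
 */
static int has_enough_invalid_blocks(block_t user_blocks, block_t written,
				     block_t limit_invalid, block_t free_blocks)
{
	block_t invalid_user_blocks = user_blocks - written;

	return invalid_user_blocks > limit_invalid &&
	       free_blocks < limit_free_user_blocks(user_blocks, written);
}
```

Because user_blocks excludes the blocks past zone-capacity, both sides of the comparison use the same accounting, which is the consistency being discussed here.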
>>>>
>>>>
>>>>>>
>>>>>>> +			continue;
>>>>>>> +		sentry = get_seg_entry(sbi, start);
>>>>>>>  		if (!sentry->valid_blocks)
>>>>>>>  			__set_free(sbi, start);
>>>>>>>  		else
>>>>>>> @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
>>>>>>>  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
>>>>>>>  	struct free_segmap_info *free_i = FREE_I(sbi);
>>>>>>>  	unsigned int segno = 0, offset = 0, secno;
>>>>>>> -	unsigned short valid_blocks;
>>>>>>> +	unsigned short valid_blocks, usable_blks_in_seg;
>>>>>>>  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
>>>>>>>
>>>>>>>  	while (1) {
>>>>>>> @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
>>>>>>>  			break;
>>>>>>>  		offset = segno + 1;
>>>>>>>  		valid_blocks = get_valid_blocks(sbi, segno, false);
>>>>>>> -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
>>>>>>> +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
>>>>>>> +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
>>>>>>
>>>>>> It needs to traverse .cur_valid_map bitmap to check whether blocks
>>>>>> in range of [0, usable_blks_in_seg] are all valid or not, if there
>>>>>> is at least one usable block in the range, segment should be dirty.
>>>>> For the segments which start and end before zone-capacity are just
>>>>> like any
>>>> normal segments.
>>>>> Segments which start after the zone-capacity are fully unusable and
>>>>> are marked as
>>>> used in the free_seg_bitmap, so these segments are never used.
>>>>> Segments which span across the zone-capacity have some unusable
>>>>> blocks. Even
>>>> when blocks from these segments are allocated/deallocated the
>>>> valid_blocks counter is incremented/decremented, reflecting the current
>> valid_blocks count.
>>>>> Comparing valid_blocks count with usable_blocks count in the segment
>>>>> can
>>>> indicate if the segment is dirty or fully used.
>>>>
>>>> I thought that if there is one valid block located in the range of
>>>> [usable_blks_in_seg, blks_per_seg] (after zone-capacity), the
>>>> condition will be incorrect. That should never happen, right?
>>> Yes, this will never happen. All blocks after zone-capacity are never usable.
>>>>
>>>> If so, how about adjusting check_block_count() to do sanity check on
>>>> bitmap locates after zone-capacity to make sure there is no free slots there.
>>>
>>> Ok, I will add this check in check_block_count. It makes sense.
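The sanity check agreed on above, that the segment's validity bitmap must have no bits set at or beyond the usable range, could look roughly like the following sketch. The helper name and the LSB-first bit layout within each byte are illustrative, not f2fs's exact check_block_count() implementation.

```c
#include <assert.h>

/*
 * Return 1 if no block in [usable_blks, blks_per_seg) is marked valid,
 * i.e. no valid block lies past the zone-capacity boundary.
 */
static int no_valid_blocks_beyond(const unsigned char *valid_map,
				  unsigned int usable_blks,
				  unsigned int blks_per_seg)
{
	for (unsigned int b = usable_blks; b < blks_per_seg; b++)
		if (valid_map[b >> 3] & (1u << (b & 7)))
			return 0; /* corruption: valid block past capacity */
	return 1;
}
```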
>>>
>>>>
>>>>> Sorry, but could you please share why cur_valid_map needs to be traversed ?
>>>>>
>>>>>>
>>>>>> One question, if we select dirty segment which across zone-capacity
>>>>>> as opened segment (in curseg), how can we avoid allocating usable
>>>>>> block beyong zone-capacity in such segment via .cur_valid_map?
>>>>> For zoned devices, we have to allocate blocks sequentially, so it's
>>>>> always in LFS
>>>> manner it is allocated.
>>>>> The __has_curseg_space() checks for the usable blocks and stops
>>>>> allocating blocks
>>>> after zone-capacity.
>>>>
>>>> Oh, that was implemented in patch 2, I haven't checked that
>>>> patch...sorry, however, IMO, patch should be made to apply
>>>> independently, what if do allocation only after applying patch 1..., do we need to
>> merge them into one?
>>> The patches were split keeping in mind that all data structure related
>>> and initialization Changes would go into patch 1 and IO path and GC related
>> changes in patch 2.
>>> But if you think merging them into a single patch will be easier to
>>> review,
>>
>> Yes, please, it's not only about easier review, but also for better maintenance of
>> patches in upstream, otherwise, it's not possible to apply, backport, revert one of
>> two patches independently.
>>
>> I still didn't get the full picture of using such a zns device which has a configured zone-capacity, is it like?
>> 1. configure zone-capacity in zns device
>> 2. mkfs.f2fs zns device
>> 3. mount zns device
> 
> Zone-capacity is set by the device vendor. It could be the same as the zone-size, or less than the
> zone-size, depending on the vendor. It cannot be configured by the user, so step 1 is not possible.
> Since NVMe ZNS device zones are sequential-write-only, we need another zoned device with
> conventional zones, or any normal block device, for the metadata operations of f2fs.
> I have provided some more explanation in the cover letter of the kernel patch set on this.
> Step 2 is mkfs.f2fs zns device + block device (mkfs.f2fs -m -c /dev/nvme0n1 /dev/nullb1)
> 
> A typical nvme-cli output of a zoned device shows zone start and capacity and write pointer as below:
> 
> SLBA: 0x0             WP: 0x0             Cap: 0x18800    State: EMPTY        Type: SEQWRITE_REQ   Attrs: 0x0
> SLBA: 0x20000    WP: 0x20000    Cap: 0x18800    State: EMPTY        Type: SEQWRITE_REQ   Attrs: 0x0
> SLBA: 0x40000    WP: 0x40000    Cap: 0x18800    State: EMPTY        Type: SEQWRITE_REQ   Attrs: 0x0
> 
> Here the zone size is 64MB, the capacity is 49MB, and the WP is at the zone start since the zone is
> empty. For each zone, only the range from the zone start up to 49MB is usable; any LBA/sector beyond
> that cannot be read or written, and the drive will fail any such attempt. So the second zone starts at
> 64MB and is usable up to 113MB (64 + 49), and the range between 113MB and 128MB is again unusable.
> The next zone starts at 128MB, and so on.
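The layout described above can be modeled with a short standalone sketch. Helper names and constants here are illustrative (the kernel versions are f2fs_usable_zone_blks_in_seg() and f2fs_usable_zone_segs_in_sec()); it assumes 512-byte LBAs, 4 KiB blocks, and 2 MiB segments, so the 64 MiB zone is 16384 blocks and the 49 MiB capacity is 12544 blocks.

```c
#include <assert.h>

#define BLKS_PER_SEG     512U /* 2 MiB segment / 4 KiB block */
#define LOG_BLKS_PER_SEG 9U

/*
 * Usable blocks in the segment whose first block is seg_start
 * (zone-relative), given the zone capacity in blocks.
 */
static unsigned int usable_blks_in_seg(unsigned int zone_cap_blks,
				       unsigned int seg_start)
{
	if (seg_start >= zone_cap_blks)
		return 0;                         /* wholly past capacity */
	if (seg_start + BLKS_PER_SEG > zone_cap_blks)
		return zone_cap_blks - seg_start; /* spans the boundary */
	return BLKS_PER_SEG;                      /* fully usable */
}

/*
 * Usable segments in a section (== zone): total segments minus those
 * wholly past the capacity, using the quoted patch's shift-based math.
 */
static unsigned int usable_segs_in_sec(unsigned int zone_blks,
				       unsigned int zone_cap_blks,
				       unsigned int segs_per_sec)
{
	unsigned int unusable = (zone_blks - zone_cap_blks) >> LOG_BLKS_PER_SEG;

	return segs_per_sec - unusable;
}
```

Note that the shift rounds the unusable-segment count down, which is exactly what keeps a segment spanning the capacity boundary counted as usable.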

Thanks for the detailed explanation, more clear now. :)

Could you please add the above description into the commit message of your kernel patch?
And please also consider adding a simple introduction to f2fs ZNS device support
into f2fs.rst for our users?

Thanks,

> 
>>
>> Can we change zone-capacity dynamically after step 2? Or should we run mkfs.f2fs
>> again whenever the zone-capacity is updated?
> User cannot change zone-capacity dynamically. It is device dependent.
>>
>> Thanks,
>>
>>> then I shall merge it and send it as one patch in V2, along with other suggestions
>> incorporated.
>>>
>>> Please let me know.
>>>>
>>>>>>
>>>>>>>  			continue;
>>>>>>> -		if (valid_blocks > sbi->blocks_per_seg) {
>>>>>>> +		if (valid_blocks > usable_blks_in_seg) {
>>>>>>>  			f2fs_bug_on(sbi, 1);
>>>>>>>  			continue;
>>>>>>>  		}
>>>>>>> @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct
>>>>>>> f2fs_sb_info *sbi)
>>>>>>>
>>>>>>>  	return 0;
>>>>>>>  }
>>>>>>> +
>>>>>>> +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
>>>>>>> +						unsigned int dev_idx)
>>>>>>> +{
>>>>>>> +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
>>>>>>> +		return true;
>>>>>>> +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Return the zone index in the given device */
>>>>>>> +static unsigned int get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
>>>>>>> +					int dev_idx)
>>>>>>> +{
>>>>>>> +	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
>>>>>>> +
>>>>>>> +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
>>>>>>> +						sbi->log_blocks_per_blkz;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Return the usable segments in a section based on the zone's
>>>>>>> + * corresponding zone capacity. Zone is equal to a section.
>>>>>>> + */
>>>>>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(
>>>>>>> +		struct f2fs_sb_info *sbi, unsigned int segno)
>>>>>>> +{
>>>>>>> +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
>>>>>>> +
>>>>>>> +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
>>>>>>> +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno), dev_idx);
>>>>>>> +
>>>>>>> +	/* Conventional zone's capacity is always equal to zone size */
>>>>>>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
>>>>>>> +		return sbi->segs_per_sec;
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * If the zone_capacity_blocks array is NULL, then zone capacity
>>>>>>> +	 * is equal to the zone size for all zones
>>>>>>> +	 */
>>>>>>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
>>>>>>> +		return sbi->segs_per_sec;
>>>>>>> +
>>>>>>> +	/* Get the segment count beyond zone capacity block */
>>>>>>> +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
>>>>>>> +			FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
>>>>>>> +				sbi->log_blocks_per_seg;
>>>>>>> +	return sbi->segs_per_sec - unusable_segs_in_sec;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Return the number of usable blocks in a segment. The number of blocks
>>>>>>> + * returned is always equal to the number of blocks in a segment for
>>>>>>> + * segments fully contained within a sequential zone capacity or a
>>>>>>> + * conventional zone. For segments partially contained in a sequential
>>>>>>> + * zone capacity, the number of usable blocks up to the zone capacity
>>>>>>> + * is returned. 0 is returned in all other cases.
>>>>>>> + */
>>>>>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(
>>>>>>> +			struct f2fs_sb_info *sbi, unsigned int segno)
>>>>>>> +{
>>>>>>> +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
>>>>>>> +	unsigned int zone_idx, dev_idx, secno;
>>>>>>> +
>>>>>>> +	secno = GET_SEC_FROM_SEG(sbi, segno);
>>>>>>> +	seg_start = START_BLOCK(sbi, segno);
>>>>>>> +	dev_idx = f2fs_target_device_index(sbi, seg_start);
>>>>>>> +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * Conventional zone's capacity is always equal to zone size,
>>>>>>> +	 * so, blocks per segment is unchanged.
>>>>>>> +	 */
>>>>>>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
>>>>>>> +		return sbi->blocks_per_seg;
>>>>>>> +
>>>>>>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
>>>>>>> +		return sbi->blocks_per_seg;
>>>>>>> +
>>>>>>> +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
>>>>>>> +	sec_cap_blkaddr = sec_start_blkaddr +
>>>>>>> +			FDEV(dev_idx).zone_capacity_blocks[zone_idx];
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * If segment starts before zone capacity and spans beyond
>>>>>>> +	 * zone capacity, then usable blocks are from seg start to
>>>>>>> +	 * zone capacity. If the segment starts after the zone capacity,
>>>>>>> +	 * then there are no usable blocks.
>>>>>>> +	 */
>>>>>>> +	if (seg_start >= sec_cap_blkaddr)
>>>>>>> +		return 0;
>>>>>>> +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
>>>>>>> +		return sec_cap_blkaddr - seg_start;
>>>>>>> +
>>>>>>> +	return sbi->blocks_per_seg;
>>>>>>> +}
>>>>>>>  #else
>>>>>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
>>>>>>>  {
>>>>>>> @@ -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
>>>>>>>  {
>>>>>>>  	return 0;
>>>>>>>  }
>>>>>>> +
>>>>>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
>>>>>>> +							unsigned int segno)
>>>>>>> +{
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
>>>>>>> +							unsigned int segno)
>>>>>>> +{
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>>  #endif
>>>>>>> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
>>>>>>> +					unsigned int segno)
>>>>>>> +{
>>>>>>> +	if (f2fs_sb_has_blkzoned(sbi))
>>>>>>> +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
>>>>>>> +
>>>>>>> +	return sbi->blocks_per_seg;
>>>>>>> +}
>>>>>>> +
>>>>>>> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
>>>>>>> +					unsigned int segno)
>>>>>>> +{
>>>>>>> +	if (f2fs_sb_has_blkzoned(sbi))
>>>>>>> +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
>>>>>>> +
>>>>>>> +	return sbi->segs_per_sec;
>>>>>>> +}
>>>>>>>
>>>>>>>  /*
>>>>>>>   * Update min, max modified time for cost-benefit GC algorithm
>>>>>>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index
>>>>>>> f261e3e6a69b..79b0dc33feaf 100644
>>>>>>> --- a/fs/f2fs/segment.h
>>>>>>> +++ b/fs/f2fs/segment.h
>>>>>>> @@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
>>>>>>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
>>>>>>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
>>>>>>>  	unsigned int next;
>>>>>>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
>>>>>>>
>>>>>>>  	spin_lock(&free_i->segmap_lock);
>>>>>>>  	clear_bit(segno, free_i->free_segmap);
>>>>>>> @@ -418,7 +419,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
>>>>>>>
>>>>>>>  	next = find_next_bit(free_i->free_segmap,
>>>>>>>  			start_segno + sbi->segs_per_sec, start_segno);
>>>>>>> -	if (next >= start_segno + sbi->segs_per_sec) {
>>>>>>> +	if (next >= start_segno + usable_segs) {
>>>>>>>  		clear_bit(secno, free_i->free_secmap);
>>>>>>>  		free_i->free_sections++;
>>>>>>>  	}
>>>>>>> @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
>>>>>>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
>>>>>>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
>>>>>>>  	unsigned int next;
>>>>>>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
>>>>>>>
>>>>>>>  	spin_lock(&free_i->segmap_lock);
>>>>>>>  	if (test_and_clear_bit(segno, free_i->free_segmap)) {
>>>>>>> @@ -453,7 +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
>>>>>>>  			goto skip_free;
>>>>>>>  		next = find_next_bit(free_i->free_segmap,
>>>>>>>  				start_segno + sbi->segs_per_sec, start_segno);
>>>>>>> -		if (next >= start_segno + sbi->segs_per_sec) {
>>>>>>> +		if (next >= start_segno + usable_segs) {
>>>>>>>  			if (test_and_clear_bit(secno, free_i->free_secmap))
>>>>>>>  				free_i->free_sections++;
>>>>>>>  		}
>>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index
>>>>>>> 80cb7cd358f8..2686b07ae7eb 100644
>>>>>>> --- a/fs/f2fs/super.c
>>>>>>> +++ b/fs/f2fs/super.c
>>>>>>> @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
>>>>>>>  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
>>>>>>>  #ifdef CONFIG_BLK_DEV_ZONED
>>>>>>>  		kvfree(FDEV(i).blkz_seq);
>>>>>>> +		kvfree(FDEV(i).zone_capacity_blocks);
>>>>>>
>>>>>> Now, f2fs_kzalloc won't allocate vmalloc's memory, so it's safe to use kfree().
>>>>> Ok
>>>>>>
>>>>>>>  #endif
>>>>>>>  	}
>>>>>>>  	kvfree(sbi->devs);
>>>>>>> @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct f2fs_sb_info *sbi)
>>>>>>>  }
>>>>>>>
>>>>>>>  #ifdef CONFIG_BLK_DEV_ZONED
>>>>>>> +
>>>>>>> +struct f2fs_report_zones_args {
>>>>>>> +	struct f2fs_dev_info *dev;
>>>>>>> +	bool zone_cap_mismatch;
>>>>>>> +};
>>>>>>> +
>>>>>>>  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
>>>>>>> -			       void *data)
>>>>>>> +			      void *data)
>>>>>>>  {
>>>>>>> -	struct f2fs_dev_info *dev = data;
>>>>>>> +	struct f2fs_report_zones_args *rz_args = data;
>>>>>>> +
>>>>>>> +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
>>>>>>> +		return 0;
>>>>>>> +
>>>>>>> +	set_bit(idx, rz_args->dev->blkz_seq);
>>>>>>> +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
>>>>>>> +					F2FS_LOG_SECTORS_PER_BLOCK;
>>>>>>> +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
>>>>>>> +		rz_args->zone_cap_mismatch = true;
>>>>>>>
>>>>>>> -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
>>>>>>> -		set_bit(idx, dev->blkz_seq);
>>>>>>>  	return 0;
>>>>>>>  }
>>>>>>>
>>>>>>> @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>>>>>>>  {
>>>>>>>  	struct block_device *bdev = FDEV(devi).bdev;
>>>>>>>  	sector_t nr_sectors = bdev->bd_part->nr_sects;
>>>>>>> +	struct f2fs_report_zones_args rep_zone_arg;
>>>>>>>  	int ret;
>>>>>>>
>>>>>>>  	if (!f2fs_sb_has_blkzoned(sbi))
>>>>>>> @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>>>>>>>  	if (!FDEV(devi).blkz_seq)
>>>>>>>  		return -ENOMEM;
>>>>>>>
>>>>>>> -	/* Get block zones type */
>>>>>>> +	/* Get block zones type and zone-capacity */
>>>>>>> +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
>>>>>>> +					FDEV(devi).nr_blkz * sizeof(block_t),
>>>>>>> +					GFP_KERNEL);
>>>>>>> +	if (!FDEV(devi).zone_capacity_blocks)
>>>>>>> +		return -ENOMEM;
>>>>>>> +
>>>>>>> +	rep_zone_arg.dev = &FDEV(devi);
>>>>>>> +	rep_zone_arg.zone_cap_mismatch = false;
>>>>>>> +
>>>>>>>  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
>>>>>>> -				  &FDEV(devi));
>>>>>>> +				  &rep_zone_arg);
>>>>>>>  	if (ret < 0)
>>>>>>>  		return ret;
>>>>>>
>>>>>> Missed to call kfree(FDEV(devi).zone_capacity_blocks)?
>>>>> Thanks for catching it. Will free it here also.
>>>>>>
>>>>>>>
>>>>>>> +	if (!rep_zone_arg.zone_cap_mismatch) {
>>>>>>> +		kvfree(FDEV(devi).zone_capacity_blocks);
>>>>>>
>>>>>> Ditto, kfree().
>>>>> Ok.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>> +		FDEV(devi).zone_capacity_blocks = NULL;
>>>>>>> +	}
>>>>>>> +
>>>>>>>  	return 0;
>>>>>>>  }
>>>>>>>  #endif
>>>>>>>
>>>>> .
>>>>>
>>> .
>>>
> .
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-09  7:05               ` Chao Yu
@ 2020-07-09  7:11                 ` Aravind Ramesh
  2020-07-10 17:44                   ` Aravind Ramesh
  0 siblings, 1 reply; 16+ messages in thread
From: Aravind Ramesh @ 2020-07-09  7:11 UTC (permalink / raw)
  To: Chao Yu, jaegeuk, linux-fsdevel, linux-f2fs-devel, hch
  Cc: Niklas Cassel, Damien Le Moal, Matias Bjorling

Comments inline.

Thanks,
Aravind

> -----Original Message-----
> From: Chao Yu <yuchao0@huawei.com>
> Sent: Thursday, July 9, 2020 12:35 PM
> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org; linux-
> fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone size
> 
> On 2020/7/9 13:31, Aravind Ramesh wrote:
> > Please find my response inline.
> >
> > Thanks,
> > Aravind
> >
> >> -----Original Message-----
> >> From: Chao Yu <yuchao0@huawei.com>
> >> Sent: Thursday, July 9, 2020 8:26 AM
> >> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org;
> >> linux- fsdevel@vger.kernel.org;
> >> linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
> >> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
> >> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> >> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone
> >> size
> >>
> >> On 2020/7/8 21:04, Aravind Ramesh wrote:
> >>> Please find my response inline.
> >>>
> >>> Thanks,
> >>> Aravind
> > [snip..]
> >>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index
> >>>>>>> c35614d255e1..d2156f3f56a5 100644
> >>>>>>> --- a/fs/f2fs/segment.c
> >>>>>>> +++ b/fs/f2fs/segment.c
> >>>>>>> @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct
> >>>>>>> f2fs_sb_info *sbi)  {
> >>>>>>>  	unsigned int start;
> >>>>>>>  	int type;
> >>>>>>> +	struct seg_entry *sentry;
> >>>>>>>
> >>>>>>>  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
> >>>>>>> -		struct seg_entry *sentry = get_seg_entry(sbi, start);
> >>>>>>> +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
> >>>>>>
> >>>>>> If usable blocks count is zero, shouldn't we update
> >>>>>> SIT_I(sbi)->written_valid_blocks as we did when there is partial
> >>>>>> usable block in
> >>>> current segment?
> >>>>> If usable_block_count is zero, then it is like a dead segment, all
> >>>>> blocks in the segment lie after the zone-capacity in the zone. So
> >>>>> there can never be
> >>>> a valid written content on these segments, hence it is not updated.
> >>>>> In the other case, when a segment start before the zone-capacity
> >>>>> and it ends beyond zone-capacity, then there are some blocks
> >>>>> before zone-capacity
> >>>> which can be used, so they are accounted for.
> >>>>
> >>>> I'm thinking that the limit_free_user_blocks() function assumes
> >>>> all unwritten blocks are potentially reclaimable; however,
> >>>> segments after zone-capacity should never be used or reclaimed, so
> >>>> it looks like the calculation could
> >> be incorrect here.
> >>>>
> >>> The sbi->user_block_count is updated with the total usable_blocks in
> >>> the full file system during the formatting of the file system using
> >>> mkfs.f2fs. Please see the f2fs-tools patch series that I have
> >>> submitted along with
> >> this patch set.
> >>>
> >>> So sbi->user_block_count reflects the actual number of usable blocks
> >>> (i.e. total
> >> blocks - unusable blocks).
> >>
> >> Alright, will check both kernel and f2fs-tools change again later. :)
> >>
> >>>
> >>>> static inline block_t limit_free_user_blocks(struct f2fs_sb_info *sbi) {
> >>>> 	block_t reclaimable_user_blocks = sbi->user_block_count -
> >>>> 		written_block_count(sbi);
> >>>> 	return (long)(reclaimable_user_blocks * LIMIT_FREE_BLOCK) / 100; }
> >>>>
> >>>> static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi) {
> >>>> 	block_t invalid_user_blocks = sbi->user_block_count -
> >>>> 					written_block_count(sbi);
> >>>> 	/*
> >>>> 	 * Background GC is triggered with the following conditions.
> >>>> 	 * 1. There are a number of invalid blocks.
> >>>> 	 * 2. There is not enough free space.
> >>>> 	 */
> >>>> 	if (invalid_user_blocks > limit_invalid_user_blocks(sbi) &&
> >>>> 			free_user_blocks(sbi) < limit_free_user_blocks(sbi))
> >>>>
> >>>> -- In this condition, free_user_blocks() doesn't include segments
> >>>> after zone-capacity, however limit_free_user_blocks() includes them.
> >>> In the second patch of this patch set, free_user_blocks is updated
> >>> to account for
> >> the segments after zone-capacity.
> >>> It basically gets the free segment (segments before zone capacity and
> >>> free) block count and deducts the overprovision segment block count.
> >>> It also
> >> takes the spanning segments' block count into account.
> >>
> >> Okay.
> >>
> >>>
> >>>
> >>>>
> >>>> 		return true;
> >>>> 	return false;
> >>>> }
> >>>>
> >>>>
> >>>>>>
> >>>>>>> +			continue;
> >>>>>>> +		sentry = get_seg_entry(sbi, start);
> >>>>>>>  		if (!sentry->valid_blocks)
> >>>>>>>  			__set_free(sbi, start);
> >>>>>>>  		else
> >>>>>>> @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
> >>>>>>>  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
> >>>>>>>  	struct free_segmap_info *free_i = FREE_I(sbi);
> >>>>>>>  	unsigned int segno = 0, offset = 0, secno;
> >>>>>>> -	unsigned short valid_blocks;
> >>>>>>> +	unsigned short valid_blocks, usable_blks_in_seg;
> >>>>>>>  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
> >>>>>>>
> >>>>>>>  	while (1) {
> >>>>>>> @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
> >>>>>>>  			break;
> >>>>>>>  		offset = segno + 1;
> >>>>>>>  		valid_blocks = get_valid_blocks(sbi, segno, false);
> >>>>>>> -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
> >>>>>>> +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
> >>>>>>> +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
> >>>>>>
> >>>>>> It needs to traverse the .cur_valid_map bitmap to check whether the
> >>>>>> blocks in the range [0, usable_blks_in_seg] are all valid or not;
> >>>>>> if there is at least one usable block in the range, the segment should be dirty.
> >>>>> Segments which start and end before zone-capacity are just
> >>>>> like any
> >>>> normal segments.
> >>>>> Segments which start after the zone-capacity are fully unusable
> >>>>> and are marked as
> >>>> used in the free_seg_bitmap, so these segments are never used.
> >>>>> Segments which span across the zone-capacity have some unusable
> >>>>> blocks. Even
> >>>> when blocks from these segments are allocated/deallocated the
> >>>> valid_blocks counter is incremented/decremented, reflecting the
> >>>> current
> >> valid_blocks count.
> >>>>> Comparing valid_blocks count with usable_blocks count in the
> >>>>> segment can
> >>>> indicate if the segment is dirty or fully used.
> >>>>
> >>>> I thought that if there is one valid block located in the range
> >>>> [usable_blks_in_seg, blks_per_seg] (after zone-capacity), the
> >>>> condition would be incorrect. That should never happen, right?
> >>> Yes, this will never happen. All blocks after zone-capacity are never usable.
> >>>>
> >>>> If so, how about adjusting check_block_count() to do a sanity check
> >>>> on the bitmap area after zone-capacity to make sure there are no free slots there.
> >>>
> >>> Ok, I will add this check in check_block_count. It makes sense.
> >>>
> >>>>
> >>>>> Sorry, but could you please share why cur_valid_map needs to be traversed?
> >>>>>
> >>>>>>
> >>>>>> One question: if we select a dirty segment which spans across the
> >>>>>> zone-capacity as the opened segment (in curseg), how can we avoid
> >>>>>> allocating a usable block beyond the zone-capacity in such a segment via
> >>>>>> .cur_valid_map?
> >>>>> For zoned devices, we have to allocate blocks sequentially, so
> >>>>> allocation is always done in LFS manner.
> >>>>> The __has_curseg_space() check accounts for the usable blocks and stops
> >>>>> allocating blocks
> >>>> after zone-capacity.
> >>>>
> >>>> Oh, that was implemented in patch 2; I haven't checked that
> >>>> patch... sorry. However, IMO, the patches should be made to apply
> >>>> independently. What if allocation happens after applying only patch
> >>>> 1...? Do we need to
> >> merge them into one?
> >>> The patches were split keeping in mind that all data-structure-related
> >>> and initialization changes would go into patch 1, and I/O path and
> >>> GC-related
> >> changes into patch 2.
> >>> But if you think merging them into a single patch will be easier to
> >>> review,
> >>
> >> Yes, please, it's not only about easier review, but also for better
> >> maintenance of patches in upstream, otherwise, it's not possible to
> >> apply, backport, revert one of two patches independently.
> >>
> >> I still didn't get the full picture of using such a ZNS device which has
> >> a configured zone-capacity. Is it like:
> >> 1. configure zone-capacity in the ZNS device
> >> 2. mkfs.f2fs the ZNS device
> >> 3. mount the ZNS device
> >
> > Zone-capacity is set by the device vendor. It could be the same as
> > zone-size or less than zone-size, depending on the vendor. It cannot be
> > configured by the user, so step 1 is not possible.
> > Since NVMe ZNS device zones are sequential-write-only, we need another
> > zoned device with conventional zones, or any normal block device, for the
> > metadata operations of f2fs. I have provided some more explanation on this
> > in the cover letter of the kernel patch set.
> > Step 2 is mkfs.f2fs zns device + block device (mkfs.f2fs -m -c
> > /dev/nvme0n1 /dev/nullb1)
> >
> > A typical nvme-cli output of a zoned device shows zone start, capacity, and
> > write pointer as below:
> >
> > SLBA: 0x0        WP: 0x0        Cap: 0x18800    State: EMPTY    Type: SEQWRITE_REQ    Attrs: 0x0
> > SLBA: 0x20000    WP: 0x20000    Cap: 0x18800    State: EMPTY    Type: SEQWRITE_REQ    Attrs: 0x0
> > SLBA: 0x40000    WP: 0x40000    Cap: 0x18800    State: EMPTY    Type: SEQWRITE_REQ    Attrs: 0x0
> >
> > Here the zone size is 64MB and the capacity is 49MB; the WP is at the zone
> > start as the zone is empty. For each zone, only zone start + 49MB is
> > usable area; any LBA/sector after 49MB cannot be read or written, and the
> > drive will fail any such attempt. So the second zone starts at 64MB and is
> > usable till 113MB (64 + 49), and the range between 113MB and 128MB is
> > again unusable. The next zone starts at 128MB, and so on.
> 
> Thanks for the detailed explanation, more clear now. :)
> 
> Could you please add the above description into the commit message of your
> kernel patch? And also, please consider adding a simple introduction of f2fs
> ZNS device support into f2fs.rst for our users?

Sure :).
I will update f2fs.rst and the patch commit message in V2.
Thank you.
> 
> Thanks,
> 
> >
> >>
> >> Can we change the zone-capacity dynamically after step 2? Or should we
> >> run mkfs.f2fs again whenever the zone-capacity is updated?
> > User cannot change zone-capacity dynamically. It is device dependent.
> >>
> >> Thanks,
> >>
> >>> then I shall merge it and send it as one patch in V2, along with
> >>> other suggestions
> >> incorporated.
> >>>
> >>> Please let me know.
> >>>>
> >>>>>>
> >>>>>>>  			continue;
> >>>>>>> -		if (valid_blocks > sbi->blocks_per_seg) {
> >>>>>>> +		if (valid_blocks > usable_blks_in_seg) {
> >>>>>>>  			f2fs_bug_on(sbi, 1);
> >>>>>>>  			continue;
> >>>>>>>  		}
> >>>>>>> @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> >>>>>>>
> >>>>>>>  	return 0;
> >>>>>>>  }
> >>>>>>> +
> >>>>>>> +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
> >>>>>>> +						unsigned int dev_idx)
> >>>>>>> +{
> >>>>>>> +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
> >>>>>>> +		return true;
> >>>>>>> +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +/* Return the zone index in the given device */
> >>>>>>> +static unsigned int get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
> >>>>>>> +					int dev_idx)
> >>>>>>> +{
> >>>>>>> +	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> >>>>>>> +
> >>>>>>> +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
> >>>>>>> +						sbi->log_blocks_per_blkz;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +/*
> >>>>>>> + * Return the usable segments in a section based on the zone's
> >>>>>>> + * corresponding zone capacity. Zone is equal to a section.
> >>>>>>> + */
> >>>>>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(
> >>>>>>> +		struct f2fs_sb_info *sbi, unsigned int segno)
> >>>>>>> +{
> >>>>>>> +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
> >>>>>>> +
> >>>>>>> +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
> >>>>>>> +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno),
> >>>>>>> +dev_idx);
> >>>>>>> +
> >>>>>>> +	/* Conventional zone's capacity is always equal to zone size */
> >>>>>>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> >>>>>>> +		return sbi->segs_per_sec;
> >>>>>>> +
> >>>>>>> +	/*
> >>>>>>> +	 * If the zone_capacity_blocks array is NULL, then zone capacity
> >>>>>>> +	 * is equal to the zone size for all zones
> >>>>>>> +	 */
> >>>>>>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
> >>>>>>> +		return sbi->segs_per_sec;
> >>>>>>> +
> >>>>>>> +	/* Get the segment count beyond zone capacity block */
> >>>>>>> +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
> >>>>>>> +			FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
> >>>>>>> +				sbi->log_blocks_per_seg;
> >>>>>>> +	return sbi->segs_per_sec - unusable_segs_in_sec;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +/*
> >>>>>>> + * Return the number of usable blocks in a segment. The number of blocks
> >>>>>>> + * returned is always equal to the number of blocks in a segment for
> >>>>>>> + * segments fully contained within a sequential zone capacity or a
> >>>>>>> + * conventional zone. For segments partially contained in a sequential
> >>>>>>> + * zone capacity, the number of usable blocks up to the zone capacity
> >>>>>>> + * is returned. 0 is returned in all other cases.
> >>>>>>> + */
> >>>>>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(
> >>>>>>> +			struct f2fs_sb_info *sbi, unsigned int segno)
> >>>>>>> +{
> >>>>>>> +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
> >>>>>>> +	unsigned int zone_idx, dev_idx, secno;
> >>>>>>> +
> >>>>>>> +	secno = GET_SEC_FROM_SEG(sbi, segno);
> >>>>>>> +	seg_start = START_BLOCK(sbi, segno);
> >>>>>>> +	dev_idx = f2fs_target_device_index(sbi, seg_start);
> >>>>>>> +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
> >>>>>>> +
> >>>>>>> +	/*
> >>>>>>> +	 * Conventional zone's capacity is always equal to zone size,
> >>>>>>> +	 * so, blocks per segment is unchanged.
> >>>>>>> +	 */
> >>>>>>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> >>>>>>> +		return sbi->blocks_per_seg;
> >>>>>>> +
> >>>>>>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
> >>>>>>> +		return sbi->blocks_per_seg;
> >>>>>>> +
> >>>>>>> +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> >>>>>>> +	sec_cap_blkaddr = sec_start_blkaddr +
> >>>>>>> +			FDEV(dev_idx).zone_capacity_blocks[zone_idx];
> >>>>>>> +
> >>>>>>> +	/*
> >>>>>>> +	 * If segment starts before zone capacity and spans beyond
> >>>>>>> +	 * zone capacity, then usable blocks are from seg start to
> >>>>>>> +	 * zone capacity. If the segment starts after the zone capacity,
> >>>>>>> +	 * then there are no usable blocks.
> >>>>>>> +	 */
> >>>>>>> +	if (seg_start >= sec_cap_blkaddr)
> >>>>>>> +		return 0;
> >>>>>>> +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
> >>>>>>> +		return sec_cap_blkaddr - seg_start;
> >>>>>>> +
> >>>>>>> +	return sbi->blocks_per_seg;
> >>>>>>> +}
> >>>>>>>  #else
> >>>>>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> >>>>>>>  {
> >>>>>>> @@ -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> >>>>>>>  {
> >>>>>>>  	return 0;
> >>>>>>>  }
> >>>>>>> +
> >>>>>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
> >>>>>>> +							unsigned int segno)
> >>>>>>> +{
> >>>>>>> +	return 0;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
> >>>>>>> +							unsigned int segno)
> >>>>>>> +{
> >>>>>>> +	return 0;
> >>>>>>> +}
> >>>>>>>  #endif
> >>>>>>> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> >>>>>>> +					unsigned int segno)
> >>>>>>> +{
> >>>>>>> +	if (f2fs_sb_has_blkzoned(sbi))
> >>>>>>> +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
> >>>>>>> +
> >>>>>>> +	return sbi->blocks_per_seg;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> >>>>>>> +					unsigned int segno)
> >>>>>>> +{
> >>>>>>> +	if (f2fs_sb_has_blkzoned(sbi))
> >>>>>>> +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
> >>>>>>> +
> >>>>>>> +	return sbi->segs_per_sec;
> >>>>>>> +}
> >>>>>>>
> >>>>>>>  /*
> >>>>>>>   * Update min, max modified time for cost-benefit GC algorithm
> >>>>>>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index
> >>>>>>> f261e3e6a69b..79b0dc33feaf 100644
> >>>>>>> --- a/fs/f2fs/segment.h
> >>>>>>> +++ b/fs/f2fs/segment.h
> >>>>>>> @@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> >>>>>>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> >>>>>>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> >>>>>>>  	unsigned int next;
> >>>>>>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> >>>>>>>
> >>>>>>>  	spin_lock(&free_i->segmap_lock);
> >>>>>>>  	clear_bit(segno, free_i->free_segmap);
> >>>>>>> @@ -418,7 +419,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> >>>>>>>
> >>>>>>>  	next = find_next_bit(free_i->free_segmap,
> >>>>>>>  			start_segno + sbi->segs_per_sec, start_segno);
> >>>>>>> -	if (next >= start_segno + sbi->segs_per_sec) {
> >>>>>>> +	if (next >= start_segno + usable_segs) {
> >>>>>>>  		clear_bit(secno, free_i->free_secmap);
> >>>>>>>  		free_i->free_sections++;
> >>>>>>>  	}
> >>>>>>> @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> >>>>>>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> >>>>>>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> >>>>>>>  	unsigned int next;
> >>>>>>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> >>>>>>>
> >>>>>>>  	spin_lock(&free_i->segmap_lock);
> >>>>>>>  	if (test_and_clear_bit(segno, free_i->free_segmap)) {
> >>>>>>> @@ -453,7 +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> >>>>>>>  			goto skip_free;
> >>>>>>>  		next = find_next_bit(free_i->free_segmap,
> >>>>>>>  				start_segno + sbi->segs_per_sec, start_segno);
> >>>>>>> -		if (next >= start_segno + sbi->segs_per_sec) {
> >>>>>>> +		if (next >= start_segno + usable_segs) {
> >>>>>>>  			if (test_and_clear_bit(secno, free_i->free_secmap))
> >>>>>>>  				free_i->free_sections++;
> >>>>>>>  		}
> >>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index
> >>>>>>> 80cb7cd358f8..2686b07ae7eb 100644
> >>>>>>> --- a/fs/f2fs/super.c
> >>>>>>> +++ b/fs/f2fs/super.c
> >>>>>>> @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
> >>>>>>>  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
> >>>>>>>  #ifdef CONFIG_BLK_DEV_ZONED
> >>>>>>>  		kvfree(FDEV(i).blkz_seq);
> >>>>>>> +		kvfree(FDEV(i).zone_capacity_blocks);
> >>>>>>
> >>>>>> Now, f2fs_kzalloc won't allocate vmalloc memory, so it's safe to use kfree().
> >>>>> Ok
> >>>>>>
> >>>>>>>  #endif
> >>>>>>>  	}
> >>>>>>>  	kvfree(sbi->devs);
> >>>>>>> @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct f2fs_sb_info *sbi)
> >>>>>>>  }
> >>>>>>>
> >>>>>>>  #ifdef CONFIG_BLK_DEV_ZONED
> >>>>>>> +
> >>>>>>> +struct f2fs_report_zones_args {
> >>>>>>> +	struct f2fs_dev_info *dev;
> >>>>>>> +	bool zone_cap_mismatch;
> >>>>>>> +};
> >>>>>>> +
> >>>>>>>  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> >>>>>>> -			       void *data)
> >>>>>>> +			      void *data)
> >>>>>>>  {
> >>>>>>> -	struct f2fs_dev_info *dev = data;
> >>>>>>> +	struct f2fs_report_zones_args *rz_args = data;
> >>>>>>> +
> >>>>>>> +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> >>>>>>> +		return 0;
> >>>>>>> +
> >>>>>>> +	set_bit(idx, rz_args->dev->blkz_seq);
> >>>>>>> +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
> >>>>>>> +					F2FS_LOG_SECTORS_PER_BLOCK;
> >>>>>>> +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
> >>>>>>> +		rz_args->zone_cap_mismatch = true;
> >>>>>>>
> >>>>>>> -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
> >>>>>>> -		set_bit(idx, dev->blkz_seq);
> >>>>>>>  	return 0;
> >>>>>>>  }
> >>>>>>>
> >>>>>>> @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> >>>>>>>  {
> >>>>>>>  	struct block_device *bdev = FDEV(devi).bdev;
> >>>>>>>  	sector_t nr_sectors = bdev->bd_part->nr_sects;
> >>>>>>> +	struct f2fs_report_zones_args rep_zone_arg;
> >>>>>>>  	int ret;
> >>>>>>>
> >>>>>>>  	if (!f2fs_sb_has_blkzoned(sbi))
> >>>>>>> @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> >>>>>>>  	if (!FDEV(devi).blkz_seq)
> >>>>>>>  		return -ENOMEM;
> >>>>>>>
> >>>>>>> -	/* Get block zones type */
> >>>>>>> +	/* Get block zones type and zone-capacity */
> >>>>>>> +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
> >>>>>>> +					FDEV(devi).nr_blkz * sizeof(block_t),
> >>>>>>> +					GFP_KERNEL);
> >>>>>>> +	if (!FDEV(devi).zone_capacity_blocks)
> >>>>>>> +		return -ENOMEM;
> >>>>>>> +
> >>>>>>> +	rep_zone_arg.dev = &FDEV(devi);
> >>>>>>> +	rep_zone_arg.zone_cap_mismatch = false;
> >>>>>>> +
> >>>>>>>  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
> >>>>>>> -				  &FDEV(devi));
> >>>>>>> +				  &rep_zone_arg);
> >>>>>>>  	if (ret < 0)
> >>>>>>>  		return ret;
> >>>>>>
> >>>>>> Missed to call kfree(FDEV(devi).zone_capacity_blocks)?
> >>>>> Thanks for catching it. Will free it here also.
> >>>>>>
> >>>>>>>
> >>>>>>> +	if (!rep_zone_arg.zone_cap_mismatch) {
> >>>>>>> +		kvfree(FDEV(devi).zone_capacity_blocks);
> >>>>>>
> >>>>>> Ditto, kfree().
> >>>>> Ok.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>>> +		FDEV(devi).zone_capacity_blocks = NULL;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>>  	return 0;
> >>>>>>>  }
> >>>>>>>  #endif
> >>>>>>>
> >>>>> .
> >>>>>
> >>> .
> >>>
> > .
> >


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size
  2020-07-09  7:11                 ` Aravind Ramesh
@ 2020-07-10 17:44                   ` Aravind Ramesh
  0 siblings, 0 replies; 16+ messages in thread
From: Aravind Ramesh @ 2020-07-10 17:44 UTC (permalink / raw)
  To: Chao Yu, jaegeuk, linux-fsdevel, linux-f2fs-devel, hch
  Cc: Niklas Cassel, Damien Le Moal, Matias Bjorling

Hello Chao,

I have sent the second version of this patch, with the below changes:
Merged the two patches into one.
Updated the commit message with the nvme-cli output, as you suggested.
Updated check_block_count().
Updated f2fs.rst.
Changed kvfree() to kfree().
Please see inline regarding one comment on kfree().

Please let me know your feedback.

Thanks,
Aravind

> -----Original Message-----
> From: Aravind Ramesh
> Sent: Thursday, July 9, 2020 12:41 PM
> To: Chao Yu <yuchao0@huawei.com>; jaegeuk@kernel.org; linux-
> fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> Subject: RE: [PATCH 1/2] f2fs: support zone capacity less than zone size
> 
> Comments inline.
> 
> Thanks,
> Aravind
> 
> > -----Original Message-----
> > From: Chao Yu <yuchao0@huawei.com>
> > Sent: Thursday, July 9, 2020 12:35 PM
> > To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org;
> > linux- fsdevel@vger.kernel.org;
> > linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
> > Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
> > <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> > Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone
> > size
> >
> > On 2020/7/9 13:31, Aravind Ramesh wrote:
> > > Please find my response inline.
> > >
> > > Thanks,
> > > Aravind
> > >
> > >> -----Original Message-----
> > >> From: Chao Yu <yuchao0@huawei.com>
> > >> Sent: Thursday, July 9, 2020 8:26 AM
> > >> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org;
> > >> linux- fsdevel@vger.kernel.org;
> > >> linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
> > >> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
> > >> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> > >> Subject: Re: [PATCH 1/2] f2fs: support zone capacity less than zone
> > >> size
> > >>
> > >> On 2020/7/8 21:04, Aravind Ramesh wrote:
> > >>> Please find my response inline.
> > >>>
> > >>> Thanks,
> > >>> Aravind
> > > [snip..]
> > >>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index
> > >>>>>>> c35614d255e1..d2156f3f56a5 100644
> > >>>>>>> --- a/fs/f2fs/segment.c
> > >>>>>>> +++ b/fs/f2fs/segment.c
> > >>>>>>> @@ -4294,9 +4294,12 @@ static void init_free_segmap(struct
> > >>>>>>> f2fs_sb_info *sbi)  {
> > >>>>>>>  	unsigned int start;
> > >>>>>>>  	int type;
> > >>>>>>> +	struct seg_entry *sentry;
> > >>>>>>>
> > >>>>>>>  	for (start = 0; start < MAIN_SEGS(sbi); start++) {
> > >>>>>>> -		struct seg_entry *sentry = get_seg_entry(sbi, start);
> > >>>>>>> +		if (f2fs_usable_blks_in_seg(sbi, start) == 0)
> > >>>>>>
> > >>>>>> If usable blocks count is zero, shouldn't we update
> > >>>>>> SIT_I(sbi)->written_valid_blocks as we did when there is
> > >>>>>> partial usable block in
> > >>>> current segment?
> > >>>>> If usable_block_count is zero, then it is like a dead segment,
> > >>>>> all blocks in the segment lie after the zone-capacity in the
> > >>>>> zone. So there can never be
> > >>>> a valid written content on these segments, hence it is not updated.
> > >>>>> In the other case, when a segment start before the zone-capacity
> > >>>>> and it ends beyond zone-capacity, then there are some blocks
> > >>>>> before zone-capacity
> > >>>> which can be used, so they are accounted for.
> > >>>>
> > >>>> I'm thinking that for limit_free_user_blocks() function, it
> > >>>> assumes all unwritten blocks as potential reclaimable blocks,
> > >>>> however segment after zone-capacity should never be used or
> > >>>> reclaimable, it looks calculation could
> > >> be not correct here.
> > >>>>
> > >>> The sbi->user_block_count is updated with the total usable_blocks
> > >>> in the full file system during the formatting of the file system
> > >>> using mkfs.f2fs. Please see the f2fs-tools patch series that I
> > >>> have submitted along with
> > >> this patch set.
> > >>>
> > >>> So sbi->user_block_count reflects the actual number of usable
> > >>> blocks (i.e. total
> > >> blocks - unusable blocks).
> > >>
> > >> Alright, will check both kernel and f2fs-tools change again later.
> > >> :)
> > >>
> > >>>
> > >>>> static inline block_t limit_free_user_blocks(struct f2fs_sb_info *sbi) {
> > >>>> 	block_t reclaimable_user_blocks = sbi->user_block_count -
> > >>>> 		written_block_count(sbi);
> > >>>> 	return (long)(reclaimable_user_blocks * LIMIT_FREE_BLOCK) / 100;
> > >>>> }
> > >>>>
> > >>>> static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi) {
> > >>>> 	block_t invalid_user_blocks = sbi->user_block_count -
> > >>>> 					written_block_count(sbi);
> > >>>> 	/*
> > >>>> 	 * Background GC is triggered with the following conditions.
> > >>>> 	 * 1. There are a number of invalid blocks.
> > >>>> 	 * 2. There is not enough free space.
> > >>>> 	 */
> > >>>> 	if (invalid_user_blocks > limit_invalid_user_blocks(sbi) &&
> > >>>> 			free_user_blocks(sbi) < limit_free_user_blocks(sbi))
> > >>>>
> > >>>> -- In this condition, free_user_blocks() doesn't include segments
> > >>>> after zone-capacity, however limit_free_user_blocks() includes them.
> > >>> In the second patch of this patch set, free_user_blocks is updated
> > >>> to account for
> > >> the segments after zone-capacity.
> > >>> It basically gets the free segment(segments before zone capacity
> > >>> and
> > >>> free) block count and deducts the overprovision segment block count.
> > >>> It also
> > >> considers the spanning segments block count into account.
> > >>
> > >> Okay.
> > >>
> > >>>
> > >>>
> > >>>>
> > >>>> 		return true;
> > >>>> 	return false;
> > >>>> }
> > >>>>
> > >>>>
> > >>>>>>
> > >>>>>>> +			continue;
> > >>>>>>> +		sentry = get_seg_entry(sbi, start);
> > >>>>>>>  		if (!sentry->valid_blocks)
> > >>>>>>>  			__set_free(sbi, start);
> > >>>>>>>  		else
> > >>>>>>> @@ -4316,7 +4319,7 @@ static void init_dirty_segmap(struct
> > >>>>>>> f2fs_sb_info
> > >>>> *sbi)
> > >>>>>>>  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
> > >>>>>>>  	struct free_segmap_info *free_i = FREE_I(sbi);
> > >>>>>>>  	unsigned int segno = 0, offset = 0, secno;
> > >>>>>>> -	unsigned short valid_blocks;
> > >>>>>>> +	unsigned short valid_blocks, usable_blks_in_seg;
> > >>>>>>>  	unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
> > >>>>>>>
> > >>>>>>>  	while (1) {
> > >>>>>>> @@ -4326,9 +4329,10 @@ static void init_dirty_segmap(struct
> > >>>>>>> f2fs_sb_info
> > >>>> *sbi)
> > >>>>>>>  			break;
> > >>>>>>>  		offset = segno + 1;
> > >>>>>>>  		valid_blocks = get_valid_blocks(sbi, segno, false);
> > >>>>>>> -		if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
> > >>>>>>> +		usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
> > >>>>>>> +		if (valid_blocks == usable_blks_in_seg || !valid_blocks)
> > >>>>>>
> > >>>>>> It needs to traverse .cur_valid_map bitmap to check whether
> > >>>>>> blocks in range of [0, usable_blks_in_seg] are all valid or
> > >>>>>> not, if there is at least one usable block in the range, segment should be
> dirty.
> > >>>>> For the segments which start and end before zone-capacity are
> > >>>>> just like any
> > >>>> normal segments.
> > >>>>> Segments which start after the zone-capacity are fully unusable
> > >>>>> and are marked as
> > >>>> used in the free_seg_bitmap, so these segments are never used.
> > >>>>> Segments which span across the zone-capacity have some unusable
> > >>>>> blocks. Even
> > >>>> when blocks from these segments are allocated/deallocated the
> > >>>> valid_blocks counter is incremented/decremented, reflecting the
> > >>>> current
> > >> valid_blocks count.
> > >>>>> Comparing valid_blocks count with usable_blocks count in the
> > >>>>> segment can
> > >>>> indicate if the segment is dirty or fully used.
> > >>>>
> > >>>> I thought that if there is one valid block locates in range of
> > >>>> [usable_blks_in_seg, blks_per_seg] (after zone-capacity), the
> > >>>> condition will be incorrect. That should never happen, right?
> > >>> Yes, this will never happen. All blocks after zone-capacity are never usable.
> > >>>>
> > >>>> If so, how about adjusting check_block_count() to do sanity check
> > >>>> on bitmap locates after zone-capacity to make sure there is no free slots
> there.
> > >>>
> > >>> Ok, I will add this check in check_block_count. It makes sense.
> > >>>
> > >>>>
> > >>>>> Sorry, but could you please share why cur_valid_map needs to be
> > >>>>> traversed
> > ?
> > >>>>>
> > >>>>>>
> > >>>>>> One question, if we select dirty segment which across
> > >>>>>> zone-capacity as opened segment (in curseg), how can we avoid
> > >>>>>> allocating usable block beyong zone-capacity in such segment
> > >>>>>> via
> > .cur_valid_map?
> > >>>>> For zoned devices, we have to allocate blocks sequentially, so
> > >>>>> it's always in LFS
> > >>>> manner it is allocated.
> > >>>>> The __has_curseg_space() checks for the usable blocks and stops
> > >>>>> allocating blocks
> > >>>> after zone-capacity.
> > >>>>
> > >>>> Oh, that was implemented in patch 2, I haven't checked that
> > >>>> patch...sorry, however, IMO, patch should be made to apply
> > >>>> independently, what if do allocation only after applying patch
> > >>>> 1..., do we need to
> > >> merge them into one?
> > >>> The patches were split keeping in mind that all data structure
> > >>> related and initialization Changes would go into patch 1 and IO
> > >>> path and GC related
> > >> changes in patch 2.
> > >>> But if you think, merging them to a single patch will be easier to
> > >>> review,
> > >>
> > >> Yes, please, it's not only about easier review, but also for better
> > >> maintenance of patches in upstream, otherwise, it's not possible to
> > >> apply, backport, revert one of two patches independently.
> > >>
> > >> I still didn't get the full picture of using such zns device which
> > >> has configured zone- capacity, is it like?
> > >> 1. configure zone-capacity in zns device 2. mkfs.f2fs zns device 3.
> > >> mount zns device
> > >
> > > Zone-capacity is set by the device vendor. It could be same as
> > > zone-size or less than zone-size depending on vendor. It cannot be
> > > configured by
> > the user. So the step 1 is not possible.
> > > Since NVMe ZNS device zones are sequentially write only, we need
> > > another zoned device with Conventional zones or any normal block
> > > device for the
> > metadata operations of F2fs.
> > > I have provided some more explanation in the cover letter of the
> > > kernel patch set
> > on this.
> > > Step 2 is mkfs.f2fs zns device + block device (mkfs.f2fs -m -c
> > > /dev/nvme0n1 /dev/nullb1)
> > >
> > > A typical nvme-cli output of a zoned device shows zone start and
> > > capacity and
> > write pointer as below:
> > >
> > > SLBA: 0x0             WP: 0x0             Cap: 0x18800    State: EMPTY        Type:
> > SEQWRITE_REQ   Attrs: 0x0
> > > SLBA: 0x20000    WP: 0x20000    Cap: 0x18800    State: EMPTY        Type:
> > SEQWRITE_REQ   Attrs: 0x0
> > > SLBA: 0x40000    WP: 0x40000    Cap: 0x18800    State: EMPTY        Type:
> > SEQWRITE_REQ   Attrs: 0x0
> > >
> > > Here the zone size is 64MB and the capacity is 49MB; the WP is at
> > > the zone start as the zone is empty. For each zone, only zone start
> > > + 49MB is usable area; any lba/sector after 49MB cannot be read or
> > > written, and the drive will fail any attempts to read/write it. So
> > > the second zone starts at 64MB and is usable till 113MB (64 + 49),
> > > and the range between 113 and 128MB is again unusable. The next zone
> > > starts at 128MB, and so on.
> >
> > Thanks for the detailed explanation, more clear now. :)
> >
> > Could you please add the above description into the commit message of
> > your kernel patch?
> > And also please consider adding a simple introduction of f2fs zns
> > device support into f2fs.rst for our users?
> 
> Sure :).
> I will update f2fs.rst and patch commit message in V2.
> Thank you.
> >
> > Thanks,
> >
> > >
> > >>
> > >> Can we change zone-capacity dynamically after step 2? Or should we
> > >> run mkfs.f2fs again whenever zone-capacity is updated?
> > > The user cannot change zone-capacity dynamically. It is device dependent.
> > >>
> > >> Thanks,
> > >>
> > >>> then I shall merge it and send it as one patch in V2, along with
> > >>> other suggestions
> > >> incorporated.
> > >>>
> > >>> Please let me know.
> > >>>>
> > >>>>>>
> > >>>>>>>  			continue;
> > >>>>>>> -		if (valid_blocks > sbi->blocks_per_seg) {
> > >>>>>>> +		if (valid_blocks > usable_blks_in_seg) {
> > >>>>>>>  			f2fs_bug_on(sbi, 1);
> > >>>>>>>  			continue;
> > >>>>>>>  		}
> > >>>>>>> @@ -4678,6 +4682,101 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> > >>>>>>>
> > >>>>>>>  	return 0;
> > >>>>>>>  }
> > >>>>>>> +
> > >>>>>>> +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int
> zone_idx,
> > >>>>>>> +						unsigned int dev_idx)
> > >>>>>>> +{
> > >>>>>>> +	if (!bdev_is_zoned(FDEV(dev_idx).bdev))
> > >>>>>>> +		return true;
> > >>>>>>> +	return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq);
> > >>>>>>> +}
> > >>>>>>> +
> > >>>>>>> +/* Return the zone index in the given device */
> > >>>>>>> +static unsigned int get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
> > >>>>>>> +					int dev_idx)
> > >>>>>>> +{
> > >>>>>>> +	block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> > >>>>>>> +
> > >>>>>>> +	return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
> > >>>>>>> +						sbi->log_blocks_per_blkz;
> > >>>>>>> +}
> > >>>>>>> +
> > >>>>>>> +/*
> > >>>>>>> + * Return the usable segments in a section based on the zone's
> > >>>>>>> + * corresponding zone capacity. Zone is equal to a section.
> > >>>>>>> + */
> > >>>>>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(
> > >>>>>>> +		struct f2fs_sb_info *sbi, unsigned int segno)
> > >>>>>>> +{
> > >>>>>>> +	unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
> > >>>>>>> +
> > >>>>>>> +	dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
> > >>>>>>> +	zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno), dev_idx);
> > >>>>>>> +
> > >>>>>>> +	/* Conventional zone's capacity is always equal to zone size */
> > >>>>>>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> > >>>>>>> +		return sbi->segs_per_sec;
> > >>>>>>> +
> > >>>>>>> +	/*
> > >>>>>>> +	 * If the zone_capacity_blocks array is NULL, then zone capacity
> > >>>>>>> +	 * is equal to the zone size for all zones
> > >>>>>>> +	 */
> > >>>>>>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
> > >>>>>>> +		return sbi->segs_per_sec;
> > >>>>>>> +
> > >>>>>>> +	/* Get the segment count beyond zone capacity block */
> > >>>>>>> +	unusable_segs_in_sec = (sbi->blocks_per_blkz -
> > >>>>>>> +			FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >>
> > >>>>>>> +			sbi->log_blocks_per_seg;
> > >>>>>>> +	return sbi->segs_per_sec - unusable_segs_in_sec;
> > >>>>>>> +}
> > >>>>>>> +
> > >>>>>>> +/*
> > >>>>>>> + * Return the number of usable blocks in a segment. The number
> > >>>>>>> + * of blocks returned is always equal to the number of blocks
> > >>>>>>> + * in a segment for segments fully contained within a sequential
> > >>>>>>> + * zone capacity or a conventional zone. For segments partially
> > >>>>>>> + * contained in a sequential zone capacity, the number of usable
> > >>>>>>> + * blocks up to the zone capacity is returned. 0 is returned in
> > >>>>>>> + * all other cases.
> > >>>>>>> + */
> > >>>>>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(
> > >>>>>>> +			struct f2fs_sb_info *sbi, unsigned int segno)
> > >>>>>>> +{
> > >>>>>>> +	block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
> > >>>>>>> +	unsigned int zone_idx, dev_idx, secno;
> > >>>>>>> +
> > >>>>>>> +	secno = GET_SEC_FROM_SEG(sbi, segno);
> > >>>>>>> +	seg_start = START_BLOCK(sbi, segno);
> > >>>>>>> +	dev_idx = f2fs_target_device_index(sbi, seg_start);
> > >>>>>>> +	zone_idx = get_zone_idx(sbi, secno, dev_idx);
> > >>>>>>> +
> > >>>>>>> +	/*
> > >>>>>>> +	 * Conventional zone's capacity is always equal to zone size,
> > >>>>>>> +	 * so, blocks per segment is unchanged.
> > >>>>>>> +	 */
> > >>>>>>> +	if (is_conv_zone(sbi, zone_idx, dev_idx))
> > >>>>>>> +		return sbi->blocks_per_seg;
> > >>>>>>> +
> > >>>>>>> +	if (!FDEV(dev_idx).zone_capacity_blocks)
> > >>>>>>> +		return sbi->blocks_per_seg;
> > >>>>>>> +
> > >>>>>>> +	sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
> > >>>>>>> +	sec_cap_blkaddr = sec_start_blkaddr +
> > >>>>>>> +			FDEV(dev_idx).zone_capacity_blocks[zone_idx];
> > >>>>>>> +
> > >>>>>>> +	/*
> > >>>>>>> +	 * If segment starts before zone capacity and spans beyond
> > >>>>>>> +	 * zone capacity, then usable blocks are from seg start to
> > >>>>>>> +	 * zone capacity. If the segment starts after the zone capacity,
> > >>>>>>> +	 * then there are no usable blocks.
> > >>>>>>> +	 */
> > >>>>>>> +	if (seg_start >= sec_cap_blkaddr)
> > >>>>>>> +		return 0;
> > >>>>>>> +	if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
> > >>>>>>> +		return sec_cap_blkaddr - seg_start;
> > >>>>>>> +
> > >>>>>>> +	return sbi->blocks_per_seg;
> > >>>>>>> +}
> > >>>>>>>  #else
> > >>>>>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> > >>>>>>>  {
> > >>>>>>> @@ -4688,7 +4787,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> > >>>>>>>  {
> > >>>>>>>  	return 0;
> > >>>>>>>  }
> > >>>>>>> +
> > >>>>>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
> > >>>>>>> +							unsigned int segno)
> > >>>>>>> +{
> > >>>>>>> +	return 0;
> > >>>>>>> +}
> > >>>>>>> +
> > >>>>>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
> > >>>>>>> +							unsigned int segno)
> > >>>>>>> +{
> > >>>>>>> +	return 0;
> > >>>>>>> +}
> > >>>>>>>  #endif
> > >>>>>>> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> > >>>>>>> +					unsigned int segno)
> > >>>>>>> +{
> > >>>>>>> +	if (f2fs_sb_has_blkzoned(sbi))
> > >>>>>>> +		return f2fs_usable_zone_blks_in_seg(sbi, segno);
> > >>>>>>> +
> > >>>>>>> +	return sbi->blocks_per_seg;
> > >>>>>>> +}
> > >>>>>>> +
> > >>>>>>> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
> > >>>>>>> +					unsigned int segno)
> > >>>>>>> +{
> > >>>>>>> +	if (f2fs_sb_has_blkzoned(sbi))
> > >>>>>>> +		return f2fs_usable_zone_segs_in_sec(sbi, segno);
> > >>>>>>> +
> > >>>>>>> +	return sbi->segs_per_sec;
> > >>>>>>> +}
> > >>>>>>>
> > >>>>>>>  /*
> > >>>>>>>   * Update min, max modified time for cost-benefit GC algorithm
> > >>>>>>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> > >>>>>>> index f261e3e6a69b..79b0dc33feaf 100644
> > >>>>>>> --- a/fs/f2fs/segment.h
> > >>>>>>> +++ b/fs/f2fs/segment.h
> > >>>>>>> @@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> > >>>>>>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> > >>>>>>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> > >>>>>>>  	unsigned int next;
> > >>>>>>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> > >>>>>>>
> > >>>>>>>  	spin_lock(&free_i->segmap_lock);
> > >>>>>>>  	clear_bit(segno, free_i->free_segmap);
> > >>>>>>> @@ -418,7 +419,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> > >>>>>>>
> > >>>>>>>  	next = find_next_bit(free_i->free_segmap,
> > >>>>>>>  			start_segno + sbi->segs_per_sec, start_segno);
> > >>>>>>> -	if (next >= start_segno + sbi->segs_per_sec) {
> > >>>>>>> +	if (next >= start_segno + usable_segs) {
> > >>>>>>>  		clear_bit(secno, free_i->free_secmap);
> > >>>>>>>  		free_i->free_sections++;
> > >>>>>>>  	}
> > >>>>>>> @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> > >>>>>>>  	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> > >>>>>>>  	unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
> > >>>>>>>  	unsigned int next;
> > >>>>>>> +	unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
> > >>>>>>>
> > >>>>>>>  	spin_lock(&free_i->segmap_lock);
> > >>>>>>>  	if (test_and_clear_bit(segno, free_i->free_segmap)) {
> > >>>>>>> @@ -453,7 +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> > >>>>>>>  			goto skip_free;
> > >>>>>>>  		next = find_next_bit(free_i->free_segmap,
> > >>>>>>>  				start_segno + sbi->segs_per_sec, start_segno);
> > >>>>>>> -		if (next >= start_segno + sbi->segs_per_sec) {
> > >>>>>>> +		if (next >= start_segno + usable_segs) {
> > >>>>>>>  			if (test_and_clear_bit(secno, free_i->free_secmap))
> > >>>>>>>  				free_i->free_sections++;
> > >>>>>>>  		}
> > >>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > >>>>>>> index 80cb7cd358f8..2686b07ae7eb 100644
> > >>>>>>> --- a/fs/f2fs/super.c
> > >>>>>>> +++ b/fs/f2fs/super.c
> > >>>>>>> @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
> > >>>>>>>  		blkdev_put(FDEV(i).bdev, FMODE_EXCL);
> > >>>>>>>  #ifdef CONFIG_BLK_DEV_ZONED
> > >>>>>>>  		kvfree(FDEV(i).blkz_seq);
> > >>>>>>> +		kvfree(FDEV(i).zone_capacity_blocks);
> > >>>>>>
> > >>>>>> Now, f2fs_kzalloc won't allocate vmalloc's memory, so it's safe
> > >>>>>> to use kfree().
> > >>>>> Ok
> > >>>>>>
> > >>>>>>>  #endif
> > >>>>>>>  	}
> > >>>>>>>  	kvfree(sbi->devs);
> > >>>>>>> @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct f2fs_sb_info *sbi)
> > >>>>>>>  }
> > >>>>>>>
> > >>>>>>>  #ifdef CONFIG_BLK_DEV_ZONED
> > >>>>>>> +
> > >>>>>>> +struct f2fs_report_zones_args {
> > >>>>>>> +	struct f2fs_dev_info *dev;
> > >>>>>>> +	bool zone_cap_mismatch;
> > >>>>>>> +};
> > >>>>>>> +
> > >>>>>>>  static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> > >>>>>>> -			       void *data)
> > >>>>>>> +			      void *data)
> > >>>>>>>  {
> > >>>>>>> -	struct f2fs_dev_info *dev = data;
> > >>>>>>> +	struct f2fs_report_zones_args *rz_args = data;
> > >>>>>>> +
> > >>>>>>> +	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> > >>>>>>> +		return 0;
> > >>>>>>> +
> > >>>>>>> +	set_bit(idx, rz_args->dev->blkz_seq);
> > >>>>>>> +	rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
> > >>>>>>> +					F2FS_LOG_SECTORS_PER_BLOCK;
> > >>>>>>> +	if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
> > >>>>>>> +		rz_args->zone_cap_mismatch = true;
> > >>>>>>>
> > >>>>>>> -	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
> > >>>>>>> -		set_bit(idx, dev->blkz_seq);
> > >>>>>>>  	return 0;
> > >>>>>>>  }
> > >>>>>>>
> > >>>>>>> @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> > >>>>>>>  {
> > >>>>>>>  	struct block_device *bdev = FDEV(devi).bdev;
> > >>>>>>>  	sector_t nr_sectors = bdev->bd_part->nr_sects;
> > >>>>>>> +	struct f2fs_report_zones_args rep_zone_arg;
> > >>>>>>>  	int ret;
> > >>>>>>>
> > >>>>>>>  	if (!f2fs_sb_has_blkzoned(sbi))
> > >>>>>>> @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> > >>>>>>>  	if (!FDEV(devi).blkz_seq)
> > >>>>>>>  		return -ENOMEM;
> > >>>>>>>
> > >>>>>>> -	/* Get block zones type */
> > >>>>>>> +	/* Get block zones type and zone-capacity */
> > >>>>>>> +	FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
> > >>>>>>> +					FDEV(devi).nr_blkz * sizeof(block_t),
> > >>>>>>> +					GFP_KERNEL);
> > >>>>>>> +	if (!FDEV(devi).zone_capacity_blocks)
> > >>>>>>> +		return -ENOMEM;
> > >>>>>>> +
> > >>>>>>> +	rep_zone_arg.dev = &FDEV(devi);
> > >>>>>>> +	rep_zone_arg.zone_cap_mismatch = false;
> > >>>>>>> +
> > >>>>>>>  	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
> > >>>>>>> -				  &FDEV(devi));
> > >>>>>>> +				  &rep_zone_arg);
> > >>>>>>>  	if (ret < 0)
> > >>>>>>>  		return ret;
> > >>>>>>
> > >>>>>> Missed to call kfree(FDEV(devi).zone_capacity_blocks)?
> > >>>>> Thanks for catching it. Will free it here also.

This case is actually handled: if an error is returned, control goes to
destroy_device_list(), which does a kfree() on it, so it is not needed here.

> > >>>>>>
> > >>>>>>>
> > >>>>>>> +	if (!rep_zone_arg.zone_cap_mismatch) {
> > >>>>>>> +		kvfree(FDEV(devi).zone_capacity_blocks);
> > >>>>>>
> > >>>>>> Ditto, kfree().
> > >>>>> Ok.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>>> +		FDEV(devi).zone_capacity_blocks = NULL;
> > >>>>>>> +	}
> > >>>>>>> +
> > >>>>>>>  	return 0;
> > >>>>>>>  }
> > >>>>>>>  #endif
> > >>>>>>>
> > >>>>> .
> > >>>>>
> > >>> .
> > >>>
> > > .
> > >


Thread overview: 16+ messages
2020-07-02 15:53 [f2fs-dev] [PATCH 0/2] f2fs: zns zone-capacity support Aravind Ramesh
2020-07-02 15:54 ` [f2fs-dev] [PATCH 1/2] f2fs: support zone capacity less than zone size Aravind Ramesh
2020-07-07  0:07   ` Jaegeuk Kim
2020-07-07  3:27     ` Aravind Ramesh
2020-07-07  3:49       ` Jaegeuk Kim
2020-07-07  5:18         ` Aravind Ramesh
2020-07-07 12:18   ` Chao Yu
2020-07-07 18:23     ` Aravind Ramesh
2020-07-08  2:33       ` Chao Yu
2020-07-08 13:04         ` Aravind Ramesh
2020-07-09  2:55           ` Chao Yu
2020-07-09  5:31             ` Aravind Ramesh
2020-07-09  7:05               ` Chao Yu
2020-07-09  7:11                 ` Aravind Ramesh
2020-07-10 17:44                   ` Aravind Ramesh
2020-07-02 15:54 ` [f2fs-dev] [PATCH 2/2] f2fs: manage zone capacity during writes and gc Aravind Ramesh
