From: Chao Yu <yuchao0@huawei.com>
To: Aravind Ramesh <Aravind.Ramesh@wdc.com>,
"jaegeuk@kernel.org" <jaegeuk@kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-f2fs-devel@lists.sourceforge.net"
<linux-f2fs-devel@lists.sourceforge.net>,
"hch@lst.de" <hch@lst.de>
Cc: Niklas Cassel <Niklas.Cassel@wdc.com>,
Damien Le Moal <Damien.LeMoal@wdc.com>,
Matias Bjorling <Matias.Bjorling@wdc.com>
Subject: Re: [f2fs-dev] [PATCH v2 1/1] f2fs: support zone capacity less than zone size
Date: Wed, 15 Jul 2020 10:02:05 +0800 [thread overview]
Message-ID: <af3f4ce5-5653-38f8-9fce-06efbaae09d5@huawei.com> (raw)
In-Reply-To: <BY5PR04MB69955AA241F77C37D6020CCE8C610@BY5PR04MB6995.namprd04.prod.outlook.com>
On 2020/7/15 2:40, Aravind Ramesh wrote:
> Thanks for the valuable feedback.
> My comments are inline.
> Will send the V3 with the feedback incorporated.
>
> Regards,
> Aravind
>
>> -----Original Message-----
>> From: Chao Yu <yuchao0@huawei.com>
>> Sent: Tuesday, July 14, 2020 5:28 PM
>> To: Aravind Ramesh <Aravind.Ramesh@wdc.com>; jaegeuk@kernel.org; linux-
>> fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net; hch@lst.de
>> Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Niklas Cassel
>> <Niklas.Cassel@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
>> Subject: Re: [PATCH v2 1/1] f2fs: support zone capacity less than zone size
>>
>> On 2020/7/11 1:43, Aravind Ramesh wrote:
>>> NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
>>> Zone-capacity indicates the maximum number of sectors that are usable
>>> in a zone beginning from the first sector of the zone. This makes the
>>> sectors sectors after the zone-capacity till zone-size to be unusable.
>>> This patch set tracks zone-size and zone-capacity in zoned devices and
>>> calculate the usable blocks per segment and usable segments per section.
>>>
>>> If zone-capacity is less than zone-size mark only those segments which
>>> start before zone-capacity as free segments. All segments at and
>>> beyond zone-capacity are treated as permanently used segments. In
>>> cases where zone-capacity does not align with segment size the last
>>> segment will start before zone-capacity and end beyond the
>>> zone-capacity of the zone. For such spanning segments only sectors within the
>> zone-capacity are used.
>>>
>>> During writes and GC manage the usable segments in a section and
>>> usable blocks per segment. Segments which are beyond zone-capacity are
>>> never allocated, and do not need to be garbage collected, only the
>>> segments which are before zone-capacity needs to garbage collected.
>>> For spanning segments based on the number of usable blocks in that
>>> segment, write to blocks only up to zone-capacity.
>>>
>>> Zone-capacity is device specific and cannot be configured by the user.
>>> Since NVMe ZNS device zones are sequentially write only, a block
>>> device with conventional zones or any normal block device is needed
>>> along with the ZNS device for the metadata operations of F2fs.
>>>
>>> A typical nvme-cli output of a zoned device shows zone start and
>>> capacity and write pointer as below:
>>>
>>> SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
>>> SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
>>> SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
>>>
>>> Here zone size is 64MB, capacity is 49MB, WP is at zone start as the
>>> zones are in EMPTY state. For each zone, only zone start + 49MB is
>>> usable area, any lba/sector after 49MB cannot be read or written to,
>>> the drive will fail any attempts to read/write. So, the second zone
>>> starts at 64MB and is usable till 113MB (64 + 49) and the range
>>> between 113 and 128MB is again unusable. The next zone starts at 128MB, and so
>> on.
>>>
>>> Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>>> ---
>>> Documentation/filesystems/f2fs.rst | 15 +++
>>> fs/f2fs/f2fs.h | 5 +
>>> fs/f2fs/gc.c | 27 +++--
>>> fs/f2fs/gc.h | 42 +++++++-
>>> fs/f2fs/segment.c | 154 ++++++++++++++++++++++++++---
>>> fs/f2fs/segment.h | 21 ++--
>>> fs/f2fs/super.c | 41 ++++++--
>>> 7 files changed, 267 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/Documentation/filesystems/f2fs.rst
>>> b/Documentation/filesystems/f2fs.rst
>>> index 535021c46260..cec2167a31db 100644
>>> --- a/Documentation/filesystems/f2fs.rst
>>> +++ b/Documentation/filesystems/f2fs.rst
>>> @@ -766,3 +766,18 @@ Compress metadata layout::
>>> +-------------+-------------+----------+----------------------------+
>>> | data length | data chksum | reserved | compressed data |
>>>
>>> +-------------+-------------+----------+----------------------------+
>>> +
>>> +NVMe Zoned Namespace devices
>>> +----------------------------
>>> +
>>> +- ZNS defines a per-zone capacity which can be equal or less than the
>>> + zone-size. Zone-capacity is the number of usable blocks in the zone.
>>> + F2fs checks if zone-capacity is less than zone-size, if it is, then
>>> +any
>>> + segment which starts after the zone-capacity is marked as not-free
>>> +in
>>> + the free segment bitmap at initial mount time. These segments are
>>> +marked
>>> + as permanently used so they are not allocated for writes and
>>> + consequently are not needed to be garbage collected. In case the
>>> + zone-capacity is not aligned to default segment size(2MB), then a
>>> +segment
>>> + can start before the zone-capacity and span across zone-capacity boundary.
>>> + Such spanning segments are also considered as usable segments. All
>>> +blocks
>>> + past the zone-capacity are considered unusable in these segments.
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index
>>> e6e47618a357..73219e4e1ba4 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -1232,6 +1232,7 @@ struct f2fs_dev_info { #ifdef
>>> CONFIG_BLK_DEV_ZONED
>>> unsigned int nr_blkz; /* Total number of zones */
>>> unsigned long *blkz_seq; /* Bitmap indicating sequential zones */
>>> + block_t *zone_capacity_blocks; /* Array of zone capacity in blks */
>>> #endif
>>> };
>>>
>>> @@ -3395,6 +3396,10 @@ void f2fs_destroy_segment_manager_caches(void);
>>> int f2fs_rw_hint_to_seg_type(enum rw_hint hint); enum rw_hint
>>> f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
>>> enum page_type type, enum temp_type temp);
>>> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
>>> + unsigned int segno);
>>> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
>>> + unsigned int segno);
>>>
>>> /*
>>> * checkpoint.c
>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index
>>> 9a40761445d3..dfa6d91cffcb 100644
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -266,13 +266,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info
>> *sbi, unsigned int segno)
>>> unsigned char age = 0;
>>> unsigned char u;
>>> unsigned int i;
>>> + unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi,
>>> +segno);
>>>
>>> - for (i = 0; i < sbi->segs_per_sec; i++)
>>> + for (i = 0; i < usable_segs_per_sec; i++)
>>> mtime += get_seg_entry(sbi, start + i)->mtime;
>>> vblocks = get_valid_blocks(sbi, segno, true);
>>>
>>> - mtime = div_u64(mtime, sbi->segs_per_sec);
>>> - vblocks = div_u64(vblocks, sbi->segs_per_sec);
>>> + mtime = div_u64(mtime, usable_segs_per_sec);
>>> + vblocks = div_u64(vblocks, usable_segs_per_sec);
>>>
>>> u = (vblocks * 100) >> sbi->log_blocks_per_seg;
>>>
>>> @@ -536,6 +537,7 @@ static int gc_node_segment(struct f2fs_sb_info *sbi,
>>> int phase = 0;
>>> bool fggc = (gc_type == FG_GC);
>>> int submitted = 0;
>>> + unsigned int usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi,
>>> +segno);
>>>
>>> start_addr = START_BLOCK(sbi, segno);
>>>
>>> @@ -545,7 +547,7 @@ static int gc_node_segment(struct f2fs_sb_info *sbi,
>>> if (fggc && phase == 2)
>>> atomic_inc(&sbi->wb_sync_req[NODE]);
>>>
>>> - for (off = 0; off < sbi->blocks_per_seg; off++, entry++) {
>>> + for (off = 0; off < usable_blks_in_seg; off++, entry++) {
>>> nid_t nid = le32_to_cpu(entry->nid);
>>> struct page *node_page;
>>> struct node_info ni;
>>> @@ -1033,13 +1035,14 @@ static int gc_data_segment(struct f2fs_sb_info *sbi,
>> struct f2fs_summary *sum,
>>> int off;
>>> int phase = 0;
>>> int submitted = 0;
>>> + unsigned int usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi,
>>> +segno);
>>>
>>> start_addr = START_BLOCK(sbi, segno);
>>>
>>> next_step:
>>> entry = sum;
>>>
>>> - for (off = 0; off < sbi->blocks_per_seg; off++, entry++) {
>>> + for (off = 0; off < usable_blks_in_seg; off++, entry++) {
>>> struct page *data_page;
>>> struct inode *inode;
>>> struct node_info dni; /* dnode info for the data */ @@ -1201,7
>>> +1204,16 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
>>> SUM_TYPE_DATA :
>> SUM_TYPE_NODE;
>>> int submitted = 0;
>>>
>>> - if (__is_large_section(sbi))
>>> + /*
>>> + * zone-capacity can be less than zone-size in zoned devices,
>>> + * resulting in less than expected usable segments in the zone,
>>> + * calculate the end segno in the zone which can be garbage collected
>>> + */
>>> + if (f2fs_sb_has_blkzoned(sbi))
>>> + end_segno -= sbi->segs_per_sec -
>>> + f2fs_usable_segs_in_sec(sbi, segno);
>>> +
>>> + else if (__is_large_section(sbi))
>>> end_segno = rounddown(end_segno, sbi->segs_per_sec);
>>
>> end_segno could be beyond end of section, so below calculation could adjust it to
>> the end of section first, then adjust to zone-capacity boundary.
>>
>> if (__is_large_section(sbi))
>> end_segno = rounddown(end_segno, sbi->segs_per_sec);
>>
>> if (f2fs_sb_has_blkzoned(sbi))
>> end_segno -= sbi->segs_per_sec -
>> f2fs_usable_segs_in_sec(sbi, segno);
> Ok, will change it.
>>
>>>
>>> /* readahead multi ssa blocks those have contiguous address */ @@
>>> -1356,7 +1368,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>> goto stop;
>>>
>>> seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type);
>>> - if (gc_type == FG_GC && seg_freed == sbi->segs_per_sec)
>>> + if (gc_type == FG_GC &&
>>> + seg_freed == f2fs_usable_segs_in_sec(sbi, segno))
>>> sec_freed++;
>>> total_freed += seg_freed;
>>>
>>> diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h index
>>> db3c61046aa4..463b4e38b864 100644
>>> --- a/fs/f2fs/gc.h
>>> +++ b/fs/f2fs/gc.h
>>> @@ -44,13 +44,47 @@ struct gc_inode_list {
>>> /*
>>> * inline functions
>>> */
>>> +
>>> +/*
>>> + * On a Zoned device zone-capacity can be less than zone-size and if
>>> + * zone-capacity is not aligned to f2fs segment size(2MB), then the
>>> +segment
>>> + * starting just before zone-capacity has some blocks spanning across
>>> +the
>>> + * zone-capacity, these blocks are not usable.
>>> + * Such spanning segments can be in free list so calculate the sum of
>>> +usable
>>> + * blocks in currently free segments including normal and spanning segments.
>>> + */
>>> +static inline block_t free_segs_blk_count_zoned(struct f2fs_sb_info
>>> +*sbi) {
>>> + block_t free_seg_blks = 0;
>>> + struct free_segmap_info *free_i = FREE_I(sbi);
>>> + int j;
>>> +
>>
>> spin_lock(&free_i->segmap_lock);
> Ok
>>
>>> + for (j = 0; j < MAIN_SEGS(sbi); j++)
>>> + if (!test_bit(j, free_i->free_segmap))
>>> + free_seg_blks += f2fs_usable_blks_in_seg(sbi, j);
>>
>> spin_unlock(&free_i->segmap_lock);
> Ok
>>
>>> +
>>> + return free_seg_blks;
>>> +}
>>> +
>>> +static inline block_t free_segs_blk_count(struct f2fs_sb_info *sbi) {
>>> + if (f2fs_sb_has_blkzoned(sbi))
>>> + return free_segs_blk_count_zoned(sbi);
>>> +
>>> + return free_segments(sbi) << sbi->log_blocks_per_seg; }
>>> +
>>> static inline block_t free_user_blocks(struct f2fs_sb_info *sbi) {
>>> - if (free_segments(sbi) < overprovision_segments(sbi))
>>> + block_t free_blks, ovp_blks;
>>> +
>>> + free_blks = free_segs_blk_count(sbi);
>>> + ovp_blks = overprovision_segments(sbi) << sbi->log_blocks_per_seg;
>>> +
>>> + if (free_blks < ovp_blks)
>>> return 0;
>>> - else
>>> - return (free_segments(sbi) - overprovision_segments(sbi))
>>> - << sbi->log_blocks_per_seg;
>>> +
>>> + return free_blks - ovp_blks;
>>> }
>>>
>>> static inline block_t limit_invalid_user_blocks(struct f2fs_sb_info
>>> *sbi) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index
>>> c35614d255e1..d75c1849dc83 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -869,10 +869,10 @@ static void locate_dirty_segment(struct f2fs_sb_info
>> *sbi, unsigned int segno)
>>> ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno);
>>
>> unsigned int usable_blocks = f2fs_usable_blks_in_seg(sbi, segno);
>>
>>>
>>> if (valid_blocks == 0 && (!is_sbi_flag_set(sbi, SBI_CP_DISABLED) ||
>>> - ckpt_valid_blocks == sbi->blocks_per_seg)) {
>>
>> ckpt_valid_blocks == usable_blocks
>>
>>> + ckpt_valid_blocks == f2fs_usable_blks_in_seg(sbi, segno))) {
>>> __locate_dirty_segment(sbi, segno, PRE);
>>> __remove_dirty_segment(sbi, segno, DIRTY);
>>> - } else if (valid_blocks < sbi->blocks_per_seg) {
>>> + } else if (valid_blocks < f2fs_usable_blks_in_seg(sbi, segno)) {
>>
>> } else if (valid_blocks < usable_blocks) {
>>
> Ok.
>>
>> BTW, would you consider to add a field as below to avoid calculation whenever we
>> want to get usable_blocks in segment:
>
> The reason to do it like this is that f2fs supports using multiple devices to create a volume.
> So, if 2 ZNS devices are used where both have same zone-size but different zone capacity,
> then having this single field to indicate usable blocks will not work, as usable blocks of zones
The field @usable_blocks in 'struct seg_entry' is per-segment, that's not single, it can be
initialized to 0 or blocks_per_seg or 'sec_cap_blkaddr - seg_start' based on its locatation
in which zone of device during build_sit_entries().
> from different devices are not same. In such a scenario, we would need to maintain an array,
> which stores the usable blocks of each zone based on its zone-capacity.
> This approach can eliminate the calculation.
>
> So we actually implemented this approach with a buffer and at mount time this buffer would be
> allocated and populated with all the zone's usable_blocks information and would be read from
> this buffer subsequently. But we did not see any performance difference when compared to
> current approach and concluded that the overhead of calculation was negligible and also
> current approach does not need to modify core data structure, as this field will be used
> only when using a ZNS drive and if that drive has zone capacity less than zone size.
> And for all other cases, the function f2fs_usable_blks_in_seg() just returns sbi->blocks_per_seg
> without any calculation.
>
> If you think, like the calculation overhead is more, then we need to add a pointer and allocate memory
> to it to accommodate all zone's zone-capacity info. In my opinion, the gain is very less because of the
> reasons stated above. However, I am open to change it based on your feedback.
I just thought those fixed size could be recorded in advance to avoid redundant
calculation, but the change I suggested is not critical, and I do agree that it
will not help performance a bit.
Anyway, I'm not against with keeping it as it is, let's go ahead. :)
>
>>
>> struct seg_entry {
>> unsigned long long type:6; /* segment type like
>> CURSEG_XXX_TYPE */
>> unsigned long long valid_blocks:10; /* # of valid blocks */
>> unsigned long long ckpt_valid_blocks:10; /* # of valid blocks last cp
>> */
>> unsigned long long usable_blocks:10; /* usable block count in segment
>> */
>> unsigned long long padding:28; /* padding */
>> ...
>>
>> For non-zns device, usable_blocks equals to blks_per_seg, otherwise it depends on
>> where it locates, we can initialize it in build_sit_entries(), thoughts?
>>
>>> __locate_dirty_segment(sbi, segno, DIRTY);
>>> } else {
>>> /* Recovery routine with SSR needs this */ @@ -915,9 +915,11
>> @@
>>> block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi)
>>> for_each_set_bit(segno, dirty_i->dirty_segmap[DIRTY], MAIN_SEGS(sbi)) {
>>> se = get_seg_entry(sbi, segno);
>>> if (IS_NODESEG(se->type))
>>> - holes[NODE] += sbi->blocks_per_seg - se->valid_blocks;
>>> + holes[NODE] += f2fs_usable_blks_in_seg(sbi, segno) -
>>> + se->valid_blocks;
>>> else
>>> - holes[DATA] += sbi->blocks_per_seg - se->valid_blocks;
>>> + holes[DATA] += f2fs_usable_blks_in_seg(sbi, segno) -
>>> + se->valid_blocks;
>>> }
>>> mutex_unlock(&dirty_i->seglist_lock);
>>>
>>> @@ -2167,7 +2169,7 @@ static void update_sit_entry(struct f2fs_sb_info *sbi,
>> block_t blkaddr, int del)
>>> offset = GET_BLKOFF_FROM_SEG0(sbi, blkaddr);
>>>
>>> f2fs_bug_on(sbi, (new_vblocks >> (sizeof(unsigned short) << 3) ||
>>> - (new_vblocks > sbi->blocks_per_seg)));
>>> + (new_vblocks > f2fs_usable_blks_in_seg(sbi, segno))));
>>>
>>> se->valid_blocks = new_vblocks;
>>> se->mtime = get_mtime(sbi, false);
>>> @@ -2933,9 +2935,9 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi,
>>> struct fstrim_range *range) static bool __has_curseg_space(struct
>>> f2fs_sb_info *sbi, int type) {
>>> struct curseg_info *curseg = CURSEG_I(sbi, type);
>>> - if (curseg->next_blkoff < sbi->blocks_per_seg)
>>> - return true;
>>> - return false;
>>> +
>>> + return curseg->next_blkoff < f2fs_usable_blks_in_seg(sbi,
>>> + curseg->segno);
>>> }
>>>
>>> int f2fs_rw_hint_to_seg_type(enum rw_hint hint) @@ -4294,9 +4296,12
>>> @@ static void init_free_segmap(struct f2fs_sb_info *sbi) {
>>> unsigned int start;
>>> int type;
>>> + struct seg_entry *sentry;
>>>
>>> for (start = 0; start < MAIN_SEGS(sbi); start++) {
>>> - struct seg_entry *sentry = get_seg_entry(sbi, start);
>>> + if (f2fs_usable_blks_in_seg(sbi, start) == 0)
>>> + continue;
>>> + sentry = get_seg_entry(sbi, start);
>>> if (!sentry->valid_blocks)
>>> __set_free(sbi, start);
>>> else
>>> @@ -4316,7 +4321,7 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
>>> struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
>>> struct free_segmap_info *free_i = FREE_I(sbi);
>>> unsigned int segno = 0, offset = 0, secno;
>>> - unsigned short valid_blocks;
>>> + unsigned short valid_blocks, usable_blks_in_seg;
>>> unsigned short blks_per_sec = BLKS_PER_SEC(sbi);
>>>
>>> while (1) {
>>> @@ -4326,9 +4331,10 @@ static void init_dirty_segmap(struct f2fs_sb_info *sbi)
>>> break;
>>> offset = segno + 1;
>>> valid_blocks = get_valid_blocks(sbi, segno, false);
>>> - if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
>>> + usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
>>> + if (valid_blocks == usable_blks_in_seg || !valid_blocks)
>>> continue;
>>> - if (valid_blocks > sbi->blocks_per_seg) {
>>> + if (valid_blocks > usable_blks_in_seg) {
>>> f2fs_bug_on(sbi, 1);
>>> continue;
>>> }
>>> @@ -4678,6 +4684,101 @@ int f2fs_check_write_pointer(struct
>>> f2fs_sb_info *sbi)
>>>
>>> return 0;
>>> }
>>> +
>>> +static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx,
>>> + unsigned int dev_idx)
>>> +{
>>> + if (!bdev_is_zoned(FDEV(dev_idx).bdev))
>>> + return true;
>>> + return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq); }
>>> +
>>> +/* Return the zone index in the given device */ static unsigned int
>>> +get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno,
>>> + int dev_idx)
>>> +{
>>> + block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi,
>>> +secno));
>>> +
>>> + return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >>
>>> + sbi->log_blocks_per_blkz;
>>> +}
>>> +
>>> +/*
>>> + * Return the usable segments in a section based on the zone's
>>> + * corresponding zone capacity. Zone is equal to a section.
>>> + */
>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(
>>> + struct f2fs_sb_info *sbi, unsigned int segno) {
>>> + unsigned int dev_idx, zone_idx, unusable_segs_in_sec;
>>> +
>>> + dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno));
>>> + zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno), dev_idx);
>>> +
>>> + /* Conventional zone's capacity is always equal to zone size */
>>> + if (is_conv_zone(sbi, zone_idx, dev_idx))
>>> + return sbi->segs_per_sec;
>>> +
>>> + /*
>>> + * If the zone_capacity_blocks array is NULL, then zone capacity
>>> + * is equal to the zone size for all zones
>>> + */
>>> + if (!FDEV(dev_idx).zone_capacity_blocks)
>>> + return sbi->segs_per_sec;
>>> +
>>> + /* Get the segment count beyond zone capacity block */
>>> + unusable_segs_in_sec = (sbi->blocks_per_blkz -
>>> + FDEV(dev_idx).zone_capacity_blocks[zone_idx])
>>>>
>>> + sbi->log_blocks_per_seg;
>>> + return sbi->segs_per_sec - unusable_segs_in_sec; }
>>> +
>>> +/*
>>> + * Return the number of usable blocks in a segment. The number of
>>> +blocks
>>> + * returned is always equal to the number of blocks in a segment for
>>> + * segments fully contained within a sequential zone capacity or a
>>> + * conventional zone. For segments partially contained in a
>>> +sequential
>>> + * zone capacity, the number of usable blocks up to the zone capacity
>>> + * is returned. 0 is returned in all other cases.
>>> + */
>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(
>>> + struct f2fs_sb_info *sbi, unsigned int segno) {
>>> + block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
>>> + unsigned int zone_idx, dev_idx, secno;
>>> +
>>> + secno = GET_SEC_FROM_SEG(sbi, segno);
>>> + seg_start = START_BLOCK(sbi, segno);
>>> + dev_idx = f2fs_target_device_index(sbi, seg_start);
>>> + zone_idx = get_zone_idx(sbi, secno, dev_idx);
>>> +
>>> + /*
>>> + * Conventional zone's capacity is always equal to zone size,
>>> + * so, blocks per segment is unchanged.
>>> + */
>>> + if (is_conv_zone(sbi, zone_idx, dev_idx))
>>> + return sbi->blocks_per_seg;
>>> +
>>> + if (!FDEV(dev_idx).zone_capacity_blocks)
>>> + return sbi->blocks_per_seg;
>>> +
>>> + sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
>>> + sec_cap_blkaddr = sec_start_blkaddr +
>>> + FDEV(dev_idx).zone_capacity_blocks[zone_idx];
>>> +
>>> + /*
>>> + * If segment starts before zone capacity and spans beyond
>>> + * zone capacity, then usable blocks are from seg start to
>>> + * zone capacity. If the segment starts after the zone capacity,
>>> + * then there are no usable blocks.
>>> + */
>>> + if (seg_start >= sec_cap_blkaddr)
>>> + return 0;
>>> + if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
>>> + return sec_cap_blkaddr - seg_start;
>>> +
>>> + return sbi->blocks_per_seg;
>>> +}
>>> #else
>>> int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi) { @@
>>> -4688,7 +4789,36 @@ int f2fs_check_write_pointer(struct f2fs_sb_info
>>> *sbi) {
>>> return 0;
>>> }
>>> +
>>> +static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
>>> + unsigned int segno)
>>> +{
>>> + return 0;
>>> +}
>>> +
>>> +static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi,
>>> + unsigned int segno)
>>> +{
>>> + return 0;
>>> +}
>>> #endif
>>> +unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
>>> + unsigned int segno)
>>> +{
>>> + if (f2fs_sb_has_blkzoned(sbi))
>>> + return f2fs_usable_zone_blks_in_seg(sbi, segno);
>>> +
>>> + return sbi->blocks_per_seg;
>>> +}
>>> +
>>> +unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
>>> + unsigned int segno)
>>> +{
>>> + if (f2fs_sb_has_blkzoned(sbi))
>>> + return f2fs_usable_zone_segs_in_sec(sbi, segno);
>>> +
>>> + return sbi->segs_per_sec;
>>> +}
>>>
>>> /*
>>> * Update min, max modified time for cost-benefit GC algorithm diff
>>> --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index
>>> f261e3e6a69b..41bc08013160 100644
>>> --- a/fs/f2fs/segment.h
>>> +++ b/fs/f2fs/segment.h
>>> @@ -411,6 +411,7 @@ static inline void __set_free(struct f2fs_sb_info *sbi,
>> unsigned int segno)
>>> unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
>>> unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
>>> unsigned int next;
>>> + unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
>>>
>>> spin_lock(&free_i->segmap_lock);
>>> clear_bit(segno, free_i->free_segmap); @@ -418,7 +419,7 @@ static
>>> inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
>>>
>>> next = find_next_bit(free_i->free_segmap,
>>> start_segno + sbi->segs_per_sec, start_segno);
>>> - if (next >= start_segno + sbi->segs_per_sec) {
>>> + if (next >= start_segno + usable_segs) {
>>> clear_bit(secno, free_i->free_secmap);
>>> free_i->free_sections++;
>>> }
>>> @@ -444,6 +445,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info
>> *sbi,
>>> unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
>>> unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno);
>>> unsigned int next;
>>> + unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno);
>>>
>>> spin_lock(&free_i->segmap_lock);
>>> if (test_and_clear_bit(segno, free_i->free_segmap)) { @@ -453,7
>>> +455,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
>>> goto skip_free;
>>> next = find_next_bit(free_i->free_segmap,
>>> start_segno + sbi->segs_per_sec, start_segno);
>>> - if (next >= start_segno + sbi->segs_per_sec) {
>>> + if (next >= start_segno + usable_segs) {
>>> if (test_and_clear_bit(secno, free_i->free_secmap))
>>> free_i->free_sections++;
>>> }
>>> @@ -546,8 +548,8 @@ static inline bool has_curseg_enough_space(struct
>> f2fs_sb_info *sbi)
>>> /* check current node segment */
>>> for (i = CURSEG_HOT_NODE; i <= CURSEG_COLD_NODE; i++) {
>>> segno = CURSEG_I(sbi, i)->segno;
>>> - left_blocks = sbi->blocks_per_seg -
>>> - get_seg_entry(sbi, segno)->ckpt_valid_blocks;
>>> + left_blocks = f2fs_usable_blks_in_seg(sbi, segno) -
>>> + get_seg_entry(sbi, segno)->ckpt_valid_blocks;
>>>
>>> if (node_blocks > left_blocks)
>>> return false;
>>> @@ -555,7 +557,7 @@ static inline bool has_curseg_enough_space(struct
>>> f2fs_sb_info *sbi)
>>>
>>> /* check current data segment */
>>> segno = CURSEG_I(sbi, CURSEG_HOT_DATA)->segno;
>>> - left_blocks = sbi->blocks_per_seg -
>>> + left_blocks = f2fs_usable_blks_in_seg(sbi, segno) -
>>> get_seg_entry(sbi, segno)->ckpt_valid_blocks;
>>> if (dent_blocks > left_blocks)
>>> return false;
>>> @@ -677,21 +679,22 @@ static inline int check_block_count(struct f2fs_sb_info
>> *sbi,
>>> bool is_valid = test_bit_le(0, raw_sit->valid_map) ? true : false;
>>> int valid_blocks = 0;
>>> int cur_pos = 0, next_pos;
>>> + unsigned int usable_blks_per_seg = f2fs_usable_blks_in_seg(sbi,
>>> +segno);
>>>
>>> /* check bitmap with valid block count */
>>> do {
>>> if (is_valid) {
>>> next_pos = find_next_zero_bit_le(&raw_sit->valid_map,
>>> - sbi->blocks_per_seg,
>>> + usable_blks_per_seg,
>>> cur_pos);
>>> valid_blocks += next_pos - cur_pos;
>>> } else
>>> next_pos = find_next_bit_le(&raw_sit->valid_map,
>>> - sbi->blocks_per_seg,
>>> + usable_blks_per_seg,
>>> cur_pos);
>>> cur_pos = next_pos;
>>> is_valid = !is_valid;
>>> - } while (cur_pos < sbi->blocks_per_seg);
>>> + } while (cur_pos < usable_blks_per_seg);
>>
>> I meant we need to check that there should be no valid slot locates beyond zone-
>> capacity in bitmap:
> Ahh, ok. Got it. Will change.
>>
>> if (usable_blks_per_seg < sbi->blocks_per_seg)
>> f2fs_bug_on(
>> find_next_bit_le(&raw_sit->valid_map,
>> sbi->blocks_per_seg,
>> usable_blks_per_seg) != sbi->blocks_per_seg)
>>
>>>
>>> if (unlikely(GET_SIT_VBLOCKS(raw_sit) != valid_blocks)) {
>>> f2fs_err(sbi, "Mismatch valid blocks %d vs. %d", @@ -701,7 +704,7
>>> @@ static inline int check_block_count(struct f2fs_sb_info *sbi,
>>> }
>>>
>>> /* check segment usage, and check boundary of a given segment number */
>>> - if (unlikely(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg
>>> + if (unlikely(GET_SIT_VBLOCKS(raw_sit) > usable_blks_per_seg
>>> || segno > TOTAL_SEGS(sbi) - 1)) {
>>> f2fs_err(sbi, "Wrong valid blocks %d or segno %u",
>>> GET_SIT_VBLOCKS(raw_sit), segno); diff --git
>> a/fs/f2fs/super.c
>>> b/fs/f2fs/super.c index 80cb7cd358f8..7ced425dc102 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -1164,6 +1164,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
>>> blkdev_put(FDEV(i).bdev, FMODE_EXCL); #ifdef
>> CONFIG_BLK_DEV_ZONED
>>> kvfree(FDEV(i).blkz_seq);
>>> + kfree(FDEV(i).zone_capacity_blocks);
>>> #endif
>>> }
>>> kvfree(sbi->devs);
>>> @@ -3039,13 +3040,26 @@ static int init_percpu_info(struct
>>> f2fs_sb_info *sbi) }
>>>
>>> #ifdef CONFIG_BLK_DEV_ZONED
>>> +
>>> +struct f2fs_report_zones_args {
>>> + struct f2fs_dev_info *dev;
>>> + bool zone_cap_mismatch;
>>> +};
>>> +
>>> static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
>>> - void *data)
>>> + void *data)
>>> {
>>> - struct f2fs_dev_info *dev = data;
>>> + struct f2fs_report_zones_args *rz_args = data;
>>> +
>>> + if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
>>> + return 0;
>>> +
>>> + set_bit(idx, rz_args->dev->blkz_seq);
>>> + rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >>
>>> + F2FS_LOG_SECTORS_PER_BLOCK;
>>> + if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch)
>>> + rz_args->zone_cap_mismatch = true;
>>>
>>> - if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
>>> - set_bit(idx, dev->blkz_seq);
>>> return 0;
>>> }
>>>
>>> @@ -3053,6 +3067,7 @@ static int init_blkz_info(struct f2fs_sb_info
>>> *sbi, int devi) {
>>> struct block_device *bdev = FDEV(devi).bdev;
>>> sector_t nr_sectors = bdev->bd_part->nr_sects;
>>> + struct f2fs_report_zones_args rep_zone_arg;
>>> int ret;
>>>
>>> if (!f2fs_sb_has_blkzoned(sbi))
>>> @@ -3078,12 +3093,26 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int
>> devi)
>>> if (!FDEV(devi).blkz_seq)
>>> return -ENOMEM;
>>>
>>> - /* Get block zones type */
>>> + /* Get block zones type and zone-capacity */
>>> + FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi,
>>> + FDEV(devi).nr_blkz * sizeof(block_t),
>>> + GFP_KERNEL);
>>> + if (!FDEV(devi).zone_capacity_blocks)
>>> + return -ENOMEM;
>>> +
>>> + rep_zone_arg.dev = &FDEV(devi);
>>> + rep_zone_arg.zone_cap_mismatch = false;
>>> +
>>> ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
>>> - &FDEV(devi));
>>> + &rep_zone_arg);
>>> if (ret < 0)
>>
>> kfree(FDEV(devi).zone_capacity_blocks);
> Actually, we do not need to free this here, because if error (ret < 0) is returned, then the control goes back to
> destroy_device_list() and deallocates the buffer.
Confirmed.
Thanks,
>
>>
>> Thanks,
>>
>>> return ret;
>>>
>>> + if (!rep_zone_arg.zone_cap_mismatch) {
>>> + kfree(FDEV(devi).zone_capacity_blocks);
>>> + FDEV(devi).zone_capacity_blocks = NULL;
>>> + }
>>> +
>>> return 0;
>>> }
>>> #endif
>>>
> .
>
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
next prev parent reply other threads:[~2020-07-15 2:02 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-10 17:43 [f2fs-dev] [PATCH v2 0/1] f2fs: zns zone-capacity support Aravind Ramesh
2020-07-10 17:43 ` [f2fs-dev] [PATCH v2 1/1] f2fs: support zone capacity less than zone size Aravind Ramesh
2020-07-14 11:57 ` Chao Yu
2020-07-14 18:40 ` Aravind Ramesh
2020-07-15 2:02 ` Chao Yu [this message]
2020-07-16 12:56 ` Aravind Ramesh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=af3f4ce5-5653-38f8-9fce-06efbaae09d5@huawei.com \
--to=yuchao0@huawei.com \
--cc=Aravind.Ramesh@wdc.com \
--cc=Damien.LeMoal@wdc.com \
--cc=Matias.Bjorling@wdc.com \
--cc=Niklas.Cassel@wdc.com \
--cc=hch@lst.de \
--cc=jaegeuk@kernel.org \
--cc=linux-f2fs-devel@lists.sourceforge.net \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).