From: Chao Yu <yuchao0@huawei.com>
To: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>,
Damien Le Moal <Damien.LeMoal@wdc.com>,
"linux-f2fs-devel@lists.sourceforge.net"
<linux-f2fs-devel@lists.sourceforge.net>
Subject: Re: [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones
Date: Sat, 30 Nov 2019 15:49:00 +0800
Message-ID: <6d29c4f8-842f-9a48-9ec8-e0707f37d97b@huawei.com>
In-Reply-To: <20191129052148.kgx6ahy2zu4rrsvq@shindev.dhcp.fujisawa.hgst.com>
On 2019/11/29 13:21, Shinichiro Kawasaki wrote:
> On Nov 28, 2019 / 20:39, Chao Yu wrote:
>> On 2019/11/28 13:31, Shinichiro Kawasaki wrote:
>>> On Nov 25, 2019 / 15:37, Chao Yu wrote:
>>>> On 2019/11/14 16:19, Shin'ichiro Kawasaki wrote:
>>>>> To catch f2fs bugs in write pointer handling code for zoned block
>>>>> devices, check write pointers of non-open zones that current segments do
>>>>> not point to. Do this check at mount time, after the fsync data recovery
>>>>> and current segments' write pointer consistency fix. Or when fsync data
>>>>> recovery is disabled by mount option, do the check when there is no fsync
>>>>> data.
>>>>>
>>>>> Check two items by comparing write pointers with the valid block maps in
>>>>> SIT. The first item is a check for zones with no valid blocks. When there
>>>>> are no valid blocks in a zone, the write pointer should be at the start of
>>>>> the zone. If it is not, the next write operation to the zone will cause an
>>>>> unaligned write error. In that case, make the mount fail and ask users to
>>>>> run fsck.
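A minimal user-space sketch of the first check follows. It assumes the zone's SIT valid-block map is a plain byte bitmap in MSB-first bit order (assumed here to match f2fs_test_bit()) and that the write pointer is given as a block offset from the zone start. check_empty_zone_wp() and its helpers are hypothetical names for illustration, not f2fs functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: test bit nr of a byte bitmap, MSB-first. */
static bool test_bit_msb(const unsigned char *map, size_t nr)
{
	return (map[nr / 8] >> (7 - nr % 8)) & 1;
}

/* Return true if any of the nr_blocks blocks of the zone is valid. */
static bool zone_has_valid_blocks(const unsigned char *valid_map,
				  size_t nr_blocks)
{
	for (size_t b = 0; b < nr_blocks; b++)
		if (test_bit_msb(valid_map, b))
			return true;
	return false;
}

/*
 * First check: a zone without valid blocks must have its write pointer
 * at the zone start (offset 0), otherwise the next write is unaligned.
 * Return -EINVAL so the caller can fail the mount and request fsck.
 */
static int check_empty_zone_wp(const unsigned char *valid_map,
			       size_t nr_blocks, size_t wp_offset)
{
	assert(wp_offset <= nr_blocks);
	if (!zone_has_valid_blocks(valid_map, nr_blocks) && wp_offset != 0)
		return -EINVAL;
	return 0;
}
```

A zone with at least one valid block is never rejected by this check; that case is covered by the second item.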
>>>>>
>>>>> The second item is a check between the write pointer position and the last
>>>>> valid block in the zone. The last valid block is not expected to be beyond
>>>>> the write pointer; if it is, report it as a bug. No fix is required for
>>>>> such a zone, because the zone is not selected for the next write operation
>>>>> until it gets discarded.
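The second check can be sketched the same way: scan the valid-block bitmap backwards to find the last valid block, then compare its offset with the write pointer. The bitmap layout and the names last_valid_offset() and valid_block_beyond_wp() are assumptions for illustration; the patch itself walks the cur_valid_map of each seg_entry in the section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* MSB-first bit test, assumed to match f2fs_test_bit(). */
static bool test_bit_msb(const unsigned char *map, size_t nr)
{
	return (map[nr / 8] >> (7 - nr % 8)) & 1;
}

/*
 * Return the offset of the last valid block in the zone, or -1 when the
 * zone has no valid block. Scanning backwards stops at the first hit.
 */
static long last_valid_offset(const unsigned char *valid_map,
			      size_t nr_blocks)
{
	for (size_t b = nr_blocks; b-- > 0;)
		if (test_bit_msb(valid_map, b))
			return (long)b;
	return -1;
}

/*
 * Second check: a valid block at or beyond the write pointer means SIT
 * and the device disagree. As in the patch, this is only reported.
 */
static bool valid_block_beyond_wp(const unsigned char *valid_map,
				  size_t nr_blocks, size_t wp_offset)
{
	assert(wp_offset <= nr_blocks);
	return last_valid_offset(valid_map, nr_blocks) >= (long)wp_offset;
}
```

Returning -1 for an empty zone keeps the comparison false for any write pointer, so empty zones fall through to the first check.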
>>>>>
>>>>> Also move the constant F2FS_REPORT_NR_ZONES from super.c to f2fs.h so that
>>>>> segment.c can use it too.
>>>>>
>>>>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>> ---
>>>>> fs/f2fs/f2fs.h | 3 +
>>>>> fs/f2fs/segment.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>> fs/f2fs/super.c | 16 ++++-
>>>>> 3 files changed, 165 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>> index a2e24718c13b..1bb64950d793 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
>>>>> unsigned int val, int alloc);
>>>>> void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
>>>>> int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
>>>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
>>>>> int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
>>>>> void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
>>>>> int __init f2fs_create_segment_manager_caches(void);
>>>>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
>>>>>
>>>>> return test_bit(zno, FDEV(devi).blkz_seq);
>>>>> }
>>>>> +
>>>>> +#define F2FS_REPORT_NR_ZONES 4096
>>>>> #endif
>>>>>
>>>>> static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>> index 6ece146dab34..29e3b6f62f8c 100644
>>>>> --- a/fs/f2fs/segment.c
>>>>> +++ b/fs/f2fs/segment.c
>>>>> @@ -4333,6 +4333,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
>>>>>
>>>>> #ifdef CONFIG_BLK_DEV_ZONED
>>>>>
>>>>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
>>>>> + struct f2fs_dev_info *fdev,
>>>>> + struct blk_zone *zone)
>>>>> +{
>>>>> + unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
>>>>> + block_t zone_block, wp_block, last_valid_block;
>>>>> + unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
>>>>> + int i, s, b;
>>>>> + struct seg_entry *se;
>>>>> +
>>>>> + wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
>>>>> + wp_segno = GET_SEGNO(sbi, wp_block);
>>>>> + wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
>>>>> + zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
>>>>> + zone_segno = GET_SEGNO(sbi, zone_block);
>>>>> + zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
>>>>> +
>>>>> + if (zone_segno >= MAIN_SEGS(sbi))
>>>>> + return 0;
>>>>> +
>>>>> + /*
>>>>> + * Skip check of zones cursegs point to, since
>>>>> + * fix_curseg_write_pointer() checks them.
>>>>> + */
>>>>> + for (i = 0; i < NO_CHECK_TYPE; i++)
>>>>> + if (zone_secno == GET_SEC_FROM_SEG(sbi,
>>>>> + CURSEG_I(sbi, i)->segno))
>>>>> + return 0;
>>>>> +
>>>>> + /*
>>>>> + * Get last valid block of the zone.
>>>>> + */
>>>>> + last_valid_block = zone_block - 1;
>>>>> + for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
>>>>> + segno = zone_segno + s;
>>>>> + se = get_seg_entry(sbi, segno);
>>>>> + for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
>>>>> + if (f2fs_test_bit(b, se->cur_valid_map)) {
>>>>> + last_valid_block = START_BLOCK(sbi, segno) + b;
>>>>> + break;
>>>>> + }
>>>>> + if (last_valid_block >= zone_block)
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * If the last valid block is beyond the write pointer, report the
>>>>> + * inconsistency. This inconsistency does not cause a write error
>>>>> + * because the zone will not be selected for write operations until
>>>>> + * it gets discarded. Just report it.
>>>>> + */
>>>>> + if (last_valid_block >= wp_block) {
>>>>> + f2fs_notice(sbi, "Valid block beyond write pointer: "
>>>>> + "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
>>>>> + GET_SEGNO(sbi, last_valid_block),
>>>>> + GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
>>>>> + wp_segno, wp_blkoff);
>>>>> + return 0;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * If there is no valid block in the zone and the write pointer is
>>>>> + * not at the zone start, report the error and ask to run fsck.
>>>>
>>>> So we only need to report this as an inconsistency when discard has already
>>>> been triggered, right? Otherwise, f2fs will trigger discard later to reset
>>>> zone->wp before opening this zone?
>>>
>>> Hmm, my intent was to catch the inconsistency at mount time, assuming that
>>> it is not expected to exist at mount time. In other words, I assume that
>>> discard is triggered for zones without valid blocks before the last clean
>>
>> IIUC, if there are too many pending discards, put_super() may drop discard
>> entries to avoid delaying umount, so we cannot assume that all discards are
>> always triggered.
>
> I see. In that case, the current code in the patch will wrongly flag zones
> whose discard entries were dropped. This is not good. Thank you for catching this :)
>
>>
>> So what I mean is: under the conditions a) there is a valid (including fsynced)
>> block and b) zone->wp is not at the correct position, f2fs can handle it by
>> issuing a discard. Let me know if I misread this comment.
>
> For condition a), do you mean "there is _no_ valid (including fsynced) block"?
Oops, yes, I meant that. :)
Thanks,
> If so, yes, I agree that f2fs can issue a discard when both a) and b) are true.
> I can add a simple "reset zone" function call to discard the zone.
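A sketch of the fix-up policy agreed on here: when a zone holds no valid blocks (including fsynced data) but its write pointer is not at the zone start, reset the zone instead of failing the mount. reset_zone_fn, struct zone_state and count_reset() are hypothetical stand-ins for illustration; no real kernel zone-reset API is implied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback that resets (discards) one zone; 0 on success. */
typedef int (*reset_zone_fn)(void *ctx, size_t zone_idx);

struct zone_state {
	bool has_valid_blocks;	/* any valid block in SIT, incl. fsynced */
	size_t wp_offset;	/* write pointer, in blocks from zone start */
};

/*
 * Reset only zones that are empty according to SIT but whose write
 * pointer has moved; zones holding valid data are left untouched.
 */
static int fix_empty_zone_wp(const struct zone_state *zs, size_t zone_idx,
			     reset_zone_fn reset, void *ctx)
{
	assert(zs != NULL && reset != NULL);
	if (zs->has_valid_blocks || zs->wp_offset == 0)
		return 0;
	return reset(ctx, zone_idx);
}

/* Example callback: counts resets in *(int *)ctx instead of doing I/O. */
static int count_reset(void *ctx, size_t zone_idx)
{
	(void)zone_idx;
	++*(int *)ctx;
	return 0;
}
```

The trade-off raised later in the thread still applies: this trusts SIT, so a corrupted SIT would lead to a reset where fsck might have been the safer choice.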
>
>>
>>> umount. If a sudden f2fs shutdown without a clean umount caused the
>>> inconsistency, I think it should be reported and fixed.
>>>
>>> The SIT valid block maps are consulted to check whether the zone has no valid
>>> blocks. SIT may be broken by a software bug or a hardware flaw; in that case,
>>> I think it is better to run fsck than to let f2fs discard the zone.
>>>
>>> If I missed anything, please let me know.
>>>
>>> --
>>> Best Regards,
>>> Shin'ichiro Kawasaki
>>>
>>>>
>>>> Thanks,
>>>>
>>>>> + */
>>>>> + if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
>>>>> + f2fs_notice(sbi,
>>>>> + "Zone without valid block has non-zero write "
>>>>> + "pointer, run fsck to fix: wp[0x%x,0x%x]",
>>>>> + wp_segno, wp_blkoff);
>>>>> + f2fs_stop_checkpoint(sbi, true);
>>>>> + set_sbi_flag(sbi, SBI_NEED_FSCK);
>>>>> + return -EINVAL;
>>>>> + }
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static int check_dev_write_pointer(struct f2fs_sb_info *sbi,
>>>>> + struct f2fs_dev_info *fdev) {
>>>>> + sector_t nr_sectors = fdev->bdev->bd_part->nr_sects;
>>>>> + sector_t sector = 0;
>>>>> + struct blk_zone *zones;
>>>>> + unsigned int i, nr_zones;
>>>>> + unsigned int n = 0;
>>>>> + int err = -EIO;
>>>>> +
>>>>> + if (!bdev_is_zoned(fdev->bdev))
>>>>> + return 0;
>>>>> +
>>>>> + zones = f2fs_kzalloc(sbi,
>>>>> + array_size(F2FS_REPORT_NR_ZONES,
>>>>> + sizeof(struct blk_zone)),
>>>>> + GFP_KERNEL);
>>>>> + if (!zones)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> + /* Get block zones type */
>>>>> + while (zones && sector < nr_sectors) {
>>>>> +
>>>>> + nr_zones = F2FS_REPORT_NR_ZONES;
>>>>> + err = blkdev_report_zones(fdev->bdev, sector, zones, &nr_zones);
>>>>> + if (err)
>>>>> + break;
>>>>> + if (!nr_zones) {
>>>>> + err = -EIO;
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> + for (i = 0; i < nr_zones; i++) {
>>>>> + if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
>>>>> + err = check_zone_write_pointer(sbi, fdev,
>>>>> + &zones[i]);
>>>>> + if (err)
>>>>> + break;
>>>>> + }
>>>>> + sector += zones[i].len;
>>>>> + n++;
>>>>> + }
>>>>> + if (err)
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> + kvfree(zones);
>>>>> +
>>>>> + return err;
>>>>> +}
>>>>> +
>>>>> static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
>>>>> block_t zone_blkaddr)
>>>>> {
>>>>> @@ -4399,6 +4526,10 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
>>>>> "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
>>>>> allocate_segment_by_default(sbi, type, true);
>>>>>
>>>>> + /* check consistency of the zone curseg pointed to */
>>>>> + if (check_zone_write_pointer(sbi, zbd, &zone))
>>>>> + return -EIO;
>>>>> +
>>>>> /* check newly assigned zone */
>>>>> cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
>>>>> cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
>>>>> @@ -4444,11 +4575,29 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
>>>>>
>>>>> return 0;
>>>>> }
>>>>> +
>>>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
>>>>> +{
>>>>> + int i, ret;
>>>>> +
>>>>> + for (i = 0; i < sbi->s_ndevs; i++) {
>>>>> + ret = check_dev_write_pointer(sbi, &FDEV(i));
>>>>> + if (ret)
>>>>> + return ret;
>>>>> + }
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> #else
>>>>> int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
>>>>> {
>>>>> return 0;
>>>>> }
>>>>> +
>>>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
>>>>> +{
>>>>> + return 0;
>>>>> +}
>>>>> #endif
>>>>>
>>>>> /*
>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>>> index 1443cee15863..8ca772670c67 100644
>>>>> --- a/fs/f2fs/super.c
>>>>> +++ b/fs/f2fs/super.c
>>>>> @@ -2890,8 +2890,6 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>>>>> if (!FDEV(devi).blkz_seq)
>>>>> return -ENOMEM;
>>>>>
>>>>> -#define F2FS_REPORT_NR_ZONES 4096
>>>>> -
>>>>> zones = f2fs_kzalloc(sbi,
>>>>> array_size(F2FS_REPORT_NR_ZONES,
>>>>> sizeof(struct blk_zone)),
>>>>> @@ -3509,7 +3507,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>
>>>>> err = f2fs_recover_fsync_data(sbi, false);
>>>>> if (err < 0) {
>>>>> - if (err != -ENOMEM)
>>>>> + if (err != -ENOMEM &&
>>>>> + !is_sbi_flag_set(sbi, SBI_NEED_FSCK))
>>>>> skip_recovery = true;
>>>>> need_fsck = true;
>>>>> f2fs_err(sbi, "Cannot recover all fsync data errno=%d",
>>>>> @@ -3525,6 +3524,17 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>> goto free_meta;
>>>>> }
>>>>> }
>>>>> +
>>>>> + /*
>>>>> + * If f2fs is not read-only and fsync data recovery succeeds,
>>>>> + * check zoned block devices' write pointer consistency.
>>>>> + */
>>>>> + if (!err && !f2fs_readonly(sb) && f2fs_sb_has_blkzoned(sbi)) {
>>>>> + err = f2fs_check_write_pointer(sbi);
>>>>> + if (err)
>>>>> + goto free_meta;
>>>>> + }
>>>>> +
>>>>> reset_checkpoint:
>>>>> /* f2fs_recover_fsync_data() cleared this already */
>>>>> clear_sbi_flag(sbi, SBI_POR_DOING);
>>>>> .
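The address arithmetic at the top of check_zone_write_pointer() (zone->wp in 512-byte sectors shifted down by log_blocksize - SECTOR_SHIFT, then split into segment number and block offset) can be checked with a small standalone model. It assumes 4 KiB blocks, 512 blocks per segment and a hypothetical seg0 block address of 512; segments are numbered from seg0, a simplification of GET_SEGNO() and START_BLOCK().

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Hypothetical geometry, chosen for illustration only. */
struct geo {
	unsigned int log_blocksize;	/* 12 for 4 KiB blocks */
	unsigned int log_blocks_per_seg;	/* 9 for 512 blocks/segment */
	uint64_t seg0_blkaddr;		/* first block of segment 0 */
	uint64_t dev_start_blk;		/* f2fs block address of device start */
};

/* Convert a device sector (e.g. zone->wp) to an f2fs block address. */
static uint64_t sector_to_block(const struct geo *g, uint64_t sector)
{
	assert(g->log_blocksize >= SECTOR_SHIFT);
	unsigned int log_sectors_per_block = g->log_blocksize - SECTOR_SHIFT;

	return g->dev_start_blk + (sector >> log_sectors_per_block);
}

/* Segment number of a block, counted from seg0. */
static uint64_t block_to_segno(const struct geo *g, uint64_t blk)
{
	return (blk - g->seg0_blkaddr) >> g->log_blocks_per_seg;
}

/* Offset of a block inside its segment. */
static uint64_t block_to_blkoff(const struct geo *g, uint64_t blk)
{
	return (blk - g->seg0_blkaddr) &
	       ((UINT64_C(1) << g->log_blocks_per_seg) - 1);
}
```

For example, with this geometry sector 532520 maps to block 66565, which is offset 5 in segment 129.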
>>>
>
> --
> Best Regards,
> Shin'ichiro Kawasaki
> .
>
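The report loop in check_dev_write_pointer() requests zones in batches of at most F2FS_REPORT_NR_ZONES, advances `sector` by each reported zone's length, and treats an empty report as -EIO. The sketch below reproduces that control flow in user space; fake_report() is a hypothetical stand-in for blkdev_report_zones() modelling a device of equal-sized zones, and REPORT_BATCH is shrunk to 4 so the batching is visible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define REPORT_BATCH 4	/* stands in for F2FS_REPORT_NR_ZONES (4096) */

struct zone { uint64_t start, len; };

/*
 * Hypothetical report function: fill up to *nr zones starting at
 * 'sector' and set *nr to the number actually reported.
 */
static int fake_report(uint64_t zone_len, size_t nr_dev_zones,
		       uint64_t sector, struct zone *zones, size_t *nr)
{
	size_t first = sector / zone_len, i;

	for (i = 0; i < *nr && first + i < nr_dev_zones; i++) {
		zones[i].start = (first + i) * zone_len;
		zones[i].len = zone_len;
	}
	*nr = i;
	return 0;
}

/* Walk every zone in batches; return the number visited or -EIO. */
static long walk_zones(uint64_t zone_len, size_t nr_dev_zones)
{
	uint64_t nr_sectors = zone_len * nr_dev_zones, sector = 0;
	struct zone zones[REPORT_BATCH];
	long visited = 0;

	while (sector < nr_sectors) {
		size_t nr = REPORT_BATCH;
		int err = fake_report(zone_len, nr_dev_zones, sector,
				      zones, &nr);

		if (err)
			return err;
		if (nr == 0)	/* no progress: treat as an I/O error */
			return -EIO;
		for (size_t i = 0; i < nr; i++) {
			/* a per-zone write pointer check would go here */
			sector += zones[i].len;
			visited++;
		}
	}
	return visited;
}
```

With ten zones the loop issues three reports (4, 4 and 2 zones), and the empty-report guard prevents an infinite loop if the device stops returning zones before the end.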
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Thread overview: 14+ messages
2019-11-14 8:19 [f2fs-dev] [PATCH v3 0/2] f2fs: Check write pointers of zoned block devices Shin'ichiro Kawasaki
2019-11-14 8:19 ` [f2fs-dev] [PATCH v3 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki
2019-11-25 6:59 ` Chao Yu
2019-11-28 4:07 ` Shinichiro Kawasaki
2019-11-28 12:26 ` Chao Yu
2019-11-29 1:58 ` Shinichiro Kawasaki
2019-11-14 8:19 ` [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
2019-11-25 7:37 ` Chao Yu
2019-11-28 5:31 ` Shinichiro Kawasaki
2019-11-28 12:39 ` Chao Yu
2019-11-29 5:21 ` Shinichiro Kawasaki
2019-11-30 7:49 ` Chao Yu [this message]
2019-12-02 1:38 ` Shinichiro Kawasaki
2019-12-02 9:51 ` Shinichiro Kawasaki