* [f2fs-dev] [PATCH v2 0/2] f2fs: Check write pointers of zoned block devices @ 2019-10-28 6:57 Shin'ichiro Kawasaki 2019-10-28 6:58 ` [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki 2019-10-28 6:58 ` [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki 0 siblings, 2 replies; 10+ messages in thread From: Shin'ichiro Kawasaki @ 2019-10-28 6:57 UTC (permalink / raw) To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal On sudden f2fs shutdown, zoned block device status and f2fs meta data can be inconsistent. When f2fs shutdown happens during write operations, write pointers on the device go forward but the f2fs meta data does not reflect the write pointer progress. This inconsistency will eventually cause "Unaligned write command" error when restarting write operation after the next mount. This error is observed with xfstests test case generic/388, which enforces sudden shutdown during write operation and checks the file system recovery. This patch series adds a feature to f2fs to check and fix the write pointer consistency when zoned block devices get mounted. Per discussion on the list, implement two checks: a check for open zones and a check for non-open zones. The first patch add the check for open zones that current segments point to. Check write pointer consistency with current segment positions recorded in CP, and if they are inconsistent, assign a new zone to the current segment. When fsync data exists, check and fix the consistency after fsync data recovery. The second patch adds the check for non-open zones that current segments do not point to. Compare write pointers with valid block maps in SIT, and if they are inconsistent, report the failure. Even if fixes are required, do not have kernel fix because SIT consistency with CP is not guaranteed because of hardware failure or software bug possibility. In such check failure cases, fail the mount to ask users to run fsck. Corresponding patch series is being posted for fsck.f2fs to add the write pointer consistency check feature. Thank goes to Chao Yu for the detailed discussion on the list. Changes from v1: * 2nd patch: Added check to ensure zones are in main segments Shin'ichiro Kawasaki (2): f2fs: Check write pointer consistency of open zones f2fs: Check write pointer consistency of non-open zones fs/f2fs/f2fs.h | 4 + fs/f2fs/recovery.c | 6 + fs/f2fs/segment.c | 274 +++++++++++++++++++++++++++++++++++++++++++++ fs/f2fs/super.c | 17 ++- 4 files changed, 298 insertions(+), 3 deletions(-) -- 2.21.0 _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones 2019-10-28 6:57 [f2fs-dev] [PATCH v2 0/2] f2fs: Check write pointers of zoned block devices Shin'ichiro Kawasaki @ 2019-10-28 6:58 ` Shin'ichiro Kawasaki 2019-11-05 12:03 ` Chao Yu 2019-10-28 6:58 ` [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki 1 sibling, 1 reply; 10+ messages in thread From: Shin'ichiro Kawasaki @ 2019-10-28 6:58 UTC (permalink / raw) To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal On sudden f2fs shutdown, write pointers of zoned block devices can go further but f2fs meta data keeps current segments at positions before the write operations. After remounting the f2fs, this inconsistency causes write operations not at write pointers and "Unaligned write command" error is reported. To avoid the error, compare current segments with write pointers of open zones the current segments point to, during mount operation. If the write pointer position is not aligned with the current segment position, assign a new zone to the current segments. Also check the newly assigned zone has write pointer at zone start. If not, make mount fail and ask users to run fsck. Perform the consistency check twice. Once during fsync recovery. Not to lose the fsync data, do the check after fsync data gets restored and before checkpoint commit which flushes data at current segment positions. The second check is done at end of f2fs_fill_super() to make sure the write pointer consistency regardless of fsync data recovery execution. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> --- fs/f2fs/f2fs.h | 1 + fs/f2fs/recovery.c | 6 +++ fs/f2fs/segment.c | 127 +++++++++++++++++++++++++++++++++++++++++++++ fs/f2fs/super.c | 8 +++ 4 files changed, 142 insertions(+) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 4024790028aa..0216282c5b80 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3136,6 +3136,7 @@ void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk); int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, unsigned int val, int alloc); void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only); int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); int __init f2fs_create_segment_manager_caches(void); diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c index 783773e4560d..c75d1cbae4d1 100644 --- a/fs/f2fs/recovery.c +++ b/fs/f2fs/recovery.c @@ -795,6 +795,12 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only) if (need_writecp) { set_sbi_flag(sbi, SBI_IS_RECOVERED); + /* recover zoned block devices' write pointer consistency */ + if (!err && f2fs_sb_has_blkzoned(sbi)) { + err = f2fs_fix_curseg_write_pointer(sbi, false); + ret = err; + } + if (!err) { struct cp_control cpc = { .reason = CP_RECOVERY, diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 808709581481..2b6e637dd6d3 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -4331,6 +4331,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi) return 0; } +#ifdef CONFIG_BLK_DEV_ZONED + +static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi, + block_t zone_blkaddr) +{ + int i; + + for (i = 0; i < sbi->s_ndevs; i++) { + if (!bdev_is_zoned(FDEV(i).bdev)) + continue; + if (sbi->s_ndevs == 1 || (FDEV(i).start_blk <= zone_blkaddr && + zone_blkaddr <= FDEV(i).end_blk)) + return &FDEV(i); + } + + return NULL; +} + +static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type, + bool check_only) +{ + struct curseg_info *cs = CURSEG_I(sbi, type); + struct f2fs_dev_info *zbd; + struct blk_zone zone; + unsigned int cs_section, wp_segno, wp_blkoff, nr_zones, wp_sector_off; + block_t cs_zone_block, wp_block, cs_block; + unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT; + sector_t zone_sector; + int err; + + cs_section = GET_SEC_FROM_SEG(sbi, cs->segno); + cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section)); + cs_block = START_BLOCK(sbi, cs->segno) + cs->next_blkoff; + + zbd = get_target_zoned_dev(sbi, cs_zone_block); + if (!zbd) + return 0; + + /* report zone for the sector the curseg points to */ + zone_sector = (sector_t)(cs_zone_block - zbd->start_blk) + << log_sectors_per_block; + nr_zones = 1; + err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones); + if (err) { + f2fs_err(sbi, "Report zone failed: %s errno=(%d)", + zbd->path, err); + return err; + } + + if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ) + return 0; + + wp_block = zbd->start_blk + (zone.wp >> log_sectors_per_block); + wp_segno = GET_SEGNO(sbi, wp_block); + wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno); + wp_sector_off = zone.wp & GENMASK(log_sectors_per_block - 1, 0); + + if (cs->segno == wp_segno && cs->next_blkoff == wp_blkoff && + wp_sector_off == 0) + return 0; + + f2fs_notice(sbi, "Unaligned curseg[%d] with write pointer: " + "curseg[0x%x,0x%x] wp[0x%x,0x%x]", + type, cs->segno, cs->next_blkoff, wp_segno, wp_blkoff); + + /* if check_only is specified, return error without fix */ + if (check_only) + return -EIO; + + f2fs_notice(sbi, "Assign new section to curseg[%d]: " + "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff); + allocate_segment_by_default(sbi, type, true); + + /* check newly assigned zone */ + cs_section = GET_SEC_FROM_SEG(sbi, cs->segno); + cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section)); + + zbd = get_target_zoned_dev(sbi, cs_zone_block); + if (!zbd) + return 0; + + zone_sector = (sector_t)(cs_zone_block - zbd->start_blk) + << log_sectors_per_block; + nr_zones = 1; + err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones); + if (err) { + f2fs_err(sbi, "Report zone failed: %s errno=(%d)", + zbd->path, err); + return err; + } + + if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ) + return 0; + + if (zone.wp != zone.start) { + f2fs_notice(sbi, + "New section for curseg[%d] is not empty, " + "run fsck to fix: curseg[0x%x,0x%x]", + type, cs->segno, cs->next_blkoff); + __set_inuse(sbi, GET_SEGNO(sbi, cs_zone_block)); + f2fs_stop_checkpoint(sbi, true); + set_sbi_flag(sbi, SBI_NEED_FSCK); + return -EINVAL; + } + + return 0; +} + +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only) +{ + int i, ret; + + for (i = 0; i < NO_CHECK_TYPE; i++) { + ret = fix_curseg_write_pointer(sbi, i, check_only); + if (ret) + return ret; + } + + return 0; +} +#else +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only) +{ + return 0; +} +#endif + /* * Update min, max modified time for cost-benefit GC algorithm */ diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 1443cee15863..ebd0ae02a260 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -3525,6 +3525,14 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) goto free_meta; } } + + /* check zoned block devices' write pointer consistency */ + if (f2fs_sb_has_blkzoned(sbi)) { + err = f2fs_fix_curseg_write_pointer(sbi, f2fs_readonly(sb)); + if (err) + goto free_meta; + } + reset_checkpoint: /* f2fs_recover_fsync_data() cleared this already */ clear_sbi_flag(sbi, SBI_POR_DOING); -- 2.21.0 _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones 2019-10-28 6:58 ` [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki @ 2019-11-05 12:03 ` Chao Yu 2019-11-08 4:09 ` Shinichiro Kawasaki 0 siblings, 1 reply; 10+ messages in thread From: Chao Yu @ 2019-11-05 12:03 UTC (permalink / raw) To: Shin'ichiro Kawasaki, Jaegeuk Kim, linux-f2fs-devel; +Cc: Damien Le Moal On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote: > On sudden f2fs shutdown, write pointers of zoned block devices can go > further but f2fs meta data keeps current segments at positions before the > write operations. After remounting the f2fs, this inconsistency causes > write operations not at write pointers and "Unaligned write command" > error is reported. > > To avoid the error, compare current segments with write pointers of open > zones the current segments point to, during mount operation. If the write > pointer position is not aligned with the current segment position, assign > a new zone to the current segments. Also check the newly assigned zone > has write pointer at zone start. If not, make mount fail and ask users to > run fsck. > > Perform the consistency check twice. Once during fsync recovery. Not to > lose the fsync data, do the check after fsync data gets restored and > before checkpoint commit which flushes data at current segment positions. > The second check is done at end of f2fs_fill_super() to make sure the > write pointer consistency regardless of fsync data recovery execution. > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > --- > fs/f2fs/f2fs.h | 1 + > fs/f2fs/recovery.c | 6 +++ > fs/f2fs/segment.c | 127 +++++++++++++++++++++++++++++++++++++++++++++ > fs/f2fs/super.c | 8 +++ > 4 files changed, 142 insertions(+) > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 4024790028aa..0216282c5b80 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -3136,6 +3136,7 @@ void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk); > int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, > unsigned int val, int alloc); > void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); > +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only); > int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); > void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); > int __init f2fs_create_segment_manager_caches(void); > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c > index 783773e4560d..c75d1cbae4d1 100644 > --- a/fs/f2fs/recovery.c > +++ b/fs/f2fs/recovery.c > @@ -795,6 +795,12 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only) > if (need_writecp) { > set_sbi_flag(sbi, SBI_IS_RECOVERED); > > + /* recover zoned block devices' write pointer consistency */ > + if (!err && f2fs_sb_has_blkzoned(sbi)) { > + err = f2fs_fix_curseg_write_pointer(sbi, false); Can we check and reset current segment under SBI_POR_DOING's protection? since once SBI_POR_DOING flag is cleared, kworker is able to flush dirty data/node, which may trigger unaligned write command if write pointer is inconsistent. Thanks, > + ret = err; > + } > + > if (!err) { > struct cp_control cpc = { > .reason = CP_RECOVERY, > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index 808709581481..2b6e637dd6d3 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -4331,6 +4331,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi) > return 0; > } > > +#ifdef CONFIG_BLK_DEV_ZONED > + > +static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi, > + block_t zone_blkaddr) > +{ > + int i; > + > + for (i = 0; i < sbi->s_ndevs; i++) { > + if (!bdev_is_zoned(FDEV(i).bdev)) > + continue; > + if (sbi->s_ndevs == 1 || (FDEV(i).start_blk <= zone_blkaddr && > + zone_blkaddr <= FDEV(i).end_blk)) > + return &FDEV(i); > + } > + > + return NULL; > +} > + > +static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type, > + bool check_only) > +{ > + struct curseg_info *cs = CURSEG_I(sbi, type); > + struct f2fs_dev_info *zbd; > + struct blk_zone zone; > + unsigned int cs_section, wp_segno, wp_blkoff, nr_zones, wp_sector_off; > + block_t cs_zone_block, wp_block, cs_block; > + unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT; > + sector_t zone_sector; > + int err; > + > + cs_section = GET_SEC_FROM_SEG(sbi, cs->segno); > + cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section)); > + cs_block = START_BLOCK(sbi, cs->segno) + cs->next_blkoff; > + > + zbd = get_target_zoned_dev(sbi, cs_zone_block); > + if (!zbd) > + return 0; > + > + /* report zone for the sector the curseg points to */ > + zone_sector = (sector_t)(cs_zone_block - zbd->start_blk) > + << log_sectors_per_block; > + nr_zones = 1; > + err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones); > + if (err) { > + f2fs_err(sbi, "Report zone failed: %s errno=(%d)", > + zbd->path, err); > + return err; > + } > + > + if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ) > + return 0; > + > + wp_block = zbd->start_blk + (zone.wp >> log_sectors_per_block); > + wp_segno = GET_SEGNO(sbi, wp_block); > + wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno); > + wp_sector_off = zone.wp & GENMASK(log_sectors_per_block - 1, 0); > + > + if (cs->segno == wp_segno && cs->next_blkoff == wp_blkoff && > + wp_sector_off == 0) > + return 0; > + > + f2fs_notice(sbi, "Unaligned curseg[%d] with write pointer: " > + "curseg[0x%x,0x%x] wp[0x%x,0x%x]", > + type, cs->segno, cs->next_blkoff, wp_segno, wp_blkoff); > + > + /* if check_only is specified, return error without fix */ > + if (check_only) > + return -EIO; > + > + f2fs_notice(sbi, "Assign new section to curseg[%d]: " > + "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff); > + allocate_segment_by_default(sbi, type, true); > + > + /* check newly assigned zone */ > + cs_section = GET_SEC_FROM_SEG(sbi, cs->segno); > + cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section)); > + > + zbd = get_target_zoned_dev(sbi, cs_zone_block); > + if (!zbd) > + return 0; > + > + zone_sector = (sector_t)(cs_zone_block - zbd->start_blk) > + << log_sectors_per_block; > + nr_zones = 1; > + err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones); > + if (err) { > + f2fs_err(sbi, "Report zone failed: %s errno=(%d)", > + zbd->path, err); > + return err; > + } > + > + if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ) > + return 0; > + > + if (zone.wp != zone.start) { > + f2fs_notice(sbi, > + "New section for curseg[%d] is not empty, " > + "run fsck to fix: curseg[0x%x,0x%x]", > + type, cs->segno, cs->next_blkoff); > + __set_inuse(sbi, GET_SEGNO(sbi, cs_zone_block)); > + f2fs_stop_checkpoint(sbi, true); > + set_sbi_flag(sbi, SBI_NEED_FSCK); > + return -EINVAL; > + } > + > + return 0; > +} > + > +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only) > +{ > + int i, ret; > + > + for (i = 0; i < NO_CHECK_TYPE; i++) { > + ret = fix_curseg_write_pointer(sbi, i, check_only); > + if (ret) > + return ret; > + } > + > + return 0; > +} > +#else > +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only) > +{ > + return 0; > +} > +#endif > + > /* > * Update min, max modified time for cost-benefit GC algorithm > */ > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > index 1443cee15863..ebd0ae02a260 100644 > --- a/fs/f2fs/super.c > +++ b/fs/f2fs/super.c > @@ -3525,6 +3525,14 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) > goto free_meta; > } > } > + > + /* check zoned block devices' write pointer consistency */ > + if (f2fs_sb_has_blkzoned(sbi)) { > + err = f2fs_fix_curseg_write_pointer(sbi, f2fs_readonly(sb)); > + if (err) > + goto free_meta; > + } > + > reset_checkpoint: > /* f2fs_recover_fsync_data() cleared this already */ > clear_sbi_flag(sbi, SBI_POR_DOING); > _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones 2019-11-05 12:03 ` Chao Yu @ 2019-11-08 4:09 ` Shinichiro Kawasaki 0 siblings, 0 replies; 10+ messages in thread From: Shinichiro Kawasaki @ 2019-11-08 4:09 UTC (permalink / raw) To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel On Nov 05, 2019 / 20:03, Chao Yu wrote: > On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote: > > On sudden f2fs shutdown, write pointers of zoned block devices can go > > further but f2fs meta data keeps current segments at positions before the > > write operations. After remounting the f2fs, this inconsistency causes > > write operations not at write pointers and "Unaligned write command" > > error is reported. > > > > To avoid the error, compare current segments with write pointers of open > > zones the current segments point to, during mount operation. If the write > > pointer position is not aligned with the current segment position, assign > > a new zone to the current segments. Also check the newly assigned zone > > has write pointer at zone start. If not, make mount fail and ask users to > > run fsck. > > > > Perform the consistency check twice. Once during fsync recovery. Not to > > lose the fsync data, do the check after fsync data gets restored and > > before checkpoint commit which flushes data at current segment positions. > > The second check is done at end of f2fs_fill_super() to make sure the > > write pointer consistency regardless of fsync data recovery execution. > > > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > > --- > > fs/f2fs/f2fs.h | 1 + > > fs/f2fs/recovery.c | 6 +++ > > fs/f2fs/segment.c | 127 +++++++++++++++++++++++++++++++++++++++++++++ > > fs/f2fs/super.c | 8 +++ > > 4 files changed, 142 insertions(+) > > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > index 4024790028aa..0216282c5b80 100644 > > --- a/fs/f2fs/f2fs.h > > +++ b/fs/f2fs/f2fs.h > > @@ -3136,6 +3136,7 @@ void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk); > > int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, > > unsigned int val, int alloc); > > void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); > > +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only); > > int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); > > void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); > > int __init f2fs_create_segment_manager_caches(void); > > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c > > index 783773e4560d..c75d1cbae4d1 100644 > > --- a/fs/f2fs/recovery.c > > +++ b/fs/f2fs/recovery.c > > @@ -795,6 +795,12 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only) > > if (need_writecp) { > > set_sbi_flag(sbi, SBI_IS_RECOVERED); > > > > + /* recover zoned block devices' write pointer consistency */ > > + if (!err && f2fs_sb_has_blkzoned(sbi)) { > > + err = f2fs_fix_curseg_write_pointer(sbi, false); > > Can we check and reset current segment under SBI_POR_DOING's protection? since > once SBI_POR_DOING flag is cleared, kworker is able to flush dirty data/node, > which may trigger unaligned write command if write pointer is inconsistent. Yes, will move that part before the SBI_POR_DOING flag clear. Thanks. -- Best Regards, Shin'ichiro Kawasaki _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones 2019-10-28 6:57 [f2fs-dev] [PATCH v2 0/2] f2fs: Check write pointers of zoned block devices Shin'ichiro Kawasaki 2019-10-28 6:58 ` [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki @ 2019-10-28 6:58 ` Shin'ichiro Kawasaki 2019-11-05 12:22 ` Chao Yu 1 sibling, 1 reply; 10+ messages in thread From: Shin'ichiro Kawasaki @ 2019-10-28 6:58 UTC (permalink / raw) To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal To catch f2fs bugs in write pointer handling code for zoned block devices, check write pointers of non-open zones that current segments do not point to. Do this check at mount time, after the fsync data recovery and current segments' write pointer consistency fix. Check two items comparing write pointers with valid block maps in SIT. The first item is check for zones with no valid blocks. When there is no valid blocks in a zone, the write pointer should be at the start of the zone. If not, next write operation to the zone will cause unaligned write error. If write pointer is not at the zone start, make mount fail and ask users to run fsck. The second item is check between the write pointer position and the last valid block in the zone. It is unexpected that the last valid block position is beyond the write pointer. In such a case, report as the bug. Fix is not required for such zone, because the zone is not selected for next write operation until the zone get discarded. Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it in segment.c also. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> --- fs/f2fs/f2fs.h | 3 + fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++ fs/f2fs/super.c | 11 ++-- 3 files changed, 157 insertions(+), 4 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 0216282c5b80..e8524be17852 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, unsigned int val, int alloc); void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only); +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi); int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); int __init f2fs_create_segment_manager_caches(void); @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi, return test_bit(zno, FDEV(devi).blkz_seq); } + +#define F2FS_REPORT_NR_ZONES 4096 #endif static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 2b6e637dd6d3..20ef5b3705e1 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi) #ifdef CONFIG_BLK_DEV_ZONED +static int check_zone_write_pointer(struct f2fs_sb_info *sbi, + struct f2fs_dev_info *fdev, + struct blk_zone *zone) +{ + unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, segno; + block_t zone_block, wp_block, last_valid_block, b; + unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT; + int i; + struct seg_entry *se; + + wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block); + wp_segno = GET_SEGNO(sbi, wp_block); + wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno); + zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block); + zone_segno = GET_SEGNO(sbi, zone_block); + zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno); + + if (zone_segno >= MAIN_SEGS(sbi)) + return 0; + + /* + * If a curseg points to the zone, skip check because the zone + * may have fsync data that valid block map does not mark. + */ + for (i = 0; i < NO_CHECK_TYPE; i++) + if (zone_secno == GET_SEC_FROM_SEG(sbi, + CURSEG_I(sbi, i)->segno)) + return 0; + + /* + * Get last valid block of the zone. + */ + last_valid_block = zone_block - 1; + for (s = 0; s < sbi->segs_per_sec; s++) { + segno = zone_segno + s; + se = get_seg_entry(sbi, segno); + for (b = 0; b < sbi->blocks_per_seg; b++) + if (f2fs_test_bit(b, se->cur_valid_map)) + last_valid_block = START_BLOCK(sbi, segno) + b; + } + + /* + * If last valid block is beyond the write pointer, report the + * inconsistency. This inconsistency does not cause write error + * because the zone will not be selected for write operation until + * it get discarded. Just report it. + */ + if (last_valid_block >= wp_block) { + f2fs_notice(sbi, "Valid block beyond write pointer: " + "valid block[0x%x,0x%x] wp[0x%x,0x%x]", + GET_SEGNO(sbi, last_valid_block), + GET_BLKOFF_FROM_SEG0(sbi, last_valid_block), + wp_segno, wp_blkoff); + return 0; + } + + /* + * If there is no valid block in the zone and if write pointer is + * not at zone start, report the error to run fsck and mark the + * zone as used. + */ + if (last_valid_block + 1 == zone_block && zone->wp != zone->start) { + f2fs_notice(sbi, + "Zone without valid block has non-zero write " + "pointer, run fsck to fix: wp[0x%x,0x%x]", + wp_segno, wp_blkoff); + __set_inuse(sbi, zone_segno); + f2fs_stop_checkpoint(sbi, true); + set_sbi_flag(sbi, SBI_NEED_FSCK); + return -EINVAL; + } + + return 0; +} + +static int check_dev_write_pointer(struct f2fs_sb_info *sbi, + struct f2fs_dev_info *fdev) { + sector_t nr_sectors = fdev->bdev->bd_part->nr_sects; + sector_t sector = 0; + struct blk_zone *zones; + unsigned int i, nr_zones; + unsigned int n = 0; + int err = -EIO; + + if (!bdev_is_zoned(fdev->bdev)) + return 0; + + zones = f2fs_kzalloc(sbi, + array_size(F2FS_REPORT_NR_ZONES, + sizeof(struct blk_zone)), + GFP_KERNEL); + if (!zones) + return -ENOMEM; + + /* Get block zones type */ + while (zones && sector < nr_sectors) { + + nr_zones = F2FS_REPORT_NR_ZONES; + err = blkdev_report_zones(fdev->bdev, sector, zones, &nr_zones); + if (err) + break; + if (!nr_zones) { + err = -EIO; + break; + } + + for (i = 0; i < nr_zones; i++) { + if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ) { + err = check_zone_write_pointer(sbi, fdev, + &zones[i]); + if (err) + break; + } + sector += zones[i].len; + n++; + } + if (err) + break; + } + + kvfree(zones); + + return err; +} + static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi, block_t zone_blkaddr) { @@ -4404,6 +4529,10 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type, "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff); allocate_segment_by_default(sbi, type, true); + /* check consistency of the zone curseg pointed to */ + if (check_zone_write_pointer(sbi, zbd, &zone)) + return -EIO; + /* check newly assigned zone */ cs_section = GET_SEC_FROM_SEG(sbi, cs->segno); cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section)); @@ -4451,11 +4580,29 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only) return 0; } + +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi) +{ + int i, ret; + + for (i = 0; i < sbi->s_ndevs; i++) { + ret = check_dev_write_pointer(sbi, &FDEV(i)); + if (ret) + return ret; + } + + return 0; +} #else int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only) { return 0; } + +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi) +{ + return 0; +} #endif /* diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index ebd0ae02a260..357b8f7fd333 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -2890,8 +2890,6 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi) if (!FDEV(devi).blkz_seq) return -ENOMEM; -#define F2FS_REPORT_NR_ZONES 4096 - zones = f2fs_kzalloc(sbi, array_size(F2FS_REPORT_NR_ZONES, sizeof(struct blk_zone)), @@ -3509,7 +3507,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) err = f2fs_recover_fsync_data(sbi, false); if (err < 0) { - if (err != -ENOMEM) + if (err != -ENOMEM && + !is_sbi_flag_set(sbi, SBI_NEED_FSCK)) skip_recovery = true; need_fsck = true; f2fs_err(sbi, "Cannot recover all fsync data errno=%d", @@ -3526,8 +3525,12 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) } } - /* check zoned block devices' write pointer consistency */ + /* fix and check zoned block devices' write pointer consistency */ if (f2fs_sb_has_blkzoned(sbi)) { + err = f2fs_check_write_pointer(sbi); + if (err) + goto free_meta; + err = f2fs_fix_curseg_write_pointer(sbi, f2fs_readonly(sb)); if (err) goto free_meta; -- 2.21.0 _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones 2019-10-28 6:58 ` [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki @ 2019-11-05 12:22 ` Chao Yu 2019-11-08 4:27 ` Shinichiro Kawasaki 0 siblings, 1 reply; 10+ messages in thread From: Chao Yu @ 2019-11-05 12:22 UTC (permalink / raw) To: Shin'ichiro Kawasaki, Jaegeuk Kim, linux-f2fs-devel; +Cc: Damien Le Moal On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote: > To catch f2fs bugs in write pointer handling code for zoned block > devices, check write pointers of non-open zones that current segments do > not point to. Do this check at mount time, after the fsync data recovery > and current segments' write pointer consistency fix. Check two items > comparing write pointers with valid block maps in SIT. > > The first item is check for zones with no valid blocks. When there is no > valid blocks in a zone, the write pointer should be at the start of the > zone. If not, next write operation to the zone will cause unaligned write > error. If write pointer is not at the zone start, make mount fail and ask > users to run fsck. > > The second item is check between the write pointer position and the last > valid block in the zone. It is unexpected that the last valid block > position is beyond the write pointer. In such a case, report as the bug. > Fix is not required for such zone, because the zone is not selected for > next write operation until the zone get discarded. > > Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it > in segment.c also. > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > --- > fs/f2fs/f2fs.h | 3 + > fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++ > fs/f2fs/super.c | 11 ++-- > 3 files changed, 157 insertions(+), 4 deletions(-) > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 0216282c5b80..e8524be17852 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, > unsigned int val, int alloc); > void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); > int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only); > +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi); > int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); > void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); > int __init f2fs_create_segment_manager_caches(void); > @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi, > > return test_bit(zno, FDEV(devi).blkz_seq); > } > + > +#define F2FS_REPORT_NR_ZONES 4096 > #endif > > static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi) > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index 2b6e637dd6d3..20ef5b3705e1 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi) > > #ifdef CONFIG_BLK_DEV_ZONED > > +static int check_zone_write_pointer(struct f2fs_sb_info *sbi, > + struct f2fs_dev_info *fdev, > + struct blk_zone *zone) > +{ > + unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, segno; > + block_t zone_block, wp_block, last_valid_block, b; > + unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT; > + int i; > + struct seg_entry *se; > + > + wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block); > + wp_segno = GET_SEGNO(sbi, wp_block); > + wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno); > + zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block); > + zone_segno = GET_SEGNO(sbi, zone_block); > + zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno); > + > + if (zone_segno >= MAIN_SEGS(sbi)) > + return 0; > + > + /* > + * If a curseg points to the zone, skip check because the zone > + * may have fsync data that valid block map does not mark. None-curseg zone may also contain fsynced data as well? Maybe we can only verify on clean image or recovered image? > + */ > + for (i = 0; i < NO_CHECK_TYPE; i++) > + if (zone_secno == GET_SEC_FROM_SEG(sbi, > + CURSEG_I(sbi, i)->segno)) > + return 0; > + > + /* > + * Get last valid block of the zone. > + */ > + last_valid_block = zone_block - 1; > + for (s = 0; s < sbi->segs_per_sec; s++) { > + segno = zone_segno + s; > + se = get_seg_entry(sbi, segno); > + for (b = 0; b < sbi->blocks_per_seg; b++) > + if (f2fs_test_bit(b, se->cur_valid_map)) > + last_valid_block = START_BLOCK(sbi, segno) + b; > + } We search bitmap table reversedly. > + > + /* > + * If last valid block is beyond the write pointer, report the > + * inconsistency. This inconsistency does not cause write error > + * because the zone will not be selected for write operation until > + * it get discarded. Just report it. > + */ > + if (last_valid_block >= wp_block) { > + f2fs_notice(sbi, "Valid block beyond write pointer: " > + "valid block[0x%x,0x%x] wp[0x%x,0x%x]", > + GET_SEGNO(sbi, last_valid_block), > + GET_BLKOFF_FROM_SEG0(sbi, last_valid_block), > + wp_segno, wp_blkoff); > + return 0; > + } > + > + /* > + * If there is no valid block in the zone and if write pointer is > + * not at zone start, report the error to run fsck and mark the > + * zone as used. > + */ > + if (last_valid_block + 1 == zone_block && zone->wp != zone->start) { > + f2fs_notice(sbi, > + "Zone without valid block has non-zero write " > + "pointer, run fsck to fix: wp[0x%x,0x%x]", > + wp_segno, wp_blkoff); > + __set_inuse(sbi, zone_segno); Why do we need to set it inused? if this is necessary, we need to call this under free_i->segmap_lock. Thanks, > + f2fs_stop_checkpoint(sbi, true); > + set_sbi_flag(sbi, SBI_NEED_FSCK); > + return -EINVAL; > + } > + > + return 0; > +} > + > +static int check_dev_write_pointer(struct f2fs_sb_info *sbi, > + struct f2fs_dev_info *fdev) { > + sector_t nr_sectors = fdev->bdev->bd_part->nr_sects; > + sector_t sector = 0; > + struct blk_zone *zones; > + unsigned int i, nr_zones; > + unsigned int n = 0; > + int err = -EIO; > + > + if (!bdev_is_zoned(fdev->bdev)) > + return 0; > + > + zones = f2fs_kzalloc(sbi, > + array_size(F2FS_REPORT_NR_ZONES, > + sizeof(struct blk_zone)), > + GFP_KERNEL); > + if (!zones) > + return -ENOMEM; > + > + /* Get block zones type */ > + while (zones && sector < nr_sectors) { > + > + nr_zones = F2FS_REPORT_NR_ZONES; > + err = blkdev_report_zones(fdev->bdev, sector, zones, &nr_zones); > + if (err) > + break; > + if (!nr_zones) { > + err = -EIO; > + break; > + } > + > + for (i = 0; i < nr_zones; i++) { > + if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ) { > + err = check_zone_write_pointer(sbi, fdev, > + &zones[i]); > + if (err) > + break; > + } > + sector += zones[i].len; > + n++; > + } > + if (err) > + break; > + } > + > + kvfree(zones); > + > + return err; > +} > + > static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi, > block_t zone_blkaddr) > { > @@ -4404,6 +4529,10 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type, > "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff); > allocate_segment_by_default(sbi, type, true); > > + /* check consistency of the zone curseg pointed to */ > + if (check_zone_write_pointer(sbi, zbd, &zone)) > + return -EIO; > + > /* check newly assigned zone */ > cs_section = GET_SEC_FROM_SEG(sbi, cs->segno); > cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section)); > @@ -4451,11 +4580,29 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only) > > return 0; > } > + > +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi) > +{ > + int i, ret; > + > + for (i = 0; i < sbi->s_ndevs; i++) { > + ret = check_dev_write_pointer(sbi, &FDEV(i)); > + if (ret) > + return ret; > + } > + > + return 0; > +} > #else > int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only) > { > return 0; > } > + > +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi) > +{ > + return 0; > +} > #endif > > /* > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > index ebd0ae02a260..357b8f7fd333 100644 > --- a/fs/f2fs/super.c > +++ b/fs/f2fs/super.c > @@ -2890,8 +2890,6 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi) > if (!FDEV(devi).blkz_seq) > return -ENOMEM; > > -#define F2FS_REPORT_NR_ZONES 4096 > - > zones = f2fs_kzalloc(sbi, > array_size(F2FS_REPORT_NR_ZONES, > sizeof(struct blk_zone)), > @@ -3509,7 +3507,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) > > err = f2fs_recover_fsync_data(sbi, false); > if (err < 0) { > - if (err != -ENOMEM) > + if (err != -ENOMEM && > + !is_sbi_flag_set(sbi, SBI_NEED_FSCK)) > skip_recovery = true; > need_fsck = true; > f2fs_err(sbi, "Cannot recover all fsync data errno=%d", > @@ -3526,8 +3525,12 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) > } > } > > - /* check zoned block devices' write pointer consistency */ > + /* fix and check zoned block devices' write pointer consistency */ > if (f2fs_sb_has_blkzoned(sbi)) { > + err = f2fs_check_write_pointer(sbi); > + if (err) > + goto free_meta; > + > err = f2fs_fix_curseg_write_pointer(sbi, f2fs_readonly(sb)); > if (err) > goto free_meta; > _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones 2019-11-05 12:22 ` Chao Yu @ 2019-11-08 4:27 ` Shinichiro Kawasaki 2019-11-11 3:27 ` Chao Yu 0 siblings, 1 reply; 10+ messages in thread From: Shinichiro Kawasaki @ 2019-11-08 4:27 UTC (permalink / raw) To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel On Nov 05, 2019 / 20:22, Chao Yu wrote: > On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote: > > To catch f2fs bugs in write pointer handling code for zoned block > > devices, check write pointers of non-open zones that current segments do > > not point to. Do this check at mount time, after the fsync data recovery > > and current segments' write pointer consistency fix. Check two items > > comparing write pointers with valid block maps in SIT. > > > > The first item is check for zones with no valid blocks. When there is no > > valid blocks in a zone, the write pointer should be at the start of the > > zone. If not, next write operation to the zone will cause unaligned write > > error. If write pointer is not at the zone start, make mount fail and ask > > users to run fsck. > > > > The second item is check between the write pointer position and the last > > valid block in the zone. It is unexpected that the last valid block > > position is beyond the write pointer. In such a case, report as the bug. > > Fix is not required for such zone, because the zone is not selected for > > next write operation until the zone get discarded. > > > > Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it > > in segment.c also. > > > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > > --- > > fs/f2fs/f2fs.h | 3 + > > fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++ > > fs/f2fs/super.c | 11 ++-- > > 3 files changed, 157 insertions(+), 4 deletions(-) > > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > index 0216282c5b80..e8524be17852 100644 > > --- a/fs/f2fs/f2fs.h > > +++ b/fs/f2fs/f2fs.h > > @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, > > unsigned int val, int alloc); > > void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); > > int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only); > > +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi); > > int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); > > void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); > > int __init f2fs_create_segment_manager_caches(void); > > @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi, > > > > return test_bit(zno, FDEV(devi).blkz_seq); > > } > > + > > +#define F2FS_REPORT_NR_ZONES 4096 > > #endif > > > > static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi) > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > > index 2b6e637dd6d3..20ef5b3705e1 100644 > > --- a/fs/f2fs/segment.c > > +++ b/fs/f2fs/segment.c > > @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi) > > > > #ifdef CONFIG_BLK_DEV_ZONED > > > > +static int check_zone_write_pointer(struct f2fs_sb_info *sbi, > > + struct f2fs_dev_info *fdev, > > + struct blk_zone *zone) > > +{ > > + unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, segno; > > + block_t zone_block, wp_block, last_valid_block, b; > > + unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT; > > + int i; > > + struct seg_entry *se; > > + > > + wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block); > > + wp_segno = GET_SEGNO(sbi, wp_block); > > + wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno); > > + zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block); > > + zone_segno = GET_SEGNO(sbi, zone_block); > > + zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno); > > + > > + if (zone_segno >= MAIN_SEGS(sbi)) > > + return 0; > > + > > + /* > > + * If a curseg points to the zone, skip check because the zone > > + * may have fsync data that valid block map does not mark. > > None-curseg zone may also contain fsynced data as well? Maybe we can only verify > on clean image or recovered image? Right. This function for none-curseg zones should be called after fsync data recovery. I think my comment above is confusing. The point is that this function is for none-curseg zones, and other function covers check for curseg zones. Let me revise the comment as follows: Skip check of zones cursegs point to, since fix_curseg_write_pointer() checks them. > > > + */ > > + for (i = 0; i < NO_CHECK_TYPE; i++) > > + if (zone_secno == GET_SEC_FROM_SEG(sbi, > > + CURSEG_I(sbi, i)->segno)) > > + return 0; > > + > > + /* > > + * Get last valid block of the zone. > > + */ > > + last_valid_block = zone_block - 1; > > + for (s = 0; s < sbi->segs_per_sec; s++) { > > + segno = zone_segno + s; > > + se = get_seg_entry(sbi, segno); > > + for (b = 0; b < sbi->blocks_per_seg; b++) > > + if (f2fs_test_bit(b, se->cur_valid_map)) > > + last_valid_block = START_BLOCK(sbi, segno) + b; > > + } > > We search bitmap table reversedly. Yes, will reverse the loops in the next post. > > > + > > + /* > > + * If last valid block is beyond the write pointer, report the > > + * inconsistency. This inconsistency does not cause write error > > + * because the zone will not be selected for write operation until > > + * it get discarded. Just report it. > > + */ > > + if (last_valid_block >= wp_block) { > > + f2fs_notice(sbi, "Valid block beyond write pointer: " > > + "valid block[0x%x,0x%x] wp[0x%x,0x%x]", > > + GET_SEGNO(sbi, last_valid_block), > > + GET_BLKOFF_FROM_SEG0(sbi, last_valid_block), > > + wp_segno, wp_blkoff); > > + return 0; > > + } > > + > > + /* > > + * If there is no valid block in the zone and if write pointer is > > + * not at zone start, report the error to run fsck and mark the > > + * zone as used. > > + */ > > + if (last_valid_block + 1 == zone_block && zone->wp != zone->start) { > > + f2fs_notice(sbi, > > + "Zone without valid block has non-zero write " > > + "pointer, run fsck to fix: wp[0x%x,0x%x]", > > + wp_segno, wp_blkoff); > > + __set_inuse(sbi, zone_segno); > > Why do we need to set it inused? if this is necessary, we need to call this > under free_i->segmap_lock. I once thought that I need to set inconsistent zones in use, because I observed that write operation happened after zone consistency check failure (in fill_super() after free_meta label). It caused unaligned writer error. To avoid it, I added __set_inuse() to keep inconsistent zones not selected for the write target. But that write operation happened because the write pointer fix curseg was done out of the SBI_POR_DOING protection. Now I learned SBI_POR_DOING can protect write operation, and I don't think set in use for the inconsistent zones is required. Will remove __set_inuse() calls from this patch and the first patch. Thanks! -- Best Regards, Shin'ichiro Kawasaki _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones 2019-11-08 4:27 ` Shinichiro Kawasaki @ 2019-11-11 3:27 ` Chao Yu 2019-11-13 1:41 ` Shinichiro Kawasaki 0 siblings, 1 reply; 10+ messages in thread From: Chao Yu @ 2019-11-11 3:27 UTC (permalink / raw) To: Shinichiro Kawasaki; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel On 2019/11/8 12:27, Shinichiro Kawasaki wrote: > On Nov 05, 2019 / 20:22, Chao Yu wrote: >> On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote: >>> To catch f2fs bugs in write pointer handling code for zoned block >>> devices, check write pointers of non-open zones that current segments do >>> not point to. Do this check at mount time, after the fsync data recovery >>> and current segments' write pointer consistency fix. Check two items >>> comparing write pointers with valid block maps in SIT. >>> >>> The first item is check for zones with no valid blocks. When there is no >>> valid blocks in a zone, the write pointer should be at the start of the >>> zone. If not, next write operation to the zone will cause unaligned write >>> error. If write pointer is not at the zone start, make mount fail and ask >>> users to run fsck. >>> >>> The second item is check between the write pointer position and the last >>> valid block in the zone. It is unexpected that the last valid block >>> position is beyond the write pointer. In such a case, report as the bug. >>> Fix is not required for such zone, because the zone is not selected for >>> next write operation until the zone get discarded. >>> >>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it >>> in segment.c also. >>> >>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> >>> --- >>> fs/f2fs/f2fs.h | 3 + >>> fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++ >>> fs/f2fs/super.c | 11 ++-- >>> 3 files changed, 157 insertions(+), 4 deletions(-) >>> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h >>> index 0216282c5b80..e8524be17852 100644 >>> --- a/fs/f2fs/f2fs.h >>> +++ b/fs/f2fs/f2fs.h >>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, >>> unsigned int val, int alloc); >>> void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); >>> int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only); >>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi); >>> int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); >>> void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); >>> int __init f2fs_create_segment_manager_caches(void); >>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi, >>> >>> return test_bit(zno, FDEV(devi).blkz_seq); >>> } >>> + >>> +#define F2FS_REPORT_NR_ZONES 4096 >>> #endif >>> >>> static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi) >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c >>> index 2b6e637dd6d3..20ef5b3705e1 100644 >>> --- a/fs/f2fs/segment.c >>> +++ b/fs/f2fs/segment.c >>> @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi) >>> >>> #ifdef CONFIG_BLK_DEV_ZONED >>> >>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi, >>> + struct f2fs_dev_info *fdev, >>> + struct blk_zone *zone) >>> +{ >>> + unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, segno; >>> + block_t zone_block, wp_block, last_valid_block, b; >>> + unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT; >>> + int i; >>> + struct seg_entry *se; >>> + >>> + wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block); >>> + wp_segno = GET_SEGNO(sbi, wp_block); >>> + wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno); >>> + zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block); >>> + zone_segno = GET_SEGNO(sbi, zone_block); >>> + zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno); >>> + >>> + if (zone_segno >= MAIN_SEGS(sbi)) >>> + return 0; >>> + >>> + /* >>> + * If a curseg points to the zone, skip check because the zone >>> + * may have fsync data that valid block map does not mark. >> >> None-curseg zone may also contain fsynced data as well? Maybe we can only verify >> on clean image or recovered image? > > Right. This function for none-curseg zones should be called after fsync data If so, any place to check recovery is done? You know, user can choose to skip recovery by using disable_roll_forward/norecovery mount option. > recovery. I think my comment above is confusing. The point is that this > function is for none-curseg zones, and other function covers check for curseg > zones. Let me revise the comment as follows: > > Skip check of zones cursegs point to, since fix_curseg_write_pointer() > checks them. > >> >>> + */ >>> + for (i = 0; i < NO_CHECK_TYPE; i++) >>> + if (zone_secno == GET_SEC_FROM_SEG(sbi, >>> + CURSEG_I(sbi, i)->segno)) >>> + return 0; >>> + >>> + /* >>> + * Get last valid block of the zone. >>> + */ >>> + last_valid_block = zone_block - 1; >>> + for (s = 0; s < sbi->segs_per_sec; s++) { >>> + segno = zone_segno + s; >>> + se = get_seg_entry(sbi, segno); >>> + for (b = 0; b < sbi->blocks_per_seg; b++) >>> + if (f2fs_test_bit(b, se->cur_valid_map)) >>> + last_valid_block = START_BLOCK(sbi, segno) + b; >>> + } >> >> We search bitmap table reversedly. > > Yes, will reverse the loops in the next post. > >> >>> + >>> + /* >>> + * If last valid block is beyond the write pointer, report the >>> + * inconsistency. This inconsistency does not cause write error >>> + * because the zone will not be selected for write operation until >>> + * it get discarded. Just report it. >>> + */ >>> + if (last_valid_block >= wp_block) { >>> + f2fs_notice(sbi, "Valid block beyond write pointer: " >>> + "valid block[0x%x,0x%x] wp[0x%x,0x%x]", >>> + GET_SEGNO(sbi, last_valid_block), >>> + GET_BLKOFF_FROM_SEG0(sbi, last_valid_block), >>> + wp_segno, wp_blkoff); >>> + return 0; >>> + } >>> + >>> + /* >>> + * If there is no valid block in the zone and if write pointer is >>> + * not at zone start, report the error to run fsck and mark the >>> + * zone as used. >>> + */ >>> + if (last_valid_block + 1 == zone_block && zone->wp != zone->start) { >>> + f2fs_notice(sbi, >>> + "Zone without valid block has non-zero write " >>> + "pointer, run fsck to fix: wp[0x%x,0x%x]", >>> + wp_segno, wp_blkoff); >>> + __set_inuse(sbi, zone_segno); >> >> Why do we need to set it inused? if this is necessary, we need to call this >> under free_i->segmap_lock. > > I once thought that I need to set inconsistent zones in use, because I observed > that write operation happened after zone consistency check failure (in > fill_super() after free_meta label). It caused unaligned writer error. To avoid > it, I added __set_inuse() to keep inconsistent zones not selected for the write > target. > > But that write operation happened because the write pointer fix curseg was done > out of the SBI_POR_DOING protection. Now I learned SBI_POR_DOING can protect > write operation, and I don't think set in use for the inconsistent zones is > required. Will remove __set_inuse() calls from this patch and the first patch. Also f2fs_stop_checkpoint() will stop any data/node/meta writeback, so it'd be safe here. Thanks, > > Thanks! > > -- > Best Regards, > Shin'ichiro Kawasaki. > _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones 2019-11-11 3:27 ` Chao Yu @ 2019-11-13 1:41 ` Shinichiro Kawasaki 2019-11-14 8:27 ` Shinichiro Kawasaki 0 siblings, 1 reply; 10+ messages in thread From: Shinichiro Kawasaki @ 2019-11-13 1:41 UTC (permalink / raw) To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel On Nov 11, 2019 / 11:27, Chao Yu wrote: > On 2019/11/8 12:27, Shinichiro Kawasaki wrote: > > On Nov 05, 2019 / 20:22, Chao Yu wrote: > >> On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote: > >>> To catch f2fs bugs in write pointer handling code for zoned block > >>> devices, check write pointers of non-open zones that current segments do > >>> not point to. Do this check at mount time, after the fsync data recovery > >>> and current segments' write pointer consistency fix. Check two items > >>> comparing write pointers with valid block maps in SIT. > >>> > >>> The first item is check for zones with no valid blocks. When there is no > >>> valid blocks in a zone, the write pointer should be at the start of the > >>> zone. If not, next write operation to the zone will cause unaligned write > >>> error. If write pointer is not at the zone start, make mount fail and ask > >>> users to run fsck. > >>> > >>> The second item is check between the write pointer position and the last > >>> valid block in the zone. It is unexpected that the last valid block > >>> position is beyond the write pointer. In such a case, report as the bug. > >>> Fix is not required for such zone, because the zone is not selected for > >>> next write operation until the zone get discarded. > >>> > >>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it > >>> in segment.c also. > >>> > >>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > >>> --- > >>> fs/f2fs/f2fs.h | 3 + > >>> fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++ > >>> fs/f2fs/super.c | 11 ++-- > >>> 3 files changed, 157 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > >>> index 0216282c5b80..e8524be17852 100644 > >>> --- a/fs/f2fs/f2fs.h > >>> +++ b/fs/f2fs/f2fs.h > >>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, > >>> unsigned int val, int alloc); > >>> void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); > >>> int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only); > >>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi); > >>> int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); > >>> void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); > >>> int __init f2fs_create_segment_manager_caches(void); > >>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi, > >>> > >>> return test_bit(zno, FDEV(devi).blkz_seq); > >>> } > >>> + > >>> +#define F2FS_REPORT_NR_ZONES 4096 > >>> #endif > >>> > >>> static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi) > >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > >>> index 2b6e637dd6d3..20ef5b3705e1 100644 > >>> --- a/fs/f2fs/segment.c > >>> +++ b/fs/f2fs/segment.c > >>> @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi) > >>> > >>> #ifdef CONFIG_BLK_DEV_ZONED > >>> > >>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi, > >>> + struct f2fs_dev_info *fdev, > >>> + struct blk_zone *zone) > >>> +{ > >>> + unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, segno; > >>> + block_t zone_block, wp_block, last_valid_block, b; > >>> + unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT; > >>> + int i; > >>> + struct seg_entry *se; > >>> + > >>> + wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block); > >>> + wp_segno = GET_SEGNO(sbi, wp_block); > >>> + wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno); > >>> + zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block); > >>> + zone_segno = GET_SEGNO(sbi, zone_block); > >>> + zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno); > >>> + > >>> + if (zone_segno >= MAIN_SEGS(sbi)) > >>> + return 0; > >>> + > >>> + /* > >>> + * If a curseg points to the zone, skip check because the zone > >>> + * may have fsync data that valid block map does not mark. > >> > >> None-curseg zone may also contain fsynced data as well? Maybe we can only verify > >> on clean image or recovered image? > > > > Right. This function for none-curseg zones should be called after fsync data > > If so, any place to check recovery is done? You know, user can choose to skip > recovery by using disable_roll_forward/norecovery mount option. Ah, disable_roll_forward/norecovery mount option needs some care. The patch I posted did not care it well enough. When disable_roll_forward/norecovery mount option is set, I think the function for none-curseg zones should be called only if there is no fsync data. If fsync data exists, mount fails regardless of write pointer status. I will modify the function calling step and conditions in the next patch. > > > recovery. I think my comment above is confusing. The point is that this > > function is for none-curseg zones, and other function covers check for curseg > > zones. Let me revise the comment as follows: > > > > Skip check of zones cursegs point to, since fix_curseg_write_pointer() > > checks them. > > > >> > >>> + */ > >>> + for (i = 0; i < NO_CHECK_TYPE; i++) > >>> + if (zone_secno == GET_SEC_FROM_SEG(sbi, > >>> + CURSEG_I(sbi, i)->segno)) > >>> + return 0; > >>> + > >>> + /* > >>> + * Get last valid block of the zone. > >>> + */ > >>> + last_valid_block = zone_block - 1; > >>> + for (s = 0; s < sbi->segs_per_sec; s++) { > >>> + segno = zone_segno + s; > >>> + se = get_seg_entry(sbi, segno); > >>> + for (b = 0; b < sbi->blocks_per_seg; b++) > >>> + if (f2fs_test_bit(b, se->cur_valid_map)) > >>> + last_valid_block = START_BLOCK(sbi, segno) + b; > >>> + } > >> > >> We search bitmap table reversedly. > > > > Yes, will reverse the loops in the next post. > > > >> > >>> + > >>> + /* > >>> + * If last valid block is beyond the write pointer, report the > >>> + * inconsistency. This inconsistency does not cause write error > >>> + * because the zone will not be selected for write operation until > >>> + * it get discarded. Just report it. > >>> + */ > >>> + if (last_valid_block >= wp_block) { > >>> + f2fs_notice(sbi, "Valid block beyond write pointer: " > >>> + "valid block[0x%x,0x%x] wp[0x%x,0x%x]", > >>> + GET_SEGNO(sbi, last_valid_block), > >>> + GET_BLKOFF_FROM_SEG0(sbi, last_valid_block), > >>> + wp_segno, wp_blkoff); > >>> + return 0; > >>> + } > >>> + > >>> + /* > >>> + * If there is no valid block in the zone and if write pointer is > >>> + * not at zone start, report the error to run fsck and mark the > >>> + * zone as used. > >>> + */ > >>> + if (last_valid_block + 1 == zone_block && zone->wp != zone->start) { > >>> + f2fs_notice(sbi, > >>> + "Zone without valid block has non-zero write " > >>> + "pointer, run fsck to fix: wp[0x%x,0x%x]", > >>> + wp_segno, wp_blkoff); > >>> + __set_inuse(sbi, zone_segno); > >> > >> Why do we need to set it inused? if this is necessary, we need to call this > >> under free_i->segmap_lock. > > > > I once thought that I need to set inconsistent zones in use, because I observed > > that write operation happened after zone consistency check failure (in > > fill_super() after free_meta label). It caused unaligned writer error. To avoid > > it, I added __set_inuse() to keep inconsistent zones not selected for the write > > target. > > > > But that write operation happened because the write pointer fix curseg was done > > out of the SBI_POR_DOING protection. Now I learned SBI_POR_DOING can protect > > write operation, and I don't think set in use for the inconsistent zones is > > required. Will remove __set_inuse() calls from this patch and the first patch. > > Also f2fs_stop_checkpoint() will stop any data/node/meta writeback, so it'd be > safe here. Ok, I note it. Now I'm looking at the code again, and I think the write pointer fix for cursegs can be done within SBI_POR_DOING protection. At this moment, I think f2fs_stop_checkpoint() is not necessary. Thanks! -- Best Regards, Shin'ichiro Kawasaki _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones 2019-11-13 1:41 ` Shinichiro Kawasaki @ 2019-11-14 8:27 ` Shinichiro Kawasaki 0 siblings, 0 replies; 10+ messages in thread From: Shinichiro Kawasaki @ 2019-11-14 8:27 UTC (permalink / raw) To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel On Nov 13, 2019 / 10:41, Shin'ichiro Kawasaki wrote: > On Nov 11, 2019 / 11:27, Chao Yu wrote: > > On 2019/11/8 12:27, Shinichiro Kawasaki wrote: > > > On Nov 05, 2019 / 20:22, Chao Yu wrote: > > >> On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote: > > >>> To catch f2fs bugs in write pointer handling code for zoned block > > >>> devices, check write pointers of non-open zones that current segments do > > >>> not point to. Do this check at mount time, after the fsync data recovery > > >>> and current segments' write pointer consistency fix. Check two items > > >>> comparing write pointers with valid block maps in SIT. > > >>> > > >>> The first item is check for zones with no valid blocks. When there is no > > >>> valid blocks in a zone, the write pointer should be at the start of the > > >>> zone. If not, next write operation to the zone will cause unaligned write > > >>> error. If write pointer is not at the zone start, make mount fail and ask > > >>> users to run fsck. > > >>> > > >>> The second item is check between the write pointer position and the last > > >>> valid block in the zone. It is unexpected that the last valid block > > >>> position is beyond the write pointer. In such a case, report as the bug. > > >>> Fix is not required for such zone, because the zone is not selected for > > >>> next write operation until the zone get discarded. > > >>> > > >>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it > > >>> in segment.c also. > > >>> > > >>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > > >>> --- > > >>> fs/f2fs/f2fs.h | 3 + > > >>> fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++ > > >>> fs/f2fs/super.c | 11 ++-- > > >>> 3 files changed, 157 insertions(+), 4 deletions(-) > > >>> > > >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > >>> index 0216282c5b80..e8524be17852 100644 > > >>> --- a/fs/f2fs/f2fs.h > > >>> +++ b/fs/f2fs/f2fs.h > > >>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, > > >>> unsigned int val, int alloc); > > >>> void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); > > >>> int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only); > > >>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi); > > >>> int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); > > >>> void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); > > >>> int __init f2fs_create_segment_manager_caches(void); > > >>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi, > > >>> > > >>> return test_bit(zno, FDEV(devi).blkz_seq); > > >>> } > > >>> + > > >>> +#define F2FS_REPORT_NR_ZONES 4096 > > >>> #endif > > >>> > > >>> static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi) > > >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > > >>> index 2b6e637dd6d3..20ef5b3705e1 100644 > > >>> --- a/fs/f2fs/segment.c > > >>> +++ b/fs/f2fs/segment.c > > >>> @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi) > > >>> > > >>> #ifdef CONFIG_BLK_DEV_ZONED > > >>> > > >>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi, > > >>> + struct f2fs_dev_info *fdev, > > >>> + struct blk_zone *zone) > > >>> +{ > > >>> + unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, segno; > > >>> + block_t zone_block, wp_block, last_valid_block, b; > > >>> + unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT; > > >>> + int i; > > >>> + struct seg_entry *se; > > >>> + > > >>> + wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block); > > >>> + wp_segno = GET_SEGNO(sbi, wp_block); > > >>> + wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno); > > >>> + zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block); > > >>> + zone_segno = GET_SEGNO(sbi, zone_block); > > >>> + zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno); > > >>> + > > >>> + if (zone_segno >= MAIN_SEGS(sbi)) > > >>> + return 0; > > >>> + > > >>> + /* > > >>> + * If a curseg points to the zone, skip check because the zone > > >>> + * may have fsync data that valid block map does not mark. > > >> > > >> None-curseg zone may also contain fsynced data as well? Maybe we can only verify > > >> on clean image or recovered image? > > > > > > Right. This function for none-curseg zones should be called after fsync data > > > > If so, any place to check recovery is done? You know, user can choose to skip > > recovery by using disable_roll_forward/norecovery mount option. > > Ah, disable_roll_forward/norecovery mount option needs some care. The patch I > posted did not care it well enough. > > When disable_roll_forward/norecovery mount option is set, I think the function > for none-curseg zones should be called only if there is no fsync data. If fsync > data exists, mount fails regardless of write pointer status. > > I will modify the function calling step and conditions in the next patch. I have sent out the v3 patch series, which reflects the comments above. > > > > > > recovery. I think my comment above is confusing. The point is that this > > > function is for none-curseg zones, and other function covers check for curseg > > > zones. Let me revise the comment as follows: > > > > > > Skip check of zones cursegs point to, since fix_curseg_write_pointer() > > > checks them. > > > > > >> > > >>> + */ > > >>> + for (i = 0; i < NO_CHECK_TYPE; i++) > > >>> + if (zone_secno == GET_SEC_FROM_SEG(sbi, > > >>> + CURSEG_I(sbi, i)->segno)) > > >>> + return 0; > > >>> + > > >>> + /* > > >>> + * Get last valid block of the zone. > > >>> + */ > > >>> + last_valid_block = zone_block - 1; > > >>> + for (s = 0; s < sbi->segs_per_sec; s++) { > > >>> + segno = zone_segno + s; > > >>> + se = get_seg_entry(sbi, segno); > > >>> + for (b = 0; b < sbi->blocks_per_seg; b++) > > >>> + if (f2fs_test_bit(b, se->cur_valid_map)) > > >>> + last_valid_block = START_BLOCK(sbi, segno) + b; > > >>> + } > > >> > > >> We search bitmap table reversedly. > > > > > > Yes, will reverse the loops in the next post. > > > > > >> > > >>> + > > >>> + /* > > >>> + * If last valid block is beyond the write pointer, report the > > >>> + * inconsistency. This inconsistency does not cause write error > > >>> + * because the zone will not be selected for write operation until > > >>> + * it get discarded. Just report it. > > >>> + */ > > >>> + if (last_valid_block >= wp_block) { > > >>> + f2fs_notice(sbi, "Valid block beyond write pointer: " > > >>> + "valid block[0x%x,0x%x] wp[0x%x,0x%x]", > > >>> + GET_SEGNO(sbi, last_valid_block), > > >>> + GET_BLKOFF_FROM_SEG0(sbi, last_valid_block), > > >>> + wp_segno, wp_blkoff); > > >>> + return 0; > > >>> + } > > >>> + > > >>> + /* > > >>> + * If there is no valid block in the zone and if write pointer is > > >>> + * not at zone start, report the error to run fsck and mark the > > >>> + * zone as used. > > >>> + */ > > >>> + if (last_valid_block + 1 == zone_block && zone->wp != zone->start) { > > >>> + f2fs_notice(sbi, > > >>> + "Zone without valid block has non-zero write " > > >>> + "pointer, run fsck to fix: wp[0x%x,0x%x]", > > >>> + wp_segno, wp_blkoff); > > >>> + __set_inuse(sbi, zone_segno); > > >> > > >> Why do we need to set it inused? if this is necessary, we need to call this > > >> under free_i->segmap_lock. > > > > > > I once thought that I need to set inconsistent zones in use, because I observed > > > that write operation happened after zone consistency check failure (in > > > fill_super() after free_meta label). It caused unaligned writer error. To avoid > > > it, I added __set_inuse() to keep inconsistent zones not selected for the write > > > target. > > > > > > But that write operation happened because the write pointer fix curseg was done > > > out of the SBI_POR_DOING protection. Now I learned SBI_POR_DOING can protect > > > write operation, and I don't think set in use for the inconsistent zones is > > > required. Will remove __set_inuse() calls from this patch and the first patch. > > > > Also f2fs_stop_checkpoint() will stop any data/node/meta writeback, so it'd be > > safe here. > > Ok, I note it. Now I'm looking at the code again, and I think the write pointer > fix for cursegs can be done within SBI_POR_DOING protection. At this moment, I > think f2fs_stop_checkpoint() is not necessary. My comment above could be confusing, which is about the 1st patch. I removed f2fs_stop_checkpoint() from the v3 1st patch. As for none-curseg zones cannot be done within SBI_POR_DOING, still f2fs_stop_checkpoint() is required. I kept it in v3 2nd patch. Thanks in advance for the reviews and comments on the v3 seires. -- Best Regards, Shin'ichiro Kawasaki _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-11-14 8:27 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-10-28 6:57 [f2fs-dev] [PATCH v2 0/2] f2fs: Check write pointers of zoned block devices Shin'ichiro Kawasaki 2019-10-28 6:58 ` [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki 2019-11-05 12:03 ` Chao Yu 2019-11-08 4:09 ` Shinichiro Kawasaki 2019-10-28 6:58 ` [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki 2019-11-05 12:22 ` Chao Yu 2019-11-08 4:27 ` Shinichiro Kawasaki 2019-11-11 3:27 ` Chao Yu 2019-11-13 1:41 ` Shinichiro Kawasaki 2019-11-14 8:27 ` Shinichiro Kawasaki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).