Linux-f2fs-devel Archive on lore.kernel.org
 help / color / Atom feed
* [f2fs-dev] [PATCH v2 0/2] f2fs: Check write pointers of zoned block devices
@ 2019-10-28  6:57 Shin'ichiro Kawasaki
  2019-10-28  6:58 ` [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki
  2019-10-28  6:58 ` [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
  0 siblings, 2 replies; 10+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-10-28  6:57 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

On sudden f2fs shutdown, zoned block device status and f2fs meta data can be
inconsistent. When f2fs shutdown happens during write operations, write pointers
on the device go forward but the f2fs meta data does not reflect the write
pointer progress. This inconsistency will eventually cause "Unaligned write
command" error when restarting write operation after the next mount.

This error is observed with xfstests test case generic/388, which enforces
sudden shutdown during write operation and checks the file system recovery.

This patch series adds a feature to f2fs to check and fix the write pointer
consistency when zoned block devices get mounted. Per discussion on the list,
implement two checks: a check for open zones and a check for non-open zones.

The first patch add the check for open zones that current segments point to.
Check write pointer consistency with current segment positions recorded in CP,
and if they are inconsistent, assign a new zone to the current segment. When
fsync data exists, check and fix the consistency after fsync data recovery.

The second patch adds the check for non-open zones that current segments do not
point to. Compare write pointers with valid block maps in SIT, and if they are
inconsistent, report the failure. Even if fixes are required, do not have kernel
fix because SIT consistency with CP is not guaranteed because of hardware
failure or software bug possibility. In such check failure cases, fail the mount
to ask users to run fsck. Corresponding patch series is being posted for
fsck.f2fs to add the write pointer consistency check feature.

Thank goes to Chao Yu for the detailed discussion on the list.

Changes from v1:
* 2nd patch: Added check to ensure zones are in main segments

Shin'ichiro Kawasaki (2):
  f2fs: Check write pointer consistency of open zones
  f2fs: Check write pointer consistency of non-open zones

 fs/f2fs/f2fs.h     |   4 +
 fs/f2fs/recovery.c |   6 +
 fs/f2fs/segment.c  | 274 +++++++++++++++++++++++++++++++++++++++++++++
 fs/f2fs/super.c    |  17 ++-
 4 files changed, 298 insertions(+), 3 deletions(-)

-- 
2.21.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones
  2019-10-28  6:57 [f2fs-dev] [PATCH v2 0/2] f2fs: Check write pointers of zoned block devices Shin'ichiro Kawasaki
@ 2019-10-28  6:58 ` Shin'ichiro Kawasaki
  2019-11-05 12:03   ` Chao Yu
  2019-10-28  6:58 ` [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
  1 sibling, 1 reply; 10+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-10-28  6:58 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

On sudden f2fs shutdown, write pointers of zoned block devices can go
further but f2fs meta data keeps current segments at positions before the
write operations. After remounting the f2fs, this inconsistency causes
write operations not at write pointers and "Unaligned write command"
error is reported.

To avoid the error, compare current segments with write pointers of open
zones the current segments point to, during mount operation. If the write
pointer position is not aligned with the current segment position, assign
a new zone to the current segments. Also check the newly assigned zone
has write pointer at zone start. If not, make mount fail and ask users to
run fsck.

Perform the consistency check twice. Once during fsync recovery. Not to
lose the fsync data, do the check after fsync data gets restored and
before checkpoint commit which flushes data at current segment positions.
The second check is done at end of f2fs_fill_super() to make sure the
write pointer consistency regardless of fsync data recovery execution.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 fs/f2fs/f2fs.h     |   1 +
 fs/f2fs/recovery.c |   6 +++
 fs/f2fs/segment.c  | 127 +++++++++++++++++++++++++++++++++++++++++++++
 fs/f2fs/super.c    |   8 +++
 4 files changed, 142 insertions(+)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4024790028aa..0216282c5b80 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3136,6 +3136,7 @@ void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
 int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
 			unsigned int val, int alloc);
 void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
+int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only);
 int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
 void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
 int __init f2fs_create_segment_manager_caches(void);
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 783773e4560d..c75d1cbae4d1 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -795,6 +795,12 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
 	if (need_writecp) {
 		set_sbi_flag(sbi, SBI_IS_RECOVERED);
 
+		/* recover zoned block devices' write pointer consistency */
+		if (!err && f2fs_sb_has_blkzoned(sbi)) {
+			err = f2fs_fix_curseg_write_pointer(sbi, false);
+			ret = err;
+		}
+
 		if (!err) {
 			struct cp_control cpc = {
 				.reason = CP_RECOVERY,
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 808709581481..2b6e637dd6d3 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -4331,6 +4331,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
 	return 0;
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
+
+static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
+						  block_t zone_blkaddr)
+{
+	int i;
+
+	for (i = 0; i < sbi->s_ndevs; i++) {
+		if (!bdev_is_zoned(FDEV(i).bdev))
+			continue;
+		if (sbi->s_ndevs == 1 || (FDEV(i).start_blk <= zone_blkaddr &&
+					  zone_blkaddr <= FDEV(i).end_blk))
+			return &FDEV(i);
+	}
+
+	return NULL;
+}
+
+static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type,
+				    bool check_only)
+{
+	struct curseg_info *cs = CURSEG_I(sbi, type);
+	struct f2fs_dev_info *zbd;
+	struct blk_zone zone;
+	unsigned int cs_section, wp_segno, wp_blkoff, nr_zones, wp_sector_off;
+	block_t cs_zone_block, wp_block, cs_block;
+	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
+	sector_t zone_sector;
+	int err;
+
+	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
+	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
+	cs_block = START_BLOCK(sbi, cs->segno) + cs->next_blkoff;
+
+	zbd = get_target_zoned_dev(sbi, cs_zone_block);
+	if (!zbd)
+		return 0;
+
+	/* report zone for the sector the curseg points to */
+	zone_sector = (sector_t)(cs_zone_block - zbd->start_blk)
+		<< log_sectors_per_block;
+	nr_zones = 1;
+	err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones);
+	if (err) {
+		f2fs_err(sbi, "Report zone failed: %s errno=(%d)",
+			 zbd->path, err);
+		return err;
+	}
+
+	if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
+		return 0;
+
+	wp_block = zbd->start_blk + (zone.wp >> log_sectors_per_block);
+	wp_segno = GET_SEGNO(sbi, wp_block);
+	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
+	wp_sector_off = zone.wp & GENMASK(log_sectors_per_block - 1, 0);
+
+	if (cs->segno == wp_segno && cs->next_blkoff == wp_blkoff &&
+	    wp_sector_off == 0)
+		return 0;
+
+	f2fs_notice(sbi, "Unaligned curseg[%d] with write pointer: "
+		    "curseg[0x%x,0x%x] wp[0x%x,0x%x]",
+		    type, cs->segno, cs->next_blkoff, wp_segno, wp_blkoff);
+
+	/* if check_only is specified, return error without fix */
+	if (check_only)
+		return -EIO;
+
+	f2fs_notice(sbi, "Assign new section to curseg[%d]: "
+		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
+	allocate_segment_by_default(sbi, type, true);
+
+	/* check newly assigned zone */
+	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
+	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
+
+	zbd = get_target_zoned_dev(sbi, cs_zone_block);
+	if (!zbd)
+		return 0;
+
+	zone_sector = (sector_t)(cs_zone_block - zbd->start_blk)
+		<< log_sectors_per_block;
+	nr_zones = 1;
+	err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones);
+	if (err) {
+		f2fs_err(sbi, "Report zone failed: %s errno=(%d)",
+			 zbd->path, err);
+		return err;
+	}
+
+	if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
+		return 0;
+
+	if (zone.wp != zone.start) {
+		f2fs_notice(sbi,
+			    "New section for curseg[%d] is not empty, "
+			    "run fsck to fix: curseg[0x%x,0x%x]",
+			    type, cs->segno, cs->next_blkoff);
+		__set_inuse(sbi, GET_SEGNO(sbi, cs_zone_block));
+		f2fs_stop_checkpoint(sbi, true);
+		set_sbi_flag(sbi, SBI_NEED_FSCK);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only)
+{
+	int i, ret;
+
+	for (i = 0; i < NO_CHECK_TYPE; i++) {
+		ret = fix_curseg_write_pointer(sbi, i, check_only);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+#else
+int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only)
+{
+	return 0;
+}
+#endif
+
 /*
  * Update min, max modified time for cost-benefit GC algorithm
  */
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 1443cee15863..ebd0ae02a260 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3525,6 +3525,14 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 			goto free_meta;
 		}
 	}
+
+	/* check zoned block devices' write pointer consistency */
+	if (f2fs_sb_has_blkzoned(sbi)) {
+		err = f2fs_fix_curseg_write_pointer(sbi, f2fs_readonly(sb));
+		if (err)
+			goto free_meta;
+	}
+
 reset_checkpoint:
 	/* f2fs_recover_fsync_data() cleared this already */
 	clear_sbi_flag(sbi, SBI_POR_DOING);
-- 
2.21.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-10-28  6:57 [f2fs-dev] [PATCH v2 0/2] f2fs: Check write pointers of zoned block devices Shin'ichiro Kawasaki
  2019-10-28  6:58 ` [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki
@ 2019-10-28  6:58 ` Shin'ichiro Kawasaki
  2019-11-05 12:22   ` Chao Yu
  1 sibling, 1 reply; 10+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-10-28  6:58 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

To catch f2fs bugs in write pointer handling code for zoned block
devices, check write pointers of non-open zones that current segments do
not point to. Do this check at mount time, after the fsync data recovery
and current segments' write pointer consistency fix. Check two items
comparing write pointers with valid block maps in SIT.

The first item is check for zones with no valid blocks. When there is no
valid blocks in a zone, the write pointer should be at the start of the
zone. If not, next write operation to the zone will cause unaligned write
error. If write pointer is not at the zone start, make mount fail and ask
users to run fsck.

The second item is check between the write pointer position and the last
valid block in the zone. It is unexpected that the last valid block
position is beyond the write pointer. In such a case, report as the bug.
Fix is not required for such zone, because the zone is not selected for
next write operation until the zone get discarded.

Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
in segment.c also.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 fs/f2fs/f2fs.h    |   3 +
 fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/f2fs/super.c   |  11 ++--
 3 files changed, 157 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 0216282c5b80..e8524be17852 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
 			unsigned int val, int alloc);
 void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
 int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only);
+int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
 int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
 void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
 int __init f2fs_create_segment_manager_caches(void);
@@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
 
 	return test_bit(zno, FDEV(devi).blkz_seq);
 }
+
+#define F2FS_REPORT_NR_ZONES   4096
 #endif
 
 static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 2b6e637dd6d3..20ef5b3705e1 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
 
 #ifdef CONFIG_BLK_DEV_ZONED
 
+static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
+				    struct f2fs_dev_info *fdev,
+				    struct blk_zone *zone)
+{
+	unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
+	block_t zone_block, wp_block, last_valid_block, b;
+	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
+	int i;
+	struct seg_entry *se;
+
+	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
+	wp_segno = GET_SEGNO(sbi, wp_block);
+	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
+	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
+	zone_segno = GET_SEGNO(sbi, zone_block);
+	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
+
+	if (zone_segno >= MAIN_SEGS(sbi))
+		return 0;
+
+	/*
+	 * If a curseg points to the zone, skip check because the zone
+	 * may have fsync data that valid block map does not mark.
+	 */
+	for (i = 0; i < NO_CHECK_TYPE; i++)
+		if (zone_secno == GET_SEC_FROM_SEG(sbi,
+						   CURSEG_I(sbi, i)->segno))
+			return 0;
+
+	/*
+	 * Get last valid block of the zone.
+	 */
+	last_valid_block = zone_block - 1;
+	for (s = 0; s < sbi->segs_per_sec; s++) {
+		segno = zone_segno + s;
+		se = get_seg_entry(sbi, segno);
+		for (b = 0; b < sbi->blocks_per_seg; b++)
+			if (f2fs_test_bit(b, se->cur_valid_map))
+				last_valid_block = START_BLOCK(sbi, segno) + b;
+	}
+
+	/*
+	 * If last valid block is beyond the write pointer, report the
+	 * inconsistency. This inconsistency does not cause write error
+	 * because the zone will not be selected for write operation until
+	 * it get discarded. Just report it.
+	 */
+	if (last_valid_block >= wp_block) {
+		f2fs_notice(sbi, "Valid block beyond write pointer: "
+			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
+			    GET_SEGNO(sbi, last_valid_block),
+			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
+			    wp_segno, wp_blkoff);
+		return 0;
+	}
+
+	/*
+	 * If there is no valid block in the zone and if write pointer is
+	 * not at zone start, report the error to run fsck and mark the
+	 * zone as used.
+	 */
+	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
+		f2fs_notice(sbi,
+			    "Zone without valid block has non-zero write "
+			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
+			    wp_segno, wp_blkoff);
+		__set_inuse(sbi, zone_segno);
+		f2fs_stop_checkpoint(sbi, true);
+		set_sbi_flag(sbi, SBI_NEED_FSCK);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int check_dev_write_pointer(struct f2fs_sb_info *sbi,
+				   struct f2fs_dev_info *fdev) {
+	sector_t nr_sectors = fdev->bdev->bd_part->nr_sects;
+	sector_t sector = 0;
+	struct blk_zone *zones;
+	unsigned int i, nr_zones;
+	unsigned int n = 0;
+	int err = -EIO;
+
+	if (!bdev_is_zoned(fdev->bdev))
+		return 0;
+
+	zones = f2fs_kzalloc(sbi,
+			     array_size(F2FS_REPORT_NR_ZONES,
+					sizeof(struct blk_zone)),
+			     GFP_KERNEL);
+	if (!zones)
+		return -ENOMEM;
+
+	/* Get block zones type */
+	while (zones && sector < nr_sectors) {
+
+		nr_zones = F2FS_REPORT_NR_ZONES;
+		err = blkdev_report_zones(fdev->bdev, sector, zones, &nr_zones);
+		if (err)
+			break;
+		if (!nr_zones) {
+			err = -EIO;
+			break;
+		}
+
+		for (i = 0; i < nr_zones; i++) {
+			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
+				err = check_zone_write_pointer(sbi, fdev,
+							       &zones[i]);
+				if (err)
+					break;
+			}
+			sector += zones[i].len;
+			n++;
+		}
+		if (err)
+			break;
+	}
+
+	kvfree(zones);
+
+	return err;
+}
+
 static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
 						  block_t zone_blkaddr)
 {
@@ -4404,6 +4529,10 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type,
 		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
 	allocate_segment_by_default(sbi, type, true);
 
+	/* check consistency of the zone curseg pointed to */
+	if (check_zone_write_pointer(sbi, zbd, &zone))
+		return -EIO;
+
 	/* check newly assigned zone */
 	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
 	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
@@ -4451,11 +4580,29 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only)
 
 	return 0;
 }
+
+int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
+{
+	int i, ret;
+
+	for (i = 0; i < sbi->s_ndevs; i++) {
+		ret = check_dev_write_pointer(sbi, &FDEV(i));
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
 #else
 int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only)
 {
 	return 0;
 }
+
+int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
+{
+	return 0;
+}
 #endif
 
 /*
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index ebd0ae02a260..357b8f7fd333 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2890,8 +2890,6 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
 	if (!FDEV(devi).blkz_seq)
 		return -ENOMEM;
 
-#define F2FS_REPORT_NR_ZONES   4096
-
 	zones = f2fs_kzalloc(sbi,
 			     array_size(F2FS_REPORT_NR_ZONES,
 					sizeof(struct blk_zone)),
@@ -3509,7 +3507,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 
 		err = f2fs_recover_fsync_data(sbi, false);
 		if (err < 0) {
-			if (err != -ENOMEM)
+			if (err != -ENOMEM &&
+			    !is_sbi_flag_set(sbi, SBI_NEED_FSCK))
 				skip_recovery = true;
 			need_fsck = true;
 			f2fs_err(sbi, "Cannot recover all fsync data errno=%d",
@@ -3526,8 +3525,12 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 		}
 	}
 
-	/* check zoned block devices' write pointer consistency */
+	/* fix and check zoned block devices' write pointer consistency */
 	if (f2fs_sb_has_blkzoned(sbi)) {
+		err = f2fs_check_write_pointer(sbi);
+		if (err)
+			goto free_meta;
+
 		err = f2fs_fix_curseg_write_pointer(sbi, f2fs_readonly(sb));
 		if (err)
 			goto free_meta;
-- 
2.21.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones
  2019-10-28  6:58 ` [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki
@ 2019-11-05 12:03   ` Chao Yu
  2019-11-08  4:09     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 10+ messages in thread
From: Chao Yu @ 2019-11-05 12:03 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki, Jaegeuk Kim, linux-f2fs-devel; +Cc: Damien Le Moal

On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote:
> On sudden f2fs shutdown, write pointers of zoned block devices can go
> further but f2fs meta data keeps current segments at positions before the
> write operations. After remounting the f2fs, this inconsistency causes
> write operations not at write pointers and "Unaligned write command"
> error is reported.
> 
> To avoid the error, compare current segments with write pointers of open
> zones the current segments point to, during mount operation. If the write
> pointer position is not aligned with the current segment position, assign
> a new zone to the current segments. Also check the newly assigned zone
> has write pointer at zone start. If not, make mount fail and ask users to
> run fsck.
> 
> Perform the consistency check twice. Once during fsync recovery. Not to
> lose the fsync data, do the check after fsync data gets restored and
> before checkpoint commit which flushes data at current segment positions.
> The second check is done at end of f2fs_fill_super() to make sure the
> write pointer consistency regardless of fsync data recovery execution.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  fs/f2fs/f2fs.h     |   1 +
>  fs/f2fs/recovery.c |   6 +++
>  fs/f2fs/segment.c  | 127 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/f2fs/super.c    |   8 +++
>  4 files changed, 142 insertions(+)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4024790028aa..0216282c5b80 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3136,6 +3136,7 @@ void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
>  int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
>  			unsigned int val, int alloc);
>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only);
>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
>  int __init f2fs_create_segment_manager_caches(void);
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 783773e4560d..c75d1cbae4d1 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -795,6 +795,12 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
>  	if (need_writecp) {
>  		set_sbi_flag(sbi, SBI_IS_RECOVERED);
>  
> +		/* recover zoned block devices' write pointer consistency */
> +		if (!err && f2fs_sb_has_blkzoned(sbi)) {
> +			err = f2fs_fix_curseg_write_pointer(sbi, false);

Can we check and reset current segment under SBI_POR_DOING's protection? since
once SBI_POR_DOING flag is cleared, kworker is able to flush dirty data/node,
which may trigger unaligned write command if write pointer is inconsistent.

Thanks,

> +			ret = err;
> +		}
> +
>  		if (!err) {
>  			struct cp_control cpc = {
>  				.reason = CP_RECOVERY,
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 808709581481..2b6e637dd6d3 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -4331,6 +4331,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_BLK_DEV_ZONED
> +
> +static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
> +						  block_t zone_blkaddr)
> +{
> +	int i;
> +
> +	for (i = 0; i < sbi->s_ndevs; i++) {
> +		if (!bdev_is_zoned(FDEV(i).bdev))
> +			continue;
> +		if (sbi->s_ndevs == 1 || (FDEV(i).start_blk <= zone_blkaddr &&
> +					  zone_blkaddr <= FDEV(i).end_blk))
> +			return &FDEV(i);
> +	}
> +
> +	return NULL;
> +}
> +
> +static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type,
> +				    bool check_only)
> +{
> +	struct curseg_info *cs = CURSEG_I(sbi, type);
> +	struct f2fs_dev_info *zbd;
> +	struct blk_zone zone;
> +	unsigned int cs_section, wp_segno, wp_blkoff, nr_zones, wp_sector_off;
> +	block_t cs_zone_block, wp_block, cs_block;
> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> +	sector_t zone_sector;
> +	int err;
> +
> +	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
> +	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
> +	cs_block = START_BLOCK(sbi, cs->segno) + cs->next_blkoff;
> +
> +	zbd = get_target_zoned_dev(sbi, cs_zone_block);
> +	if (!zbd)
> +		return 0;
> +
> +	/* report zone for the sector the curseg points to */
> +	zone_sector = (sector_t)(cs_zone_block - zbd->start_blk)
> +		<< log_sectors_per_block;
> +	nr_zones = 1;
> +	err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones);
> +	if (err) {
> +		f2fs_err(sbi, "Report zone failed: %s errno=(%d)",
> +			 zbd->path, err);
> +		return err;
> +	}
> +
> +	if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
> +		return 0;
> +
> +	wp_block = zbd->start_blk + (zone.wp >> log_sectors_per_block);
> +	wp_segno = GET_SEGNO(sbi, wp_block);
> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> +	wp_sector_off = zone.wp & GENMASK(log_sectors_per_block - 1, 0);
> +
> +	if (cs->segno == wp_segno && cs->next_blkoff == wp_blkoff &&
> +	    wp_sector_off == 0)
> +		return 0;
> +
> +	f2fs_notice(sbi, "Unaligned curseg[%d] with write pointer: "
> +		    "curseg[0x%x,0x%x] wp[0x%x,0x%x]",
> +		    type, cs->segno, cs->next_blkoff, wp_segno, wp_blkoff);
> +
> +	/* if check_only is specified, return error without fix */
> +	if (check_only)
> +		return -EIO;
> +
> +	f2fs_notice(sbi, "Assign new section to curseg[%d]: "
> +		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
> +	allocate_segment_by_default(sbi, type, true);
> +
> +	/* check newly assigned zone */
> +	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
> +	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
> +
> +	zbd = get_target_zoned_dev(sbi, cs_zone_block);
> +	if (!zbd)
> +		return 0;
> +
> +	zone_sector = (sector_t)(cs_zone_block - zbd->start_blk)
> +		<< log_sectors_per_block;
> +	nr_zones = 1;
> +	err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones);
> +	if (err) {
> +		f2fs_err(sbi, "Report zone failed: %s errno=(%d)",
> +			 zbd->path, err);
> +		return err;
> +	}
> +
> +	if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
> +		return 0;
> +
> +	if (zone.wp != zone.start) {
> +		f2fs_notice(sbi,
> +			    "New section for curseg[%d] is not empty, "
> +			    "run fsck to fix: curseg[0x%x,0x%x]",
> +			    type, cs->segno, cs->next_blkoff);
> +		__set_inuse(sbi, GET_SEGNO(sbi, cs_zone_block));
> +		f2fs_stop_checkpoint(sbi, true);
> +		set_sbi_flag(sbi, SBI_NEED_FSCK);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only)
> +{
> +	int i, ret;
> +
> +	for (i = 0; i < NO_CHECK_TYPE; i++) {
> +		ret = fix_curseg_write_pointer(sbi, i, check_only);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +#else
> +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only)
> +{
> +	return 0;
> +}
> +#endif
> +
>  /*
>   * Update min, max modified time for cost-benefit GC algorithm
>   */
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 1443cee15863..ebd0ae02a260 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3525,6 +3525,14 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  			goto free_meta;
>  		}
>  	}
> +
> +	/* check zoned block devices' write pointer consistency */
> +	if (f2fs_sb_has_blkzoned(sbi)) {
> +		err = f2fs_fix_curseg_write_pointer(sbi, f2fs_readonly(sb));
> +		if (err)
> +			goto free_meta;
> +	}
> +
>  reset_checkpoint:
>  	/* f2fs_recover_fsync_data() cleared this already */
>  	clear_sbi_flag(sbi, SBI_POR_DOING);
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-10-28  6:58 ` [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
@ 2019-11-05 12:22   ` Chao Yu
  2019-11-08  4:27     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 10+ messages in thread
From: Chao Yu @ 2019-11-05 12:22 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki, Jaegeuk Kim, linux-f2fs-devel; +Cc: Damien Le Moal

On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote:
> To catch f2fs bugs in write pointer handling code for zoned block
> devices, check write pointers of non-open zones that current segments do
> not point to. Do this check at mount time, after the fsync data recovery
> and current segments' write pointer consistency fix. Check two items
> comparing write pointers with valid block maps in SIT.
> 
> The first item is check for zones with no valid blocks. When there is no
> valid blocks in a zone, the write pointer should be at the start of the
> zone. If not, next write operation to the zone will cause unaligned write
> error. If write pointer is not at the zone start, make mount fail and ask
> users to run fsck.
> 
> The second item is check between the write pointer position and the last
> valid block in the zone. It is unexpected that the last valid block
> position is beyond the write pointer. In such a case, report as the bug.
> Fix is not required for such zone, because the zone is not selected for
> next write operation until the zone get discarded.
> 
> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
> in segment.c also.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  fs/f2fs/f2fs.h    |   3 +
>  fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/f2fs/super.c   |  11 ++--
>  3 files changed, 157 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 0216282c5b80..e8524be17852 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
>  			unsigned int val, int alloc);
>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only);
> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
>  int __init f2fs_create_segment_manager_caches(void);
> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
>  
>  	return test_bit(zno, FDEV(devi).blkz_seq);
>  }
> +
> +#define F2FS_REPORT_NR_ZONES   4096
>  #endif
>  
>  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 2b6e637dd6d3..20ef5b3705e1 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
>  
>  #ifdef CONFIG_BLK_DEV_ZONED
>  
> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
> +				    struct f2fs_dev_info *fdev,
> +				    struct blk_zone *zone)
> +{
> +	unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
> +	block_t zone_block, wp_block, last_valid_block, b;
> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> +	int i;
> +	struct seg_entry *se;
> +
> +	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
> +	wp_segno = GET_SEGNO(sbi, wp_block);
> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> +	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
> +	zone_segno = GET_SEGNO(sbi, zone_block);
> +	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
> +
> +	if (zone_segno >= MAIN_SEGS(sbi))
> +		return 0;
> +
> +	/*
> +	 * If a curseg points to the zone, skip check because the zone
> +	 * may have fsync data that valid block map does not mark.

None-curseg zone may also contain fsynced data as well? Maybe we can only verify
on clean image or recovered image?

> +	 */
> +	for (i = 0; i < NO_CHECK_TYPE; i++)
> +		if (zone_secno == GET_SEC_FROM_SEG(sbi,
> +						   CURSEG_I(sbi, i)->segno))
> +			return 0;
> +
> +	/*
> +	 * Get last valid block of the zone.
> +	 */
> +	last_valid_block = zone_block - 1;
> +	for (s = 0; s < sbi->segs_per_sec; s++) {
> +		segno = zone_segno + s;
> +		se = get_seg_entry(sbi, segno);
> +		for (b = 0; b < sbi->blocks_per_seg; b++)
> +			if (f2fs_test_bit(b, se->cur_valid_map))
> +				last_valid_block = START_BLOCK(sbi, segno) + b;
> +	}

We search bitmap table reversedly.

> +
> +	/*
> +	 * If last valid block is beyond the write pointer, report the
> +	 * inconsistency. This inconsistency does not cause write error
> +	 * because the zone will not be selected for write operation until
> +	 * it get discarded. Just report it.
> +	 */
> +	if (last_valid_block >= wp_block) {
> +		f2fs_notice(sbi, "Valid block beyond write pointer: "
> +			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
> +			    GET_SEGNO(sbi, last_valid_block),
> +			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
> +			    wp_segno, wp_blkoff);
> +		return 0;
> +	}
> +
> +	/*
> +	 * If there is no valid block in the zone and if write pointer is
> +	 * not at zone start, report the error to run fsck and mark the
> +	 * zone as used.
> +	 */
> +	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
> +		f2fs_notice(sbi,
> +			    "Zone without valid block has non-zero write "
> +			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
> +			    wp_segno, wp_blkoff);
> +		__set_inuse(sbi, zone_segno);

Why do we need to set it inused? if this is necessary, we need to call this
under free_i->segmap_lock.

Thanks,

> +		f2fs_stop_checkpoint(sbi, true);
> +		set_sbi_flag(sbi, SBI_NEED_FSCK);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int check_dev_write_pointer(struct f2fs_sb_info *sbi,
> +				   struct f2fs_dev_info *fdev) {
> +	sector_t nr_sectors = fdev->bdev->bd_part->nr_sects;
> +	sector_t sector = 0;
> +	struct blk_zone *zones;
> +	unsigned int i, nr_zones;
> +	unsigned int n = 0;
> +	int err = -EIO;
> +
> +	if (!bdev_is_zoned(fdev->bdev))
> +		return 0;
> +
> +	zones = f2fs_kzalloc(sbi,
> +			     array_size(F2FS_REPORT_NR_ZONES,
> +					sizeof(struct blk_zone)),
> +			     GFP_KERNEL);
> +	if (!zones)
> +		return -ENOMEM;
> +
> +	/* Get block zones type */
> +	while (zones && sector < nr_sectors) {
> +
> +		nr_zones = F2FS_REPORT_NR_ZONES;
> +		err = blkdev_report_zones(fdev->bdev, sector, zones, &nr_zones);
> +		if (err)
> +			break;
> +		if (!nr_zones) {
> +			err = -EIO;
> +			break;
> +		}
> +
> +		for (i = 0; i < nr_zones; i++) {
> +			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
> +				err = check_zone_write_pointer(sbi, fdev,
> +							       &zones[i]);
> +				if (err)
> +					break;
> +			}
> +			sector += zones[i].len;
> +			n++;
> +		}
> +		if (err)
> +			break;
> +	}
> +
> +	kvfree(zones);
> +
> +	return err;
> +}
> +
>  static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
>  						  block_t zone_blkaddr)
>  {
> @@ -4404,6 +4529,10 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type,
>  		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
>  	allocate_segment_by_default(sbi, type, true);
>  
> +	/* check consistency of the zone curseg pointed to */
> +	if (check_zone_write_pointer(sbi, zbd, &zone))
> +		return -EIO;
> +
>  	/* check newly assigned zone */
>  	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
>  	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
> @@ -4451,11 +4580,29 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only)
>  
>  	return 0;
>  }
> +
> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> +{
> +	int i, ret;
> +
> +	for (i = 0; i < sbi->s_ndevs; i++) {
> +		ret = check_dev_write_pointer(sbi, &FDEV(i));
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
>  #else
>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only)
>  {
>  	return 0;
>  }
> +
> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> +{
> +	return 0;
> +}
>  #endif
>  
>  /*
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index ebd0ae02a260..357b8f7fd333 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -2890,8 +2890,6 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>  	if (!FDEV(devi).blkz_seq)
>  		return -ENOMEM;
>  
> -#define F2FS_REPORT_NR_ZONES   4096
> -
>  	zones = f2fs_kzalloc(sbi,
>  			     array_size(F2FS_REPORT_NR_ZONES,
>  					sizeof(struct blk_zone)),
> @@ -3509,7 +3507,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  
>  		err = f2fs_recover_fsync_data(sbi, false);
>  		if (err < 0) {
> -			if (err != -ENOMEM)
> +			if (err != -ENOMEM &&
> +			    !is_sbi_flag_set(sbi, SBI_NEED_FSCK))
>  				skip_recovery = true;
>  			need_fsck = true;
>  			f2fs_err(sbi, "Cannot recover all fsync data errno=%d",
> @@ -3526,8 +3525,12 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  		}
>  	}
>  
> -	/* check zoned block devices' write pointer consistency */
> +	/* fix and check zoned block devices' write pointer consistency */
>  	if (f2fs_sb_has_blkzoned(sbi)) {
> +		err = f2fs_check_write_pointer(sbi);
> +		if (err)
> +			goto free_meta;
> +
>  		err = f2fs_fix_curseg_write_pointer(sbi, f2fs_readonly(sb));
>  		if (err)
>  			goto free_meta;
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones
  2019-11-05 12:03   ` Chao Yu
@ 2019-11-08  4:09     ` Shinichiro Kawasaki
  0 siblings, 0 replies; 10+ messages in thread
From: Shinichiro Kawasaki @ 2019-11-08  4:09 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Nov 05, 2019 / 20:03, Chao Yu wrote:
> On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote:
> > On sudden f2fs shutdown, write pointers of zoned block devices can go
> > further but f2fs meta data keeps current segments at positions before the
> > write operations. After remounting the f2fs, this inconsistency causes
> > write operations not at write pointers and "Unaligned write command"
> > error is reported.
> > 
> > To avoid the error, compare current segments with write pointers of open
> > zones the current segments point to, during mount operation. If the write
> > pointer position is not aligned with the current segment position, assign
> > a new zone to the current segments. Also check the newly assigned zone
> > has write pointer at zone start. If not, make mount fail and ask users to
> > run fsck.
> > 
> > Perform the consistency check twice. Once during fsync recovery. Not to
> > lose the fsync data, do the check after fsync data gets restored and
> > before checkpoint commit which flushes data at current segment positions.
> > The second check is done at end of f2fs_fill_super() to make sure the
> > write pointer consistency regardless of fsync data recovery execution.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >  fs/f2fs/f2fs.h     |   1 +
> >  fs/f2fs/recovery.c |   6 +++
> >  fs/f2fs/segment.c  | 127 +++++++++++++++++++++++++++++++++++++++++++++
> >  fs/f2fs/super.c    |   8 +++
> >  4 files changed, 142 insertions(+)
> > 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 4024790028aa..0216282c5b80 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -3136,6 +3136,7 @@ void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
> >  int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
> >  			unsigned int val, int alloc);
> >  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> > +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only);
> >  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
> >  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
> >  int __init f2fs_create_segment_manager_caches(void);
> > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> > index 783773e4560d..c75d1cbae4d1 100644
> > --- a/fs/f2fs/recovery.c
> > +++ b/fs/f2fs/recovery.c
> > @@ -795,6 +795,12 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
> >  	if (need_writecp) {
> >  		set_sbi_flag(sbi, SBI_IS_RECOVERED);
> >  
> > +		/* recover zoned block devices' write pointer consistency */
> > +		if (!err && f2fs_sb_has_blkzoned(sbi)) {
> > +			err = f2fs_fix_curseg_write_pointer(sbi, false);
> 
> Can we check and reset current segment under SBI_POR_DOING's protection? since
> once SBI_POR_DOING flag is cleared, kworker is able to flush dirty data/node,
> which may trigger unaligned write command if write pointer is inconsistent.

Yes, will move that part before the SBI_POR_DOING flag clear. Thanks.

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-11-05 12:22   ` Chao Yu
@ 2019-11-08  4:27     ` Shinichiro Kawasaki
  2019-11-11  3:27       ` Chao Yu
  0 siblings, 1 reply; 10+ messages in thread
From: Shinichiro Kawasaki @ 2019-11-08  4:27 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Nov 05, 2019 / 20:22, Chao Yu wrote:
> On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote:
> > To catch f2fs bugs in write pointer handling code for zoned block
> > devices, check write pointers of non-open zones that current segments do
> > not point to. Do this check at mount time, after the fsync data recovery
> > and current segments' write pointer consistency fix. Check two items
> > comparing write pointers with valid block maps in SIT.
> > 
> > The first item is check for zones with no valid blocks. When there is no
> > valid blocks in a zone, the write pointer should be at the start of the
> > zone. If not, next write operation to the zone will cause unaligned write
> > error. If write pointer is not at the zone start, make mount fail and ask
> > users to run fsck.
> > 
> > The second item is check between the write pointer position and the last
> > valid block in the zone. It is unexpected that the last valid block
> > position is beyond the write pointer. In such a case, report as the bug.
> > Fix is not required for such zone, because the zone is not selected for
> > next write operation until the zone get discarded.
> > 
> > Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
> > in segment.c also.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >  fs/f2fs/f2fs.h    |   3 +
> >  fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/f2fs/super.c   |  11 ++--
> >  3 files changed, 157 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 0216282c5b80..e8524be17852 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
> >  			unsigned int val, int alloc);
> >  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> >  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only);
> > +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
> >  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
> >  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
> >  int __init f2fs_create_segment_manager_caches(void);
> > @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
> >  
> >  	return test_bit(zno, FDEV(devi).blkz_seq);
> >  }
> > +
> > +#define F2FS_REPORT_NR_ZONES   4096
> >  #endif
> >  
> >  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index 2b6e637dd6d3..20ef5b3705e1 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
> >  
> >  #ifdef CONFIG_BLK_DEV_ZONED
> >  
> > +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
> > +				    struct f2fs_dev_info *fdev,
> > +				    struct blk_zone *zone)
> > +{
> > +	unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
> > +	block_t zone_block, wp_block, last_valid_block, b;
> > +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> > +	int i;
> > +	struct seg_entry *se;
> > +
> > +	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
> > +	wp_segno = GET_SEGNO(sbi, wp_block);
> > +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> > +	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
> > +	zone_segno = GET_SEGNO(sbi, zone_block);
> > +	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
> > +
> > +	if (zone_segno >= MAIN_SEGS(sbi))
> > +		return 0;
> > +
> > +	/*
> > +	 * If a curseg points to the zone, skip check because the zone
> > +	 * may have fsync data that valid block map does not mark.
> 
> None-curseg zone may also contain fsynced data as well? Maybe we can only verify
> on clean image or recovered image?

Right. This function for none-curseg zones should be called after fsync data
recovery. I think my comment above is confusing. The point is that this
function is for none-curseg zones, and other function covers check for curseg
zones. Let me revise the comment as follows:

     Skip check of zones cursegs point to, since fix_curseg_write_pointer()
     checks them.

> 
> > +	 */
> > +	for (i = 0; i < NO_CHECK_TYPE; i++)
> > +		if (zone_secno == GET_SEC_FROM_SEG(sbi,
> > +						   CURSEG_I(sbi, i)->segno))
> > +			return 0;
> > +
> > +	/*
> > +	 * Get last valid block of the zone.
> > +	 */
> > +	last_valid_block = zone_block - 1;
> > +	for (s = 0; s < sbi->segs_per_sec; s++) {
> > +		segno = zone_segno + s;
> > +		se = get_seg_entry(sbi, segno);
> > +		for (b = 0; b < sbi->blocks_per_seg; b++)
> > +			if (f2fs_test_bit(b, se->cur_valid_map))
> > +				last_valid_block = START_BLOCK(sbi, segno) + b;
> > +	}
> 
> We search bitmap table reversedly.

Yes, will reverse the loops in the next post.

> 
> > +
> > +	/*
> > +	 * If last valid block is beyond the write pointer, report the
> > +	 * inconsistency. This inconsistency does not cause write error
> > +	 * because the zone will not be selected for write operation until
> > +	 * it get discarded. Just report it.
> > +	 */
> > +	if (last_valid_block >= wp_block) {
> > +		f2fs_notice(sbi, "Valid block beyond write pointer: "
> > +			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
> > +			    GET_SEGNO(sbi, last_valid_block),
> > +			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
> > +			    wp_segno, wp_blkoff);
> > +		return 0;
> > +	}
> > +
> > +	/*
> > +	 * If there is no valid block in the zone and if write pointer is
> > +	 * not at zone start, report the error to run fsck and mark the
> > +	 * zone as used.
> > +	 */
> > +	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
> > +		f2fs_notice(sbi,
> > +			    "Zone without valid block has non-zero write "
> > +			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
> > +			    wp_segno, wp_blkoff);
> > +		__set_inuse(sbi, zone_segno);
> 
> Why do we need to set it inused? if this is necessary, we need to call this
> under free_i->segmap_lock.

I once thought that I need to set inconsistent zones in use, because I observed
that write operation happened after zone consistency check failure (in
fill_super() after free_meta label). It caused unaligned writer error. To avoid
it, I added __set_inuse() to keep inconsistent zones not selected for the write
target.

But that write operation happened because the write pointer fix curseg was done
out of the SBI_POR_DOING protection. Now I learned SBI_POR_DOING can protect
write operation, and I don't think set in use for the inconsistent zones is
required. Will remove __set_inuse() calls from this patch and the first patch.

Thanks!

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-11-08  4:27     ` Shinichiro Kawasaki
@ 2019-11-11  3:27       ` Chao Yu
  2019-11-13  1:41         ` Shinichiro Kawasaki
  0 siblings, 1 reply; 10+ messages in thread
From: Chao Yu @ 2019-11-11  3:27 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On 2019/11/8 12:27, Shinichiro Kawasaki wrote:
> On Nov 05, 2019 / 20:22, Chao Yu wrote:
>> On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote:
>>> To catch f2fs bugs in write pointer handling code for zoned block
>>> devices, check write pointers of non-open zones that current segments do
>>> not point to. Do this check at mount time, after the fsync data recovery
>>> and current segments' write pointer consistency fix. Check two items
>>> comparing write pointers with valid block maps in SIT.
>>>
>>> The first item is check for zones with no valid blocks. When there is no
>>> valid blocks in a zone, the write pointer should be at the start of the
>>> zone. If not, next write operation to the zone will cause unaligned write
>>> error. If write pointer is not at the zone start, make mount fail and ask
>>> users to run fsck.
>>>
>>> The second item is check between the write pointer position and the last
>>> valid block in the zone. It is unexpected that the last valid block
>>> position is beyond the write pointer. In such a case, report as the bug.
>>> Fix is not required for such zone, because the zone is not selected for
>>> next write operation until the zone get discarded.
>>>
>>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
>>> in segment.c also.
>>>
>>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>> ---
>>>  fs/f2fs/f2fs.h    |   3 +
>>>  fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++
>>>  fs/f2fs/super.c   |  11 ++--
>>>  3 files changed, 157 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 0216282c5b80..e8524be17852 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
>>>  			unsigned int val, int alloc);
>>>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only);
>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
>>>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
>>>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
>>>  int __init f2fs_create_segment_manager_caches(void);
>>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
>>>  
>>>  	return test_bit(zno, FDEV(devi).blkz_seq);
>>>  }
>>> +
>>> +#define F2FS_REPORT_NR_ZONES   4096
>>>  #endif
>>>  
>>>  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index 2b6e637dd6d3..20ef5b3705e1 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
>>>  
>>>  #ifdef CONFIG_BLK_DEV_ZONED
>>>  
>>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
>>> +				    struct f2fs_dev_info *fdev,
>>> +				    struct blk_zone *zone)
>>> +{
>>> +	unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
>>> +	block_t zone_block, wp_block, last_valid_block, b;
>>> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
>>> +	int i;
>>> +	struct seg_entry *se;
>>> +
>>> +	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
>>> +	wp_segno = GET_SEGNO(sbi, wp_block);
>>> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
>>> +	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
>>> +	zone_segno = GET_SEGNO(sbi, zone_block);
>>> +	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
>>> +
>>> +	if (zone_segno >= MAIN_SEGS(sbi))
>>> +		return 0;
>>> +
>>> +	/*
>>> +	 * If a curseg points to the zone, skip check because the zone
>>> +	 * may have fsync data that valid block map does not mark.
>>
>> None-curseg zone may also contain fsynced data as well? Maybe we can only verify
>> on clean image or recovered image?
> 
> Right. This function for none-curseg zones should be called after fsync data

If so, any place to check recovery is done? You know, user can choose to skip
recovery by using disable_roll_forward/norecovery mount option.

> recovery. I think my comment above is confusing. The point is that this
> function is for none-curseg zones, and other function covers check for curseg
> zones. Let me revise the comment as follows:
> 
>      Skip check of zones cursegs point to, since fix_curseg_write_pointer()
>      checks them.
> 
>>
>>> +	 */
>>> +	for (i = 0; i < NO_CHECK_TYPE; i++)
>>> +		if (zone_secno == GET_SEC_FROM_SEG(sbi,
>>> +						   CURSEG_I(sbi, i)->segno))
>>> +			return 0;
>>> +
>>> +	/*
>>> +	 * Get last valid block of the zone.
>>> +	 */
>>> +	last_valid_block = zone_block - 1;
>>> +	for (s = 0; s < sbi->segs_per_sec; s++) {
>>> +		segno = zone_segno + s;
>>> +		se = get_seg_entry(sbi, segno);
>>> +		for (b = 0; b < sbi->blocks_per_seg; b++)
>>> +			if (f2fs_test_bit(b, se->cur_valid_map))
>>> +				last_valid_block = START_BLOCK(sbi, segno) + b;
>>> +	}
>>
>> We search bitmap table reversedly.
> 
> Yes, will reverse the loops in the next post.
> 
>>
>>> +
>>> +	/*
>>> +	 * If last valid block is beyond the write pointer, report the
>>> +	 * inconsistency. This inconsistency does not cause write error
>>> +	 * because the zone will not be selected for write operation until
>>> +	 * it get discarded. Just report it.
>>> +	 */
>>> +	if (last_valid_block >= wp_block) {
>>> +		f2fs_notice(sbi, "Valid block beyond write pointer: "
>>> +			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
>>> +			    GET_SEGNO(sbi, last_valid_block),
>>> +			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
>>> +			    wp_segno, wp_blkoff);
>>> +		return 0;
>>> +	}
>>> +
>>> +	/*
>>> +	 * If there is no valid block in the zone and if write pointer is
>>> +	 * not at zone start, report the error to run fsck and mark the
>>> +	 * zone as used.
>>> +	 */
>>> +	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
>>> +		f2fs_notice(sbi,
>>> +			    "Zone without valid block has non-zero write "
>>> +			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
>>> +			    wp_segno, wp_blkoff);
>>> +		__set_inuse(sbi, zone_segno);
>>
>> Why do we need to set it inused? if this is necessary, we need to call this
>> under free_i->segmap_lock.
> 
> I once thought that I need to set inconsistent zones in use, because I observed
> that write operation happened after zone consistency check failure (in
> fill_super() after free_meta label). It caused unaligned writer error. To avoid
> it, I added __set_inuse() to keep inconsistent zones not selected for the write
> target.
> 
> But that write operation happened because the write pointer fix curseg was done
> out of the SBI_POR_DOING protection. Now I learned SBI_POR_DOING can protect
> write operation, and I don't think set in use for the inconsistent zones is
> required. Will remove __set_inuse() calls from this patch and the first patch.

Also f2fs_stop_checkpoint() will stop any data/node/meta writeback, so it'd be
safe here.

Thanks,

> 
> Thanks!
> 
> --
> Best Regards,
> Shin'ichiro Kawasaki.
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-11-11  3:27       ` Chao Yu
@ 2019-11-13  1:41         ` Shinichiro Kawasaki
  2019-11-14  8:27           ` Shinichiro Kawasaki
  0 siblings, 1 reply; 10+ messages in thread
From: Shinichiro Kawasaki @ 2019-11-13  1:41 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Nov 11, 2019 / 11:27, Chao Yu wrote:
> On 2019/11/8 12:27, Shinichiro Kawasaki wrote:
> > On Nov 05, 2019 / 20:22, Chao Yu wrote:
> >> On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote:
> >>> To catch f2fs bugs in write pointer handling code for zoned block
> >>> devices, check write pointers of non-open zones that current segments do
> >>> not point to. Do this check at mount time, after the fsync data recovery
> >>> and current segments' write pointer consistency fix. Check two items
> >>> comparing write pointers with valid block maps in SIT.
> >>>
> >>> The first item is check for zones with no valid blocks. When there is no
> >>> valid blocks in a zone, the write pointer should be at the start of the
> >>> zone. If not, next write operation to the zone will cause unaligned write
> >>> error. If write pointer is not at the zone start, make mount fail and ask
> >>> users to run fsck.
> >>>
> >>> The second item is check between the write pointer position and the last
> >>> valid block in the zone. It is unexpected that the last valid block
> >>> position is beyond the write pointer. In such a case, report as the bug.
> >>> Fix is not required for such zone, because the zone is not selected for
> >>> next write operation until the zone get discarded.
> >>>
> >>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
> >>> in segment.c also.
> >>>
> >>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >>> ---
> >>>  fs/f2fs/f2fs.h    |   3 +
> >>>  fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++
> >>>  fs/f2fs/super.c   |  11 ++--
> >>>  3 files changed, 157 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>> index 0216282c5b80..e8524be17852 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
> >>>  			unsigned int val, int alloc);
> >>>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> >>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only);
> >>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
> >>>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
> >>>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
> >>>  int __init f2fs_create_segment_manager_caches(void);
> >>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
> >>>  
> >>>  	return test_bit(zno, FDEV(devi).blkz_seq);
> >>>  }
> >>> +
> >>> +#define F2FS_REPORT_NR_ZONES   4096
> >>>  #endif
> >>>  
> >>>  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >>> index 2b6e637dd6d3..20ef5b3705e1 100644
> >>> --- a/fs/f2fs/segment.c
> >>> +++ b/fs/f2fs/segment.c
> >>> @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
> >>>  
> >>>  #ifdef CONFIG_BLK_DEV_ZONED
> >>>  
> >>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
> >>> +				    struct f2fs_dev_info *fdev,
> >>> +				    struct blk_zone *zone)
> >>> +{
> >>> +	unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
> >>> +	block_t zone_block, wp_block, last_valid_block, b;
> >>> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> >>> +	int i;
> >>> +	struct seg_entry *se;
> >>> +
> >>> +	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
> >>> +	wp_segno = GET_SEGNO(sbi, wp_block);
> >>> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> >>> +	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
> >>> +	zone_segno = GET_SEGNO(sbi, zone_block);
> >>> +	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
> >>> +
> >>> +	if (zone_segno >= MAIN_SEGS(sbi))
> >>> +		return 0;
> >>> +
> >>> +	/*
> >>> +	 * If a curseg points to the zone, skip check because the zone
> >>> +	 * may have fsync data that valid block map does not mark.
> >>
> >> None-curseg zone may also contain fsynced data as well? Maybe we can only verify
> >> on clean image or recovered image?
> > 
> > Right. This function for none-curseg zones should be called after fsync data
> 
> If so, any place to check recovery is done? You know, user can choose to skip
> recovery by using disable_roll_forward/norecovery mount option.

Ah, disable_roll_forward/norecovery mount option needs some care. The patch I
posted did not care it well enough.

When disable_roll_forward/norecovery mount option is set, I think the function
for none-curseg zones should be called only if there is no fsync data. If fsync
data exists, mount fails regardless of write pointer status.

I will modify the function calling step and conditions in the next patch.

> 
> > recovery. I think my comment above is confusing. The point is that this
> > function is for none-curseg zones, and other function covers check for curseg
> > zones. Let me revise the comment as follows:
> > 
> >      Skip check of zones cursegs point to, since fix_curseg_write_pointer()
> >      checks them.
> > 
> >>
> >>> +	 */
> >>> +	for (i = 0; i < NO_CHECK_TYPE; i++)
> >>> +		if (zone_secno == GET_SEC_FROM_SEG(sbi,
> >>> +						   CURSEG_I(sbi, i)->segno))
> >>> +			return 0;
> >>> +
> >>> +	/*
> >>> +	 * Get last valid block of the zone.
> >>> +	 */
> >>> +	last_valid_block = zone_block - 1;
> >>> +	for (s = 0; s < sbi->segs_per_sec; s++) {
> >>> +		segno = zone_segno + s;
> >>> +		se = get_seg_entry(sbi, segno);
> >>> +		for (b = 0; b < sbi->blocks_per_seg; b++)
> >>> +			if (f2fs_test_bit(b, se->cur_valid_map))
> >>> +				last_valid_block = START_BLOCK(sbi, segno) + b;
> >>> +	}
> >>
> >> We search bitmap table reversedly.
> > 
> > Yes, will reverse the loops in the next post.
> > 
> >>
> >>> +
> >>> +	/*
> >>> +	 * If last valid block is beyond the write pointer, report the
> >>> +	 * inconsistency. This inconsistency does not cause write error
> >>> +	 * because the zone will not be selected for write operation until
> >>> +	 * it get discarded. Just report it.
> >>> +	 */
> >>> +	if (last_valid_block >= wp_block) {
> >>> +		f2fs_notice(sbi, "Valid block beyond write pointer: "
> >>> +			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
> >>> +			    GET_SEGNO(sbi, last_valid_block),
> >>> +			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
> >>> +			    wp_segno, wp_blkoff);
> >>> +		return 0;
> >>> +	}
> >>> +
> >>> +	/*
> >>> +	 * If there is no valid block in the zone and if write pointer is
> >>> +	 * not at zone start, report the error to run fsck and mark the
> >>> +	 * zone as used.
> >>> +	 */
> >>> +	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
> >>> +		f2fs_notice(sbi,
> >>> +			    "Zone without valid block has non-zero write "
> >>> +			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
> >>> +			    wp_segno, wp_blkoff);
> >>> +		__set_inuse(sbi, zone_segno);
> >>
> >> Why do we need to set it inused? if this is necessary, we need to call this
> >> under free_i->segmap_lock.
> > 
> > I once thought that I need to set inconsistent zones in use, because I observed
> > that write operation happened after zone consistency check failure (in
> > fill_super() after free_meta label). It caused unaligned writer error. To avoid
> > it, I added __set_inuse() to keep inconsistent zones not selected for the write
> > target.
> > 
> > But that write operation happened because the write pointer fix curseg was done
> > out of the SBI_POR_DOING protection. Now I learned SBI_POR_DOING can protect
> > write operation, and I don't think set in use for the inconsistent zones is
> > required. Will remove __set_inuse() calls from this patch and the first patch.
> 
> Also f2fs_stop_checkpoint() will stop any data/node/meta writeback, so it'd be
> safe here.

Ok, I note it. Now I'm looking at the code again, and I think the write pointer
fix for cursegs can be done within SBI_POR_DOING protection. At this moment, I
think f2fs_stop_checkpoint() is not necessary.

Thanks!

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-11-13  1:41         ` Shinichiro Kawasaki
@ 2019-11-14  8:27           ` Shinichiro Kawasaki
  0 siblings, 0 replies; 10+ messages in thread
From: Shinichiro Kawasaki @ 2019-11-14  8:27 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Nov 13, 2019 / 10:41, Shin'ichiro Kawasaki wrote:
> On Nov 11, 2019 / 11:27, Chao Yu wrote:
> > On 2019/11/8 12:27, Shinichiro Kawasaki wrote:
> > > On Nov 05, 2019 / 20:22, Chao Yu wrote:
> > >> On 2019/10/28 14:58, Shin'ichiro Kawasaki wrote:
> > >>> To catch f2fs bugs in write pointer handling code for zoned block
> > >>> devices, check write pointers of non-open zones that current segments do
> > >>> not point to. Do this check at mount time, after the fsync data recovery
> > >>> and current segments' write pointer consistency fix. Check two items
> > >>> comparing write pointers with valid block maps in SIT.
> > >>>
> > >>> The first item is check for zones with no valid blocks. When there is no
> > >>> valid blocks in a zone, the write pointer should be at the start of the
> > >>> zone. If not, next write operation to the zone will cause unaligned write
> > >>> error. If write pointer is not at the zone start, make mount fail and ask
> > >>> users to run fsck.
> > >>>
> > >>> The second item is check between the write pointer position and the last
> > >>> valid block in the zone. It is unexpected that the last valid block
> > >>> position is beyond the write pointer. In such a case, report as the bug.
> > >>> Fix is not required for such zone, because the zone is not selected for
> > >>> next write operation until the zone get discarded.
> > >>>
> > >>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
> > >>> in segment.c also.
> > >>>
> > >>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > >>> ---
> > >>>  fs/f2fs/f2fs.h    |   3 +
> > >>>  fs/f2fs/segment.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++
> > >>>  fs/f2fs/super.c   |  11 ++--
> > >>>  3 files changed, 157 insertions(+), 4 deletions(-)
> > >>>
> > >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > >>> index 0216282c5b80..e8524be17852 100644
> > >>> --- a/fs/f2fs/f2fs.h
> > >>> +++ b/fs/f2fs/f2fs.h
> > >>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
> > >>>  			unsigned int val, int alloc);
> > >>>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> > >>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi, bool check_only);
> > >>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
> > >>>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
> > >>>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
> > >>>  int __init f2fs_create_segment_manager_caches(void);
> > >>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
> > >>>  
> > >>>  	return test_bit(zno, FDEV(devi).blkz_seq);
> > >>>  }
> > >>> +
> > >>> +#define F2FS_REPORT_NR_ZONES   4096
> > >>>  #endif
> > >>>  
> > >>>  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
> > >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > >>> index 2b6e637dd6d3..20ef5b3705e1 100644
> > >>> --- a/fs/f2fs/segment.c
> > >>> +++ b/fs/f2fs/segment.c
> > >>> @@ -4333,6 +4333,131 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
> > >>>  
> > >>>  #ifdef CONFIG_BLK_DEV_ZONED
> > >>>  
> > >>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
> > >>> +				    struct f2fs_dev_info *fdev,
> > >>> +				    struct blk_zone *zone)
> > >>> +{
> > >>> +	unsigned int s, wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
> > >>> +	block_t zone_block, wp_block, last_valid_block, b;
> > >>> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> > >>> +	int i;
> > >>> +	struct seg_entry *se;
> > >>> +
> > >>> +	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
> > >>> +	wp_segno = GET_SEGNO(sbi, wp_block);
> > >>> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> > >>> +	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
> > >>> +	zone_segno = GET_SEGNO(sbi, zone_block);
> > >>> +	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
> > >>> +
> > >>> +	if (zone_segno >= MAIN_SEGS(sbi))
> > >>> +		return 0;
> > >>> +
> > >>> +	/*
> > >>> +	 * If a curseg points to the zone, skip check because the zone
> > >>> +	 * may have fsync data that valid block map does not mark.
> > >>
> > >> None-curseg zone may also contain fsynced data as well? Maybe we can only verify
> > >> on clean image or recovered image?
> > > 
> > > Right. This function for none-curseg zones should be called after fsync data
> > 
> > If so, any place to check recovery is done? You know, user can choose to skip
> > recovery by using disable_roll_forward/norecovery mount option.
> 
> Ah, disable_roll_forward/norecovery mount option needs some care. The patch I
> posted did not care it well enough.
> 
> When disable_roll_forward/norecovery mount option is set, I think the function
> for none-curseg zones should be called only if there is no fsync data. If fsync
> data exists, mount fails regardless of write pointer status.
> 
> I will modify the function calling step and conditions in the next patch.

I have sent out the v3 patch series, which reflects the comments above.

> 
> > 
> > > recovery. I think my comment above is confusing. The point is that this
> > > function is for none-curseg zones, and other function covers check for curseg
> > > zones. Let me revise the comment as follows:
> > > 
> > >      Skip check of zones cursegs point to, since fix_curseg_write_pointer()
> > >      checks them.
> > > 
> > >>
> > >>> +	 */
> > >>> +	for (i = 0; i < NO_CHECK_TYPE; i++)
> > >>> +		if (zone_secno == GET_SEC_FROM_SEG(sbi,
> > >>> +						   CURSEG_I(sbi, i)->segno))
> > >>> +			return 0;
> > >>> +
> > >>> +	/*
> > >>> +	 * Get last valid block of the zone.
> > >>> +	 */
> > >>> +	last_valid_block = zone_block - 1;
> > >>> +	for (s = 0; s < sbi->segs_per_sec; s++) {
> > >>> +		segno = zone_segno + s;
> > >>> +		se = get_seg_entry(sbi, segno);
> > >>> +		for (b = 0; b < sbi->blocks_per_seg; b++)
> > >>> +			if (f2fs_test_bit(b, se->cur_valid_map))
> > >>> +				last_valid_block = START_BLOCK(sbi, segno) + b;
> > >>> +	}
> > >>
> > >> We search bitmap table reversedly.
> > > 
> > > Yes, will reverse the loops in the next post.
> > > 
> > >>
> > >>> +
> > >>> +	/*
> > >>> +	 * If last valid block is beyond the write pointer, report the
> > >>> +	 * inconsistency. This inconsistency does not cause write error
> > >>> +	 * because the zone will not be selected for write operation until
> > >>> +	 * it get discarded. Just report it.
> > >>> +	 */
> > >>> +	if (last_valid_block >= wp_block) {
> > >>> +		f2fs_notice(sbi, "Valid block beyond write pointer: "
> > >>> +			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
> > >>> +			    GET_SEGNO(sbi, last_valid_block),
> > >>> +			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
> > >>> +			    wp_segno, wp_blkoff);
> > >>> +		return 0;
> > >>> +	}
> > >>> +
> > >>> +	/*
> > >>> +	 * If there is no valid block in the zone and if write pointer is
> > >>> +	 * not at zone start, report the error to run fsck and mark the
> > >>> +	 * zone as used.
> > >>> +	 */
> > >>> +	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
> > >>> +		f2fs_notice(sbi,
> > >>> +			    "Zone without valid block has non-zero write "
> > >>> +			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
> > >>> +			    wp_segno, wp_blkoff);
> > >>> +		__set_inuse(sbi, zone_segno);
> > >>
> > >> Why do we need to set it inused? if this is necessary, we need to call this
> > >> under free_i->segmap_lock.
> > > 
> > > I once thought that I need to set inconsistent zones in use, because I observed
> > > that write operation happened after zone consistency check failure (in
> > > fill_super() after free_meta label). It caused unaligned writer error. To avoid
> > > it, I added __set_inuse() to keep inconsistent zones not selected for the write
> > > target.
> > > 
> > > But that write operation happened because the write pointer fix curseg was done
> > > out of the SBI_POR_DOING protection. Now I learned SBI_POR_DOING can protect
> > > write operation, and I don't think set in use for the inconsistent zones is
> > > required. Will remove __set_inuse() calls from this patch and the first patch.
> > 
> > Also f2fs_stop_checkpoint() will stop any data/node/meta writeback, so it'd be
> > safe here.
> 
> Ok, I note it. Now I'm looking at the code again, and I think the write pointer
> fix for cursegs can be done within SBI_POR_DOING protection. At this moment, I
> think f2fs_stop_checkpoint() is not necessary.

My comment above could be confusing, which is about the 1st patch. I removed
f2fs_stop_checkpoint() from the v3 1st patch.

As for none-curseg zones cannot be done within SBI_POR_DOING, still
f2fs_stop_checkpoint() is required. I kept it in v3 2nd patch.

Thanks in advance for the reviews and comments on the v3 seires.

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, back to index

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-28  6:57 [f2fs-dev] [PATCH v2 0/2] f2fs: Check write pointers of zoned block devices Shin'ichiro Kawasaki
2019-10-28  6:58 ` [f2fs-dev] [PATCH v2 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki
2019-11-05 12:03   ` Chao Yu
2019-11-08  4:09     ` Shinichiro Kawasaki
2019-10-28  6:58 ` [f2fs-dev] [PATCH v2 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
2019-11-05 12:22   ` Chao Yu
2019-11-08  4:27     ` Shinichiro Kawasaki
2019-11-11  3:27       ` Chao Yu
2019-11-13  1:41         ` Shinichiro Kawasaki
2019-11-14  8:27           ` Shinichiro Kawasaki

Linux-f2fs-devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-f2fs-devel/0 linux-f2fs-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-f2fs-devel linux-f2fs-devel/ https://lore.kernel.org/linux-f2fs-devel \
		linux-f2fs-devel@lists.sourceforge.net
	public-inbox-index linux-f2fs-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/net.sourceforge.lists.linux-f2fs-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git