Linux-f2fs-devel Archive on lore.kernel.org
 help / color / Atom feed
* [f2fs-dev] [PATCH v3 0/2] f2fs: Check write pointers of zoned block devices
@ 2019-11-14  8:19 Shin'ichiro Kawasaki
  2019-11-14  8:19 ` [f2fs-dev] [PATCH v3 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki
  2019-11-14  8:19 ` [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
  0 siblings, 2 replies; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-11-14  8:19 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

On sudden f2fs shutdown, zoned block device status and f2fs meta data can be
inconsistent. When f2fs shutdown happens during write operations, write pointers
on the device go forward but the f2fs meta data does not reflect the write
pointer progress. This inconsistency will eventually cause "Unaligned write
command" error when restarting write operation after the next mount.

This error is observed with xfstests test case generic/388, which enforces
sudden shutdown during write operation and checks the file system recovery.

This patch series adds a feature to f2fs to check and fix the write pointer
consistency when zoned block devices get mounted. Per discussion on the list,
implement two checks: a check for open zones and a check for non-open zones.

The first patch add the check for open zones that current segments point to.
Check write pointer consistency with current segment positions recorded in CP,
and if they are inconsistent, assign a new zone to the current segment. When
fsync data exists, check and fix the consistency after fsync data recovery.

The second patch adds the check for non-open zones that current segments do not
point to. Compare write pointers with valid block maps in SIT, and if they are
inconsistent, report the failure. Even if fixes are required, do not have kernel
fix because SIT consistency with CP is not guaranteed because of hardware
failure or software bug possibility. In such check failure cases, fail the mount
to ask users to run fsck. Corresponding patch series is being posted for
fsck.f2fs to add the write pointer consistency check feature.

Thank goes to Chao Yu for the detailed discussion on the list.

Changes from v2:
* 1st patch: Fix write pointer in SBI_POR_DOING guard
             Removed __set_inuse() and f2fs_stop_checkpoint()
* 2nd patch: Reversed bitmap search and removed __set_inuse()
             Changed condition for disable_roll_forward/norecovery mount options

Changes from v1:
* 2nd patch: Added check to ensure zones are in main segments

Shin'ichiro Kawasaki (2):
  f2fs: Check write pointer consistency of open zones
  f2fs: Check write pointer consistency of non-open zones

 fs/f2fs/f2fs.h     |   4 +
 fs/f2fs/recovery.c |  17 ++-
 fs/f2fs/segment.c  | 269 +++++++++++++++++++++++++++++++++++++++++++++
 fs/f2fs/super.c    |  16 ++-
 4 files changed, 301 insertions(+), 5 deletions(-)

-- 
2.23.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [f2fs-dev] [PATCH v3 1/2] f2fs: Check write pointer consistency of open zones
  2019-11-14  8:19 [f2fs-dev] [PATCH v3 0/2] f2fs: Check write pointers of zoned block devices Shin'ichiro Kawasaki
@ 2019-11-14  8:19 ` Shin'ichiro Kawasaki
  2019-11-25  6:59   ` Chao Yu
  2019-11-14  8:19 ` [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
  1 sibling, 1 reply; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-11-14  8:19 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

On sudden f2fs shutdown, write pointers of zoned block devices can go
further but f2fs meta data keeps current segments at positions before the
write operations. After remounting the f2fs, this inconsistency causes
write operations not at write pointers and "Unaligned write command"
error is reported.

To avoid the error, compare current segments with write pointers of open
zones the current segments point to, during mount operation. If the write
pointer position is not aligned with the current segment position, assign
a new zone to the current segment. Also check the newly assigned zone has
write pointer at zone start. If not, make mount fail and ask users to run
fsck.

Perform the consistency check during fsync recovery. Not to lose the
fsync data, do the check after fsync data gets restored and before
checkpoint commit which flushes data at current segment positions. Not to
cause conflict with kworker's dirfy data/node flush, do the fix within
SBI_POR_DOING protection.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 fs/f2fs/f2fs.h     |   1 +
 fs/f2fs/recovery.c |  17 ++++++-
 fs/f2fs/segment.c  | 120 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 136 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4024790028aa..a2e24718c13b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3136,6 +3136,7 @@ void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
 int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
 			unsigned int val, int alloc);
 void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
+int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
 int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
 void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
 int __init f2fs_create_segment_manager_caches(void);
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 783773e4560d..712054ed8d64 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -784,9 +784,22 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
 	if (err) {
 		truncate_inode_pages_final(NODE_MAPPING(sbi));
 		truncate_inode_pages_final(META_MAPPING(sbi));
-	} else {
-		clear_sbi_flag(sbi, SBI_POR_DOING);
 	}
+
+	/*
+	 * If fsync data succeeds or there is no fsync data to recover,
+	 * and the f2fs is not read only, check and fix zoned block devices'
+	 * write pointer consistency.
+	 */
+	if (!ret && !err && !f2fs_readonly(sbi->sb)
+	    && f2fs_sb_has_blkzoned(sbi)) {
+		err = f2fs_fix_curseg_write_pointer(sbi);
+		ret = err;
+	}
+
+	if (!err)
+		clear_sbi_flag(sbi, SBI_POR_DOING);
+
 	mutex_unlock(&sbi->cp_mutex);
 
 	/* let's drop all the directory inodes for clean checkpoint */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 808709581481..6ece146dab34 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -4331,6 +4331,126 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
 	return 0;
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
+
+static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
+						  block_t zone_blkaddr)
+{
+	int i;
+
+	for (i = 0; i < sbi->s_ndevs; i++) {
+		if (!bdev_is_zoned(FDEV(i).bdev))
+			continue;
+		if (sbi->s_ndevs == 1 || (FDEV(i).start_blk <= zone_blkaddr &&
+					  zone_blkaddr <= FDEV(i).end_blk))
+			return &FDEV(i);
+	}
+
+	return NULL;
+}
+
+static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
+{
+	struct curseg_info *cs = CURSEG_I(sbi, type);
+	struct f2fs_dev_info *zbd;
+	struct blk_zone zone;
+	unsigned int cs_section, wp_segno, wp_blkoff, nr_zones, wp_sector_off;
+	block_t cs_zone_block, wp_block, cs_block;
+	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
+	sector_t zone_sector;
+	int err;
+
+	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
+	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
+	cs_block = START_BLOCK(sbi, cs->segno) + cs->next_blkoff;
+
+	zbd = get_target_zoned_dev(sbi, cs_zone_block);
+	if (!zbd)
+		return 0;
+
+	/* report zone for the sector the curseg points to */
+	zone_sector = (sector_t)(cs_zone_block - zbd->start_blk)
+		<< log_sectors_per_block;
+	nr_zones = 1;
+	err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones);
+	if (err) {
+		f2fs_err(sbi, "Report zone failed: %s errno=(%d)",
+			 zbd->path, err);
+		return err;
+	}
+
+	if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
+		return 0;
+
+	wp_block = zbd->start_blk + (zone.wp >> log_sectors_per_block);
+	wp_segno = GET_SEGNO(sbi, wp_block);
+	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
+	wp_sector_off = zone.wp & GENMASK(log_sectors_per_block - 1, 0);
+
+	if (cs->segno == wp_segno && cs->next_blkoff == wp_blkoff &&
+	    wp_sector_off == 0)
+		return 0;
+
+	f2fs_notice(sbi, "Unaligned curseg[%d] with write pointer: "
+		    "curseg[0x%x,0x%x] wp[0x%x,0x%x]",
+		    type, cs->segno, cs->next_blkoff, wp_segno, wp_blkoff);
+
+	f2fs_notice(sbi, "Assign new section to curseg[%d]: "
+		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
+	allocate_segment_by_default(sbi, type, true);
+
+	/* check newly assigned zone */
+	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
+	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
+
+	zbd = get_target_zoned_dev(sbi, cs_zone_block);
+	if (!zbd)
+		return 0;
+
+	zone_sector = (sector_t)(cs_zone_block - zbd->start_blk)
+		<< log_sectors_per_block;
+	nr_zones = 1;
+	err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones);
+	if (err) {
+		f2fs_err(sbi, "Report zone failed: %s errno=(%d)",
+			 zbd->path, err);
+		return err;
+	}
+
+	if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
+		return 0;
+
+	if (zone.wp != zone.start) {
+		f2fs_notice(sbi,
+			    "New section for curseg[%d] is not empty, "
+			    "run fsck to fix: curseg[0x%x,0x%x]",
+			    type, cs->segno, cs->next_blkoff);
+		set_sbi_flag(sbi, SBI_NEED_FSCK);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
+{
+	int i, ret;
+
+	for (i = 0; i < NO_CHECK_TYPE; i++) {
+		ret = fix_curseg_write_pointer(sbi, i);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+#else
+int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
+{
+	return 0;
+}
+#endif
+
 /*
  * Update min, max modified time for cost-benefit GC algorithm
  */
-- 
2.23.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-11-14  8:19 [f2fs-dev] [PATCH v3 0/2] f2fs: Check write pointers of zoned block devices Shin'ichiro Kawasaki
  2019-11-14  8:19 ` [f2fs-dev] [PATCH v3 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki
@ 2019-11-14  8:19 ` Shin'ichiro Kawasaki
  2019-11-25  7:37   ` Chao Yu
  1 sibling, 1 reply; 14+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-11-14  8:19 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

To catch f2fs bugs in write pointer handling code for zoned block
devices, check write pointers of non-open zones that current segments do
not point to. Do this check at mount time, after the fsync data recovery
and current segments' write pointer consistency fix. Or when fsync data
recovery is disabled by mount option, do the check when there is no fsync
data.

Check two items comparing write pointers with valid block maps in SIT.
The first item is check for zones with no valid blocks. When there is no
valid blocks in a zone, the write pointer should be at the start of the
zone. If not, next write operation to the zone will cause unaligned write
error. If write pointer is not at the zone start, make mount fail and ask
users to run fsck.

The second item is check between the write pointer position and the last
valid block in the zone. It is unexpected that the last valid block
position is beyond the write pointer. In such a case, report as a bug.
Fix is not required for such zone, because the zone is not selected for
next write operation until the zone get discarded.

Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
in segment.c also.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 fs/f2fs/f2fs.h    |   3 +
 fs/f2fs/segment.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/f2fs/super.c   |  16 ++++-
 3 files changed, 165 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index a2e24718c13b..1bb64950d793 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
 			unsigned int val, int alloc);
 void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
 int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
+int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
 int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
 void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
 int __init f2fs_create_segment_manager_caches(void);
@@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
 
 	return test_bit(zno, FDEV(devi).blkz_seq);
 }
+
+#define F2FS_REPORT_NR_ZONES   4096
 #endif
 
 static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 6ece146dab34..29e3b6f62f8c 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -4333,6 +4333,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
 
 #ifdef CONFIG_BLK_DEV_ZONED
 
+static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
+				    struct f2fs_dev_info *fdev,
+				    struct blk_zone *zone)
+{
+	unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
+	block_t zone_block, wp_block, last_valid_block;
+	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
+	int i, s, b;
+	struct seg_entry *se;
+
+	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
+	wp_segno = GET_SEGNO(sbi, wp_block);
+	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
+	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
+	zone_segno = GET_SEGNO(sbi, zone_block);
+	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
+
+	if (zone_segno >= MAIN_SEGS(sbi))
+		return 0;
+
+	/*
+	 * Skip check of zones cursegs point to, since
+	 * fix_curseg_write_pointer() checks them.
+	 */
+	for (i = 0; i < NO_CHECK_TYPE; i++)
+		if (zone_secno == GET_SEC_FROM_SEG(sbi,
+						   CURSEG_I(sbi, i)->segno))
+			return 0;
+
+	/*
+	 * Get last valid block of the zone.
+	 */
+	last_valid_block = zone_block - 1;
+	for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
+		segno = zone_segno + s;
+		se = get_seg_entry(sbi, segno);
+		for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
+			if (f2fs_test_bit(b, se->cur_valid_map)) {
+				last_valid_block = START_BLOCK(sbi, segno) + b;
+				break;
+			}
+		if (last_valid_block >= zone_block)
+			break;
+	}
+
+	/*
+	 * If last valid block is beyond the write pointer, report the
+	 * inconsistency. This inconsistency does not cause write error
+	 * because the zone will not be selected for write operation until
+	 * it get discarded. Just report it.
+	 */
+	if (last_valid_block >= wp_block) {
+		f2fs_notice(sbi, "Valid block beyond write pointer: "
+			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
+			    GET_SEGNO(sbi, last_valid_block),
+			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
+			    wp_segno, wp_blkoff);
+		return 0;
+	}
+
+	/*
+	 * If there is no valid block in the zone and if write pointer is
+	 * not at zone start, report the error to run fsck.
+	 */
+	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
+		f2fs_notice(sbi,
+			    "Zone without valid block has non-zero write "
+			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
+			    wp_segno, wp_blkoff);
+		f2fs_stop_checkpoint(sbi, true);
+		set_sbi_flag(sbi, SBI_NEED_FSCK);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int check_dev_write_pointer(struct f2fs_sb_info *sbi,
+				   struct f2fs_dev_info *fdev) {
+	sector_t nr_sectors = fdev->bdev->bd_part->nr_sects;
+	sector_t sector = 0;
+	struct blk_zone *zones;
+	unsigned int i, nr_zones;
+	unsigned int n = 0;
+	int err = -EIO;
+
+	if (!bdev_is_zoned(fdev->bdev))
+		return 0;
+
+	zones = f2fs_kzalloc(sbi,
+			     array_size(F2FS_REPORT_NR_ZONES,
+					sizeof(struct blk_zone)),
+			     GFP_KERNEL);
+	if (!zones)
+		return -ENOMEM;
+
+	/* Get block zones type */
+	while (zones && sector < nr_sectors) {
+
+		nr_zones = F2FS_REPORT_NR_ZONES;
+		err = blkdev_report_zones(fdev->bdev, sector, zones, &nr_zones);
+		if (err)
+			break;
+		if (!nr_zones) {
+			err = -EIO;
+			break;
+		}
+
+		for (i = 0; i < nr_zones; i++) {
+			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
+				err = check_zone_write_pointer(sbi, fdev,
+							       &zones[i]);
+				if (err)
+					break;
+			}
+			sector += zones[i].len;
+			n++;
+		}
+		if (err)
+			break;
+	}
+
+	kvfree(zones);
+
+	return err;
+}
+
 static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
 						  block_t zone_blkaddr)
 {
@@ -4399,6 +4526,10 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
 		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
 	allocate_segment_by_default(sbi, type, true);
 
+	/* check consistency of the zone curseg pointed to */
+	if (check_zone_write_pointer(sbi, zbd, &zone))
+		return -EIO;
+
 	/* check newly assigned zone */
 	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
 	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
@@ -4444,11 +4575,29 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
 
 	return 0;
 }
+
+int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
+{
+	int i, ret;
+
+	for (i = 0; i < sbi->s_ndevs; i++) {
+		ret = check_dev_write_pointer(sbi, &FDEV(i));
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
 #else
 int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
 {
 	return 0;
 }
+
+int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
+{
+	return 0;
+}
 #endif
 
 /*
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 1443cee15863..8ca772670c67 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2890,8 +2890,6 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
 	if (!FDEV(devi).blkz_seq)
 		return -ENOMEM;
 
-#define F2FS_REPORT_NR_ZONES   4096
-
 	zones = f2fs_kzalloc(sbi,
 			     array_size(F2FS_REPORT_NR_ZONES,
 					sizeof(struct blk_zone)),
@@ -3509,7 +3507,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 
 		err = f2fs_recover_fsync_data(sbi, false);
 		if (err < 0) {
-			if (err != -ENOMEM)
+			if (err != -ENOMEM &&
+			    !is_sbi_flag_set(sbi, SBI_NEED_FSCK))
 				skip_recovery = true;
 			need_fsck = true;
 			f2fs_err(sbi, "Cannot recover all fsync data errno=%d",
@@ -3525,6 +3524,17 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 			goto free_meta;
 		}
 	}
+
+	/*
+	 * If the f2fs is not readonly and fsync data recovery succeeds,
+	 * check zoned block devices' write pointer consistency.
+	 */
+	if (!err && !f2fs_readonly(sb) && f2fs_sb_has_blkzoned(sbi)) {
+		err = f2fs_check_write_pointer(sbi);
+		if (err)
+			goto free_meta;
+	}
+
 reset_checkpoint:
 	/* f2fs_recover_fsync_data() cleared this already */
 	clear_sbi_flag(sbi, SBI_POR_DOING);
-- 
2.23.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [f2fs-dev] [PATCH v3 1/2] f2fs: Check write pointer consistency of open zones
  2019-11-14  8:19 ` [f2fs-dev] [PATCH v3 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki
@ 2019-11-25  6:59   ` Chao Yu
  2019-11-28  4:07     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 14+ messages in thread
From: Chao Yu @ 2019-11-25  6:59 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki, Jaegeuk Kim, linux-f2fs-devel; +Cc: Damien Le Moal

On 2019/11/14 16:19, Shin'ichiro Kawasaki wrote:
> On sudden f2fs shutdown, write pointers of zoned block devices can go
> further but f2fs meta data keeps current segments at positions before the
> write operations. After remounting the f2fs, this inconsistency causes
> write operations not at write pointers and "Unaligned write command"
> error is reported.
> 
> To avoid the error, compare current segments with write pointers of open
> zones the current segments point to, during mount operation. If the write
> pointer position is not aligned with the current segment position, assign
> a new zone to the current segment. Also check the newly assigned zone has
> write pointer at zone start. If not, make mount fail and ask users to run
> fsck.
> 
> Perform the consistency check during fsync recovery. Not to lose the
> fsync data, do the check after fsync data gets restored and before
> checkpoint commit which flushes data at current segment positions. Not to
> cause conflict with kworker's dirfy data/node flush, do the fix within
> SBI_POR_DOING protection.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  fs/f2fs/f2fs.h     |   1 +
>  fs/f2fs/recovery.c |  17 ++++++-
>  fs/f2fs/segment.c  | 120 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 136 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4024790028aa..a2e24718c13b 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3136,6 +3136,7 @@ void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
>  int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
>  			unsigned int val, int alloc);
>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
>  int __init f2fs_create_segment_manager_caches(void);
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 783773e4560d..712054ed8d64 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -784,9 +784,22 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
>  	if (err) {
>  		truncate_inode_pages_final(NODE_MAPPING(sbi));
>  		truncate_inode_pages_final(META_MAPPING(sbi));
> -	} else {
> -		clear_sbi_flag(sbi, SBI_POR_DOING);
>  	}
> +
> +	/*
> +	 * If fsync data succeeds or there is no fsync data to recover,
> +	 * and the f2fs is not read only, check and fix zoned block devices'
> +	 * write pointer consistency.
> +	 */
> +	if (!ret && !err && !f2fs_readonly(sbi->sb)

Using !check_only will be more readable?

if (!err && !check_only && !f2fs_readonly(sbi->sb)

> +	    && f2fs_sb_has_blkzoned(sbi)) {
> +		err = f2fs_fix_curseg_write_pointer(sbi);
> +		ret = err;
> +	}
> +
> +	if (!err)
> +		clear_sbi_flag(sbi, SBI_POR_DOING);
> +
>  	mutex_unlock(&sbi->cp_mutex);
>  
>  	/* let's drop all the directory inodes for clean checkpoint */
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 808709581481..6ece146dab34 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -4331,6 +4331,126 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_BLK_DEV_ZONED
> +
> +static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
> +						  block_t zone_blkaddr)
> +{
> +	int i;
> +
> +	for (i = 0; i < sbi->s_ndevs; i++) {
> +		if (!bdev_is_zoned(FDEV(i).bdev))
> +			continue;
> +		if (sbi->s_ndevs == 1 || (FDEV(i).start_blk <= zone_blkaddr &&
> +					  zone_blkaddr <= FDEV(i).end_blk))
> +			return &FDEV(i);
> +	}
> +
> +	return NULL;
> +}
> +
> +static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
> +{
> +	struct curseg_info *cs = CURSEG_I(sbi, type);
> +	struct f2fs_dev_info *zbd;
> +	struct blk_zone zone;
> +	unsigned int cs_section, wp_segno, wp_blkoff, nr_zones, wp_sector_off;
> +	block_t cs_zone_block, wp_block, cs_block;
> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> +	sector_t zone_sector;
> +	int err;
> +
> +	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
> +	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
> +	cs_block = START_BLOCK(sbi, cs->segno) + cs->next_blkoff;
> +
> +	zbd = get_target_zoned_dev(sbi, cs_zone_block);
> +	if (!zbd)
> +		return 0;
> +
> +	/* report zone for the sector the curseg points to */
> +	zone_sector = (sector_t)(cs_zone_block - zbd->start_blk)
> +		<< log_sectors_per_block;
> +	nr_zones = 1;
> +	err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones);
> +	if (err) {
> +		f2fs_err(sbi, "Report zone failed: %s errno=(%d)",
> +			 zbd->path, err);
> +		return err;
> +	}
> +
> +	if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
> +		return 0;
> +
> +	wp_block = zbd->start_blk + (zone.wp >> log_sectors_per_block);
> +	wp_segno = GET_SEGNO(sbi, wp_block);
> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> +	wp_sector_off = zone.wp & GENMASK(log_sectors_per_block - 1, 0);
> +
> +	if (cs->segno == wp_segno && cs->next_blkoff == wp_blkoff &&
> +	    wp_sector_off == 0)

We uses indent instead of space in f2fs coding style, please keep line
with it.

Thanks,

> +		return 0;
> +
> +	f2fs_notice(sbi, "Unaligned curseg[%d] with write pointer: "
> +		    "curseg[0x%x,0x%x] wp[0x%x,0x%x]",
> +		    type, cs->segno, cs->next_blkoff, wp_segno, wp_blkoff);
> +
> +	f2fs_notice(sbi, "Assign new section to curseg[%d]: "
> +		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
> +	allocate_segment_by_default(sbi, type, true);
> +
> +	/* check newly assigned zone */
> +	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
> +	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
> +
> +	zbd = get_target_zoned_dev(sbi, cs_zone_block);
> +	if (!zbd)
> +		return 0;
> +
> +	zone_sector = (sector_t)(cs_zone_block - zbd->start_blk)
> +		<< log_sectors_per_block;
> +	nr_zones = 1;
> +	err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones);
> +	if (err) {
> +		f2fs_err(sbi, "Report zone failed: %s errno=(%d)",
> +			 zbd->path, err);
> +		return err;
> +	}
> +
> +	if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
> +		return 0;
> +
> +	if (zone.wp != zone.start) {
> +		f2fs_notice(sbi,
> +			    "New section for curseg[%d] is not empty, "
> +			    "run fsck to fix: curseg[0x%x,0x%x]",
> +			    type, cs->segno, cs->next_blkoff);
> +		set_sbi_flag(sbi, SBI_NEED_FSCK);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> +{
> +	int i, ret;
> +
> +	for (i = 0; i < NO_CHECK_TYPE; i++) {
> +		ret = fix_curseg_write_pointer(sbi, i);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +#else
> +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> +{
> +	return 0;
> +}
> +#endif
> +
>  /*
>   * Update min, max modified time for cost-benefit GC algorithm
>   */
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-11-14  8:19 ` [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
@ 2019-11-25  7:37   ` Chao Yu
  2019-11-28  5:31     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 14+ messages in thread
From: Chao Yu @ 2019-11-25  7:37 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki, Jaegeuk Kim, linux-f2fs-devel; +Cc: Damien Le Moal

On 2019/11/14 16:19, Shin'ichiro Kawasaki wrote:
> To catch f2fs bugs in write pointer handling code for zoned block
> devices, check write pointers of non-open zones that current segments do
> not point to. Do this check at mount time, after the fsync data recovery
> and current segments' write pointer consistency fix. Or when fsync data
> recovery is disabled by mount option, do the check when there is no fsync
> data.
> 
> Check two items comparing write pointers with valid block maps in SIT.
> The first item is check for zones with no valid blocks. When there is no
> valid blocks in a zone, the write pointer should be at the start of the
> zone. If not, next write operation to the zone will cause unaligned write
> error. If write pointer is not at the zone start, make mount fail and ask
> users to run fsck.
> 
> The second item is check between the write pointer position and the last
> valid block in the zone. It is unexpected that the last valid block
> position is beyond the write pointer. In such a case, report as a bug.
> Fix is not required for such zone, because the zone is not selected for
> next write operation until the zone get discarded.
> 
> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
> in segment.c also.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  fs/f2fs/f2fs.h    |   3 +
>  fs/f2fs/segment.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/f2fs/super.c   |  16 ++++-
>  3 files changed, 165 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index a2e24718c13b..1bb64950d793 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
>  			unsigned int val, int alloc);
>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
>  int __init f2fs_create_segment_manager_caches(void);
> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
>  
>  	return test_bit(zno, FDEV(devi).blkz_seq);
>  }
> +
> +#define F2FS_REPORT_NR_ZONES   4096
>  #endif
>  
>  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 6ece146dab34..29e3b6f62f8c 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -4333,6 +4333,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
>  
>  #ifdef CONFIG_BLK_DEV_ZONED
>  
> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
> +				    struct f2fs_dev_info *fdev,
> +				    struct blk_zone *zone)
> +{
> +	unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
> +	block_t zone_block, wp_block, last_valid_block;
> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> +	int i, s, b;
> +	struct seg_entry *se;
> +
> +	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
> +	wp_segno = GET_SEGNO(sbi, wp_block);
> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> +	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
> +	zone_segno = GET_SEGNO(sbi, zone_block);
> +	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
> +
> +	if (zone_segno >= MAIN_SEGS(sbi))
> +		return 0;
> +
> +	/*
> +	 * Skip check of zones cursegs point to, since
> +	 * fix_curseg_write_pointer() checks them.
> +	 */
> +	for (i = 0; i < NO_CHECK_TYPE; i++)
> +		if (zone_secno == GET_SEC_FROM_SEG(sbi,
> +						   CURSEG_I(sbi, i)->segno))
> +			return 0;
> +
> +	/*
> +	 * Get last valid block of the zone.
> +	 */
> +	last_valid_block = zone_block - 1;
> +	for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
> +		segno = zone_segno + s;
> +		se = get_seg_entry(sbi, segno);
> +		for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
> +			if (f2fs_test_bit(b, se->cur_valid_map)) {
> +				last_valid_block = START_BLOCK(sbi, segno) + b;
> +				break;
> +			}
> +		if (last_valid_block >= zone_block)
> +			break;
> +	}
> +
> +	/*
> +	 * If last valid block is beyond the write pointer, report the
> +	 * inconsistency. This inconsistency does not cause write error
> +	 * because the zone will not be selected for write operation until
> +	 * it get discarded. Just report it.
> +	 */
> +	if (last_valid_block >= wp_block) {
> +		f2fs_notice(sbi, "Valid block beyond write pointer: "
> +			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
> +			    GET_SEGNO(sbi, last_valid_block),
> +			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
> +			    wp_segno, wp_blkoff);
> +		return 0;
> +	}
> +
> +	/*
> +	 * If there is no valid block in the zone and if write pointer is
> +	 * not at zone start, report the error to run fsck.

So we only need to report this as inconsistent status in the condition of
discard has been triggered, right? otherwise, f2fs will trigger discard later
to reset zone->wp before opening this zone?

Thanks,

> +	 */
> +	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
> +		f2fs_notice(sbi,
> +			    "Zone without valid block has non-zero write "
> +			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
> +			    wp_segno, wp_blkoff);
> +		f2fs_stop_checkpoint(sbi, true);
> +		set_sbi_flag(sbi, SBI_NEED_FSCK);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int check_dev_write_pointer(struct f2fs_sb_info *sbi,
> +				   struct f2fs_dev_info *fdev) {
> +	sector_t nr_sectors = fdev->bdev->bd_part->nr_sects;
> +	sector_t sector = 0;
> +	struct blk_zone *zones;
> +	unsigned int i, nr_zones;
> +	unsigned int n = 0;
> +	int err = -EIO;
> +
> +	if (!bdev_is_zoned(fdev->bdev))
> +		return 0;
> +
> +	zones = f2fs_kzalloc(sbi,
> +			     array_size(F2FS_REPORT_NR_ZONES,
> +					sizeof(struct blk_zone)),
> +			     GFP_KERNEL);
> +	if (!zones)
> +		return -ENOMEM;
> +
> +	/* Get block zones type */
> +	while (zones && sector < nr_sectors) {
> +
> +		nr_zones = F2FS_REPORT_NR_ZONES;
> +		err = blkdev_report_zones(fdev->bdev, sector, zones, &nr_zones);
> +		if (err)
> +			break;
> +		if (!nr_zones) {
> +			err = -EIO;
> +			break;
> +		}
> +
> +		for (i = 0; i < nr_zones; i++) {
> +			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
> +				err = check_zone_write_pointer(sbi, fdev,
> +							       &zones[i]);
> +				if (err)
> +					break;
> +			}
> +			sector += zones[i].len;
> +			n++;
> +		}
> +		if (err)
> +			break;
> +	}
> +
> +	kvfree(zones);
> +
> +	return err;
> +}
> +
>  static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
>  						  block_t zone_blkaddr)
>  {
> @@ -4399,6 +4526,10 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
>  		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
>  	allocate_segment_by_default(sbi, type, true);
>  
> +	/* check consistency of the zone curseg pointed to */
> +	if (check_zone_write_pointer(sbi, zbd, &zone))
> +		return -EIO;
> +
>  	/* check newly assigned zone */
>  	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
>  	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
> @@ -4444,11 +4575,29 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
>  
>  	return 0;
>  }
> +
> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> +{
> +	int i, ret;
> +
> +	for (i = 0; i < sbi->s_ndevs; i++) {
> +		ret = check_dev_write_pointer(sbi, &FDEV(i));
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
>  #else
>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
>  {
>  	return 0;
>  }
> +
> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> +{
> +	return 0;
> +}
>  #endif
>  
>  /*
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 1443cee15863..8ca772670c67 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -2890,8 +2890,6 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>  	if (!FDEV(devi).blkz_seq)
>  		return -ENOMEM;
>  
> -#define F2FS_REPORT_NR_ZONES   4096
> -
>  	zones = f2fs_kzalloc(sbi,
>  			     array_size(F2FS_REPORT_NR_ZONES,
>  					sizeof(struct blk_zone)),
> @@ -3509,7 +3507,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  
>  		err = f2fs_recover_fsync_data(sbi, false);
>  		if (err < 0) {
> -			if (err != -ENOMEM)
> +			if (err != -ENOMEM &&
> +			    !is_sbi_flag_set(sbi, SBI_NEED_FSCK))
>  				skip_recovery = true;
>  			need_fsck = true;
>  			f2fs_err(sbi, "Cannot recover all fsync data errno=%d",
> @@ -3525,6 +3524,17 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  			goto free_meta;
>  		}
>  	}
> +
> +	/*
> +	 * If the f2fs is not readonly and fsync data recovery succeeds,
> +	 * check zoned block devices' write pointer consistency.
> +	 */
> +	if (!err && !f2fs_readonly(sb) && f2fs_sb_has_blkzoned(sbi)) {
> +		err = f2fs_check_write_pointer(sbi);
> +		if (err)
> +			goto free_meta;
> +	}
> +
>  reset_checkpoint:
>  	/* f2fs_recover_fsync_data() cleared this already */
>  	clear_sbi_flag(sbi, SBI_POR_DOING);
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [f2fs-dev] [PATCH v3 1/2] f2fs: Check write pointer consistency of open zones
  2019-11-25  6:59   ` Chao Yu
@ 2019-11-28  4:07     ` Shinichiro Kawasaki
  2019-11-28 12:26       ` Chao Yu
  0 siblings, 1 reply; 14+ messages in thread
From: Shinichiro Kawasaki @ 2019-11-28  4:07 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Nov 25, 2019 / 14:59, Chao Yu wrote:
> On 2019/11/14 16:19, Shin'ichiro Kawasaki wrote:
> > On sudden f2fs shutdown, write pointers of zoned block devices can go
> > further but f2fs meta data keeps current segments at positions before the
> > write operations. After remounting the f2fs, this inconsistency causes
> > write operations not at write pointers and "Unaligned write command"
> > error is reported.
> > 
> > To avoid the error, compare current segments with write pointers of open
> > zones the current segments point to, during mount operation. If the write
> > pointer position is not aligned with the current segment position, assign
> > a new zone to the current segment. Also check the newly assigned zone has
> > write pointer at zone start. If not, make mount fail and ask users to run
> > fsck.
> > 
> > Perform the consistency check during fsync recovery. Not to lose the
> > fsync data, do the check after fsync data gets restored and before
> > checkpoint commit which flushes data at current segment positions. Not to
> > cause conflict with kworker's dirfy data/node flush, do the fix within
> > SBI_POR_DOING protection.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >  fs/f2fs/f2fs.h     |   1 +
> >  fs/f2fs/recovery.c |  17 ++++++-
> >  fs/f2fs/segment.c  | 120 +++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 136 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 4024790028aa..a2e24718c13b 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -3136,6 +3136,7 @@ void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
> >  int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
> >  			unsigned int val, int alloc);
> >  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> > +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
> >  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
> >  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
> >  int __init f2fs_create_segment_manager_caches(void);
> > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> > index 783773e4560d..712054ed8d64 100644
> > --- a/fs/f2fs/recovery.c
> > +++ b/fs/f2fs/recovery.c
> > @@ -784,9 +784,22 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
> >  	if (err) {
> >  		truncate_inode_pages_final(NODE_MAPPING(sbi));
> >  		truncate_inode_pages_final(META_MAPPING(sbi));
> > -	} else {
> > -		clear_sbi_flag(sbi, SBI_POR_DOING);
> >  	}
> > +
> > +	/*
> > +	 * If fsync data succeeds or there is no fsync data to recover,
> > +	 * and the f2fs is not read only, check and fix zoned block devices'
> > +	 * write pointer consistency.
> > +	 */
> > +	if (!ret && !err && !f2fs_readonly(sbi->sb)
> 
> Using !check_only will be more readable?
> 
> if (!err && !check_only && !f2fs_readonly(sbi->sb)

When check_only is on and there is no fsync data, I think we should fix the
write pointer inconsistency. With the condition you suggested, this case can
not be covered.

Having said that, my expression with !ret is not good from readability point
of view. How about this?


bool fix_curseg_write_pointer;
fix_curseg_write_pointer = !check_only || (check_only && list_empty(&inode_list));

...

if (!err && fix_curseg_write_pointer && !f2fs_readonly(sbi->sb)
	&& f2fs_sb_has_blkzoned(sbi)) {
	err = f2fs_fix_curseg_write_pointer(sbi);
	ret = err;
}


> > +	    && f2fs_sb_has_blkzoned(sbi)) {
> > +		err = f2fs_fix_curseg_write_pointer(sbi);
> > +		ret = err;
> > +	}
> > +
> > +	if (!err)
> > +		clear_sbi_flag(sbi, SBI_POR_DOING);
> > +
> >  	mutex_unlock(&sbi->cp_mutex);
> >  
> >  	/* let's drop all the directory inodes for clean checkpoint */
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index 808709581481..6ece146dab34 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -4331,6 +4331,126 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
> >  	return 0;
> >  }
> >  
> > +#ifdef CONFIG_BLK_DEV_ZONED
> > +
> > +static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
> > +						  block_t zone_blkaddr)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < sbi->s_ndevs; i++) {
> > +		if (!bdev_is_zoned(FDEV(i).bdev))
> > +			continue;
> > +		if (sbi->s_ndevs == 1 || (FDEV(i).start_blk <= zone_blkaddr &&
> > +					  zone_blkaddr <= FDEV(i).end_blk))
> > +			return &FDEV(i);
> > +	}
> > +
> > +	return NULL;
> > +}
> > +
> > +static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
> > +{
> > +	struct curseg_info *cs = CURSEG_I(sbi, type);
> > +	struct f2fs_dev_info *zbd;
> > +	struct blk_zone zone;
> > +	unsigned int cs_section, wp_segno, wp_blkoff, nr_zones, wp_sector_off;
> > +	block_t cs_zone_block, wp_block, cs_block;
> > +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> > +	sector_t zone_sector;
> > +	int err;
> > +
> > +	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
> > +	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
> > +	cs_block = START_BLOCK(sbi, cs->segno) + cs->next_blkoff;
> > +
> > +	zbd = get_target_zoned_dev(sbi, cs_zone_block);
> > +	if (!zbd)
> > +		return 0;
> > +
> > +	/* report zone for the sector the curseg points to */
> > +	zone_sector = (sector_t)(cs_zone_block - zbd->start_blk)
> > +		<< log_sectors_per_block;
> > +	nr_zones = 1;
> > +	err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones);
> > +	if (err) {
> > +		f2fs_err(sbi, "Report zone failed: %s errno=(%d)",
> > +			 zbd->path, err);
> > +		return err;
> > +	}
> > +
> > +	if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
> > +		return 0;
> > +
> > +	wp_block = zbd->start_blk + (zone.wp >> log_sectors_per_block);
> > +	wp_segno = GET_SEGNO(sbi, wp_block);
> > +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> > +	wp_sector_off = zone.wp & GENMASK(log_sectors_per_block - 1, 0);
> > +
> > +	if (cs->segno == wp_segno && cs->next_blkoff == wp_blkoff &&
> > +	    wp_sector_off == 0)
> 
> We uses indent instead of space in f2fs coding style, please keep line
> with it.

Noted this f2fs conding style. Will replace the spaces with tab indent.
Thanks!

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-11-25  7:37   ` Chao Yu
@ 2019-11-28  5:31     ` Shinichiro Kawasaki
  2019-11-28 12:39       ` Chao Yu
  0 siblings, 1 reply; 14+ messages in thread
From: Shinichiro Kawasaki @ 2019-11-28  5:31 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Nov 25, 2019 / 15:37, Chao Yu wrote:
> On 2019/11/14 16:19, Shin'ichiro Kawasaki wrote:
> > To catch f2fs bugs in write pointer handling code for zoned block
> > devices, check write pointers of non-open zones that current segments do
> > not point to. Do this check at mount time, after the fsync data recovery
> > and current segments' write pointer consistency fix. Or when fsync data
> > recovery is disabled by mount option, do the check when there is no fsync
> > data.
> > 
> > Check two items comparing write pointers with valid block maps in SIT.
> > The first item is check for zones with no valid blocks. When there is no
> > valid blocks in a zone, the write pointer should be at the start of the
> > zone. If not, next write operation to the zone will cause unaligned write
> > error. If write pointer is not at the zone start, make mount fail and ask
> > users to run fsck.
> > 
> > The second item is check between the write pointer position and the last
> > valid block in the zone. It is unexpected that the last valid block
> > position is beyond the write pointer. In such a case, report as a bug.
> > Fix is not required for such zone, because the zone is not selected for
> > next write operation until the zone get discarded.
> > 
> > Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
> > in segment.c also.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >  fs/f2fs/f2fs.h    |   3 +
> >  fs/f2fs/segment.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/f2fs/super.c   |  16 ++++-
> >  3 files changed, 165 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index a2e24718c13b..1bb64950d793 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
> >  			unsigned int val, int alloc);
> >  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> >  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
> > +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
> >  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
> >  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
> >  int __init f2fs_create_segment_manager_caches(void);
> > @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
> >  
> >  	return test_bit(zno, FDEV(devi).blkz_seq);
> >  }
> > +
> > +#define F2FS_REPORT_NR_ZONES   4096
> >  #endif
> >  
> >  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index 6ece146dab34..29e3b6f62f8c 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -4333,6 +4333,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
> >  
> >  #ifdef CONFIG_BLK_DEV_ZONED
> >  
> > +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
> > +				    struct f2fs_dev_info *fdev,
> > +				    struct blk_zone *zone)
> > +{
> > +	unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
> > +	block_t zone_block, wp_block, last_valid_block;
> > +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> > +	int i, s, b;
> > +	struct seg_entry *se;
> > +
> > +	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
> > +	wp_segno = GET_SEGNO(sbi, wp_block);
> > +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> > +	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
> > +	zone_segno = GET_SEGNO(sbi, zone_block);
> > +	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
> > +
> > +	if (zone_segno >= MAIN_SEGS(sbi))
> > +		return 0;
> > +
> > +	/*
> > +	 * Skip check of zones cursegs point to, since
> > +	 * fix_curseg_write_pointer() checks them.
> > +	 */
> > +	for (i = 0; i < NO_CHECK_TYPE; i++)
> > +		if (zone_secno == GET_SEC_FROM_SEG(sbi,
> > +						   CURSEG_I(sbi, i)->segno))
> > +			return 0;
> > +
> > +	/*
> > +	 * Get last valid block of the zone.
> > +	 */
> > +	last_valid_block = zone_block - 1;
> > +	for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
> > +		segno = zone_segno + s;
> > +		se = get_seg_entry(sbi, segno);
> > +		for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
> > +			if (f2fs_test_bit(b, se->cur_valid_map)) {
> > +				last_valid_block = START_BLOCK(sbi, segno) + b;
> > +				break;
> > +			}
> > +		if (last_valid_block >= zone_block)
> > +			break;
> > +	}
> > +
> > +	/*
> > +	 * If last valid block is beyond the write pointer, report the
> > +	 * inconsistency. This inconsistency does not cause write error
> > +	 * because the zone will not be selected for write operation until
> > +	 * it get discarded. Just report it.
> > +	 */
> > +	if (last_valid_block >= wp_block) {
> > +		f2fs_notice(sbi, "Valid block beyond write pointer: "
> > +			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
> > +			    GET_SEGNO(sbi, last_valid_block),
> > +			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
> > +			    wp_segno, wp_blkoff);
> > +		return 0;
> > +	}
> > +
> > +	/*
> > +	 * If there is no valid block in the zone and if write pointer is
> > +	 * not at zone start, report the error to run fsck.
> 
> So we only need to report this as inconsistent status in the condition of
> discard has been triggered, right? otherwise, f2fs will trigger discard later
> to reset zone->wp before opening this zone?

Hmm, my intent was to catch the inconsistency at mount time, assuming the
inconsistency is not expected at mount time. In other words, I assume that
discard is triggered for zones without valid blocks before that last clean
umount. If the last sudden f2fs shutdown without clean umount caused the
inconsistency, it should be reported and fixed, I think.

SIT valid blocks are referred to check if there is no valid blocks in the zone.
SIT may be broken due to software bug or hardware flaw, then I think it is the
better to run fsck rather than discard by f2fs.

If I miss anything, please let me know.

--
Best Regards,
Shin'ichiro Kawasaki

> 
> Thanks,
> 
> > +	 */
> > +	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
> > +		f2fs_notice(sbi,
> > +			    "Zone without valid block has non-zero write "
> > +			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
> > +			    wp_segno, wp_blkoff);
> > +		f2fs_stop_checkpoint(sbi, true);
> > +		set_sbi_flag(sbi, SBI_NEED_FSCK);
> > +		return -EINVAL;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int check_dev_write_pointer(struct f2fs_sb_info *sbi,
> > +				   struct f2fs_dev_info *fdev) {
> > +	sector_t nr_sectors = fdev->bdev->bd_part->nr_sects;
> > +	sector_t sector = 0;
> > +	struct blk_zone *zones;
> > +	unsigned int i, nr_zones;
> > +	unsigned int n = 0;
> > +	int err = -EIO;
> > +
> > +	if (!bdev_is_zoned(fdev->bdev))
> > +		return 0;
> > +
> > +	zones = f2fs_kzalloc(sbi,
> > +			     array_size(F2FS_REPORT_NR_ZONES,
> > +					sizeof(struct blk_zone)),
> > +			     GFP_KERNEL);
> > +	if (!zones)
> > +		return -ENOMEM;
> > +
> > +	/* Get block zones type */
> > +	while (zones && sector < nr_sectors) {
> > +
> > +		nr_zones = F2FS_REPORT_NR_ZONES;
> > +		err = blkdev_report_zones(fdev->bdev, sector, zones, &nr_zones);
> > +		if (err)
> > +			break;
> > +		if (!nr_zones) {
> > +			err = -EIO;
> > +			break;
> > +		}
> > +
> > +		for (i = 0; i < nr_zones; i++) {
> > +			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
> > +				err = check_zone_write_pointer(sbi, fdev,
> > +							       &zones[i]);
> > +				if (err)
> > +					break;
> > +			}
> > +			sector += zones[i].len;
> > +			n++;
> > +		}
> > +		if (err)
> > +			break;
> > +	}
> > +
> > +	kvfree(zones);
> > +
> > +	return err;
> > +}
> > +
> >  static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
> >  						  block_t zone_blkaddr)
> >  {
> > @@ -4399,6 +4526,10 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
> >  		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
> >  	allocate_segment_by_default(sbi, type, true);
> >  
> > +	/* check consistency of the zone curseg pointed to */
> > +	if (check_zone_write_pointer(sbi, zbd, &zone))
> > +		return -EIO;
> > +
> >  	/* check newly assigned zone */
> >  	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
> >  	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
> > @@ -4444,11 +4575,29 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> >  
> >  	return 0;
> >  }
> > +
> > +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> > +{
> > +	int i, ret;
> > +
> > +	for (i = 0; i < sbi->s_ndevs; i++) {
> > +		ret = check_dev_write_pointer(sbi, &FDEV(i));
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> >  #else
> >  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> >  {
> >  	return 0;
> >  }
> > +
> > +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> > +{
> > +	return 0;
> > +}
> >  #endif
> >  
> >  /*
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 1443cee15863..8ca772670c67 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -2890,8 +2890,6 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> >  	if (!FDEV(devi).blkz_seq)
> >  		return -ENOMEM;
> >  
> > -#define F2FS_REPORT_NR_ZONES   4096
> > -
> >  	zones = f2fs_kzalloc(sbi,
> >  			     array_size(F2FS_REPORT_NR_ZONES,
> >  					sizeof(struct blk_zone)),
> > @@ -3509,7 +3507,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >  
> >  		err = f2fs_recover_fsync_data(sbi, false);
> >  		if (err < 0) {
> > -			if (err != -ENOMEM)
> > +			if (err != -ENOMEM &&
> > +			    !is_sbi_flag_set(sbi, SBI_NEED_FSCK))
> >  				skip_recovery = true;
> >  			need_fsck = true;
> >  			f2fs_err(sbi, "Cannot recover all fsync data errno=%d",
> > @@ -3525,6 +3524,17 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >  			goto free_meta;
> >  		}
> >  	}
> > +
> > +	/*
> > +	 * If the f2fs is not readonly and fsync data recovery succeeds,
> > +	 * check zoned block devices' write pointer consistency.
> > +	 */
> > +	if (!err && !f2fs_readonly(sb) && f2fs_sb_has_blkzoned(sbi)) {
> > +		err = f2fs_check_write_pointer(sbi);
> > +		if (err)
> > +			goto free_meta;
> > +	}
> > +
> >  reset_checkpoint:
> >  	/* f2fs_recover_fsync_data() cleared this already */
> >  	clear_sbi_flag(sbi, SBI_POR_DOING);
> > 

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [f2fs-dev] [PATCH v3 1/2] f2fs: Check write pointer consistency of open zones
  2019-11-28  4:07     ` Shinichiro Kawasaki
@ 2019-11-28 12:26       ` Chao Yu
  2019-11-29  1:58         ` Shinichiro Kawasaki
  0 siblings, 1 reply; 14+ messages in thread
From: Chao Yu @ 2019-11-28 12:26 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On 2019/11/28 12:07, Shinichiro Kawasaki wrote:
> On Nov 25, 2019 / 14:59, Chao Yu wrote:
>> On 2019/11/14 16:19, Shin'ichiro Kawasaki wrote:
>>> On sudden f2fs shutdown, write pointers of zoned block devices can go
>>> further but f2fs meta data keeps current segments at positions before the
>>> write operations. After remounting the f2fs, this inconsistency causes
>>> write operations not at write pointers and "Unaligned write command"
>>> error is reported.
>>>
>>> To avoid the error, compare current segments with write pointers of open
>>> zones the current segments point to, during mount operation. If the write
>>> pointer position is not aligned with the current segment position, assign
>>> a new zone to the current segment. Also check the newly assigned zone has
>>> write pointer at zone start. If not, make mount fail and ask users to run
>>> fsck.
>>>
>>> Perform the consistency check during fsync recovery. Not to lose the
>>> fsync data, do the check after fsync data gets restored and before
>>> checkpoint commit which flushes data at current segment positions. Not to
>>> cause conflict with kworker's dirfy data/node flush, do the fix within
>>> SBI_POR_DOING protection.
>>>
>>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>> ---
>>>  fs/f2fs/f2fs.h     |   1 +
>>>  fs/f2fs/recovery.c |  17 ++++++-
>>>  fs/f2fs/segment.c  | 120 +++++++++++++++++++++++++++++++++++++++++++++
>>>  3 files changed, 136 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 4024790028aa..a2e24718c13b 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -3136,6 +3136,7 @@ void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
>>>  int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
>>>  			unsigned int val, int alloc);
>>>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
>>> +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
>>>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
>>>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
>>>  int __init f2fs_create_segment_manager_caches(void);
>>> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
>>> index 783773e4560d..712054ed8d64 100644
>>> --- a/fs/f2fs/recovery.c
>>> +++ b/fs/f2fs/recovery.c
>>> @@ -784,9 +784,22 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
>>>  	if (err) {
>>>  		truncate_inode_pages_final(NODE_MAPPING(sbi));
>>>  		truncate_inode_pages_final(META_MAPPING(sbi));
>>> -	} else {
>>> -		clear_sbi_flag(sbi, SBI_POR_DOING);
>>>  	}
>>> +
>>> +	/*
>>> +	 * If fsync data succeeds or there is no fsync data to recover,
>>> +	 * and the f2fs is not read only, check and fix zoned block devices'
>>> +	 * write pointer consistency.
>>> +	 */
>>> +	if (!ret && !err && !f2fs_readonly(sbi->sb)
>>
>> Using !check_only will be more readable?
>>
>> if (!err && !check_only && !f2fs_readonly(sbi->sb)
> 
> When check_only is on and there is no fsync data, I think we should fix the
> write pointer inconsistency. With the condition you suggested, this case can
> not be covered.

Alright.

> 
> Having said that, my expression with !ret is not good from readability point
> of view. How about this?
> 
> 
> bool fix_curseg_write_pointer;
> fix_curseg_write_pointer = !check_only || (check_only && list_empty(&inode_list));

fix_curseg_write_pointer = !check_only || list_empty(&inode_list); is enough.

> 
> ...
> 
> if (!err && fix_curseg_write_pointer && !f2fs_readonly(sbi->sb)
> 	&& f2fs_sb_has_blkzoned(sbi)) {
> 	err = f2fs_fix_curseg_write_pointer(sbi);
> 	ret = err;
> }

It's okay to me.

Thanks,

> 
> 
>>> +	    && f2fs_sb_has_blkzoned(sbi)) {
>>> +		err = f2fs_fix_curseg_write_pointer(sbi);
>>> +		ret = err;
>>> +	}
>>> +
>>> +	if (!err)
>>> +		clear_sbi_flag(sbi, SBI_POR_DOING);
>>> +
>>>  	mutex_unlock(&sbi->cp_mutex);
>>>  
>>>  	/* let's drop all the directory inodes for clean checkpoint */
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index 808709581481..6ece146dab34 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -4331,6 +4331,126 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
>>>  	return 0;
>>>  }
>>>  
>>> +#ifdef CONFIG_BLK_DEV_ZONED
>>> +
>>> +static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
>>> +						  block_t zone_blkaddr)
>>> +{
>>> +	int i;
>>> +
>>> +	for (i = 0; i < sbi->s_ndevs; i++) {
>>> +		if (!bdev_is_zoned(FDEV(i).bdev))
>>> +			continue;
>>> +		if (sbi->s_ndevs == 1 || (FDEV(i).start_blk <= zone_blkaddr &&
>>> +					  zone_blkaddr <= FDEV(i).end_blk))
>>> +			return &FDEV(i);
>>> +	}
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>> +static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
>>> +{
>>> +	struct curseg_info *cs = CURSEG_I(sbi, type);
>>> +	struct f2fs_dev_info *zbd;
>>> +	struct blk_zone zone;
>>> +	unsigned int cs_section, wp_segno, wp_blkoff, nr_zones, wp_sector_off;
>>> +	block_t cs_zone_block, wp_block, cs_block;
>>> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
>>> +	sector_t zone_sector;
>>> +	int err;
>>> +
>>> +	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
>>> +	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
>>> +	cs_block = START_BLOCK(sbi, cs->segno) + cs->next_blkoff;
>>> +
>>> +	zbd = get_target_zoned_dev(sbi, cs_zone_block);
>>> +	if (!zbd)
>>> +		return 0;
>>> +
>>> +	/* report zone for the sector the curseg points to */
>>> +	zone_sector = (sector_t)(cs_zone_block - zbd->start_blk)
>>> +		<< log_sectors_per_block;
>>> +	nr_zones = 1;
>>> +	err = blkdev_report_zones(zbd->bdev, zone_sector, &zone, &nr_zones);
>>> +	if (err) {
>>> +		f2fs_err(sbi, "Report zone failed: %s errno=(%d)",
>>> +			 zbd->path, err);
>>> +		return err;
>>> +	}
>>> +
>>> +	if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
>>> +		return 0;
>>> +
>>> +	wp_block = zbd->start_blk + (zone.wp >> log_sectors_per_block);
>>> +	wp_segno = GET_SEGNO(sbi, wp_block);
>>> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
>>> +	wp_sector_off = zone.wp & GENMASK(log_sectors_per_block - 1, 0);
>>> +
>>> +	if (cs->segno == wp_segno && cs->next_blkoff == wp_blkoff &&
>>> +	    wp_sector_off == 0)
>>
>> We uses indent instead of space in f2fs coding style, please keep line
>> with it.
> 
> Noted this f2fs conding style. Will replace the spaces with tab indent.
> Thanks!
> 
> --
> Best Regards,
> Shin'ichiro Kawasaki.
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-11-28  5:31     ` Shinichiro Kawasaki
@ 2019-11-28 12:39       ` Chao Yu
  2019-11-29  5:21         ` Shinichiro Kawasaki
  0 siblings, 1 reply; 14+ messages in thread
From: Chao Yu @ 2019-11-28 12:39 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel



On 2019/11/28 13:31, Shinichiro Kawasaki wrote:
> On Nov 25, 2019 / 15:37, Chao Yu wrote:
>> On 2019/11/14 16:19, Shin'ichiro Kawasaki wrote:
>>> To catch f2fs bugs in write pointer handling code for zoned block
>>> devices, check write pointers of non-open zones that current segments do
>>> not point to. Do this check at mount time, after the fsync data recovery
>>> and current segments' write pointer consistency fix. Or when fsync data
>>> recovery is disabled by mount option, do the check when there is no fsync
>>> data.
>>>
>>> Check two items comparing write pointers with valid block maps in SIT.
>>> The first item is check for zones with no valid blocks. When there is no
>>> valid blocks in a zone, the write pointer should be at the start of the
>>> zone. If not, next write operation to the zone will cause unaligned write
>>> error. If write pointer is not at the zone start, make mount fail and ask
>>> users to run fsck.
>>>
>>> The second item is check between the write pointer position and the last
>>> valid block in the zone. It is unexpected that the last valid block
>>> position is beyond the write pointer. In such a case, report as a bug.
>>> Fix is not required for such zone, because the zone is not selected for
>>> next write operation until the zone get discarded.
>>>
>>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
>>> in segment.c also.
>>>
>>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>> ---
>>>  fs/f2fs/f2fs.h    |   3 +
>>>  fs/f2fs/segment.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++
>>>  fs/f2fs/super.c   |  16 ++++-
>>>  3 files changed, 165 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index a2e24718c13b..1bb64950d793 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
>>>  			unsigned int val, int alloc);
>>>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
>>>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
>>>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
>>>  int __init f2fs_create_segment_manager_caches(void);
>>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
>>>  
>>>  	return test_bit(zno, FDEV(devi).blkz_seq);
>>>  }
>>> +
>>> +#define F2FS_REPORT_NR_ZONES   4096
>>>  #endif
>>>  
>>>  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index 6ece146dab34..29e3b6f62f8c 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -4333,6 +4333,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
>>>  
>>>  #ifdef CONFIG_BLK_DEV_ZONED
>>>  
>>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
>>> +				    struct f2fs_dev_info *fdev,
>>> +				    struct blk_zone *zone)
>>> +{
>>> +	unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
>>> +	block_t zone_block, wp_block, last_valid_block;
>>> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
>>> +	int i, s, b;
>>> +	struct seg_entry *se;
>>> +
>>> +	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
>>> +	wp_segno = GET_SEGNO(sbi, wp_block);
>>> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
>>> +	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
>>> +	zone_segno = GET_SEGNO(sbi, zone_block);
>>> +	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
>>> +
>>> +	if (zone_segno >= MAIN_SEGS(sbi))
>>> +		return 0;
>>> +
>>> +	/*
>>> +	 * Skip check of zones cursegs point to, since
>>> +	 * fix_curseg_write_pointer() checks them.
>>> +	 */
>>> +	for (i = 0; i < NO_CHECK_TYPE; i++)
>>> +		if (zone_secno == GET_SEC_FROM_SEG(sbi,
>>> +						   CURSEG_I(sbi, i)->segno))
>>> +			return 0;
>>> +
>>> +	/*
>>> +	 * Get last valid block of the zone.
>>> +	 */
>>> +	last_valid_block = zone_block - 1;
>>> +	for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
>>> +		segno = zone_segno + s;
>>> +		se = get_seg_entry(sbi, segno);
>>> +		for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
>>> +			if (f2fs_test_bit(b, se->cur_valid_map)) {
>>> +				last_valid_block = START_BLOCK(sbi, segno) + b;
>>> +				break;
>>> +			}
>>> +		if (last_valid_block >= zone_block)
>>> +			break;
>>> +	}
>>> +
>>> +	/*
>>> +	 * If last valid block is beyond the write pointer, report the
>>> +	 * inconsistency. This inconsistency does not cause write error
>>> +	 * because the zone will not be selected for write operation until
>>> +	 * it get discarded. Just report it.
>>> +	 */
>>> +	if (last_valid_block >= wp_block) {
>>> +		f2fs_notice(sbi, "Valid block beyond write pointer: "
>>> +			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
>>> +			    GET_SEGNO(sbi, last_valid_block),
>>> +			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
>>> +			    wp_segno, wp_blkoff);
>>> +		return 0;
>>> +	}
>>> +
>>> +	/*
>>> +	 * If there is no valid block in the zone and if write pointer is
>>> +	 * not at zone start, report the error to run fsck.
>>
>> So we only need to report this as inconsistent status in the condition of
>> discard has been triggered, right? otherwise, f2fs will trigger discard later
>> to reset zone->wp before opening this zone?
> 
> Hmm, my intent was to catch the inconsistency at mount time, assuming the
> inconsistency is not expected at mount time. In other words, I assume that
> discard is triggered for zones without valid blocks before that last clean

IIUC, if there is too many pending discards, put_super() may drop discard entries
to avoid delaying umount, so we can not assume all discards are always being
triggered.

So what I mean is for the condition of a) there is valid (including fsycned) block,
b) zone->wp is not at correct position, f2fs can handle it by issuing discard. Let
me know if I misread this comment.

Thanks,

> umount. If the last sudden f2fs shutdown without clean umount caused the
> inconsistency, it should be reported and fixed, I think.
> 
> SIT valid blocks are referred to check if there is no valid blocks in the zone.
> SIT may be broken due to software bug or hardware flaw, then I think it is the
> better to run fsck rather than discard by f2fs.
> 
> If I miss anything, please let me know.
> 
> --
> Best Regards,
> Shin'ichiro Kawasaki
> 
>>
>> Thanks,
>>
>>> +	 */
>>> +	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
>>> +		f2fs_notice(sbi,
>>> +			    "Zone without valid block has non-zero write "
>>> +			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
>>> +			    wp_segno, wp_blkoff);
>>> +		f2fs_stop_checkpoint(sbi, true);
>>> +		set_sbi_flag(sbi, SBI_NEED_FSCK);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int check_dev_write_pointer(struct f2fs_sb_info *sbi,
>>> +				   struct f2fs_dev_info *fdev) {
>>> +	sector_t nr_sectors = fdev->bdev->bd_part->nr_sects;
>>> +	sector_t sector = 0;
>>> +	struct blk_zone *zones;
>>> +	unsigned int i, nr_zones;
>>> +	unsigned int n = 0;
>>> +	int err = -EIO;
>>> +
>>> +	if (!bdev_is_zoned(fdev->bdev))
>>> +		return 0;
>>> +
>>> +	zones = f2fs_kzalloc(sbi,
>>> +			     array_size(F2FS_REPORT_NR_ZONES,
>>> +					sizeof(struct blk_zone)),
>>> +			     GFP_KERNEL);
>>> +	if (!zones)
>>> +		return -ENOMEM;
>>> +
>>> +	/* Get block zones type */
>>> +	while (zones && sector < nr_sectors) {
>>> +
>>> +		nr_zones = F2FS_REPORT_NR_ZONES;
>>> +		err = blkdev_report_zones(fdev->bdev, sector, zones, &nr_zones);
>>> +		if (err)
>>> +			break;
>>> +		if (!nr_zones) {
>>> +			err = -EIO;
>>> +			break;
>>> +		}
>>> +
>>> +		for (i = 0; i < nr_zones; i++) {
>>> +			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
>>> +				err = check_zone_write_pointer(sbi, fdev,
>>> +							       &zones[i]);
>>> +				if (err)
>>> +					break;
>>> +			}
>>> +			sector += zones[i].len;
>>> +			n++;
>>> +		}
>>> +		if (err)
>>> +			break;
>>> +	}
>>> +
>>> +	kvfree(zones);
>>> +
>>> +	return err;
>>> +}
>>> +
>>>  static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
>>>  						  block_t zone_blkaddr)
>>>  {
>>> @@ -4399,6 +4526,10 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
>>>  		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
>>>  	allocate_segment_by_default(sbi, type, true);
>>>  
>>> +	/* check consistency of the zone curseg pointed to */
>>> +	if (check_zone_write_pointer(sbi, zbd, &zone))
>>> +		return -EIO;
>>> +
>>>  	/* check newly assigned zone */
>>>  	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
>>>  	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
>>> @@ -4444,11 +4575,29 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
>>>  
>>>  	return 0;
>>>  }
>>> +
>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
>>> +{
>>> +	int i, ret;
>>> +
>>> +	for (i = 0; i < sbi->s_ndevs; i++) {
>>> +		ret = check_dev_write_pointer(sbi, &FDEV(i));
>>> +		if (ret)
>>> +			return ret;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>>  #else
>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
>>>  {
>>>  	return 0;
>>>  }
>>> +
>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
>>> +{
>>> +	return 0;
>>> +}
>>>  #endif
>>>  
>>>  /*
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index 1443cee15863..8ca772670c67 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -2890,8 +2890,6 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>>>  	if (!FDEV(devi).blkz_seq)
>>>  		return -ENOMEM;
>>>  
>>> -#define F2FS_REPORT_NR_ZONES   4096
>>> -
>>>  	zones = f2fs_kzalloc(sbi,
>>>  			     array_size(F2FS_REPORT_NR_ZONES,
>>>  					sizeof(struct blk_zone)),
>>> @@ -3509,7 +3507,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>  
>>>  		err = f2fs_recover_fsync_data(sbi, false);
>>>  		if (err < 0) {
>>> -			if (err != -ENOMEM)
>>> +			if (err != -ENOMEM &&
>>> +			    !is_sbi_flag_set(sbi, SBI_NEED_FSCK))
>>>  				skip_recovery = true;
>>>  			need_fsck = true;
>>>  			f2fs_err(sbi, "Cannot recover all fsync data errno=%d",
>>> @@ -3525,6 +3524,17 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>  			goto free_meta;
>>>  		}
>>>  	}
>>> +
>>> +	/*
>>> +	 * If the f2fs is not readonly and fsync data recovery succeeds,
>>> +	 * check zoned block devices' write pointer consistency.
>>> +	 */
>>> +	if (!err && !f2fs_readonly(sb) && f2fs_sb_has_blkzoned(sbi)) {
>>> +		err = f2fs_check_write_pointer(sbi);
>>> +		if (err)
>>> +			goto free_meta;
>>> +	}
>>> +
>>>  reset_checkpoint:
>>>  	/* f2fs_recover_fsync_data() cleared this already */
>>>  	clear_sbi_flag(sbi, SBI_POR_DOING);
>>> .
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [f2fs-dev] [PATCH v3 1/2] f2fs: Check write pointer consistency of open zones
  2019-11-28 12:26       ` Chao Yu
@ 2019-11-29  1:58         ` Shinichiro Kawasaki
  0 siblings, 0 replies; 14+ messages in thread
From: Shinichiro Kawasaki @ 2019-11-29  1:58 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Nov 28, 2019 / 20:26, Chao Yu wrote:
> On 2019/11/28 12:07, Shinichiro Kawasaki wrote:
> > On Nov 25, 2019 / 14:59, Chao Yu wrote:
> >> On 2019/11/14 16:19, Shin'ichiro Kawasaki wrote:
> >>> On sudden f2fs shutdown, write pointers of zoned block devices can go
> >>> further but f2fs meta data keeps current segments at positions before the
> >>> write operations. After remounting the f2fs, this inconsistency causes
> >>> write operations not at write pointers and "Unaligned write command"
> >>> error is reported.
> >>>
> >>> To avoid the error, compare current segments with write pointers of open
> >>> zones the current segments point to, during mount operation. If the write
> >>> pointer position is not aligned with the current segment position, assign
> >>> a new zone to the current segment. Also check the newly assigned zone has
> >>> write pointer at zone start. If not, make mount fail and ask users to run
> >>> fsck.
> >>>
> >>> Perform the consistency check during fsync recovery. Not to lose the
> >>> fsync data, do the check after fsync data gets restored and before
> >>> checkpoint commit which flushes data at current segment positions. Not to
> >>> cause conflict with kworker's dirfy data/node flush, do the fix within
> >>> SBI_POR_DOING protection.
> >>>
> >>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >>> ---
> >>>  fs/f2fs/f2fs.h     |   1 +
> >>>  fs/f2fs/recovery.c |  17 ++++++-
> >>>  fs/f2fs/segment.c  | 120 +++++++++++++++++++++++++++++++++++++++++++++
> >>>  3 files changed, 136 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>> index 4024790028aa..a2e24718c13b 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -3136,6 +3136,7 @@ void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
> >>>  int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
> >>>  			unsigned int val, int alloc);
> >>>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> >>> +int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
> >>>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
> >>>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
> >>>  int __init f2fs_create_segment_manager_caches(void);
> >>> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> >>> index 783773e4560d..712054ed8d64 100644
> >>> --- a/fs/f2fs/recovery.c
> >>> +++ b/fs/f2fs/recovery.c
> >>> @@ -784,9 +784,22 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
> >>>  	if (err) {
> >>>  		truncate_inode_pages_final(NODE_MAPPING(sbi));
> >>>  		truncate_inode_pages_final(META_MAPPING(sbi));
> >>> -	} else {
> >>> -		clear_sbi_flag(sbi, SBI_POR_DOING);
> >>>  	}
> >>> +
> >>> +	/*
> >>> +	 * If fsync data succeeds or there is no fsync data to recover,
> >>> +	 * and the f2fs is not read only, check and fix zoned block devices'
> >>> +	 * write pointer consistency.
> >>> +	 */
> >>> +	if (!ret && !err && !f2fs_readonly(sbi->sb)
> >>
> >> Using !check_only will be more readable?
> >>
> >> if (!err && !check_only && !f2fs_readonly(sbi->sb)
> > 
> > When check_only is on and there is no fsync data, I think we should fix the
> > write pointer inconsistency. With the condition you suggested, this case can
> > not be covered.
> 
> Alright.
> 
> > 
> > Having said that, my expression with !ret is not good from readability point
> > of view. How about this?
> > 
> > 
> > bool fix_curseg_write_pointer;
> > fix_curseg_write_pointer = !check_only || (check_only && list_empty(&inode_list));
> 
> fix_curseg_write_pointer = !check_only || list_empty(&inode_list); is enough.

Oops, thanks.

> 
> > 
> > ...
> > 
> > if (!err && fix_curseg_write_pointer && !f2fs_readonly(sbi->sb)
> > 	&& f2fs_sb_has_blkzoned(sbi)) {
> > 	err = f2fs_fix_curseg_write_pointer(sbi);
> > 	ret = err;
> > }
> 
> It's okay to me.

Will update the patch. Thanks!

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-11-28 12:39       ` Chao Yu
@ 2019-11-29  5:21         ` Shinichiro Kawasaki
  2019-11-30  7:49           ` Chao Yu
  0 siblings, 1 reply; 14+ messages in thread
From: Shinichiro Kawasaki @ 2019-11-29  5:21 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Nov 28, 2019 / 20:39, Chao Yu wrote:
> On 2019/11/28 13:31, Shinichiro Kawasaki wrote:
> > On Nov 25, 2019 / 15:37, Chao Yu wrote:
> >> On 2019/11/14 16:19, Shin'ichiro Kawasaki wrote:
> >>> To catch f2fs bugs in write pointer handling code for zoned block
> >>> devices, check write pointers of non-open zones that current segments do
> >>> not point to. Do this check at mount time, after the fsync data recovery
> >>> and current segments' write pointer consistency fix. Or when fsync data
> >>> recovery is disabled by mount option, do the check when there is no fsync
> >>> data.
> >>>
> >>> Check two items comparing write pointers with valid block maps in SIT.
> >>> The first item is check for zones with no valid blocks. When there is no
> >>> valid blocks in a zone, the write pointer should be at the start of the
> >>> zone. If not, next write operation to the zone will cause unaligned write
> >>> error. If write pointer is not at the zone start, make mount fail and ask
> >>> users to run fsck.
> >>>
> >>> The second item is check between the write pointer position and the last
> >>> valid block in the zone. It is unexpected that the last valid block
> >>> position is beyond the write pointer. In such a case, report as a bug.
> >>> Fix is not required for such zone, because the zone is not selected for
> >>> next write operation until the zone get discarded.
> >>>
> >>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
> >>> in segment.c also.
> >>>
> >>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >>> ---
> >>>  fs/f2fs/f2fs.h    |   3 +
> >>>  fs/f2fs/segment.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++
> >>>  fs/f2fs/super.c   |  16 ++++-
> >>>  3 files changed, 165 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>> index a2e24718c13b..1bb64950d793 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
> >>>  			unsigned int val, int alloc);
> >>>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> >>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
> >>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
> >>>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
> >>>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
> >>>  int __init f2fs_create_segment_manager_caches(void);
> >>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
> >>>  
> >>>  	return test_bit(zno, FDEV(devi).blkz_seq);
> >>>  }
> >>> +
> >>> +#define F2FS_REPORT_NR_ZONES   4096
> >>>  #endif
> >>>  
> >>>  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >>> index 6ece146dab34..29e3b6f62f8c 100644
> >>> --- a/fs/f2fs/segment.c
> >>> +++ b/fs/f2fs/segment.c
> >>> @@ -4333,6 +4333,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
> >>>  
> >>>  #ifdef CONFIG_BLK_DEV_ZONED
> >>>  
> >>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
> >>> +				    struct f2fs_dev_info *fdev,
> >>> +				    struct blk_zone *zone)
> >>> +{
> >>> +	unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
> >>> +	block_t zone_block, wp_block, last_valid_block;
> >>> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> >>> +	int i, s, b;
> >>> +	struct seg_entry *se;
> >>> +
> >>> +	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
> >>> +	wp_segno = GET_SEGNO(sbi, wp_block);
> >>> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> >>> +	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
> >>> +	zone_segno = GET_SEGNO(sbi, zone_block);
> >>> +	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
> >>> +
> >>> +	if (zone_segno >= MAIN_SEGS(sbi))
> >>> +		return 0;
> >>> +
> >>> +	/*
> >>> +	 * Skip check of zones cursegs point to, since
> >>> +	 * fix_curseg_write_pointer() checks them.
> >>> +	 */
> >>> +	for (i = 0; i < NO_CHECK_TYPE; i++)
> >>> +		if (zone_secno == GET_SEC_FROM_SEG(sbi,
> >>> +						   CURSEG_I(sbi, i)->segno))
> >>> +			return 0;
> >>> +
> >>> +	/*
> >>> +	 * Get last valid block of the zone.
> >>> +	 */
> >>> +	last_valid_block = zone_block - 1;
> >>> +	for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
> >>> +		segno = zone_segno + s;
> >>> +		se = get_seg_entry(sbi, segno);
> >>> +		for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
> >>> +			if (f2fs_test_bit(b, se->cur_valid_map)) {
> >>> +				last_valid_block = START_BLOCK(sbi, segno) + b;
> >>> +				break;
> >>> +			}
> >>> +		if (last_valid_block >= zone_block)
> >>> +			break;
> >>> +	}
> >>> +
> >>> +	/*
> >>> +	 * If last valid block is beyond the write pointer, report the
> >>> +	 * inconsistency. This inconsistency does not cause write error
> >>> +	 * because the zone will not be selected for write operation until
> >>> +	 * it get discarded. Just report it.
> >>> +	 */
> >>> +	if (last_valid_block >= wp_block) {
> >>> +		f2fs_notice(sbi, "Valid block beyond write pointer: "
> >>> +			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
> >>> +			    GET_SEGNO(sbi, last_valid_block),
> >>> +			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
> >>> +			    wp_segno, wp_blkoff);
> >>> +		return 0;
> >>> +	}
> >>> +
> >>> +	/*
> >>> +	 * If there is no valid block in the zone and if write pointer is
> >>> +	 * not at zone start, report the error to run fsck.
> >>
> >> So we only need to report this as inconsistent status in the condition of
> >> discard has been triggered, right? otherwise, f2fs will trigger discard later
> >> to reset zone->wp before opening this zone?
> > 
> > Hmm, my intent was to catch the inconsistency at mount time, assuming the
> > inconsistency is not expected at mount time. In other words, I assume that
> > discard is triggered for zones without valid blocks before that last clean
> 
> IIUC, if there is too many pending discards, put_super() may drop discard entries
> to avoid delaying umount, so we can not assume all discards are always being
> triggered.

I see. In this case, current code in the patch will miss-detect the zone with
the dropped discard entries. This is not good. Thank you for catching this :)

> 
> So what I mean is for the condition of a) there is valid (including fsycned) block,
> b) zone->wp is not at correct position, f2fs can handle it by issuing discard. Let
> me know if I misread this comment.

For the condition a), do you mean "there is _no_ valid (include fsynced) block"?
If so, yes, I agree that f2fs can issue discard and both a) and b) are true. I
can add a simple function call of "reset zone" to discard the zone.

> 
> > umount. If the last sudden f2fs shutdown without clean umount caused the
> > inconsistency, it should be reported and fixed, I think.
> > 
> > SIT valid blocks are referred to check if there is no valid blocks in the zone.
> > SIT may be broken due to software bug or hardware flaw, then I think it is the
> > better to run fsck rather than discard by f2fs.
> > 
> > If I miss anything, please let me know.
> > 
> > --
> > Best Regards,
> > Shin'ichiro Kawasaki
> > 
> >>
> >> Thanks,
> >>
> >>> +	 */
> >>> +	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
> >>> +		f2fs_notice(sbi,
> >>> +			    "Zone without valid block has non-zero write "
> >>> +			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
> >>> +			    wp_segno, wp_blkoff);
> >>> +		f2fs_stop_checkpoint(sbi, true);
> >>> +		set_sbi_flag(sbi, SBI_NEED_FSCK);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +static int check_dev_write_pointer(struct f2fs_sb_info *sbi,
> >>> +				   struct f2fs_dev_info *fdev) {
> >>> +	sector_t nr_sectors = fdev->bdev->bd_part->nr_sects;
> >>> +	sector_t sector = 0;
> >>> +	struct blk_zone *zones;
> >>> +	unsigned int i, nr_zones;
> >>> +	unsigned int n = 0;
> >>> +	int err = -EIO;
> >>> +
> >>> +	if (!bdev_is_zoned(fdev->bdev))
> >>> +		return 0;
> >>> +
> >>> +	zones = f2fs_kzalloc(sbi,
> >>> +			     array_size(F2FS_REPORT_NR_ZONES,
> >>> +					sizeof(struct blk_zone)),
> >>> +			     GFP_KERNEL);
> >>> +	if (!zones)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	/* Get block zones type */
> >>> +	while (zones && sector < nr_sectors) {
> >>> +
> >>> +		nr_zones = F2FS_REPORT_NR_ZONES;
> >>> +		err = blkdev_report_zones(fdev->bdev, sector, zones, &nr_zones);
> >>> +		if (err)
> >>> +			break;
> >>> +		if (!nr_zones) {
> >>> +			err = -EIO;
> >>> +			break;
> >>> +		}
> >>> +
> >>> +		for (i = 0; i < nr_zones; i++) {
> >>> +			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
> >>> +				err = check_zone_write_pointer(sbi, fdev,
> >>> +							       &zones[i]);
> >>> +				if (err)
> >>> +					break;
> >>> +			}
> >>> +			sector += zones[i].len;
> >>> +			n++;
> >>> +		}
> >>> +		if (err)
> >>> +			break;
> >>> +	}
> >>> +
> >>> +	kvfree(zones);
> >>> +
> >>> +	return err;
> >>> +}
> >>> +
> >>>  static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
> >>>  						  block_t zone_blkaddr)
> >>>  {
> >>> @@ -4399,6 +4526,10 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
> >>>  		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
> >>>  	allocate_segment_by_default(sbi, type, true);
> >>>  
> >>> +	/* check consistency of the zone curseg pointed to */
> >>> +	if (check_zone_write_pointer(sbi, zbd, &zone))
> >>> +		return -EIO;
> >>> +
> >>>  	/* check newly assigned zone */
> >>>  	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
> >>>  	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
> >>> @@ -4444,11 +4575,29 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> >>>  
> >>>  	return 0;
> >>>  }
> >>> +
> >>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> >>> +{
> >>> +	int i, ret;
> >>> +
> >>> +	for (i = 0; i < sbi->s_ndevs; i++) {
> >>> +		ret = check_dev_write_pointer(sbi, &FDEV(i));
> >>> +		if (ret)
> >>> +			return ret;
> >>> +	}
> >>> +
> >>> +	return 0;
> >>> +}
> >>>  #else
> >>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> >>>  {
> >>>  	return 0;
> >>>  }
> >>> +
> >>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> >>> +{
> >>> +	return 0;
> >>> +}
> >>>  #endif
> >>>  
> >>>  /*
> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>> index 1443cee15863..8ca772670c67 100644
> >>> --- a/fs/f2fs/super.c
> >>> +++ b/fs/f2fs/super.c
> >>> @@ -2890,8 +2890,6 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> >>>  	if (!FDEV(devi).blkz_seq)
> >>>  		return -ENOMEM;
> >>>  
> >>> -#define F2FS_REPORT_NR_ZONES   4096
> >>> -
> >>>  	zones = f2fs_kzalloc(sbi,
> >>>  			     array_size(F2FS_REPORT_NR_ZONES,
> >>>  					sizeof(struct blk_zone)),
> >>> @@ -3509,7 +3507,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>  
> >>>  		err = f2fs_recover_fsync_data(sbi, false);
> >>>  		if (err < 0) {
> >>> -			if (err != -ENOMEM)
> >>> +			if (err != -ENOMEM &&
> >>> +			    !is_sbi_flag_set(sbi, SBI_NEED_FSCK))
> >>>  				skip_recovery = true;
> >>>  			need_fsck = true;
> >>>  			f2fs_err(sbi, "Cannot recover all fsync data errno=%d",
> >>> @@ -3525,6 +3524,17 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>  			goto free_meta;
> >>>  		}
> >>>  	}
> >>> +
> >>> +	/*
> >>> +	 * If the f2fs is not readonly and fsync data recovery succeeds,
> >>> +	 * check zoned block devices' write pointer consistency.
> >>> +	 */
> >>> +	if (!err && !f2fs_readonly(sb) && f2fs_sb_has_blkzoned(sbi)) {
> >>> +		err = f2fs_check_write_pointer(sbi);
> >>> +		if (err)
> >>> +			goto free_meta;
> >>> +	}
> >>> +
> >>>  reset_checkpoint:
> >>>  	/* f2fs_recover_fsync_data() cleared this already */
> >>>  	clear_sbi_flag(sbi, SBI_POR_DOING);
> >>> .
> > 

--
Best Regards,
Shin'ichiro Kawasaki


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-11-29  5:21         ` Shinichiro Kawasaki
@ 2019-11-30  7:49           ` Chao Yu
  2019-12-02  1:38             ` Shinichiro Kawasaki
  0 siblings, 1 reply; 14+ messages in thread
From: Chao Yu @ 2019-11-30  7:49 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On 2019/11/29 13:21, Shinichiro Kawasaki wrote:
> On Nov 28, 2019 / 20:39, Chao Yu wrote:
>> On 2019/11/28 13:31, Shinichiro Kawasaki wrote:
>>> On Nov 25, 2019 / 15:37, Chao Yu wrote:
>>>> On 2019/11/14 16:19, Shin'ichiro Kawasaki wrote:
>>>>> To catch f2fs bugs in write pointer handling code for zoned block
>>>>> devices, check write pointers of non-open zones that current segments do
>>>>> not point to. Do this check at mount time, after the fsync data recovery
>>>>> and current segments' write pointer consistency fix. Or when fsync data
>>>>> recovery is disabled by mount option, do the check when there is no fsync
>>>>> data.
>>>>>
>>>>> Check two items comparing write pointers with valid block maps in SIT.
>>>>> The first item is check for zones with no valid blocks. When there is no
>>>>> valid blocks in a zone, the write pointer should be at the start of the
>>>>> zone. If not, next write operation to the zone will cause unaligned write
>>>>> error. If write pointer is not at the zone start, make mount fail and ask
>>>>> users to run fsck.
>>>>>
>>>>> The second item is check between the write pointer position and the last
>>>>> valid block in the zone. It is unexpected that the last valid block
>>>>> position is beyond the write pointer. In such a case, report as a bug.
>>>>> Fix is not required for such zone, because the zone is not selected for
>>>>> next write operation until the zone get discarded.
>>>>>
>>>>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
>>>>> in segment.c also.
>>>>>
>>>>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>> ---
>>>>>  fs/f2fs/f2fs.h    |   3 +
>>>>>  fs/f2fs/segment.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  fs/f2fs/super.c   |  16 ++++-
>>>>>  3 files changed, 165 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>> index a2e24718c13b..1bb64950d793 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
>>>>>  			unsigned int val, int alloc);
>>>>>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
>>>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
>>>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
>>>>>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
>>>>>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
>>>>>  int __init f2fs_create_segment_manager_caches(void);
>>>>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
>>>>>  
>>>>>  	return test_bit(zno, FDEV(devi).blkz_seq);
>>>>>  }
>>>>> +
>>>>> +#define F2FS_REPORT_NR_ZONES   4096
>>>>>  #endif
>>>>>  
>>>>>  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>> index 6ece146dab34..29e3b6f62f8c 100644
>>>>> --- a/fs/f2fs/segment.c
>>>>> +++ b/fs/f2fs/segment.c
>>>>> @@ -4333,6 +4333,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
>>>>>  
>>>>>  #ifdef CONFIG_BLK_DEV_ZONED
>>>>>  
>>>>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
>>>>> +				    struct f2fs_dev_info *fdev,
>>>>> +				    struct blk_zone *zone)
>>>>> +{
>>>>> +	unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
>>>>> +	block_t zone_block, wp_block, last_valid_block;
>>>>> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
>>>>> +	int i, s, b;
>>>>> +	struct seg_entry *se;
>>>>> +
>>>>> +	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
>>>>> +	wp_segno = GET_SEGNO(sbi, wp_block);
>>>>> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
>>>>> +	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
>>>>> +	zone_segno = GET_SEGNO(sbi, zone_block);
>>>>> +	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
>>>>> +
>>>>> +	if (zone_segno >= MAIN_SEGS(sbi))
>>>>> +		return 0;
>>>>> +
>>>>> +	/*
>>>>> +	 * Skip check of zones cursegs point to, since
>>>>> +	 * fix_curseg_write_pointer() checks them.
>>>>> +	 */
>>>>> +	for (i = 0; i < NO_CHECK_TYPE; i++)
>>>>> +		if (zone_secno == GET_SEC_FROM_SEG(sbi,
>>>>> +						   CURSEG_I(sbi, i)->segno))
>>>>> +			return 0;
>>>>> +
>>>>> +	/*
>>>>> +	 * Get last valid block of the zone.
>>>>> +	 */
>>>>> +	last_valid_block = zone_block - 1;
>>>>> +	for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
>>>>> +		segno = zone_segno + s;
>>>>> +		se = get_seg_entry(sbi, segno);
>>>>> +		for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
>>>>> +			if (f2fs_test_bit(b, se->cur_valid_map)) {
>>>>> +				last_valid_block = START_BLOCK(sbi, segno) + b;
>>>>> +				break;
>>>>> +			}
>>>>> +		if (last_valid_block >= zone_block)
>>>>> +			break;
>>>>> +	}
>>>>> +
>>>>> +	/*
>>>>> +	 * If last valid block is beyond the write pointer, report the
>>>>> +	 * inconsistency. This inconsistency does not cause write error
>>>>> +	 * because the zone will not be selected for write operation until
>>>>> +	 * it get discarded. Just report it.
>>>>> +	 */
>>>>> +	if (last_valid_block >= wp_block) {
>>>>> +		f2fs_notice(sbi, "Valid block beyond write pointer: "
>>>>> +			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
>>>>> +			    GET_SEGNO(sbi, last_valid_block),
>>>>> +			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
>>>>> +			    wp_segno, wp_blkoff);
>>>>> +		return 0;
>>>>> +	}
>>>>> +
>>>>> +	/*
>>>>> +	 * If there is no valid block in the zone and if write pointer is
>>>>> +	 * not at zone start, report the error to run fsck.
>>>>
>>>> So we only need to report this as inconsistent status in the condition of
>>>> discard has been triggered, right? otherwise, f2fs will trigger discard later
>>>> to reset zone->wp before opening this zone?
>>>
>>> Hmm, my intent was to catch the inconsistency at mount time, assuming the
>>> inconsistency is not expected at mount time. In other words, I assume that
>>> discard is triggered for zones without valid blocks before that last clean
>>
>> IIUC, if there is too many pending discards, put_super() may drop discard entries
>> to avoid delaying umount, so we can not assume all discards are always being
>> triggered.
> 
> I see. In this case, current code in the patch will miss-detect the zone with
> the dropped discard entries. This is not good. Thank you for catching this :)
> 
>>
>> So what I mean is for the condition of a) there is valid (including fsycned) block,
>> b) zone->wp is not at correct position, f2fs can handle it by issuing discard. Let
>> me know if I misread this comment.
> 
> For the condition a), do you mean "there is _no_ valid (include fsynced) block"?

Oops, yes, I meant that. :)

Thanks,

> If so, yes, I agree that f2fs can issue discard and both a) and b) are true. I
> can add a simple function call of "reset zone" to discard the zone.
> 
>>
>>> umount. If the last sudden f2fs shutdown without clean umount caused the
>>> inconsistency, it should be reported and fixed, I think.
>>>
>>> SIT valid blocks are referred to check if there is no valid blocks in the zone.
>>> SIT may be broken due to software bug or hardware flaw, then I think it is the
>>> better to run fsck rather than discard by f2fs.
>>>
>>> If I miss anything, please let me know.
>>>
>>> --
>>> Best Regards,
>>> Shin'ichiro Kawasaki
>>>
>>>>
>>>> Thanks,
>>>>
>>>>> +	 */
>>>>> +	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
>>>>> +		f2fs_notice(sbi,
>>>>> +			    "Zone without valid block has non-zero write "
>>>>> +			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
>>>>> +			    wp_segno, wp_blkoff);
>>>>> +		f2fs_stop_checkpoint(sbi, true);
>>>>> +		set_sbi_flag(sbi, SBI_NEED_FSCK);
>>>>> +		return -EINVAL;
>>>>> +	}
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> +static int check_dev_write_pointer(struct f2fs_sb_info *sbi,
>>>>> +				   struct f2fs_dev_info *fdev) {
>>>>> +	sector_t nr_sectors = fdev->bdev->bd_part->nr_sects;
>>>>> +	sector_t sector = 0;
>>>>> +	struct blk_zone *zones;
>>>>> +	unsigned int i, nr_zones;
>>>>> +	unsigned int n = 0;
>>>>> +	int err = -EIO;
>>>>> +
>>>>> +	if (!bdev_is_zoned(fdev->bdev))
>>>>> +		return 0;
>>>>> +
>>>>> +	zones = f2fs_kzalloc(sbi,
>>>>> +			     array_size(F2FS_REPORT_NR_ZONES,
>>>>> +					sizeof(struct blk_zone)),
>>>>> +			     GFP_KERNEL);
>>>>> +	if (!zones)
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	/* Get block zones type */
>>>>> +	while (zones && sector < nr_sectors) {
>>>>> +
>>>>> +		nr_zones = F2FS_REPORT_NR_ZONES;
>>>>> +		err = blkdev_report_zones(fdev->bdev, sector, zones, &nr_zones);
>>>>> +		if (err)
>>>>> +			break;
>>>>> +		if (!nr_zones) {
>>>>> +			err = -EIO;
>>>>> +			break;
>>>>> +		}
>>>>> +
>>>>> +		for (i = 0; i < nr_zones; i++) {
>>>>> +			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
>>>>> +				err = check_zone_write_pointer(sbi, fdev,
>>>>> +							       &zones[i]);
>>>>> +				if (err)
>>>>> +					break;
>>>>> +			}
>>>>> +			sector += zones[i].len;
>>>>> +			n++;
>>>>> +		}
>>>>> +		if (err)
>>>>> +			break;
>>>>> +	}
>>>>> +
>>>>> +	kvfree(zones);
>>>>> +
>>>>> +	return err;
>>>>> +}
>>>>> +
>>>>>  static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
>>>>>  						  block_t zone_blkaddr)
>>>>>  {
>>>>> @@ -4399,6 +4526,10 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
>>>>>  		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
>>>>>  	allocate_segment_by_default(sbi, type, true);
>>>>>  
>>>>> +	/* check consistency of the zone curseg pointed to */
>>>>> +	if (check_zone_write_pointer(sbi, zbd, &zone))
>>>>> +		return -EIO;
>>>>> +
>>>>>  	/* check newly assigned zone */
>>>>>  	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
>>>>>  	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
>>>>> @@ -4444,11 +4575,29 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
>>>>>  
>>>>>  	return 0;
>>>>>  }
>>>>> +
>>>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
>>>>> +{
>>>>> +	int i, ret;
>>>>> +
>>>>> +	for (i = 0; i < sbi->s_ndevs; i++) {
>>>>> +		ret = check_dev_write_pointer(sbi, &FDEV(i));
>>>>> +		if (ret)
>>>>> +			return ret;
>>>>> +	}
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>>  #else
>>>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
>>>>>  {
>>>>>  	return 0;
>>>>>  }
>>>>> +
>>>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
>>>>> +{
>>>>> +	return 0;
>>>>> +}
>>>>>  #endif
>>>>>  
>>>>>  /*
>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>>> index 1443cee15863..8ca772670c67 100644
>>>>> --- a/fs/f2fs/super.c
>>>>> +++ b/fs/f2fs/super.c
>>>>> @@ -2890,8 +2890,6 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
>>>>>  	if (!FDEV(devi).blkz_seq)
>>>>>  		return -ENOMEM;
>>>>>  
>>>>> -#define F2FS_REPORT_NR_ZONES   4096
>>>>> -
>>>>>  	zones = f2fs_kzalloc(sbi,
>>>>>  			     array_size(F2FS_REPORT_NR_ZONES,
>>>>>  					sizeof(struct blk_zone)),
>>>>> @@ -3509,7 +3507,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>  
>>>>>  		err = f2fs_recover_fsync_data(sbi, false);
>>>>>  		if (err < 0) {
>>>>> -			if (err != -ENOMEM)
>>>>> +			if (err != -ENOMEM &&
>>>>> +			    !is_sbi_flag_set(sbi, SBI_NEED_FSCK))
>>>>>  				skip_recovery = true;
>>>>>  			need_fsck = true;
>>>>>  			f2fs_err(sbi, "Cannot recover all fsync data errno=%d",
>>>>> @@ -3525,6 +3524,17 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>  			goto free_meta;
>>>>>  		}
>>>>>  	}
>>>>> +
>>>>> +	/*
>>>>> +	 * If the f2fs is not readonly and fsync data recovery succeeds,
>>>>> +	 * check zoned block devices' write pointer consistency.
>>>>> +	 */
>>>>> +	if (!err && !f2fs_readonly(sb) && f2fs_sb_has_blkzoned(sbi)) {
>>>>> +		err = f2fs_check_write_pointer(sbi);
>>>>> +		if (err)
>>>>> +			goto free_meta;
>>>>> +	}
>>>>> +
>>>>>  reset_checkpoint:
>>>>>  	/* f2fs_recover_fsync_data() cleared this already */
>>>>>  	clear_sbi_flag(sbi, SBI_POR_DOING);
>>>>> .
>>>
> 
> --
> Best Regards,
> Shin'ichiro Kawasaki
> .
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-11-30  7:49           ` Chao Yu
@ 2019-12-02  1:38             ` Shinichiro Kawasaki
  2019-12-02  9:51               ` Shinichiro Kawasaki
  0 siblings, 1 reply; 14+ messages in thread
From: Shinichiro Kawasaki @ 2019-12-02  1:38 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Nov 30, 2019 / 15:49, Chao Yu wrote:
> On 2019/11/29 13:21, Shinichiro Kawasaki wrote:
> > On Nov 28, 2019 / 20:39, Chao Yu wrote:
> >> On 2019/11/28 13:31, Shinichiro Kawasaki wrote:
> >>> On Nov 25, 2019 / 15:37, Chao Yu wrote:
> >>>> On 2019/11/14 16:19, Shin'ichiro Kawasaki wrote:
> >>>>> To catch f2fs bugs in write pointer handling code for zoned block
> >>>>> devices, check write pointers of non-open zones that current segments do
> >>>>> not point to. Do this check at mount time, after the fsync data recovery
> >>>>> and current segments' write pointer consistency fix. Or when fsync data
> >>>>> recovery is disabled by mount option, do the check when there is no fsync
> >>>>> data.
> >>>>>
> >>>>> Check two items comparing write pointers with valid block maps in SIT.
> >>>>> The first item is check for zones with no valid blocks. When there is no
> >>>>> valid blocks in a zone, the write pointer should be at the start of the
> >>>>> zone. If not, next write operation to the zone will cause unaligned write
> >>>>> error. If write pointer is not at the zone start, make mount fail and ask
> >>>>> users to run fsck.
> >>>>>
> >>>>> The second item is check between the write pointer position and the last
> >>>>> valid block in the zone. It is unexpected that the last valid block
> >>>>> position is beyond the write pointer. In such a case, report as a bug.
> >>>>> Fix is not required for such zone, because the zone is not selected for
> >>>>> next write operation until the zone get discarded.
> >>>>>
> >>>>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
> >>>>> in segment.c also.
> >>>>>
> >>>>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >>>>> ---
> >>>>>  fs/f2fs/f2fs.h    |   3 +
> >>>>>  fs/f2fs/segment.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>  fs/f2fs/super.c   |  16 ++++-
> >>>>>  3 files changed, 165 insertions(+), 3 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>>>> index a2e24718c13b..1bb64950d793 100644
> >>>>> --- a/fs/f2fs/f2fs.h
> >>>>> +++ b/fs/f2fs/f2fs.h
> >>>>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
> >>>>>  			unsigned int val, int alloc);
> >>>>>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> >>>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
> >>>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
> >>>>>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
> >>>>>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
> >>>>>  int __init f2fs_create_segment_manager_caches(void);
> >>>>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
> >>>>>  
> >>>>>  	return test_bit(zno, FDEV(devi).blkz_seq);
> >>>>>  }
> >>>>> +
> >>>>> +#define F2FS_REPORT_NR_ZONES   4096
> >>>>>  #endif
> >>>>>  
> >>>>>  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
> >>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >>>>> index 6ece146dab34..29e3b6f62f8c 100644
> >>>>> --- a/fs/f2fs/segment.c
> >>>>> +++ b/fs/f2fs/segment.c
> >>>>> @@ -4333,6 +4333,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
> >>>>>  
> >>>>>  #ifdef CONFIG_BLK_DEV_ZONED
> >>>>>  
> >>>>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
> >>>>> +				    struct f2fs_dev_info *fdev,
> >>>>> +				    struct blk_zone *zone)
> >>>>> +{
> >>>>> +	unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
> >>>>> +	block_t zone_block, wp_block, last_valid_block;
> >>>>> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> >>>>> +	int i, s, b;
> >>>>> +	struct seg_entry *se;
> >>>>> +
> >>>>> +	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
> >>>>> +	wp_segno = GET_SEGNO(sbi, wp_block);
> >>>>> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> >>>>> +	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
> >>>>> +	zone_segno = GET_SEGNO(sbi, zone_block);
> >>>>> +	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
> >>>>> +
> >>>>> +	if (zone_segno >= MAIN_SEGS(sbi))
> >>>>> +		return 0;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * Skip check of zones cursegs point to, since
> >>>>> +	 * fix_curseg_write_pointer() checks them.
> >>>>> +	 */
> >>>>> +	for (i = 0; i < NO_CHECK_TYPE; i++)
> >>>>> +		if (zone_secno == GET_SEC_FROM_SEG(sbi,
> >>>>> +						   CURSEG_I(sbi, i)->segno))
> >>>>> +			return 0;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * Get last valid block of the zone.
> >>>>> +	 */
> >>>>> +	last_valid_block = zone_block - 1;
> >>>>> +	for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
> >>>>> +		segno = zone_segno + s;
> >>>>> +		se = get_seg_entry(sbi, segno);
> >>>>> +		for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
> >>>>> +			if (f2fs_test_bit(b, se->cur_valid_map)) {
> >>>>> +				last_valid_block = START_BLOCK(sbi, segno) + b;
> >>>>> +				break;
> >>>>> +			}
> >>>>> +		if (last_valid_block >= zone_block)
> >>>>> +			break;
> >>>>> +	}
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * If last valid block is beyond the write pointer, report the
> >>>>> +	 * inconsistency. This inconsistency does not cause write error
> >>>>> +	 * because the zone will not be selected for write operation until
> >>>>> +	 * it get discarded. Just report it.
> >>>>> +	 */
> >>>>> +	if (last_valid_block >= wp_block) {
> >>>>> +		f2fs_notice(sbi, "Valid block beyond write pointer: "
> >>>>> +			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
> >>>>> +			    GET_SEGNO(sbi, last_valid_block),
> >>>>> +			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
> >>>>> +			    wp_segno, wp_blkoff);
> >>>>> +		return 0;
> >>>>> +	}
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * If there is no valid block in the zone and if write pointer is
> >>>>> +	 * not at zone start, report the error to run fsck.
> >>>>
> >>>> So we only need to report this as inconsistent status in the condition of
> >>>> discard has been triggered, right? otherwise, f2fs will trigger discard later
> >>>> to reset zone->wp before opening this zone?
> >>>
> >>> Hmm, my intent was to catch the inconsistency at mount time, assuming the
> >>> inconsistency is not expected at mount time. In other words, I assume that
> >>> discard is triggered for zones without valid blocks before that last clean
> >>
> >> IIUC, if there is too many pending discards, put_super() may drop discard entries
> >> to avoid delaying umount, so we can not assume all discards are always being
> >> triggered.
> > 
> > I see. In this case, current code in the patch will miss-detect the zone with
> > the dropped discard entries. This is not good. Thank you for catching this :)
> > 
> >>
> >> So what I mean is for the condition of a) there is valid (including fsycned) block,
> >> b) zone->wp is not at correct position, f2fs can handle it by issuing discard. Let
> >> me know if I misread this comment.
> > 
> > For the condition a), do you mean "there is _no_ valid (include fsynced) block"?
> 
> Oops, yes, I meant that. :)

OK, will add the discard by reset zone and will post next version. Thanks.

Also, I noticed that this patch series have conflicts with the latest linux
kernel master branch because of the commit d41003513e61 "block: rework zone
reporting", which changed interface of blkdev_report_zones() function.
Will rebase to the master branch and resolve the conflicts.

--
best Regards,
Shin'ichiro Kawasaki

> 
> Thanks,
> 
> > If so, yes, I agree that f2fs can issue discard and both a) and b) are true. I
> > can add a simple function call of "reset zone" to discard the zone.
> > 
> >>
> >>> umount. If the last sudden f2fs shutdown without clean umount caused the
> >>> inconsistency, it should be reported and fixed, I think.
> >>>
> >>> SIT valid blocks are referred to check if there is no valid blocks in the zone.
> >>> SIT may be broken due to software bug or hardware flaw, then I think it is the
> >>> better to run fsck rather than discard by f2fs.
> >>>
> >>> If I miss anything, please let me know.
> >>>
> >>> --
> >>> Best Regards,
> >>> Shin'ichiro Kawasaki
> >>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>>> +	 */
> >>>>> +	if (last_valid_block + 1 == zone_block && zone->wp != zone->start) {
> >>>>> +		f2fs_notice(sbi,
> >>>>> +			    "Zone without valid block has non-zero write "
> >>>>> +			    "pointer, run fsck to fix: wp[0x%x,0x%x]",
> >>>>> +			    wp_segno, wp_blkoff);
> >>>>> +		f2fs_stop_checkpoint(sbi, true);
> >>>>> +		set_sbi_flag(sbi, SBI_NEED_FSCK);
> >>>>> +		return -EINVAL;
> >>>>> +	}
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static int check_dev_write_pointer(struct f2fs_sb_info *sbi,
> >>>>> +				   struct f2fs_dev_info *fdev) {
> >>>>> +	sector_t nr_sectors = fdev->bdev->bd_part->nr_sects;
> >>>>> +	sector_t sector = 0;
> >>>>> +	struct blk_zone *zones;
> >>>>> +	unsigned int i, nr_zones;
> >>>>> +	unsigned int n = 0;
> >>>>> +	int err = -EIO;
> >>>>> +
> >>>>> +	if (!bdev_is_zoned(fdev->bdev))
> >>>>> +		return 0;
> >>>>> +
> >>>>> +	zones = f2fs_kzalloc(sbi,
> >>>>> +			     array_size(F2FS_REPORT_NR_ZONES,
> >>>>> +					sizeof(struct blk_zone)),
> >>>>> +			     GFP_KERNEL);
> >>>>> +	if (!zones)
> >>>>> +		return -ENOMEM;
> >>>>> +
> >>>>> +	/* Get block zones type */
> >>>>> +	while (zones && sector < nr_sectors) {
> >>>>> +
> >>>>> +		nr_zones = F2FS_REPORT_NR_ZONES;
> >>>>> +		err = blkdev_report_zones(fdev->bdev, sector, zones, &nr_zones);
> >>>>> +		if (err)
> >>>>> +			break;
> >>>>> +		if (!nr_zones) {
> >>>>> +			err = -EIO;
> >>>>> +			break;
> >>>>> +		}
> >>>>> +
> >>>>> +		for (i = 0; i < nr_zones; i++) {
> >>>>> +			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
> >>>>> +				err = check_zone_write_pointer(sbi, fdev,
> >>>>> +							       &zones[i]);
> >>>>> +				if (err)
> >>>>> +					break;
> >>>>> +			}
> >>>>> +			sector += zones[i].len;
> >>>>> +			n++;
> >>>>> +		}
> >>>>> +		if (err)
> >>>>> +			break;
> >>>>> +	}
> >>>>> +
> >>>>> +	kvfree(zones);
> >>>>> +
> >>>>> +	return err;
> >>>>> +}
> >>>>> +
> >>>>>  static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
> >>>>>  						  block_t zone_blkaddr)
> >>>>>  {
> >>>>> @@ -4399,6 +4526,10 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
> >>>>>  		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
> >>>>>  	allocate_segment_by_default(sbi, type, true);
> >>>>>  
> >>>>> +	/* check consistency of the zone curseg pointed to */
> >>>>> +	if (check_zone_write_pointer(sbi, zbd, &zone))
> >>>>> +		return -EIO;
> >>>>> +
> >>>>>  	/* check newly assigned zone */
> >>>>>  	cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
> >>>>>  	cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
> >>>>> @@ -4444,11 +4575,29 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> >>>>>  
> >>>>>  	return 0;
> >>>>>  }
> >>>>> +
> >>>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> >>>>> +{
> >>>>> +	int i, ret;
> >>>>> +
> >>>>> +	for (i = 0; i < sbi->s_ndevs; i++) {
> >>>>> +		ret = check_dev_write_pointer(sbi, &FDEV(i));
> >>>>> +		if (ret)
> >>>>> +			return ret;
> >>>>> +	}
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>>  #else
> >>>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
> >>>>>  {
> >>>>>  	return 0;
> >>>>>  }
> >>>>> +
> >>>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
> >>>>> +{
> >>>>> +	return 0;
> >>>>> +}
> >>>>>  #endif
> >>>>>  
> >>>>>  /*
> >>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>>>> index 1443cee15863..8ca772670c67 100644
> >>>>> --- a/fs/f2fs/super.c
> >>>>> +++ b/fs/f2fs/super.c
> >>>>> @@ -2890,8 +2890,6 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
> >>>>>  	if (!FDEV(devi).blkz_seq)
> >>>>>  		return -ENOMEM;
> >>>>>  
> >>>>> -#define F2FS_REPORT_NR_ZONES   4096
> >>>>> -
> >>>>>  	zones = f2fs_kzalloc(sbi,
> >>>>>  			     array_size(F2FS_REPORT_NR_ZONES,
> >>>>>  					sizeof(struct blk_zone)),
> >>>>> @@ -3509,7 +3507,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>  
> >>>>>  		err = f2fs_recover_fsync_data(sbi, false);
> >>>>>  		if (err < 0) {
> >>>>> -			if (err != -ENOMEM)
> >>>>> +			if (err != -ENOMEM &&
> >>>>> +			    !is_sbi_flag_set(sbi, SBI_NEED_FSCK))
> >>>>>  				skip_recovery = true;
> >>>>>  			need_fsck = true;
> >>>>>  			f2fs_err(sbi, "Cannot recover all fsync data errno=%d",
> >>>>> @@ -3525,6 +3524,17 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>  			goto free_meta;
> >>>>>  		}
> >>>>>  	}
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * If the f2fs is not readonly and fsync data recovery succeeds,
> >>>>> +	 * check zoned block devices' write pointer consistency.
> >>>>> +	 */
> >>>>> +	if (!err && !f2fs_readonly(sb) && f2fs_sb_has_blkzoned(sbi)) {
> >>>>> +		err = f2fs_check_write_pointer(sbi);
> >>>>> +		if (err)
> >>>>> +			goto free_meta;
> >>>>> +	}
> >>>>> +
> >>>>>  reset_checkpoint:
> >>>>>  	/* f2fs_recover_fsync_data() cleared this already */
> >>>>>  	clear_sbi_flag(sbi, SBI_POR_DOING);
> >>>>> .
> >>>
> > 
> > --
> > Best Regards,
> > Shin'ichiro Kawasaki
> > .
> > 

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones
  2019-12-02  1:38             ` Shinichiro Kawasaki
@ 2019-12-02  9:51               ` Shinichiro Kawasaki
  0 siblings, 0 replies; 14+ messages in thread
From: Shinichiro Kawasaki @ 2019-12-02  9:51 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Dec 02, 2019 / 10:38, Shin'ichiro Kawasaki wrote:
> On Nov 30, 2019 / 15:49, Chao Yu wrote:
> > On 2019/11/29 13:21, Shinichiro Kawasaki wrote:
> > > On Nov 28, 2019 / 20:39, Chao Yu wrote:
> > >> On 2019/11/28 13:31, Shinichiro Kawasaki wrote:
> > >>> On Nov 25, 2019 / 15:37, Chao Yu wrote:
> > >>>> On 2019/11/14 16:19, Shin'ichiro Kawasaki wrote:
> > >>>>> To catch f2fs bugs in write pointer handling code for zoned block
> > >>>>> devices, check write pointers of non-open zones that current segments do
> > >>>>> not point to. Do this check at mount time, after the fsync data recovery
> > >>>>> and current segments' write pointer consistency fix. Or when fsync data
> > >>>>> recovery is disabled by mount option, do the check when there is no fsync
> > >>>>> data.
> > >>>>>
> > >>>>> Check two items comparing write pointers with valid block maps in SIT.
> > >>>>> The first item is check for zones with no valid blocks. When there is no
> > >>>>> valid blocks in a zone, the write pointer should be at the start of the
> > >>>>> zone. If not, next write operation to the zone will cause unaligned write
> > >>>>> error. If write pointer is not at the zone start, make mount fail and ask
> > >>>>> users to run fsck.
> > >>>>>
> > >>>>> The second item is check between the write pointer position and the last
> > >>>>> valid block in the zone. It is unexpected that the last valid block
> > >>>>> position is beyond the write pointer. In such a case, report as a bug.
> > >>>>> Fix is not required for such zone, because the zone is not selected for
> > >>>>> next write operation until the zone get discarded.
> > >>>>>
> > >>>>> Also move a constant F2FS_REPORT_ZONE from super.c to f2fs.h to use it
> > >>>>> in segment.c also.
> > >>>>>
> > >>>>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > >>>>> ---
> > >>>>>  fs/f2fs/f2fs.h    |   3 +
> > >>>>>  fs/f2fs/segment.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++
> > >>>>>  fs/f2fs/super.c   |  16 ++++-
> > >>>>>  3 files changed, 165 insertions(+), 3 deletions(-)
> > >>>>>
> > >>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > >>>>> index a2e24718c13b..1bb64950d793 100644
> > >>>>> --- a/fs/f2fs/f2fs.h
> > >>>>> +++ b/fs/f2fs/f2fs.h
> > >>>>> @@ -3137,6 +3137,7 @@ int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
> > >>>>>  			unsigned int val, int alloc);
> > >>>>>  void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
> > >>>>>  int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
> > >>>>> +int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
> > >>>>>  int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
> > >>>>>  void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
> > >>>>>  int __init f2fs_create_segment_manager_caches(void);
> > >>>>> @@ -3610,6 +3611,8 @@ static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
> > >>>>>  
> > >>>>>  	return test_bit(zno, FDEV(devi).blkz_seq);
> > >>>>>  }
> > >>>>> +
> > >>>>> +#define F2FS_REPORT_NR_ZONES   4096
> > >>>>>  #endif
> > >>>>>  
> > >>>>>  static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
> > >>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > >>>>> index 6ece146dab34..29e3b6f62f8c 100644
> > >>>>> --- a/fs/f2fs/segment.c
> > >>>>> +++ b/fs/f2fs/segment.c
> > >>>>> @@ -4333,6 +4333,133 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
> > >>>>>  
> > >>>>>  #ifdef CONFIG_BLK_DEV_ZONED
> > >>>>>  
> > >>>>> +static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
> > >>>>> +				    struct f2fs_dev_info *fdev,
> > >>>>> +				    struct blk_zone *zone)
> > >>>>> +{
> > >>>>> +	unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
> > >>>>> +	block_t zone_block, wp_block, last_valid_block;
> > >>>>> +	unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> > >>>>> +	int i, s, b;
> > >>>>> +	struct seg_entry *se;
> > >>>>> +
> > >>>>> +	wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
> > >>>>> +	wp_segno = GET_SEGNO(sbi, wp_block);
> > >>>>> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> > >>>>> +	zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
> > >>>>> +	zone_segno = GET_SEGNO(sbi, zone_block);
> > >>>>> +	zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
> > >>>>> +
> > >>>>> +	if (zone_segno >= MAIN_SEGS(sbi))
> > >>>>> +		return 0;
> > >>>>> +
> > >>>>> +	/*
> > >>>>> +	 * Skip check of zones cursegs point to, since
> > >>>>> +	 * fix_curseg_write_pointer() checks them.
> > >>>>> +	 */
> > >>>>> +	for (i = 0; i < NO_CHECK_TYPE; i++)
> > >>>>> +		if (zone_secno == GET_SEC_FROM_SEG(sbi,
> > >>>>> +						   CURSEG_I(sbi, i)->segno))
> > >>>>> +			return 0;
> > >>>>> +
> > >>>>> +	/*
> > >>>>> +	 * Get last valid block of the zone.
> > >>>>> +	 */
> > >>>>> +	last_valid_block = zone_block - 1;
> > >>>>> +	for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
> > >>>>> +		segno = zone_segno + s;
> > >>>>> +		se = get_seg_entry(sbi, segno);
> > >>>>> +		for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
> > >>>>> +			if (f2fs_test_bit(b, se->cur_valid_map)) {
> > >>>>> +				last_valid_block = START_BLOCK(sbi, segno) + b;
> > >>>>> +				break;
> > >>>>> +			}
> > >>>>> +		if (last_valid_block >= zone_block)
> > >>>>> +			break;
> > >>>>> +	}
> > >>>>> +
> > >>>>> +	/*
> > >>>>> +	 * If last valid block is beyond the write pointer, report the
> > >>>>> +	 * inconsistency. This inconsistency does not cause write error
> > >>>>> +	 * because the zone will not be selected for write operation until
> > >>>>> +	 * it get discarded. Just report it.
> > >>>>> +	 */
> > >>>>> +	if (last_valid_block >= wp_block) {
> > >>>>> +		f2fs_notice(sbi, "Valid block beyond write pointer: "
> > >>>>> +			    "valid block[0x%x,0x%x] wp[0x%x,0x%x]",
> > >>>>> +			    GET_SEGNO(sbi, last_valid_block),
> > >>>>> +			    GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
> > >>>>> +			    wp_segno, wp_blkoff);
> > >>>>> +		return 0;
> > >>>>> +	}
> > >>>>> +
> > >>>>> +	/*
> > >>>>> +	 * If there is no valid block in the zone and if write pointer is
> > >>>>> +	 * not at zone start, report the error to run fsck.
> > >>>>
> > >>>> So we only need to report this as inconsistent status in the condition of
> > >>>> discard has been triggered, right? otherwise, f2fs will trigger discard later
> > >>>> to reset zone->wp before opening this zone?
> > >>>
> > >>> Hmm, my intent was to catch the inconsistency at mount time, assuming the
> > >>> inconsistency is not expected at mount time. In other words, I assume that
> > >>> discard is triggered for zones without valid blocks before that last clean
> > >>
> > >> IIUC, if there is too many pending discards, put_super() may drop discard entries
> > >> to avoid delaying umount, so we can not assume all discards are always being
> > >> triggered.
> > > 
> > > I see. In this case, current code in the patch will miss-detect the zone with
> > > the dropped discard entries. This is not good. Thank you for catching this :)
> > > 
> > >>
> > >> So what I mean is for the condition of a) there is valid (including fsycned) block,
> > >> b) zone->wp is not at correct position, f2fs can handle it by issuing discard. Let
> > >> me know if I misread this comment.
> > > 
> > > For the condition a), do you mean "there is _no_ valid (include fsynced) block"?
> > 
> > Oops, yes, I meant that. :)
> 
> OK, will add the discard by reset zone and will post next version. Thanks.
> 
> Also, I noticed that this patch series have conflicts with the latest linux
> kernel master branch because of the commit d41003513e61 "block: rework zone
> reporting", which changed interface of blkdev_report_zones() function.
> Will rebase to the master branch and resolve the conflicts.

I have sent out the v4 series. As always, review will be appreciated.

I noticed that the conditions a) and b) can be applied to a check logic in the
1st patch also. It checks that the newly assigned zones to cursegs have write
pointers at zone start. If not, the v3 patch reports an error for fsck run. I
replaced the error report with discard by reset zone in same manner as the 2nd
patch. With this change, no longer need to ask fsck run. A little bit simpler.

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, back to index

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-14  8:19 [f2fs-dev] [PATCH v3 0/2] f2fs: Check write pointers of zoned block devices Shin'ichiro Kawasaki
2019-11-14  8:19 ` [f2fs-dev] [PATCH v3 1/2] f2fs: Check write pointer consistency of open zones Shin'ichiro Kawasaki
2019-11-25  6:59   ` Chao Yu
2019-11-28  4:07     ` Shinichiro Kawasaki
2019-11-28 12:26       ` Chao Yu
2019-11-29  1:58         ` Shinichiro Kawasaki
2019-11-14  8:19 ` [f2fs-dev] [PATCH v3 2/2] f2fs: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
2019-11-25  7:37   ` Chao Yu
2019-11-28  5:31     ` Shinichiro Kawasaki
2019-11-28 12:39       ` Chao Yu
2019-11-29  5:21         ` Shinichiro Kawasaki
2019-11-30  7:49           ` Chao Yu
2019-12-02  1:38             ` Shinichiro Kawasaki
2019-12-02  9:51               ` Shinichiro Kawasaki

Linux-f2fs-devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-f2fs-devel/0 linux-f2fs-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-f2fs-devel linux-f2fs-devel/ https://lore.kernel.org/linux-f2fs-devel \
		linux-f2fs-devel@lists.sourceforge.net
	public-inbox-index linux-f2fs-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/net.sourceforge.lists.linux-f2fs-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git