linux-f2fs-devel.lists.sourceforge.net archive mirror
 help / color / mirror / Atom feed
* [f2fs-dev] [PATCH v6 0/8] fsck: Check write pointers of zoned block devices
@ 2019-10-28  6:55 Shin'ichiro Kawasaki
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 1/8] libf2fs_zoned: Introduce f2fs_report_zones() helper function Shin'ichiro Kawasaki
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-10-28  6:55 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

On sudden f2fs shutdown, zoned block device status and f2fs meta data can be
inconsistent. When f2fs shutdown happens during write operations, write pointers
on the device go forward but the f2fs meta data does not reflect the write
pointer progress. This inconsistency will eventually cause "Unaligned write
command" error when restarting write operation after the next mount.

This error is observed with xfstests test case generic/388, which enforces
sudden shutdown during write operation and checks the file system recovery.

This patch series adds a feature to fsck.f2fs to check and fix the
inconsistency. Per discussion on the list, implement two checks. The first check
is for open zones that current segments point to. Check write pointer
consistency with current segment positions recorded in CP, and if they are
inconsistent, assign a new zone to the current segment. The second check is for
non-open zones that current segments do not point to. Check write pointer
consistency with valid block maps recorded in SIT.

Reflect fsync data blocks in these checks. If fsync data exists and current
segments point to zones with fsync data, keep the fsync data and the current
segments untouched so that kernel can recover the fsync data. Another patch
series for kernel is being posted to check and fix write pointer consistency
after fsync data recovery.

Have fsck check and fix the consistency twice. The first fix is at the beginning
of fsck so that write by fsck for fix do not fail with unaligned write command
error. The second fix is at the end of the fsck to reflect SIT valid block maps
updates by fsck.

The first three patches add three helper functions to call report zone and reset
zone commands to zoned block devices. Next three patches modify existing fsck
functions to meet zoned block devices' requirements. The last two patches add
the two checks for write pointer consistency.

Thank goes to Chao Yu for the detailed discussion on the list.

Changes from v5:
* 1st-3rd patch: Reflected review comments on helper functions
* 8th patch: Ensure zones are in main segments and removed errno print

Changes from v4:
* Renewed the series based on design discussion on the list

Changes from v3:
* Set curseg position at a new zone start when its write pointer is at zone end

Changes from v2:
* Reflected review comments by Chao Yu
* Dropped 4th patch and 2nd patch (2nd patch was required for the 4th patch)

Changes from v1:
* Fixed build failure on dev branch

Shin'ichiro Kawasaki (8):
  libf2fs_zoned: Introduce f2fs_report_zones() helper function
  libf2fs_zoned: Introduce f2fs_report_zone() helper function
  libf2fs_zoned: Introduce f2fs_reset_zone() helper function
  fsck: Find free zones instead of blocks to assign to current segments
  fsck: Introduce move_one_curseg_info() function
  fsck: Check fsync data always for zoned block devices
  fsck: Check write pointer consistency of open zones
  fsck: Check write pointer consistency of non-open zones

 fsck/defrag.c       |   2 +-
 fsck/f2fs.h         |   5 +
 fsck/fsck.c         | 273 ++++++++++++++++++++++++++++++++++++++++++++
 fsck/fsck.h         |  11 +-
 fsck/main.c         |   2 +
 fsck/mount.c        | 137 +++++++++++++++-------
 fsck/segment.c      |   4 +-
 include/f2fs_fs.h   |   7 ++
 lib/libf2fs_zoned.c | 124 +++++++++++++++++++-
 9 files changed, 519 insertions(+), 46 deletions(-)

-- 
2.21.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [f2fs-dev] [PATCH v6 1/8] libf2fs_zoned: Introduce f2fs_report_zones() helper function
  2019-10-28  6:55 [f2fs-dev] [PATCH v6 0/8] fsck: Check write pointers of zoned block devices Shin'ichiro Kawasaki
@ 2019-10-28  6:55 ` Shin'ichiro Kawasaki
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 2/8] libf2fs_zoned: Introduce f2fs_report_zone() " Shin'ichiro Kawasaki
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-10-28  6:55 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

To prepare for write pointer consistency check by fsck, add
f2fs_report_zones() helper function which calls REPORT ZONE command to
get write pointer status. The function is added to lib/libf2fs_zoned
which gathers zoned block device related functions.

To check write pointer consistency with f2fs meta data, fsck needs to
refer both of reported zone information and f2fs super block structure
"f2fs_sb_info". However, libf2fs_zoned does not import f2fs_sb_info. To
keep f2fs_sb_info structure out of libf2fs_zoned, provide a callback
function in fsck to f2fs_report_zones() and call it for each zone.

Add SECTOR_SHIFT definition in include/f2fs_fs.h to avoid a magic number
to convert bytes into 512B sectors.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 include/f2fs_fs.h   |  5 ++++
 lib/libf2fs_zoned.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
index 84f3f3e..a437379 100644
--- a/include/f2fs_fs.h
+++ b/include/f2fs_fs.h
@@ -279,6 +279,9 @@ static inline uint64_t bswap_64(uint64_t val)
 #endif
 #define PAGE_CACHE_SIZE		4096
 #define BITS_PER_BYTE		8
+#ifndef SECTOR_SHIFT
+#define SECTOR_SHIFT		9
+#endif
 #define F2FS_SUPER_MAGIC	0xF2F52010	/* F2FS Magic Number */
 #define CP_CHKSUM_OFFSET	4092
 #define SB_CHKSUM_OFFSET	3068
@@ -1296,6 +1299,8 @@ blk_zone_cond_str(struct blk_zone *blkz)
 
 extern int f2fs_get_zoned_model(int);
 extern int f2fs_get_zone_blocks(int);
+typedef int (report_zones_cb_t)(int i, void *, void *);
+extern int f2fs_report_zones(int, report_zones_cb_t *, void *);
 extern int f2fs_check_zones(int);
 extern int f2fs_reset_zones(int);
 
diff --git a/lib/libf2fs_zoned.c b/lib/libf2fs_zoned.c
index af00b44..8ad4171 100644
--- a/lib/libf2fs_zoned.c
+++ b/lib/libf2fs_zoned.c
@@ -193,6 +193,59 @@ int f2fs_get_zone_blocks(int i)
 
 #define F2FS_REPORT_ZONES_BUFSZ	524288
 
+int f2fs_report_zones(int j, report_zones_cb_t *report_zones_cb, void *opaque)
+{
+	struct device_info *dev = c.devices + j;
+	struct blk_zone_report *rep;
+	struct blk_zone *blkz;
+	unsigned int i, n = 0;
+	u_int64_t total_sectors = (dev->total_sectors * c.sector_size)
+		>> SECTOR_SHIFT;
+	u_int64_t sector = 0;
+	int ret = -1;
+
+	rep = malloc(F2FS_REPORT_ZONES_BUFSZ);
+	if (!rep) {
+		ERR_MSG("No memory for report zones\n");
+		return -ENOMEM;
+	}
+
+	while (sector < total_sectors) {
+
+		/* Get zone info */
+		rep->sector = sector;
+		rep->nr_zones = (F2FS_REPORT_ZONES_BUFSZ - sizeof(struct blk_zone_report))
+			/ sizeof(struct blk_zone);
+
+		ret = ioctl(dev->fd, BLKREPORTZONE, rep);
+		if (ret != 0) {
+			ret = -errno;
+			ERR_MSG("ioctl BLKREPORTZONE failed: errno=%d\n",
+				errno);
+			goto out;
+		}
+
+		if (!rep->nr_zones) {
+			ret = -EIO;
+			ERR_MSG("Unexpected ioctl BLKREPORTZONE result\n");
+			goto out;
+		}
+
+		blkz = (struct blk_zone *)(rep + 1);
+		for (i = 0; i < rep->nr_zones; i++) {
+			ret = report_zones_cb(n, blkz, opaque);
+			if (ret)
+				goto out;
+			sector = blk_zone_sector(blkz) + blk_zone_length(blkz);
+			n++;
+			blkz++;
+		}
+	}
+out:
+	free(rep);
+	return ret;
+}
+
 int f2fs_check_zones(int j)
 {
 	struct device_info *dev = c.devices + j;
@@ -372,6 +425,12 @@ out:
 
 #else
 
+int f2fs_report_zones(int i, report_zones_cb_t *report_zones_cb, void *opaque)
+{
+	ERR_MSG("%d: Unsupported zoned block device\n", i);
+	return -1;
+}
+
 int f2fs_get_zoned_model(int i)
 {
 	struct device_info *dev = c.devices + i;
-- 
2.21.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [f2fs-dev] [PATCH v6 2/8] libf2fs_zoned: Introduce f2fs_report_zone() helper function
  2019-10-28  6:55 [f2fs-dev] [PATCH v6 0/8] fsck: Check write pointers of zoned block devices Shin'ichiro Kawasaki
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 1/8] libf2fs_zoned: Introduce f2fs_report_zones() helper function Shin'ichiro Kawasaki
@ 2019-10-28  6:55 ` Shin'ichiro Kawasaki
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 3/8] libf2fs_zoned: Introduce f2fs_reset_zone() " Shin'ichiro Kawasaki
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-10-28  6:55 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

To prepare for write pointer consistency check by fsck, add
f2fs_report_zone() helper function which calls REPORT ZONE command to
get write pointer status of a single zone. The function is added to
lib/libf2fs_zoned which gathers zoned block device related functions.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 include/f2fs_fs.h   |  1 +
 lib/libf2fs_zoned.c | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
index a437379..1f7ef05 100644
--- a/include/f2fs_fs.h
+++ b/include/f2fs_fs.h
@@ -1299,6 +1299,7 @@ blk_zone_cond_str(struct blk_zone *blkz)
 
 extern int f2fs_get_zoned_model(int);
 extern int f2fs_get_zone_blocks(int);
+extern int f2fs_report_zone(int, u_int64_t, void *);
 typedef int (report_zones_cb_t)(int i, void *, void *);
 extern int f2fs_report_zones(int, report_zones_cb_t *, void *);
 extern int f2fs_check_zones(int);
diff --git a/lib/libf2fs_zoned.c b/lib/libf2fs_zoned.c
index 8ad4171..5328c56 100644
--- a/lib/libf2fs_zoned.c
+++ b/lib/libf2fs_zoned.c
@@ -191,6 +191,33 @@ int f2fs_get_zone_blocks(int i)
 	return 0;
 }
 
+int f2fs_report_zone(int i, u_int64_t sector, void *blkzone)
+{
+	struct blk_zone *blkz = (struct blk_zone *)blkzone;
+	struct blk_zone_report *rep;
+	int ret = -1;
+
+	rep = malloc(sizeof(struct blk_zone_report) + sizeof(struct blk_zone));
+	if (!rep) {
+		ERR_MSG("No memory for report zones\n");
+		return -ENOMEM;
+	}
+
+	rep->sector = sector;
+	rep->nr_zones = 1;
+	ret = ioctl(c.devices[i].fd, BLKREPORTZONE, rep);
+	if (ret != 0) {
+		ret = -errno;
+		ERR_MSG("ioctl BLKREPORTZONE failed: errno=%d\n", errno);
+		goto out;
+	}
+
+	*blkz = *(struct blk_zone *)(rep + 1);
+out:
+	free(rep);
+	return ret;
+}
+
 #define F2FS_REPORT_ZONES_BUFSZ	524288
 
 int f2fs_report_zones(int j, report_zones_cb_t *report_zones_cb, void *opaque)
@@ -425,6 +452,12 @@ out:
 
 #else
 
+int f2fs_report_zone(int i, u_int64_t sector, void *blkzone)
+{
+	ERR_MSG("%d: Unsupported zoned block device\n", i);
+	return -1;
+}
+
 int f2fs_report_zones(int i, report_zones_cb_t *report_zones_cb, void *opaque)
 {
 	ERR_MSG("%d: Unsupported zoned block device\n", i);
-- 
2.21.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [f2fs-dev] [PATCH v6 3/8] libf2fs_zoned: Introduce f2fs_reset_zone() helper function
  2019-10-28  6:55 [f2fs-dev] [PATCH v6 0/8] fsck: Check write pointers of zoned block devices Shin'ichiro Kawasaki
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 1/8] libf2fs_zoned: Introduce f2fs_report_zones() helper function Shin'ichiro Kawasaki
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 2/8] libf2fs_zoned: Introduce f2fs_report_zone() " Shin'ichiro Kawasaki
@ 2019-10-28  6:55 ` Shin'ichiro Kawasaki
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 4/8] fsck: Find free zones instead of blocks to assign to current segments Shin'ichiro Kawasaki
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-10-28  6:55 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

To prepare for write pointer consistency fix by fsck, add
f2fs_reset_zone() helper function which calls RESET ZONE command. The
function is added to lib/libf2fs_zoned which gathers zoned block device
related functions.

When f2fs-tools are built without blkzoned.h kernel header, the helper
function f2fs_reset_zone() prints an error message as other helper
functions in lib/libf2fs_zoned print. To make the message consistent
through the all helper functions, modify message strings in
f2fs_check_zones() and f2fs_reset_zones().

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 include/f2fs_fs.h   |  1 +
 lib/libf2fs_zoned.c | 32 ++++++++++++++++++++++++++++++--
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
index 1f7ef05..a36927b 100644
--- a/include/f2fs_fs.h
+++ b/include/f2fs_fs.h
@@ -1303,6 +1303,7 @@ extern int f2fs_report_zone(int, u_int64_t, void *);
 typedef int (report_zones_cb_t)(int i, void *, void *);
 extern int f2fs_report_zones(int, report_zones_cb_t *, void *);
 extern int f2fs_check_zones(int);
+int f2fs_reset_zone(int, void *);
 extern int f2fs_reset_zones(int);
 
 #define SIZE_ALIGN(val, size)	((val) + (size) - 1) / (size)
diff --git a/lib/libf2fs_zoned.c b/lib/libf2fs_zoned.c
index 5328c56..8b88fe9 100644
--- a/lib/libf2fs_zoned.c
+++ b/lib/libf2fs_zoned.c
@@ -388,6 +388,28 @@ out:
 	return ret;
 }
 
+int f2fs_reset_zone(int i, void *blkzone)
+{
+	struct blk_zone *blkz = (struct blk_zone *)blkzone;
+	struct device_info *dev = c.devices + i;
+	struct blk_zone_range range;
+	int ret;
+
+	if (!blk_zone_seq(blkz) || blk_zone_empty(blkz))
+		return 0;
+
+	/* Non empty sequential zone: reset */
+	range.sector = blk_zone_sector(blkz);
+	range.nr_sectors = blk_zone_length(blkz);
+	ret = ioctl(dev->fd, BLKRESETZONE, &range);
+	if (ret != 0) {
+		ret = -errno;
+		ERR_MSG("ioctl BLKRESETZONE failed: errno=%d\n", errno);
+	}
+
+	return ret;
+}
+
 int f2fs_reset_zones(int j)
 {
 	struct device_info *dev = c.devices + j;
@@ -487,13 +509,19 @@ int f2fs_get_zone_blocks(int i)
 
 int f2fs_check_zones(int i)
 {
-	ERR_MSG("%d: Zoned block devices are not supported\n", i);
+	ERR_MSG("%d: Unsupported zoned block device\n", i);
+	return -1;
+}
+
+int f2fs_reset_zone(int i, void *blkzone)
+{
+	ERR_MSG("%d: Unsupported zoned block device\n", i);
 	return -1;
 }
 
 int f2fs_reset_zones(int i)
 {
-	ERR_MSG("%d: Zoned block devices are not supported\n", i);
+	ERR_MSG("%d: Unsupported zoned block device\n", i);
 	return -1;
 }
 
-- 
2.21.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [f2fs-dev] [PATCH v6 4/8] fsck: Find free zones instead of blocks to assign to current segments
  2019-10-28  6:55 [f2fs-dev] [PATCH v6 0/8] fsck: Check write pointers of zoned block devices Shin'ichiro Kawasaki
                   ` (2 preceding siblings ...)
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 3/8] libf2fs_zoned: Introduce f2fs_reset_zone() " Shin'ichiro Kawasaki
@ 2019-10-28  6:55 ` Shin'ichiro Kawasaki
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 5/8] fsck: Introduce move_one_curseg_info() function Shin'ichiro Kawasaki
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-10-28  6:55 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

When fsck needs to assign a new area to a curreng segment, it calls
find_next_free_block() function to find a new block to assign. For zoned
block devices, fsck checks write pointer consistency with current
segments' positions. In case a curseg is inconsistent with the
write pointer of the zone it points to, fsck should assign not a new free
block but a new free zone/section with write pointer at the zone start,
so that next write to the current segment succeeds without error.

To extend find_next_free_block() function's capability to find not only
a block but also a zone/section, add new_sec flag to
find_next_free_block() function. When new_sec flag is true, skip check
for each block's availability so that the check is done with unit of
section. Note that it is ensured that one zone has one section for f2fs
on zoned block devices. Then the logic to find a new free section is good
to find a new free zone.

When fsck target devices have ZONED_HM model, set new_sec flag true to
call find_next_free_block() from move_curseg_info(). Set curseg's
alloc_type not SSR but LFS for the devices with ZONED_HM model, because
SSR block allocation is not allowed for zoned block devices. Also skip
relocate_curseg_offset() for the devices with ZONED_HM model for the
same reason.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
---
 fsck/defrag.c  |  2 +-
 fsck/fsck.h    |  2 +-
 fsck/mount.c   | 12 ++++++++----
 fsck/segment.c |  2 +-
 4 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/fsck/defrag.c b/fsck/defrag.c
index fc6b7cf..3473637 100644
--- a/fsck/defrag.c
+++ b/fsck/defrag.c
@@ -77,7 +77,7 @@ int f2fs_defragment(struct f2fs_sb_info *sbi, u64 from, u64 len, u64 to, int lef
 		if (!f2fs_test_bit(offset, (const char *)se->cur_valid_map))
 			continue;
 
-		if (find_next_free_block(sbi, &target, left, se->type)) {
+		if (find_next_free_block(sbi, &target, left, se->type, false)) {
 			MSG(0, "Not enough space to migrate blocks");
 			return -1;
 		}
diff --git a/fsck/fsck.h b/fsck/fsck.h
index ccf4a39..8da0ebb 100644
--- a/fsck/fsck.h
+++ b/fsck/fsck.h
@@ -191,7 +191,7 @@ extern void zero_journal_entries(struct f2fs_sb_info *);
 extern void flush_sit_entries(struct f2fs_sb_info *);
 extern void move_curseg_info(struct f2fs_sb_info *, u64, int);
 extern void write_curseg_info(struct f2fs_sb_info *);
-extern int find_next_free_block(struct f2fs_sb_info *, u64 *, int, int);
+extern int find_next_free_block(struct f2fs_sb_info *, u64 *, int, int, bool);
 extern void duplicate_checkpoint(struct f2fs_sb_info *);
 extern void write_checkpoint(struct f2fs_sb_info *);
 extern void write_checkpoints(struct f2fs_sb_info *);
diff --git a/fsck/mount.c b/fsck/mount.c
index 4814dfe..8d2ba55 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -2430,6 +2430,9 @@ int relocate_curseg_offset(struct f2fs_sb_info *sbi, int type)
 	struct seg_entry *se = get_seg_entry(sbi, curseg->segno);
 	unsigned int i;
 
+	if (c.zoned_model == F2FS_ZONED_HM)
+		return -EINVAL;
+
 	for (i = 0; i < sbi->blocks_per_seg; i++) {
 		if (!f2fs_test_bit(i, (const char *)se->cur_valid_map))
 			break;
@@ -2462,7 +2465,7 @@ void set_section_type(struct f2fs_sb_info *sbi, unsigned int segno, int type)
 	}
 }
 
-int find_next_free_block(struct f2fs_sb_info *sbi, u64 *to, int left, int want_type)
+int find_next_free_block(struct f2fs_sb_info *sbi, u64 *to, int left, int want_type, bool new_sec)
 {
 	struct f2fs_super_block *sb = F2FS_RAW_SUPER(sbi);
 	struct seg_entry *se;
@@ -2520,7 +2523,7 @@ int find_next_free_block(struct f2fs_sb_info *sbi, u64 *to, int left, int want_t
 			}
 		}
 
-		if (type == want_type &&
+		if (type == want_type && !new_sec &&
 			!f2fs_test_bit(offset, (const char *)bitmap))
 			return 0;
 
@@ -2546,13 +2549,14 @@ void move_curseg_info(struct f2fs_sb_info *sbi, u64 from, int left)
 		ASSERT(ret >= 0);
 
 		to = from;
-		ret = find_next_free_block(sbi, &to, left, i);
+		ret = find_next_free_block(sbi, &to, left, i,
+					   c.zoned_model == F2FS_ZONED_HM);
 		ASSERT(ret == 0);
 
 		old_segno = curseg->segno;
 		curseg->segno = GET_SEGNO(sbi, to);
 		curseg->next_blkoff = OFFSET_IN_SEG(sbi, to);
-		curseg->alloc_type = SSR;
+		curseg->alloc_type = c.zoned_model == F2FS_ZONED_HM ? LFS : SSR;
 
 		/* update new segno */
 		ssa_blk = GET_SUM_BLKADDR(sbi, curseg->segno);
diff --git a/fsck/segment.c b/fsck/segment.c
index e3a90da..ccde05f 100644
--- a/fsck/segment.c
+++ b/fsck/segment.c
@@ -52,7 +52,7 @@ int reserve_new_block(struct f2fs_sb_info *sbi, block_t *to,
 
 	blkaddr = SM_I(sbi)->main_blkaddr;
 
-	if (find_next_free_block(sbi, &blkaddr, 0, type)) {
+	if (find_next_free_block(sbi, &blkaddr, 0, type, false)) {
 		ERR_MSG("Can't find free block");
 		ASSERT(0);
 	}
-- 
2.21.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [f2fs-dev] [PATCH v6 5/8] fsck: Introduce move_one_curseg_info() function
  2019-10-28  6:55 [f2fs-dev] [PATCH v6 0/8] fsck: Check write pointers of zoned block devices Shin'ichiro Kawasaki
                   ` (3 preceding siblings ...)
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 4/8] fsck: Find free zones instead of blocks to assign to current segments Shin'ichiro Kawasaki
@ 2019-10-28  6:55 ` Shin'ichiro Kawasaki
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 6/8] fsck: Check fsync data always for zoned block devices Shin'ichiro Kawasaki
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-10-28  6:55 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

When fsck updates one of the current segments, update_curseg_info() is
called specifying a single current segment as its argument. However,
update_curseg_info() calls move_curseg_info() function which updates all
six current segments. Then update_curseg_info() for a single current
segment moves all current segments.

This excessive current segment move causes an issue when a new zone is
assigned to a current segment because of write pointer inconsistency.
Even when a current segment has write pointer inconsistency, all other
current segments should not be moved because they may have fsync data
at their positions.

To avoid the excessive current segment move, introduce
move_one_curseg_info() function which does same work as
move_curseg_info() only for a single current segment. Call
move_one_curseg_info() in place of move_curseg_info() from
update_curseg_info().

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
---
 fsck/mount.c | 68 ++++++++++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/fsck/mount.c b/fsck/mount.c
index 8d2ba55..915f487 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -2532,52 +2532,58 @@ int find_next_free_block(struct f2fs_sb_info *sbi, u64 *to, int left, int want_t
 	return -1;
 }
 
-void move_curseg_info(struct f2fs_sb_info *sbi, u64 from, int left)
+static void move_one_curseg_info(struct f2fs_sb_info *sbi, u64 from, int left,
+				 int i)
 {
-	int i, ret;
+	struct curseg_info *curseg = CURSEG_I(sbi, i);
+	struct f2fs_summary_block buf;
+	u32 old_segno;
+	u64 ssa_blk, to;
+	int ret;
 
-	/* update summary blocks having nullified journal entries */
-	for (i = 0; i < NO_CHECK_TYPE; i++) {
-		struct curseg_info *curseg = CURSEG_I(sbi, i);
-		struct f2fs_summary_block buf;
-		u32 old_segno;
-		u64 ssa_blk, to;
+	/* update original SSA too */
+	ssa_blk = GET_SUM_BLKADDR(sbi, curseg->segno);
+	ret = dev_write_block(curseg->sum_blk, ssa_blk);
+	ASSERT(ret >= 0);
 
-		/* update original SSA too */
-		ssa_blk = GET_SUM_BLKADDR(sbi, curseg->segno);
-		ret = dev_write_block(curseg->sum_blk, ssa_blk);
-		ASSERT(ret >= 0);
+	to = from;
+	ret = find_next_free_block(sbi, &to, left, i,
+				   c.zoned_model == F2FS_ZONED_HM);
+	ASSERT(ret == 0);
 
-		to = from;
-		ret = find_next_free_block(sbi, &to, left, i,
-					   c.zoned_model == F2FS_ZONED_HM);
-		ASSERT(ret == 0);
+	old_segno = curseg->segno;
+	curseg->segno = GET_SEGNO(sbi, to);
+	curseg->next_blkoff = OFFSET_IN_SEG(sbi, to);
+	curseg->alloc_type = c.zoned_model == F2FS_ZONED_HM ? LFS : SSR;
 
-		old_segno = curseg->segno;
-		curseg->segno = GET_SEGNO(sbi, to);
-		curseg->next_blkoff = OFFSET_IN_SEG(sbi, to);
-		curseg->alloc_type = c.zoned_model == F2FS_ZONED_HM ? LFS : SSR;
+	/* update new segno */
+	ssa_blk = GET_SUM_BLKADDR(sbi, curseg->segno);
+	ret = dev_read_block(&buf, ssa_blk);
+	ASSERT(ret >= 0);
 
-		/* update new segno */
-		ssa_blk = GET_SUM_BLKADDR(sbi, curseg->segno);
-		ret = dev_read_block(&buf, ssa_blk);
-		ASSERT(ret >= 0);
+	memcpy(curseg->sum_blk, &buf, SUM_ENTRIES_SIZE);
 
-		memcpy(curseg->sum_blk, &buf, SUM_ENTRIES_SIZE);
+	/* update se->types */
+	reset_curseg(sbi, i);
 
-		/* update se->types */
-		reset_curseg(sbi, i);
+	FIX_MSG("Move curseg[%d] %x -> %x after %"PRIx64"\n",
+		i, old_segno, curseg->segno, from);
+}
 
-		DBG(1, "Move curseg[%d] %x -> %x after %"PRIx64"\n",
-				i, old_segno, curseg->segno, from);
-	}
+void move_curseg_info(struct f2fs_sb_info *sbi, u64 from, int left)
+{
+	int i;
+
+	/* update summary blocks having nullified journal entries */
+	for (i = 0; i < NO_CHECK_TYPE; i++)
+		move_one_curseg_info(sbi, from, left, i);
 }
 
 void update_curseg_info(struct f2fs_sb_info *sbi, int type)
 {
 	if (!relocate_curseg_offset(sbi, type))
 		return;
-	move_curseg_info(sbi, SM_I(sbi)->main_blkaddr, 0);
+	move_one_curseg_info(sbi, SM_I(sbi)->main_blkaddr, 0, type);
 }
 
 void zero_journal_entries(struct f2fs_sb_info *sbi)
-- 
2.21.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [f2fs-dev] [PATCH v6 6/8] fsck: Check fsync data always for zoned block devices
  2019-10-28  6:55 [f2fs-dev] [PATCH v6 0/8] fsck: Check write pointers of zoned block devices Shin'ichiro Kawasaki
                   ` (4 preceding siblings ...)
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 5/8] fsck: Introduce move_one_curseg_info() function Shin'ichiro Kawasaki
@ 2019-10-28  6:55 ` Shin'ichiro Kawasaki
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 7/8] fsck: Check write pointer consistency of open zones Shin'ichiro Kawasaki
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 8/8] fsck: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
  7 siblings, 0 replies; 15+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-10-28  6:55 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

Fsck checks fsync data when UMOUNT flag is not set. When the f2fs was not
cleanly unmouted, UMOUNT flag is not recorded in meta data and fsync data
can be left in the f2fs. The first fsck run checks fsync data to reflect
it on quota status recovery. After that, fsck writes UMOUNT flag in the
f2fs meta data so that second fsck run can skip fsync data check.

However, fsck for zoned block devices need to care fsync data for all
fsck runs. The first fsck run checks fsync data, then fsck can check
write pointer consistency with fsync data. However, since second fsck run
does not check fsync data, fsck detects write pointer at fsync data end
is not consistent with f2fs meta data. This results in meta data update
by fsck and fsync data gets lost.

To have fsck check fsync data always for zoned block devices, introduce
need_fsync_data_record() helper function which returns boolean to tell
if fsck needs fsync data check or not. For zoned block devices, always
return true. Otherwise, return true if UMOUNT flag is not set in CP.
Replace UMOUNT flag check codes for fsync data with the function call.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
---
 fsck/fsck.h    |  6 ++++++
 fsck/mount.c   | 14 +++++++-------
 fsck/segment.c |  2 +-
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/fsck/fsck.h b/fsck/fsck.h
index 8da0ebb..75052d8 100644
--- a/fsck/fsck.h
+++ b/fsck/fsck.h
@@ -133,6 +133,12 @@ enum seg_type {
 
 struct selabel_handle;
 
+static inline bool need_fsync_data_record(struct f2fs_sb_info *sbi)
+{
+	return !is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG) ||
+		c.zoned_model == F2FS_ZONED_HM;
+}
+
 extern int fsck_chk_orphan_node(struct f2fs_sb_info *);
 extern int fsck_chk_quota_node(struct f2fs_sb_info *);
 extern int fsck_chk_quota_files(struct f2fs_sb_info *);
diff --git a/fsck/mount.c b/fsck/mount.c
index 915f487..2979865 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -1525,7 +1525,7 @@ int build_sit_info(struct f2fs_sb_info *sbi)
 
 	bitmap_size = TOTAL_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE;
 
-	if (!is_set_ckpt_flags(cp, CP_UMOUNT_FLAG))
+	if (need_fsync_data_record(sbi))
 		bitmap_size += bitmap_size;
 
 	sit_i->bitmap = calloc(bitmap_size, 1);
@@ -1540,7 +1540,7 @@ int build_sit_info(struct f2fs_sb_info *sbi)
 		sit_i->sentries[start].cur_valid_map = bitmap;
 		bitmap += SIT_VBLOCK_MAP_SIZE;
 
-		if (!is_set_ckpt_flags(cp, CP_UMOUNT_FLAG)) {
+		if (need_fsync_data_record(sbi)) {
 			sit_i->sentries[start].ckpt_valid_map = bitmap;
 			bitmap += SIT_VBLOCK_MAP_SIZE;
 		}
@@ -1887,7 +1887,7 @@ void seg_info_from_raw_sit(struct f2fs_sb_info *sbi, struct seg_entry *se,
 {
 	__seg_info_from_raw_sit(se, raw_sit);
 
-	if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
+	if (!need_fsync_data_record(sbi))
 		return;
 	se->ckpt_valid_blocks = se->valid_blocks;
 	memcpy(se->ckpt_valid_map, se->cur_valid_map, SIT_VBLOCK_MAP_SIZE);
@@ -1903,7 +1903,7 @@ struct seg_entry *get_seg_entry(struct f2fs_sb_info *sbi,
 
 unsigned short get_seg_vblocks(struct f2fs_sb_info *sbi, struct seg_entry *se)
 {
-	if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
+	if (!need_fsync_data_record(sbi))
 		return se->valid_blocks;
 	else
 		return se->ckpt_valid_blocks;
@@ -1911,7 +1911,7 @@ unsigned short get_seg_vblocks(struct f2fs_sb_info *sbi, struct seg_entry *se)
 
 unsigned char *get_seg_bitmap(struct f2fs_sb_info *sbi, struct seg_entry *se)
 {
-	if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
+	if (!need_fsync_data_record(sbi))
 		return se->cur_valid_map;
 	else
 		return se->ckpt_valid_map;
@@ -1919,7 +1919,7 @@ unsigned char *get_seg_bitmap(struct f2fs_sb_info *sbi, struct seg_entry *se)
 
 unsigned char get_seg_type(struct f2fs_sb_info *sbi, struct seg_entry *se)
 {
-	if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
+	if (!need_fsync_data_record(sbi))
 		return se->type;
 	else
 		return se->ckpt_type;
@@ -3242,7 +3242,7 @@ static int record_fsync_data(struct f2fs_sb_info *sbi)
 	struct list_head inode_list = LIST_HEAD_INIT(inode_list);
 	int ret;
 
-	if (is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG))
+	if (!need_fsync_data_record(sbi))
 		return 0;
 
 	ret = find_fsync_inode(sbi, &inode_list);
diff --git a/fsck/segment.c b/fsck/segment.c
index ccde05f..17c42b7 100644
--- a/fsck/segment.c
+++ b/fsck/segment.c
@@ -62,7 +62,7 @@ int reserve_new_block(struct f2fs_sb_info *sbi, block_t *to,
 	se->type = type;
 	se->valid_blocks++;
 	f2fs_set_bit(offset, (char *)se->cur_valid_map);
-	if (!is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG)) {
+	if (need_fsync_data_record(sbi)) {
 		se->ckpt_type = type;
 		se->ckpt_valid_blocks++;
 		f2fs_set_bit(offset, (char *)se->ckpt_valid_map);
-- 
2.21.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [f2fs-dev] [PATCH v6 7/8] fsck: Check write pointer consistency of open zones
  2019-10-28  6:55 [f2fs-dev] [PATCH v6 0/8] fsck: Check write pointers of zoned block devices Shin'ichiro Kawasaki
                   ` (5 preceding siblings ...)
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 6/8] fsck: Check fsync data always for zoned block devices Shin'ichiro Kawasaki
@ 2019-10-28  6:55 ` Shin'ichiro Kawasaki
  2019-11-05 11:06   ` Chao Yu
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 8/8] fsck: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
  7 siblings, 1 reply; 15+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-10-28  6:55 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

On sudden f2fs shutdown, write pointers of zoned block devices can go
further but f2fs meta data keeps current segments at positions before the
write operations. After remounting the f2fs, this inconsistency causes
write operations not at write pointers and "Unaligned write command"
error is reported.

To avoid the error, have f2fs.fsck check consistency of write pointers
of open zones that current segments point to. Compare each current
segment's position and the write pointer position of the open zone. If
inconsistency is found and 'fix_on' flag is set, assign a new zone to the
current segment and check the newly assigned zone has write pointer at
the zone start. Leave the original zone as is to keep data recorded in
it.

To care about fsync data, refer each seg_entry's ckpt_valid_map to get
the last valid block in the zone. If the last valid block is beyond the
current segments position, fsync data exits in the zone. In case fsync
data exists, do not assign a new zone to the current segment not to lose
the fsync data. It is expected that the kernel replay the fsync data and
fix the write pointer inconsistency at mount time.

Also check consistency between write pointer of the zone the current
segment points to with valid block maps of the zone. If the last valid
block is beyond the write pointer position, report to indicate f2fs bug.
If 'fix_on' flag is set, assign a new zone to the current segment.

When inconsistencies are found, turn on 'bug_on' flag in fsck_verify() to
ask users to fix them or not. When inconsistencies get fixed, turn on
'force' flag in fsck_verify() to enforce fixes in following checks.

This check and fix is done twice. The first is done at the beginning of
do_fsck() function so that other fixes can reflect the current segment
modification. The second is done in fsck_verify() to reflect updated meta
data by other fixes.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 fsck/f2fs.h  |   5 ++
 fsck/fsck.c  | 154 +++++++++++++++++++++++++++++++++++++++++++++++++++
 fsck/fsck.h  |   3 +
 fsck/main.c  |   2 +
 fsck/mount.c |  49 +++++++++++++++-
 5 files changed, 212 insertions(+), 1 deletion(-)

diff --git a/fsck/f2fs.h b/fsck/f2fs.h
index 399c74d..07513cb 100644
--- a/fsck/f2fs.h
+++ b/fsck/f2fs.h
@@ -429,6 +429,11 @@ static inline block_t __end_block_addr(struct f2fs_sb_info *sbi)
 #define GET_BLKOFF_FROM_SEG0(sbi, blk_addr)				\
 	(GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & (sbi->blocks_per_seg - 1))
 
+#define GET_SEC_FROM_SEG(sbi, segno)					\
+	((segno) / (sbi)->segs_per_sec)
+#define GET_SEG_FROM_SEC(sbi, secno)					\
+	((secno) * (sbi)->segs_per_sec)
+
 #define FREE_I_START_SEGNO(sbi)						\
 	GET_SEGNO_FROM_SEG0(sbi, SM_I(sbi)->main_blkaddr)
 #define GET_R2L_SEGNO(sbi, segno)	(segno + FREE_I_START_SEGNO(sbi))
diff --git a/fsck/fsck.c b/fsck/fsck.c
index 2ae3bd5..e0eda4e 100644
--- a/fsck/fsck.c
+++ b/fsck/fsck.c
@@ -2181,6 +2181,125 @@ static void fix_checkpoints(struct f2fs_sb_info *sbi)
 	fix_checkpoint(sbi);
 }
 
+#ifdef HAVE_LINUX_BLKZONED_H
+
+/*
+ * Refer valid block map and return offset of the last valid block in the zone.
+ * Obtain valid block map from SIT and fsync data.
+ * If there is no valid block in the zone, return -1.
+ */
+static int last_vblk_off_in_zone(struct f2fs_sb_info *sbi,
+				 unsigned int zone_segno)
+{
+	unsigned int s;
+	unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
+	struct seg_entry *se;
+	block_t b;
+	int ret = -1;
+
+	for (s = 0; s < segs_per_zone; s++) {
+		se = get_seg_entry(sbi, zone_segno + s);
+
+		/*
+		 * Refer not cur_valid_map but ckpt_valid_map which reflects
+		 * fsync data.
+		 */
+		ASSERT(se->ckpt_valid_map);
+		for (b = 0; b < sbi->blocks_per_seg; b++)
+			if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map))
+				ret = b + (s << sbi->log_blocks_per_seg);
+	}
+
+	return ret;
+}
+
+static int check_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
+{
+	struct curseg_info *curseg = CURSEG_I(sbi, type);
+	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
+	struct blk_zone blkz;
+	block_t cs_block, wp_block, zone_last_vblock;
+	u_int64_t cs_sector, wp_sector;
+	int i, ret;
+	unsigned int zone_segno;
+	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
+
+	/* get the device the curseg points to */
+	cs_block = START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff;
+	for (i = 0; i < MAX_DEVICES; i++) {
+		if (!c.devices[i].path)
+			break;
+		if (c.devices[i].start_blkaddr <= cs_block &&
+		    cs_block <= c.devices[i].end_blkaddr)
+			break;
+	}
+
+	if (i >= MAX_DEVICES)
+		return -EINVAL;
+
+	/* get write pointer position of the zone the curseg points to */
+	cs_sector = (cs_block - c.devices[i].start_blkaddr)
+		<< log_sectors_per_block;
+	ret = f2fs_report_zone(i, cs_sector, &blkz);
+	if (ret)
+		return ret;
+
+	if (blk_zone_type(&blkz) != BLK_ZONE_TYPE_SEQWRITE_REQ)
+		return 0;
+
+	/* check consistency between the curseg and the write pointer */
+	wp_block = c.devices[i].start_blkaddr +
+		(blk_zone_wp_sector(&blkz) >> log_sectors_per_block);
+	wp_sector = blk_zone_wp_sector(&blkz);
+
+	if (cs_sector == wp_sector)
+		return 0;
+
+	if (cs_sector > wp_sector) {
+		MSG(0, "Inconsistent write pointer with curseg %d: "
+		    "curseg %d[0x%x,0x%x] > wp[0x%x,0x%x]\n",
+		    type, type, curseg->segno, curseg->next_blkoff,
+		    GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
+		fsck->chk.wp_inconsistent_zones++;
+		return -EINVAL;
+	}
+
+	MSG(0, "Write pointer goes advance from curseg %d: "
+	    "curseg %d[0x%x,0x%x] wp[0x%x,0x%x]\n",
+	    type, type, curseg->segno, curseg->next_blkoff,
+	    GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
+
+	zone_segno = GET_SEG_FROM_SEC(sbi,
+				      GET_SEC_FROM_SEG(sbi, curseg->segno));
+	zone_last_vblock = START_BLOCK(sbi, zone_segno) +
+		last_vblk_off_in_zone(sbi, zone_segno);
+
+	/*
+	 * If fsync data exists between the curseg and the last valid block,
+	 * it is not an error to fix. Leave it for kernel to recover later.
+	 */
+	if (cs_block <= zone_last_vblock) {
+		MSG(0, "Curseg has fsync data: curseg %d[0x%x,0x%x] "
+		    "last valid block in zone[0x%x,0x%x]\n",
+		    type, curseg->segno, curseg->next_blkoff,
+		    GET_SEGNO(sbi, zone_last_vblock),
+		    OFFSET_IN_SEG(sbi, zone_last_vblock));
+		return 0;
+	}
+
+	fsck->chk.wp_inconsistent_zones++;
+	return -EINVAL;
+}
+
+#else
+
+static int check_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
+{
+	return 0;
+}
+
+#endif
+
 int check_curseg_offset(struct f2fs_sb_info *sbi, int type)
 {
 	struct curseg_info *curseg = CURSEG_I(sbi, type);
@@ -2209,6 +2328,10 @@ int check_curseg_offset(struct f2fs_sb_info *sbi, int type)
 			return -EINVAL;
 		}
 	}
+
+	if (c.zoned_model == F2FS_ZONED_HM)
+		return check_curseg_write_pointer(sbi, type);
+
 	return 0;
 }
 
@@ -2628,6 +2751,23 @@ out:
 	return cnt;
 }
 
+/*
+ * Check and fix consistency with write pointers at the beginning of
+ * fsck so that following writes by fsck do not fail.
+ */
+void fsck_chk_and_fix_write_pointers(struct f2fs_sb_info *sbi)
+{
+	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
+
+	if (c.zoned_model != F2FS_ZONED_HM)
+		return;
+
+	if (check_curseg_offsets(sbi) && c.fix_on) {
+		fix_curseg_info(sbi);
+		fsck->chk.wp_fixed = 1;
+	}
+}
+
 int fsck_chk_curseg_info(struct f2fs_sb_info *sbi)
 {
 	struct curseg_info *curseg;
@@ -2678,6 +2818,20 @@ int fsck_verify(struct f2fs_sb_info *sbi)
 
 	printf("\n");
 
+	if (c.zoned_model == F2FS_ZONED_HM) {
+		printf("[FSCK] Write pointers consistency                    ");
+		if (fsck->chk.wp_inconsistent_zones == 0x0) {
+			printf(" [Ok..]\n");
+		} else {
+			printf(" [Fail] [0x%x]\n",
+			       fsck->chk.wp_inconsistent_zones);
+			c.bug_on = 1;
+		}
+
+		if (fsck->chk.wp_fixed && c.fix_on)
+			force = 1;
+	}
+
 	if (c.feature & cpu_to_le32(F2FS_FEATURE_LOST_FOUND)) {
 		for (i = 0; i < fsck->nr_nat_entries; i++)
 			if (f2fs_test_bit(i, fsck->nat_area_bitmap) != 0)
diff --git a/fsck/fsck.h b/fsck/fsck.h
index 75052d8..c4432e8 100644
--- a/fsck/fsck.h
+++ b/fsck/fsck.h
@@ -80,6 +80,8 @@ struct f2fs_fsck {
 		u32 multi_hard_link_files;
 		u64 sit_valid_blocks;
 		u32 sit_free_segs;
+		u32 wp_fixed;
+		u32 wp_inconsistent_zones;
 	} chk;
 
 	struct hard_link_node *hard_link_list_head;
@@ -162,6 +164,7 @@ int fsck_chk_inline_dentries(struct f2fs_sb_info *, struct f2fs_node *,
 		struct child_info *);
 void fsck_chk_checkpoint(struct f2fs_sb_info *sbi);
 int fsck_chk_meta(struct f2fs_sb_info *sbi);
+void fsck_chk_and_fix_write_pointers(struct f2fs_sb_info *);
 int fsck_chk_curseg_info(struct f2fs_sb_info *);
 void pretty_print_filename(const u8 *raw_name, u32 len,
 			   char out[F2FS_PRINT_NAMELEN], int enc_name);
diff --git a/fsck/main.c b/fsck/main.c
index 8c62a14..9a7d499 100644
--- a/fsck/main.c
+++ b/fsck/main.c
@@ -602,6 +602,8 @@ static void do_fsck(struct f2fs_sb_info *sbi)
 
 	print_cp_state(flag);
 
+	fsck_chk_and_fix_write_pointers(sbi);
+
 	fsck_chk_curseg_info(sbi);
 
 	if (!c.fix_on && !c.bug_on) {
diff --git a/fsck/mount.c b/fsck/mount.c
index 2979865..5085e6c 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -2465,6 +2465,52 @@ void set_section_type(struct f2fs_sb_info *sbi, unsigned int segno, int type)
 	}
 }
 
+#ifdef HAVE_LINUX_BLKZONED_H
+
+static bool write_pointer_at_zone_start(struct f2fs_sb_info *sbi,
+					unsigned int zone_segno)
+{
+	u_int64_t sector;
+	struct blk_zone blkz;
+	block_t block = START_BLOCK(sbi, zone_segno);
+	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
+	int ret, j;
+
+	if (c.zoned_model != F2FS_ZONED_HM)
+		return true;
+
+	for (j = 0; j < MAX_DEVICES; j++) {
+		if (!c.devices[j].path)
+			break;
+		if (c.devices[j].start_blkaddr <= block &&
+		    block <= c.devices[j].end_blkaddr)
+			break;
+	}
+
+	if (j >= MAX_DEVICES)
+		return false;
+
+	sector = (block - c.devices[j].start_blkaddr) << log_sectors_per_block;
+	ret = f2fs_report_zone(j, sector, &blkz);
+	if (ret)
+		return false;
+
+	if (blk_zone_type(&blkz) != BLK_ZONE_TYPE_SEQWRITE_REQ)
+		return true;
+
+	return blk_zone_sector(&blkz) == blk_zone_wp_sector(&blkz);
+}
+
+#else
+
+static bool write_pointer_at_zone_start(struct f2fs_sb_info *sbi,
+					unsigned int zone_segno)
+{
+	return true;
+}
+
+#endif
+
 int find_next_free_block(struct f2fs_sb_info *sbi, u64 *to, int left, int want_type, bool new_sec)
 {
 	struct f2fs_super_block *sb = F2FS_RAW_SUPER(sbi);
@@ -2517,7 +2563,8 @@ int find_next_free_block(struct f2fs_sb_info *sbi, u64 *to, int left, int want_t
 					break;
 			}
 
-			if (i == sbi->segs_per_sec) {
+			if (i == sbi->segs_per_sec &&
+			    write_pointer_at_zone_start(sbi, segno)) {
 				set_section_type(sbi, segno, want_type);
 				return 0;
 			}
-- 
2.21.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [f2fs-dev] [PATCH v6 8/8] fsck: Check write pointer consistency of non-open zones
  2019-10-28  6:55 [f2fs-dev] [PATCH v6 0/8] fsck: Check write pointers of zoned block devices Shin'ichiro Kawasaki
                   ` (6 preceding siblings ...)
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 7/8] fsck: Check write pointer consistency of open zones Shin'ichiro Kawasaki
@ 2019-10-28  6:55 ` Shin'ichiro Kawasaki
  2019-11-05 11:32   ` Chao Yu
  7 siblings, 1 reply; 15+ messages in thread
From: Shin'ichiro Kawasaki @ 2019-10-28  6:55 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: Damien Le Moal

To catch f2fs bug in write pointer handling code for zoned block devices,
have fsck check consistency of write pointers of non-open zones, that
current segments do not point to. Check two items comparing write pointer
positions with valid block maps in SIT.

The first item is check for zones with no valid blocks. When there is no
valid blocks in a zone, the write pointer should be at the start of the
zone. If not, next write operation to the zone will cause unaligned write
error. If write pointer is not at the zone start, reset the zone to move
the write pointer to the zone start.

The second item is check between write pointer position and the last
valid block in the zone. It is unexpected that the last valid block
position is beyond the write pointer. In such a case, report as the bug.
Fix is not required for such zone, because the zone is not selected for
next write operation until the zone get discarded.

In the same manner as the consistency check for current segments, do the
check and fix twice: at the beginning of do_fsck() to avoid unaligned
write error during fsck, and at fsck_verify() to reflect meta data
updates by fsck.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 fsck/fsck.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)

diff --git a/fsck/fsck.c b/fsck/fsck.c
index e0eda4e..8400929 100644
--- a/fsck/fsck.c
+++ b/fsck/fsck.c
@@ -2751,6 +2751,122 @@ out:
 	return cnt;
 }
 
+#ifdef HAVE_LINUX_BLKZONED_H
+
+struct write_pointer_check_data {
+	struct f2fs_sb_info *sbi;
+	int dev_index;
+};
+
+static int chk_and_fix_wp_with_sit(int i, void *blkzone, void *opaque)
+{
+	struct blk_zone *blkz = (struct blk_zone *)blkzone;
+	struct write_pointer_check_data *wpd = opaque;
+	struct f2fs_sb_info *sbi = wpd->sbi;
+	struct device_info *dev = c.devices + wpd->dev_index;
+	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
+	block_t zone_block, wp_block, wp_blkoff;
+	unsigned int zone_segno, wp_segno;
+	struct curseg_info *cs;
+	int cs_index, ret, last_valid_blkoff;
+	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
+	unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
+
+	if (blk_zone_conv(blkz))
+		return 0;
+
+	zone_block = dev->start_blkaddr
+		+ (blk_zone_sector(blkz) >> log_sectors_per_block);
+	zone_segno = GET_SEGNO(sbi, zone_block);
+	if (zone_segno >= MAIN_SEGS(sbi))
+		return 0;
+
+	wp_block = dev->start_blkaddr
+		+ (blk_zone_wp_sector(blkz) >> log_sectors_per_block);
+	wp_segno = GET_SEGNO(sbi, wp_block);
+	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
+
+	/* if a curseg points to the zone, skip the check */
+	for (cs_index = 0; cs_index < NO_CHECK_TYPE; cs_index++) {
+		cs = &SM_I(sbi)->curseg_array[cs_index];
+		if (zone_segno <= cs->segno &&
+		    cs->segno < zone_segno + segs_per_zone)
+			return 0;
+	}
+
+	last_valid_blkoff = last_vblk_off_in_zone(sbi, zone_segno);
+
+	/*
+	 * When there is no valid block in the zone, check write pointer is
+	 * at zone start. If not, reset the write pointer.
+	 */
+	if (last_valid_blkoff < 0 &&
+	    blk_zone_wp_sector(blkz) != blk_zone_sector(blkz)) {
+		if (!c.fix_on) {
+			MSG(0, "Inconsistent write pointer: wp[0x%x,0x%x]\n",
+			    wp_segno, wp_blkoff);
+			fsck->chk.wp_inconsistent_zones++;
+			return 0;
+		}
+
+		FIX_MSG("Reset write pointer of zone at segment 0x%x",
+			zone_segno);
+		ret = f2fs_reset_zone(wpd->dev_index, blkz);
+		if (ret) {
+			printf("[FSCK] Write pointer reset failed: %s\n",
+			       dev->path);
+			return ret;
+		}
+		fsck->chk.wp_fixed = 1;
+		return 0;
+	}
+
+	/*
+	 * If valid blocks exist in the zone beyond the write pointer, it
+	 * is a f2fs bug. No need to fix because the zone is not selected
+	 * for the write. Just report it.
+	 */
+	if (last_valid_blkoff + zone_block > wp_block) {
+		MSG(0, "Unexpected invalid write pointer: wp[0x%x,0x%x]\n",
+		    wp_segno, wp_blkoff);
+		return 0;
+	}
+
+	return 0;
+}
+
+static void fix_wp_sit_alignment(struct f2fs_sb_info *sbi)
+{
+	unsigned int i;
+	struct write_pointer_check_data wpd = {	sbi, 0 };
+
+	if (c.zoned_model != F2FS_ZONED_HM)
+		return;
+
+	for (i = 0; i < MAX_DEVICES; i++) {
+		if (!c.devices[i].path)
+			break;
+		if (c.devices[i].zoned_model != F2FS_ZONED_HM)
+			break;
+
+		wpd.dev_index = i;
+		if (f2fs_report_zones(i, chk_and_fix_wp_with_sit, &wpd)) {
+			printf("[FSCK] Write pointer check failed: %s\n",
+			       c.devices[i].path);
+			return;
+		}
+	}
+}
+
+#else
+
+static void fix_wp_sit_alignment(struct f2fs_sb_info *sbi)
+{
+	return;
+}
+
+#endif
+
 /*
  * Check and fix consistency with write pointers at the beginning of
  * fsck so that following writes by fsck do not fail.
@@ -2766,6 +2882,8 @@ void fsck_chk_and_fix_write_pointers(struct f2fs_sb_info *sbi)
 		fix_curseg_info(sbi);
 		fsck->chk.wp_fixed = 1;
 	}
+
+	fix_wp_sit_alignment(sbi);
 }
 
 int fsck_chk_curseg_info(struct f2fs_sb_info *sbi)
@@ -2984,6 +3102,7 @@ int fsck_verify(struct f2fs_sb_info *sbi)
 			fix_hard_links(sbi);
 			fix_nat_entries(sbi);
 			rewrite_sit_area_bitmap(sbi);
+			fix_wp_sit_alignment(sbi);
 			fix_curseg_info(sbi);
 			fix_checksum(sbi);
 			fix_checkpoints(sbi);
-- 
2.21.0



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [f2fs-dev] [PATCH v6 7/8] fsck: Check write pointer consistency of open zones
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 7/8] fsck: Check write pointer consistency of open zones Shin'ichiro Kawasaki
@ 2019-11-05 11:06   ` Chao Yu
  2019-11-06  9:45     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 15+ messages in thread
From: Chao Yu @ 2019-11-05 11:06 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki, Jaegeuk Kim, linux-f2fs-devel; +Cc: Damien Le Moal

On 2019/10/28 14:55, Shin'ichiro Kawasaki wrote:
> On sudden f2fs shutdown, write pointers of zoned block devices can go
> further but f2fs meta data keeps current segments at positions before the
> write operations. After remounting the f2fs, this inconsistency causes
> write operations not at write pointers and "Unaligned write command"
> error is reported.
> 
> To avoid the error, have f2fs.fsck check consistency of write pointers
> of open zones that current segments point to. Compare each current
> segment's position and the write pointer position of the open zone. If
> inconsistency is found and 'fix_on' flag is set, assign a new zone to the
> current segment and check the newly assigned zone has write pointer at
> the zone start. Leave the original zone as is to keep data recorded in
> it.
> 
> To care about fsync data, refer each seg_entry's ckpt_valid_map to get
> the last valid block in the zone. If the last valid block is beyond the
> current segments position, fsync data exits in the zone. In case fsync
> data exists, do not assign a new zone to the current segment not to lose
> the fsync data. It is expected that the kernel replay the fsync data and
> fix the write pointer inconsistency at mount time.
> 
> Also check consistency between write pointer of the zone the current
> segment points to with valid block maps of the zone. If the last valid
> block is beyond the write pointer position, report to indicate f2fs bug.
> If 'fix_on' flag is set, assign a new zone to the current segment.
> 
> When inconsistencies are found, turn on 'bug_on' flag in fsck_verify() to
> ask users to fix them or not. When inconsistencies get fixed, turn on
> 'force' flag in fsck_verify() to enforce fixes in following checks.
> 
> This check and fix is done twice. The first is done at the beginning of
> do_fsck() function so that other fixes can reflect the current segment
> modification. The second is done in fsck_verify() to reflect updated meta
> data by other fixes.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  fsck/f2fs.h  |   5 ++
>  fsck/fsck.c  | 154 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  fsck/fsck.h  |   3 +
>  fsck/main.c  |   2 +
>  fsck/mount.c |  49 +++++++++++++++-
>  5 files changed, 212 insertions(+), 1 deletion(-)
> 
> diff --git a/fsck/f2fs.h b/fsck/f2fs.h
> index 399c74d..07513cb 100644
> --- a/fsck/f2fs.h
> +++ b/fsck/f2fs.h
> @@ -429,6 +429,11 @@ static inline block_t __end_block_addr(struct f2fs_sb_info *sbi)
>  #define GET_BLKOFF_FROM_SEG0(sbi, blk_addr)				\
>  	(GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & (sbi->blocks_per_seg - 1))
>  
> +#define GET_SEC_FROM_SEG(sbi, segno)					\
> +	((segno) / (sbi)->segs_per_sec)
> +#define GET_SEG_FROM_SEC(sbi, secno)					\
> +	((secno) * (sbi)->segs_per_sec)
> +
>  #define FREE_I_START_SEGNO(sbi)						\
>  	GET_SEGNO_FROM_SEG0(sbi, SM_I(sbi)->main_blkaddr)
>  #define GET_R2L_SEGNO(sbi, segno)	(segno + FREE_I_START_SEGNO(sbi))
> diff --git a/fsck/fsck.c b/fsck/fsck.c
> index 2ae3bd5..e0eda4e 100644
> --- a/fsck/fsck.c
> +++ b/fsck/fsck.c
> @@ -2181,6 +2181,125 @@ static void fix_checkpoints(struct f2fs_sb_info *sbi)
>  	fix_checkpoint(sbi);
>  }
>  
> +#ifdef HAVE_LINUX_BLKZONED_H
> +
> +/*
> + * Refer valid block map and return offset of the last valid block in the zone.
> + * Obtain valid block map from SIT and fsync data.
> + * If there is no valid block in the zone, return -1.
> + */
> +static int last_vblk_off_in_zone(struct f2fs_sb_info *sbi,
> +				 unsigned int zone_segno)
> +{
> +	unsigned int s;
> +	unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
> +	struct seg_entry *se;
> +	block_t b;
> +	int ret = -1;
> +
> +	for (s = 0; s < segs_per_zone; s++) {
> +		se = get_seg_entry(sbi, zone_segno + s);
> +
> +		/*
> +		 * Refer not cur_valid_map but ckpt_valid_map which reflects
> +		 * fsync data.
> +		 */
> +		ASSERT(se->ckpt_valid_map);
> +		for (b = 0; b < sbi->blocks_per_seg; b++)
> +			if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map))
> +				ret = b + (s << sbi->log_blocks_per_seg);

Minor thing, I guess we only need to find last valid block in target zone?

int s, b;

for (s = segs_per_zone - 1; s >= 0; s--) {
	for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
		if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map)) {
			ret = b + (s << sbi->log_blocks_per_seg);
			break;
		}
}

> +	}
> +
> +	return ret;
> +}
> +
> +static int check_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
> +{
> +	struct curseg_info *curseg = CURSEG_I(sbi, type);
> +	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
> +	struct blk_zone blkz;
> +	block_t cs_block, wp_block, zone_last_vblock;
> +	u_int64_t cs_sector, wp_sector;
> +	int i, ret;
> +	unsigned int zone_segno;
> +	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> +
> +	/* get the device the curseg points to */
> +	cs_block = START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff;
> +	for (i = 0; i < MAX_DEVICES; i++) {
> +		if (!c.devices[i].path)
> +			break;
> +		if (c.devices[i].start_blkaddr <= cs_block &&
> +		    cs_block <= c.devices[i].end_blkaddr)
> +			break;
> +	}
> +
> +	if (i >= MAX_DEVICES)
> +		return -EINVAL;
> +
> +	/* get write pointer position of the zone the curseg points to */
> +	cs_sector = (cs_block - c.devices[i].start_blkaddr)
> +		<< log_sectors_per_block;
> +	ret = f2fs_report_zone(i, cs_sector, &blkz);
> +	if (ret)
> +		return ret;
> +
> +	if (blk_zone_type(&blkz) != BLK_ZONE_TYPE_SEQWRITE_REQ)
> +		return 0;
> +
> +	/* check consistency between the curseg and the write pointer */
> +	wp_block = c.devices[i].start_blkaddr +
> +		(blk_zone_wp_sector(&blkz) >> log_sectors_per_block);
> +	wp_sector = blk_zone_wp_sector(&blkz);
> +
> +	if (cs_sector == wp_sector)
> +		return 0;
> +
> +	if (cs_sector > wp_sector) {
> +		MSG(0, "Inconsistent write pointer with curseg %d: "
> +		    "curseg %d[0x%x,0x%x] > wp[0x%x,0x%x]\n",
> +		    type, type, curseg->segno, curseg->next_blkoff,
> +		    GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
> +		fsck->chk.wp_inconsistent_zones++;
> +		return -EINVAL;
> +	}
> +
> +	MSG(0, "Write pointer goes advance from curseg %d: "
> +	    "curseg %d[0x%x,0x%x] wp[0x%x,0x%x]\n",
> +	    type, type, curseg->segno, curseg->next_blkoff,
> +	    GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
> +
> +	zone_segno = GET_SEG_FROM_SEC(sbi,
> +				      GET_SEC_FROM_SEG(sbi, curseg->segno));
> +	zone_last_vblock = START_BLOCK(sbi, zone_segno) +
> +		last_vblk_off_in_zone(sbi, zone_segno);
> +
> +	/*
> +	 * If fsync data exists between the curseg and the last valid block,
> +	 * it is not an error to fix. Leave it for kernel to recover later.
> +	 */
> +	if (cs_block <= zone_last_vblock) {

According to above comments, should condition be?

if (cs_block < zone_last_vblock && zone_last_vblock <= wp_block)

> +		MSG(0, "Curseg has fsync data: curseg %d[0x%x,0x%x] "
> +		    "last valid block in zone[0x%x,0x%x]\n",
> +		    type, curseg->segno, curseg->next_blkoff,
> +		    GET_SEGNO(sbi, zone_last_vblock),
> +		    OFFSET_IN_SEG(sbi, zone_last_vblock));
> +		return 0;
> +	}
> +
> +	fsck->chk.wp_inconsistent_zones++;
> +	return -EINVAL;
> +}
> +
> +#else
> +
> +static int check_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
> +{
> +	return 0;
> +}
> +
> +#endif
> +
>  int check_curseg_offset(struct f2fs_sb_info *sbi, int type)
>  {
>  	struct curseg_info *curseg = CURSEG_I(sbi, type);
> @@ -2209,6 +2328,10 @@ int check_curseg_offset(struct f2fs_sb_info *sbi, int type)
>  			return -EINVAL;
>  		}
>  	}
> +
> +	if (c.zoned_model == F2FS_ZONED_HM)
> +		return check_curseg_write_pointer(sbi, type);
> +
>  	return 0;
>  }
>  
> @@ -2628,6 +2751,23 @@ out:
>  	return cnt;
>  }
>  
> +/*
> + * Check and fix consistency with write pointers at the beginning of
> + * fsck so that following writes by fsck do not fail.
> + */
> +void fsck_chk_and_fix_write_pointers(struct f2fs_sb_info *sbi)
> +{
> +	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
> +
> +	if (c.zoned_model != F2FS_ZONED_HM)
> +		return;
> +
> +	if (check_curseg_offsets(sbi) && c.fix_on) {
> +		fix_curseg_info(sbi);
> +		fsck->chk.wp_fixed = 1;
> +	}
> +}
> +
>  int fsck_chk_curseg_info(struct f2fs_sb_info *sbi)
>  {
>  	struct curseg_info *curseg;
> @@ -2678,6 +2818,20 @@ int fsck_verify(struct f2fs_sb_info *sbi)
>  
>  	printf("\n");
>  
> +	if (c.zoned_model == F2FS_ZONED_HM) {
> +		printf("[FSCK] Write pointers consistency                    ");
> +		if (fsck->chk.wp_inconsistent_zones == 0x0) {
> +			printf(" [Ok..]\n");
> +		} else {
> +			printf(" [Fail] [0x%x]\n",
> +			       fsck->chk.wp_inconsistent_zones);
> +			c.bug_on = 1;
> +		}
> +
> +		if (fsck->chk.wp_fixed && c.fix_on)
> +			force = 1;
> +	}
> +
>  	if (c.feature & cpu_to_le32(F2FS_FEATURE_LOST_FOUND)) {
>  		for (i = 0; i < fsck->nr_nat_entries; i++)
>  			if (f2fs_test_bit(i, fsck->nat_area_bitmap) != 0)
> diff --git a/fsck/fsck.h b/fsck/fsck.h
> index 75052d8..c4432e8 100644
> --- a/fsck/fsck.h
> +++ b/fsck/fsck.h
> @@ -80,6 +80,8 @@ struct f2fs_fsck {
>  		u32 multi_hard_link_files;
>  		u64 sit_valid_blocks;
>  		u32 sit_free_segs;
> +		u32 wp_fixed;
> +		u32 wp_inconsistent_zones;
>  	} chk;
>  
>  	struct hard_link_node *hard_link_list_head;
> @@ -162,6 +164,7 @@ int fsck_chk_inline_dentries(struct f2fs_sb_info *, struct f2fs_node *,
>  		struct child_info *);
>  void fsck_chk_checkpoint(struct f2fs_sb_info *sbi);
>  int fsck_chk_meta(struct f2fs_sb_info *sbi);
> +void fsck_chk_and_fix_write_pointers(struct f2fs_sb_info *);
>  int fsck_chk_curseg_info(struct f2fs_sb_info *);
>  void pretty_print_filename(const u8 *raw_name, u32 len,
>  			   char out[F2FS_PRINT_NAMELEN], int enc_name);
> diff --git a/fsck/main.c b/fsck/main.c
> index 8c62a14..9a7d499 100644
> --- a/fsck/main.c
> +++ b/fsck/main.c
> @@ -602,6 +602,8 @@ static void do_fsck(struct f2fs_sb_info *sbi)
>  
>  	print_cp_state(flag);
>  
> +	fsck_chk_and_fix_write_pointers(sbi);
> +
>  	fsck_chk_curseg_info(sbi);
>  
>  	if (!c.fix_on && !c.bug_on) {
> diff --git a/fsck/mount.c b/fsck/mount.c
> index 2979865..5085e6c 100644
> --- a/fsck/mount.c
> +++ b/fsck/mount.c
> @@ -2465,6 +2465,52 @@ void set_section_type(struct f2fs_sb_info *sbi, unsigned int segno, int type)
>  	}
>  }
>  
> +#ifdef HAVE_LINUX_BLKZONED_H
> +
> +static bool write_pointer_at_zone_start(struct f2fs_sb_info *sbi,
> +					unsigned int zone_segno)
> +{
> +	u_int64_t sector;
> +	struct blk_zone blkz;
> +	block_t block = START_BLOCK(sbi, zone_segno);
> +	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> +	int ret, j;
> +
> +	if (c.zoned_model != F2FS_ZONED_HM)
> +		return true;
> +
> +	for (j = 0; j < MAX_DEVICES; j++) {
> +		if (!c.devices[j].path)
> +			break;
> +		if (c.devices[j].start_blkaddr <= block &&
> +		    block <= c.devices[j].end_blkaddr)
> +			break;
> +	}
> +
> +	if (j >= MAX_DEVICES)
> +		return false;
> +
> +	sector = (block - c.devices[j].start_blkaddr) << log_sectors_per_block;
> +	ret = f2fs_report_zone(j, sector, &blkz);
> +	if (ret)
> +		return false;
> +
> +	if (blk_zone_type(&blkz) != BLK_ZONE_TYPE_SEQWRITE_REQ)
> +		return true;
> +
> +	return blk_zone_sector(&blkz) == blk_zone_wp_sector(&blkz);
> +}
> +
> +#else
> +
> +static bool write_pointer_at_zone_start(struct f2fs_sb_info *sbi,
> +					unsigned int zone_segno)
> +{
> +	return true;
> +}
> +
> +#endif
> +
>  int find_next_free_block(struct f2fs_sb_info *sbi, u64 *to, int left, int want_type, bool new_sec)
>  {
>  	struct f2fs_super_block *sb = F2FS_RAW_SUPER(sbi);
> @@ -2517,7 +2563,8 @@ int find_next_free_block(struct f2fs_sb_info *sbi, u64 *to, int left, int want_t
>  					break;
>  			}
>  
> -			if (i == sbi->segs_per_sec) {
> +			if (i == sbi->segs_per_sec &&
> +			    write_pointer_at_zone_start(sbi, segno)) {
>  				set_section_type(sbi, segno, want_type);
>  				return 0;
>  			}
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [f2fs-dev] [PATCH v6 8/8] fsck: Check write pointer consistency of non-open zones
  2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 8/8] fsck: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
@ 2019-11-05 11:32   ` Chao Yu
  2019-11-06  9:49     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 15+ messages in thread
From: Chao Yu @ 2019-11-05 11:32 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki, Jaegeuk Kim, linux-f2fs-devel; +Cc: Damien Le Moal

On 2019/10/28 14:55, Shin'ichiro Kawasaki wrote:
> To catch f2fs bug in write pointer handling code for zoned block devices,
> have fsck check consistency of write pointers of non-open zones, that
> current segments do not point to. Check two items comparing write pointer
> positions with valid block maps in SIT.
> 
> The first item is check for zones with no valid blocks. When there is no
> valid blocks in a zone, the write pointer should be at the start of the
> zone. If not, next write operation to the zone will cause unaligned write
> error. If write pointer is not at the zone start, reset the zone to move
> the write pointer to the zone start.
> 
> The second item is check between write pointer position and the last
> valid block in the zone. It is unexpected that the last valid block
> position is beyond the write pointer. In such a case, report as the bug.
> Fix is not required for such zone, because the zone is not selected for
> next write operation until the zone get discarded.
> 
> In the same manner as the consistency check for current segments, do the
> check and fix twice: at the beginning of do_fsck() to avoid unaligned
> write error during fsck, and at fsck_verify() to reflect meta data
> updates by fsck.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  fsck/fsck.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 119 insertions(+)
> 
> diff --git a/fsck/fsck.c b/fsck/fsck.c
> index e0eda4e..8400929 100644
> --- a/fsck/fsck.c
> +++ b/fsck/fsck.c
> @@ -2751,6 +2751,122 @@ out:
>  	return cnt;
>  }
>  
> +#ifdef HAVE_LINUX_BLKZONED_H
> +
> +struct write_pointer_check_data {
> +	struct f2fs_sb_info *sbi;
> +	int dev_index;
> +};
> +
> +static int chk_and_fix_wp_with_sit(int i, void *blkzone, void *opaque)
> +{
> +	struct blk_zone *blkz = (struct blk_zone *)blkzone;
> +	struct write_pointer_check_data *wpd = opaque;
> +	struct f2fs_sb_info *sbi = wpd->sbi;
> +	struct device_info *dev = c.devices + wpd->dev_index;
> +	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
> +	block_t zone_block, wp_block, wp_blkoff;
> +	unsigned int zone_segno, wp_segno;
> +	struct curseg_info *cs;
> +	int cs_index, ret, last_valid_blkoff;
> +	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> +	unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
> +
> +	if (blk_zone_conv(blkz))
> +		return 0;
> +
> +	zone_block = dev->start_blkaddr
> +		+ (blk_zone_sector(blkz) >> log_sectors_per_block);
> +	zone_segno = GET_SEGNO(sbi, zone_block);
> +	if (zone_segno >= MAIN_SEGS(sbi))
> +		return 0;
> +
> +	wp_block = dev->start_blkaddr
> +		+ (blk_zone_wp_sector(blkz) >> log_sectors_per_block);
> +	wp_segno = GET_SEGNO(sbi, wp_block);
> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> +
> +	/* if a curseg points to the zone, skip the check */
> +	for (cs_index = 0; cs_index < NO_CHECK_TYPE; cs_index++) {
> +		cs = &SM_I(sbi)->curseg_array[cs_index];
> +		if (zone_segno <= cs->segno &&
> +		    cs->segno < zone_segno + segs_per_zone)
> +			return 0;
> +	}
> +
> +	last_valid_blkoff = last_vblk_off_in_zone(sbi, zone_segno);
> +
> +	/*
> +	 * When there is no valid block in the zone, check write pointer is
> +	 * at zone start. If not, reset the write pointer.
> +	 */
> +	if (last_valid_blkoff < 0 &&
> +	    blk_zone_wp_sector(blkz) != blk_zone_sector(blkz)) {
> +		if (!c.fix_on) {
> +			MSG(0, "Inconsistent write pointer: wp[0x%x,0x%x]\n",
> +			    wp_segno, wp_blkoff);
> +			fsck->chk.wp_inconsistent_zones++;
> +			return 0;
> +		}
> +
> +		FIX_MSG("Reset write pointer of zone at segment 0x%x",
> +			zone_segno);
> +		ret = f2fs_reset_zone(wpd->dev_index, blkz);
> +		if (ret) {
> +			printf("[FSCK] Write pointer reset failed: %s\n",
> +			       dev->path);
> +			return ret;
> +		}
> +		fsck->chk.wp_fixed = 1;
> +		return 0;
> +	}
> +
> +	/*
> +	 * If valid blocks exist in the zone beyond the write pointer, it
> +	 * is a f2fs bug. No need to fix because the zone is not selected

Minor thing, mostly probably it is a f2fs bug, however there should be
software/hardware bug in other layer can cause such inconsistent.. so let's get
rid of such first impression. :)

Reviewed-by: Chao Yu <yuchao0@huawei.com>

Thanks,

> +	 * for the write. Just report it.
> +	 */
> +	if (last_valid_blkoff + zone_block > wp_block) {
> +		MSG(0, "Unexpected invalid write pointer: wp[0x%x,0x%x]\n",
> +		    wp_segno, wp_blkoff);
> +		return 0;
> +	}
> +
> +	return 0;
> +}
> +
> +static void fix_wp_sit_alignment(struct f2fs_sb_info *sbi)
> +{
> +	unsigned int i;
> +	struct write_pointer_check_data wpd = {	sbi, 0 };
> +
> +	if (c.zoned_model != F2FS_ZONED_HM)
> +		return;
> +
> +	for (i = 0; i < MAX_DEVICES; i++) {
> +		if (!c.devices[i].path)
> +			break;
> +		if (c.devices[i].zoned_model != F2FS_ZONED_HM)
> +			break;
> +
> +		wpd.dev_index = i;
> +		if (f2fs_report_zones(i, chk_and_fix_wp_with_sit, &wpd)) {
> +			printf("[FSCK] Write pointer check failed: %s\n",
> +			       c.devices[i].path);
> +			return;
> +		}
> +	}
> +}
> +
> +#else
> +
> +static void fix_wp_sit_alignment(struct f2fs_sb_info *sbi)
> +{
> +	return;
> +}
> +
> +#endif
> +
>  /*
>   * Check and fix consistency with write pointers at the beginning of
>   * fsck so that following writes by fsck do not fail.
> @@ -2766,6 +2882,8 @@ void fsck_chk_and_fix_write_pointers(struct f2fs_sb_info *sbi)
>  		fix_curseg_info(sbi);
>  		fsck->chk.wp_fixed = 1;
>  	}
> +
> +	fix_wp_sit_alignment(sbi);
>  }
>  
>  int fsck_chk_curseg_info(struct f2fs_sb_info *sbi)
> @@ -2984,6 +3102,7 @@ int fsck_verify(struct f2fs_sb_info *sbi)
>  			fix_hard_links(sbi);
>  			fix_nat_entries(sbi);
>  			rewrite_sit_area_bitmap(sbi);
> +			fix_wp_sit_alignment(sbi);
>  			fix_curseg_info(sbi);
>  			fix_checksum(sbi);
>  			fix_checkpoints(sbi);
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [f2fs-dev] [PATCH v6 7/8] fsck: Check write pointer consistency of open zones
  2019-11-05 11:06   ` Chao Yu
@ 2019-11-06  9:45     ` Shinichiro Kawasaki
  2019-11-11  3:14       ` Chao Yu
  0 siblings, 1 reply; 15+ messages in thread
From: Shinichiro Kawasaki @ 2019-11-06  9:45 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Nov 05, 2019 / 19:06, Chao Yu wrote:
> On 2019/10/28 14:55, Shin'ichiro Kawasaki wrote:
> > On sudden f2fs shutdown, write pointers of zoned block devices can go
> > further but f2fs meta data keeps current segments at positions before the
> > write operations. After remounting the f2fs, this inconsistency causes
> > write operations not at write pointers and "Unaligned write command"
> > error is reported.
> > 
> > To avoid the error, have f2fs.fsck check consistency of write pointers
> > of open zones that current segments point to. Compare each current
> > segment's position and the write pointer position of the open zone. If
> > inconsistency is found and 'fix_on' flag is set, assign a new zone to the
> > current segment and check the newly assigned zone has write pointer at
> > the zone start. Leave the original zone as is to keep data recorded in
> > it.
> > 
> > To care about fsync data, refer each seg_entry's ckpt_valid_map to get
> > the last valid block in the zone. If the last valid block is beyond the
> > current segments position, fsync data exits in the zone. In case fsync
> > data exists, do not assign a new zone to the current segment not to lose
> > the fsync data. It is expected that the kernel replay the fsync data and
> > fix the write pointer inconsistency at mount time.
> > 
> > Also check consistency between write pointer of the zone the current
> > segment points to with valid block maps of the zone. If the last valid
> > block is beyond the write pointer position, report to indicate f2fs bug.
> > If 'fix_on' flag is set, assign a new zone to the current segment.
> > 
> > When inconsistencies are found, turn on 'bug_on' flag in fsck_verify() to
> > ask users to fix them or not. When inconsistencies get fixed, turn on
> > 'force' flag in fsck_verify() to enforce fixes in following checks.
> > 
> > This check and fix is done twice. The first is done at the beginning of
> > do_fsck() function so that other fixes can reflect the current segment
> > modification. The second is done in fsck_verify() to reflect updated meta
> > data by other fixes.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >  fsck/f2fs.h  |   5 ++
> >  fsck/fsck.c  | 154 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  fsck/fsck.h  |   3 +
> >  fsck/main.c  |   2 +
> >  fsck/mount.c |  49 +++++++++++++++-
> >  5 files changed, 212 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fsck/f2fs.h b/fsck/f2fs.h
> > index 399c74d..07513cb 100644
> > --- a/fsck/f2fs.h
> > +++ b/fsck/f2fs.h
> > @@ -429,6 +429,11 @@ static inline block_t __end_block_addr(struct f2fs_sb_info *sbi)
> >  #define GET_BLKOFF_FROM_SEG0(sbi, blk_addr)				\
> >  	(GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & (sbi->blocks_per_seg - 1))
> >  
> > +#define GET_SEC_FROM_SEG(sbi, segno)					\
> > +	((segno) / (sbi)->segs_per_sec)
> > +#define GET_SEG_FROM_SEC(sbi, secno)					\
> > +	((secno) * (sbi)->segs_per_sec)
> > +
> >  #define FREE_I_START_SEGNO(sbi)						\
> >  	GET_SEGNO_FROM_SEG0(sbi, SM_I(sbi)->main_blkaddr)
> >  #define GET_R2L_SEGNO(sbi, segno)	(segno + FREE_I_START_SEGNO(sbi))
> > diff --git a/fsck/fsck.c b/fsck/fsck.c
> > index 2ae3bd5..e0eda4e 100644
> > --- a/fsck/fsck.c
> > +++ b/fsck/fsck.c
> > @@ -2181,6 +2181,125 @@ static void fix_checkpoints(struct f2fs_sb_info *sbi)
> >  	fix_checkpoint(sbi);
> >  }
> >  
> > +#ifdef HAVE_LINUX_BLKZONED_H
> > +
> > +/*
> > + * Refer valid block map and return offset of the last valid block in the zone.
> > + * Obtain valid block map from SIT and fsync data.
> > + * If there is no valid block in the zone, return -1.
> > + */
> > +static int last_vblk_off_in_zone(struct f2fs_sb_info *sbi,
> > +				 unsigned int zone_segno)
> > +{
> > +	unsigned int s;
> > +	unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
> > +	struct seg_entry *se;
> > +	block_t b;
> > +	int ret = -1;
> > +
> > +	for (s = 0; s < segs_per_zone; s++) {
> > +		se = get_seg_entry(sbi, zone_segno + s);
> > +
> > +		/*
> > +		 * Refer not cur_valid_map but ckpt_valid_map which reflects
> > +		 * fsync data.
> > +		 */
> > +		ASSERT(se->ckpt_valid_map);
> > +		for (b = 0; b < sbi->blocks_per_seg; b++)
> > +			if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map))
> > +				ret = b + (s << sbi->log_blocks_per_seg);
> 
> Minor thing, I guess we only need to find last valid block in target zone?
> 
> int s, b;
> 
> for (s = segs_per_zone - 1; s >= 0; s--) {
> 	for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
> 		if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map)) {
> 			ret = b + (s << sbi->log_blocks_per_seg);
> 			break;
> 		}
> }

Yes, reveresed search is the better. Will modify the code as suggested.

> 
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static int check_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
> > +{
> > +	struct curseg_info *curseg = CURSEG_I(sbi, type);
> > +	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
> > +	struct blk_zone blkz;
> > +	block_t cs_block, wp_block, zone_last_vblock;
> > +	u_int64_t cs_sector, wp_sector;
> > +	int i, ret;
> > +	unsigned int zone_segno;
> > +	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> > +
> > +	/* get the device the curseg points to */
> > +	cs_block = START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff;
> > +	for (i = 0; i < MAX_DEVICES; i++) {
> > +		if (!c.devices[i].path)
> > +			break;
> > +		if (c.devices[i].start_blkaddr <= cs_block &&
> > +		    cs_block <= c.devices[i].end_blkaddr)
> > +			break;
> > +	}
> > +
> > +	if (i >= MAX_DEVICES)
> > +		return -EINVAL;
> > +
> > +	/* get write pointer position of the zone the curseg points to */
> > +	cs_sector = (cs_block - c.devices[i].start_blkaddr)
> > +		<< log_sectors_per_block;
> > +	ret = f2fs_report_zone(i, cs_sector, &blkz);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (blk_zone_type(&blkz) != BLK_ZONE_TYPE_SEQWRITE_REQ)
> > +		return 0;
> > +
> > +	/* check consistency between the curseg and the write pointer */
> > +	wp_block = c.devices[i].start_blkaddr +
> > +		(blk_zone_wp_sector(&blkz) >> log_sectors_per_block);
> > +	wp_sector = blk_zone_wp_sector(&blkz);
> > +
> > +	if (cs_sector == wp_sector)
> > +		return 0;
> > +
> > +	if (cs_sector > wp_sector) {
> > +		MSG(0, "Inconsistent write pointer with curseg %d: "
> > +		    "curseg %d[0x%x,0x%x] > wp[0x%x,0x%x]\n",
> > +		    type, type, curseg->segno, curseg->next_blkoff,
> > +		    GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
> > +		fsck->chk.wp_inconsistent_zones++;
> > +		return -EINVAL;
> > +	}
> > +
> > +	MSG(0, "Write pointer goes advance from curseg %d: "
> > +	    "curseg %d[0x%x,0x%x] wp[0x%x,0x%x]\n",
> > +	    type, type, curseg->segno, curseg->next_blkoff,
> > +	    GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
> > +
> > +	zone_segno = GET_SEG_FROM_SEC(sbi,
> > +				      GET_SEC_FROM_SEG(sbi, curseg->segno));
> > +	zone_last_vblock = START_BLOCK(sbi, zone_segno) +
> > +		last_vblk_off_in_zone(sbi, zone_segno);
> > +
> > +	/*
> > +	 * If fsync data exists between the curseg and the last valid block,
> > +	 * it is not an error to fix. Leave it for kernel to recover later.
> > +	 */
> > +	if (cs_block <= zone_last_vblock) {
> 
> According to above comments, should condition be?
> 
> if (cs_block < zone_last_vblock && zone_last_vblock <= wp_block)
>

To be precise, cs_block points to curseg->next_blkoff, which is the block
curseg will write in the next write request. Then, if cs_block equals to
zone_last_vblock, it means that the block curseg->next_blkoff points to
already have valid block and fsync data. Then, comparator between cs_block
and zone_last_vblock should be "<=".

I agree that it is the better to check zone_last_vblock with wp_block.
wp_block corresponds to the write pointer position that next write will be
made. It wp_block equals to zone_last_vblock, it means that unexpected data
is written beyond the write pointer. Then, comparator should be "<" between
zone_last_vblock and wp_block.

In short, I suggest the condition check below as the good one.

if (cs_block <= zone_last_vblock && zone_last_vblock < wp_block)

> > +		MSG(0, "Curseg has fsync data: curseg %d[0x%x,0x%x] "
> > +		    "last valid block in zone[0x%x,0x%x]\n",
> > +		    type, curseg->segno, curseg->next_blkoff,
> > +		    GET_SEGNO(sbi, zone_last_vblock),
> > +		    OFFSET_IN_SEG(sbi, zone_last_vblock));
> > +		return 0;
> > +	}
> > +
> > +	fsck->chk.wp_inconsistent_zones++;
> > +	return -EINVAL;
> > +}
> > +
> > +#else
> > +
> > +static int check_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
> > +{
> > +	return 0;
> > +}
> > +
> > +#endif
> > +
> >  int check_curseg_offset(struct f2fs_sb_info *sbi, int type)
> >  {
> >  	struct curseg_info *curseg = CURSEG_I(sbi, type);
> > @@ -2209,6 +2328,10 @@ int check_curseg_offset(struct f2fs_sb_info *sbi, int type)
> >  			return -EINVAL;
> >  		}
> >  	}
> > +
> > +	if (c.zoned_model == F2FS_ZONED_HM)
> > +		return check_curseg_write_pointer(sbi, type);
> > +
> >  	return 0;
> >  }
> >  
> > @@ -2628,6 +2751,23 @@ out:
> >  	return cnt;
> >  }
> >  
> > +/*
> > + * Check and fix consistency with write pointers at the beginning of
> > + * fsck so that following writes by fsck do not fail.
> > + */
> > +void fsck_chk_and_fix_write_pointers(struct f2fs_sb_info *sbi)
> > +{
> > +	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
> > +
> > +	if (c.zoned_model != F2FS_ZONED_HM)
> > +		return;
> > +
> > +	if (check_curseg_offsets(sbi) && c.fix_on) {
> > +		fix_curseg_info(sbi);
> > +		fsck->chk.wp_fixed = 1;
> > +	}
> > +}
> > +
> >  int fsck_chk_curseg_info(struct f2fs_sb_info *sbi)
> >  {
> >  	struct curseg_info *curseg;
> > @@ -2678,6 +2818,20 @@ int fsck_verify(struct f2fs_sb_info *sbi)
> >  
> >  	printf("\n");
> >  
> > +	if (c.zoned_model == F2FS_ZONED_HM) {
> > +		printf("[FSCK] Write pointers consistency                    ");
> > +		if (fsck->chk.wp_inconsistent_zones == 0x0) {
> > +			printf(" [Ok..]\n");
> > +		} else {
> > +			printf(" [Fail] [0x%x]\n",
> > +			       fsck->chk.wp_inconsistent_zones);
> > +			c.bug_on = 1;
> > +		}
> > +
> > +		if (fsck->chk.wp_fixed && c.fix_on)
> > +			force = 1;
> > +	}
> > +
> >  	if (c.feature & cpu_to_le32(F2FS_FEATURE_LOST_FOUND)) {
> >  		for (i = 0; i < fsck->nr_nat_entries; i++)
> >  			if (f2fs_test_bit(i, fsck->nat_area_bitmap) != 0)
> > diff --git a/fsck/fsck.h b/fsck/fsck.h
> > index 75052d8..c4432e8 100644
> > --- a/fsck/fsck.h
> > +++ b/fsck/fsck.h
> > @@ -80,6 +80,8 @@ struct f2fs_fsck {
> >  		u32 multi_hard_link_files;
> >  		u64 sit_valid_blocks;
> >  		u32 sit_free_segs;
> > +		u32 wp_fixed;
> > +		u32 wp_inconsistent_zones;
> >  	} chk;
> >  
> >  	struct hard_link_node *hard_link_list_head;
> > @@ -162,6 +164,7 @@ int fsck_chk_inline_dentries(struct f2fs_sb_info *, struct f2fs_node *,
> >  		struct child_info *);
> >  void fsck_chk_checkpoint(struct f2fs_sb_info *sbi);
> >  int fsck_chk_meta(struct f2fs_sb_info *sbi);
> > +void fsck_chk_and_fix_write_pointers(struct f2fs_sb_info *);
> >  int fsck_chk_curseg_info(struct f2fs_sb_info *);
> >  void pretty_print_filename(const u8 *raw_name, u32 len,
> >  			   char out[F2FS_PRINT_NAMELEN], int enc_name);
> > diff --git a/fsck/main.c b/fsck/main.c
> > index 8c62a14..9a7d499 100644
> > --- a/fsck/main.c
> > +++ b/fsck/main.c
> > @@ -602,6 +602,8 @@ static void do_fsck(struct f2fs_sb_info *sbi)
> >  
> >  	print_cp_state(flag);
> >  
> > +	fsck_chk_and_fix_write_pointers(sbi);
> > +
> >  	fsck_chk_curseg_info(sbi);
> >  
> >  	if (!c.fix_on && !c.bug_on) {
> > diff --git a/fsck/mount.c b/fsck/mount.c
> > index 2979865..5085e6c 100644
> > --- a/fsck/mount.c
> > +++ b/fsck/mount.c
> > @@ -2465,6 +2465,52 @@ void set_section_type(struct f2fs_sb_info *sbi, unsigned int segno, int type)
> >  	}
> >  }
> >  
> > +#ifdef HAVE_LINUX_BLKZONED_H
> > +
> > +static bool write_pointer_at_zone_start(struct f2fs_sb_info *sbi,
> > +					unsigned int zone_segno)
> > +{
> > +	u_int64_t sector;
> > +	struct blk_zone blkz;
> > +	block_t block = START_BLOCK(sbi, zone_segno);
> > +	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> > +	int ret, j;
> > +
> > +	if (c.zoned_model != F2FS_ZONED_HM)
> > +		return true;
> > +
> > +	for (j = 0; j < MAX_DEVICES; j++) {
> > +		if (!c.devices[j].path)
> > +			break;
> > +		if (c.devices[j].start_blkaddr <= block &&
> > +		    block <= c.devices[j].end_blkaddr)
> > +			break;
> > +	}
> > +
> > +	if (j >= MAX_DEVICES)
> > +		return false;
> > +
> > +	sector = (block - c.devices[j].start_blkaddr) << log_sectors_per_block;
> > +	ret = f2fs_report_zone(j, sector, &blkz);
> > +	if (ret)
> > +		return false;
> > +
> > +	if (blk_zone_type(&blkz) != BLK_ZONE_TYPE_SEQWRITE_REQ)
> > +		return true;
> > +
> > +	return blk_zone_sector(&blkz) == blk_zone_wp_sector(&blkz);
> > +}
> > +
> > +#else
> > +
> > +static bool write_pointer_at_zone_start(struct f2fs_sb_info *sbi,
> > +					unsigned int zone_segno)
> > +{
> > +	return true;
> > +}
> > +
> > +#endif
> > +
> >  int find_next_free_block(struct f2fs_sb_info *sbi, u64 *to, int left, int want_type, bool new_sec)
> >  {
> >  	struct f2fs_super_block *sb = F2FS_RAW_SUPER(sbi);
> > @@ -2517,7 +2563,8 @@ int find_next_free_block(struct f2fs_sb_info *sbi, u64 *to, int left, int want_t
> >  					break;
> >  			}
> >  
> > -			if (i == sbi->segs_per_sec) {
> > +			if (i == sbi->segs_per_sec &&
> > +			    write_pointer_at_zone_start(sbi, segno)) {
> >  				set_section_type(sbi, segno, want_type);
> >  				return 0;
> >  			}
> > 

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [f2fs-dev] [PATCH v6 8/8] fsck: Check write pointer consistency of non-open zones
  2019-11-05 11:32   ` Chao Yu
@ 2019-11-06  9:49     ` Shinichiro Kawasaki
  0 siblings, 0 replies; 15+ messages in thread
From: Shinichiro Kawasaki @ 2019-11-06  9:49 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Nov 05, 2019 / 19:32, Chao Yu wrote:
> On 2019/10/28 14:55, Shin'ichiro Kawasaki wrote:
> > To catch f2fs bug in write pointer handling code for zoned block devices,
> > have fsck check consistency of write pointers of non-open zones, that
> > current segments do not point to. Check two items comparing write pointer
> > positions with valid block maps in SIT.
> > 
> > The first item is check for zones with no valid blocks. When there is no
> > valid blocks in a zone, the write pointer should be at the start of the
> > zone. If not, next write operation to the zone will cause unaligned write
> > error. If write pointer is not at the zone start, reset the zone to move
> > the write pointer to the zone start.
> > 
> > The second item is check between write pointer position and the last
> > valid block in the zone. It is unexpected that the last valid block
> > position is beyond the write pointer. In such a case, report as the bug.
> > Fix is not required for such zone, because the zone is not selected for
> > next write operation until the zone get discarded.
> > 
> > In the same manner as the consistency check for current segments, do the
> > check and fix twice: at the beginning of do_fsck() to avoid unaligned
> > write error during fsck, and at fsck_verify() to reflect meta data
> > updates by fsck.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >  fsck/fsck.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 119 insertions(+)
> > 
> > diff --git a/fsck/fsck.c b/fsck/fsck.c
> > index e0eda4e..8400929 100644
> > --- a/fsck/fsck.c
> > +++ b/fsck/fsck.c
> > @@ -2751,6 +2751,122 @@ out:
> >  	return cnt;
> >  }
> >  
> > +#ifdef HAVE_LINUX_BLKZONED_H
> > +
> > +struct write_pointer_check_data {
> > +	struct f2fs_sb_info *sbi;
> > +	int dev_index;
> > +};
> > +
> > +static int chk_and_fix_wp_with_sit(int i, void *blkzone, void *opaque)
> > +{
> > +	struct blk_zone *blkz = (struct blk_zone *)blkzone;
> > +	struct write_pointer_check_data *wpd = opaque;
> > +	struct f2fs_sb_info *sbi = wpd->sbi;
> > +	struct device_info *dev = c.devices + wpd->dev_index;
> > +	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
> > +	block_t zone_block, wp_block, wp_blkoff;
> > +	unsigned int zone_segno, wp_segno;
> > +	struct curseg_info *cs;
> > +	int cs_index, ret, last_valid_blkoff;
> > +	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> > +	unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
> > +
> > +	if (blk_zone_conv(blkz))
> > +		return 0;
> > +
> > +	zone_block = dev->start_blkaddr
> > +		+ (blk_zone_sector(blkz) >> log_sectors_per_block);
> > +	zone_segno = GET_SEGNO(sbi, zone_block);
> > +	if (zone_segno >= MAIN_SEGS(sbi))
> > +		return 0;
> > +
> > +	wp_block = dev->start_blkaddr
> > +		+ (blk_zone_wp_sector(blkz) >> log_sectors_per_block);
> > +	wp_segno = GET_SEGNO(sbi, wp_block);
> > +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> > +
> > +	/* if a curseg points to the zone, skip the check */
> > +	for (cs_index = 0; cs_index < NO_CHECK_TYPE; cs_index++) {
> > +		cs = &SM_I(sbi)->curseg_array[cs_index];
> > +		if (zone_segno <= cs->segno &&
> > +		    cs->segno < zone_segno + segs_per_zone)
> > +			return 0;
> > +	}
> > +
> > +	last_valid_blkoff = last_vblk_off_in_zone(sbi, zone_segno);
> > +
> > +	/*
> > +	 * When there is no valid block in the zone, check write pointer is
> > +	 * at zone start. If not, reset the write pointer.
> > +	 */
> > +	if (last_valid_blkoff < 0 &&
> > +	    blk_zone_wp_sector(blkz) != blk_zone_sector(blkz)) {
> > +		if (!c.fix_on) {
> > +			MSG(0, "Inconsistent write pointer: wp[0x%x,0x%x]\n",
> > +			    wp_segno, wp_blkoff);
> > +			fsck->chk.wp_inconsistent_zones++;
> > +			return 0;
> > +		}
> > +
> > +		FIX_MSG("Reset write pointer of zone at segment 0x%x",
> > +			zone_segno);
> > +		ret = f2fs_reset_zone(wpd->dev_index, blkz);
> > +		if (ret) {
> > +			printf("[FSCK] Write pointer reset failed: %s\n",
> > +			       dev->path);
> > +			return ret;
> > +		}
> > +		fsck->chk.wp_fixed = 1;
> > +		return 0;
> > +	}
> > +
> > +	/*
> > +	 * If valid blocks exist in the zone beyond the write pointer, it
> > +	 * is a f2fs bug. No need to fix because the zone is not selected
> 
> Minor thing, mostly probably it is a f2fs bug, however there should be
> software/hardware bug in other layer can cause such inconsistent.. so let's get
> rid of such first impression. :)
> 
> Reviewed-by: Chao Yu <yuchao0@huawei.com>
> 
> Thanks,

Ah, that is a stereotype. I think it's the better to remove the word "f2fs", as
follows. Will do that edit.

/*
 * If valid blocks exist in the zone beyond the write pointer, it
 * is a bug. No need to ...

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [f2fs-dev] [PATCH v6 7/8] fsck: Check write pointer consistency of open zones
  2019-11-06  9:45     ` Shinichiro Kawasaki
@ 2019-11-11  3:14       ` Chao Yu
  2019-11-13  1:44         ` Shinichiro Kawasaki
  0 siblings, 1 reply; 15+ messages in thread
From: Chao Yu @ 2019-11-11  3:14 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On 2019/11/6 17:45, Shinichiro Kawasaki wrote:
> On Nov 05, 2019 / 19:06, Chao Yu wrote:
>> On 2019/10/28 14:55, Shin'ichiro Kawasaki wrote:
>>> On sudden f2fs shutdown, write pointers of zoned block devices can go
>>> further but f2fs meta data keeps current segments at positions before the
>>> write operations. After remounting the f2fs, this inconsistency causes
>>> write operations not at write pointers and "Unaligned write command"
>>> error is reported.
>>>
>>> To avoid the error, have f2fs.fsck check consistency of write pointers
>>> of open zones that current segments point to. Compare each current
>>> segment's position and the write pointer position of the open zone. If
>>> inconsistency is found and 'fix_on' flag is set, assign a new zone to the
>>> current segment and check the newly assigned zone has write pointer at
>>> the zone start. Leave the original zone as is to keep data recorded in
>>> it.
>>>
>>> To care about fsync data, refer each seg_entry's ckpt_valid_map to get
>>> the last valid block in the zone. If the last valid block is beyond the
>>> current segments position, fsync data exits in the zone. In case fsync
>>> data exists, do not assign a new zone to the current segment not to lose
>>> the fsync data. It is expected that the kernel replay the fsync data and
>>> fix the write pointer inconsistency at mount time.
>>>
>>> Also check consistency between write pointer of the zone the current
>>> segment points to with valid block maps of the zone. If the last valid
>>> block is beyond the write pointer position, report to indicate f2fs bug.
>>> If 'fix_on' flag is set, assign a new zone to the current segment.
>>>
>>> When inconsistencies are found, turn on 'bug_on' flag in fsck_verify() to
>>> ask users to fix them or not. When inconsistencies get fixed, turn on
>>> 'force' flag in fsck_verify() to enforce fixes in following checks.
>>>
>>> This check and fix is done twice. The first is done at the beginning of
>>> do_fsck() function so that other fixes can reflect the current segment
>>> modification. The second is done in fsck_verify() to reflect updated meta
>>> data by other fixes.
>>>
>>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>> ---
>>>  fsck/f2fs.h  |   5 ++
>>>  fsck/fsck.c  | 154 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  fsck/fsck.h  |   3 +
>>>  fsck/main.c  |   2 +
>>>  fsck/mount.c |  49 +++++++++++++++-
>>>  5 files changed, 212 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fsck/f2fs.h b/fsck/f2fs.h
>>> index 399c74d..07513cb 100644
>>> --- a/fsck/f2fs.h
>>> +++ b/fsck/f2fs.h
>>> @@ -429,6 +429,11 @@ static inline block_t __end_block_addr(struct f2fs_sb_info *sbi)
>>>  #define GET_BLKOFF_FROM_SEG0(sbi, blk_addr)				\
>>>  	(GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & (sbi->blocks_per_seg - 1))
>>>  
>>> +#define GET_SEC_FROM_SEG(sbi, segno)					\
>>> +	((segno) / (sbi)->segs_per_sec)
>>> +#define GET_SEG_FROM_SEC(sbi, secno)					\
>>> +	((secno) * (sbi)->segs_per_sec)
>>> +
>>>  #define FREE_I_START_SEGNO(sbi)						\
>>>  	GET_SEGNO_FROM_SEG0(sbi, SM_I(sbi)->main_blkaddr)
>>>  #define GET_R2L_SEGNO(sbi, segno)	(segno + FREE_I_START_SEGNO(sbi))
>>> diff --git a/fsck/fsck.c b/fsck/fsck.c
>>> index 2ae3bd5..e0eda4e 100644
>>> --- a/fsck/fsck.c
>>> +++ b/fsck/fsck.c
>>> @@ -2181,6 +2181,125 @@ static void fix_checkpoints(struct f2fs_sb_info *sbi)
>>>  	fix_checkpoint(sbi);
>>>  }
>>>  
>>> +#ifdef HAVE_LINUX_BLKZONED_H
>>> +
>>> +/*
>>> + * Refer valid block map and return offset of the last valid block in the zone.
>>> + * Obtain valid block map from SIT and fsync data.
>>> + * If there is no valid block in the zone, return -1.
>>> + */
>>> +static int last_vblk_off_in_zone(struct f2fs_sb_info *sbi,
>>> +				 unsigned int zone_segno)
>>> +{
>>> +	unsigned int s;
>>> +	unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
>>> +	struct seg_entry *se;
>>> +	block_t b;
>>> +	int ret = -1;
>>> +
>>> +	for (s = 0; s < segs_per_zone; s++) {
>>> +		se = get_seg_entry(sbi, zone_segno + s);
>>> +
>>> +		/*
>>> +		 * Refer not cur_valid_map but ckpt_valid_map which reflects
>>> +		 * fsync data.
>>> +		 */
>>> +		ASSERT(se->ckpt_valid_map);
>>> +		for (b = 0; b < sbi->blocks_per_seg; b++)
>>> +			if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map))
>>> +				ret = b + (s << sbi->log_blocks_per_seg);
>>
>> Minor thing, I guess we only need to find last valid block in target zone?
>>
>> int s, b;
>>
>> for (s = segs_per_zone - 1; s >= 0; s--) {
>> 	for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
>> 		if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map)) {
>> 			ret = b + (s << sbi->log_blocks_per_seg);
>> 			break;
>> 		}
>> }
> 
> Yes, reveresed search is the better. Will modify the code as suggested.
> 
>>
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static int check_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
>>> +{
>>> +	struct curseg_info *curseg = CURSEG_I(sbi, type);
>>> +	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
>>> +	struct blk_zone blkz;
>>> +	block_t cs_block, wp_block, zone_last_vblock;
>>> +	u_int64_t cs_sector, wp_sector;
>>> +	int i, ret;
>>> +	unsigned int zone_segno;
>>> +	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
>>> +
>>> +	/* get the device the curseg points to */
>>> +	cs_block = START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff;
>>> +	for (i = 0; i < MAX_DEVICES; i++) {
>>> +		if (!c.devices[i].path)
>>> +			break;
>>> +		if (c.devices[i].start_blkaddr <= cs_block &&
>>> +		    cs_block <= c.devices[i].end_blkaddr)
>>> +			break;
>>> +	}
>>> +
>>> +	if (i >= MAX_DEVICES)
>>> +		return -EINVAL;
>>> +
>>> +	/* get write pointer position of the zone the curseg points to */
>>> +	cs_sector = (cs_block - c.devices[i].start_blkaddr)
>>> +		<< log_sectors_per_block;
>>> +	ret = f2fs_report_zone(i, cs_sector, &blkz);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	if (blk_zone_type(&blkz) != BLK_ZONE_TYPE_SEQWRITE_REQ)
>>> +		return 0;
>>> +
>>> +	/* check consistency between the curseg and the write pointer */
>>> +	wp_block = c.devices[i].start_blkaddr +
>>> +		(blk_zone_wp_sector(&blkz) >> log_sectors_per_block);
>>> +	wp_sector = blk_zone_wp_sector(&blkz);
>>> +
>>> +	if (cs_sector == wp_sector)
>>> +		return 0;
>>> +
>>> +	if (cs_sector > wp_sector) {
>>> +		MSG(0, "Inconsistent write pointer with curseg %d: "
>>> +		    "curseg %d[0x%x,0x%x] > wp[0x%x,0x%x]\n",
>>> +		    type, type, curseg->segno, curseg->next_blkoff,
>>> +		    GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
>>> +		fsck->chk.wp_inconsistent_zones++;
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	MSG(0, "Write pointer goes advance from curseg %d: "
>>> +	    "curseg %d[0x%x,0x%x] wp[0x%x,0x%x]\n",
>>> +	    type, type, curseg->segno, curseg->next_blkoff,
>>> +	    GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
>>> +
>>> +	zone_segno = GET_SEG_FROM_SEC(sbi,
>>> +				      GET_SEC_FROM_SEG(sbi, curseg->segno));
>>> +	zone_last_vblock = START_BLOCK(sbi, zone_segno) +
>>> +		last_vblk_off_in_zone(sbi, zone_segno);
>>> +
>>> +	/*
>>> +	 * If fsync data exists between the curseg and the last valid block,
>>> +	 * it is not an error to fix. Leave it for kernel to recover later.
>>> +	 */
>>> +	if (cs_block <= zone_last_vblock) {
>>
>> According to above comments, should condition be?
>>
>> if (cs_block < zone_last_vblock && zone_last_vblock <= wp_block)
>>
> 
> To be precise, cs_block points to curseg->next_blkoff, which is the block
> curseg will write in the next write request. Then, if cs_block equals to
> zone_last_vblock, it means that the block curseg->next_blkoff points to
> already have valid block and fsync data. Then, comparator between cs_block
> and zone_last_vblock should be "<=".

You're right.

> 
> I agree that it is the better to check zone_last_vblock with wp_block.
> wp_block corresponds to the write pointer position that next write will be
> made. It wp_block equals to zone_last_vblock, it means that unexpected data
> is written beyond the write pointer. Then, comparator should be "<" between
> zone_last_vblock and wp_block.

Oh, so wp has almost the same meaning to .next_blkoff in f2fs, it points to next
free block/sector. I will keep that in mind.

> 
> In short, I suggest the condition check below as the good one.
> 
> if (cs_block <= zone_last_vblock && zone_last_vblock < wp_block)

It's fine to me. :)

Thanks,

> 
>>> +		MSG(0, "Curseg has fsync data: curseg %d[0x%x,0x%x] "
>>> +		    "last valid block in zone[0x%x,0x%x]\n",
>>> +		    type, curseg->segno, curseg->next_blkoff,
>>> +		    GET_SEGNO(sbi, zone_last_vblock),
>>> +		    OFFSET_IN_SEG(sbi, zone_last_vblock));
>>> +		return 0;
>>> +	}
>>> +
>>> +	fsck->chk.wp_inconsistent_zones++;
>>> +	return -EINVAL;
>>> +}
>>> +
>>> +#else
>>> +
>>> +static int check_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
>>> +{
>>> +	return 0;
>>> +}
>>> +
>>> +#endif
>>> +
>>>  int check_curseg_offset(struct f2fs_sb_info *sbi, int type)
>>>  {
>>>  	struct curseg_info *curseg = CURSEG_I(sbi, type);
>>> @@ -2209,6 +2328,10 @@ int check_curseg_offset(struct f2fs_sb_info *sbi, int type)
>>>  			return -EINVAL;
>>>  		}
>>>  	}
>>> +
>>> +	if (c.zoned_model == F2FS_ZONED_HM)
>>> +		return check_curseg_write_pointer(sbi, type);
>>> +
>>>  	return 0;
>>>  }
>>>  
>>> @@ -2628,6 +2751,23 @@ out:
>>>  	return cnt;
>>>  }
>>>  
>>> +/*
>>> + * Check and fix consistency with write pointers at the beginning of
>>> + * fsck so that following writes by fsck do not fail.
>>> + */
>>> +void fsck_chk_and_fix_write_pointers(struct f2fs_sb_info *sbi)
>>> +{
>>> +	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
>>> +
>>> +	if (c.zoned_model != F2FS_ZONED_HM)
>>> +		return;
>>> +
>>> +	if (check_curseg_offsets(sbi) && c.fix_on) {
>>> +		fix_curseg_info(sbi);
>>> +		fsck->chk.wp_fixed = 1;
>>> +	}
>>> +}
>>> +
>>>  int fsck_chk_curseg_info(struct f2fs_sb_info *sbi)
>>>  {
>>>  	struct curseg_info *curseg;
>>> @@ -2678,6 +2818,20 @@ int fsck_verify(struct f2fs_sb_info *sbi)
>>>  
>>>  	printf("\n");
>>>  
>>> +	if (c.zoned_model == F2FS_ZONED_HM) {
>>> +		printf("[FSCK] Write pointers consistency                    ");
>>> +		if (fsck->chk.wp_inconsistent_zones == 0x0) {
>>> +			printf(" [Ok..]\n");
>>> +		} else {
>>> +			printf(" [Fail] [0x%x]\n",
>>> +			       fsck->chk.wp_inconsistent_zones);
>>> +			c.bug_on = 1;
>>> +		}
>>> +
>>> +		if (fsck->chk.wp_fixed && c.fix_on)
>>> +			force = 1;
>>> +	}
>>> +
>>>  	if (c.feature & cpu_to_le32(F2FS_FEATURE_LOST_FOUND)) {
>>>  		for (i = 0; i < fsck->nr_nat_entries; i++)
>>>  			if (f2fs_test_bit(i, fsck->nat_area_bitmap) != 0)
>>> diff --git a/fsck/fsck.h b/fsck/fsck.h
>>> index 75052d8..c4432e8 100644
>>> --- a/fsck/fsck.h
>>> +++ b/fsck/fsck.h
>>> @@ -80,6 +80,8 @@ struct f2fs_fsck {
>>>  		u32 multi_hard_link_files;
>>>  		u64 sit_valid_blocks;
>>>  		u32 sit_free_segs;
>>> +		u32 wp_fixed;
>>> +		u32 wp_inconsistent_zones;
>>>  	} chk;
>>>  
>>>  	struct hard_link_node *hard_link_list_head;
>>> @@ -162,6 +164,7 @@ int fsck_chk_inline_dentries(struct f2fs_sb_info *, struct f2fs_node *,
>>>  		struct child_info *);
>>>  void fsck_chk_checkpoint(struct f2fs_sb_info *sbi);
>>>  int fsck_chk_meta(struct f2fs_sb_info *sbi);
>>> +void fsck_chk_and_fix_write_pointers(struct f2fs_sb_info *);
>>>  int fsck_chk_curseg_info(struct f2fs_sb_info *);
>>>  void pretty_print_filename(const u8 *raw_name, u32 len,
>>>  			   char out[F2FS_PRINT_NAMELEN], int enc_name);
>>> diff --git a/fsck/main.c b/fsck/main.c
>>> index 8c62a14..9a7d499 100644
>>> --- a/fsck/main.c
>>> +++ b/fsck/main.c
>>> @@ -602,6 +602,8 @@ static void do_fsck(struct f2fs_sb_info *sbi)
>>>  
>>>  	print_cp_state(flag);
>>>  
>>> +	fsck_chk_and_fix_write_pointers(sbi);
>>> +
>>>  	fsck_chk_curseg_info(sbi);
>>>  
>>>  	if (!c.fix_on && !c.bug_on) {
>>> diff --git a/fsck/mount.c b/fsck/mount.c
>>> index 2979865..5085e6c 100644
>>> --- a/fsck/mount.c
>>> +++ b/fsck/mount.c
>>> @@ -2465,6 +2465,52 @@ void set_section_type(struct f2fs_sb_info *sbi, unsigned int segno, int type)
>>>  	}
>>>  }
>>>  
>>> +#ifdef HAVE_LINUX_BLKZONED_H
>>> +
>>> +static bool write_pointer_at_zone_start(struct f2fs_sb_info *sbi,
>>> +					unsigned int zone_segno)
>>> +{
>>> +	u_int64_t sector;
>>> +	struct blk_zone blkz;
>>> +	block_t block = START_BLOCK(sbi, zone_segno);
>>> +	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
>>> +	int ret, j;
>>> +
>>> +	if (c.zoned_model != F2FS_ZONED_HM)
>>> +		return true;
>>> +
>>> +	for (j = 0; j < MAX_DEVICES; j++) {
>>> +		if (!c.devices[j].path)
>>> +			break;
>>> +		if (c.devices[j].start_blkaddr <= block &&
>>> +		    block <= c.devices[j].end_blkaddr)
>>> +			break;
>>> +	}
>>> +
>>> +	if (j >= MAX_DEVICES)
>>> +		return false;
>>> +
>>> +	sector = (block - c.devices[j].start_blkaddr) << log_sectors_per_block;
>>> +	ret = f2fs_report_zone(j, sector, &blkz);
>>> +	if (ret)
>>> +		return false;
>>> +
>>> +	if (blk_zone_type(&blkz) != BLK_ZONE_TYPE_SEQWRITE_REQ)
>>> +		return true;
>>> +
>>> +	return blk_zone_sector(&blkz) == blk_zone_wp_sector(&blkz);
>>> +}
>>> +
>>> +#else
>>> +
>>> +static bool write_pointer_at_zone_start(struct f2fs_sb_info *sbi,
>>> +					unsigned int zone_segno)
>>> +{
>>> +	return true;
>>> +}
>>> +
>>> +#endif
>>> +
>>>  int find_next_free_block(struct f2fs_sb_info *sbi, u64 *to, int left, int want_type, bool new_sec)
>>>  {
>>>  	struct f2fs_super_block *sb = F2FS_RAW_SUPER(sbi);
>>> @@ -2517,7 +2563,8 @@ int find_next_free_block(struct f2fs_sb_info *sbi, u64 *to, int left, int want_t
>>>  					break;
>>>  			}
>>>  
>>> -			if (i == sbi->segs_per_sec) {
>>> +			if (i == sbi->segs_per_sec &&
>>> +			    write_pointer_at_zone_start(sbi, segno)) {
>>>  				set_section_type(sbi, segno, want_type);
>>>  				return 0;
>>>  			}
>>>
> 
> --
> Best Regards,
> Shin'ichiro Kawasaki.
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [f2fs-dev] [PATCH v6 7/8] fsck: Check write pointer consistency of open zones
  2019-11-11  3:14       ` Chao Yu
@ 2019-11-13  1:44         ` Shinichiro Kawasaki
  0 siblings, 0 replies; 15+ messages in thread
From: Shinichiro Kawasaki @ 2019-11-13  1:44 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Damien Le Moal, linux-f2fs-devel

On Nov 11, 2019 / 11:14, Chao Yu wrote:
> On 2019/11/6 17:45, Shinichiro Kawasaki wrote:
> > On Nov 05, 2019 / 19:06, Chao Yu wrote:
> >> On 2019/10/28 14:55, Shin'ichiro Kawasaki wrote:
> >>> On sudden f2fs shutdown, write pointers of zoned block devices can go
> >>> further but f2fs meta data keeps current segments at positions before the
> >>> write operations. After remounting the f2fs, this inconsistency causes
> >>> write operations not at write pointers and "Unaligned write command"
> >>> error is reported.
> >>>
> >>> To avoid the error, have f2fs.fsck check consistency of write pointers
> >>> of open zones that current segments point to. Compare each current
> >>> segment's position and the write pointer position of the open zone. If
> >>> inconsistency is found and 'fix_on' flag is set, assign a new zone to the
> >>> current segment and check the newly assigned zone has write pointer at
> >>> the zone start. Leave the original zone as is to keep data recorded in
> >>> it.
> >>>
> >>> To care about fsync data, refer each seg_entry's ckpt_valid_map to get
> >>> the last valid block in the zone. If the last valid block is beyond the
> >>> current segments position, fsync data exits in the zone. In case fsync
> >>> data exists, do not assign a new zone to the current segment not to lose
> >>> the fsync data. It is expected that the kernel replay the fsync data and
> >>> fix the write pointer inconsistency at mount time.
> >>>
> >>> Also check consistency between write pointer of the zone the current
> >>> segment points to with valid block maps of the zone. If the last valid
> >>> block is beyond the write pointer position, report to indicate f2fs bug.
> >>> If 'fix_on' flag is set, assign a new zone to the current segment.
> >>>
> >>> When inconsistencies are found, turn on 'bug_on' flag in fsck_verify() to
> >>> ask users to fix them or not. When inconsistencies get fixed, turn on
> >>> 'force' flag in fsck_verify() to enforce fixes in following checks.
> >>>
> >>> This check and fix is done twice. The first is done at the beginning of
> >>> do_fsck() function so that other fixes can reflect the current segment
> >>> modification. The second is done in fsck_verify() to reflect updated meta
> >>> data by other fixes.
> >>>
> >>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >>> ---
> >>>  fsck/f2fs.h  |   5 ++
> >>>  fsck/fsck.c  | 154 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  fsck/fsck.h  |   3 +
> >>>  fsck/main.c  |   2 +
> >>>  fsck/mount.c |  49 +++++++++++++++-
> >>>  5 files changed, 212 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/fsck/f2fs.h b/fsck/f2fs.h
> >>> index 399c74d..07513cb 100644
> >>> --- a/fsck/f2fs.h
> >>> +++ b/fsck/f2fs.h
> >>> @@ -429,6 +429,11 @@ static inline block_t __end_block_addr(struct f2fs_sb_info *sbi)
> >>>  #define GET_BLKOFF_FROM_SEG0(sbi, blk_addr)				\
> >>>  	(GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & (sbi->blocks_per_seg - 1))
> >>>  
> >>> +#define GET_SEC_FROM_SEG(sbi, segno)					\
> >>> +	((segno) / (sbi)->segs_per_sec)
> >>> +#define GET_SEG_FROM_SEC(sbi, secno)					\
> >>> +	((secno) * (sbi)->segs_per_sec)
> >>> +
> >>>  #define FREE_I_START_SEGNO(sbi)						\
> >>>  	GET_SEGNO_FROM_SEG0(sbi, SM_I(sbi)->main_blkaddr)
> >>>  #define GET_R2L_SEGNO(sbi, segno)	(segno + FREE_I_START_SEGNO(sbi))
> >>> diff --git a/fsck/fsck.c b/fsck/fsck.c
> >>> index 2ae3bd5..e0eda4e 100644
> >>> --- a/fsck/fsck.c
> >>> +++ b/fsck/fsck.c
> >>> @@ -2181,6 +2181,125 @@ static void fix_checkpoints(struct f2fs_sb_info *sbi)
> >>>  	fix_checkpoint(sbi);
> >>>  }
> >>>  
> >>> +#ifdef HAVE_LINUX_BLKZONED_H
> >>> +
> >>> +/*
> >>> + * Refer valid block map and return offset of the last valid block in the zone.
> >>> + * Obtain valid block map from SIT and fsync data.
> >>> + * If there is no valid block in the zone, return -1.
> >>> + */
> >>> +static int last_vblk_off_in_zone(struct f2fs_sb_info *sbi,
> >>> +				 unsigned int zone_segno)
> >>> +{
> >>> +	unsigned int s;
> >>> +	unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
> >>> +	struct seg_entry *se;
> >>> +	block_t b;
> >>> +	int ret = -1;
> >>> +
> >>> +	for (s = 0; s < segs_per_zone; s++) {
> >>> +		se = get_seg_entry(sbi, zone_segno + s);
> >>> +
> >>> +		/*
> >>> +		 * Refer not cur_valid_map but ckpt_valid_map which reflects
> >>> +		 * fsync data.
> >>> +		 */
> >>> +		ASSERT(se->ckpt_valid_map);
> >>> +		for (b = 0; b < sbi->blocks_per_seg; b++)
> >>> +			if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map))
> >>> +				ret = b + (s << sbi->log_blocks_per_seg);
> >>
> >> Minor thing, I guess we only need to find last valid block in target zone?
> >>
> >> int s, b;
> >>
> >> for (s = segs_per_zone - 1; s >= 0; s--) {
> >> 	for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
> >> 		if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map)) {
> >> 			ret = b + (s << sbi->log_blocks_per_seg);
> >> 			break;
> >> 		}
> >> }
> > 
> > Yes, reveresed search is the better. Will modify the code as suggested.
> > 
> >>
> >>> +	}
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +static int check_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
> >>> +{
> >>> +	struct curseg_info *curseg = CURSEG_I(sbi, type);
> >>> +	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
> >>> +	struct blk_zone blkz;
> >>> +	block_t cs_block, wp_block, zone_last_vblock;
> >>> +	u_int64_t cs_sector, wp_sector;
> >>> +	int i, ret;
> >>> +	unsigned int zone_segno;
> >>> +	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> >>> +
> >>> +	/* get the device the curseg points to */
> >>> +	cs_block = START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff;
> >>> +	for (i = 0; i < MAX_DEVICES; i++) {
> >>> +		if (!c.devices[i].path)
> >>> +			break;
> >>> +		if (c.devices[i].start_blkaddr <= cs_block &&
> >>> +		    cs_block <= c.devices[i].end_blkaddr)
> >>> +			break;
> >>> +	}
> >>> +
> >>> +	if (i >= MAX_DEVICES)
> >>> +		return -EINVAL;
> >>> +
> >>> +	/* get write pointer position of the zone the curseg points to */
> >>> +	cs_sector = (cs_block - c.devices[i].start_blkaddr)
> >>> +		<< log_sectors_per_block;
> >>> +	ret = f2fs_report_zone(i, cs_sector, &blkz);
> >>> +	if (ret)
> >>> +		return ret;
> >>> +
> >>> +	if (blk_zone_type(&blkz) != BLK_ZONE_TYPE_SEQWRITE_REQ)
> >>> +		return 0;
> >>> +
> >>> +	/* check consistency between the curseg and the write pointer */
> >>> +	wp_block = c.devices[i].start_blkaddr +
> >>> +		(blk_zone_wp_sector(&blkz) >> log_sectors_per_block);
> >>> +	wp_sector = blk_zone_wp_sector(&blkz);
> >>> +
> >>> +	if (cs_sector == wp_sector)
> >>> +		return 0;
> >>> +
> >>> +	if (cs_sector > wp_sector) {
> >>> +		MSG(0, "Inconsistent write pointer with curseg %d: "
> >>> +		    "curseg %d[0x%x,0x%x] > wp[0x%x,0x%x]\n",
> >>> +		    type, type, curseg->segno, curseg->next_blkoff,
> >>> +		    GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
> >>> +		fsck->chk.wp_inconsistent_zones++;
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	MSG(0, "Write pointer goes advance from curseg %d: "
> >>> +	    "curseg %d[0x%x,0x%x] wp[0x%x,0x%x]\n",
> >>> +	    type, type, curseg->segno, curseg->next_blkoff,
> >>> +	    GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
> >>> +
> >>> +	zone_segno = GET_SEG_FROM_SEC(sbi,
> >>> +				      GET_SEC_FROM_SEG(sbi, curseg->segno));
> >>> +	zone_last_vblock = START_BLOCK(sbi, zone_segno) +
> >>> +		last_vblk_off_in_zone(sbi, zone_segno);
> >>> +
> >>> +	/*
> >>> +	 * If fsync data exists between the curseg and the last valid block,
> >>> +	 * it is not an error to fix. Leave it for kernel to recover later.
> >>> +	 */
> >>> +	if (cs_block <= zone_last_vblock) {
> >>
> >> According to above comments, should condition be?
> >>
> >> if (cs_block < zone_last_vblock && zone_last_vblock <= wp_block)
> >>
> > 
> > To be precise, cs_block points to curseg->next_blkoff, which is the block
> > curseg will write in the next write request. Then, if cs_block equals to
> > zone_last_vblock, it means that the block curseg->next_blkoff points to
> > already have valid block and fsync data. Then, comparator between cs_block
> > and zone_last_vblock should be "<=".
> 
> You're right.
> 
> > 
> > I agree that it is the better to check zone_last_vblock with wp_block.
> > wp_block corresponds to the write pointer position that next write will be
> > made. It wp_block equals to zone_last_vblock, it means that unexpected data
> > is written beyond the write pointer. Then, comparator should be "<" between
> > zone_last_vblock and wp_block.
> 
> Oh, so wp has almost the same meaning to .next_blkoff in f2fs, it points to next
> free block/sector. I will keep that in mind.

Right, wp and .next_blkoff are really similar, or same :)

> 
> > 
> > In short, I suggest the condition check below as the good one.
> > 
> > if (cs_block <= zone_last_vblock && zone_last_vblock < wp_block)
> 
> It's fine to me. :)

Ok, will reflect in the next patch post.

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-11-13  1:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-28  6:55 [f2fs-dev] [PATCH v6 0/8] fsck: Check write pointers of zoned block devices Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 1/8] libf2fs_zoned: Introduce f2fs_report_zones() helper function Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 2/8] libf2fs_zoned: Introduce f2fs_report_zone() " Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 3/8] libf2fs_zoned: Introduce f2fs_reset_zone() " Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 4/8] fsck: Find free zones instead of blocks to assign to current segments Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 5/8] fsck: Introduce move_one_curseg_info() function Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 6/8] fsck: Check fsync data always for zoned block devices Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 7/8] fsck: Check write pointer consistency of open zones Shin'ichiro Kawasaki
2019-11-05 11:06   ` Chao Yu
2019-11-06  9:45     ` Shinichiro Kawasaki
2019-11-11  3:14       ` Chao Yu
2019-11-13  1:44         ` Shinichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 8/8] fsck: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
2019-11-05 11:32   ` Chao Yu
2019-11-06  9:49     ` Shinichiro Kawasaki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).