All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
To: Chao Yu <yuchao0@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>,
	Damien Le Moal <Damien.LeMoal@wdc.com>,
	"linux-f2fs-devel@lists.sourceforge.net"
	<linux-f2fs-devel@lists.sourceforge.net>
Subject: Re: [f2fs-dev] [PATCH v6 7/8] fsck: Check write pointer consistency of open zones
Date: Wed, 13 Nov 2019 01:44:33 +0000	[thread overview]
Message-ID: <20191113014432.bdmnvthhrb6cnfi7@shindev.dhcp.fujisawa.hgst.com> (raw)
In-Reply-To: <74c119b6-a524-10ad-7093-87adcce3fb2d@huawei.com>

On Nov 11, 2019 / 11:14, Chao Yu wrote:
> On 2019/11/6 17:45, Shinichiro Kawasaki wrote:
> > On Nov 05, 2019 / 19:06, Chao Yu wrote:
> >> On 2019/10/28 14:55, Shin'ichiro Kawasaki wrote:
> >>> On sudden f2fs shutdown, write pointers of zoned block devices can go
> >>> further but f2fs meta data keeps current segments at positions before the
> >>> write operations. After remounting the f2fs, this inconsistency causes
> >>> write operations not at write pointers and "Unaligned write command"
> >>> error is reported.
> >>>
> >>> To avoid the error, have f2fs.fsck check consistency of write pointers
> >>> of open zones that current segments point to. Compare each current
> >>> segment's position and the write pointer position of the open zone. If
> >>> inconsistency is found and 'fix_on' flag is set, assign a new zone to the
> >>> current segment and check the newly assigned zone has write pointer at
> >>> the zone start. Leave the original zone as is to keep data recorded in
> >>> it.
> >>>
> >>> To care about fsync data, refer each seg_entry's ckpt_valid_map to get
> >>> the last valid block in the zone. If the last valid block is beyond the
> >>> current segments position, fsync data exits in the zone. In case fsync
> >>> data exists, do not assign a new zone to the current segment not to lose
> >>> the fsync data. It is expected that the kernel replay the fsync data and
> >>> fix the write pointer inconsistency at mount time.
> >>>
> >>> Also check consistency between write pointer of the zone the current
> >>> segment points to with valid block maps of the zone. If the last valid
> >>> block is beyond the write pointer position, report to indicate f2fs bug.
> >>> If 'fix_on' flag is set, assign a new zone to the current segment.
> >>>
> >>> When inconsistencies are found, turn on 'bug_on' flag in fsck_verify() to
> >>> ask users to fix them or not. When inconsistencies get fixed, turn on
> >>> 'force' flag in fsck_verify() to enforce fixes in following checks.
> >>>
> >>> This check and fix is done twice. The first is done at the beginning of
> >>> do_fsck() function so that other fixes can reflect the current segment
> >>> modification. The second is done in fsck_verify() to reflect updated meta
> >>> data by other fixes.
> >>>
> >>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >>> ---
> >>>  fsck/f2fs.h  |   5 ++
> >>>  fsck/fsck.c  | 154 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  fsck/fsck.h  |   3 +
> >>>  fsck/main.c  |   2 +
> >>>  fsck/mount.c |  49 +++++++++++++++-
> >>>  5 files changed, 212 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/fsck/f2fs.h b/fsck/f2fs.h
> >>> index 399c74d..07513cb 100644
> >>> --- a/fsck/f2fs.h
> >>> +++ b/fsck/f2fs.h
> >>> @@ -429,6 +429,11 @@ static inline block_t __end_block_addr(struct f2fs_sb_info *sbi)
> >>>  #define GET_BLKOFF_FROM_SEG0(sbi, blk_addr)				\
> >>>  	(GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & (sbi->blocks_per_seg - 1))
> >>>  
> >>> +#define GET_SEC_FROM_SEG(sbi, segno)					\
> >>> +	((segno) / (sbi)->segs_per_sec)
> >>> +#define GET_SEG_FROM_SEC(sbi, secno)					\
> >>> +	((secno) * (sbi)->segs_per_sec)
> >>> +
> >>>  #define FREE_I_START_SEGNO(sbi)						\
> >>>  	GET_SEGNO_FROM_SEG0(sbi, SM_I(sbi)->main_blkaddr)
> >>>  #define GET_R2L_SEGNO(sbi, segno)	(segno + FREE_I_START_SEGNO(sbi))
> >>> diff --git a/fsck/fsck.c b/fsck/fsck.c
> >>> index 2ae3bd5..e0eda4e 100644
> >>> --- a/fsck/fsck.c
> >>> +++ b/fsck/fsck.c
> >>> @@ -2181,6 +2181,125 @@ static void fix_checkpoints(struct f2fs_sb_info *sbi)
> >>>  	fix_checkpoint(sbi);
> >>>  }
> >>>  
> >>> +#ifdef HAVE_LINUX_BLKZONED_H
> >>> +
> >>> +/*
> >>> + * Refer valid block map and return offset of the last valid block in the zone.
> >>> + * Obtain valid block map from SIT and fsync data.
> >>> + * If there is no valid block in the zone, return -1.
> >>> + */
> >>> +static int last_vblk_off_in_zone(struct f2fs_sb_info *sbi,
> >>> +				 unsigned int zone_segno)
> >>> +{
> >>> +	unsigned int s;
> >>> +	unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
> >>> +	struct seg_entry *se;
> >>> +	block_t b;
> >>> +	int ret = -1;
> >>> +
> >>> +	for (s = 0; s < segs_per_zone; s++) {
> >>> +		se = get_seg_entry(sbi, zone_segno + s);
> >>> +
> >>> +		/*
> >>> +		 * Refer not cur_valid_map but ckpt_valid_map which reflects
> >>> +		 * fsync data.
> >>> +		 */
> >>> +		ASSERT(se->ckpt_valid_map);
> >>> +		for (b = 0; b < sbi->blocks_per_seg; b++)
> >>> +			if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map))
> >>> +				ret = b + (s << sbi->log_blocks_per_seg);
> >>
> >> Minor thing, I guess we only need to find last valid block in target zone?
> >>
> >> int s, b;
> >>
> >> for (s = segs_per_zone - 1; s >= 0; s--) {
> >> 	for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
> >> 		if (f2fs_test_bit(b, (const char*)se->ckpt_valid_map)) {
> >> 			ret = b + (s << sbi->log_blocks_per_seg);
> >> 			break;
> >> 		}
> >> }
> > 
> > Yes, reveresed search is the better. Will modify the code as suggested.
> > 
> >>
> >>> +	}
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +static int check_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
> >>> +{
> >>> +	struct curseg_info *curseg = CURSEG_I(sbi, type);
> >>> +	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
> >>> +	struct blk_zone blkz;
> >>> +	block_t cs_block, wp_block, zone_last_vblock;
> >>> +	u_int64_t cs_sector, wp_sector;
> >>> +	int i, ret;
> >>> +	unsigned int zone_segno;
> >>> +	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
> >>> +
> >>> +	/* get the device the curseg points to */
> >>> +	cs_block = START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff;
> >>> +	for (i = 0; i < MAX_DEVICES; i++) {
> >>> +		if (!c.devices[i].path)
> >>> +			break;
> >>> +		if (c.devices[i].start_blkaddr <= cs_block &&
> >>> +		    cs_block <= c.devices[i].end_blkaddr)
> >>> +			break;
> >>> +	}
> >>> +
> >>> +	if (i >= MAX_DEVICES)
> >>> +		return -EINVAL;
> >>> +
> >>> +	/* get write pointer position of the zone the curseg points to */
> >>> +	cs_sector = (cs_block - c.devices[i].start_blkaddr)
> >>> +		<< log_sectors_per_block;
> >>> +	ret = f2fs_report_zone(i, cs_sector, &blkz);
> >>> +	if (ret)
> >>> +		return ret;
> >>> +
> >>> +	if (blk_zone_type(&blkz) != BLK_ZONE_TYPE_SEQWRITE_REQ)
> >>> +		return 0;
> >>> +
> >>> +	/* check consistency between the curseg and the write pointer */
> >>> +	wp_block = c.devices[i].start_blkaddr +
> >>> +		(blk_zone_wp_sector(&blkz) >> log_sectors_per_block);
> >>> +	wp_sector = blk_zone_wp_sector(&blkz);
> >>> +
> >>> +	if (cs_sector == wp_sector)
> >>> +		return 0;
> >>> +
> >>> +	if (cs_sector > wp_sector) {
> >>> +		MSG(0, "Inconsistent write pointer with curseg %d: "
> >>> +		    "curseg %d[0x%x,0x%x] > wp[0x%x,0x%x]\n",
> >>> +		    type, type, curseg->segno, curseg->next_blkoff,
> >>> +		    GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
> >>> +		fsck->chk.wp_inconsistent_zones++;
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	MSG(0, "Write pointer goes advance from curseg %d: "
> >>> +	    "curseg %d[0x%x,0x%x] wp[0x%x,0x%x]\n",
> >>> +	    type, type, curseg->segno, curseg->next_blkoff,
> >>> +	    GET_SEGNO(sbi, wp_block), OFFSET_IN_SEG(sbi, wp_block));
> >>> +
> >>> +	zone_segno = GET_SEG_FROM_SEC(sbi,
> >>> +				      GET_SEC_FROM_SEG(sbi, curseg->segno));
> >>> +	zone_last_vblock = START_BLOCK(sbi, zone_segno) +
> >>> +		last_vblk_off_in_zone(sbi, zone_segno);
> >>> +
> >>> +	/*
> >>> +	 * If fsync data exists between the curseg and the last valid block,
> >>> +	 * it is not an error to fix. Leave it for kernel to recover later.
> >>> +	 */
> >>> +	if (cs_block <= zone_last_vblock) {
> >>
> >> According to above comments, should condition be?
> >>
> >> if (cs_block < zone_last_vblock && zone_last_vblock <= wp_block)
> >>
> > 
> > To be precise, cs_block points to curseg->next_blkoff, which is the block
> > curseg will write in the next write request. Then, if cs_block equals to
> > zone_last_vblock, it means that the block curseg->next_blkoff points to
> > already have valid block and fsync data. Then, comparator between cs_block
> > and zone_last_vblock should be "<=".
> 
> You're right.
> 
> > 
> > I agree that it is the better to check zone_last_vblock with wp_block.
> > wp_block corresponds to the write pointer position that next write will be
> > made. It wp_block equals to zone_last_vblock, it means that unexpected data
> > is written beyond the write pointer. Then, comparator should be "<" between
> > zone_last_vblock and wp_block.
> 
> Oh, so wp has almost the same meaning to .next_blkoff in f2fs, it points to next
> free block/sector. I will keep that in mind.

Right, wp and .next_blkoff are really similar, or same :)

> 
> > 
> > In short, I suggest the condition check below as the good one.
> > 
> > if (cs_block <= zone_last_vblock && zone_last_vblock < wp_block)
> 
> It's fine to me. :)

Ok, will reflect in the next patch post.

--
Best Regards,
Shin'ichiro Kawasaki

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

  reply	other threads:[~2019-11-13  1:44 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-28  6:55 [f2fs-dev] [PATCH v6 0/8] fsck: Check write pointers of zoned block devices Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 1/8] libf2fs_zoned: Introduce f2fs_report_zones() helper function Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 2/8] libf2fs_zoned: Introduce f2fs_report_zone() " Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 3/8] libf2fs_zoned: Introduce f2fs_reset_zone() " Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 4/8] fsck: Find free zones instead of blocks to assign to current segments Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 5/8] fsck: Introduce move_one_curseg_info() function Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 6/8] fsck: Check fsync data always for zoned block devices Shin'ichiro Kawasaki
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 7/8] fsck: Check write pointer consistency of open zones Shin'ichiro Kawasaki
2019-11-05 11:06   ` Chao Yu
2019-11-06  9:45     ` Shinichiro Kawasaki
2019-11-11  3:14       ` Chao Yu
2019-11-13  1:44         ` Shinichiro Kawasaki [this message]
2019-10-28  6:55 ` [f2fs-dev] [PATCH v6 8/8] fsck: Check write pointer consistency of non-open zones Shin'ichiro Kawasaki
2019-11-05 11:32   ` Chao Yu
2019-11-06  9:49     ` Shinichiro Kawasaki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191113014432.bdmnvthhrb6cnfi7@shindev.dhcp.fujisawa.hgst.com \
    --to=shinichiro.kawasaki@wdc.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=yuchao0@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.