Linux-f2fs-devel Archive on lore.kernel.org
 help / color / Atom feed
From: Chao Yu <yuchao0@huawei.com>
To: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>,
	Damien Le Moal <Damien.LeMoal@wdc.com>,
	"linux-f2fs-devel@lists.sourceforge.net"
	<linux-f2fs-devel@lists.sourceforge.net>
Subject: Re: [f2fs-dev] [PATCH v4 2/2] fsck.f2fs: Check write pointer consistency with current segments
Date: Mon, 16 Sep 2019 09:37:01 +0800
Message-ID: <dd935f8f-276e-fa7b-e202-2a8722be60e0@huawei.com> (raw)
In-Reply-To: <20190912081648.wq2wi447hpold4t6@shindev.dhcp.fujisawa.hgst.com>

On 2019/9/12 16:16, Shinichiro Kawasaki wrote:
> On Sep 10, 2019 / 17:12, Chao Yu wrote:
>> On 2019/9/10 16:10, Shinichiro Kawasaki wrote:
>>> On Sep 09, 2019 / 15:14, Chao Yu wrote:
>>>> On 2019/9/6 16:31, Shinichiro Kawasaki wrote:
>>>>> On Sep 05, 2019 / 17:58, Chao Yu wrote:
>>>>>> Hi Shinichiro,
>>>>>>
>>>>>> Sorry for the delay.
>>>>>>
>>>>>> On 2019/9/3 16:37, Shinichiro Kawasaki wrote:
>>>>>>> On Sep 02, 2019 / 15:02, Chao Yu wrote:
>>>>>>>> On 2019/8/30 18:19, Shin'ichiro Kawasaki wrote:
>>>>>>>>> On sudden f2fs shutdown, zoned block device status and f2fs current
>>>>>>>>> segment positions in meta data can be inconsistent. When f2fs shutdown
>>>>>>>>> happens before write operations completes, write pointers of zoned block
>>>>>>>>> devices can go further but f2fs meta data keeps current segments at
>>>>>>>>> positions before the write operations. After remounting the f2fs, the
>>>>>>>>> inconsistency causes write operations not at write pointers and
>>>>>>>>> "Unaligned write command" error is reported. This error was observed when
>>>>>>>>> xfstests test case generic/388 was run with f2fs on a zoned block device.
>>>>>>>>>
>>>>>>>>> To avoid the error, have f2fs.fsck check consistency between each current
>>>>>>>>> segment's position and the write pointer of the zone the current segment
>>>>>>>>> points to. If the write pointer goes advance from the current segment,
>>>>>>>>> fix the current segment position setting at same as the write pointer
>>>>>>>>> position. If the write pointer goes to the zone end, find a new zone and
>>>>>>>>> set the current segment position at the new zone start. In case the write
>>>>>>>>> pointer is behind the current segment, write zero data at the write
>>>>>>>>> pointer position to make write pointer position at same as the current
>>>>>>>>> segment.
>>>>>>>>>
>>>>>>>>> When inconsistencies are found, turn on c.bug_on flag in fsck_verify() to
>>>>>>>>> ask users to fix them or not. When inconsistencies get fixed, turn on
>>>>>>>>> 'force' flag in fsck_verify() to enforce fixes in following checks. This
>>>>>>>>> position fix is done at the beginning of do_fsck() function so that other
>>>>>>>>> checks reflect the current segment modification.
>>>>>>>>>
>>>>>>>>> Also add GET_SEC_FROM_SEG and GET_SEG_FROM_SEC macros in fsck/fsck.h to
>>>>>>>>> simplify the code.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>>>>>> ---
>>>>>>>>>  fsck/f2fs.h |   5 ++
>>>>>>>>>  fsck/fsck.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>  fsck/fsck.h |   3 +
>>>>>>>>>  fsck/main.c |   2 +
>>>>>>>>>  4 files changed, 208 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/fsck/f2fs.h b/fsck/f2fs.h
>>>>>>>>> index 4dc6698..2c1c2b3 100644
>>>>>>>>> --- a/fsck/f2fs.h
>>>>>>>>> +++ b/fsck/f2fs.h
>>>>>>>>> @@ -337,6 +337,11 @@ static inline block_t __end_block_addr(struct f2fs_sb_info *sbi)
>>>>>>>>>  #define GET_BLKOFF_FROM_SEG0(sbi, blk_addr)				\
>>>>>>>>>  	(GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & (sbi->blocks_per_seg - 1))
>>>>>>>>>  
>>>>>>>>> +#define GET_SEC_FROM_SEG(sbi, segno)					\
>>>>>>>>> +	((segno) / (sbi)->segs_per_sec)
>>>>>>>>> +#define GET_SEG_FROM_SEC(sbi, secno)					\
>>>>>>>>> +	((secno) * (sbi)->segs_per_sec)
>>>>>>>>> +
>>>>>>>>>  #define FREE_I_START_SEGNO(sbi)						\
>>>>>>>>>  	GET_SEGNO_FROM_SEG0(sbi, SM_I(sbi)->main_blkaddr)
>>>>>>>>>  #define GET_R2L_SEGNO(sbi, segno)	(segno + FREE_I_START_SEGNO(sbi))
>>>>>>>>> diff --git a/fsck/fsck.c b/fsck/fsck.c
>>>>>>>>> index 8953ca1..a0f6849 100644
>>>>>>>>> --- a/fsck/fsck.c
>>>>>>>>> +++ b/fsck/fsck.c
>>>>>>>>> @@ -2574,6 +2574,190 @@ out:
>>>>>>>>>  	return cnt;
>>>>>>>>>  }
>>>>>>>>>  
>>>>>>>>> +/*
>>>>>>>>> + * Search a free section in main area. Start search from the section specified
>>>>>>>>> + * with segno argument toward main area end. Return first segment of the found
>>>>>>>>> + * section in segno argument.
>>>>>>>>> + */
>>>>>>>>> +static int find_next_free_section(struct f2fs_sb_info *sbi,
>>>>>>>>> +				  unsigned int *segno)
>>>>>>>>> +{
>>>>>>>>> +	unsigned int i, sec, section_valid_blocks;
>>>>>>>>> +	unsigned int end_segno = GET_SEGNO(sbi, SM_I(sbi)->main_blkaddr)
>>>>>>>>> +		+ SM_I(sbi)->main_segments;
>>>>>>>>> +	unsigned int end_sec = GET_SEC_FROM_SEG(sbi, end_segno);
>>>>>>>>> +	struct seg_entry *se;
>>>>>>>>> +	struct curseg_info *cs;
>>>>>>>>> +
>>>>>>>>> +	for (sec = GET_SEC_FROM_SEG(sbi, *segno); sec < end_sec; sec++) {
>>>>>>>>> +		/* find a section without valid blocks */
>>>>>>>>> +		section_valid_blocks = 0;
>>>>>>>>> +		for (i = 0; i < sbi->segs_per_sec; i++) {
>>>>>>>>> +			se = get_seg_entry(sbi, GET_SEG_FROM_SEC(sbi, sec) + i);
>>>>>>>>> +			section_valid_blocks += se->valid_blocks;
>>>>>>>>
>>>>>>>> If we want to find a 'real' free section, we'd better to use
>>>>>>>> se->ckpt_valid_blocks (wrapped with get_seg_vblocks()) in where we has recorded
>>>>>>>> fsynced data count.
>>>>>>>
>>>>>>> Thanks. When I create next patch series, I will use get_seg_vblocks().
>>>>>>> I will rebase to dev-test branch in which get_seg_vblocks() is available.
>>>>>>>
>>>>>>>>
>>>>>>>>> +		}
>>>>>>>>> +		if (section_valid_blocks)
>>>>>>>>> +			continue;
>>>>>>>>> +
>>>>>>>>> +		/* check the cursegs do not use the section */
>>>>>>>>> +		for (i = 0; i < NO_CHECK_TYPE; i++) {
>>>>>>>>> +			cs = &SM_I(sbi)->curseg_array[i];
>>>>>>>>> +			if (GET_SEC_FROM_SEG(sbi, cs->segno) == sec)
>>>>>>>>> +				break;
>>>>>>>>> +		}
>>>>>>>>> +		if (i >= NR_CURSEG_TYPE) {
>>>>>>>>> +			*segno = GET_SEG_FROM_SEC(sbi, sec);
>>>>>>>>> +			return 0;
>>>>>>>>> +		}
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	return -1;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +struct write_pointer_check_data {
>>>>>>>>> +	struct f2fs_sb_info *sbi;
>>>>>>>>> +	struct device_info *dev;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +static int fsck_chk_write_pointer(int i, struct blk_zone *blkz, void *opaque)
>>>>>>>>> +{
>>>>>>>>> +	struct write_pointer_check_data *wpd = opaque;
>>>>>>>>> +	struct f2fs_sb_info *sbi = wpd->sbi;
>>>>>>>>> +	struct device_info *dev = wpd->dev;
>>>>>>>>> +	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
>>>>>>>>> +	block_t zone_block, wp_block, wp_blkoff, cs_block, b;
>>>>>>>>> +	unsigned int zone_segno, wp_segno, new_segno;
>>>>>>>>> +	struct seg_entry *se;
>>>>>>>>> +	struct curseg_info *cs;
>>>>>>>>> +	int cs_index, ret;
>>>>>>>>> +	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
>>>>>>>>> +	unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
>>>>>>>>> +	void *zero_blk;
>>>>>>>>> +
>>>>>>>>> +	if (blk_zone_conv(blkz))
>>>>>>>>> +		return 0;
>>>>>>>>> +
>>>>>>>>> +	zone_block = dev->start_blkaddr
>>>>>>>>> +		+ (blk_zone_sector(blkz) >> log_sectors_per_block);
>>>>>>>>> +	zone_segno = GET_SEGNO(sbi, zone_block);
>>>>>>>>> +	wp_block = dev->start_blkaddr
>>>>>>>>> +		+ (blk_zone_wp_sector(blkz) >> log_sectors_per_block);
>>>>>>>>> +	wp_segno = GET_SEGNO(sbi, wp_block);
>>>>>>>>> +	wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
>>>>>>>>> +
>>>>>>>>> +	/* find the curseg which points to the zone */
>>>>>>>>> +	for (cs_index = 0; cs_index < NO_CHECK_TYPE; cs_index++) {
>>>>>>>>> +		cs = &SM_I(sbi)->curseg_array[cs_index];
>>>>>>>>> +		if (zone_segno <= cs->segno &&
>>>>>>>>> +		    cs->segno < zone_segno + segs_per_zone)
>>>>>>>>> +			break;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	if (cs_index >= NR_CURSEG_TYPE)
>>>>>>>>> +		return 0;
>>>>>>>>> +
>>>>>>>>> +	/* check write pointer consistency with the curseg in the zone */
>>>>>>>>> +	cs_block = START_BLOCK(sbi, cs->segno) + cs->next_blkoff;
>>>>>>>>> +	if (wp_block == cs_block)
>>>>>>>>> +		return 0;
>>>>>>>>> +
>>>>>>>>> +	if (!c.fix_on) {
>>>>>>>>> +		MSG(0, "Inconsistent write pointer: "
>>>>>>>>> +		    "curseg %d[0x%x,0x%x] wp[0x%x,0x%x]\n",
>>>>>>>>> +		    cs_index, cs->segno, cs->next_blkoff, wp_segno, wp_blkoff);
>>>>>>>>> +		fsck->chk.wp_inconsistent_zones++;
>>>>>>>>> +		return 0;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	/*
>>>>>>>>> +	 * If the curseg is in advance from the write pointer, write zero to
>>>>>>>>> +	 * move the write pointer forward to the same position as the curseg.
>>>>>>>>> +	 */
>>>>>>>>> +	if (wp_block < cs_block) {
>>>>>>>>> +		ret = 0;
>>>>>>>>> +		zero_blk = calloc(BLOCK_SZ, 1);
>>>>>>>>> +		if (!zero_blk)
>>>>>>>>> +			return -EINVAL;
>>>>>>>>> +
>>>>>>>>> +		FIX_MSG("Advance write pointer to match with curseg %d: "
>>>>>>>>> +			"[0x%x,0x%x]->[0x%x,0x%x]",
>>>>>>>>> +			cs_index, wp_segno, wp_blkoff,
>>>>>>>>> +			cs->segno, cs->next_blkoff);
>>>>>>>>> +		for (b = wp_block; b < cs_block && !ret; b++)
>>>>>>>>> +			ret = dev_write_block(zero_blk, b);
>>>>>>>>> +
>>>>>>>>> +		fsck->chk.wp_fixed_zones++;
>>>>>>>>> +		free(zero_blk);
>>>>>>>>> +		return ret;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	if (wp_segno == zone_segno + segs_per_zone) {
>>>>>>>>> +		/*
>>>>>>>>> +		 * If the write pointer is in advance from the curseg and at
>>>>>>>>> +		 * the zone end (section end), search a new free zone (section)
>>>>>>>>> +		 * between the curseg and main area end.
>>>>>>>>> +		 */
>>>>>>>>> +		new_segno = wp_segno;
>>>>>>>>> +		ret = find_next_free_section(sbi, &new_segno);
>>>>>>>>> +		if (ret) {
>>>>>>>>> +			/* search again from main area start */
>>>>>>>>> +			new_segno = GET_SEGNO(sbi, SM_I(sbi)->main_blkaddr);
>>>>>>>>> +			ret = find_next_free_section(sbi, &new_segno);
>>>>>>>>> +		}
>>>>>>>>
>>>>>>>> If curseg's type is warm node, relocating curseg would ruin warm node chain,
>>>>>>>> result in losing fsynced data for ever as well.
>>>>>>>>
>>>>>>>> So I guess we should migrate all dnode blocks chained with cs->next_blkoff in
>>>>>>>> current section.
>>>>>>>>
>>>>>>>>> +		if (ret) {
>>>>>>>>> +			MSG(0, "Free section not found\n");
>>>>>>>>> +			return ret;
>>>>>>>>> +		}
>>>>>>>>> +		FIX_MSG("New section for curseg %d: [0x%x,0x%x]->[0x%x,0x%x]",
>>>>>>>>> +			cs_index, cs->segno, cs->next_blkoff, new_segno, 0);
>>>>>>>>> +		cs->segno = new_segno;
>>>>>>>>> +		cs->next_blkoff = 0;
>>>>>>>>> +	} else {
>>>>>>>>> +		/*
>>>>>>>>> +		 * If the write pointer is in advance from the curseg within
>>>>>>>>> +		 * the zone, modify the curseg position to be same as the
>>>>>>>>> +		 * write pointer.
>>>>>>>>> +		 */
>>>>>>>>> +		ASSERT(wp_segno < zone_segno + segs_per_zone);
>>>>>>>>> +		FIX_MSG("Advance curseg %d: [0x%x,0x%x]->[0x%x,0x%x]",
>>>>>>>>> +			cs_index, cs->segno, cs->next_blkoff,
>>>>>>>>> +			wp_segno, wp_blkoff);
>>>>>>>>> +		cs->segno = wp_segno;
>>>>>>>>> +		cs->next_blkoff = wp_blkoff;
>>>>>>>>
>>>>>>>> The same data lose problem here, I guess we'd better handle it with the same way
>>>>>>>> as above.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>
>>>>>>> I created f2fs status with fsync data and warm node chain, and confirmed the v4
>>>>>>> implementation ruins the dnode blocks chain. Hmm.
>>>>>>>
>>>>>>> My understanding is that when f2fs kernel module recovers the fsync data, it
>>>>>>> sets the warm node curseg position at the start of the fsync data, and the fsync
>>>>>>> data will be overwritten as new nodes created. Is this understanding correct?
>>>>>>
>>>>>> Sorry, I'm not sure I've understood you correctly.
>>>>>
>>>>> Apology is mine, my question was not clear enough.
>>>>> And thanks for the explanation below. It helps me to understand better.
>>>>>
>>>>>> Current recovery flow is:
>>>>>> 1)   find all valid fsynced dnode in warm node chain
>>>>>> 2.a) recover fsynced dnode in memory, and set it dirty
>>>>>> 2.b) recover directory entry in memory, and set it dirty
>>>>>> 2.c) during regular's dnode recovery, physical blocks indexed by recovered dnode
>>>>>> will be revalidated
>>>>>> Note: we didn't move any cursegs before 3)
>>>>>> 3)   relocate all cursegs to new segments
>>>>>> 4)   trigger checkpoint to persist all recovered data(fs' meta, file's meta/data)
>>>>>
>>>>> Question, does the step 3) correspond to f2fs_allocate_new_segments(sbi) call
>>>>> at the end of recover_data()? The f2fs_allocate_new_segments() function
>>>>
>>>> Yeah, I meant that function.
>>>>
>>>>> relocates new segments only for DATA cursegs, and it keeps NODE cursegs same as
>>>>> before the fsync data recovery. Then I thought WARM NODE curseg would not change
>>>>> by recovery (and still keeps pointing to the fsync recovery data).
>>>>
>>>> Yes, that's correct. WARM NODE curseg won't change until step 4).
>>>
>>> Thanks. Following your idea "we can simply adjust to reset all invalid zone and
>>> allocate new zone for curseg before data/meta writes" for fix by kernel, I think
>>> kernel code change is required to allocate new zones for NODE cursegs also at
>>> step 3). WARM NODE curseg should be kept untouched by step 2 completion to refer
>>> fsynced dnodes at WARM NODE curseg's next_blkaddr. And at step 4, the fsyced
>>> dnodes recovered and set dirty will be written back with one of NODE cursegs
>>> (HOT NODE curseg?). At that time, we need to make sure the NODE curseg points to
>>
>> Directory's dnode goes to hot node area, other file's dnode goes to warm node
>> area, the left node goes to cold node area.
>>
>>> the position consistent with its zone's write pointer.
>>
>> Yes, before step 4), we should keep f2fs and zoned block device's write pointer
>> being consistent.
> 
> Ok, thanks.
> 
>>
>>>
>>>>>
>>>>>>>
>>>>>>> If this is the case, I think write pointer inconsistency will happen even if
>>>>>>> fsck does "migrate all dnode blocks" (I assume that the warm node curseg's next
>>>>>>
>>>>>> Actually, I notice that scheme's problem is: we've changed zone's write pointer
>>>>>> during dnode blocks migration, however if kernel drops recovery, we will still
>>>>>> face inconsistent status in between .next_blkaddr and .write_pointer, once we
>>>>>> start to write data from .next_blkaddr. So in fsck, after migration, we need to
>>>>>> reset .write_pointer to .next_blkaddr.
>>>>>>
>>>>>> I guess we need to consider four cases:
>>>>>>
>>>>>> o: support .write_pointer recovery
>>>>>> x: not support .write_pointer recovery
>>>>>>
>>>>>> 1) kernel: o, fsck: x, trigger recovery in kernel
>>>>>> 2) kernel: o, fsck: x, not trigger recovery in kernel
>>>>>> 3) kernel: x, fsck: o, trigger recovery in kernel
>>>>>> 4) kernel: x, fsck: o, not trigger recovery in kernel
>>>>>>
>>>>>> For 1) and 2), we can simply adjust to reset all invalid zone and allocate new
>>>>>> zone for curseg before data/meta writes.
>>>>>
>>>>> Thanks for the clarification. This approach for case 1) and 2) is simple. Let me
>>>>> try to create a patch for it.
>>>>>
>>>>>>
>>>>>> For 3) and 4), I haven't figured out a way handling all cases perfectly.
>>>>>
>>>>> For 3), I suppose fsck cannot fix write pointer inconsistency without fsync data
>>>>> loss, since recovery is judged and done by kernel. The write pointer consistency
>>>>> fix after recovery can be done only by kernel.
>>>>>
>>>>> It is not a good one but one idea is to confirm fsck users to enforce fsync data
>>>>> loss or not. This could be better than nothing.
>>>>>
>>>>> For 4), my understanding is that fsync data is not left in the file system. I
>>>>> think fsck can check fsync data existence and fix write pointer consistency, as
>>>>> my patch series shows.
>>>>
>>>> Yeah.
>>>>
>>>> Let's think about that whether there is a way to cover all cases.
>>>>
>>>> 1) For non-opened zones, we need to adjust all such zones' write pointer to
>>>> zero. I assume that once write pointer is zero, we still can access valid block
>>>> in zone. (recovery may need to revalidate blocks in free segments)
>>>
>>> When write pointer is reset to zero, all data in the zone gets lost. When we
>>> read data in the zone beyond the write pointer, just zero data is read. When any
>>> valid block or fsync data is left in a non-opened zone, I think the zone's write
>>> pointer should be left as is. Otherwise, if the zone do not have valid block and
>>> fsync data, the zone should be reset to avoid unaligned write error.
>>
>> Okay, if data beyond write pointer is invalid, we should keep write pointer as
>> it is if there are fsynced data in that zone.
>>
>>>
>>> One additional check I can think of is to check the last valid block in the zone
>>> and write pointer position of the zone. If .write_pointer >= .last_valid_block,
>>> , it is ok. If .write_pointer < .last_valid_block, this is a bug of f2fs. In
>>
>> Sounds reasonable, how can we find last valid block, as you said, content of
>> block beyond write pointer is all zero... or you mean curseg's next_blkaddr?
>> like the condition 2.c) described?
> 
> I think we can get each zone's last valid block referring each segment's valid
> block bitmap in SIT. In other words, this is a consistency check between write
> pointer and SIT. Is this feasible approach?

Good point.

I guess
- we should do such sanity check with a image which has consistent metadata (SIT
should not be broken)
- need to consider fsynced block in SIT

> 
>>
>>> this case, the data in the valid blocks beyond write pointer is just lost, and
>>> there is no way to recover this. I think this zone will not be selected for
>>> cursegs for further data write in the zone until the zone get discarded. No
>>> need to fix write pointer position to avoid unaligned write errors. I wonder
>>
>> Yes,
>>
>>> if fsck or kernel should detect and report this case, because users still use
>>> the f2fs partition without any action. May I ask your opinion?
>>
>> If we can detect that, I think it should be reported.
> 
> I see, thanks for the comment.
> 
>>
>>>
>>>>
>>>> 2) For opened zones (cursegs point to)
>>>> 2.a) .write_pointer == .next_blkaddr, no need to handle
>>>> 2.b) .write_pointer > .next_blkaddr, that should be the common state in sudden
>>>> power-off case. It needs to reset .write_pointer to .next_blkaddr.
> 
> Sorry but let me amend. I have just noticed that the fix above is not possible.
> We cannot set .write_pointer at .next_blkaddr, because write pointers cannot be

Alright..

> reset to desired position. It only can be reset to zero (at the zone start).
> Instead of resetting .write_pointer, how about to fix by allocating a new zone to
> the curseg for 2.b) in same manner as 2.c)?

Yeah, it's okay to me.

Thanks,

> 
>>>
>>> Agreed. To be more precise, if fsync data is recorded after .next_blkaddr, we
>>> need to allocate a new zone to the curseg after fsync data recovery.
>>>
>>>> 2.c) .write_pointer < .next_blkaddr, should be a bug of f2fs, we should have
>>>> lost datas in the range of [.write_pointer, .next_blkaddr] at least, if we're
>>>> sure about that there is no valid data existed out side of .write_pointer? can we?
>>>
>>> As you note, this is a bug and we lost the data in the range of [.write_pointer,
>>> .next_blkaddr]. We should leave the write pointer as is to keep the data in the
>>> zone, and allocate a new zone to the curseg to avoid unaligned write errors.
>>
>> Okay.
>>
>>>
>>>>
>>>> 3) f2fs_allocate_new_segments() may reset data type cursegs' position, I'd like
>>>> to know what's the value of -z option we configured, if we configured as one
>>>> zone contains one section, no matter kernel triggers recovery or not (via set
>>>> disable_roll_forward option), the write pointer consistency should has been kept
>>>> in all cases. Otherwise (one zone contains multiple sections), if kernel doesn't
>>>> support .write_pointer recovery, f2fs_allocate_new_segments() can make
>>>> .next_blkaddr larger than .write_pointer in the same zone.
>>>
>>> "Single section per zone" is a key design of the f2fs zoned block device
>>> support. For zoned block devices, mkfs.f2fs requires -m option. When -m
>>> options is set, mkfs sets '1' to c.secs_per_zone regardless of the -z option
>>> (refer f2fs_get_device_info() in lib/libf2fs.c). With this prerequisite,
>>
>> Oh, correct, thanks for noticing that. I did miss that we only use -m option for
>> zoned block device.
>>
>>> I think new zones are allocated for DATA cursegs in
>>> f2fs_allocate_new_segment() and its sub-function call path.
>>>
>>>>
>>>> Let me know, if I'm missing sth.
>>>
>>> Assuming fsync data is the only one exceptional data which is read beyond the
>>> cursegs' next_blkoff, this discussion looks comprehensive for me:
>>>
>>>  - with and without kernel write pointer fix
>>>  - with and without fsync data recovery
>>>  - open/non-open zones (zones cursegs point to, zones cursegs do not point to)
>>>
>>> All fs meta data in front of main segments are stored in conventional zones
>>> which do not have write pointers (refer f2fs_prepare_super_block() in mkfs/
>>> f2fs_format.c). Just we need to care about write pointer consistency for the
>>> main data area.
>>
>> Agreed,
> 
> --
> Best Regards,
> Shin'ichiro Kawasaki.
> 


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

  reply index

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-30 10:19 [f2fs-dev] [PATCH v4 0/2] fsck: Check write pointers of zoned block devices Shin'ichiro Kawasaki
2019-08-30 10:19 ` [f2fs-dev] [PATCH v4 1/2] libf2fs_zoned: Introduce f2fs_report_zones() helper function Shin'ichiro Kawasaki
2019-08-30 10:19 ` [f2fs-dev] [PATCH v4 2/2] fsck.f2fs: Check write pointer consistency with current segments Shin'ichiro Kawasaki
2019-09-02  7:02   ` Chao Yu
2019-09-03  8:37     ` Shinichiro Kawasaki
2019-09-05  9:58       ` Chao Yu
2019-09-06  8:31         ` Shinichiro Kawasaki
2019-09-09  7:14           ` Chao Yu
2019-09-10  8:10             ` Shinichiro Kawasaki
2019-09-10  9:12               ` Chao Yu
2019-09-12  8:16                 ` Shinichiro Kawasaki
2019-09-16  1:37                   ` Chao Yu [this message]
2019-09-18  3:07                     ` Shinichiro Kawasaki
2019-09-21  9:42                       ` Chao Yu
2019-09-25  8:05                         ` Shinichiro Kawasaki
2019-09-29  2:09                           ` Chao Yu
2019-10-02  5:30                             ` Shinichiro Kawasaki

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dd935f8f-276e-fa7b-e202-2a8722be60e0@huawei.com \
    --to=yuchao0@huawei.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=shinichiro.kawasaki@wdc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-f2fs-devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-f2fs-devel/0 linux-f2fs-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-f2fs-devel linux-f2fs-devel/ https://lore.kernel.org/linux-f2fs-devel \
		linux-f2fs-devel@lists.sourceforge.net linux-f2fs-devel@archiver.kernel.org
	public-inbox-index linux-f2fs-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/net.sourceforge.lists.linux-f2fs-devel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox