All of lore.kernel.org
 help / color / mirror / Atom feed
* f2fs: SIT corruption upon fsck
@ 2019-04-17 10:25 Ju Hyung Park
  2019-04-17 10:28 ` Ju Hyung Park
  0 siblings, 1 reply; 7+ messages in thread
From: Ju Hyung Park @ 2019-04-17 10:25 UTC (permalink / raw)
  To: linux-f2fs-devel

Hi,

I was noticing some truncation warnings on the kernel logs so I went
ahead and ran fsck.

I dumped the image to a btrfs snapshot just in case if fsck fails and
corrupts the image even more, and turns out it did.

The image is formatted with large nat bitmap enabled and using
mkfs.f2fs v1.12.0.
The fsck is from the latest master branch.

After fsck, the image fails to mount at all:
[ 3052.298358] ------------[ cut here ]------------
[ 3052.298359] generic_make_request: Trying to write to read-only
block-device loop1 (partno 0)
[ 3052.298368] WARNING: CPU: 1 PID: 3854 at block/blk-core.c:2174
generic_make_request_checks+0x590/0x630
[ 3052.298394] CPU: 1 PID: 3854 Comm: umount Tainted: G           O
  4.19.35-zen+ #1
[ 3052.298394] Hardware name: System manufacturer System Product
Name/ROG MAXIMUS X HERO (WI-FI AC), BIOS 1704 09/14/2018
[ 3052.298395] RIP: 0010:generic_make_request_checks+0x590/0x630
[ 3052.298396] Code: 5c 03 00 00 48 8d 74 24 08 48 89 df c6 05 b5 cd
36 01 01 e8 c2 90 01 00 48 89 c6 44 89 ea 48 c7 c7 98 64 59 82 e8 d5
9b a7 ff <0f> 0b 48 8b 7b 08 e9 f2 fa ff ff 41 8b 86 98 02 00 00 49 8b
16 89
[ 3052.298397] RSP: 0018:ffff8882bcc97950 EFLAGS: 00010282
[ 3052.298397] RAX: 0000000000000050 RBX: ffff8883f22f7500 RCX: 0000000000000006
[ 3052.298398] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88884ec56340
[ 3052.298398] RBP: ffff8888437fa6c0 R08: 0000000000000004 R09: 0000000000000405
[ 3052.298398] R10: 0000000000000001 R11: 00000000000006f3 R12: 0000000000001000
[ 3052.298399] R13: 0000000000000000 R14: ffff8886eb92d800 R15: ffff8882bcc97ac0
[ 3052.298399] FS:  00007f0471dae8c0(0000) GS:ffff88884ec40000(0000)
knlGS:0000000000000000
[ 3052.298400] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3052.298400] CR2: 00007f04727302c0 CR3: 0000000213c6f004 CR4: 00000000003606e0
[ 3052.298400] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3052.298401] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3052.298401] Call Trace:
[ 3052.298403]  ? generic_make_request+0x46/0x3d0
[ 3052.298404]  ? wait_woken+0x80/0x80
[ 3052.298405]  ? mempool_alloc+0xb7/0x1a0
[ 3052.298406]  ? submit_bio+0x30/0x110
[ 3052.298407]  ? bvec_alloc+0x7c/0xd0
[ 3052.298409]  ? __submit_merged_bio+0x68/0x390
[ 3052.298410]  ? f2fs_submit_page_write+0x1bb/0x7f0
[ 3052.298410]  ? f2fs_do_write_meta_page+0x7f/0x160
[ 3052.298411]  ? __f2fs_write_meta_page+0x70/0x140
[ 3052.298412]  ? f2fs_sync_meta_pages+0x140/0x250
[ 3052.298413]  ? f2fs_write_checkpoint+0x5c5/0x17b0
[ 3052.298414]  ? f2fs_sync_fs+0x9c/0x110
[ 3052.298416]  ? sync_filesystem+0x66/0x80
[ 3052.298417]  ? generic_shutdown_super+0x1d/0x100
[ 3052.298418]  ? kill_block_super+0x1c/0x40
[ 3052.298418]  ? kill_f2fs_super+0x64/0xb0
[ 3052.298419]  ? deactivate_locked_super+0x2d/0xb0
[ 3052.298420]  ? cleanup_mnt+0x65/0xa0
[ 3052.298421]  ? task_work_run+0x7f/0xa0
[ 3052.298422]  ? exit_to_usermode_loop+0x9c/0xa0
[ 3052.298423]  ? do_syscall_64+0xc7/0xf0
[ 3052.298425]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3052.298425] ---[ end trace d1bb538504008702 ]---
[ 3052.298481] print_req_error: I/O error, dev loop1, sector 12488
[ 3052.298486] print_req_error: I/O error, dev loop1, sector 12648
[ 3052.298488] print_req_error: I/O error, dev loop1, sector 12672
[ 3052.298488] print_req_error: I/O error, dev loop1, sector 33040
[ 3052.298489] print_req_error: I/O error, dev loop1, sector 143048
[ 3052.298491] print_req_error: I/O error, dev loop1, sector 152200
[ 3052.298492] print_req_error: I/O error, dev loop1, sector 8192
[ 3091.204798] F2FS-fs (loop1): SIT is corrupted node# 2481418 vs 2517451
[ 3091.204801] F2FS-fs (loop1): Failed to initialize F2FS segment manager

The fsck output was too huge to to attach here, please visit here instead:
http://arter97.com/f2fs.log.xz

Since I stored this f2fs image inside a btrfs snapshot, I always have
access to the original(mountable) image. Feel free to suggest risky
patches, etc.

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: f2fs: SIT corruption upon fsck
  2019-04-17 10:25 f2fs: SIT corruption upon fsck Ju Hyung Park
@ 2019-04-17 10:28 ` Ju Hyung Park
  2019-04-17 10:40   ` Ju Hyung Park
  0 siblings, 1 reply; 7+ messages in thread
From: Ju Hyung Park @ 2019-04-17 10:28 UTC (permalink / raw)
  To: linux-f2fs-devel

This is an additional f2fs status from the original image when mounted
with 5.1-rc1-4.19.

Thanks.

=====[ partition info(loop1). #1, RO, CP: Good]=====
[SB: 1] [CP: 2] [SIT: 10] [NAT: 114] [SSA: 256] [MAIN:
130697(OverProv:1027 Resv:520)]

Utilization: 52% (34963544 valid blocks, 32181458 discard blocks)
  - Node: 2517444 (Inode: 2494236, Other: 23208)
  - Data: 32446100
  - Inline_xattr Inode: 0
  - Inline_data Inode: 0
  - Inline_dentry Inode: 0
  - Orphan/Append/Update Inode: 0, 0, 0

Main area: 130697 segs, 130697 secs 130697 zones
  - COLD  data: 21160, 21160, 21160
  - WARM  data: 22632, 22632, 22632
  - HOT   data: 2552, 2552, 2552
  - Dir   dnode: 2547, 2547, 2547
  - File   dnode: 2551, 2551, 2551
  - Indir nodes: 2486, 2486, 2486

  - Valid: 65216
  - Dirty: 4445
  - Prefree: 0
  - Free: 61036 (61036)

CP calls: 0 (BG: 0)
  - cp blocks : 0
  - sit blocks : 0
  - nat blocks : 0
  - ssa blocks : 0
GC calls: 0 (BG: 0)
  - data segments : 0 (0)
  - node segments : 0 (0)
Try to move 0 blocks (BG: 0)
  - data blocks : 0 (0)
  - node blocks : 0 (0)
Skipped : atomic write 0 (0)
BG skip : IO: 0, Other: 0

Extent Cache:
  - Hit Count: L1-1:0 L1-2:0 L2:0
  - Hit Ratio: 0% (0 / 0)
  - Inner Struct Count: tree: 0(0), node: 0

Balancing F2FS Async:
  - DIO (R:    0, W:    0)
  - IO_R (Data:    0, Node:    0, Meta:    0
  - IO_W (CP:    0, Data:    0, Flush: (   0    0    0), Discard: (
0    0)) cmd:    0 undiscard:   0
  - inmem:    0, atomic IO:    0 (Max.    0), volatile IO:    0 (Max.    0)
  - nodes:    0 in    1
  - dents:    0 in dirs:   0 (   0)
  - datas:    0 in files:   0
  - quota datas:    0 in quota files:   0
  - meta:    0 in 2399
  - imeta:    0
  - NATs:         7/        8
  - SITs:         6/   130697
  - free_nids:      3224/ 10761266
  - alloc_nids:         0

Distribution of User Blocks: [ valid | invalid | free ]
  [--------------------------|-|-----------------------]

IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 0 blocks in 0 segments

BDF: 98, avg. vblocks: 302

Memory: 41409 KB
  - static: 31725 KB
  - cached: 84 KB
  - paged : 9600 KB

On Wed, Apr 17, 2019 at 7:25 PM Ju Hyung Park <qkrwngud825@gmail.com> wrote:
>
> Hi,
>
> I was noticing some truncation warnings on the kernel logs so I went
> ahead and ran fsck.
>
> I dumped the image to a btrfs snapshot just in case if fsck fails and
> corrupts the image even more, and turns out it did.
>
> The image is formatted with large nat bitmap enabled and using
> mkfs.f2fs v1.12.0.
> The fsck is from the latest master branch.
>
> After fsck, the image fails to mount at all:
> [ 3052.298358] ------------[ cut here ]------------
> [ 3052.298359] generic_make_request: Trying to write to read-only
> block-device loop1 (partno 0)
> [ 3052.298368] WARNING: CPU: 1 PID: 3854 at block/blk-core.c:2174
> generic_make_request_checks+0x590/0x630
> [ 3052.298394] CPU: 1 PID: 3854 Comm: umount Tainted: G           O
>   4.19.35-zen+ #1
> [ 3052.298394] Hardware name: System manufacturer System Product
> Name/ROG MAXIMUS X HERO (WI-FI AC), BIOS 1704 09/14/2018
> [ 3052.298395] RIP: 0010:generic_make_request_checks+0x590/0x630
> [ 3052.298396] Code: 5c 03 00 00 48 8d 74 24 08 48 89 df c6 05 b5 cd
> 36 01 01 e8 c2 90 01 00 48 89 c6 44 89 ea 48 c7 c7 98 64 59 82 e8 d5
> 9b a7 ff <0f> 0b 48 8b 7b 08 e9 f2 fa ff ff 41 8b 86 98 02 00 00 49 8b
> 16 89
> [ 3052.298397] RSP: 0018:ffff8882bcc97950 EFLAGS: 00010282
> [ 3052.298397] RAX: 0000000000000050 RBX: ffff8883f22f7500 RCX: 0000000000000006
> [ 3052.298398] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88884ec56340
> [ 3052.298398] RBP: ffff8888437fa6c0 R08: 0000000000000004 R09: 0000000000000405
> [ 3052.298398] R10: 0000000000000001 R11: 00000000000006f3 R12: 0000000000001000
> [ 3052.298399] R13: 0000000000000000 R14: ffff8886eb92d800 R15: ffff8882bcc97ac0
> [ 3052.298399] FS:  00007f0471dae8c0(0000) GS:ffff88884ec40000(0000)
> knlGS:0000000000000000
> [ 3052.298400] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3052.298400] CR2: 00007f04727302c0 CR3: 0000000213c6f004 CR4: 00000000003606e0
> [ 3052.298400] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3052.298401] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3052.298401] Call Trace:
> [ 3052.298403]  ? generic_make_request+0x46/0x3d0
> [ 3052.298404]  ? wait_woken+0x80/0x80
> [ 3052.298405]  ? mempool_alloc+0xb7/0x1a0
> [ 3052.298406]  ? submit_bio+0x30/0x110
> [ 3052.298407]  ? bvec_alloc+0x7c/0xd0
> [ 3052.298409]  ? __submit_merged_bio+0x68/0x390
> [ 3052.298410]  ? f2fs_submit_page_write+0x1bb/0x7f0
> [ 3052.298410]  ? f2fs_do_write_meta_page+0x7f/0x160
> [ 3052.298411]  ? __f2fs_write_meta_page+0x70/0x140
> [ 3052.298412]  ? f2fs_sync_meta_pages+0x140/0x250
> [ 3052.298413]  ? f2fs_write_checkpoint+0x5c5/0x17b0
> [ 3052.298414]  ? f2fs_sync_fs+0x9c/0x110
> [ 3052.298416]  ? sync_filesystem+0x66/0x80
> [ 3052.298417]  ? generic_shutdown_super+0x1d/0x100
> [ 3052.298418]  ? kill_block_super+0x1c/0x40
> [ 3052.298418]  ? kill_f2fs_super+0x64/0xb0
> [ 3052.298419]  ? deactivate_locked_super+0x2d/0xb0
> [ 3052.298420]  ? cleanup_mnt+0x65/0xa0
> [ 3052.298421]  ? task_work_run+0x7f/0xa0
> [ 3052.298422]  ? exit_to_usermode_loop+0x9c/0xa0
> [ 3052.298423]  ? do_syscall_64+0xc7/0xf0
> [ 3052.298425]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 3052.298425] ---[ end trace d1bb538504008702 ]---
> [ 3052.298481] print_req_error: I/O error, dev loop1, sector 12488
> [ 3052.298486] print_req_error: I/O error, dev loop1, sector 12648
> [ 3052.298488] print_req_error: I/O error, dev loop1, sector 12672
> [ 3052.298488] print_req_error: I/O error, dev loop1, sector 33040
> [ 3052.298489] print_req_error: I/O error, dev loop1, sector 143048
> [ 3052.298491] print_req_error: I/O error, dev loop1, sector 152200
> [ 3052.298492] print_req_error: I/O error, dev loop1, sector 8192
> [ 3091.204798] F2FS-fs (loop1): SIT is corrupted node# 2481418 vs 2517451
> [ 3091.204801] F2FS-fs (loop1): Failed to initialize F2FS segment manager
>
> The fsck output was too huge to to attach here, please visit here instead:
> http://arter97.com/f2fs.log.xz
>
> Since I stored this f2fs image inside a btrfs snapshot, I always have
> access to the original(mountable) image. Feel free to suggest risky
> patches, etc.
>
> Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: f2fs: SIT corruption upon fsck
  2019-04-17 10:28 ` Ju Hyung Park
@ 2019-04-17 10:40   ` Ju Hyung Park
  2019-04-17 14:15     ` Jaegeuk Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Ju Hyung Park @ 2019-04-17 10:40 UTC (permalink / raw)
  To: linux-f2fs-devel

Oh sorry, I should have been more careful attaching the kernel log.
Disregard anything before 3091s mark and just pay attention to those 2 lines:

> [ 3091.204798] F2FS-fs (loop1): SIT is corrupted node# 2481418 vs 2517451
> [ 3091.204801] F2FS-fs (loop1): Failed to initialize F2FS segment manager

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: f2fs: SIT corruption upon fsck
  2019-04-17 10:40   ` Ju Hyung Park
@ 2019-04-17 14:15     ` Jaegeuk Kim
  2019-04-17 15:51       ` Jaegeuk Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Jaegeuk Kim @ 2019-04-17 14:15 UTC (permalink / raw)
  To: Ju Hyung Park; +Cc: linux-f2fs-devel

Hi Ju Hyung,

On 04/17, Ju Hyung Park wrote:
> Oh sorry, I should have been more careful attaching the kernel log.
> Disregard anything before 3091s mark and just pay attention to those 2 lines:
> 
> > [ 3091.204798] F2FS-fs (loop1): SIT is corrupted node# 2481418 vs 2517451
> > [ 3091.204801] F2FS-fs (loop1): Failed to initialize F2FS segment manager

Could you please bisect f2fs-tools first to see whether there is any wrong
patch? I feel like fsck is corrupting something.

Thanks,

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: f2fs: SIT corruption upon fsck
  2019-04-17 14:15     ` Jaegeuk Kim
@ 2019-04-17 15:51       ` Jaegeuk Kim
  2019-04-17 16:29         ` Ju Hyung Park
  2019-04-18  9:31         ` Chao Yu
  0 siblings, 2 replies; 7+ messages in thread
From: Jaegeuk Kim @ 2019-04-17 15:51 UTC (permalink / raw)
  To: Ju Hyung Park; +Cc: linux-f2fs-devel

On 04/17, Jaegeuk Kim wrote:
> Hi Ju Hyung,
> 
> On 04/17, Ju Hyung Park wrote:
> > Oh sorry, I should have been more careful attaching the kernel log.
> > Disregard anything before 3091s mark and just pay attention to those 2 lines:
> > 
> > > [ 3091.204798] F2FS-fs (loop1): SIT is corrupted node# 2481418 vs 2517451
> > > [ 3091.204801] F2FS-fs (loop1): Failed to initialize F2FS segment manager
> 
> Could you please bisect f2fs-tools first to see whether there is any wrong
> patch? I feel like fsck is corrupting something.

nvm. I shoot the email too early. It seems that large_nat_bitmap gave old NAT
set somehow, so that NAT points the previous checkpointed node blocks, since
SSA were obsolete and some SIT were valid and others were invalid.

When considering no node block was missing from dentry, I think GC was doing
between two checkpoints.

[FSCK] Unreachable nat entries                        [Ok..] [0x0]
[FSCK] SIT valid block bitmap checking                [Fail]
[FSCK] Hard link checking for regular file            [Ok..] [0x7]
[FSCK] valid_block_count matching with CP             [Fail] [0x215805c]
[FSCK] valid_node_count matcing with CP (de lookup)   [Ok..] [0x2669cb]
[FSCK] valid_node_count matcing with CP (nat lookup)  [Ok..] [0x2669cb]
[FSCK] valid_inode_count matched with CP              [Ok..] [0x260f23]
[FSCK] free segment_count matched with CP             [Fail] [0xee6c]
[FSCK] next block offset is free                      [Ok..]


Chao,

BTW, why can't fsck fix this in large_nat_bitmap? Could you check the feature
in kernel side as well?

Thanks,

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: f2fs: SIT corruption upon fsck
  2019-04-17 15:51       ` Jaegeuk Kim
@ 2019-04-17 16:29         ` Ju Hyung Park
  2019-04-18  9:31         ` Chao Yu
  1 sibling, 0 replies; 7+ messages in thread
From: Ju Hyung Park @ 2019-04-17 16:29 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-f2fs-devel

Hi Jaegeuk,

On Thu, Apr 18, 2019 at 12:51 AM Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> nvm. I shoot the email too early. It seems that large_nat_bitmap gave old NAT
> set somehow, so that NAT points the previous checkpointed node blocks, since
> SSA were obsolete and some SIT were valid and others were invalid.
>
> When considering no node block was missing from dentry, I think GC was doing
> between two checkpoints.

For some contexts, this is the same issue I think I reported about 6
months ago, under a very similar scenario. I didn't have the luxury
back then to backup the original working image.

I was re-formatting my f2fs partition as I didn't use the large bitmap
option before. I backed up all the data via 'rsync -a' to another
filesystem, formatted via mkfs.f2fs -i, restored the data back to the
newly formatted f2fs partition(also via rsync) and did a gc_urgent
with 1ms gc_urgent_sleep_time. IIRC GC ran about 5,000 times.

After all that, deleting files from the newly formatted f2fs partition
randomly gave me truncation warnings.
I was just able to reproduce this truncation warning by taking another
btrfs snapshot of the original image and doing 'rm -rf /home'.

[ 4756.407571] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343043
[ 4756.407574] WARNING: CPU: 1 PID: 6473 at fs/f2fs/segment.c:2165
update_sit_entry+0x3a4/0x3f0
[ 4756.407590] CPU: 1 PID: 6473 Comm: rm Tainted: G        W  O
4.19.35-zen+ #1
[ 4756.407591] Hardware name: System manufacturer System Product
Name/ROG MAXIMUS X HERO (WI-FI AC), BIOS 1704 09/14/2018
[ 4756.407591] RIP: 0010:update_sit_entry+0x3a4/0x3f0
[ 4756.407592] Code: ff ff 48 8b 7d 00 44 89 f9 48 c7 c2 e0 e0 58 82
48 c7 c6 93 96 54 82 44 89 4c 24 14 4c 89 44 24 08 44 88 14 24 e8 68
29 fe ff <0f> 0b f0 80 4d 48 04 66 83 03 40 45 31 ed 31 c9 44 0f b6 14
24 4c
[ 4756.407592] RSP: 0018:ffff88849ace3c40 EFLAGS: 00010286
[ 4756.407593] RAX: 0000000000000039 RBX: ffffc90006442fd8 RCX: 0000000000000006
[ 4756.407593] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88884ec56340
[ 4756.407593] RBP: ffff8884b5ee3800 R08: 0000000000063813 R09: 000000000026e1d0
[ 4756.407594] R10: 0000000000000000 R11: ffffffff82d4c26d R12: 000000000001b3ff
[ 4756.407594] R13: 00000000ffffffff R14: 000000000001b3ff R15: 00000000036afc43
[ 4756.407594] FS:  00007fd799482780(0000) GS:ffff88884ec40000(0000)
knlGS:0000000000000000
[ 4756.407595] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4756.407595] CR2: 000036195d5f4000 CR3: 00000005bb1d9002 CR4: 00000000003606e0
[ 4756.407595] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4756.407596] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4756.407596] Call Trace:
[ 4756.407597]  ? f2fs_invalidate_blocks+0x83/0x110
[ 4756.407598]  ? f2fs_truncate_data_blocks+0xb9/0x340
[ 4756.407598]  ? truncate_nodes+0x39c/0x570
[ 4756.407599]  ? f2fs_truncate_inode_blocks+0x1a0/0x530
[ 4756.407600]  ? f2fs_truncate_blocks+0x179/0x3e0
[ 4756.407601]  ? f2fs_truncate+0x7a/0xf0
[ 4756.407602]  ? f2fs_evict_inode+0x2f6/0x380
[ 4756.407602]  ? evict+0xbf/0x1e0
[ 4756.407603]  ? do_unlinkat+0x27e/0x330
[ 4756.407604]  ? do_syscall_64+0x4a/0xf0
[ 4756.407604]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4756.407605] ---[ end trace d1bb53850401c466 ]---

I get ton of these during 'rm' with blk incrementing.
This is the same log but with "F2FS" grepped:

[ 4756.403795] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343002
[ 4756.403888] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343003
[ 4756.404012] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343004
[ 4756.404119] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343005
[ 4756.404213] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343006
[ 4756.404330] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343007
[ 4756.404443] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343008
[ 4756.404537] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343009
[ 4756.404657] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343010
[ 4756.404767] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343011
[ 4756.404860] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343012
[ 4756.404979] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343013
[ 4756.405090] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343014
[ 4756.405183] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343015
[ 4756.405301] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343016
[ 4756.405414] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343017
[ 4756.405506] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343018
[ 4756.405626] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343019
[ 4756.405736] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343020
[ 4756.405831] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343021
[ 4756.405948] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343022
[ 4756.406065] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343023
[ 4756.406161] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343024
[ 4756.406262] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343025
[ 4756.406353] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343026
[ 4756.406437] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343027
[ 4756.406529] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343028
[ 4756.406609] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343029
[ 4756.406677] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343030
[ 4756.406746] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343031
[ 4756.406815] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343032
[ 4756.406884] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343033
[ 4756.406952] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343034
[ 4756.407021] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343035
[ 4756.407089] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343036
[ 4756.407158] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343037
[ 4756.407226] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343038
[ 4756.407294] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343039
[ 4756.407363] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343040
[ 4756.407432] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343041
[ 4756.407503] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343042
[ 4756.407571] F2FS-fs (loop1): Bitmap was wrongly cleared, blk:57343043

I think I can reproduce this by doing the same rsync, mkfs, rsync over
again, though it'll take a hell of a time.
Please note that I didn't have sudden power offs and unmounts were
done cleanly before I ran into those issues, so I think we need some
solution too at the kernel-side, not just fsck.

Judging by my experiences with reformatting f2fs partitions, I'm
fairly certain this is specific to large bitmap. (Now I'm wondering if
f2fs tests(e.g. using xfstests) are also done under large bitmap.) I
didn't have any similar issues with regular bitmap formatted f2fs on
my desktop, laptop and Android.

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: f2fs: SIT corruption upon fsck
  2019-04-17 15:51       ` Jaegeuk Kim
  2019-04-17 16:29         ` Ju Hyung Park
@ 2019-04-18  9:31         ` Chao Yu
  1 sibling, 0 replies; 7+ messages in thread
From: Chao Yu @ 2019-04-18  9:31 UTC (permalink / raw)
  To: Jaegeuk Kim, Ju Hyung Park; +Cc: linux-f2fs-devel

On 2019/4/17 23:51, Jaegeuk Kim wrote:
> On 04/17, Jaegeuk Kim wrote:
>> Hi Ju Hyung,
>>
>> On 04/17, Ju Hyung Park wrote:
>>> Oh sorry, I should have been more careful attaching the kernel log.
>>> Disregard anything before 3091s mark and just pay attention to those 2 lines:
>>>
>>>> [ 3091.204798] F2FS-fs (loop1): SIT is corrupted node# 2481418 vs 2517451
>>>> [ 3091.204801] F2FS-fs (loop1): Failed to initialize F2FS segment manager
>>
>> Could you please bisect f2fs-tools first to see whether there is any wrong
>> patch? I feel like fsck is corrupting something.
> 
> nvm. I shoot the email too early. It seems that large_nat_bitmap gave old NAT
> set somehow, so that NAT points the previous checkpointed node blocks, since
> SSA were obsolete and some SIT were valid and others were invalid.
> 
> When considering no node block was missing from dentry, I think GC was doing
> between two checkpoints.
> 
> [FSCK] Unreachable nat entries                        [Ok..] [0x0]
> [FSCK] SIT valid block bitmap checking                [Fail]
> [FSCK] Hard link checking for regular file            [Ok..] [0x7]
> [FSCK] valid_block_count matching with CP             [Fail] [0x215805c]
> [FSCK] valid_node_count matcing with CP (de lookup)   [Ok..] [0x2669cb]
> [FSCK] valid_node_count matcing with CP (nat lookup)  [Ok..] [0x2669cb]
> [FSCK] valid_inode_count matched with CP              [Ok..] [0x260f23]
> [FSCK] free segment_count matched with CP             [Fail] [0xee6c]
> [FSCK] next block offset is free                      [Ok..]
> 
> 
> Chao,
> 
> BTW, why can't fsck fix this in large_nat_bitmap? Could you check the feature

I found that after repairing corrupted image in where large_nat_bitmap is
enabled, we will lose CP_LARGE_NAT_BITMAP_FLAG flag, result in loading sit
bitmap area with incorrect offset in kernel side.

I've sent a patch to fix this issue.

> in kernel side as well?

But there is another design problem, once nat_bitmap_size + sit_bitmap_size is
larger than MAX_BITMAP_SIZE_IN_CKPT, nat_bitmap or sit_bitmap may overlap
checkpoint checksum's position, whenever checkpoint is written, bitmap will be
damaged...

	if (is_set_ckpt_flags(ckpt, CP_LARGE_NAT_BITMAP_FLAG)) {
		offset = (flag == SIT_BITMAP) ?
			le32_to_cpu(ckpt->nat_ver_bitmap_bytesize) : 0;
		return &ckpt->sit_nat_version_bitmap + offset;
	}

Ju Hyung's image size is 256GB, nat bitmap size is 3648 bytes, sit bitmap size
is 320 bytes, checksum position locates at sit_bitmap[252,255], so for those
segments which segno is in range of:

252 * 8 * 55 + 383 (# of meta_seg) = 111263
256 * 8 * 55 + 383 (# of meta_seg) = 113023

Their SIT may not be correct due to corrupted sit_bitmap.

>From the fsck log, it looks that can match the above analyzing of root cause.

[FIX] (is_valid_ssa_data_blk: 321)  --> Set data summary 0x1b329 -> [0x22dda4]
[0x0] [0x0]

corrupted segno: 0x1b329 = 111401

...
[FIX] (is_valid_ssa_data_blk: 321)  --> Set data summary 0x1b68d -> [0x22f811]
[0x0] [0x103]

corrupted segno: 0x1b68d = 112269

...


Sorry to introduce that bug.... let me take some time to fix.


Thanks,

> 
> Thanks,
> 
> 
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> .
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-04-18  9:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-17 10:25 f2fs: SIT corruption upon fsck Ju Hyung Park
2019-04-17 10:28 ` Ju Hyung Park
2019-04-17 10:40   ` Ju Hyung Park
2019-04-17 14:15     ` Jaegeuk Kim
2019-04-17 15:51       ` Jaegeuk Kim
2019-04-17 16:29         ` Ju Hyung Park
2019-04-18  9:31         ` Chao Yu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.