* [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
@ 2017-09-30  3:28 Zorro Lang
  2017-10-01 22:58 ` Dave Chinner
  2017-10-02 13:56 ` Brian Foster
  0 siblings, 2 replies; 14+ messages in thread
From: Zorro Lang @ 2017-09-30  3:28 UTC (permalink / raw)
  To: linux-xfs

Hi,

I hit a panic[1] when I ran xfstests on a debug kernel, v4.14-rc2
(with xfsprogs 4.13.1), and I have reproduced it twice on the same
machine. However, I can't reproduce it on another machine.

Maybe there is some hardware-specific requirement to trigger this panic. I
tested on a normal disk partition, but the disk is a multi-stripe RAID
device. I didn't get the mkfs output of g/085, but I found that the default
mkfs output (mkfs.xfs -f /dev/sda3) is:

meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=1024   blocks=15720448, imaxpct=25
         =                       sunit=512    swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=1024   blocks=10240, version=2
         =                       sectsz=512   sunit=32 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

(The test machine is not in my hands right now; I need some time to reserve it again.)

Thanks,
Zorro

[1]:

[  373.165020] run fstests generic/085 at 2017-09-29 10:29:32 
[  373.522944] XFS (sda4): Unmounting Filesystem 
[  373.700510] device-mapper: uevent: version 1.0.3 
[  373.725266] device-mapper: ioctl: 4.36.0-ioctl (2017-06-09) initialised: dm-devel@redhat.com 
[  374.199737] XFS (dm-0): Mounting V5 Filesystem 
[  374.228642] XFS (dm-0): Ending clean mount 
[  374.285479] XFS (dm-0): Unmounting Filesystem 
[  374.319080] XFS (dm-0): Mounting V5 Filesystem 
[  374.353123] XFS (dm-0): Ending clean mount 
[  374.409625] XFS (dm-0): Unmounting Filesystem 
[  374.437494] XFS (dm-0): Mounting V5 Filesystem 
[  374.477124] XFS (dm-0): Ending clean mount 
[  374.549775] XFS (dm-0): Unmounting Filesystem 
[  374.578300] XFS (dm-0): Mounting V5 Filesystem 
[  374.618208] XFS (dm-0): Ending clean mount 
[  374.672593] XFS (dm-0): Unmounting Filesystem 
[  374.701455] XFS (dm-0): Mounting V5 Filesystem 
[  374.741861] XFS (dm-0): Ending clean mount 
[  374.798972] XFS (dm-0): Unmounting Filesystem 
[  374.827584] XFS (dm-0): Mounting V5 Filesystem 
[  374.872622] XFS (dm-0): Ending clean mount 
[  374.938045] XFS (dm-0): Unmounting Filesystem 
[  374.966630] XFS (dm-0): Mounting V5 Filesystem 
[  375.009748] XFS (dm-0): Ending clean mount 
[  375.067006] XFS (dm-0): Unmounting Filesystem 
[  375.095371] XFS (dm-0): Mounting V5 Filesystem 
[  375.134992] XFS (dm-0): Ending clean mount 
[  375.198436] XFS (dm-0): Unmounting Filesystem 
[  375.226926] XFS (dm-0): Mounting V5 Filesystem 
[  375.271643] XFS (dm-0): Ending clean mount 
[  375.326618] XFS (dm-0): Unmounting Filesystem 
[  375.357583] XFS (dm-0): Mounting V5 Filesystem 
[  375.402952] XFS (dm-0): Ending clean mount 
[  375.454747] XFS (dm-0): Unmounting Filesystem 
[  375.483053] XFS (dm-0): Mounting V5 Filesystem 
[  375.527584] XFS (dm-0): Ending clean mount 
[  375.592113] XFS (dm-0): Unmounting Filesystem 
[  375.620637] XFS (dm-0): Mounting V5 Filesystem 
[  375.683969] XFS (dm-0): Invalid block length (0xfffffed8) for buffer 
[  375.713282] BUG: unable to handle kernel NULL pointer dereference at           (null) 
[  375.749352] IP: xlog_header_check_mount+0x11/0xd0 [xfs] 
[  375.773424] PGD 0 P4D 0  
[  375.784990] Oops: 0000 [#1] SMP 
[  375.799382] Modules linked in: dm_mod rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ext4 mbcache jbd2 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc vfat fat aesni_intel crypto_simd glue_helper ipmi_ssif cryptd iTCO_wdt joydev hpilo iTCO_vendor_support sg ipmi_si hpwdt ipmi_devintf i2c_i801 pcspkr lpc_ich ioatdma ipmi_msghandler shpchp dca nfsd wmi acpi_power_meter auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mgag200 i2c_algo_bit drm_kms_helper sd_mod syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core hpsa be2net 
[  376.134079]  scsi_transport_sas 
[  376.148586] CPU: 52 PID: 46126 Comm: mount Not tainted 4.14.0-rc2 #1 
[  376.177733] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016 
[  376.209076] task: ffff9e448206b4c0 task.stack: ffffab2bc9828000 
[  376.236861] RIP: 0010:xlog_header_check_mount+0x11/0xd0 [xfs] 
[  376.263261] RSP: 0018:ffffab2bc982baf8 EFLAGS: 00010246 
[  376.287307] RAX: 0000000000000001 RBX: fffffffffffffed7 RCX: 0000000000000000 
[  376.320119] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9e44827ee000 
[  376.353016] RBP: ffffab2bc982bb10 R08: 0000000000000000 R09: 0000000000000000 
[  376.388077] R10: 0000000000000001 R11: 000000008dbdaba7 R12: ffff9e3e84567b80 
[  376.423650] R13: ffff9e44827ee000 R14: 0000000000000001 R15: 0000000000000000 
[  376.456573] FS:  00007f7ea6c46880(0000) GS:ffff9e3ea7a00000(0000) knlGS:0000000000000000 
[  376.493753] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[  376.520232] CR2: 0000000000000000 CR3: 0000000c3b76a004 CR4: 00000000001606e0 
[  376.553291] Call Trace: 
[  376.564479]  xlog_find_verify_log_record+0x13b/0x270 [xfs] 
[  376.589761]  xlog_find_head+0x1ed/0x4d0 [xfs] 
[  376.609787]  ? mark_held_locks+0x66/0x90 
[  376.627819]  xlog_find_tail+0x43/0x3a0 [xfs] 
[  376.647431]  ? try_to_wake_up+0x59/0x750 
[  376.665459]  xlog_recover+0x2d/0x170 [xfs] 
[  376.684250]  ? xfs_trans_ail_init+0xc7/0xf0 [xfs] 
[  376.706261]  xfs_log_mount+0x2b0/0x320 [xfs] 
[  376.726658]  xfs_mountfs+0x55c/0xaf0 [xfs] 
[  376.745614]  ? xfs_mru_cache_create+0x178/0x1d0 [xfs] 
[  376.768813]  xfs_fs_fill_super+0x4bd/0x620 [xfs] 
[  376.790017]  mount_bdev+0x18c/0x1c0 
[  376.806030]  ? xfs_test_remount_options.isra.15+0x60/0x60 [xfs] 
[  376.833247]  xfs_fs_mount+0x15/0x20 [xfs] 
[  376.851728]  mount_fs+0x39/0x150 
[  376.866558]  vfs_kern_mount+0x6b/0x170 
[  376.884623]  do_mount+0x1f0/0xd60 
[  376.901879]  ? memdup_user+0x42/0x60 
[  376.919545]  SyS_mount+0x83/0xd0 
[  376.936736]  do_syscall_64+0x6c/0x220 
[  376.954020]  entry_SYSCALL64_slow_path+0x25/0x25 
[  376.975265] RIP: 0033:0x7f7ea5ec0aaa 
[  376.991661] RSP: 002b:00007ffe777381e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 
[  377.026394] RAX: ffffffffffffffda RBX: 00005596b8d93080 RCX: 00007f7ea5ec0aaa 
[  377.059178] RDX: 00005596b8d95640 RSI: 00005596b8d93270 RDI: 00005596b8d93250 
[  377.092234] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000010 
[  377.125095] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00005596b8d93250 
[  377.158157] R13: 00005596b8d95640 R14: 0000000000000000 R15: 00005596b8d93080 
[  377.192994] Code: c0 48 c7 c7 58 13 70 c0 e8 2d 2a fe ff e9 aa fd ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 fd 41 54 53 <81> 3e fe ed ba be 48 89 f3 75 5b 4c 8d a3 30 01 00 00 ba 10 00  
[  377.291657] RIP: xlog_header_check_mount+0x11/0xd0 [xfs] RSP: ffffab2bc982baf8 
[  377.326553] CR2: 0000000000000000 
[  377.342022] ---[ end trace 85d9cc5b8e738db6 ]--- 





^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-09-30  3:28 [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2 Zorro Lang
@ 2017-10-01 22:58 ` Dave Chinner
  2017-10-02 22:15   ` Darrick J. Wong
  2017-10-02 13:56 ` Brian Foster
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2017-10-01 22:58 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs

On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> Hi,
> 
> I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> twice. But I can't reproduce it on another machine.
> 
> Maybe there're some hardware specific requirement to trigger this panic. I
> tested on normal disk partition, but the disk is multi stripes RAID device.
> I didn't get the mkfs output of g/085, bug I found the default mkfs output
> (mkfs.xfs -f /dev/sda3) is:
> 
> meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> data     =                       bsize=1024   blocks=15720448, imaxpct=25
>          =                       sunit=512    swidth=1024 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal log           bsize=1024   blocks=10240, version=2
>          =                       sectsz=512   sunit=32 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0

FWIW, I've come across a few of these log recovery crashes recently
while reworking mkfs.xfs. The cause has always been either a log that
is too small or a mismatch between the log size and the log stripe
unit configuration. The typical sign of that was either a negative
buffer length like this one (XFS (dm-0): Invalid block length
(0xfffffed8) for buffer) or the head/tail block initially being
calculated before the start or past the end of the actual log, so
the log offset went negative.
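
For anyone curious where a number like that comes from: it's just a small
negative count printed as an unsigned 32-bit value. A trivial userspace
illustration (not XFS code, purely for demonstration):

	#include <stdio.h>

	int main(void)
	{
		int nbblks = -296;	/* a small negative "block count" */

		/* printed as unsigned, -296 shows up as 0xfffffed8 */
		printf("0x%x\n", (unsigned int)nbblks);
		return 0;
	}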

I'm guessing the recent log validity checking we've added isn't as
robust as it should be, but I haven't had time to dig into it yet.
I've used xfs_logprint to debug the issues far enough to point at mkfs
being wrong - it runs the same head/tail recovery code as the kernel, so
it typically crashes on the same problems the kernel does. It's much
easier to debug in userspace with gdb, though.....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-09-30  3:28 [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2 Zorro Lang
  2017-10-01 22:58 ` Dave Chinner
@ 2017-10-02 13:56 ` Brian Foster
  2017-10-03  2:27   ` Zorro Lang
  2017-10-13 13:29   ` Zorro Lang
  1 sibling, 2 replies; 14+ messages in thread
From: Brian Foster @ 2017-10-02 13:56 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs

On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> Hi,
> 
> I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> twice. But I can't reproduce it on another machine.
> 
> Maybe there're some hardware specific requirement to trigger this panic. I
> tested on normal disk partition, but the disk is multi stripes RAID device.
> I didn't get the mkfs output of g/085, bug I found the default mkfs output
> (mkfs.xfs -f /dev/sda3) is:
> 
> meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> data     =                       bsize=1024   blocks=15720448, imaxpct=25
>          =                       sunit=512    swidth=1024 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal log           bsize=1024   blocks=10240, version=2
>          =                       sectsz=512   sunit=32 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> (The test machine is not on my hand now, I need time reserve it.)
> 

If you are able to reproduce, could you provide a metadump of this fs
immediately after the crash?
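
(For reference: capturing the scratch device with xfs_metadump and restoring
it with xfs_mdrestore should be enough to let us replay the failing mount
here.)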

Brian

> [ ... remainder of the quoted report, including the full oops trace, trimmed ... ]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-10-01 22:58 ` Dave Chinner
@ 2017-10-02 22:15   ` Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2017-10-02 22:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Zorro Lang, linux-xfs

On Mon, Oct 02, 2017 at 09:58:49AM +1100, Dave Chinner wrote:
> On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> > Hi,
> > 
> > I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> > (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> > twice. But I can't reproduce it on another machine.
> > 
> > Maybe there're some hardware specific requirement to trigger this panic. I
> > tested on normal disk partition, but the disk is multi stripes RAID device.
> > I didn't get the mkfs output of g/085, bug I found the default mkfs output
> > (mkfs.xfs -f /dev/sda3) is:
> > 
> > meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
> >          =                       sectsz=512   attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > data     =                       bsize=1024   blocks=15720448, imaxpct=25
> >          =                       sunit=512    swidth=1024 blks
> > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > log      =internal log           bsize=1024   blocks=10240, version=2
> >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> FWIW, I've come across a few of these log recovery crashes recently
> when reworking mkfs.xfs. The cause of them has always been either a
> log being too small or a mismatch between log size and log stripe
> unit configuration. The typical sign of that was either a
> negative buffer length like this one (XFS (dm-0): Invalid block
> length (0xfffffed8) for buffer) or the head/tail block initially
> being calculated before/after the actual log and so the log offset
> was negative.
> 
> I'm guessing the recent log validity checking we've added isn't as
> robust as it should be, but I haven't had time to dig into it yet.
> I've debugged the issues far enough to point to mkfs being wrong
> with xfs_logprint - it runs the same head/tail recovery code as the
> kernel so typically crashes on the same problems as the kernel. It's
> much easier to debug in userspace with gdb, though.....

Just to pile on with everyone else: I've noticed that fuzzing logsunit
to -1 causes the mount process to spit out a bunch of recovery-related
io errors.  Shortly thereafter the kernel crashes too.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-10-02 13:56 ` Brian Foster
@ 2017-10-03  2:27   ` Zorro Lang
  2017-10-13 13:29   ` Zorro Lang
  1 sibling, 0 replies; 14+ messages in thread
From: Zorro Lang @ 2017-10-03  2:27 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Oct 02, 2017 at 09:56:18AM -0400, Brian Foster wrote:
> On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> > Hi,
> > 
> > I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> > (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> > twice. But I can't reproduce it on another machine.
> > 
> > Maybe there're some hardware specific requirement to trigger this panic. I
> > tested on normal disk partition, but the disk is multi stripes RAID device.
> > I didn't get the mkfs output of g/085, bug I found the default mkfs output
> > (mkfs.xfs -f /dev/sda3) is:
> > 
> > meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
> >          =                       sectsz=512   attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > data     =                       bsize=1024   blocks=15720448, imaxpct=25
> >          =                       sunit=512    swidth=1024 blks
> > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > log      =internal log           bsize=1024   blocks=10240, version=2
> >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > 
> > (The test machine is not on my hand now, I need time reserve it.)
> > 
> 
> If you are able to reproduce, could you provide a metadump of this fs
> immediately after the crash?

Sure, I'm trying to do that. But that machine is out of my control; I need
to contact the system admin first and wait for their response.

Thanks
Zorro

> [ ... remainder of quoted message, including the full oops trace, trimmed ... ]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-10-02 13:56 ` Brian Foster
  2017-10-03  2:27   ` Zorro Lang
@ 2017-10-13 13:29   ` Zorro Lang
  2017-10-13 18:16     ` Brian Foster
  1 sibling, 1 reply; 14+ messages in thread
From: Zorro Lang @ 2017-10-13 13:29 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, darrick.wong, david

On Mon, Oct 02, 2017 at 09:56:18AM -0400, Brian Foster wrote:
> On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> > Hi,
> > 
> > I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> > (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> > twice. But I can't reproduce it on another machine.
> > 
> > Maybe there're some hardware specific requirement to trigger this panic. I
> > tested on normal disk partition, but the disk is multi stripes RAID device.
> > I didn't get the mkfs output of g/085, bug I found the default mkfs output
> > (mkfs.xfs -f /dev/sda3) is:
> > 
> > meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
> >          =                       sectsz=512   attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > data     =                       bsize=1024   blocks=15720448, imaxpct=25
> >          =                       sunit=512    swidth=1024 blks
> > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > log      =internal log           bsize=1024   blocks=10240, version=2
> >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > 
> > (The test machine is not on my hand now, I need time reserve it.)
> > 
> 
> If you are able to reproduce, could you provide a metadump of this fs
> immediately after the crash?

I finally got the machine that can reproduce this bug for one day, and I
captured an XFS metadump that triggers it.

Please download the metadump file from the link below:
https://drive.google.com/file/d/0B5dFDeCXGOPXalNuMUJNdDM3STQ/view?usp=sharing

Just mount this XFS image and the kernel will crash. I didn't do any other
operations on this XFS; I only ran "mkfs.xfs -b size=1024".

Thanks,
Zorro

> [ ... remainder of quoted message, including the full oops trace, trimmed ... ]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-10-13 13:29   ` Zorro Lang
@ 2017-10-13 18:16     ` Brian Foster
  2017-10-13 19:53       ` Brian Foster
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2017-10-13 18:16 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs, darrick.wong, david

On Fri, Oct 13, 2017 at 09:29:35PM +0800, Zorro Lang wrote:
> On Mon, Oct 02, 2017 at 09:56:18AM -0400, Brian Foster wrote:
> > On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> > > Hi,
> > > 
> > > I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> > > (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> > > twice. But I can't reproduce it on another machine.
> > > 
> > > Maybe there're some hardware specific requirement to trigger this panic. I
> > > tested on normal disk partition, but the disk is multi stripes RAID device.
> > > I didn't get the mkfs output of g/085, bug I found the default mkfs output
> > > (mkfs.xfs -f /dev/sda3) is:
> > > 
> > > meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
> > >          =                       sectsz=512   attr=2, projid32bit=1
> > >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > > data     =                       bsize=1024   blocks=15720448, imaxpct=25
> > >          =                       sunit=512    swidth=1024 blks
> > > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > > log      =internal log           bsize=1024   blocks=10240, version=2
> > >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > 
> > > (The test machine is not on my hand now, I need time reserve it.)
> > > 
> > 
> > If you are able to reproduce, could you provide a metadump of this fs
> > immediately after the crash?
> 
> Finally I got the machine which can reproduce this bug for 1 day. Then I
> got the XFS metadump which can trigger this bug.
> 
> Please download the metadump file by opening below link:
> https://drive.google.com/file/d/0B5dFDeCXGOPXalNuMUJNdDM3STQ/view?usp=sharing
> 
> Just mount this xfs image, then kernel will crash. I didn't do any operations
> on this XFS, just did "mkfs.xfs -b size=1024".
> 

Thanks Zorro. I can reproduce the crash with this image. It looks like the
root problem is that a block address calculation goes wrong in
xlog_find_head():

	start_blk = log_bbnum - (num_scan_bblks - head_blk);

With log_bbnum = 3264, num_scan_bblks = 4096 and head_blk = 512,
start_blk underflows and we go off the rails from there. Aside from
addressing the crash, I think either this value and/or num_scan_bblks
need to be clamped to within the range of the log.
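
For illustration, plugging those numbers into the same expression in a
quick userspace program shows how far off the front of the log we end up.
This is a sketch only - the clamp at the end is just one naive way to keep
the scan inside the log, not a proposed fix:

	#include <stdio.h>

	int main(void)
	{
		/* values from the metadump above */
		long long log_bbnum = 3264;		/* log length in 512-byte basic blocks */
		long long num_scan_bblks = 4096;	/* head/tail scan window */
		long long head_blk = 512;

		/* same expression as in xlog_find_head() */
		long long start_blk = log_bbnum - (num_scan_bblks - head_blk);
		printf("start_blk = %lld\n", start_blk);	/* -320: before the start of the log */

		/* keep the scan range inside the log */
		if (start_blk < 0)
			start_blk = 0;
		printf("clamped start_blk = %lld\n", start_blk);
		return 0;
	}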

Brian

> [ ... remainder of quoted message, including the full oops trace, trimmed ... ]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-10-13 18:16     ` Brian Foster
@ 2017-10-13 19:53       ` Brian Foster
  2017-10-14 13:30         ` Zorro Lang
  2017-10-14 22:34         ` Dave Chinner
  0 siblings, 2 replies; 14+ messages in thread
From: Brian Foster @ 2017-10-13 19:53 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs, darrick.wong, david

On Fri, Oct 13, 2017 at 02:16:05PM -0400, Brian Foster wrote:
> On Fri, Oct 13, 2017 at 09:29:35PM +0800, Zorro Lang wrote:
> > On Mon, Oct 02, 2017 at 09:56:18AM -0400, Brian Foster wrote:
> > > On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> > > > Hi,
> > > > 
> > > > I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> > > > (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> > > > twice. But I can't reproduce it on another machine.
> > > > 
> > > > Maybe there're some hardware specific requirement to trigger this panic. I
> > > > tested on normal disk partition, but the disk is multi stripes RAID device.
> > > > I didn't get the mkfs output of g/085, bug I found the default mkfs output
> > > > (mkfs.xfs -f /dev/sda3) is:
> > > > 
> > > > meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
> > > >          =                       sectsz=512   attr=2, projid32bit=1
> > > >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > > > data     =                       bsize=1024   blocks=15720448, imaxpct=25
> > > >          =                       sunit=512    swidth=1024 blks
> > > > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > > > log      =internal log           bsize=1024   blocks=10240, version=2
> > > >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > > 
> > > > (The test machine is not on my hand now, I need time reserve it.)
> > > > 
> > > 
> > > If you are able to reproduce, could you provide a metadump of this fs
> > > immediately after the crash?
> > 
> > Finally I got the machine which can reproduce this bug for 1 day. Then I
> > got the XFS metadump which can trigger this bug.
> > 
> > Please download the metadump file by opening below link:
> > https://drive.google.com/file/d/0B5dFDeCXGOPXalNuMUJNdDM3STQ/view?usp=sharing
> > 
> > Just mount this xfs image, then kernel will crash. I didn't do any operations
> > on this XFS, just did "mkfs.xfs -b size=1024".
> > 
> 
> Thanks Zorro. I can reproduce with this image. It looks like the root
> problem is that a block address calculation goes wrong in
> xlog_find_head():
> 
> 	start_blk = log_bbnum - (num_scan_bblks - head_blk);
> 
> With log_bbnum = 3264, num_scan_bblks = 4096 and head_blk = 512,
> start_blk underflows and we go off the rails from there. Aside from
> addressing the crash, I think either this value and/or num_scan_bblks
> need to be clamped to within the range of the log.
> 

Actually Zorro, how are you creating a filesystem with such a small log?
I can't seem to create anything with a log smaller than 2MB. FWIW,
xfs_info shows the following once I work around the crash and mount the
fs:

meta-data=/dev/mapper/test-scratch isize=512    agcount=8, agsize=32256 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=1024   blocks=258048, imaxpct=25
         =                       sunit=512    swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=1024   blocks=1632, version=2
         =                       sectsz=512   sunit=32 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
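
(For reference, that works out to a 1632 * 1024 bytes = ~1.6MB log, i.e.
1632 * 1024 / 512 = 3264 basic blocks - the same log_bbnum = 3264 as above,
and well under the 4096 basic blocks the head search wants to scan, so the
underflow follows directly from this geometry.)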

Brian

> Brian
> 
> > Thanks,
> > Zorro
> > 
> > > 
> > > Brian
> > > 
> > > > Thanks,
> > > > Zorro
> > > > 
> > > > [1]:
> > > > 
> > > > [  373.165020] run fstests generic/085 at 2017-09-29 10:29:32 
> > > > [  373.522944] XFS (sda4): Unmounting Filesystem 
> > > > [  373.700510] device-mapper: uevent: version 1.0.3 
> > > > [  373.725266] device-mapper: ioctl: 4.36.0-ioctl (2017-06-09) initialised: dm-devel@redhat.com 
> > > > [  374.199737] XFS (dm-0): Mounting V5 Filesystem 
> > > > [  374.228642] XFS (dm-0): Ending clean mount 
> > > > [  374.285479] XFS (dm-0): Unmounting Filesystem 
> > > > [  374.319080] XFS (dm-0): Mounting V5 Filesystem 
> > > > [  374.353123] XFS (dm-0): Ending clean mount 
> > > > [  374.409625] XFS (dm-0): Unmounting Filesystem 
> > > > [  374.437494] XFS (dm-0): Mounting V5 Filesystem 
> > > > [  374.477124] XFS (dm-0): Ending clean mount 
> > > > [  374.549775] XFS (dm-0): Unmounting Filesystem 
> > > > [  374.578300] XFS (dm-0): Mounting V5 Filesystem 
> > > > [  374.618208] XFS (dm-0): Ending clean mount 
> > > > [  374.672593] XFS (dm-0): Unmounting Filesystem 
> > > > [  374.701455] XFS (dm-0): Mounting V5 Filesystem 
> > > > [  374.741861] XFS (dm-0): Ending clean mount 
> > > > [  374.798972] XFS (dm-0): Unmounting Filesystem 
> > > > [  374.827584] XFS (dm-0): Mounting V5 Filesystem 
> > > > [  374.872622] XFS (dm-0): Ending clean mount 
> > > > [  374.938045] XFS (dm-0): Unmounting Filesystem 
> > > > [  374.966630] XFS (dm-0): Mounting V5 Filesystem 
> > > > [  375.009748] XFS (dm-0): Ending clean mount 
> > > > [  375.067006] XFS (dm-0): Unmounting Filesystem 
> > > > [  375.095371] XFS (dm-0): Mounting V5 Filesystem 
> > > > [  375.134992] XFS (dm-0): Ending clean mount 
> > > > [  375.198436] XFS (dm-0): Unmounting Filesystem 
> > > > [  375.226926] XFS (dm-0): Mounting V5 Filesystem 
> > > > [  375.271643] XFS (dm-0): Ending clean mount 
> > > > [  375.326618] XFS (dm-0): Unmounting Filesystem 
> > > > [  375.357583] XFS (dm-0): Mounting V5 Filesystem 
> > > > [  375.402952] XFS (dm-0): Ending clean mount 
> > > > [  375.454747] XFS (dm-0): Unmounting Filesystem 
> > > > [  375.483053] XFS (dm-0): Mounting V5 Filesystem 
> > > > [  375.527584] XFS (dm-0): Ending clean mount 
> > > > [  375.592113] XFS (dm-0): Unmounting Filesystem 
> > > > [  375.620637] XFS (dm-0): Mounting V5 Filesystem 
> > > > [  375.683969] XFS (dm-0): Invalid block length (0xfffffed8) for buffer 
> > > > [  375.713282] BUG: unable to handle kernel NULL pointer dereference at           (null) 
> > > > [  375.749352] IP: xlog_header_check_mount+0x11/0xd0 [xfs] 
> > > > [  375.773424] PGD 0 P4D 0  
> > > > [  375.784990] Oops: 0000 [#1] SMP 
> > > > [  375.799382] Modules linked in: dm_mod rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ext4 mbcache jbd2 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc vfat fat aesni_intel crypto_simd glue_helper ipmi_ssif cryptd iTCO_wdt joydev hpilo iTCO_vendor_support sg ipmi_si hpwdt ipmi_devintf i2c_i801 pcspkr lpc_ich ioatdma ipmi_msghandler shpchp dca nfsd wmi acpi_power_meter auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mgag200 i2c_algo_bit drm_kms_helper sd_mod syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core hpsa be2net 
> > > > [  376.134079]  scsi_transport_sas 
> > > > [  376.148586] CPU: 52 PID: 46126 Comm: mount Not tainted 4.14.0-rc2 #1 
> > > > [  376.177733] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016 
> > > > [  376.209076] task: ffff9e448206b4c0 task.stack: ffffab2bc9828000 
> > > > [  376.236861] RIP: 0010:xlog_header_check_mount+0x11/0xd0 [xfs] 
> > > > [  376.263261] RSP: 0018:ffffab2bc982baf8 EFLAGS: 00010246 
> > > > [  376.287307] RAX: 0000000000000001 RBX: fffffffffffffed7 RCX: 0000000000000000 
> > > > [  376.320119] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9e44827ee000 
> > > > [  376.353016] RBP: ffffab2bc982bb10 R08: 0000000000000000 R09: 0000000000000000 
> > > > [  376.388077] R10: 0000000000000001 R11: 000000008dbdaba7 R12: ffff9e3e84567b80 
> > > > [  376.423650] R13: ffff9e44827ee000 R14: 0000000000000001 R15: 0000000000000000 
> > > > [  376.456573] FS:  00007f7ea6c46880(0000) GS:ffff9e3ea7a00000(0000) knlGS:0000000000000000 
> > > > [  376.493753] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
> > > > [  376.520232] CR2: 0000000000000000 CR3: 0000000c3b76a004 CR4: 00000000001606e0 
> > > > [  376.553291] Call Trace: 
> > > > [  376.564479]  xlog_find_verify_log_record+0x13b/0x270 [xfs] 
> > > > [  376.589761]  xlog_find_head+0x1ed/0x4d0 [xfs] 
> > > > [  376.609787]  ? mark_held_locks+0x66/0x90 
> > > > [  376.627819]  xlog_find_tail+0x43/0x3a0 [xfs] 
> > > > [  376.647431]  ? try_to_wake_up+0x59/0x750 
> > > > [  376.665459]  xlog_recover+0x2d/0x170 [xfs] 
> > > > [  376.684250]  ? xfs_trans_ail_init+0xc7/0xf0 [xfs] 
> > > > [  376.706261]  xfs_log_mount+0x2b0/0x320 [xfs] 
> > > > [  376.726658]  xfs_mountfs+0x55c/0xaf0 [xfs] 
> > > > [  376.745614]  ? xfs_mru_cache_create+0x178/0x1d0 [xfs] 
> > > > [  376.768813]  xfs_fs_fill_super+0x4bd/0x620 [xfs] 
> > > > [  376.790017]  mount_bdev+0x18c/0x1c0 
> > > > [  376.806030]  ? xfs_test_remount_options.isra.15+0x60/0x60 [xfs] 
> > > > [  376.833247]  xfs_fs_mount+0x15/0x20 [xfs] 
> > > > [  376.851728]  mount_fs+0x39/0x150 
> > > > [  376.866558]  vfs_kern_mount+0x6b/0x170 
> > > > [  376.884623]  do_mount+0x1f0/0xd60 
> > > > [  376.901879]  ? memdup_user+0x42/0x60 
> > > > [  376.919545]  SyS_mount+0x83/0xd0 
> > > > [  376.936736]  do_syscall_64+0x6c/0x220 
> > > > [  376.954020]  entry_SYSCALL64_slow_path+0x25/0x25 
> > > > [  376.975265] RIP: 0033:0x7f7ea5ec0aaa 
> > > > [  376.991661] RSP: 002b:00007ffe777381e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 
> > > > [  377.026394] RAX: ffffffffffffffda RBX: 00005596b8d93080 RCX: 00007f7ea5ec0aaa 
> > > > [  377.059178] RDX: 00005596b8d95640 RSI: 00005596b8d93270 RDI: 00005596b8d93250 
> > > > [  377.092234] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000010 
> > > > [  377.125095] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00005596b8d93250 
> > > > [  377.158157] R13: 00005596b8d95640 R14: 0000000000000000 R15: 00005596b8d93080 
> > > > [  377.192994] Code: c0 48 c7 c7 58 13 70 c0 e8 2d 2a fe ff e9 aa fd ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 fd 41 54 53 <81> 3e fe ed ba be 48 89 f3 75 5b 4c 8d a3 30 01 00 00 ba 10 00  
> > > > [  377.291657] RIP: xlog_header_check_mount+0x11/0xd0 [xfs] RSP: ffffab2bc982baf8 
> > > > [  377.326553] CR2: 0000000000000000 
> > > > [  377.342022] ---[ end trace 85d9cc5b8e738db6 ]--- 
> > > > 
> > > > 
> > > > 
> > > > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-10-13 19:53       ` Brian Foster
@ 2017-10-14 13:30         ` Zorro Lang
  2017-10-14 22:34         ` Dave Chinner
  1 sibling, 0 replies; 14+ messages in thread
From: Zorro Lang @ 2017-10-14 13:30 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, darrick.wong, david

On Fri, Oct 13, 2017 at 03:53:35PM -0400, Brian Foster wrote:
> On Fri, Oct 13, 2017 at 02:16:05PM -0400, Brian Foster wrote:
> > On Fri, Oct 13, 2017 at 09:29:35PM +0800, Zorro Lang wrote:
> > > On Mon, Oct 02, 2017 at 09:56:18AM -0400, Brian Foster wrote:
> > > > On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> > > > > Hi,
> > > > > 
> > > > > I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> > > > > (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> > > > > twice. But I can't reproduce it on another machine.
> > > > > 
> > > > > Maybe there're some hardware specific requirement to trigger this panic. I
> > > > > tested on normal disk partition, but the disk is multi stripes RAID device.
> > > > > I didn't get the mkfs output of g/085, bug I found the default mkfs output
> > > > > (mkfs.xfs -f /dev/sda3) is:
> > > > > 
> > > > > meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
> > > > >          =                       sectsz=512   attr=2, projid32bit=1
> > > > >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > > > > data     =                       bsize=1024   blocks=15720448, imaxpct=25
> > > > >          =                       sunit=512    swidth=1024 blks
> > > > > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > > > > log      =internal log           bsize=1024   blocks=10240, version=2
> > > > >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > > > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > > > 
> > > > > (The test machine is not on my hand now, I need time reserve it.)
> > > > > 
> > > > 
> > > > If you are able to reproduce, could you provide a metadump of this fs
> > > > immediately after the crash?
> > > 
> > > Finally I got the machine which can reproduce this bug for 1 day. Then I
> > > got the XFS metadump which can trigger this bug.
> > > 
> > > Please download the metadump file by opening below link:
> > > https://drive.google.com/file/d/0B5dFDeCXGOPXalNuMUJNdDM3STQ/view?usp=sharing
> > > 
> > > Just mount this xfs image, then kernel will crash. I didn't do any operations
> > > on this XFS, just did "mkfs.xfs -b size=1024".
> > > 
> > 
> > Thanks Zorro. I can reproduce with this image. It looks like the root
> > problem is that a block address calculation goes wrong in
> > xlog_find_head():
> > 
> > 	start_blk = log_bbnum - (num_scan_bblks - head_blk);
> > 
> > With log_bbnum = 3264, num_scan_bblks = 4096 and head_blk = 512,
> > start_blk underflows and we go off the rails from there. Aside from
> > addressing the crash, I think either this value and/or num_scan_bblks
> > need to be clamped to within the range of the log.
> > 
> 
> Actually Zorro, how are you creating a filesystem with such a small log?
> I can't seem to create anything with a log smaller than 2MB. FWIW,

Hi Brian,

I found this panic by running g/085 on my test machine. Then I captured this
metadump simply by doing:

MKFS_OPTIONS="-b size=1024"
size=$((256 * 1024 * 1024))
_scratch_mkfs_sized $size
xfs_metadump -o $SCRATCH_DEV /path/to/mymetadumpfile

A 4k block size can't reproduce it on my test machine, but 1k can. So I think
maybe mkfs.xfs is doing something wrong?

Thanks,
Zorro

> xfs_info shows the following once I work around the crash and mount the
> fs:
> 
> meta-data=/dev/mapper/test-scratch isize=512    agcount=8, agsize=32256 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1 spinodes=0 rmapbt=0
>          =                       reflink=0
> data     =                       bsize=1024   blocks=258048, imaxpct=25
>          =                       sunit=512    swidth=1024 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal               bsize=1024   blocks=1632, version=2
>          =                       sectsz=512   sunit=32 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> Brian
> 
> > Brian
> > 
> > > Thanks,
> > > Zorro
> > > 
> > > > 
> > > > Brian
> > > > 
> > > > > Thanks,
> > > > > Zorro
> > > > > 
> > > > > [1]:
> > > > > 
> > > > > [  373.165020] run fstests generic/085 at 2017-09-29 10:29:32 
> > > > > [  373.522944] XFS (sda4): Unmounting Filesystem 
> > > > > [  373.700510] device-mapper: uevent: version 1.0.3 
> > > > > [  373.725266] device-mapper: ioctl: 4.36.0-ioctl (2017-06-09) initialised: dm-devel@redhat.com 
> > > > > [  374.199737] XFS (dm-0): Mounting V5 Filesystem 
> > > > > [  374.228642] XFS (dm-0): Ending clean mount 
> > > > > [  374.285479] XFS (dm-0): Unmounting Filesystem 
> > > > > [  374.319080] XFS (dm-0): Mounting V5 Filesystem 
> > > > > [  374.353123] XFS (dm-0): Ending clean mount 
> > > > > [  374.409625] XFS (dm-0): Unmounting Filesystem 
> > > > > [  374.437494] XFS (dm-0): Mounting V5 Filesystem 
> > > > > [  374.477124] XFS (dm-0): Ending clean mount 
> > > > > [  374.549775] XFS (dm-0): Unmounting Filesystem 
> > > > > [  374.578300] XFS (dm-0): Mounting V5 Filesystem 
> > > > > [  374.618208] XFS (dm-0): Ending clean mount 
> > > > > [  374.672593] XFS (dm-0): Unmounting Filesystem 
> > > > > [  374.701455] XFS (dm-0): Mounting V5 Filesystem 
> > > > > [  374.741861] XFS (dm-0): Ending clean mount 
> > > > > [  374.798972] XFS (dm-0): Unmounting Filesystem 
> > > > > [  374.827584] XFS (dm-0): Mounting V5 Filesystem 
> > > > > [  374.872622] XFS (dm-0): Ending clean mount 
> > > > > [  374.938045] XFS (dm-0): Unmounting Filesystem 
> > > > > [  374.966630] XFS (dm-0): Mounting V5 Filesystem 
> > > > > [  375.009748] XFS (dm-0): Ending clean mount 
> > > > > [  375.067006] XFS (dm-0): Unmounting Filesystem 
> > > > > [  375.095371] XFS (dm-0): Mounting V5 Filesystem 
> > > > > [  375.134992] XFS (dm-0): Ending clean mount 
> > > > > [  375.198436] XFS (dm-0): Unmounting Filesystem 
> > > > > [  375.226926] XFS (dm-0): Mounting V5 Filesystem 
> > > > > [  375.271643] XFS (dm-0): Ending clean mount 
> > > > > [  375.326618] XFS (dm-0): Unmounting Filesystem 
> > > > > [  375.357583] XFS (dm-0): Mounting V5 Filesystem 
> > > > > [  375.402952] XFS (dm-0): Ending clean mount 
> > > > > [  375.454747] XFS (dm-0): Unmounting Filesystem 
> > > > > [  375.483053] XFS (dm-0): Mounting V5 Filesystem 
> > > > > [  375.527584] XFS (dm-0): Ending clean mount 
> > > > > [  375.592113] XFS (dm-0): Unmounting Filesystem 
> > > > > [  375.620637] XFS (dm-0): Mounting V5 Filesystem 
> > > > > [  375.683969] XFS (dm-0): Invalid block length (0xfffffed8) for buffer 
> > > > > [  375.713282] BUG: unable to handle kernel NULL pointer dereference at           (null) 
> > > > > [  375.749352] IP: xlog_header_check_mount+0x11/0xd0 [xfs] 
> > > > > [  375.773424] PGD 0 P4D 0  
> > > > > [  375.784990] Oops: 0000 [#1] SMP 
> > > > > [  375.799382] Modules linked in: dm_mod rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ext4 mbcache jbd2 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc vfat fat aesni_intel crypto_simd glue_helper ipmi_ssif cryptd iTCO_wdt joydev hpilo iTCO_vendor_support sg ipmi_si hpwdt ipmi_devintf i2c_i801 pcspkr lpc_ich ioatdma ipmi_msghandler shpchp dca nfsd wmi acpi_power_meter auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mgag200 i2c_algo_bit drm_kms_helper sd_mod syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core hpsa be2net 
> > > > > [  376.134079]  scsi_transport_sas 
> > > > > [  376.148586] CPU: 52 PID: 46126 Comm: mount Not tainted 4.14.0-rc2 #1 
> > > > > [  376.177733] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016 
> > > > > [  376.209076] task: ffff9e448206b4c0 task.stack: ffffab2bc9828000 
> > > > > [  376.236861] RIP: 0010:xlog_header_check_mount+0x11/0xd0 [xfs] 
> > > > > [  376.263261] RSP: 0018:ffffab2bc982baf8 EFLAGS: 00010246 
> > > > > [  376.287307] RAX: 0000000000000001 RBX: fffffffffffffed7 RCX: 0000000000000000 
> > > > > [  376.320119] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9e44827ee000 
> > > > > [  376.353016] RBP: ffffab2bc982bb10 R08: 0000000000000000 R09: 0000000000000000 
> > > > > [  376.388077] R10: 0000000000000001 R11: 000000008dbdaba7 R12: ffff9e3e84567b80 
> > > > > [  376.423650] R13: ffff9e44827ee000 R14: 0000000000000001 R15: 0000000000000000 
> > > > > [  376.456573] FS:  00007f7ea6c46880(0000) GS:ffff9e3ea7a00000(0000) knlGS:0000000000000000 
> > > > > [  376.493753] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
> > > > > [  376.520232] CR2: 0000000000000000 CR3: 0000000c3b76a004 CR4: 00000000001606e0 
> > > > > [  376.553291] Call Trace: 
> > > > > [  376.564479]  xlog_find_verify_log_record+0x13b/0x270 [xfs] 
> > > > > [  376.589761]  xlog_find_head+0x1ed/0x4d0 [xfs] 
> > > > > [  376.609787]  ? mark_held_locks+0x66/0x90 
> > > > > [  376.627819]  xlog_find_tail+0x43/0x3a0 [xfs] 
> > > > > [  376.647431]  ? try_to_wake_up+0x59/0x750 
> > > > > [  376.665459]  xlog_recover+0x2d/0x170 [xfs] 
> > > > > [  376.684250]  ? xfs_trans_ail_init+0xc7/0xf0 [xfs] 
> > > > > [  376.706261]  xfs_log_mount+0x2b0/0x320 [xfs] 
> > > > > [  376.726658]  xfs_mountfs+0x55c/0xaf0 [xfs] 
> > > > > [  376.745614]  ? xfs_mru_cache_create+0x178/0x1d0 [xfs] 
> > > > > [  376.768813]  xfs_fs_fill_super+0x4bd/0x620 [xfs] 
> > > > > [  376.790017]  mount_bdev+0x18c/0x1c0 
> > > > > [  376.806030]  ? xfs_test_remount_options.isra.15+0x60/0x60 [xfs] 
> > > > > [  376.833247]  xfs_fs_mount+0x15/0x20 [xfs] 
> > > > > [  376.851728]  mount_fs+0x39/0x150 
> > > > > [  376.866558]  vfs_kern_mount+0x6b/0x170 
> > > > > [  376.884623]  do_mount+0x1f0/0xd60 
> > > > > [  376.901879]  ? memdup_user+0x42/0x60 
> > > > > [  376.919545]  SyS_mount+0x83/0xd0 
> > > > > [  376.936736]  do_syscall_64+0x6c/0x220 
> > > > > [  376.954020]  entry_SYSCALL64_slow_path+0x25/0x25 
> > > > > [  376.975265] RIP: 0033:0x7f7ea5ec0aaa 
> > > > > [  376.991661] RSP: 002b:00007ffe777381e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 
> > > > > [  377.026394] RAX: ffffffffffffffda RBX: 00005596b8d93080 RCX: 00007f7ea5ec0aaa 
> > > > > [  377.059178] RDX: 00005596b8d95640 RSI: 00005596b8d93270 RDI: 00005596b8d93250 
> > > > > [  377.092234] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000010 
> > > > > [  377.125095] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00005596b8d93250 
> > > > > [  377.158157] R13: 00005596b8d95640 R14: 0000000000000000 R15: 00005596b8d93080 
> > > > > [  377.192994] Code: c0 48 c7 c7 58 13 70 c0 e8 2d 2a fe ff e9 aa fd ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 fd 41 54 53 <81> 3e fe ed ba be 48 89 f3 75 5b 4c 8d a3 30 01 00 00 ba 10 00  
> > > > > [  377.291657] RIP: xlog_header_check_mount+0x11/0xd0 [xfs] RSP: ffffab2bc982baf8 
> > > > > [  377.326553] CR2: 0000000000000000 
> > > > > [  377.342022] ---[ end trace 85d9cc5b8e738db6 ]--- 
> > > > > 
> > > > > 
> > > > > 
> > > > > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-10-13 19:53       ` Brian Foster
  2017-10-14 13:30         ` Zorro Lang
@ 2017-10-14 22:34         ` Dave Chinner
  2017-10-16 10:09           ` Brian Foster
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2017-10-14 22:34 UTC (permalink / raw)
  To: Brian Foster; +Cc: Zorro Lang, linux-xfs, darrick.wong

On Fri, Oct 13, 2017 at 03:53:35PM -0400, Brian Foster wrote:
> On Fri, Oct 13, 2017 at 02:16:05PM -0400, Brian Foster wrote:
> > On Fri, Oct 13, 2017 at 09:29:35PM +0800, Zorro Lang wrote:
> > > On Mon, Oct 02, 2017 at 09:56:18AM -0400, Brian Foster wrote:
> > > > On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> > > > > Hi,
> > > > > 
> > > > > I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> > > > > (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> > > > > twice. But I can't reproduce it on another machine.
> > > > > 
> > > > > Maybe there're some hardware specific requirement to trigger this panic. I
> > > > > tested on normal disk partition, but the disk is multi stripes RAID device.
> > > > > I didn't get the mkfs output of g/085, bug I found the default mkfs output
> > > > > (mkfs.xfs -f /dev/sda3) is:
> > > > > 
> > > > > meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
> > > > >          =                       sectsz=512   attr=2, projid32bit=1
> > > > >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > > > > data     =                       bsize=1024   blocks=15720448, imaxpct=25
> > > > >          =                       sunit=512    swidth=1024 blks
> > > > > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > > > > log      =internal log           bsize=1024   blocks=10240, version=2
> > > > >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > > > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > > > 
> > > > > (The test machine is not on my hand now, I need time reserve it.)
> > > > > 
> > > > 
> > > > If you are able to reproduce, could you provide a metadump of this fs
> > > > immediately after the crash?
> > > 
> > > Finally I got the machine which can reproduce this bug for 1 day. Then I
> > > got the XFS metadump which can trigger this bug.
> > > 
> > > Please download the metadump file by opening below link:
> > > https://drive.google.com/file/d/0B5dFDeCXGOPXalNuMUJNdDM3STQ/view?usp=sharing
> > > 
> > > Just mount this xfs image, then kernel will crash. I didn't do any operations
> > > on this XFS, just did "mkfs.xfs -b size=1024".
> > > 
> > 
> > Thanks Zorro. I can reproduce with this image. It looks like the root
> > problem is that a block address calculation goes wrong in
> > xlog_find_head():
> > 
> > 	start_blk = log_bbnum - (num_scan_bblks - head_blk);
> > 
> > With log_bbnum = 3264, num_scan_bblks = 4096 and head_blk = 512,
> > start_blk underflows and we go off the rails from there. Aside from
> > addressing the crash, I think either this value and/or num_scan_bblks
> > need to be clamped to within the range of the log.
> > 
> 
> Actually Zorro, how are you creating a filesystem with such a small log?
> I can't seem to create anything with a log smaller than 2MB. FWIW,
> xfs_info shows the following once I work around the crash and mount the
> fs:
> 
> meta-data=/dev/mapper/test-scratch isize=512    agcount=8, agsize=32256 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1 spinodes=0 rmapbt=0
>          =                       reflink=0
> data     =                       bsize=1024   blocks=258048, imaxpct=25
>          =                       sunit=512    swidth=1024 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal               bsize=1024   blocks=1632, version=2
>          =                       sectsz=512   sunit=32 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0

This is one of the issues I came across with my mkfs refactoring.

The problem is the block size is 1k, not 4k, and there's a check
somewhere against the number of log blocks rather than bytes, and
so you can get a log smaller than the 2MB window that log recovery
expects from 8x256k log buffers....

i.e. somewhere in mkfs we need to clamp the minimum log size to
something greater than 2MB. I didn't get to the bottom of it - I
fixed the option parsing bug that caused it and the log went to
something like 4.5MB instead of 1.6MB....
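
As a rough sketch of the sort of clamp I mean (the variable names here
are illustrative, not the actual mkfs code):

	/*
	 * Sketch only: whatever log size gets calculated, never let it
	 * drop below the 2MB (8 x 256k) window that log recovery assumes,
	 * independent of the filesystem block size.
	 */
	uint64_t	min_logbytes = 8 * 256 * 1024;		/* 2MB */
	uint64_t	min_logblocks = min_logbytes >> blocklog;

	if (min_logblocks < XFS_MIN_LOG_BLOCKS)
		min_logblocks = XFS_MIN_LOG_BLOCKS;
	if (logblocks < min_logblocks)
		logblocks = min_logblocks;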

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-10-14 22:34         ` Dave Chinner
@ 2017-10-16 10:09           ` Brian Foster
  2017-10-16 19:11             ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2017-10-16 10:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Zorro Lang, linux-xfs, darrick.wong

On Sun, Oct 15, 2017 at 09:34:47AM +1100, Dave Chinner wrote:
> On Fri, Oct 13, 2017 at 03:53:35PM -0400, Brian Foster wrote:
> > On Fri, Oct 13, 2017 at 02:16:05PM -0400, Brian Foster wrote:
> > > On Fri, Oct 13, 2017 at 09:29:35PM +0800, Zorro Lang wrote:
> > > > On Mon, Oct 02, 2017 at 09:56:18AM -0400, Brian Foster wrote:
> > > > > On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> > > > > > (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> > > > > > twice. But I can't reproduce it on another machine.
> > > > > > 
> > > > > > Maybe there're some hardware specific requirement to trigger this panic. I
> > > > > > tested on normal disk partition, but the disk is multi stripes RAID device.
> > > > > > I didn't get the mkfs output of g/085, bug I found the default mkfs output
> > > > > > (mkfs.xfs -f /dev/sda3) is:
> > > > > > 
> > > > > > meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
> > > > > >          =                       sectsz=512   attr=2, projid32bit=1
> > > > > >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > > > > > data     =                       bsize=1024   blocks=15720448, imaxpct=25
> > > > > >          =                       sunit=512    swidth=1024 blks
> > > > > > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > > > > > log      =internal log           bsize=1024   blocks=10240, version=2
> > > > > >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > > > > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > > > > 
> > > > > > (The test machine is not on my hand now, I need time reserve it.)
> > > > > > 
> > > > > 
> > > > > If you are able to reproduce, could you provide a metadump of this fs
> > > > > immediately after the crash?
> > > > 
> > > > Finally I got the machine which can reproduce this bug for 1 day. Then I
> > > > got the XFS metadump which can trigger this bug.
> > > > 
> > > > Please download the metadump file by opening below link:
> > > > https://drive.google.com/file/d/0B5dFDeCXGOPXalNuMUJNdDM3STQ/view?usp=sharing
> > > > 
> > > > Just mount this xfs image, then kernel will crash. I didn't do any operations
> > > > on this XFS, just did "mkfs.xfs -b size=1024".
> > > > 
> > > 
> > > Thanks Zorro. I can reproduce with this image. It looks like the root
> > > problem is that a block address calculation goes wrong in
> > > xlog_find_head():
> > > 
> > > 	start_blk = log_bbnum - (num_scan_bblks - head_blk);
> > > 
> > > With log_bbnum = 3264, num_scan_bblks = 4096 and head_blk = 512,
> > > start_blk underflows and we go off the rails from there. Aside from
> > > addressing the crash, I think either this value and/or num_scan_bblks
> > > need to be clamped to within the range of the log.
> > > 
> > 
> > Actually Zorro, how are you creating a filesystem with such a small log?
> > I can't seem to create anything with a log smaller than 2MB. FWIW,
> > xfs_info shows the following once I work around the crash and mount the
> > fs:
> > 
> > meta-data=/dev/mapper/test-scratch isize=512    agcount=8, agsize=32256 blks
> >          =                       sectsz=512   attr=2, projid32bit=1
> >          =                       crc=1        finobt=1 spinodes=0 rmapbt=0
> >          =                       reflink=0
> > data     =                       bsize=1024   blocks=258048, imaxpct=25
> >          =                       sunit=512    swidth=1024 blks
> > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > log      =internal               bsize=1024   blocks=1632, version=2
> >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> This is one of the issues I came across with my mkfs refactoring.
> 
> The problem is the block size is 1k, not 4k, and there's a check
> somewhere against the number of log blocks rather than bytes, and
> so you can get a log smaller than the 2MB window that log recovery
> expects from 8x256k log buffers....
> 
> i.e. somewhere in mkfs we need to clamp the minimum log size to
> something greater than 2MB. I didn't get to the bottom of it - I
> fixed the option parsing bug that caused it and the log went to
> something like 4.5MB instead of 1.6MB....
> 

Ok, so it sounds like that is the root cause and is fixed by the mkfs
rework. I have a couple patches laying around that fix up the
calculation and add some error checks to prevent the kernel crash, but
this has me wondering whether we should fail to mount the fs due to the
geometry. Thoughts?
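
For reference, plugging in this image's numbers: start_blk = 3264 -
(4096 - 512) = -320, i.e. the backwards scan starts before block zero
of the log and everything read from there is garbage. The clamp I have
in mind is roughly the following (sketch only, not the actual patch):

	/*
	 * Sketch only: never let the backwards scan window exceed the
	 * size of the log, so the wrap-around start block cannot go
	 * negative. Worst case we end up scanning the entire log.
	 */
	if (num_scan_bblks > log_bbnum)
		num_scan_bblks = log_bbnum;
	...
	start_blk = log_bbnum - (num_scan_bblks - head_blk);
	ASSERT(start_blk >= 0);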

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-10-16 10:09           ` Brian Foster
@ 2017-10-16 19:11             ` Darrick J. Wong
  2017-10-16 22:26               ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2017-10-16 19:11 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, Zorro Lang, linux-xfs

On Mon, Oct 16, 2017 at 06:09:04AM -0400, Brian Foster wrote:
> On Sun, Oct 15, 2017 at 09:34:47AM +1100, Dave Chinner wrote:
> > On Fri, Oct 13, 2017 at 03:53:35PM -0400, Brian Foster wrote:
> > > On Fri, Oct 13, 2017 at 02:16:05PM -0400, Brian Foster wrote:
> > > > On Fri, Oct 13, 2017 at 09:29:35PM +0800, Zorro Lang wrote:
> > > > > On Mon, Oct 02, 2017 at 09:56:18AM -0400, Brian Foster wrote:
> > > > > > On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> > > > > > > (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> > > > > > > twice. But I can't reproduce it on another machine.
> > > > > > > 
> > > > > > > Maybe there're some hardware specific requirement to trigger this panic. I
> > > > > > > tested on normal disk partition, but the disk is multi stripes RAID device.
> > > > > > > I didn't get the mkfs output of g/085, bug I found the default mkfs output
> > > > > > > (mkfs.xfs -f /dev/sda3) is:
> > > > > > > 
> > > > > > > meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
> > > > > > >          =                       sectsz=512   attr=2, projid32bit=1
> > > > > > >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > > > > > > data     =                       bsize=1024   blocks=15720448, imaxpct=25
> > > > > > >          =                       sunit=512    swidth=1024 blks
> > > > > > > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > > > > > > log      =internal log           bsize=1024   blocks=10240, version=2
> > > > > > >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > > > > > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > > > > > 
> > > > > > > (The test machine is not on my hand now, I need time reserve it.)
> > > > > > > 
> > > > > > 
> > > > > > If you are able to reproduce, could you provide a metadump of this fs
> > > > > > immediately after the crash?
> > > > > 
> > > > > Finally I got the machine which can reproduce this bug for 1 day. Then I
> > > > > got the XFS metadump which can trigger this bug.
> > > > > 
> > > > > Please download the metadump file by opening below link:
> > > > > https://drive.google.com/file/d/0B5dFDeCXGOPXalNuMUJNdDM3STQ/view?usp=sharing
> > > > > 
> > > > > Just mount this xfs image, then kernel will crash. I didn't do any operations
> > > > > on this XFS, just did "mkfs.xfs -b size=1024".
> > > > > 
> > > > 
> > > > Thanks Zorro. I can reproduce with this image. It looks like the root
> > > > problem is that a block address calculation goes wrong in
> > > > xlog_find_head():
> > > > 
> > > > 	start_blk = log_bbnum - (num_scan_bblks - head_blk);
> > > > 
> > > > With log_bbnum = 3264, num_scan_bblks = 4096 and head_blk = 512,
> > > > start_blk underflows and we go off the rails from there. Aside from
> > > > addressing the crash, I think either this value and/or num_scan_bblks
> > > > need to be clamped to within the range of the log.
> > > > 
> > > 
> > > Actually Zorro, how are you creating a filesystem with such a small log?
> > > I can't seem to create anything with a log smaller than 2MB. FWIW,
> > > xfs_info shows the following once I work around the crash and mount the
> > > fs:
> > > 
> > > meta-data=/dev/mapper/test-scratch isize=512    agcount=8, agsize=32256 blks
> > >          =                       sectsz=512   attr=2, projid32bit=1
> > >          =                       crc=1        finobt=1 spinodes=0 rmapbt=0
> > >          =                       reflink=0
> > > data     =                       bsize=1024   blocks=258048, imaxpct=25
> > >          =                       sunit=512    swidth=1024 blks
> > > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > > log      =internal               bsize=1024   blocks=1632, version=2
> > >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > 
> > This is one of the issues I came across with my mkfs refactoring.
> > 
> > The problem is the block size is 1k, not 4k, and there's a check
> > somewhere against the number of log blocks rather than bytes, and
> > so you can get a log smaller than the 2MB window that log recovery
> > expects from 8x256k log buffers....
> > 
> > i.e. somewhere in mkfs we need to clamp the minimum log size to
> > something greater than 2MB. I didn't get to the bottom of it - I
> > fixed the option parsing bug that caused it and the log went to
> > something like 4.5MB instead of 1.6MB....
> > 
> 
> Ok, so it sounds like that is the root cause and is fixed by the mkfs
> rework. I have a couple patches laying around that fix up the
> calculation and add some error checks to prevent the kernel crash, but
> this has me wondering whether we should fail to mount the fs due to the
> geometry. Thoughts?

Failing the mount sounds ok to me, but do we have any other options to
deal with undersized logs?  Is there a scenario where we /could/ mount
an undersized log and not blow up, either immediately or later on?

--D

> 
> Brian
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-10-16 19:11             ` Darrick J. Wong
@ 2017-10-16 22:26               ` Dave Chinner
  2017-10-18 12:05                 ` Brian Foster
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2017-10-16 22:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, Zorro Lang, linux-xfs

On Mon, Oct 16, 2017 at 12:11:47PM -0700, Darrick J. Wong wrote:
> On Mon, Oct 16, 2017 at 06:09:04AM -0400, Brian Foster wrote:
> > On Sun, Oct 15, 2017 at 09:34:47AM +1100, Dave Chinner wrote:
> > > On Fri, Oct 13, 2017 at 03:53:35PM -0400, Brian Foster wrote:
> > > > On Fri, Oct 13, 2017 at 02:16:05PM -0400, Brian Foster wrote:
> > > > > On Fri, Oct 13, 2017 at 09:29:35PM +0800, Zorro Lang wrote:
> > > > > > On Mon, Oct 02, 2017 at 09:56:18AM -0400, Brian Foster wrote:
> > > > > > > On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> > > > > > > > (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> > > > > > > > twice. But I can't reproduce it on another machine.
> > > > > > > > 
> > > > > > > > Maybe there're some hardware specific requirement to trigger this panic. I
> > > > > > > > tested on normal disk partition, but the disk is multi stripes RAID device.
> > > > > > > > I didn't get the mkfs output of g/085, bug I found the default mkfs output
> > > > > > > > (mkfs.xfs -f /dev/sda3) is:
> > > > > > > > 
> > > > > > > > meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
> > > > > > > >          =                       sectsz=512   attr=2, projid32bit=1
> > > > > > > >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > > > > > > > data     =                       bsize=1024   blocks=15720448, imaxpct=25
> > > > > > > >          =                       sunit=512    swidth=1024 blks
> > > > > > > > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > > > > > > > log      =internal log           bsize=1024   blocks=10240, version=2
> > > > > > > >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > > > > > > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > > > > > > 
> > > > > > > > (The test machine is not on my hand now, I need time reserve it.)
> > > > > > > > 
> > > > > > > 
> > > > > > > If you are able to reproduce, could you provide a metadump of this fs
> > > > > > > immediately after the crash?
> > > > > > 
> > > > > > Finally I got the machine which can reproduce this bug for 1 day. Then I
> > > > > > got the XFS metadump which can trigger this bug.
> > > > > > 
> > > > > > Please download the metadump file by opening below link:
> > > > > > https://drive.google.com/file/d/0B5dFDeCXGOPXalNuMUJNdDM3STQ/view?usp=sharing
> > > > > > 
> > > > > > Just mount this xfs image, then kernel will crash. I didn't do any operations
> > > > > > on this XFS, just did "mkfs.xfs -b size=1024".
> > > > > > 
> > > > > 
> > > > > Thanks Zorro. I can reproduce with this image. It looks like the root
> > > > > problem is that a block address calculation goes wrong in
> > > > > xlog_find_head():
> > > > > 
> > > > > 	start_blk = log_bbnum - (num_scan_bblks - head_blk);
> > > > > 
> > > > > With log_bbnum = 3264, num_scan_bblks = 4096 and head_blk = 512,
> > > > > start_blk underflows and we go off the rails from there. Aside from
> > > > > addressing the crash, I think either this value and/or num_scan_bblks
> > > > > need to be clamped to within the range of the log.
> > > > > 
> > > > 
> > > > Actually Zorro, how are you creating a filesystem with such a small log?
> > > > I can't seem to create anything with a log smaller than 2MB. FWIW,
> > > > xfs_info shows the following once I work around the crash and mount the
> > > > fs:
> > > > 
> > > > meta-data=/dev/mapper/test-scratch isize=512    agcount=8, agsize=32256 blks
> > > >          =                       sectsz=512   attr=2, projid32bit=1
> > > >          =                       crc=1        finobt=1 spinodes=0 rmapbt=0
> > > >          =                       reflink=0
> > > > data     =                       bsize=1024   blocks=258048, imaxpct=25
> > > >          =                       sunit=512    swidth=1024 blks
> > > > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > > > log      =internal               bsize=1024   blocks=1632, version=2
> > > >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > 
> > > This is one of the issues I came across with my mkfs refactoring.
> > > 
> > > The problem is the block size is 1k, not 4k, and there's a check
> > > somewhere against the number of log blocks rather than bytes, and
> > > so you can get a log smaller than the 2MB window that log recovery
> > > expects from 8x256k log buffers....
> > > 
> > > i.e. somewhere in mkfs we need to clamp the minimum log size to
> > > something greater than 2MB. I didn't get to the bottom of it - I
> > > fixed the option parsing bug that caused it and the log went to
> > > something like 4.5MB instead of 1.6MB....
> > 
> > Ok, so it sounds like that is the root cause and is fixed by the mkfs
> > rework.

For the cases I came across. I haven't solved the root problem in
mkfs yet, so this could still occur. FYI, there were some interesting
and unexpected interactions with log stripe units that triggered it,
and I note the above filesystem has a log stripe unit set. So it's
likely I haven't solved all the issues yet.

The problem most likely is based in confusion around these
definitions for log size (in xfs_fs.h):

/*
 * Minimum and maximum sizes need for growth checks.
 *
 * Block counts are in units of filesystem blocks, not basic blocks.
 */
#define XFS_MIN_AG_BLOCKS       64
#define XFS_MIN_LOG_BLOCKS      512ULL
#define XFS_MAX_LOG_BLOCKS      (1024 * 1024ULL)
#define XFS_MIN_LOG_BYTES       (10 * 1024 * 1024ULL)

The log size limits need to be rationalised and moved to
xfs_log_format.h, and mkfs needs to be made to enforce them
consistently regardless of block size.
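
To spell out the arithmetic for this particular image:

	XFS_MIN_LOG_BLOCKS   = 512 fs blocks
	                     = 2MB at 4k blocks, but only 512KB at 1k blocks
	log in the metadump  = 1632 x 1k  = ~1.6MB = 3264 basic blocks
	recovery scan window = 8 x 256k   =  2MB   = 4096 basic blocks

so a 1k block size filesystem can satisfy the block-count minimum and
still end up with a log smaller than the window recovery assumes.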

> > I have a couple patches laying around that fix up the
> > calculation and add some error checks to prevent the kernel crash, but
> > this has me wondering whether we should fail to mount the fs due to the
> > geometry. Thoughts?

Unfortunately, I don't think that's an option - there will be
filesystems out there that the geometry checks fail and refuse to
mount with a new kernel, despite it working without problems for
years....

> Failing the mount sounds ok to me, but do we have any other options to
> deal with undersized logs?  Is there a scenario where we /could/ mount
> an undersized log and not blow up, either immediately or later on?

In this case, I think the problem is the 2MB window we consider to
be the worst case "partially written" window at the head of the log.
i.e.  8x256k log buffers.  We assume that this is always the worst
case, because the previous mount could have been using that.

This assumption has always been used for simplicity. i.e. worst case
is assumed in place of detecting the previous mount's log buffer
size from the log records written to the log. If we read the log and
determine the maximum record size, we know exactly what the worst
case dirty region is going to be.

I think the kernel side defense needs to be some combination of these
three things:

	1. check what the log record sizes we see during the tail
	search and trim the "dirty head" window to suit.

	2. If the dirty window is larger than the log, then trim
	it to search the entire log rather than overrun into
	negative block offsets like we do now.

	3. in the case of tiny logs, we shouldn't even be allowing
	users to mount with logbsize * logbufs > log size / 2.  Then
	we can size the initial dirty window appropriately based on
	the log size....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
  2017-10-16 22:26               ` Dave Chinner
@ 2017-10-18 12:05                 ` Brian Foster
  0 siblings, 0 replies; 14+ messages in thread
From: Brian Foster @ 2017-10-18 12:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, Zorro Lang, linux-xfs

On Tue, Oct 17, 2017 at 09:26:04AM +1100, Dave Chinner wrote:
...
> 
> For the cases I came across. I haven't solved the root problem in
> mkfs yet, so this could still occur. FYI, there were some interesting
> and unexpected interactions with log stripe units that triggered it,
> and I note the above filesystem has a log stripe unit set. So it's
> likely I haven't solved all the issues yet.
> 

Ok.

> The problem most likely is based in confusion around these
> definitions for log size (in xfs_fs.h):
> 
> /*
>  * Minimum and maximum sizes need for growth checks.
>  *
>  * Block counts are in units of filesystem blocks, not basic blocks.
>  */
> #define XFS_MIN_AG_BLOCKS       64
> #define XFS_MIN_LOG_BLOCKS      512ULL
> #define XFS_MAX_LOG_BLOCKS      (1024 * 1024ULL)
> #define XFS_MIN_LOG_BYTES       (10 * 1024 * 1024ULL)
> 
> The log size limits need to be rationalised and moved to
> xfs_log_format.h, and mkfs needs to be made to enforce them
> consistently regardless of block size.
> 
> > > I have a couple patches laying around that fix up the
> > > calculation and add some error checks to prevent the kernel crash, but
> > > this has me wondering whether we should fail to mount the fs due to the
> > > geometry. Thoughts?
> 
> Unfortunately, I don't think that's an option - there will be
> filesystems out there that the geometry checks fail and refuse to
> mount with a new kernel, despite it working without problems for
> years....
> 

That's kind of what I was afraid of. :P

> > Failing the mount sounds ok to me, but do we have any other options to
> > deal with undersized logs?  Is there a scenario where we /could/ mount
> > an undersized log and not blow up, either immediately or later on?
> 
> In this case, I think the problem is the 2MB window we consider to
> be the worst case "partially written" window at the head of the log.
> i.e.  8x256k log buffers.  We assume that this is always the worst
> case, because the previous mount could have been using that.
> 
> This assumption has always been used for simplicity. i.e. worst case
> is assumed in place of detecting the previous mount's log buffer
> size from the log records written to the log. If we read the log and
> determine the maximum record size, we know exactly what the worst
> case dirty region is going to be.
> 

So from doing a quick, high-level rundown of the log recovery
algorithm...

- We read the blocks at the start and end of the physical log to
  determine the current cycle. We then essentially bisect the log to try
  and locate the last block in the current cycle (or more specifically,
  the first block in the previous cycle that has not yet been
  overwritten). IOW, this is where the prospective log head appears to
  be based on an initial cycle value bisection/estimation.
- From there, we scan backwards XLOG_TOTAL_REC_SHIFT() blocks (which is
  the 2MB window referenced above) looking for earlier instances of the
  "previous" cycle. I.e., this is essentially a finer tuned cycle-based
  search for the appropriate head, using the maximum possible partial or
  out of order write window based on the max iclog count/size. (This is
  where we explode.)
- Next, we potentially fine tune the head again based on the last valid
  log record header in the log. This uses XLOG_REC_SHIFT(), which is the
  max size of a single record (256k), so I don't think it should pose a
  risk.

It isn't until this step that we actually have a reference to
potentially valid log record content.

- We return the head block and xlog_find_tail() seeks backwards for the
  last valid log record header, using it to look up the tail block.
- We check for an unmount record to determine whether the log is clean.
- If not, we move on to the higher level head/tail validation that has
  been added more recently (torn write detection, tail overwrite
  detection).
  
At this point, it looks like scanning associated with the higher-level
verification is mostly based on max iclog count (i.e., scan windows are
based on a certain number of record headers within the head/tail) rather
than a fixed size window such as the 2MB window used during the cycle
search.

> I think the kernel side defense needs to be some combination of these
> three things:
> 
> 	1. check what the log record sizes we see during the tail
> 	search and trim the "dirty head" window to suit.
> 

The reason I wrote up the above is because I'm not quite following what
you mean here. The first few steps noted above that use the 2MB window
don't actually have log record data to reference, because this is the
early part of the process of locating a valid record.

Am I missing something or just misreading..? Could you elaborate on
where in the above sequence we could tighten up a fixed 2MB window..?

> 	2. If the dirty window is larger than the log, then trim
> 	it to search the entire log rather than overrun into
> 	negative block offsets like we do now.
> 

Otherwise, this is basically the fix I have so far and I think it covers
the immediate underflow problem.

> 	3. in the case of tiny logs, we shouldn't even be allowing
> 	users to mount with logbsize * logbufs > log size / 2.  Then
> 	we can size the initial dirty window appropriately based on
> 	the log size....
> 

Ok, I'll look into this. Should this be a generic restriction or only
associated with logs under a particular size threshold? For example, I
can currently create a small, 512b blocksize fs with a 4800 block
(~2.3MB) log and mount it with '-o logbufs=8,logbsize=256k' without a
problem.
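
To put numbers on that example: 8 x 256k of log buffers is 2MB, but half
of that ~2.3MB log is only ~1.17MB, so a (log size / 2) rule would start
rejecting a mount that works fine today. If we do add a check, I'm
assuming it would look something like this at mount time (illustrative
sketch only, not a patch):

	/*
	 * Sketch only: refuse logbufs/logbsize combinations that could
	 * dirty more than half of a small log, so the recovery-time
	 * "dirty head" window can always be bounded by the log size.
	 */
	if ((uint64_t)mp->m_logbufs * mp->m_logbsize >
	    XFS_FSB_TO_B(mp, mp->m_sb.sb_logblocks) / 2) {
		xfs_warn(mp, "log too small for logbufs/logbsize");
		return -EINVAL;
	}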

Either way, what is the (log size / 2) metric based on? Thanks.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-10-18 12:05 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-30  3:28 [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2 Zorro Lang
2017-10-01 22:58 ` Dave Chinner
2017-10-02 22:15   ` Darrick J. Wong
2017-10-02 13:56 ` Brian Foster
2017-10-03  2:27   ` Zorro Lang
2017-10-13 13:29   ` Zorro Lang
2017-10-13 18:16     ` Brian Foster
2017-10-13 19:53       ` Brian Foster
2017-10-14 13:30         ` Zorro Lang
2017-10-14 22:34         ` Dave Chinner
2017-10-16 10:09           ` Brian Foster
2017-10-16 19:11             ` Darrick J. Wong
2017-10-16 22:26               ` Dave Chinner
2017-10-18 12:05                 ` Brian Foster
