Linux-XFS Archive on lore.kernel.org
 help / color / Atom feed
* WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
@ 2019-11-08  6:01 Arkadiusz Miśkiewicz
  2019-11-08  6:56 ` Christoph Hellwig
  2019-11-09 22:38 ` Dave Chinner
  0 siblings, 2 replies; 8+ messages in thread
From: Arkadiusz Miśkiewicz @ 2019-11-08  6:01 UTC (permalink / raw)
  To: linux-xfs


Hello.

I have two servers:

backup4 - one with Adaptec ASR8885Q (that's the one which breaks so
often but this time adaptes works)

backup3 - other with software raid only

Both are now running 5.3.8 kernels and both end up like this log below.
It takes ~ up to day to reproduce.

Here are more complete logs and kernel configs:

https://ixion.pld-linux.org/~arekm/p2/xfs/

Any ideas what is happening and what can I do to help debug the problem?

> Nov  8 00:55:19 backup4 kernel: WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
> Nov  8 00:55:19 backup4 kernel: Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_sfq nfnetlink_log nfnetlink xt_NFLOG xt_comment xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables bpfilter xfs mlx4_ib ib_uverbs ib_core mlx4_en ses enclosure scsi_transport_sas joydev input_leds hid_generic usbhid hid coretemp intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper xhci_pci mlx4_core intel_cstate xhci_hcd intel_uncore ehci_pci ehci_hcd igb intel_rapl_perf pcspkr ipmi_si usbcore i2c_i801 mei_me ioatdma acpi_power_meter aacraid i2c_algo_bit ipmi_devintf mei dca lpc_ich i2c_core evdev ipmi_msghandler wmi hwmon acpi_pad button sch_fq_codel ext4 libcrc32c crc32c_generic crc32c_intel crc16 mbcache jbd2 sd_mod raid1 md_mod ahci libahci libata
> Nov  8 00:55:19 backup4 kernel:  scsi_mod
> Nov  8 00:55:19 backup4 kernel: CPU: 5 PID: 25802 Comm: kworker/u65:3 Tainted: G                T 5.3.8-1 #1
> Nov  8 00:55:19 backup4 kernel: Hardware name: Supermicro X10DRi/X10DRi, BIOS 3.0a 02/06/2018
> Nov  8 00:55:19 backup4 kernel: Workqueue: writeback wb_workfn (flush-8:48)
> Nov  8 00:55:19 backup4 kernel: RIP: 0010:xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
> Nov  8 00:55:19 backup4 kernel: Code: b7 c0 83 c0 01 e9 9c fd ff ff 41 80 bc 24 e9 00 00 00 03 49 8d 44 24 48 74 d9 c7 84 24 c0 00 00 00 01 00 00 00 e9 94 fd ff ff <0f> 0b 41 be e4 ff ff ff 48 8d 7c 24 30 44 89 f2 44 89 ee e8 84 6f
> Nov  8 00:55:19 backup4 kernel: RSP: 0018:ffffac6346a5b838 EFLAGS: 00010246
> Nov  8 00:55:19 backup4 kernel: RAX: 0000000000000000 RBX: ffff95c8f972a000 RCX: 0000000000000022
> Nov  8 00:55:19 backup4 kernel: RDX: 0000000000001fda RSI: ffffffffffffffff RDI: ffff95c8f93b0600
> Nov  8 00:55:19 backup4 kernel: RBP: ffffac6346a5b938 R08: 0000000000000000 R09: 0000000000000001
> Nov  8 00:55:19 backup4 kernel: R10: ffff95c33e6f41d8 R11: 0000000000000026 R12: ffff95be37512a80
> Nov  8 00:55:19 backup4 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff95be37512ac8
> Nov  8 00:55:19 backup4 kernel: FS:  0000000000000000(0000) GS:ffff95c8ff940000(0000) knlGS:0000000000000000
> Nov  8 00:55:19 backup4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Nov  8 00:55:19 backup4 kernel: CR2: 00007ff938389000 CR3: 000000048720a001 CR4: 00000000003606e0
> Nov  8 00:55:19 backup4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Nov  8 00:55:19 backup4 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Nov  8 00:55:19 backup4 kernel: Call Trace:
> Nov  8 00:55:19 backup4 kernel:  xfs_map_blocks+0x18e/0x420 [xfs]
> Nov  8 00:55:19 backup4 kernel:  xfs_do_writepage+0x11c/0x440 [xfs]
> Nov  8 00:55:19 backup4 kernel:  write_cache_pages+0x185/0x430
> Nov  8 00:55:19 backup4 kernel:  ? xfs_vm_writepages+0x90/0x90 [xfs]
> Nov  8 00:55:19 backup4 kernel:  xfs_vm_writepages+0x5e/0x90 [xfs]
> Nov  8 00:55:19 backup4 kernel:  do_writepages+0x1c/0x60
> Nov  8 00:55:19 backup4 kernel:  __writeback_single_inode+0x41/0x360
> Nov  8 00:55:19 backup4 kernel:  writeback_sb_inodes+0x20c/0x490
> Nov  8 00:55:19 backup4 kernel:  wb_writeback+0x12a/0x320
> Nov  8 00:55:19 backup4 kernel:  wb_workfn+0xdd/0x4a0
> Nov  8 00:55:19 backup4 kernel:  process_one_work+0x1d5/0x370
> Nov  8 00:55:19 backup4 kernel:  worker_thread+0x4d/0x3d0
> Nov  8 00:55:19 backup4 kernel:  kthread+0xfb/0x140
> Nov  8 00:55:19 backup4 kernel:  ? process_one_work+0x370/0x370
> Nov  8 00:55:19 backup4 kernel:  ? kthread_park+0x80/0x80
> Nov  8 00:55:19 backup4 kernel:  ret_from_fork+0x35/0x40
> Nov  8 00:55:19 backup4 kernel: ---[ end trace a08de0c8c2851936 ]---
> Nov  8 00:55:19 backup4 kernel: XFS (sdd1): page discard on page 00000000e56e0621, inode 0x7a17f4206, offset 10354688.
> Nov  8 00:55:19 backup4 kernel: XFS (sdd1): page discard on page 000000004edc1753, inode 0x7a17f4206, offset 10358784.
> Nov  8 00:55:19 backup4 kernel: XFS (sdd1): page discard on page 0000000092e61276, inode 0x7a17f4206, offset 10362880.
> [...]
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 0000000088af1abd, inode 0x48c3f52e0, offset 692224.
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 0000000068aea27c, inode 0x48c3f52e0, offset 696320.
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): Internal error xfs_trans_cancel at line 1048 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x568/0x5d0 [xfs]
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 00000000a3687c66, inode 0x48c3f52e0, offset 700416.
> Nov  8 00:55:24 backup4 kernel: CPU: 3 PID: 17761 Comm: cp Tainted: G        W       T 5.3.8-1 #1
> Nov  8 00:55:24 backup4 kernel: Hardware name: Supermicro X10DRi/X10DRi, BIOS 3.0a 02/06/2018
> Nov  8 00:55:24 backup4 kernel: Call Trace:
> Nov  8 00:55:24 backup4 kernel:  dump_stack+0x5c/0x78
> Nov  8 00:55:24 backup4 kernel:  xfs_trans_cancel+0x12e/0x150 [xfs]
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 0000000005fdfc7a, inode 0x48c3f52e0, offset 704512.
> Nov  8 00:55:24 backup4 kernel:  xfs_create+0x568/0x5d0 [xfs]
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 00000000d6f25163, inode 0x48c3f52e0, offset 708608.
> Nov  8 00:55:24 backup4 kernel:  xfs_generic_create+0x262/0x300 [xfs]
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 0000000007856c43, inode 0x48c3f52e0, offset 712704.
> Nov  8 00:55:24 backup4 kernel:  vfs_mkdir+0x115/0x1b0
> Nov  8 00:55:24 backup4 kernel:  do_mkdirat+0x102/0x130
> Nov  8 00:55:24 backup4 kernel:  do_syscall_64+0x5b/0x120
> Nov  8 00:55:24 backup4 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Nov  8 00:55:24 backup4 kernel: RIP: 0033:0x7f3821a6328b
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 0000000011df7484, inode 0x48c3f52e0, offset 716800.
> Nov  8 00:55:24 backup4 kernel: Code: 48 8b 05 f8 1b 0d 00 c7 04 02 5f 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c5 1b 0d 00 f7 d8 64 89 01 48
> Nov  8 00:55:24 backup4 kernel: RSP: 002b:00007ffefbc114e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000053
> Nov  8 00:55:24 backup4 kernel: RAX: ffffffffffffffda RBX: 00007ffefbc11a10 RCX: 00007f3821a6328b
> Nov  8 00:55:24 backup4 kernel: RDX: 000000119df48827 RSI: 00000000000005c0 RDI: 00000000023d5cd0
> Nov  8 00:55:24 backup4 kernel: RBP: 00007ffefbc118e0 R08: 00007ffefbc11a10 R09: 00007ffefbc11a10
> Nov  8 00:55:24 backup4 kernel: R10: 00007ffefbc11660 R11: 0000000000000206 R12: 00000000000005c0
> Nov  8 00:55:24 backup4 kernel: R13: 0000000000000000 R14: 0000000000004000 R15: 0000000000000000
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 00000000494ee2cb, inode 0x48c3f52e0, offset 720896.
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 000000009616eed4, inode 0x48c3f52e0, offset 724992.
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 00000000ce3b6c81, inode 0x48c3f52e0, offset 729088.
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 000000002768ed9c, inode 0x48c3f52e0, offset 733184.
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 00000000a136fe98, inode 0x48c3f52e0, offset 737280.
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 00000000507c1fd1, inode 0x48c3f52e0, offset 741376.
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 000000001ddec55a, inode 0x48c3f52e0, offset 745472.
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 0000000031cb158d, inode 0x48c3f52e0, offset 749568.
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 000000004bf2c3b1, inode 0x48c3f52e0, offset 753664.
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): page discard on page 000000005ad5b5e3, inode 0x48c3f52e0, offset 757760.
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): xfs_do_force_shutdown(0x8) called from line 1049 of file fs/xfs/xfs_trans.c. Return address = 00000000d9b29010
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): Corruption of in-memory data detected.  Shutting down filesystem
> Nov  8 00:55:24 backup4 kernel: XFS (sdd1): Please unmount the filesystem and rectify the problem(s)




-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
  2019-11-08  6:01 WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs] Arkadiusz Miśkiewicz
@ 2019-11-08  6:56 ` Christoph Hellwig
  2019-11-08  7:36   ` Arkadiusz Miśkiewicz
  2019-11-09 22:38 ` Dave Chinner
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-11-08  6:56 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: linux-xfs

On Fri, Nov 08, 2019 at 07:01:15AM +0100, Arkadiusz Miśkiewicz wrote:
> 
> Hello.
> 
> I have two servers:
> 
> backup4 - one with Adaptec ASR8885Q (that's the one which breaks so
> often but this time adaptes works)
> 
> backup3 - other with software raid only
> 
> Both are now running 5.3.8 kernels and both end up like this log below.
> It takes ~ up to day to reproduce.

The WARN_ON means that conversion of delalloc blocks failed to find
free space.  Something that should not be possible due to the delalloc
reservations.  What as the last kernel where you did not see something
like this?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
  2019-11-08  6:56 ` Christoph Hellwig
@ 2019-11-08  7:36   ` Arkadiusz Miśkiewicz
  2019-11-27 15:43     ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Arkadiusz Miśkiewicz @ 2019-11-08  7:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On 08/11/2019 07:56, Christoph Hellwig wrote:
> On Fri, Nov 08, 2019 at 07:01:15AM +0100, Arkadiusz Miśkiewicz wrote:
>>
>> Hello.
>>
>> I have two servers:
>>
>> backup4 - one with Adaptec ASR8885Q (that's the one which breaks so
>> often but this time adaptes works)
>>
>> backup3 - other with software raid only
>>
>> Both are now running 5.3.8 kernels and both end up like this log below.
>> It takes ~ up to day to reproduce.
> 
> The WARN_ON means that conversion of delalloc blocks failed to find
> free space.  Something that should not be possible due to the delalloc
> reservations.  What as the last kernel where you did not see something
> like this?
> 

4.20.13 looks ok
5.1.15 bad (that warn_on triggered)

on both machines according to my old logs

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
  2019-11-08  6:01 WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs] Arkadiusz Miśkiewicz
  2019-11-08  6:56 ` Christoph Hellwig
@ 2019-11-09 22:38 ` Dave Chinner
  2019-11-12 21:22   ` Arkadiusz Miśkiewicz
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2019-11-09 22:38 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: linux-xfs

On Fri, Nov 08, 2019 at 07:01:15AM +0100, Arkadiusz Miśkiewicz wrote:
> 
> Hello.
> 
> I have two servers:
> 
> backup4 - one with Adaptec ASR8885Q (that's the one which breaks so
> often but this time adaptes works)
> 
> backup3 - other with software raid only
> 
> Both are now running 5.3.8 kernels and both end up like this log below.
> It takes ~ up to day to reproduce.
> 
> Here are more complete logs and kernel configs:
> 
> https://ixion.pld-linux.org/~arekm/p2/xfs/
> 
> Any ideas what is happening and what can I do to help debug the problem?
> 
> > Nov  8 00:55:19 backup4 kernel: WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
> > Nov  8 00:55:19 backup4 kernel: Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_sfq nfnetlink_log nfnetlink xt_NFLOG xt_comment xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables bpfilter xfs mlx4_ib ib_uverbs ib_core mlx4_en ses enclosure scsi_transport_sas joydev input_leds hid_generic usbhid hid coretemp intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper xhci_pci mlx4_core intel_cstate xhci_hcd intel_uncore ehci_pci ehci_hcd igb intel_rapl_perf pcspkr ipmi_si usbcore i2c_i801 mei_me ioatdma acpi_power_meter aacraid i2c_algo_bit ipmi_devintf mei dca lpc_ich i2c_core evdev ipmi_msghandler wmi hwmon acpi_pad button sch_fq_codel ext4 libcrc32c crc32c_generic crc32c_intel crc16 mbcache jbd2 sd_mod raid1 md_mod ahci 
 libahci libata
> > Nov  8 00:55:19 backup4 kernel:  scsi_mod
> > Nov  8 00:55:19 backup4 kernel: CPU: 5 PID: 25802 Comm: kworker/u65:3 Tainted: G                T 5.3.8-1 #1
> > Nov  8 00:55:19 backup4 kernel: Hardware name: Supermicro X10DRi/X10DRi, BIOS 3.0a 02/06/2018
> > Nov  8 00:55:19 backup4 kernel: Workqueue: writeback wb_workfn (flush-8:48)
> > Nov  8 00:55:19 backup4 kernel: RIP: 0010:xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
> > Nov  8 00:55:19 backup4 kernel: Code: b7 c0 83 c0 01 e9 9c fd ff ff 41 80 bc 24 e9 00 00 00 03 49 8d 44 24 48 74 d9 c7 84 24 c0 00 00 00 01 00 00 00 e9 94 fd ff ff <0f> 0b 41 be e4 ff ff ff 48 8d 7c 24 30 44 89 f2 44 89 ee e8 84 6f
> > Nov  8 00:55:19 backup4 kernel: RSP: 0018:ffffac6346a5b838 EFLAGS: 00010246
> > Nov  8 00:55:19 backup4 kernel: RAX: 0000000000000000 RBX: ffff95c8f972a000 RCX: 0000000000000022
> > Nov  8 00:55:19 backup4 kernel: RDX: 0000000000001fda RSI: ffffffffffffffff RDI: ffff95c8f93b0600
> > Nov  8 00:55:19 backup4 kernel: RBP: ffffac6346a5b938 R08: 0000000000000000 R09: 0000000000000001
> > Nov  8 00:55:19 backup4 kernel: R10: ffff95c33e6f41d8 R11: 0000000000000026 R12: ffff95be37512a80
> > Nov  8 00:55:19 backup4 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff95be37512ac8
> > Nov  8 00:55:19 backup4 kernel: FS:  0000000000000000(0000) GS:ffff95c8ff940000(0000) knlGS:0000000000000000
> > Nov  8 00:55:19 backup4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Nov  8 00:55:19 backup4 kernel: CR2: 00007ff938389000 CR3: 000000048720a001 CR4: 00000000003606e0
> > Nov  8 00:55:19 backup4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > Nov  8 00:55:19 backup4 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Nov  8 00:55:19 backup4 kernel: Call Trace:
> > Nov  8 00:55:19 backup4 kernel:  xfs_map_blocks+0x18e/0x420 [xfs]
> > Nov  8 00:55:19 backup4 kernel:  xfs_do_writepage+0x11c/0x440 [xfs]
> > Nov  8 00:55:19 backup4 kernel:  write_cache_pages+0x185/0x430
> > Nov  8 00:55:19 backup4 kernel:  ? xfs_vm_writepages+0x90/0x90 [xfs]
> > Nov  8 00:55:19 backup4 kernel:  xfs_vm_writepages+0x5e/0x90 [xfs]
> > Nov  8 00:55:19 backup4 kernel:  do_writepages+0x1c/0x60
> > Nov  8 00:55:19 backup4 kernel:  __writeback_single_inode+0x41/0x360
> > Nov  8 00:55:19 backup4 kernel:  writeback_sb_inodes+0x20c/0x490
> > Nov  8 00:55:19 backup4 kernel:  wb_writeback+0x12a/0x320
> > Nov  8 00:55:19 backup4 kernel:  wb_workfn+0xdd/0x4a0
> > Nov  8 00:55:19 backup4 kernel:  process_one_work+0x1d5/0x370
> > Nov  8 00:55:19 backup4 kernel:  worker_thread+0x4d/0x3d0
> > Nov  8 00:55:19 backup4 kernel:  kthread+0xfb/0x140
> > Nov  8 00:55:19 backup4 kernel:  ? process_one_work+0x370/0x370
> > Nov  8 00:55:19 backup4 kernel:  ? kthread_park+0x80/0x80
> > Nov  8 00:55:19 backup4 kernel:  ret_from_fork+0x35/0x40
> > Nov  8 00:55:19 backup4 kernel: ---[ end trace a08de0c8c2851936 ]---
> > Nov  8 00:55:19 backup4 kernel: XFS (sdd1): page discard on page 00000000e56e0621, inode 0x7a17f4206, offset 10354688.

This goes with what you reported on #xfs:

[9/11/19 08:04] <arekm> [   54.777715] Filesystem "sdd1": reserve blocks depleted! Consider increasing reserve pool size.

And:

[9/11/19 08:16] <arekm> ah, yes [   54.777720] XFS (sdd1): Per-AG reservation for AG 38 failed.  Filesystem may run out of space


which implies that we are regularly under-estimating delayed
allocation metadata reservations, or a large delalloc extent is
being split repeatedly on writeback. e.g. severe file fragmentation
due to operating at/near ENOSPC resulting in a 100MB delalloc extent
being split into individual single block allocations.

Eventually this will deplete the reserve pool and trigger ENOSPC,
which will eventually result in a dirty allocation transaction being
cancelled and shutdown occurring.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
  2019-11-09 22:38 ` Dave Chinner
@ 2019-11-12 21:22   ` Arkadiusz Miśkiewicz
  0 siblings, 0 replies; 8+ messages in thread
From: Arkadiusz Miśkiewicz @ 2019-11-12 21:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On 09/11/2019 23:38, Dave Chinner wrote:
> On Fri, Nov 08, 2019 at 07:01:15AM +0100, Arkadiusz Miśkiewicz wrote:
>>
>> Hello.
>>
>> I have two servers:
>>
>> backup4 - one with Adaptec ASR8885Q (that's the one which breaks so
>> often but this time adaptes works)
>>
>> backup3 - other with software raid only
>>
>> Both are now running 5.3.8 kernels and both end up like this log below.
>> It takes ~ up to day to reproduce.
>>
>> Here are more complete logs and kernel configs:
>>
>> https://ixion.pld-linux.org/~arekm/p2/xfs/
>>
>> Any ideas what is happening and what can I do to help debug the problem?
>>
>>> Nov  8 00:55:19 backup4 kernel: WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
>>> Nov  8 00:55:19 backup4 kernel: Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_sfq nfnetlink_log nfnetlink xt_NFLOG xt_comment xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables bpfilter xfs mlx4_ib ib_uverbs ib_core mlx4_en ses enclosure scsi_transport_sas joydev input_leds hid_generic usbhid hid coretemp intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper xhci_pci mlx4_core intel_cstate xhci_hcd intel_uncore ehci_pci ehci_hcd igb intel_rapl_perf pcspkr ipmi_si usbcore i2c_i801 mei_me ioatdma acpi_power_meter aacraid i2c_algo_bit ipmi_devintf mei dca lpc_ich i2c_core evdev ipmi_msghandler wmi hwmon acpi_pad button sch_fq_codel ext4 libcrc32c crc32c_generic crc32c_intel crc16 mbcache jbd2 sd_mod raid1 md_mod ahci libahci libata
>>> Nov  8 00:55:19 backup4 kernel:  scsi_mod
>>> Nov  8 00:55:19 backup4 kernel: CPU: 5 PID: 25802 Comm: kworker/u65:3 Tainted: G                T 5.3.8-1 #1
>>> Nov  8 00:55:19 backup4 kernel: Hardware name: Supermicro X10DRi/X10DRi, BIOS 3.0a 02/06/2018
>>> Nov  8 00:55:19 backup4 kernel: Workqueue: writeback wb_workfn (flush-8:48)
>>> Nov  8 00:55:19 backup4 kernel: RIP: 0010:xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
>>> Nov  8 00:55:19 backup4 kernel: Code: b7 c0 83 c0 01 e9 9c fd ff ff 41 80 bc 24 e9 00 00 00 03 49 8d 44 24 48 74 d9 c7 84 24 c0 00 00 00 01 00 00 00 e9 94 fd ff ff <0f> 0b 41 be e4 ff ff ff 48 8d 7c 24 30 44 89 f2 44 89 ee e8 84 6f
>>> Nov  8 00:55:19 backup4 kernel: RSP: 0018:ffffac6346a5b838 EFLAGS: 00010246
>>> Nov  8 00:55:19 backup4 kernel: RAX: 0000000000000000 RBX: ffff95c8f972a000 RCX: 0000000000000022
>>> Nov  8 00:55:19 backup4 kernel: RDX: 0000000000001fda RSI: ffffffffffffffff RDI: ffff95c8f93b0600
>>> Nov  8 00:55:19 backup4 kernel: RBP: ffffac6346a5b938 R08: 0000000000000000 R09: 0000000000000001
>>> Nov  8 00:55:19 backup4 kernel: R10: ffff95c33e6f41d8 R11: 0000000000000026 R12: ffff95be37512a80
>>> Nov  8 00:55:19 backup4 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff95be37512ac8
>>> Nov  8 00:55:19 backup4 kernel: FS:  0000000000000000(0000) GS:ffff95c8ff940000(0000) knlGS:0000000000000000
>>> Nov  8 00:55:19 backup4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Nov  8 00:55:19 backup4 kernel: CR2: 00007ff938389000 CR3: 000000048720a001 CR4: 00000000003606e0
>>> Nov  8 00:55:19 backup4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> Nov  8 00:55:19 backup4 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Nov  8 00:55:19 backup4 kernel: Call Trace:
>>> Nov  8 00:55:19 backup4 kernel:  xfs_map_blocks+0x18e/0x420 [xfs]
>>> Nov  8 00:55:19 backup4 kernel:  xfs_do_writepage+0x11c/0x440 [xfs]
>>> Nov  8 00:55:19 backup4 kernel:  write_cache_pages+0x185/0x430
>>> Nov  8 00:55:19 backup4 kernel:  ? xfs_vm_writepages+0x90/0x90 [xfs]
>>> Nov  8 00:55:19 backup4 kernel:  xfs_vm_writepages+0x5e/0x90 [xfs]
>>> Nov  8 00:55:19 backup4 kernel:  do_writepages+0x1c/0x60
>>> Nov  8 00:55:19 backup4 kernel:  __writeback_single_inode+0x41/0x360
>>> Nov  8 00:55:19 backup4 kernel:  writeback_sb_inodes+0x20c/0x490
>>> Nov  8 00:55:19 backup4 kernel:  wb_writeback+0x12a/0x320
>>> Nov  8 00:55:19 backup4 kernel:  wb_workfn+0xdd/0x4a0
>>> Nov  8 00:55:19 backup4 kernel:  process_one_work+0x1d5/0x370
>>> Nov  8 00:55:19 backup4 kernel:  worker_thread+0x4d/0x3d0
>>> Nov  8 00:55:19 backup4 kernel:  kthread+0xfb/0x140
>>> Nov  8 00:55:19 backup4 kernel:  ? process_one_work+0x370/0x370
>>> Nov  8 00:55:19 backup4 kernel:  ? kthread_park+0x80/0x80
>>> Nov  8 00:55:19 backup4 kernel:  ret_from_fork+0x35/0x40
>>> Nov  8 00:55:19 backup4 kernel: ---[ end trace a08de0c8c2851936 ]---
>>> Nov  8 00:55:19 backup4 kernel: XFS (sdd1): page discard on page 00000000e56e0621, inode 0x7a17f4206, offset 10354688.
> 
> This goes with what you reported on #xfs:
> 
> [9/11/19 08:04] <arekm> [   54.777715] Filesystem "sdd1": reserve blocks depleted! Consider increasing reserve pool size.
> 
> And:
> 
> [9/11/19 08:16] <arekm> ah, yes [   54.777720] XFS (sdd1): Per-AG reservation for AG 38 failed.  Filesystem may run out of space

A note - I got these messages only once on backup4. Didn't find such
messages in old logs (on both machines).

> which implies that we are regularly under-estimating delayed
> allocation metadata reservations, or a large delalloc extent is
> being split repeatedly on writeback. e.g. severe file fragmentation
> due to operating at/near ENOSPC resulting in a 100MB delalloc extent
> being split into individual single block allocations.
> 
> Eventually this will deplete the reserve pool and trigger ENOSPC,
> which will eventually result in a dirty allocation transaction being
> cancelled and shutdown occurring.

Did anything important in that area change between 4.20 (where I had no
problems) and 5.1 (first occurrence of assert hit on both machines) ? I
see xfs_bmapi_convert_delalloc related changes in git log back then.

Currently I'm back on 4.19.82 on both machines for 4 days (on 5.3.x 1
day was enough to end up with fs shutdown) and no assert hits.

[resend due to bad vger postmasters]
-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
  2019-11-08  7:36   ` Arkadiusz Miśkiewicz
@ 2019-11-27 15:43     ` Christoph Hellwig
  2019-11-27 16:25       ` Arkadiusz Miśkiewicz
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-11-27 15:43 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: Christoph Hellwig, linux-xfs

On Fri, Nov 08, 2019 at 08:36:31AM +0100, Arkadiusz Miśkiewicz wrote:
> > The WARN_ON means that conversion of delalloc blocks failed to find
> > free space.  Something that should not be possible due to the delalloc
> > reservations.  What as the last kernel where you did not see something
> > like this?
> > 
> 
> 4.20.13 looks ok
> 5.1.15 bad (that warn_on triggered)
> 
> on both machines according to my old logs

Sorry for dropping the ball.  We have some pretty significant changes
between those version as 5.3 has a pretty big rewrite of the high level
writeback code, and this happens in code called from that.  But it
doesn't actually change (or should change) how we use the reservations,
which leaves me a little puzzled.  If I gave you an instrumented kernel
with new trace points, could you provide traces based on that?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
  2019-11-27 15:43     ` Christoph Hellwig
@ 2019-11-27 16:25       ` Arkadiusz Miśkiewicz
  2019-11-27 16:26         ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Arkadiusz Miśkiewicz @ 2019-11-27 16:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On 27/11/2019 16:43, Christoph Hellwig wrote:
> On Fri, Nov 08, 2019 at 08:36:31AM +0100, Arkadiusz Miśkiewicz wrote:
>>> The WARN_ON means that conversion of delalloc blocks failed to find
>>> free space.  Something that should not be possible due to the delalloc
>>> reservations.  What as the last kernel where you did not see something
>>> like this?
>>>
>>
>> 4.20.13 looks ok
>> 5.1.15 bad (that warn_on triggered)
>>
>> on both machines according to my old logs
> 
> Sorry for dropping the ball.  We have some pretty significant changes
> between those version as 5.3 has a pretty big rewrite of the high level
> writeback code, and this happens in code called from that. 

Hm, 5.3 but I saw this on 5.1.15, too. See below. (or did you mean 5.1
was with big changes?)

For now I'm on 4.19.XX for over 2 weeks without this issue, on both servers.

> Oct 29 19:04:47 backup3 kernel: [1573435.718506][ T6332] WARNING: CPU: 2 PID: 6332 at fs/xfs/libxfs/xfs_bmap.c:4513 xfs_bmapi_convert_delalloc+0x44d/0x4b0 [xfs]
> Oct 29 19:04:47 backup3 kernel: [1573435.718507][ T6332] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_sfq bonding nf_log_ipv4 nf_log_common xt_LOG xt_comment xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables bpfilter ext4 crc16 mbcache jbd2 input_leds joydev hid_generic usbhid hid coretemp intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel raid456 kvm async_raid6_recov irqbypass async_memcpy igb async_pq async_xor async_tx xor cmdlinepart intel_spi_platform intel_spi spi_nor mtd iTCO_wdt iTCO_vendor_support evdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid1 aesni_intel aes_x86_64 crypto_simd cryptd raid6_pq glue_helper hwmon dca intel_cstate pcspkr i2c_algo_bit intel_uncore i2c_i801 intel_pch_thermal intel_rapl_perf lpc_ich ipmi_ssif i2c_core xhci_pci xhci_hcd ses enclosure ehci_pci ehci_hcd usbcore ipmi_si ipmi_devintf ipmi_msghandler fan thermal button video ie31200_edac pcc_cpufreq sch_fq_codel xfs libcrc32c crc32c_generic
> Oct 29 19:04:47 backup3 kernel: [1573435.718530][ T6332]  crc32c_intel sd_mod mpt3sas raid_class scsi_transport_sas raid10 md_mod ahci libahci libata scsi_mod
> Oct 29 19:04:47 backup3 kernel: [1573435.718536][ T6332] CPU: 2 PID: 6332 Comm: kworker/u8:9 Tainted: G                T 5.1.15-1 #1
> Oct 29 19:04:47 backup3 kernel: [1573435.718537][ T6332] Hardware name: Supermicro X10SL7-F/X10SL7-F, BIOS 2.00 04/24/2014
> Oct 29 19:04:47 backup3 kernel: [1573435.718540][ T6332] Workqueue: writeback wb_workfn (flush-9:5)
> Oct 29 19:04:47 backup3 kernel: [1573435.718555][ T6332] RIP: 0010:xfs_bmapi_convert_delalloc+0x44d/0x4b0 [xfs]
> Oct 29 19:04:47 backup3 kernel: [1573435.718556][ T6332] Code: 8b 43 38 48 89 44 24 10 e9 3e fc ff ff 80 bb e1 00 00 00 03 48 8d 43 48 74 ce c7 84 24 c8 00 00 00 01 00 00 00 e9 85 fd ff ff <0f> 0b 41 bf e4 ff ff ff 4c 89 e7 44 89 fa 44 89 f6 e8 2d 71 ff ff
> Oct 29 19:04:47 backup3 kernel: [1573435.718556][ T6332] RSP: 0018:ffffb8eb4384b858 EFLAGS: 00010246
> Oct 29 19:04:47 backup3 kernel: [1573435.718557][ T6332] RAX: 0000000000000000 RBX: ffff8d81fa9261c0 RCX: 0000000000000000
> Oct 29 19:04:47 backup3 kernel: [1573435.718558][ T6332] RDX: 0000000000001f8e RSI: ffffffffffffffff RDI: ffff8d83e8d66600
> Oct 29 19:04:47 backup3 kernel: [1573435.718558][ T6332] RBP: ffffb8eb4384b960 R08: 0000000000000000 R09: 000000000001bd06
> Oct 29 19:04:47 backup3 kernel: [1573435.718558][ T6332] R10: ffff8d7f5fafaaa8 R11: 0000000000000001 R12: ffffb8eb4384b890
> Oct 29 19:04:47 backup3 kernel: [1573435.718559][ T6332] R13: ffff8d83dfe8e000 R14: 0000000000000000 R15: 0000000000000000
> Oct 29 19:04:47 backup3 kernel: [1573435.718560][ T6332] FS:  0000000000000000(0000) GS:ffff8d841fd00000(0000) knlGS:0000000000000000
> Oct 29 19:04:47 backup3 kernel: [1573435.718560][ T6332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Oct 29 19:04:47 backup3 kernel: [1573435.718561][ T6332] CR2: 00000000021b3fc0 CR3: 000000061020e005 CR4: 00000000001606e0
> Oct 29 19:04:47 backup3 kernel: [1573435.718561][ T6332] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Oct 29 19:04:47 backup3 kernel: [1573435.718562][ T6332] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Oct 29 19:04:47 backup3 kernel: [1573435.718562][ T6332] Call Trace:
> Oct 29 19:04:47 backup3 kernel: [1573435.718582][ T6332]  xfs_map_blocks+0x1ae/0x450 [xfs]
> Oct 29 19:04:47 backup3 kernel: [1573435.718599][ T6332]  xfs_do_writepage+0x114/0x420 [xfs]
> Oct 29 19:04:47 backup3 kernel: [1573435.718603][ T6332]  write_cache_pages+0x1bd/0x460
> Oct 29 19:04:47 backup3 kernel: [1573435.718620][ T6332]  ? xfs_vm_writepages+0x90/0x90 [xfs]
> Oct 29 19:04:47 backup3 kernel: [1573435.718637][ T6332]  xfs_vm_writepages+0x5b/0x90 [xfs]
> Oct 29 19:04:47 backup3 kernel: [1573435.718639][ T6332]  do_writepages+0x1c/0x60
> Oct 29 19:04:47 backup3 kernel: [1573435.718641][ T6332]  __writeback_single_inode+0x41/0x340
> Oct 29 19:04:47 backup3 kernel: [1573435.718644][ T6332]  ? __wake_up_common_lock+0x88/0xc0
> Oct 29 19:04:47 backup3 kernel: [1573435.718645][ T6332]  writeback_sb_inodes+0x1dd/0x460
> Oct 29 19:04:47 backup3 kernel: [1573435.718646][ T6332]  wb_writeback+0x12a/0x310
> Oct 29 19:04:47 backup3 kernel: [1573435.718648][ T6332]  wb_workfn+0xde/0x460
> Oct 29 19:04:47 backup3 kernel: [1573435.718650][ T6332]  process_one_work+0x1fb/0x3f0
> Oct 29 19:04:47 backup3 kernel: [1573435.718651][ T6332]  worker_thread+0x246/0x3d0
> Oct 29 19:04:47 backup3 kernel: [1573435.718652][ T6332]  ? process_one_work+0x3f0/0x3f0
> Oct 29 19:04:47 backup3 kernel: [1573435.718654][ T6332]  kthread+0x113/0x130
> Oct 29 19:04:47 backup3 kernel: [1573435.718655][ T6332]  ? kthread_park+0x80/0x80
> Oct 29 19:04:47 backup3 kernel: [1573435.718658][ T6332]  ret_from_fork+0x35/0x40
> Oct 29 19:04:47 backup3 kernel: [1573435.718659][ T6332] ---[ end trace 37cbd88cae263afa ]---
> Oct 29 19:04:47 backup3 kernel: [1573435.718663][ T6332] XFS (md5): page discard on page 000000007bcb59fa, inode 0x101f3041f, offset 234819584.
> Oct 29 19:04:47 backup3 kernel: [1573435.718685][ T6332] XFS (md5): page discard on page 00000000b1b6fee1, inode 0x101f3041f, offset 234823680.
> Oct 29 19:04:47 backup3 kernel: [1573435.718703][ T6332] XFS (md5): page discard on page 00000000f7525550, inode 0x101f3041f, offset 234827776.


> But it
> doesn't actually change (or should change) how we use the reservations,
> which leaves me a little puzzled.  If I gave you an instrumented kernel
> with new trace points, could you provide traces based on that?

Probably it will be easier to just bisect and I plan to do that after
backup catches up with missing data (and it takes days).


-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs]
  2019-11-27 16:25       ` Arkadiusz Miśkiewicz
@ 2019-11-27 16:26         ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2019-11-27 16:26 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: Christoph Hellwig, linux-xfs

On Wed, Nov 27, 2019 at 05:25:20PM +0100, Arkadiusz Miśkiewicz wrote:
> Hm, 5.3 but I saw this on 5.1.15, too. See below. (or did you mean 5.1
> was with big changes?)

I meant 5.1, sorry.

> Probably it will be easier to just bisect and I plan to do that after
> backup catches up with missing data (and it takes days).

Thanks a lot!

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-08  6:01 WARNING: CPU: 5 PID: 25802 at fs/xfs/libxfs/xfs_bmap.c:4530 xfs_bmapi_convert_delalloc+0x434/0x4a0 [xfs] Arkadiusz Miśkiewicz
2019-11-08  6:56 ` Christoph Hellwig
2019-11-08  7:36   ` Arkadiusz Miśkiewicz
2019-11-27 15:43     ` Christoph Hellwig
2019-11-27 16:25       ` Arkadiusz Miśkiewicz
2019-11-27 16:26         ` Christoph Hellwig
2019-11-09 22:38 ` Dave Chinner
2019-11-12 21:22   ` Arkadiusz Miśkiewicz

Linux-XFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-xfs linux-xfs/ https://lore.kernel.org/linux-xfs \
		linux-xfs@vger.kernel.org
	public-inbox-index linux-xfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-xfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git