Interrupted and resumed scrubs seem to have caused filesystem to go readonly (EFBIG error)

From: Graham Cobb @ 2020-01-01 23:35 UTC
  To: linux-btrfs

I have a problem on one BTRFS filesystem. It is not a critical
filesystem (it is used for backups), and I have not yet even tried
unmounting and remounting it, let alone running "btrfs check".

The problem seems to be that, after several iterations of running 'btrfs
scrub' for 30 minutes, pausing for a while, and then resuming the scrub,
a transaction was aborted with an EFBIG error (-27) and a warning in the
kernel log. The fs went readonly, and transid verify errors are now
reported. The original log extract is available at
http://www.cobb.uk.net/kern.log.bug-010120 but I have pasted the key
part below.
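
A minimal sketch of the kind of time-sliced scrub loop described above
may help anyone trying to reproduce this. It is not the actual script
from this report. The mount point, the pause length, and the decision to
treat a failing "btrfs scrub cancel" as "the scrub has already finished"
are all assumptions; only the 30-minute run time and the
start/cancel/resume pattern come from the description.

```python
import subprocess
import time

RUN_SECONDS = 30 * 60    # 30 minutes per slice, as described in the report
PAUSE_SECONDS = 60 * 60  # assumption: the report only says "a while"


def scrub_command(action, mountpoint):
    """Build a `btrfs scrub <action> <mountpoint>` command line."""
    if action not in ("start", "resume", "cancel"):
        raise ValueError(f"unsupported scrub action: {action}")
    return ["btrfs", "scrub", action, mountpoint]


def run_sliced_scrub(mountpoint, max_slices,
                     run=subprocess.run, sleep=time.sleep):
    """Scrub in slices: start once, then cancel, pause and resume.

    Returns the number of slices that were started.
    """
    for index in range(max_slices):
        action = "start" if index == 0 else "resume"
        # Without -B, start/resume return once the scrub runs in background.
        run(scrub_command(action, mountpoint), check=True)
        sleep(RUN_SECONDS)
        # Assumption: a non-zero exit from cancel means no scrub is
        # running any more, i.e. it finished within this slice.
        if run(scrub_command("cancel", mountpoint)).returncode != 0:
            return index + 1
        sleep(PAUSE_SECONDS)
    return max_slices
```

Calling run_sliced_scrub("/mnt/backup", 10) as root would run up to ten
slices on a hypothetical /mnt/backup. The run and sleep parameters exist
only so the loop can be exercised without touching a real filesystem.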

The kernel is a Debian Testing kernel:
Linux black 5.3.0-2-amd64 #1 SMP Debian 5.3.9-3 (2019-11-19) x86_64
GNU/Linux

I run this same script monthly, and I have not seen this problem before,
so I cannot be certain it is caused by the scrub. I have not yet tried
to reproduce it, or to investigate the filesystem (check, etc).

Does anyone recognise this as a known/fixed problem? If not, is there
any particular further information I could gather before or during my
attempt to either recover the filesystem or just rebuild it?

Here is the log (starting with the 7th resumed scrub):

Jan  1 06:41:45 black kernel: [1930660.938782] BTRFS info (device sdc3): scrub: started on devid 1
Jan  1 06:41:45 black kernel: [1930660.939195] BTRFS info (device sdc3): scrub: started on devid 4
Jan  1 06:41:45 black kernel: [1930661.475557] ------------[ cut here ]------------
Jan  1 06:41:45 black kernel: [1930661.475562] BTRFS: Transaction aborted (error -27)
Jan  1 06:41:45 black kernel: [1930661.475667] WARNING: CPU: 0 PID: 771075 at fs/btrfs/extent-tree.c:8247 btrfs_create_pending_block_groups+0x1db/0x230 [btrfs]
Jan  1 06:41:45 black kernel: [1930661.475669] Modules linked in: fuse nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bnep nf_tables snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device cpufreq_userspace cpufreq_powersave cpufreq_conservative nfnetlink_queue nfnetlink_log nfnetlink bluetooth drbg ansi_cprng ecdh_generic ecc binfmt_misc hid_generic usbhid hid it87 hwmon_vid radeon edac_mce_amd kvm_amd eeepc_wmi ccp asus_wmi rng_core evdev sparse_keymap kvm snd_hda_codec_realtek rfkill irqbypass snd_hda_codec_generic ttm video wmi_bmof ledtrig_audio pcspkr snd_hda_codec_hdmi drm_kms_helper fam15h_power k10temp snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep sp5100_tco drm snd_pcm_oss snd_mixer_oss watchdog snd_pcm snd_timer sg snd soundcore i2c_algo_bit button acpi_cpufreq eeprom i2c_nforce2 firewire_sbp2 firewire_core crc_itu_t psmouse nfsd parport_pc ppdev auth_rpcgss lp nfs_acl parport lockd grace sunrpc ip_tables x_tables autofs4 btrfs xor zstd_decompress zstd_compress raid6_pq libcrc32c
Jan  1 06:41:45 black kernel: [1930661.475710]  ext4 crc16 mbcache jbd2 crc32c_generic sr_mod cdrom uas usb_storage sd_mod dm_crypt dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ohci_pci aesni_intel ahci libahci xhci_pci aes_x86_64 xhci_hcd crypto_simd libata ehci_pci ohci_hcd ehci_hcd cryptd glue_helper scsi_mod usbcore r8169 i2c_piix4 realtek libphy usb_common wmi
Jan  1 06:41:45 black kernel: [1930661.475737] CPU: 0 PID: 771075 Comm: btrfs Not tainted 5.3.0-2-amd64 #1 Debian 5.3.9-3
Jan  1 06:41:45 black kernel: [1930661.475739] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97, BIOS 0705 08/22/2011
Jan  1 06:41:45 black kernel: [1930661.475767] RIP: 0010:btrfs_create_pending_block_groups+0x1db/0x230 [btrfs]
Jan  1 06:41:45 black kernel: [1930661.475770] Code: e9 26 ff ff ff 48 8b 45 50 f0 48 0f ba a8 38 17 00 00 02 72 17 41 83 fc fb 74 2d 44 89 e6 48 c7 c7 50 2e 7a c0 e8 23 9d 19 e3 <0f> 0b 44 89 e1 ba 37 20 00 00 48 c7 c6 20 80 79 c0 48 89 ef e8 73
Jan  1 06:41:45 black kernel: [1930661.475772] RSP: 0018:ffff9c69804cfb00 EFLAGS: 00010286
Jan  1 06:41:45 black kernel: [1930661.475775] RAX: 0000000000000000 RBX: ffff909444e7a520 RCX: 0000000000000006
Jan  1 06:41:45 black kernel: [1930661.475777] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff90957aa17680
Jan  1 06:41:45 black kernel: [1930661.475779] RBP: ffff90946c745d68 R08: 0000000000010ec1 R09: 0000000000000007
Jan  1 06:41:45 black kernel: [1930661.475781] R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffe5
Jan  1 06:41:45 black kernel: [1930661.475783] R13: ffff90946c745dc0 R14: ffff909575d2e000 R15: ffff909574444000
Jan  1 06:41:45 black kernel: [1930661.475786] FS:  00007ff2eb4c7700(0000) GS:ffff90957aa00000(0000) knlGS:0000000000000000
Jan  1 06:41:45 black kernel: [1930661.475788] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan  1 06:41:45 black kernel: [1930661.475790] CR2: 00005634edab7008 CR3: 00000000bd0f2000 CR4: 00000000000406f0
Jan  1 06:41:45 black kernel: [1930661.475792] Call Trace:
Jan  1 06:41:45 black kernel: [1930661.475826]  __btrfs_end_transaction+0x3f/0x1b0 [btrfs]
Jan  1 06:41:45 black kernel: [1930661.475855]  btrfs_inc_block_group_ro+0x10e/0x150 [btrfs]
Jan  1 06:41:45 black kernel: [1930661.475891]  scrub_enumerate_chunks+0x162/0x560 [btrfs]
Jan  1 06:41:45 black kernel: [1930661.475900]  ? remove_wait_queue+0x20/0x60
Jan  1 06:41:45 black kernel: [1930661.475936]  btrfs_scrub_dev+0x26b/0x590 [btrfs]
Jan  1 06:41:45 black kernel: [1930661.475942]  ? _cond_resched+0x15/0x30
Jan  1 06:41:45 black kernel: [1930661.475946]  ? __kmalloc_track_caller+0x16e/0x260
Jan  1 06:41:45 black kernel: [1930661.475980]  ? btrfs_ioctl+0x82f/0x2e10 [btrfs]
Jan  1 06:41:45 black kernel: [1930661.475984]  ? __check_object_size+0x136/0x147
Jan  1 06:41:45 black kernel: [1930661.476019]  btrfs_ioctl+0x87a/0x2e10 [btrfs]
Jan  1 06:41:45 black kernel: [1930661.476024]  ? tomoyo_path_number_perm+0x66/0x1d0
Jan  1 06:41:45 black kernel: [1930661.476030]  ? do_vfs_ioctl+0x40e/0x670
Jan  1 06:41:45 black kernel: [1930661.476033]  do_vfs_ioctl+0x40e/0x670
Jan  1 06:41:45 black kernel: [1930661.476036]  ? create_task_io_context+0x95/0x100
Jan  1 06:41:45 black kernel: [1930661.476040]  ksys_ioctl+0x5e/0x90
Jan  1 06:41:45 black kernel: [1930661.476044]  __x64_sys_ioctl+0x16/0x20
Jan  1 06:41:45 black kernel: [1930661.476048]  do_syscall_64+0x53/0x140
Jan  1 06:41:45 black kernel: [1930661.476052]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan  1 06:41:45 black kernel: [1930661.476055] RIP: 0033:0x7ff2eb5b95b7
Jan  1 06:41:45 black kernel: [1930661.476058] Code: 00 00 90 48 8b 05 d9 78 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 78 0c 00 f7 d8 64 89 01 48
Jan  1 06:41:45 black kernel: [1930661.476061] RSP: 002b:00007ff2eb4c6d38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan  1 06:41:45 black kernel: [1930661.476064] RAX: ffffffffffffffda RBX: 000055eeaf2e94b0 RCX: 00007ff2eb5b95b7
Jan  1 06:41:45 black kernel: [1930661.476066] RDX: 000055eeaf2e94b0 RSI: 00000000c400941b RDI: 0000000000000003
Jan  1 06:41:45 black kernel: [1930661.476067] RBP: 0000000000000000 R08: 00007ff2eb4c7700 R09: 0000000000000000
Jan  1 06:41:45 black kernel: [1930661.476069] R10: 00007ff2eb4c7700 R11: 0000000000000246 R12: 00007ffc4cfa511e
Jan  1 06:41:45 black kernel: [1930661.476071] R13: 00007ffc4cfa511f R14: 00007ff2eb4c7700 R15: 00007ff2eb4c6e40
Jan  1 06:41:45 black kernel: [1930661.476075] ---[ end trace 6429c1bf293fecb8 ]---
Jan  1 06:41:45 black kernel: [1930661.476079] BTRFS: error (device sdc3) in btrfs_create_pending_block_groups:8247: errno=-27 unknown
Jan  1 06:41:45 black kernel: [1930661.476082] BTRFS info (device sdc3): forced readonly
Jan  1 06:41:45 black kernel: [1930661.489816] BTRFS warning (device sdc3): failed setting block group ro: -30
Jan  1 06:41:45 black kernel: [1930661.489821] BTRFS info (device sdc3): scrub: not finished on devid 1 with status: -30
Jan  1 06:41:52 black kernel: [1930668.052295] BTRFS warning (device sdc3): failed setting block group ro: -30
Jan  1 06:41:52 black kernel: [1930668.052301] BTRFS info (device sdc3): scrub: not finished on devid 4 with status: -30
Jan  1 06:51:56 black kernel: [1931271.801468] BTRFS error (device sdc3): parent transid verify failed on 16216583520256 wanted 301800 found 301756
Jan  1 06:51:56 black kernel: [1931271.822215] BTRFS error (device sdc3): parent transid verify failed on 16216583520256 wanted 301800 found 301756
Jan  1 06:51:57 black kernel: [1931273.492798] BTRFS error (device sdc3): parent transid verify failed on 16216583520256 wanted 301800 found 301756
Jan  1 06:51:57 black kernel: [1931273.493041] BTRFS error (device sdc3): parent transid verify failed on 16216583520256 wanted 301800 found 301756
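
The numeric codes in the log can be decoded mechanically. The sketch
below is illustrative only: the helper names are made up for this
example, and it assumes the usual Linux errno numbering. It maps the
negative kernel error codes to their symbolic names, reinterprets a
register value as a signed 32-bit integer (R12 in the dump above holds
00000000ffffffe5, which appears to be the same error code), and
extracts the gap between the wanted and found generations from a
"parent transid verify failed" line.

```python
import errno
import re


def errno_name(code):
    """Map a (possibly negative) kernel error code to its symbolic name."""
    return errno.errorcode.get(abs(code), "unknown")


def as_signed32(value):
    """Reinterpret the low 32 bits of a register value as a signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def transid_gap(line):
    """Return wanted minus found for a transid error line, else None."""
    match = TRANSID_RE.search(line)
    if match is None:
        return None
    _bytenr, wanted, found = (int(group) for group in match.groups())
    return wanted - found
```

With these helpers, errno_name(-27) gives "EFBIG" and errno_name(-30)
gives "EROFS", so the "-30" statuses are simply a consequence of the
filesystem having been forced readonly. as_signed32(0xffffffe5) is -27,
and the transid lines show the found generation lagging the wanted one
by 301800 - 301756 = 44.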
