* failed to read the system array: -2 / open_ctree failed
@ 2021-08-18 23:36 Christoph Anton Mitterer
  2021-08-19  7:46 ` Nikolay Borisov
  2021-08-19 16:12 ` Anand Jain
  0 siblings, 2 replies; 4+ messages in thread
From: Christoph Anton Mitterer @ 2021-08-18 23:36 UTC
  To: linux-btrfs


Hey.

I have a (physically not accessible) server that runs Debian unstable
with a 5.10 kernel and two SATA HDDs which have a RAID1 btrfs.

Every now and then the server ... well I don't know, but it probably
crashes and then hangs at boot when it tries to mount the root fs
(which is the aforementioned btrfs RAID) with the error shown in the
screenshot.


When I boot from a rescue system (kernel 5.13.1, but only btrfs-progs
v4.20.1) and do a normal and lowmem fsck, nothing is found:
root@rescue ~ # btrfs check /dev/sda2  ; echo $?
Opening filesystem to check...
Checking filesystem on /dev/sda2
UUID: 67e35b5c-3dfd-4b00-909d-88308e6b8d85
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 42220433408 bytes used, no error found
total csum bytes: 38246716
total tree bytes: 389464064
total fs tree bytes: 213270528
total extent tree bytes: 110657536
btree space waste bytes: 91413374
file data blocks allocated: 2509346316288
 referenced 27977461760
0
root@rescue ~ # btrfs check --mode=lowmem /dev/sda2  ; echo $?
Opening filesystem to check...
Checking filesystem on /dev/sda2
UUID: 67e35b5c-3dfd-4b00-909d-88308e6b8d85
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs done with fs roots in lowmem mode, skipping
[7/7] checking quota groups skipped (not enabled on this FS)
found 42220433408 bytes used, no error found
total csum bytes: 38246716
total tree bytes: 389464064
total fs tree bytes: 213270528
total extent tree bytes: 110657536
btree space waste bytes: 91413374
file data blocks allocated: 2509346316288
 referenced 27977461760
0



And the fs mounts just fine.


What I also found in (the rescue system’s) dmesg is:
[Thu Aug 19 00:30:56 2021] ahci 0000:00:11.0: version 3.0
[Thu Aug 19 00:30:56 2021] ahci 0000:00:11.0: AHCI 0001.0100 32 slots 6 ports 3 Gbps 0x3f impl SATA mode
[Thu Aug 19 00:30:56 2021] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part ccc 
[Thu Aug 19 00:30:56 2021] scsi host2: ahci
[Thu Aug 19 00:30:56 2021] scsi host3: ahci
[Thu Aug 19 00:30:56 2021] scsi host4: ahci
[Thu Aug 19 00:30:56 2021] scsi host5: ahci
[Thu Aug 19 00:30:56 2021] scsi host6: ahci
[Thu Aug 19 00:30:56 2021] scsi host7: ahci
[Thu Aug 19 00:30:56 2021] ata3: SATA max UDMA/133 abar m1024@0xfe8ffc00 port 0xfe8ffd00 irq 22
[Thu Aug 19 00:30:56 2021] ata4: SATA max UDMA/133 abar m1024@0xfe8ffc00 port 0xfe8ffd80 irq 22
[Thu Aug 19 00:30:56 2021] ata5: SATA max UDMA/133 abar m1024@0xfe8ffc00 port 0xfe8ffe00 irq 22
[Thu Aug 19 00:30:56 2021] ata6: SATA max UDMA/133 abar m1024@0xfe8ffc00 port 0xfe8ffe80 irq 22
[Thu Aug 19 00:30:56 2021] ata7: SATA max UDMA/133 abar m1024@0xfe8ffc00 port 0xfe8fff00 irq 22
[Thu Aug 19 00:30:56 2021] ata8: SATA max UDMA/133 abar m1024@0xfe8ffc00 port 0xfe8fff80 irq 22
[Thu Aug 19 00:30:56 2021] ata8: SATA link down (SStatus 0 SControl 300)
[Thu Aug 19 00:30:56 2021] ata7: SATA link down (SStatus 0 SControl 300)
[Thu Aug 19 00:30:56 2021] ata6: SATA link down (SStatus 0 SControl 300)
[Thu Aug 19 00:30:56 2021] ata5: SATA link down (SStatus 0 SControl 300)
[Thu Aug 19 00:30:56 2021] ata4: softreset failed (device not ready)
[Thu Aug 19 00:30:56 2021] ata4: applying PMP SRST workaround and retrying
[Thu Aug 19 00:30:56 2021] ata3: softreset failed (device not ready)
[Thu Aug 19 00:30:56 2021] ata3: applying PMP SRST workaround and retrying

=> the "softreset failed (device not ready)"

with ata3 and ata4 being the disks:
[Thu Aug 19 00:30:57 2021] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[Thu Aug 19 00:30:57 2021] ata4.00: ATA-8: TOSHIBA DT01ACA100, MS2OA750, max UDMA/133
[Thu Aug 19 00:30:57 2021] ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 32), AA
[Thu Aug 19 00:30:57 2021] ata4.00: configured for UDMA/133
[Thu Aug 19 00:30:57 2021] powernow_k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ (2 cpu cores) (version 2.20.00)
[Thu Aug 19 00:30:57 2021] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[Thu Aug 19 00:30:57 2021] ata3.00: ATA-8: TOSHIBA DT01ACA100, MS2OA750, max UDMA/133
[Thu Aug 19 00:30:57 2021] ata3.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 32), AA
[Thu Aug 19 00:30:57 2021] ata3.00: configured for UDMA/133

Could that be the problem? I mean, is there some timing issue where the
kernel tries to bring up the devices and fails at first, then succeeds
later (but by then mounting the root fs has already failed)?
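
A minimal sketch of how that timing question could be checked against the log, assuming `dmesg -T` style lines like the ones quoted above. The function name `softreset_recovery_delays` and the regular expression are illustrative, not part of any existing tool, and the timestamps only have one-second resolution, so the result is coarse:

```python
import re
from datetime import datetime

# Matches port-level lines as printed by `dmesg -T`, e.g.
# [Thu Aug 19 00:30:56 2021] ata4: softreset failed (device not ready)
# Device-level lines such as "ata4.00: ..." are deliberately not matched.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<port>ata\d+): (?P<msg>.*)$")
TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # assumes English (C locale) day/month names


def softreset_recovery_delays(log_text):
    """Return {port: seconds from first 'softreset failed' to 'SATA link up'}."""
    failed_at = {}
    delays = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], TS_FORMAT)
        port, msg = m["port"], m["msg"]
        if msg.startswith("softreset failed"):
            # Remember only the first failure per port.
            failed_at.setdefault(port, ts)
        elif msg.startswith("SATA link up") and port in failed_at:
            delays.setdefault(port, (ts - failed_at[port]).total_seconds())
    return delays


# Lines copied from the dmesg output quoted above.
sample = """\
[Thu Aug 19 00:30:56 2021] ata4: softreset failed (device not ready)
[Thu Aug 19 00:30:56 2021] ata3: softreset failed (device not ready)
[Thu Aug 19 00:30:57 2021] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[Thu Aug 19 00:30:57 2021] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
"""
print(softreset_recovery_delays(sample))
```

Both disks show a gap between the failed softreset and the link coming up, which is at least consistent with the devices being late rather than broken.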



Maybe it's also some hardware issue (though the provider has already
replaced cables and a memtest didn't reveal anything useful).


Any ideas what the errors (failed to read the system array: -2 /
open_ctree failed) could indicate?

Thanks,
Chris.

[-- Attachment #2: screenshpt.jpeg --]
[-- Type: image/jpeg, Size: 60658 bytes --]


* Re: failed to read the system array: -2 / open_ctree failed
  2021-08-18 23:36 failed to read the system array: -2 / open_ctree failed Christoph Anton Mitterer
@ 2021-08-19  7:46 ` Nikolay Borisov
  2021-08-19 20:00   ` Christoph Anton Mitterer
  2021-08-19 16:12 ` Anand Jain
  1 sibling, 1 reply; 4+ messages in thread
From: Nikolay Borisov @ 2021-08-19  7:46 UTC
  To: Christoph Anton Mitterer, linux-btrfs



On 19.08.21 2:36, Christoph Anton Mitterer wrote:
> Hey.
> 


The system array is the array that holds the chunk maps, i.e. it's the
first thing that needs to be read from the super block (housed at
offset 64k in the device). So the error basically tells you that it
could not be read and that the read returned -ENOENT (the -2 in the
message). If the system chunk array is broken then you can't really do
anything with the filesystem. But given your other observation, that the
filesystem doesn't really have corrupted trees (as visible from the
btrfs check output), this is indeed caused by some timing issue, with
the hard drives not having been brought up yet.
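
For reference, a small sketch of how the -2 maps to ENOENT. It uses Python's standard `errno` table on the assumption that the host's numbering matches the kernel's (true for ENOENT on Linux); the helper name `describe_kernel_error` is made up for illustration:

```python
import errno
import os


def describe_kernel_error(code):
    """Map a negative kernel-style return code (e.g. -2) to (name, description)."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


print(describe_kernel_error(-2))
```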


* Re: failed to read the system array: -2 / open_ctree failed
  2021-08-18 23:36 failed to read the system array: -2 / open_ctree failed Christoph Anton Mitterer
  2021-08-19  7:46 ` Nikolay Borisov
@ 2021-08-19 16:12 ` Anand Jain
  1 sibling, 0 replies; 4+ messages in thread
From: Anand Jain @ 2021-08-19 16:12 UTC
  To: Christoph Anton Mitterer, linux-btrfs

Devid 2 is bad/slow.
The screenshot shows devid 2 reported as missing, and reported as an
error, which means the mount options do not include degraded. So the
RAID1 mount fails.
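
A rough sketch of how one might check whether a kernel command line passes `degraded` to the root mount via `rootflags=`. The helper names and the sample command lines are hypothetical (only the UUID is taken from the btrfs check output earlier in the thread), and whether a degraded root mount is desirable at all is a separate question:

```python
def root_mount_options(cmdline):
    """Collect the mount options passed via rootflags= on a kernel command line."""
    options = []
    for token in cmdline.split():
        if token.startswith("rootflags="):
            value = token[len("rootflags="):]
            options.extend(opt for opt in value.split(",") if opt)
    return options


def allows_degraded(cmdline):
    """Return True if the root filesystem would be mounted with 'degraded'."""
    return "degraded" in root_mount_options(cmdline)


# Hypothetical command lines, for illustration only.
plain = "root=UUID=67e35b5c-3dfd-4b00-909d-88308e6b8d85 ro quiet"
degraded = plain + " rootflags=degraded,noatime"
print(allows_degraded(plain), allows_degraded(degraded))
```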




* Re: failed to read the system array: -2 / open_ctree failed
  2021-08-19  7:46 ` Nikolay Borisov
@ 2021-08-19 20:00   ` Christoph Anton Mitterer
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Anton Mitterer @ 2021-08-19 20:00 UTC
  To: Nikolay Borisov, linux-btrfs

Hey Nikolay, Anand.

After some further testing last night, it seems that a rootdelay=60
option solves the whole issue (and turning it off again brings it
back in most boots).

So in the end, no btrfs issue at all.

Still a bit strange, the whole thing - I'd have expected the device file
to appear only once the device is really usable.
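
rootdelay= is a fixed wait. A polling variant of the same idea, as a minimal sketch: instead of trusting that the device file exists, try to read the btrfs magic from the primary superblock at 64 KiB until it succeeds or a timeout expires. The function names are hypothetical and this is not how the kernel or any initramfs actually does it:

```python
import time

BTRFS_SUPERBLOCK_OFFSET = 64 * 1024  # primary superblock lives at 64 KiB
BTRFS_MAGIC_OFFSET = 0x40            # magic field inside the superblock
BTRFS_MAGIC = b"_BHRfS_M"


def superblock_readable(path):
    """Return True if the btrfs magic can actually be read from `path`."""
    try:
        with open(path, "rb") as dev:
            dev.seek(BTRFS_SUPERBLOCK_OFFSET + BTRFS_MAGIC_OFFSET)
            return dev.read(len(BTRFS_MAGIC)) == BTRFS_MAGIC
    except OSError:
        # Missing device node or I/O error: treat as "not ready yet".
        return False


def wait_for_devices(paths, timeout=60.0, interval=0.5):
    """Poll until every path has a readable superblock or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        pending = [p for p in paths if not superblock_readable(p)]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Something like `wait_for_devices(["/dev/sda2", "/dev/sdb2"])` would then return as soon as both members answer, where `/dev/sdb2` is only a guess at the second RAID1 member's name.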


Thanks,
Chris.


