* 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure @ 2022-07-18 12:20 Nix 2022-07-18 13:17 ` Wols Lists ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Nix @ 2022-07-18 12:20 UTC (permalink / raw) To: linux-raid So I have a pair of RAID-6 mdraid arrays on this machine (one of which has a bcache layered on top of it, with an LVM VG stretched across both). Kernel 5.16 + mdadm 4.0 (I know, it's old) works fine, but I just rebooted into 5.18.12 and it failed to assemble. mdadm didn't display anything useful: an mdadm --assemble --scan --auto=md --freeze-reshape simply didn't find anything to assemble, and after that nothing else was going to work. But rebooting into 5.16 worked fine, so everything was (thank goodness) actually still there. Alas I can't say what the state of the blockdevs was (other than that they all seemed to be in /dev, and I'm using DEVICE partitions so they should all have been spotted) or anything else about the boot because console scrollback is still a nonexistent thing (as far as I can tell), it scrolls past too fast for me to video it, and I can't use netconsole because this is the NFS and loghost server for the local network so all the other machines are more or less frozen waiting for NFS to come back. Any suggestions for getting more useful info out of this thing? I suppose I could get a spare laptop and set it up to run as a netconsole server for this one boot... but even that won't tell me what's going on if the error (if any) is reported by some userspace process rather than in the kernel message log. I'll do some mdadm --examine's and look at /proc/partitions next time I try booting (which won't be before this weekend), but I'd be fairly surprised if mdadm itself was at fault, even though it's the failing component and it's old, unless the kernel upgrade has tripped some bug in 4.0 -- or perhaps 4.0 built against a fairly old musl: I haven't even recompiled it since 2019. So this looks like something in the blockdev layer, which at this stage in booting is purely libata-based. (There is an SSD on the machine, but it's used as a bcache cache device and for XFS journals, both of which are at layers below mdadm so can't possibly be involved in this.) -- NULL && (void) ^ permalink raw reply [flat|nested] 25+ messages in thread
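A throwaway netconsole setup for that spare-laptop idea is only a couple of lines, assuming CONFIG_NETCONSOLE=y in the test kernel; every address, port, interface name and MAC below is a placeholder to replace with real values:

    # kernel command line on the failing machine
    # (format: src-port@src-ip/dev,dst-port@dst-ip/dst-mac)
    netconsole=6665@192.168.1.10/eth0,6666@192.168.1.20/00:11:22:33:44:55 ignore_loglevel

    # on the laptop, just catch the UDP stream
    nc -u -l 6666 | tee failed-boot.log    # traditional netcat spells it: nc -u -l -p 6666

As noted above, that only captures printk output, not anything mdadm itself prints, but the block-layer and md lines alone may be enough to spot where the 5.18 boot diverges from the 5.16 one.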
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-18 12:20 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure Nix @ 2022-07-18 13:17 ` Wols Lists 2022-07-19 9:17 ` Jani Partanen 2022-07-20 15:55 ` Nix 2022-07-18 15:55 ` Roger Heflin 2022-07-19 7:00 ` Guoqing Jiang 2 siblings, 2 replies; 25+ messages in thread From: Wols Lists @ 2022-07-18 13:17 UTC (permalink / raw) To: Nix, linux-raid On 18/07/2022 13:20, Nix wrote: > So I have a pair of RAID-6 mdraid arrays on this machine (one of which > has a bcache layered on top of it, with an LVM VG stretched across > both). Kernel 5.16 + mdadm 4.0 (I know, it's old) works fine, but I just > rebooted into 5.18.12 and it failed to assemble. mdadm didn't display > anything useful: an mdadm --assemble --scan --auto=md --freeze-reshape > simply didn't find anything to assemble, and after that nothing else was > going to work. But rebooting into 5.16 worked fine, so everything was > (thank goodness) actually still there. Everything should still be there ... and the difference between mdadm 4.0 and 4.2 isn't that much I don't think ... a few bugfixes here and there ... When you reboot into the new kernel, try lsdrv https://raid.wiki.kernel.org/index.php/Asking_for_help#lsdrv I don't know the current state of play with regard to Python versions there ... last I knew I had to explicitly get it to invoke 2.7 ... But I've not seen any reports of problems elsewhere, so this is either new or unique to you I would think ... Cheers, Wol ^ permalink raw reply [flat|nested] 25+ messages in thread
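(A quick way to try it without worrying about the shebang is to name the interpreter explicitly; the repository URL here is from memory, so double-check it against the wiki page:)

    git clone https://github.com/pturmel/lsdrv
    python2.7 lsdrv/lsdrv     # or python3, if the script has been ported by then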
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-18 13:17 ` Wols Lists @ 2022-07-19 9:17 ` Jani Partanen 2022-07-19 17:09 ` Wols Lists 2022-07-20 15:55 ` Nix 1 sibling, 1 reply; 25+ messages in thread From: Jani Partanen @ 2022-07-19 9:17 UTC (permalink / raw) To: Wols Lists, Nix, linux-raid Sorry to jump in but could you suggest something what is quite much default programs, not something that works only debian or something.. lsdrv on Fedora 36 spit this: ./lsdrv File "/root/lsdrv/./lsdrv", line 323 os.mkdir('/dev/block', 0755) ^ SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers Wols Lists kirjoitti 18/07/2022 klo 16.17: > On 18/07/2022 13:20, Nix wrote: >> So I have a pair of RAID-6 mdraid arrays on this machine (one of which >> has a bcache layered on top of it, with an LVM VG stretched across >> both). Kernel 5.16 + mdadm 4.0 (I know, it's old) works fine, but I just >> rebooted into 5.18.12 and it failed to assemble. mdadm didn't display >> anything useful: an mdadm --assemble --scan --auto=md --freeze-reshape >> simply didn't find anything to assemble, and after that nothing else was >> going to work. But rebooting into 5.16 worked fine, so everything was >> (thank goodness) actually still there. > > Everything should still be there ... and the difference between mdadm > 4.0 and 4.2 isn't that much I don't think ... a few bugfixes here and > there ... > > When you reboot into the new kernel, try lsdrv > > https://raid.wiki.kernel.org/index.php/Asking_for_help#lsdrv > > I don't know the current state of play with regard to Python versions > there ... last I knew I had to explicitly get it to invoke 2.7 ... > > But I've not seen any reports of problems elsewhere, so this is either > new or unique to you I would think ... > > Cheers, > Wol ^ permalink raw reply [flat|nested] 25+ messages in thread
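The error is exactly what it says: Python 3 dropped the bare leading-zero octal literal. Until the script grows real Python 3 support, the least-effort options are to run it under python2.7 where that still exists, or to patch the one literal the traceback complains about and see how far it gets; a rough local hack, which may well hit other Python-2-only constructs further in:

    sed -i 's/0755/0o755/g' lsdrv     # only fixes the mkdir mode from the traceback
    python3 ./lsdrv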
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-19 9:17 ` Jani Partanen @ 2022-07-19 17:09 ` Wols Lists 2022-07-19 17:40 ` Roger Heflin 2022-07-19 18:10 ` Reindl Harald 0 siblings, 2 replies; 25+ messages in thread From: Wols Lists @ 2022-07-19 17:09 UTC (permalink / raw) To: Jani Partanen, Nix, linux-raid On 19/07/2022 10:17, Jani Partanen wrote: > Sorry to jump in but could you suggest something what is quite much > default programs, not something that works only debian or something.. > lsdrv on Fedora 36 spit this: > ./lsdrv > File "/root/lsdrv/./lsdrv", line 323 > os.mkdir('/dev/block', 0755) > ^ > SyntaxError: leading zeros in decimal integer literals are not > permitted; use an 0o prefix for octal integers > Well, LAST I TRIED, it worked fine on gentoo, so it's certainly not Debian-specific. I did have to tell it to use Python 2.7 because gentoo defaulted to 3. Apparently it's since been updated, but I haven't (tried to) use it for a while. I've just googled your error, and it looks like a Python-2-ism, so it's nothing to do with the distro, and everything to do with the Python version change. (As I did warn about in my original post!) Cheers, Wol ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-19 17:09 ` Wols Lists @ 2022-07-19 17:40 ` Roger Heflin 2022-07-19 18:10 ` Reindl Harald 1 sibling, 0 replies; 25+ messages in thread From: Roger Heflin @ 2022-07-19 17:40 UTC (permalink / raw) To: Wols Lists; +Cc: Jani Partanen, Nix, Linux RAID It worked fine for me on Fedora so long as I change it to use python2.7, like the note warned about. One has to love languages/standards (NOT) that make simple almost pointless changes to something that has been around forever, and was a standard, but is now being removed and breaks significant amounts of code. On Tue, Jul 19, 2022 at 12:21 PM Wols Lists <antlists@youngman.org.uk> wrote: > > On 19/07/2022 10:17, Jani Partanen wrote: > > Sorry to jump in but could you suggest something what is quite much > > default programs, not something that works only debian or something.. > > lsdrv on Fedora 36 spit this: > > ./lsdrv > > File "/root/lsdrv/./lsdrv", line 323 > > os.mkdir('/dev/block', 0755) > > ^ > > SyntaxError: leading zeros in decimal integer literals are not > > permitted; use an 0o prefix for octal integers > > > Well, LAST I TRIED, it worked fine on gentoo, so it's certainly not > Debian-specific. > > I did have to tell it to use Python 2.7 because gentoo defaulted to 3. > Apparently it's since been updated, but I haven't (tried to) use it for > a while. > > I've just googled your error, and it looks like a Python-2-ism, so it's > nothing to do with the distro, and everything to do with the Python > version change. (As I did warn about in my original post!) > > Cheers, > Wol ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-19 17:09 ` Wols Lists 2022-07-19 17:40 ` Roger Heflin @ 2022-07-19 18:10 ` Reindl Harald 2022-07-19 19:22 ` Wol 1 sibling, 1 reply; 25+ messages in thread From: Reindl Harald @ 2022-07-19 18:10 UTC (permalink / raw) To: Wols Lists, Jani Partanen, Nix, linux-raid Am 19.07.22 um 19:09 schrieb Wols Lists: > Well, LAST I TRIED, it worked fine on gentoo, so it's certainly not > Debian-specific i can't follow that logic ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-19 18:10 ` Reindl Harald @ 2022-07-19 19:22 ` Wol 2022-07-19 20:01 ` Reindl Harald 0 siblings, 1 reply; 25+ messages in thread From: Wol @ 2022-07-19 19:22 UTC (permalink / raw) To: Reindl Harald, Jani Partanen, Nix, linux-raid On 19/07/2022 19:10, Reindl Harald wrote: > > > Am 19.07.22 um 19:09 schrieb Wols Lists: >> Well, LAST I TRIED, it worked fine on gentoo, so it's certainly not >> Debian-specific > > i can't follow that logic Gentoo is a rolling release. I strongly suspect that 2.7 is deceased. It is no more. It has shuffled off this mortal coil. Cheers, Wol ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-19 19:22 ` Wol @ 2022-07-19 20:01 ` Reindl Harald 2022-07-19 21:51 ` Wols Lists 0 siblings, 1 reply; 25+ messages in thread From: Reindl Harald @ 2022-07-19 20:01 UTC (permalink / raw) To: Wol, Jani Partanen, Nix, linux-raid Am 19.07.22 um 21:22 schrieb Wol: > On 19/07/2022 19:10, Reindl Harald wrote: >> >> >> Am 19.07.22 um 19:09 schrieb Wols Lists: >>> Well, LAST I TRIED, it worked fine on gentoo, so it's certainly not >>> Debian-specific >> >> i can't follow that logic > > Gentoo is a rolling release. I strongly suspect that 2.7 is deceased. It > is no more. It has shuffled off this mortal coil no matter what just because something woked fine on Gentoo don't rule out a Debian specific problem and for sure not "certainly" ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-19 20:01 ` Reindl Harald @ 2022-07-19 21:51 ` Wols Lists 2022-07-19 22:35 ` Jani Partanen 0 siblings, 1 reply; 25+ messages in thread From: Wols Lists @ 2022-07-19 21:51 UTC (permalink / raw) To: Reindl Harald, Jani Partanen, Nix, linux-raid On 19/07/2022 21:01, Reindl Harald wrote: > > > Am 19.07.22 um 21:22 schrieb Wol: >> On 19/07/2022 19:10, Reindl Harald wrote: >>> >>> >>> Am 19.07.22 um 19:09 schrieb Wols Lists: >>>> Well, LAST I TRIED, it worked fine on gentoo, so it's certainly not >>>> Debian-specific >>> >>> i can't follow that logic >> >> Gentoo is a rolling release. I strongly suspect that 2.7 is deceased. >> It is no more. It has shuffled off this mortal coil > > no matter what just because something woked fine on Gentoo don't rule > out a Debian specific problem and for sure not "certainly" PLEASE FOLLOW THE THREAD. The complaint was it was a Debian-specific PROGRAM - not problem. Cheers, Wol ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-19 21:51 ` Wols Lists @ 2022-07-19 22:35 ` Jani Partanen 2022-07-20 12:33 ` Phil Turmel 0 siblings, 1 reply; 25+ messages in thread From: Jani Partanen @ 2022-07-19 22:35 UTC (permalink / raw) To: Wols Lists, Reindl Harald, Nix, linux-raid I said debian specific because it provide some debian stuff. Maybe debian comes python 2.7 installed by default. Tool itself havent got any update in 4 years. It really should be converted to work with python3. IIRC python devs have said that python 2.7 should not be used anymore and that was already years ago. https://www.python.org/doc/sunset-python-2/ Wols Lists kirjoitti 20/07/2022 klo 0.51: > On 19/07/2022 21:01, Reindl Harald wrote: >> >> >> Am 19.07.22 um 21:22 schrieb Wol: >>> On 19/07/2022 19:10, Reindl Harald wrote: >>>> >>>> >>>> Am 19.07.22 um 19:09 schrieb Wols Lists: >>>>> Well, LAST I TRIED, it worked fine on gentoo, so it's certainly >>>>> not Debian-specific >>>> >>>> i can't follow that logic >>> >>> Gentoo is a rolling release. I strongly suspect that 2.7 is >>> deceased. It is no more. It has shuffled off this mortal coil >> >> no matter what just because something woked fine on Gentoo don't rule >> out a Debian specific problem and for sure not "certainly" > > PLEASE FOLLOW THE THREAD. > > The complaint was it was a Debian-specific PROGRAM - not problem. > > Cheers, > Wol ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-19 22:35 ` Jani Partanen @ 2022-07-20 12:33 ` Phil Turmel 0 siblings, 0 replies; 25+ messages in thread From: Phil Turmel @ 2022-07-20 12:33 UTC (permalink / raw) To: Jani Partanen, Wols Lists, Reindl Harald, Nix, linux-raid As the author of lsdrv, I can say that python3 is on my to-do list, via the techniques ESR recommended for mutual compatibility. Pointless syntax changes like this one for constants is discouraging. FWIW, python 2.7 (the language) is far from dead, as jython has not yet released a python 3 (the language) implementation. I use jython heavily in my commercial work, on a platform that is very popular worldwide. (Ignition by Inductive Automation, if anyone cares.) The two primary drivers of the move to python3, namely, distinguishing between character and bytes, and the introduction of async programming to mitigate the GIL, are both non-issues in jython 2.7 thanks to the java standard library. I suspect python 2.7 (the language) will be alive and kicking for at least another decade. On 7/19/22 18:35, Jani Partanen wrote: > I said debian specific because it provide some debian stuff. Maybe > debian comes python 2.7 installed by default. > Tool itself havent got any update in 4 years. It really should be > converted to work with python3. > IIRC python devs have said that python 2.7 should not be used anymore > and that was already years ago. > > https://www.python.org/doc/sunset-python-2/ > > > Wols Lists kirjoitti 20/07/2022 klo 0.51: >> On 19/07/2022 21:01, Reindl Harald wrote: >>> >>> >>> Am 19.07.22 um 21:22 schrieb Wol: >>>> On 19/07/2022 19:10, Reindl Harald wrote: >>>>> >>>>> >>>>> Am 19.07.22 um 19:09 schrieb Wols Lists: >>>>>> Well, LAST I TRIED, it worked fine on gentoo, so it's certainly >>>>>> not Debian-specific >>>>> >>>>> i can't follow that logic >>>> >>>> Gentoo is a rolling release. I strongly suspect that 2.7 is >>>> deceased. It is no more. It has shuffled off this mortal coil >>> >>> no matter what just because something woked fine on Gentoo don't rule >>> out a Debian specific problem and for sure not "certainly" >> >> PLEASE FOLLOW THE THREAD. >> >> The complaint was it was a Debian-specific PROGRAM - not problem. >> >> Cheers, >> Wol > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-18 13:17 ` Wols Lists 2022-07-19 9:17 ` Jani Partanen @ 2022-07-20 15:55 ` Nix 2022-07-20 18:32 ` Wols Lists 1 sibling, 1 reply; 25+ messages in thread From: Nix @ 2022-07-20 15:55 UTC (permalink / raw) To: Wols Lists; +Cc: linux-raid On 18 Jul 2022, Wols Lists spake thusly: > On 18/07/2022 13:20, Nix wrote: >> So I have a pair of RAID-6 mdraid arrays on this machine (one of which >> has a bcache layered on top of it, with an LVM VG stretched across >> both). Kernel 5.16 + mdadm 4.0 (I know, it's old) works fine, but I just >> rebooted into 5.18.12 and it failed to assemble. mdadm didn't display >> anything useful: an mdadm --assemble --scan --auto=md --freeze-reshape >> simply didn't find anything to assemble, and after that nothing else was >> going to work. But rebooting into 5.16 worked fine, so everything was >> (thank goodness) actually still there. > > Everything should still be there ... and the difference between mdadm 4.0 and 4.2 isn't that much I don't think ... a few bugfixes > here and there ... Yeah, I was just a bit worried :) > When you reboot into the new kernel, try lsdrv > > https://raid.wiki.kernel.org/index.php/Asking_for_help#lsdrv Um, it's a Python script. I don't have Python in my initramfs and it seems a bit excessive to put it there. (The system doesn't boot any further than that.) I'll stick dmesg in there so I can at least see what that says about block device enumeration, because I fear something has gone wrong with that. Normal operation says (non-blockdev lines pruned): [ 2.931660] SCSI subsystem initialized [ 2.931660] libata version 3.00 loaded. [ 4.116098] ahci 0000:00:11.4: version 3.0 [ 4.116209] ahci 0000:00:11.4: SSS flag set, parallel bus scan disabled [ 4.116241] ahci 0000:00:11.4: AHCI 0001.0300 32 slots 4 ports 6 Gbps 0xf impl SATA mode [ 4.116246] ahci 0000:00:11.4: flags: 64bit ncq stag led clo pio slum part ems apst [ 4.197974] scsi host0: ahci [ 4.198200] scsi host1: ahci [ 4.198475] scsi host2: ahci [ 4.198719] scsi host3: ahci [ 4.198755] ata1: SATA max UDMA/133 abar m2048@0x91d00000 port 0x91d00100 irq 40 [ 4.198760] ata2: SATA max UDMA/133 abar m2048@0x91d00000 port 0x91d00180 irq 40 [ 4.198764] ata3: SATA max UDMA/133 abar m2048@0x91d00000 port 0x91d00200 irq 40 [ 4.198767] ata4: SATA max UDMA/133 abar m2048@0x91d00000 port 0x91d00280 irq 40 [ 4.198891] ahci 0000:00:1f.2: SSS flag set, parallel bus scan disabled [ 4.198925] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x3f impl SATA mode [ 4.198929] ahci 0000:00:1f.2: flags: 64bit ncq stag led clo pio slum part ems apst [ 4.348114] scsi host4: ahci [ 4.348244] scsi host5: ahci [ 4.348364] scsi host6: ahci [ 4.348487] scsi host7: ahci [ 4.348605] scsi host8: ahci [ 4.348718] scsi host9: ahci [ 4.348747] ata5: SATA max UDMA/133 abar m2048@0x91d04000 port 0x91d04100 irq 41 [ 4.348751] ata6: SATA max UDMA/133 abar m2048@0x91d04000 port 0x91d04180 irq 41 [ 4.348754] ata7: SATA max UDMA/133 abar m2048@0x91d04000 port 0x91d04200 irq 41 [ 4.348757] ata8: SATA max UDMA/133 abar m2048@0x91d04000 port 0x91d04280 irq 41 [ 4.348760] ata9: SATA max UDMA/133 abar m2048@0x91d04000 port 0x91d04300 irq 41 [ 4.348763] ata10: SATA max UDMA/133 abar m2048@0x91d04000 port 0x91d04380 irq 41 [ 4.530018] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 4.566415] ata1.00: ATA-10: ST8000NM0055-1RM112, SN02, max UDMA/133 [ 4.566826] ata1.00: 15628053168 sectors, multi 0: LBA48 NCQ (depth 32), 
AA [ 4.566834] ata1.00: Features: NCQ-sndrcv [ 4.569771] ata1.00: configured for UDMA/133 [ 4.569903] scsi 0:0:0:0: Direct-Access ATA ST8000NM0055-1RM SN02 PQ: 0 ANSI: 5 [ 4.570019] scsi 0:0:0:0: Attached scsi generic sg0 type 0 [ 4.570110] sd 0:0:0:0: [sda] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB) [ 4.570115] sd 0:0:0:0: [sda] 4096-byte physical blocks [ 4.570125] sd 0:0:0:0: [sda] Write Protect is off [ 4.570133] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 [ 4.570147] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 4.610510] sda: sda1 sda2 sda3 sda4 [ 4.610751] sd 0:0:0:0: [sda] Attached SCSI disk [ 4.689645] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 4.690219] ata5.00: ATA-9: INTEL SSDSC2BB480G6, G2010150, max UDMA/133 [ 4.690225] ata5.00: 937703088 sectors, multi 1: LBA48 NCQ (depth 32) [ 4.690860] ata5.00: configured for UDMA/133 [ 4.899866] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 4.938473] ata2.00: ATA-10: ST8000NM0055-1RM112, SN02, max UDMA/133 [ 4.938961] ata2.00: 15628053168 sectors, multi 0: LBA48 NCQ (depth 32), AA [ 4.938969] ata2.00: Features: NCQ-sndrcv [ 4.941907] ata2.00: configured for UDMA/133 [ 4.942104] scsi 1:0:0:0: Direct-Access ATA ST8000NM0055-1RM SN02 PQ: 0 ANSI: 5 [ 4.942278] sd 1:0:0:0: Attached scsi generic sg1 type 0 [ 4.942382] sd 1:0:0:0: [sdb] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB) [ 4.942390] sd 1:0:0:0: [sdb] 4096-byte physical blocks [ 4.942400] sd 1:0:0:0: [sdb] Write Protect is off [ 4.942407] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00 [ 4.942421] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 4.987817] sdb: sdb1 sdb2 sdb3 sdb4 [ 4.988027] sd 1:0:0:0: [sdb] Attached SCSI disk [ 5.270040] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 5.301388] ata3.00: ATA-10: ST8000NM0055-1RM112, SN02, max UDMA/133 [ 5.301788] ata3.00: 15628053168 sectors, multi 0: LBA48 NCQ (depth 32), AA [ 5.301796] ata3.00: Features: NCQ-sndrcv [ 5.304877] ata3.00: configured for UDMA/133 [ 5.305004] scsi 2:0:0:0: Direct-Access ATA ST8000NM0055-1RM SN02 PQ: 0 ANSI: 5 [ 5.305121] scsi 2:0:0:0: Attached scsi generic sg2 type 0 [ 5.305214] sd 2:0:0:0: [sdc] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB) [ 5.305223] sd 2:0:0:0: [sdc] 4096-byte physical blocks [ 5.305233] sd 2:0:0:0: [sdc] Write Protect is off [ 5.305240] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00 [ 5.305255] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 5.349535] sdc: sdc1 sdc2 sdc3 sdc4 [ 5.349777] sd 2:0:0:0: [sdc] Attached SCSI disk [ 5.628177] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 5.683296] ata4.00: ATA-10: ST8000NM0055-1RM112, SN02, max UDMA/133 [ 5.704658] ata4.00: 15628053168 sectors, multi 0: LBA48 NCQ (depth 32), AA [ 5.723876] ata4.00: Features: NCQ-sndrcv [ 5.749175] ata4.00: configured for UDMA/133 [ 5.751203] scsi 3:0:0:0: Direct-Access ATA ST8000NM0055-1RM SN02 PQ: 0 ANSI: 5 [ 5.753293] sd 3:0:0:0: Attached scsi generic sg3 type 0 [ 5.754362] sd 3:0:0:0: [sdd] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB) [ 5.754366] sd 3:0:0:0: [sdd] 4096-byte physical blocks [ 5.754566] sd 3:0:0:0: [sdd] Write Protect is off [ 5.754572] sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00 [ 5.754943] sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 5.756201] scsi 4:0:0:0: Direct-Access ATA INTEL SSDSC2BB48 0150 PQ: 0 ANSI: 5 [ 5.759989] sd 4:0:0:0: Attached 
scsi generic sg4 type 0 [ 5.760299] ata5.00: Enabling discard_zeroes_data [ 5.760991] sd 4:0:0:0: [sde] 937703088 512-byte logical blocks: (480 GB/447 GiB) [ 5.761024] sd 4:0:0:0: [sde] 4096-byte physical blocks [ 5.761510] sd 4:0:0:0: [sde] Write Protect is off [ 5.761552] sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00 [ 5.761741] sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 5.763924] ata5.00: Enabling discard_zeroes_data [ 5.773947] sde: sde1 sde2 sde3 sde4 [ 5.777368] ata5.00: Enabling discard_zeroes_data [ 5.783988] sd 4:0:0:0: [sde] Attached SCSI disk [ 5.802611] sdd: sdd1 sdd2 sdd3 sdd4 [ 5.983148] sd 3:0:0:0: [sdd] Attached SCSI disk [ 6.089568] ata6: SATA link down (SStatus 0 SControl 300) [ 6.168948] scsi host10: usb-storage 4-2:1.0 [ 6.439910] ata7: SATA link down (SStatus 0 SControl 300) [ 6.679999] scsi host11: usb-storage 4-5:1.0 [ 6.781246] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 6.810999] ata8.00: ATA-9: WDC WD8002FRYZ-01FF2B0, 01.01H01, max UDMA/133 [ 6.826101] ata8.00: 15628053168 sectors, multi 0: LBA48 NCQ (depth 32), AA [ 6.836947] ata8.00: Features: NCQ-prio [ 6.855027] ata8.00: configured for UDMA/133 [ 6.865804] scsi 7:0:0:0: Direct-Access ATA WDC WD8002FRYZ-0 1H01 PQ: 0 ANSI: 5 [ 6.876950] sd 7:0:0:0: Attached scsi generic sg5 type 0 [ 6.878866] sd 7:0:0:0: [sdf] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB) [ 6.898797] sd 7:0:0:0: [sdf] 4096-byte physical blocks [ 6.909825] sd 7:0:0:0: [sdf] Write Protect is off [ 6.920793] sd 7:0:0:0: [sdf] Mode Sense: 00 3a 00 00 [ 6.920885] sd 7:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 6.984949] sdf: sdf1 sdf2 sdf3 sdf4 [ 6.996451] sd 7:0:0:0: [sdf] Attached SCSI disk [ 7.237933] scsi 10:0:0:0: Direct-Access WD Elements 2620 1018 PQ: 0 ANSI: 6 [ 7.240503] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 7.250420] sd 10:0:0:0: Attached scsi generic sg6 type 0 [ 7.254284] sd 10:0:0:0: [sdg] Very big device. Trying to use READ CAPACITY(16). [ 7.256321] sd 10:0:0:0: [sdg] 7813969920 512-byte logical blocks: (4.00 TB/3.64 TiB) [ 7.256323] sd 10:0:0:0: [sdg] 4096-byte physical blocks [ 7.259476] sd 10:0:0:0: [sdg] Write Protect is off [ 7.259557] sd 10:0:0:0: [sdg] Mode Sense: 47 00 10 08 [ 7.262178] sd 10:0:0:0: [sdg] No Caching mode page found [ 7.263117] ata9.00: ATAPI: DRW-24D5MT, 1.00, max UDMA/133 [ 7.265518] ata9.00: configured for UDMA/133 [ 7.273067] scsi 8:0:0:0: CD-ROM ASUS DRW-24D5MT 1.00 PQ: 0 ANSI: 5 [ 7.367203] sd 10:0:0:0: [sdg] Assuming drive cache: write through [ 7.422400] sdg: sdg1 [ 7.434721] sd 10:0:0:0: [sdg] Attached SCSI disk [ 7.510635] sr 8:0:0:0: [sr0] scsi3-mmc drive: 48x/12x writer dvd-ram cd/rw xa/form2 cdda tray [ 7.522495] cdrom: Uniform CD-ROM driver Revision: 3.20 [ 7.588448] sr 8:0:0:0: Attached scsi CD-ROM sr0 [ 7.588544] sr 8:0:0:0: Attached scsi generic sg7 type 5 [ 7.718002] scsi 11:0:0:0: Direct-Access WD My Book 25EE 4004 PQ: 0 ANSI: 6 [ 7.729417] sd 11:0:0:0: Attached scsi generic sg8 type 0 [ 7.730810] sd 11:0:0:0: [sdh] Very big device. Trying to use READ CAPACITY(16). 
[ 7.742732] scsi 11:0:0:1: Enclosure WD SES Device 4004 PQ: 0 ANSI: 6 [ 7.752636] sd 11:0:0:0: [sdh] 15628052480 512-byte logical blocks: (8.00 TB/7.28 TiB) [ 7.752639] sd 11:0:0:0: [sdh] 4096-byte physical blocks [ 7.754504] sd 11:0:0:0: [sdh] Write Protect is off [ 7.754637] sd 11:0:0:0: [sdh] Mode Sense: 47 00 10 08 [ 7.756623] sd 11:0:0:0: [sdh] No Caching mode page found [ 7.756639] sd 11:0:0:0: [sdh] Assuming drive cache: write through [ 7.776877] scsi 11:0:0:1: Attached scsi generic sg9 type 13 [ 7.870482] sdh: sdh1 [ 7.889706] sd 11:0:0:0: [sdh] Attached SCSI disk [ 7.931823] ata10: SATA link down (SStatus 0 SControl 300) (and then, of course) [ 9.547004] md: md127 stopped. [ 9.559904] md127: detected capacity change from 0 to 2620129280 [ 9.833720] md: md126 stopped. [ 9.847327] md/raid:md126: device sda4 operational as raid disk 0 [ 9.857837] md/raid:md126: device sdf4 operational as raid disk 4 [ 9.868167] md/raid:md126: device sdd4 operational as raid disk 3 [ 9.878245] md/raid:md126: device sdc4 operational as raid disk 2 [ 9.887941] md/raid:md126: device sdb4 operational as raid disk 1 [ 9.897551] md/raid:md126: raid level 6 active with 5 out of 5 devices, algorithm 2 [ 9.925899] md126: detected capacity change from 0 to 14520041472 [ 10.265325] md: md125 stopped. [ 10.276577] md/raid:md125: device sda3 operational as raid disk 0 [ 10.285798] md/raid:md125: device sdf3 operational as raid disk 4 [ 10.294810] md/raid:md125: device sdd3 operational as raid disk 3 [ 10.303631] md/raid:md125: device sdc3 operational as raid disk 2 [ 10.312258] md/raid:md125: device sdb3 operational as raid disk 1 [ 10.321129] md/raid:md125: raid level 6 active with 5 out of 5 devices, algorithm 2 [ 10.329649] md125: detected capacity change from 0 to 30783378432 and then piles of noise as bcache initializes; but I'm betting I'll see something quite different this time. > But I've not seen any reports of problems elsewhere, so this is either > new or unique to you I would think ... I could believe either :) -- NULL && (void) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-20 15:55 ` Nix @ 2022-07-20 18:32 ` Wols Lists 2022-07-22 9:41 ` Nix 0 siblings, 1 reply; 25+ messages in thread From: Wols Lists @ 2022-07-20 18:32 UTC (permalink / raw) To: Nix; +Cc: linux-raid On 20/07/2022 16:55, Nix wrote: > [ 9.833720] md: md126 stopped. > [ 9.847327] md/raid:md126: device sda4 operational as raid disk 0 > [ 9.857837] md/raid:md126: device sdf4 operational as raid disk 4 > [ 9.868167] md/raid:md126: device sdd4 operational as raid disk 3 > [ 9.878245] md/raid:md126: device sdc4 operational as raid disk 2 > [ 9.887941] md/raid:md126: device sdb4 operational as raid disk 1 > [ 9.897551] md/raid:md126: raid level 6 active with 5 out of 5 devices, algorithm 2 > [ 9.925899] md126: detected capacity change from 0 to 14520041472 Hmm. Most of that looks perfectly normal to me. The only oddity, to my eyes, is that md126 is stopped before the disks become operational. That could be perfectly okay, it could be down to a bug, whatever whatever. The capacity change thing scares some people but that's a normal part of an array coming up ... Cheers, Wol ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-20 18:32 ` Wols Lists @ 2022-07-22 9:41 ` Nix 2022-07-22 11:58 ` Roger Heflin 0 siblings, 1 reply; 25+ messages in thread From: Nix @ 2022-07-22 9:41 UTC (permalink / raw) To: Wols Lists; +Cc: linux-raid On 20 Jul 2022, Wols Lists outgrape: > On 20/07/2022 16:55, Nix wrote: >> [ 9.833720] md: md126 stopped. >> [ 9.847327] md/raid:md126: device sda4 operational as raid disk 0 >> [ 9.857837] md/raid:md126: device sdf4 operational as raid disk 4 >> [ 9.868167] md/raid:md126: device sdd4 operational as raid disk 3 >> [ 9.878245] md/raid:md126: device sdc4 operational as raid disk 2 >> [ 9.887941] md/raid:md126: device sdb4 operational as raid disk 1 >> [ 9.897551] md/raid:md126: raid level 6 active with 5 out of 5 devices, algorithm 2 >> [ 9.925899] md126: detected capacity change from 0 to 14520041472 > > Hmm. > > Most of that looks perfectly normal to me. The only oddity, to my eyes, is that md126 is stopped before the disks become > operational. That could be perfectly okay, it could be down to a bug, whatever whatever. Yeah this is the *working* boot. I can't easily get logs of the non-working one because, well, no writable filesystems and most of the interesting stuff scrolls straight off the screen anyway. (It's mostly for comparison with the non-working boot once I manage to capture that. Somehow. A high-speed camera on video mode and hand-transcribing? Uggh.) -- NULL && (void) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-22 9:41 ` Nix @ 2022-07-22 11:58 ` Roger Heflin 2022-09-29 12:41 ` Nix 0 siblings, 1 reply; 25+ messages in thread From: Roger Heflin @ 2022-07-22 11:58 UTC (permalink / raw) To: Nix; +Cc: Wols Lists, Linux RAID if you find the partitions missing if you initrd has kpartx on it that will create the mappings. kpartx -av <device> If something is not creating the partitions a workaround might be simply to add that command in before the commands that bring up the array. There did seem to be a lot of changes that did change how partitions were handled. Probably some sort of unexpected side-effect. I wonder if it is some sort of module loading order issue and/or build-in vs module for one or more of the critical drives in the chain. On Fri, Jul 22, 2022 at 5:11 AM Nix <nix@esperi.org.uk> wrote: > > On 20 Jul 2022, Wols Lists outgrape: > > > On 20/07/2022 16:55, Nix wrote: > >> [ 9.833720] md: md126 stopped. > >> [ 9.847327] md/raid:md126: device sda4 operational as raid disk 0 > >> [ 9.857837] md/raid:md126: device sdf4 operational as raid disk 4 > >> [ 9.868167] md/raid:md126: device sdd4 operational as raid disk 3 > >> [ 9.878245] md/raid:md126: device sdc4 operational as raid disk 2 > >> [ 9.887941] md/raid:md126: device sdb4 operational as raid disk 1 > >> [ 9.897551] md/raid:md126: raid level 6 active with 5 out of 5 devices, algorithm 2 > >> [ 9.925899] md126: detected capacity change from 0 to 14520041472 > > > > Hmm. > > > > Most of that looks perfectly normal to me. The only oddity, to my eyes, is that md126 is stopped before the disks become > > operational. That could be perfectly okay, it could be down to a bug, whatever whatever. > > Yeah this is the *working* boot. I can't easily get logs of the > non-working one because, well, no writable filesystems and most of the > interesting stuff scrolls straight off the screen anyway. (It's mostly > for comparison with the non-working boot once I manage to capture that. > Somehow. A high-speed camera on video mode and hand-transcribing? Uggh.) > > -- > NULL && (void) ^ permalink raw reply [flat|nested] 25+ messages in thread
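As a sketch of that workaround, assuming the members are still sda-sdd plus sdf as in the earlier dmesg, something like this could go into the init script just ahead of the assembly (it needs device-mapper in the kernel and kpartx in the initramfs, and only helps if missing partitions really are the problem):

    for d in /dev/sd[abcdf]; do
        kpartx -av "$d"    # map any partitions found on $d via device-mapper
    done
    mdadm --assemble --scan --auto=md --freeze-reshape

The mappings land under /dev/mapper rather than as /dev/sdXN, but with DEVICE partitions in mdadm.conf they should still be picked up from /proc/partitions.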
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-22 11:58 ` Roger Heflin @ 2022-09-29 12:41 ` Nix 2022-09-29 14:24 ` Roger Heflin 0 siblings, 1 reply; 25+ messages in thread From: Nix @ 2022-09-29 12:41 UTC (permalink / raw) To: Roger Heflin; +Cc: Wols Lists, Linux RAID On 22 Jul 2022, Roger Heflin verbalised: > On Fri, Jul 22, 2022 at 5:11 AM Nix <nix@esperi.org.uk> wrote: >> >> On 20 Jul 2022, Wols Lists outgrape: >> >> > On 20/07/2022 16:55, Nix wrote: >> >> [ 9.833720] md: md126 stopped. >> >> [ 9.847327] md/raid:md126: device sda4 operational as raid disk 0 >> >> [ 9.857837] md/raid:md126: device sdf4 operational as raid disk 4 >> >> [ 9.868167] md/raid:md126: device sdd4 operational as raid disk 3 >> >> [ 9.878245] md/raid:md126: device sdc4 operational as raid disk 2 >> >> [ 9.887941] md/raid:md126: device sdb4 operational as raid disk 1 >> >> [ 9.897551] md/raid:md126: raid level 6 active with 5 out of 5 devices, algorithm 2 >> >> [ 9.925899] md126: detected capacity change from 0 to 14520041472 >> > >> > Hmm. >> > >> > Most of that looks perfectly normal to me. The only oddity, to my eyes, is that md126 is stopped before the disks become >> > operational. That could be perfectly okay, it could be down to a bug, whatever whatever. >> >> Yeah this is the *working* boot. I can't easily get logs of the >> non-working one because, well, no writable filesystems and most of the >> interesting stuff scrolls straight off the screen anyway. (It's mostly >> for comparison with the non-working boot once I manage to capture that. >> Somehow. A high-speed camera on video mode and hand-transcribing? Uggh.) > > if you find the partitions missing if you initrd has kpartx on it that > will create the mappings. > > kpartx -av <device> I may have to fall back to that, but the system is supposed to be doing this for me dammit! :) The initrd is using busybox 1.30.1 mdev and mdadm 4.0 both linked against musl -- if this has suddenly broken, I suspect a lot of udevs have similarly broken. But these are both old, upgraded only when essential to avoid breaking stuff critical for boot (hah!): upgrading all of these is on the cards to make sure it's not something fixed in the userspace tools... (Not been rebooting because of lots of time away from home: now not rebooting because I've got probable flu and can't face it. But once that's over, I'll attack this.) > I wonder if it is some sort of module loading order issue and/or > build-in vs module for one or more of the critical drives in the > chain. Definitely not! This kernel is almost totally non-modular: compiler@loom 126 /usr/src/boost% cat /proc/modules vfat 20480 1 - Live 0xffffffffc0176000 fat 73728 1 vfat, Live 0xffffffffc015c000 That's *it* for the currently loaded modules (those are probably loaded because I built a test kernel and had to mount the EFI boot fs to install it, which is not needed during normal boots because the initramfs is linked into the kernel image). -- NULL && (void) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-09-29 12:41 ` Nix @ 2022-09-29 14:24 ` Roger Heflin 0 siblings, 0 replies; 25+ messages in thread From: Roger Heflin @ 2022-09-29 14:24 UTC (permalink / raw) To: Nix; +Cc: Wols Lists, Linux RAID partprobe will also recreate as needed the partition mappings. A serial port crossover cable (assuming your machine still has a serial port and you have another machine close by and a serial port and/or usb serial cable) can collect all of the console if console=ttyS0,115200 is set on the boot line (S0 = com1, s1=com2,...) Another option would be to use a fat/vfat formatted usb key and save it on that. It does not actually matter if the initramfs is built into the kernel image or not, grub is what loads both the kernel and initrd into memory and then tells the kernel to execute. Once the kernel/ramfs is loaded you don't actually even need to be able to mount /boot and /boot/efi except to update the kernel and/or change parameters stored in /boot or /boot/efi. On Thu, Sep 29, 2022 at 7:41 AM Nix <nix@esperi.org.uk> wrote: > > On 22 Jul 2022, Roger Heflin verbalised: > > > On Fri, Jul 22, 2022 at 5:11 AM Nix <nix@esperi.org.uk> wrote: > >> > >> On 20 Jul 2022, Wols Lists outgrape: > >> > >> > On 20/07/2022 16:55, Nix wrote: > >> >> [ 9.833720] md: md126 stopped. > >> >> [ 9.847327] md/raid:md126: device sda4 operational as raid disk 0 > >> >> [ 9.857837] md/raid:md126: device sdf4 operational as raid disk 4 > >> >> [ 9.868167] md/raid:md126: device sdd4 operational as raid disk 3 > >> >> [ 9.878245] md/raid:md126: device sdc4 operational as raid disk 2 > >> >> [ 9.887941] md/raid:md126: device sdb4 operational as raid disk 1 > >> >> [ 9.897551] md/raid:md126: raid level 6 active with 5 out of 5 devices, algorithm 2 > >> >> [ 9.925899] md126: detected capacity change from 0 to 14520041472 > >> > > >> > Hmm. > >> > > >> > Most of that looks perfectly normal to me. The only oddity, to my eyes, is that md126 is stopped before the disks become > >> > operational. That could be perfectly okay, it could be down to a bug, whatever whatever. > >> > >> Yeah this is the *working* boot. I can't easily get logs of the > >> non-working one because, well, no writable filesystems and most of the > >> interesting stuff scrolls straight off the screen anyway. (It's mostly > >> for comparison with the non-working boot once I manage to capture that. > >> Somehow. A high-speed camera on video mode and hand-transcribing? Uggh.) > > > > if you find the partitions missing if you initrd has kpartx on it that > > will create the mappings. > > > > kpartx -av <device> > > I may have to fall back to that, but the system is supposed to be doing > this for me dammit! :) > > The initrd is using busybox 1.30.1 mdev and mdadm 4.0 both linked > against musl -- if this has suddenly broken, I suspect a lot of udevs > have similarly broken. But these are both old, upgraded only when > essential to avoid breaking stuff critical for boot (hah!): upgrading > all of these is on the cards to make sure it's not something fixed in > the userspace tools... > > (Not been rebooting because of lots of time away from home: now not > rebooting because I've got probable flu and can't face it. But once > that's over, I'll attack this.) > > > I wonder if it is some sort of module loading order issue and/or > > build-in vs module for one or more of the critical drives in the > > chain. > > Definitely not! 
This kernel is almost totally non-modular: > > compiler@loom 126 /usr/src/boost% cat /proc/modules > vfat 20480 1 - Live 0xffffffffc0176000 > fat 73728 1 vfat, Live 0xffffffffc015c000 > > That's *it* for the currently loaded modules (those are probably loaded > because I built a test kernel and had to mount the EFI boot fs to > install it, which is not needed during normal boots because the > initramfs is linked into the kernel image). > > -- > NULL && (void) ^ permalink raw reply [flat|nested] 25+ messages in thread
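Spelled out, the two capture routes look roughly like this (the serial device, the USB stick's name and the mount point are all examples):

    # serial capture: kernel command line on the failing machine, keeping the VGA console too
    console=tty0 console=ttyS0,115200n8

    # on the machine at the other end of the null-modem or USB-serial cable
    screen /dev/ttyUSB0 115200

    # USB-stick capture: from the initramfs emergency shell after the failed assembly
    mkdir -p /tmp/usb
    mount -t vfat /dev/sdX1 /tmp/usb
    dmesg > /tmp/usb/dmesg-5.18.txt
    cat /proc/partitions > /tmp/usb/partitions-5.18.txt
    umount /tmp/usb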
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-18 12:20 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure Nix 2022-07-18 13:17 ` Wols Lists @ 2022-07-18 15:55 ` Roger Heflin 2022-07-20 16:18 ` Nix 2022-07-19 7:00 ` Guoqing Jiang 2 siblings, 1 reply; 25+ messages in thread From: Roger Heflin @ 2022-07-18 15:55 UTC (permalink / raw) To: Nix; +Cc: Linux RAID Did it drop you into the dracut shell (since you do not have scroll-back this seem the case? or the did the machine fully boot up and simply not find the arrays? If it dropped you to the dracut shell, add nofail on the fstab filesystem entries for the raids so it will let you boot up and debug. Also make sure you don't have rd_lvm_(vg|lv)= set to the devices used for the raid on the kernel command like (this will also drop you to dracut). I have done that on my main server, the goal is to avoid the no-network/no-logging dracut shell if at all possible so it can be debugged on the network. If it is inside the dracut shell it sounds like something in the initramfs might be missing. I have seen a dracut (re)build "fail" to determining that a device driver is required and not include it in the initramfs, and/or have seen the driver name change and the new kernel and dracut not find the new name. If it is this then building a hostonly=no (include all drivers) would likely make it work for the immediate future. I have also seen newer versions of software stacks/kernels create/ignore underlying partitions that worked on older versions (ie a partition on a device that has the data also--sometimes a partitioned partition). Newer version have suddenly saw that /dev/sda1 was partitioned and created a /dev/sda1p1 and "hidden" sda1 from scanning causing LVM not not find pvs. I have also seen where the in-use/data device was /dev/sda1p1 and an update broke partitioning a partition so only showed /dev/sda1, and so no longer sees the devices. On Mon, Jul 18, 2022 at 8:16 AM Nix <nix@esperi.org.uk> wrote: > > So I have a pair of RAID-6 mdraid arrays on this machine (one of which > has a bcache layered on top of it, with an LVM VG stretched across > both). Kernel 5.16 + mdadm 4.0 (I know, it's old) works fine, but I just > rebooted into 5.18.12 and it failed to assemble. mdadm didn't display > anything useful: an mdadm --assemble --scan --auto=md --freeze-reshape > simply didn't find anything to assemble, and after that nothing else was > going to work. But rebooting into 5.16 worked fine, so everything was > (thank goodness) actually still there. > > Alas I can't say what the state of the blockdevs was (other than that > they all seemed to be in /dev, and I'm using DEVICE partitions so they > should all have been spotted) or anything else about the boot because > console scrollback is still a nonexistent thing (as far as I can tell), > it scrolls past too fast for me to video it, and I can't use netconsole > because this is the NFS and loghost server for the local network so all > the other machines are more or less frozen waiting for NFS to come back. > > Any suggestions for getting more useful info out of this thing? I > suppose I could get a spare laptop and set it up to run as a netconsole > server for this one boot... but even that won't tell me what's going on > if the error (if any) is reported by some userspace process rather than > in the kernel message log. 
> > I'll do some mdadm --examine's and look at /proc/partitions next time I > try booting (which won't be before this weekend), but I'd be fairly > surprised if mdadm itself was at fault, even though it's the failing > component and it's old, unless the kernel upgrade has tripped some bug > in 4.0 -- or perhaps 4.0 built against a fairly old musl: I haven't even > recompiled it since 2019. So this looks like something in the blockdev > layer, which at this stage in booting is purely libata-based. (There is > an SSD on the machine, but it's used as a bcache cache device and for > XFS journals, both of which are at layers below mdadm so can't possibly > be involved in this.) > > -- > NULL && (void) ^ permalink raw reply [flat|nested] 25+ messages in thread
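For the record, on a distro-style setup the two knobs mentioned above look roughly like this; the filesystem, mount point and image path are placeholders, and (as the next message explains) none of the dracut part applies to a hand-rolled initramfs:

    # /etc/fstab: don't let a missing array hang the rest of the boot
    /dev/vg0/data  /data  xfs  defaults,nofail  0  2

    # dracut: rebuild the initramfs with all drivers included, not host-only detection
    dracut --force --no-hostonly /boot/initramfs-$(uname -r).img "$(uname -r)"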
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-18 15:55 ` Roger Heflin @ 2022-07-20 16:18 ` Nix 0 siblings, 0 replies; 25+ messages in thread From: Nix @ 2022-07-20 16:18 UTC (permalink / raw) To: Roger Heflin; +Cc: Linux RAID On 18 Jul 2022, Roger Heflin told this: Oh I was hoping you'd weigh in :) > Did it drop you into the dracut shell (since you do not have > scroll-back this seem the case? or the did the machine fully boot up > and simply not find the arrays? Well, I'm not using dracut, which is a horrifically complex failure-prone nightmare as you note: I wrote my own (very simple) early init script, which failed as it is designed to do when mdadm doesn't assemble any arrays and dropped me into an emergency ash shell (statically linked against musl). As a result I can be absolutely certain that nothing has changed or been rebuilt in early init since I built my last working kernel. (It's all assembled into an initramfs by the in-kernel automated assembly stuff under the usr/ subdirectory in the kernel source tree.) Having the initramfs linked into the kernel is *such a good thing* in situations like this: I can be absolutely certain that as long as the data on disk is not fubared nothing can possibly have messed up the initramfs or early boot in general after the kernel is linked, because nothing can change it at all. :) (I just diffed both initramfses from the working and non-working kernels: the one in the running kernel I keep mounted under /run/initramfs after boot is over because it also gets used during late shutdown, and the one from the new broken one is still in the cpio archive in the build tree, so this was easy. They're absolutely identical.) > If it dropped you to the dracut shell, add nofail on the fstab > filesystem entries for the raids so it will let you boot up and debug. Well... the rootfs is *on* the raid, so that's not going to work. (Actually, it's under a raid -> bcache -> lvm stack, with one of two raid arrays bcached and the lvm stretching across both of them. If only the raid had come up, I have a rescue fs on the non-bcached half of the lvm so I could have assembled it in degraded mode and booted from that. But it didn't, so I couldn't.) Assembly does this: /sbin/mdadm --assemble --scan --auto=md --freeze-reshape The initramfs includes this mdadm.conf: DEVICE partitions ARRAY /dev/md/transient UUID=28f4c81c:f44742ea:89d4df21:6aea852b ARRAY /dev/md/slow UUID=a35c9c54:bcdbff37:4f18163e:a93e9aa2 ARRAY /dev/md/fast UUID=4eb6bf4e:7458f1f1:d05bdfe4:6d38ca23 MAILADDR postmaster@esperi.org.uk (which is again identical on the working kernels.) > If it is inside the dracut shell it sounds like something in the > initramfs might be missing. Definitely not, thank goodness. > I have seen a dracut (re)build "fail" to > determining that a device driver is required and not include it in the > initramfs, and/or have seen the driver name change and the new kernel Oh yes: this is one of many horrible things about dracut. It assumes the new kernel is similar enough to the running one that it can use the one to configure the other. This is extremely not true when (as this machine is) it's building kernels for *other* machines in containers :) but the thing that failed was the machine's native kernel. (It is almost non-modular. I only have fat and vfat modules loaded right now: everything else is built in.) 
> I have also seen newer versions of software stacks/kernels > create/ignore underlying partitions that worked on older versions (ie > a partition on a device that has the data also--sometimes a > partitioned partition). I think I can be sure that only the kernel itself has changed here. I wonder if make oldconfig messed up and I lost libata or something? ... no, it's there. > Newer version have suddenly saw that /dev/sda1 was partitioned and > created a /dev/sda1p1 and "hidden" sda1 from scanning causing LVM not > not find pvs. > > I have also seen where the in-use/data device was /dev/sda1p1 and an > update broke partitioning a partition so only showed /dev/sda1, and so > no longer sees the devices. Ugh. That would break things indeed! This is all atop GPT... CONFIG_EFI_PARTITION=y no it's still there. I doubt anyone can conclude anything until I collect more info. This bug report is mostly useless for a reason! ^ permalink raw reply [flat|nested] 25+ messages in thread
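When the machine is next in the failed state, a verbose assembly attempt from the emergency shell, plus an --examine of one member from each array, should show whether mdadm is failing to see the partitions at all or seeing them and rejecting the superblocks; roughly:

    cat /proc/partitions
    mdadm --examine /dev/sda2 /dev/sda3 /dev/sda4
    mdadm --assemble --scan --auto=md --freeze-reshape --verbose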
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-18 12:20 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure Nix 2022-07-18 13:17 ` Wols Lists 2022-07-18 15:55 ` Roger Heflin @ 2022-07-19 7:00 ` Guoqing Jiang 2022-07-20 16:35 ` Nix 2 siblings, 1 reply; 25+ messages in thread From: Guoqing Jiang @ 2022-07-19 7:00 UTC (permalink / raw) To: Nix, linux-raid On 7/18/22 8:20 PM, Nix wrote: > So I have a pair of RAID-6 mdraid arrays on this machine (one of which > has a bcache layered on top of it, with an LVM VG stretched across > both). Kernel 5.16 + mdadm 4.0 (I know, it's old) works fine, but I just > rebooted into 5.18.12 and it failed to assemble. mdadm didn't display > anything useful: an mdadm --assemble --scan --auto=md --freeze-reshape > simply didn't find anything to assemble, and after that nothing else was > going to work. But rebooting into 5.16 worked fine, so everything was > (thank goodness) actually still there. > > Alas I can't say what the state of the blockdevs was (other than that > they all seemed to be in /dev, and I'm using DEVICE partitions so they > should all have been spotte I suppose the array was built on top of partitions, then my wild guess is the problem is caused by the change in block layer (1ebe2e5f9d68?), maybe we need something similar in loop driver per b9684a71. diff --git a/drivers/md/md.c b/drivers/md/md.c index c7ecb0bffda0..e5f2e55cb86a 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -5700,6 +5700,7 @@ static int md_alloc(dev_t dev, char *name) mddev->queue = disk->queue; blk_set_stacking_limits(&mddev->queue->limits); blk_queue_write_cache(mddev->queue, true, true); + set_bit(GD_SUPPRESS_PART_SCAN, &disk->state); disk->events |= DISK_EVENT_MEDIA_CHANGE; mddev->gendisk = disk; error = add_disk(disk); Thanks, Guoqing ^ permalink raw reply related [flat|nested] 25+ messages in thread
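For what it's worth, test-driving a hunk like that against a stable tree is the usual routine (paths are examples; GD_SUPPRESS_PART_SCAN appears to have gone in for the loop driver during the 5.18 cycle, the b9684a71 mentioned above, so it should be available to build against):

    cd /usr/src/linux-5.18.12
    patch -p1 < ~/md-suppress-part-scan.diff    # the hunk above, saved to a file
    make -j"$(nproc)" bzImage
    # then install and boot it the usual way and retry the assembly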
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-19 7:00 ` Guoqing Jiang @ 2022-07-20 16:35 ` Nix 2022-07-20 19:50 ` Roger Heflin 0 siblings, 1 reply; 25+ messages in thread From: Nix @ 2022-07-20 16:35 UTC (permalink / raw) To: Guoqing Jiang; +Cc: linux-raid On 19 Jul 2022, Guoqing Jiang spake thusly: > On 7/18/22 8:20 PM, Nix wrote: >> So I have a pair of RAID-6 mdraid arrays on this machine (one of which >> has a bcache layered on top of it, with an LVM VG stretched across >> both). Kernel 5.16 + mdadm 4.0 (I know, it's old) works fine, but I just >> rebooted into 5.18.12 and it failed to assemble. mdadm didn't display >> anything useful: an mdadm --assemble --scan --auto=md --freeze-reshape >> simply didn't find anything to assemble, and after that nothing else was >> going to work. But rebooting into 5.16 worked fine, so everything was >> (thank goodness) actually still there. >> >> Alas I can't say what the state of the blockdevs was (other than that >> they all seemed to be in /dev, and I'm using DEVICE partitions so they >> should all have been spotte > > I suppose the array was built on top of partitions, then my wild guess is > the problem is caused by the change in block layer (1ebe2e5f9d68?), > maybe we need something similar in loop driver per b9684a71. > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index c7ecb0bffda0..e5f2e55cb86a 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -5700,6 +5700,7 @@ static int md_alloc(dev_t dev, char *name) > mddev->queue = disk->queue; > blk_set_stacking_limits(&mddev->queue->limits); > blk_queue_write_cache(mddev->queue, true, true); > + set_bit(GD_SUPPRESS_PART_SCAN, &disk->state); > disk->events |= DISK_EVENT_MEDIA_CHANGE; > mddev->gendisk = disk; > error = add_disk(disk); I'll give it a try. But... the arrays, fully assembled: Personalities : [raid0] [raid6] [raid5] [raid4] md125 : active raid6 sda3[0] sdf3[5] sdd3[4] sdc3[2] sdb3[1] 15391689216 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU] md126 : active raid6 sda4[0] sdf4[5] sdd4[4] sdc4[2] sdb4[1] 7260020736 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU] bitmap: 0/2 pages [0KB], 1048576KB chunk md127 : active raid0 sda2[0] sdf2[5] sdd2[3] sdc2[2] sdb2[1] 1310064640 blocks super 1.2 512k chunks unused devices: <none> so they are on top of partitions. I'm not sure suppressing a partition scan will help... but maybe I misunderstand. -- NULL && (void) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-20 16:35 ` Nix @ 2022-07-20 19:50 ` Roger Heflin 2022-07-22 9:57 ` Nix 0 siblings, 1 reply; 25+ messages in thread From: Roger Heflin @ 2022-07-20 19:50 UTC (permalink / raw) To: Nix; +Cc: Guoqing Jiang, Linux RAID try a fdisk -l /dev/sda4 (to see if there is a partition on the partition). That breaking stuff comes and goes. So long as it does not show starts and stops you are ok. It will look like this, if you are doing all of the work on your disk then the mistake was probably not made. In the below you could have an LVM device on sdfe1 (2nd block, or a md-raid device) that the existence of the partition table hides. And if the sdfe1p1 is found and configured then it blocks/hides anything on sdfe1, and that depends on kernel scanning for partitions and userspace tools scanning for partitions fdisk -l /dev/sdfe Disk /dev/sdfe: 128.8 GB, 128849018880 bytes, 251658240 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 16384 bytes / 16777216 bytes Disk label type: dos Disk identifier: 0xxxxxx Device Boot Start End Blocks Id System /dev/sdfe1 32768 251658239 125812736 83 Linux 08:34 PM # fdisk -l /dev/sdfe1 Disk /dev/sdfe1: 128.8 GB, 128832241664 bytes, 251625472 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 16384 bytes / 16777216 bytes Disk label type: dos Disk identifier: 0xxxxxxx Device Boot Start End Blocks Id System /dev/sdfe1p1 32768 251625471 125796352 8e Linux LVM My other though was that maybe some change caused the partition type to start get used for something and if the type was wrong then ignore it. you might try a file -s /dev/sde1 against each partition that should have mdadm and make sure it says mdadm and that there is not some other header confusing the issue. I tried on some of mine and some of my working mdadm's devices report weird things. On Wed, Jul 20, 2022 at 12:31 PM Nix <nix@esperi.org.uk> wrote: > > On 19 Jul 2022, Guoqing Jiang spake thusly: > > > On 7/18/22 8:20 PM, Nix wrote: > >> So I have a pair of RAID-6 mdraid arrays on this machine (one of which > >> has a bcache layered on top of it, with an LVM VG stretched across > >> both). Kernel 5.16 + mdadm 4.0 (I know, it's old) works fine, but I just > >> rebooted into 5.18.12 and it failed to assemble. mdadm didn't display > >> anything useful: an mdadm --assemble --scan --auto=md --freeze-reshape > >> simply didn't find anything to assemble, and after that nothing else was > >> going to work. But rebooting into 5.16 worked fine, so everything was > >> (thank goodness) actually still there. > >> > >> Alas I can't say what the state of the blockdevs was (other than that > >> they all seemed to be in /dev, and I'm using DEVICE partitions so they > >> should all have been spotte > > > > I suppose the array was built on top of partitions, then my wild guess is > > the problem is caused by the change in block layer (1ebe2e5f9d68?), > > maybe we need something similar in loop driver per b9684a71. 
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index c7ecb0bffda0..e5f2e55cb86a 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -5700,6 +5700,7 @@ static int md_alloc(dev_t dev, char *name)
> >  	mddev->queue = disk->queue;
> >  	blk_set_stacking_limits(&mddev->queue->limits);
> >  	blk_queue_write_cache(mddev->queue, true, true);
> > +	set_bit(GD_SUPPRESS_PART_SCAN, &disk->state);
> >  	disk->events |= DISK_EVENT_MEDIA_CHANGE;
> >  	mddev->gendisk = disk;
> >  	error = add_disk(disk);
>
> I'll give it a try. But... the arrays, fully assembled:
>
> Personalities : [raid0] [raid6] [raid5] [raid4]
> md125 : active raid6 sda3[0] sdf3[5] sdd3[4] sdc3[2] sdb3[1]
>       15391689216 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
>
> md126 : active raid6 sda4[0] sdf4[5] sdd4[4] sdc4[2] sdb4[1]
>       7260020736 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
>       bitmap: 0/2 pages [0KB], 1048576KB chunk
>
> md127 : active raid0 sda2[0] sdf2[5] sdd2[3] sdc2[2] sdb2[1]
>       1310064640 blocks super 1.2 512k chunks
>
> unused devices: <none>
>
> so they are on top of partitions. I'm not sure suppressing a partition
> scan will help... but maybe I misunderstand.
>
> --
> NULL && (void)

^ permalink raw reply [flat|nested] 25+ messages in thread
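Roger's two checks (a nested partition table, and the superblock signature) can be run across every member partition in one pass. A rough sketch, assuming the sd[abcdf]2/3/4 layout described later in the thread; this is not a command anyone in the thread actually ran, and exact fdisk output wording varies with the util-linux version:

for dev in /dev/sd[abcdf][234]; do
    echo "== $dev =="
    # A nested partition table would show Device/Start/End lines here.
    fdisk -l "$dev" 2>/dev/null | grep -A8 '^Device' || echo "  no nested partition table"
    file -s "$dev"     # should report: Linux Software RAID version 1.2 ...
done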
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-20 19:50 ` Roger Heflin @ 2022-07-22 9:57 ` Nix 2022-07-22 11:30 ` Wols Lists 0 siblings, 1 reply; 25+ messages in thread From: Nix @ 2022-07-22 9:57 UTC (permalink / raw) To: Roger Heflin; +Cc: Guoqing Jiang, Linux RAID

On 20 Jul 2022, Roger Heflin verbalised:

> Try a fdisk -l /dev/sda4 (to see if there is a partition on the
> partition). That breaking stuff comes and goes.

... partitions come and go? :)

But no, this was a blank disk before it was set up: there is no wreckage
of old stuff, and I ran wipefs before doing anything else anyway.

loom:~# blkid /dev/sda4
/dev/sda4: UUID="a35c9c54-bcdb-ff37-4f18-163ea93e9aa2" UUID_SUB="c262175d-09a1-1bc9-98d1-06dc5b18178c" LABEL="loom:slow" TYPE="linux_raid_member" PARTUUID="476279d8-7ea6-46dc-a7a4-8912267cf1b1"
loom:~# sfdisk -l /dev/sda4
Disk /dev/sda4: 2.25 TiB, 2478221630976 bytes, 4840276623 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

> It will look like this; if you did all of the work on your disks
> yourself, then the mistake was probably not made.

Yeah. Also this has been assembling fine since 2017 :)

> In the example below you could have an LVM device (or an md-raid
> device) on sdfe1, in the second block, that the existence of the
> partition table hides.

Except for an SSD, the disks are laid out identically (as in, I did it
with scripting so I can be sure that nothing else happened to them, no
stray wreckage of old filesystems or anything):

Disk /dev/sda: 7.28 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000NM0055-1RM
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 14CB9BBE-BA31-4A1B-8643-7DE7A7EC2946

Device           Start         End     Sectors  Size Type
/dev/sda1         2048     2099199     2097152    1G EFI System
/dev/sda2      2099200   526387199   524288000  250G Linux RAID
/dev/sda3    526387200 10787775999 10261388800  4.8T Linux RAID
/dev/sda4  10787776512 15628053134  4840276623  2.3T Linux RAID

The layout is as follows (adjusted lsblk output):

NAME                      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sd[abcdf]                   8:0    0   7.3T  0 disk
├─sd[abcdf]1                8:1    0     1G  0 part  /boot
├─sd[abcdf]2                8:2    0   250G  0 part
│ └─md127                   9:127  0   1.2T  0 raid0 /.transient (bind-mounted to many places)
├─sd[abcdf]3                8:3    0   4.8T  0 part
│ └─md125                   9:125  0  14.3T  0 raid6
│   └─bcache0             253:0    0  14.3T  0 disk
│     ├─main-root         252:5    0     4T  0 lvm   (xfs; /, etc)
│     ├─main-workcrypt    252:8    0     2T  0 lvm   (work disk)
│     │ └─workcrypt-plain 252:14   0     2T  0 crypt (xfs)
│     └─main-steam        252:11   0     1T  0 lvm   (ext4)
└─sd[abcdf]4                8:4    0   2.3T  0 part
  └─md126                   9:126  0   6.8T  0 raid6
    ├─main-archive        252:0    0     3T  0 lvm   /usr/archive
    ├─main-swap           252:1    0    40G  0 lvm   [SWAP]
    ├─main-vms            252:2    0     1T  0 lvm   /vm
    ├─main-phones         252:3    0    50G  0 lvm
    ├─main-private        252:4    0   100G  0 lvm
    ├─main-workmail       252:6    0    10G  0 lvm
    ├─main-music          252:7    0   2.3T  0 lvm
    ├─main-unifi          252:9    0    10G  0 lvm
    └─main-rescue         252:10   0   100G  0 lvm
sde                         8:64   0 447.1G  0 disk  (.5TiB SSD)
├─sde1                      8:65   0    32G  0 part  (currently unused)
├─sde2                      8:66   0   340G  0 part  (bcache cache device)
├─sde3                      8:67   0     2G  0 part  (xfs journal for main-root)
└─sde4                      8:68   0     2G  0 part  (xfs journal for main-workcrypt)

There is one LVM VG, stretching across md125 and md126, with LVs
positioned on one PV or the other (for now, until I run short of space
and have to start being less selective!).
But this is all a bit academic since none of these layers can come up in
the absence of the RAID array :) (The layer underneath RAID is just SATA
and libata, nice and simple and nothing to go wrong -- I thought.)

> And if the sdfe1p1 is found and configured then it blocks/hides
> anything on sdfe1, and that depends on kernel scanning for partitions
> and userspace tools scanning for partitions.

True! And if that isn't happening all hell breaks loose. I thought the
kernel would have found partitions automatically at boot, before
userspace even starts, but I'll admit I didn't check. Something else to
look at at the next trial boot.

> My other thought was that maybe some change caused the partition type
> to start getting used for something, and if the type was wrong then the
> partition gets ignored.

I thought all the work done to assemble raid arrays was done by mdadm?
Because that didn't change. Does the kernel md layer also get to say
"type wrong, go away"? EW. I'd hope nothing is looking at partition
types these days...

> You might try a file -s /dev/sde1 against each partition that should
> have mdadm, and make sure it says mdadm and that there is not some
> other header confusing the issue.

Ooh, I didn't think of that at all! ... looks good, UUIDs match:

loom:~# file -s /dev/sd[afdcb]3
/dev/sda3: Linux Software RAID version 1.2 (1) UUID=4eb6bf4e:7458f1f1:d05bdfe4:6d38ca23 name=loom:fast level=6 disks=5
/dev/sdb3: Linux Software RAID version 1.2 (1) UUID=4eb6bf4e:7458f1f1:d05bdfe4:6d38ca23 name=loom:fast level=6 disks=5
/dev/sdc3: Linux Software RAID version 1.2 (1) UUID=4eb6bf4e:7458f1f1:d05bdfe4:6d38ca23 name=loom:fast level=6 disks=5
/dev/sdd3: Linux Software RAID version 1.2 (1) UUID=4eb6bf4e:7458f1f1:d05bdfe4:6d38ca23 name=loom:fast level=6 disks=5
/dev/sdf3: Linux Software RAID version 1.2 (1) UUID=4eb6bf4e:7458f1f1:d05bdfe4:6d38ca23 name=loom:fast level=6 disks=5
loom:~# file -s /dev/sd[afdcb]4
/dev/sda4: Linux Software RAID version 1.2 (1) UUID=a35c9c54:bcdbff37:4f18163e:a93e9aa2 name=loom:slow level=6 disks=5
/dev/sdb4: Linux Software RAID version 1.2 (1) UUID=a35c9c54:bcdbff37:4f18163e:a93e9aa2 name=loom:slow level=6 disks=5
/dev/sdc4: Linux Software RAID version 1.2 (1) UUID=a35c9c54:bcdbff37:4f18163e:a93e9aa2 name=loom:slow level=6 disks=5
/dev/sdd4: Linux Software RAID version 1.2 (1) UUID=a35c9c54:bcdbff37:4f18163e:a93e9aa2 name=loom:slow level=6 disks=5
/dev/sdf4: Linux Software RAID version 1.2 (1) UUID=a35c9c54:bcdbff37:4f18163e:a93e9aa2 name=loom:slow level=6 disks=5
loom:~# file -s /dev/sd[afdcb]2
/dev/sda2: Linux Software RAID version 1.2 (1) UUID=28f4c81c:f44742ea:89d4df21:6aea852b name=loom:transient level=0 disks=5
/dev/sdb2: Linux Software RAID version 1.2 (1) UUID=28f4c81c:f44742ea:89d4df21:6aea852b name=loom:transient level=0 disks=5
/dev/sdc2: Linux Software RAID version 1.2 (1) UUID=28f4c81c:f44742ea:89d4df21:6aea852b name=loom:transient level=0 disks=5
/dev/sdd2: Linux Software RAID version 1.2 (1) UUID=28f4c81c:f44742ea:89d4df21:6aea852b name=loom:transient level=0 disks=5
/dev/sdf2: Linux Software RAID version 1.2 (1) UUID=28f4c81c:f44742ea:89d4df21:6aea852b name=loom:transient level=0 disks=5

More and more this is looking like a blockdev and probably partition
discovery issue. Roll on Saturday when I can look at this again.

--
NULL && (void)

^ permalink raw reply [flat|nested] 25+ messages in thread
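Since the failure only shows up at boot, the useful data points are whether the kernel's own partition scan found sdX2/3/4 before mdadm ran, and what mdadm itself could see. A sketch of what could be captured from the initramfs or an early shell on the next failed boot, assuming these tools are present there; none of this is from the thread itself:

cat /proc/partitions                       # did sda2..sdf4 show up at all?
dmesg | grep -E 'sd[a-f]: sd[a-f][0-9]'    # the kernel's partition-scan lines
mdadm --examine --scan                     # superblocks mdadm can see
mdadm --assemble --scan --verbose          # a verbose assembly attempt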
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-22 9:57 ` Nix @ 2022-07-22 11:30 ` Wols Lists 2022-07-22 14:59 ` Nix 0 siblings, 1 reply; 25+ messages in thread From: Wols Lists @ 2022-07-22 11:30 UTC (permalink / raw) To: Nix, Roger Heflin; +Cc: Guoqing Jiang, Linux RAID

On 22/07/2022 10:57, Nix wrote:
> I thought all the work done to assemble raid arrays was done by mdadm?
> Because that didn't change. Does the kernel md layer also get to say
> "type wrong, go away"? EW. I'd hope nothing is looking at partition
> types these days...

As far as I know (which is probably the same as you :-) the kernel knows
nothing about the v1 superblock format, so raid assembly *must* be done
by mdadm.

That's why, despite it being obsolete, people get upset when there's any
mention of 0.9 going away, because the kernel DOES recognise it and can
assemble those arrays.

Cheers,
Wol

^ permalink raw reply [flat|nested] 25+ messages in thread
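If there were any doubt about which superblock format is in play here, it can be read straight off a member device; in-kernel autodetect only ever applied to 0.90 superblocks on type-0xfd partitions, so anything reporting 1.x is assembled by userspace mdadm alone. A quick check, a sketch rather than a command from the thread, with the device name taken from earlier messages:

# Prints the superblock version, array UUID and name for one member.
mdadm --examine /dev/sda3 | grep -E 'Version|Array UUID|Name'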
* Re: 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure 2022-07-22 11:30 ` Wols Lists @ 2022-07-22 14:59 ` Nix 0 siblings, 0 replies; 25+ messages in thread From: Nix @ 2022-07-22 14:59 UTC (permalink / raw) To: Wols Lists; +Cc: Roger Heflin, Guoqing Jiang, Linux RAID

On 22 Jul 2022, Wols Lists spake thusly:

> On 22/07/2022 10:57, Nix wrote:
>> I thought all the work done to assemble raid arrays was done by mdadm?
>> Because that didn't change. Does the kernel md layer also get to say
>> "type wrong, go away"? EW. I'd hope nothing is looking at partition
>> types these days...
>
> As far as I know (which is probably the same as you :-) the kernel
> knows nothing about the v1 superblock format, so raid assembly *must*
> be done by mdadm.
>
> That's why, despite it being obsolete, people get upset when there's
> any mention of 0.9 going away, because the kernel DOES recognise it
> and can assemble those arrays.

Right. These are all v1.2, e.g. for one of them:

/dev/md125:
           Version : 1.2
     Creation Time : Mon Apr 10 10:42:31 2017
        Raid Level : raid6
        Array Size : 15391689216 (14678.66 GiB 15761.09 GB)
     Used Dev Size : 5130563072 (4892.89 GiB 5253.70 GB)
      Raid Devices : 5
     Total Devices : 5
       Persistence : Superblock is persistent

       Update Time : Fri Jul 22 15:58:45 2022
             State : active
    Active Devices : 5
   Working Devices : 5
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

              Name : loom:fast  (local to host loom)
              UUID : 4eb6bf4e:7458f1f1:d05bdfe4:6d38ca23
            Events : 51202

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       4       8       51        3      active sync   /dev/sdd3
       5       8       83        4      active sync   /dev/sdf3

^ permalink raw reply [flat|nested] 25+ messages in thread
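Whatever the regression turns out to be, assembly can be made less dependent on device discovery by pinning the arrays in mdadm.conf by UUID rather than relying solely on "DEVICE partitions" scanning. A minimal sketch, not something done in the thread; the command below simply emits the ARRAY lines for the arrays currently running:

# Prints one ARRAY line per assembled array (metadata version, name, UUID),
# suitable for review and appending to mdadm.conf.
mdadm --detail --scan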
end of thread, other threads: [~2022-09-29 14:25 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-18 12:20 5.18: likely useless very preliminary bug report: mdadm raid-6 boot-time assembly failure Nix
2022-07-18 13:17 ` Wols Lists
2022-07-19  9:17 ` Jani Partanen
2022-07-19 17:09 ` Wols Lists
2022-07-19 17:40 ` Roger Heflin
2022-07-19 18:10 ` Reindl Harald
2022-07-19 19:22 ` Wol
2022-07-19 20:01 ` Reindl Harald
2022-07-19 21:51 ` Wols Lists
2022-07-19 22:35 ` Jani Partanen
2022-07-20 12:33 ` Phil Turmel
2022-07-20 15:55 ` Nix
2022-07-20 18:32 ` Wols Lists
2022-07-22  9:41 ` Nix
2022-07-22 11:58 ` Roger Heflin
2022-09-29 12:41 ` Nix
2022-09-29 14:24 ` Roger Heflin
2022-07-18 15:55 ` Roger Heflin
2022-07-20 16:18 ` Nix
2022-07-19  7:00 ` Guoqing Jiang
2022-07-20 16:35 ` Nix
2022-07-20 19:50 ` Roger Heflin
2022-07-22  9:57 ` Nix
2022-07-22 11:30 ` Wols Lists
2022-07-22 14:59 ` Nix