* Raid-6 won't boot @ 2020-03-26 17:07 Alexander Shenkin 2020-03-26 22:00 ` antlists 0 siblings, 1 reply; 22+ messages in thread From: Alexander Shenkin @ 2020-03-26 17:07 UTC (permalink / raw) To: Linux-RAID Hi all, I have an older (14.04) Ubuntu system with / mounted on a raid6 (/dev/md2) and /boot on a raid1 (/dev/md0). I recently added my 7th disk, and everything seemed to be going well. I added the partition to /dev/md0 and it resynced fine. I added the partition to /dev/md2, and everything seemed fine there as well. Then, while it was resyncing /dev/md2, the transfer speed got very slow, and kept getting slower: 7kb/sec... 4kb/sec... 1kb/sec... and then eventually the system just stopped responding. Now, on power up, I just get a cursor on the screen. I surely need to boot with a rescue disk of some sort, but from there, I'm not sure exactly what I should do. Any suggestions are very welcome! Thanks, Allie ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid-6 won't boot 2020-03-26 17:07 Raid-6 won't boot Alexander Shenkin @ 2020-03-26 22:00 ` antlists 2020-03-27 15:27 ` Alexander Shenkin 0 siblings, 1 reply; 22+ messages in thread From: antlists @ 2020-03-26 22:00 UTC (permalink / raw) To: Alexander Shenkin, Linux-RAID On 26/03/2020 17:07, Alexander Shenkin wrote: > I surely need to boot with a rescue disk of some sort, but from there, > I'm not sure exactly when I should do. Any suggestions are very welcome! Okay. Find a liveCD that supports raid (hopefully something like SystemRescueCD). Make sure it has a very recent kernel and the latest mdadm. All being well, the resync will restart, and when it's finished your system will be fine. If it doesn't restart on its own, do an "mdadm --stop array", followed by an "mdadm --assemble" If that doesn't work, then https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn Cheers, Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid-6 won't boot 2020-03-26 22:00 ` antlists @ 2020-03-27 15:27 ` Alexander Shenkin 2020-03-27 15:55 ` Roger Heflin 2020-03-28 10:47 ` antlists 0 siblings, 2 replies; 22+ messages in thread From: Alexander Shenkin @ 2020-03-27 15:27 UTC (permalink / raw) To: antlists, Linux-RAID Thanks Wol, Booting into SystemRescueCD and looking at /proc/mdstat, two arrays are reported. The first (md126) is reported as inactive with all 7 disks listed as spares. The second (md127) is reported as active auto-read-only with all 7 disks operational. Also, the only "personality" reported is Raid1. I could go ahead with your suggestion of mdadm --stop array and then mdadm --assemble, but I thought the reporting of just the Raid1 personality was a bit strange, so wanted to check in before doing that... Thanks, Allie On 3/26/2020 10:00 PM, antlists wrote: > On 26/03/2020 17:07, Alexander Shenkin wrote: >> I surely need to boot with a rescue disk of some sort, but from there, >> I'm not sure exactly when I should do. Any suggestions are very welcome! > > Okay. Find a liveCD that supports raid (hopefully something like > SystemRescueCD). Make sure it has a very recent kernel and the latest > mdadm. > > All being well, the resync will restart, and when it's finished your > system will be fine. If it doesn't restart on its own, do an "mdadm > --stop array", followed by an "mdadm --assemble" > > If that doesn't work, then > > https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn > > Cheers, > Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid-6 won't boot 2020-03-27 15:27 ` Alexander Shenkin @ 2020-03-27 15:55 ` Roger Heflin 2020-03-28 7:47 ` Alexander Shenkin 2020-03-28 10:47 ` antlists 1 sibling, 1 reply; 22+ messages in thread From: Roger Heflin @ 2020-03-27 15:55 UTC (permalink / raw) To: Alexander Shenkin; +Cc: antlists, Linux-RAID A non-assembled array always reports raid1. I would run "dmesg | grep md126" to start with and see what it reports it saw. On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: > > Thanks Wol, > > Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are > reported. The first (md126) in reported as inactive with all 7 disks > listed as spares. The second (md127) is reported as active > auto-read-only with all 7 disks operational. Also, the only > "personality" reported is Raid1. I could go ahead with your suggestion > of mdadm --stop array and then mdadm --assemble, but I thought the > reporting of just the Raid1 personality was a bit strange, so wanted to > check in before doing that... > > Thanks, > Allie > > On 3/26/2020 10:00 PM, antlists wrote: > > On 26/03/2020 17:07, Alexander Shenkin wrote: > >> I surely need to boot with a rescue disk of some sort, but from there, > >> I'm not sure exactly when I should do. Any suggestions are very welcome! > > > > Okay. Find a liveCD that supports raid (hopefully something like > > SystemRescueCD). Make sure it has a very recent kernel and the latest > > mdadm. > > > > All being well, the resync will restart, and when it's finished your > > system will be fine. If it doesn't restart on its own, do an "mdadm > > --stop array", followed by an "mdadm --assemble" > > > > If that doesn't work, then > > > > https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn > > > > Cheers, > > Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid-6 won't boot 2020-03-27 15:55 ` Roger Heflin @ 2020-03-28 7:47 ` Alexander Shenkin 2020-03-28 13:36 ` Roger Heflin 0 siblings, 1 reply; 22+ messages in thread From: Alexander Shenkin @ 2020-03-28 7:47 UTC (permalink / raw) To: Roger Heflin; +Cc: antlists, Linux-RAID Thanks Roger. dmesg has nothing in it referring to md126 or md127.... any other thoughts on how to investigate? thanks, allie On 3/27/2020 3:55 PM, Roger Heflin wrote: > A non-assembled array always reports raid1. > > I would run "dmesg | grep md126" to start with and see what it reports it saw. > > On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: >> >> Thanks Wol, >> >> Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are >> reported. The first (md126) in reported as inactive with all 7 disks >> listed as spares. The second (md127) is reported as active >> auto-read-only with all 7 disks operational. Also, the only >> "personality" reported is Raid1. I could go ahead with your suggestion >> of mdadm --stop array and then mdadm --assemble, but I thought the >> reporting of just the Raid1 personality was a bit strange, so wanted to >> check in before doing that... >> >> Thanks, >> Allie >> >> On 3/26/2020 10:00 PM, antlists wrote: >>> On 26/03/2020 17:07, Alexander Shenkin wrote: >>>> I surely need to boot with a rescue disk of some sort, but from there, >>>> I'm not sure exactly when I should do. Any suggestions are very welcome! >>> >>> Okay. Find a liveCD that supports raid (hopefully something like >>> SystemRescueCD). Make sure it has a very recent kernel and the latest >>> mdadm. >>> >>> All being well, the resync will restart, and when it's finished your >>> system will be fine. 
If it doesn't restart on its own, do an "mdadm >>> --stop array", followed by an "mdadm --assemble" >>> >>> If that doesn't work, then >>> >>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn >>> >>> Cheers, >>> Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid-6 won't boot 2020-03-28 7:47 ` Alexander Shenkin @ 2020-03-28 13:36 ` Roger Heflin [not found] ` <c8185f80-837e-9654-ee19-611a030a0d54@shenkin.org> 0 siblings, 1 reply; 22+ messages in thread From: Roger Heflin @ 2020-03-28 13:36 UTC (permalink / raw) To: Alexander Shenkin; +Cc: antlists, Linux-RAID Try this grep: dmesg | grep "md/raid". If that returns nothing, you can just send the entire dmesg. On Sat, Mar 28, 2020 at 2:47 AM Alexander Shenkin <al@shenkin.org> wrote: > > Thanks Roger. dmesg has nothing in it referring to md126 or md127.... > any other thoughts on how to investigate? > > thanks, > allie > > On 3/27/2020 3:55 PM, Roger Heflin wrote: > > A non-assembled array always reports raid1. > > > > I would run "dmesg | grep md126" to start with and see what it reports it saw. > > > > On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: > >> > >> Thanks Wol, > >> > >> Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are > >> reported. The first (md126) in reported as inactive with all 7 disks > >> listed as spares. The second (md127) is reported as active > >> auto-read-only with all 7 disks operational. Also, the only > >> "personality" reported is Raid1. I could go ahead with your suggestion > >> of mdadm --stop array and then mdadm --assemble, but I thought the > >> reporting of just the Raid1 personality was a bit strange, so wanted to > >> check in before doing that... > >> > >> Thanks, > >> Allie > >> > >> On 3/26/2020 10:00 PM, antlists wrote: > >>> On 26/03/2020 17:07, Alexander Shenkin wrote: > >>>> I surely need to boot with a rescue disk of some sort, but from there, > >>>> I'm not sure exactly when I should do. Any suggestions are very welcome! > >>> > >>> Okay. Find a liveCD that supports raid (hopefully something like > >>> SystemRescueCD). Make sure it has a very recent kernel and the latest > >>> mdadm. 
> >>> > >>> All being well, the resync will restart, and when it's finished your > >>> system will be fine. If it doesn't restart on its own, do an "mdadm > >>> --stop array", followed by an "mdadm --assemble" > >>> > >>> If that doesn't work, then > >>> > >>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn > >>> > >>> Cheers, > >>> Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <c8185f80-837e-9654-ee19-611a030a0d54@shenkin.org>]
* Re: Raid-6 won't boot [not found] ` <c8185f80-837e-9654-ee19-611a030a0d54@shenkin.org> @ 2020-03-30 15:53 ` Roger Heflin 2020-03-30 16:13 ` Alexander Shenkin 0 siblings, 1 reply; 22+ messages in thread From: Roger Heflin @ 2020-03-30 15:53 UTC (permalink / raw) To: Alexander Shenkin; +Cc: antlists, Linux-RAID That seems really odd. Is the raid456 module loaded? On mine I see messages like this for each disk it scanned and considered as possibly being an array member. kernel: [ 83.468700] md/raid:md13: device sdi3 operational as raid disk 5 and messages like this: md/raid:md14: not clean -- starting background reconstruction You might look at /etc/mdadm.conf on the rescue cd and see if it has a DEVICE line that limits what is being scanned. On Mon, Mar 30, 2020 at 10:13 AM Alexander Shenkin <al@shenkin.org> wrote: > > Thanks Roger, > > that grep just returns the detection of the raid1 (md127). See dmesg > and mdadm --detail results attached. > > Many thanks, > allie > > On 3/28/2020 1:36 PM, Roger Heflin wrote: > > Try this grep: > > dmesg | grep "md/raid", if that returns nothing if you can just send > > the entire dmesg. > > > > On Sat, Mar 28, 2020 at 2:47 AM Alexander Shenkin <al@shenkin.org> wrote: > >> > >> Thanks Roger. dmesg has nothing in it referring to md126 or md127.... > >> any other thoughts on how to investigate? > >> > >> thanks, > >> allie > >> > >> On 3/27/2020 3:55 PM, Roger Heflin wrote: > >>> A non-assembled array always reports raid1. > >>> > >>> I would run "dmesg | grep md126" to start with and see what it reports it saw. > >>> > >>> On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>> > >>>> Thanks Wol, > >>>> > >>>> Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are > >>>> reported. The first (md126) in reported as inactive with all 7 disks > >>>> listed as spares. The second (md127) is reported as active > >>>> auto-read-only with all 7 disks operational. 
Also, the only > >>>> "personality" reported is Raid1. I could go ahead with your suggestion > >>>> of mdadm --stop array and then mdadm --assemble, but I thought the > >>>> reporting of just the Raid1 personality was a bit strange, so wanted to > >>>> check in before doing that... > >>>> > >>>> Thanks, > >>>> Allie > >>>> > >>>> On 3/26/2020 10:00 PM, antlists wrote: > >>>>> On 26/03/2020 17:07, Alexander Shenkin wrote: > >>>>>> I surely need to boot with a rescue disk of some sort, but from there, > >>>>>> I'm not sure exactly when I should do. Any suggestions are very welcome! > >>>>> > >>>>> Okay. Find a liveCD that supports raid (hopefully something like > >>>>> SystemRescueCD). Make sure it has a very recent kernel and the latest > >>>>> mdadm. > >>>>> > >>>>> All being well, the resync will restart, and when it's finished your > >>>>> system will be fine. If it doesn't restart on its own, do an "mdadm > >>>>> --stop array", followed by an "mdadm --assemble" > >>>>> > >>>>> If that doesn't work, then > >>>>> > >>>>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn > >>>>> > >>>>> Cheers, > >>>>> Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
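Roger's "is the raid456 module loaded?" question can be answered without touching the arrays. A minimal sketch; the helper takes the path to a modules list so it can be pointed at /proc/modules on the real system (or at a sample file):

```shell
#!/bin/sh
# Check whether the raid456 kernel module (raid4/5/6 personality)
# is loaded, per Roger's question. Takes a modules-list path so it
# can be tested against a sample file instead of /proc/modules.
raid456_loaded() {
    grep -q '^raid456 ' "$1" 2>/dev/null
}

if raid456_loaded /proc/modules; then
    echo "raid456 is loaded"
else
    echo "raid456 is NOT loaded - try: modprobe raid456"
fi
```

Once loaded, the Personalities line in /proc/mdstat should list raid6 alongside raid1.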
* Re: Raid-6 won't boot 2020-03-30 15:53 ` Roger Heflin @ 2020-03-30 16:13 ` Alexander Shenkin 2020-03-30 17:21 ` Roger Heflin 0 siblings, 1 reply; 22+ messages in thread From: Alexander Shenkin @ 2020-03-30 16:13 UTC (permalink / raw) To: Roger Heflin; +Cc: antlists, Linux-RAID Thanks Roger, The only line that isn't commented out in /etc/mdadm.conf is "DEVICE partitions"... Thanks, Allie On 3/30/2020 4:53 PM, Roger Heflin wrote: > That seems really odd. Is the raid456 module loaded? > > On mine I see messages like this for each disk it scanned and > considered as maybe possibly being an array member. > kernel: [ 83.468700] md/raid:md13: device sdi3 operational as raid disk 5 > and messages like this: > md/raid:md14: not clean -- starting background reconstruction > > You might look at /etc/mdadm.conf on the rescue cd and see if it has a > DEVICE line that limits what is being scanned. > > On Mon, Mar 30, 2020 at 10:13 AM Alexander Shenkin <al@shenkin.org> wrote: >> Thanks Roger, >> >> that grep just returns the detection of the raid1 (md127). See dmesg >> and mdadm --detail results attached. >> >> Many thanks, >> allie >> >> On 3/28/2020 1:36 PM, Roger Heflin wrote: >>> Try this grep: >>> dmesg | grep "md/raid", if that returns nothing if you can just send >>> the entire dmesg. >>> >>> On Sat, Mar 28, 2020 at 2:47 AM Alexander Shenkin <al@shenkin.org> wrote: >>>> Thanks Roger. dmesg has nothing in it referring to md126 or md127.... >>>> any other thoughts on how to investigate? >>>> >>>> thanks, >>>> allie >>>> >>>> On 3/27/2020 3:55 PM, Roger Heflin wrote: >>>>> A non-assembled array always reports raid1. >>>>> >>>>> I would run "dmesg | grep md126" to start with and see what it reports it saw. >>>>> >>>>> On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>> Thanks Wol, >>>>>> >>>>>> Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are >>>>>> reported. 
The first (md126) in reported as inactive with all 7 disks >>>>>> listed as spares. The second (md127) is reported as active >>>>>> auto-read-only with all 7 disks operational. Also, the only >>>>>> "personality" reported is Raid1. I could go ahead with your suggestion >>>>>> of mdadm --stop array and then mdadm --assemble, but I thought the >>>>>> reporting of just the Raid1 personality was a bit strange, so wanted to >>>>>> check in before doing that... >>>>>> >>>>>> Thanks, >>>>>> Allie >>>>>> >>>>>> On 3/26/2020 10:00 PM, antlists wrote: >>>>>>> On 26/03/2020 17:07, Alexander Shenkin wrote: >>>>>>>> I surely need to boot with a rescue disk of some sort, but from there, >>>>>>>> I'm not sure exactly when I should do. Any suggestions are very welcome! >>>>>>> Okay. Find a liveCD that supports raid (hopefully something like >>>>>>> SystemRescueCD). Make sure it has a very recent kernel and the latest >>>>>>> mdadm. >>>>>>> >>>>>>> All being well, the resync will restart, and when it's finished your >>>>>>> system will be fine. If it doesn't restart on its own, do an "mdadm >>>>>>> --stop array", followed by an "mdadm --assemble" >>>>>>> >>>>>>> If that doesn't work, then >>>>>>> >>>>>>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn >>>>>>> >>>>>>> Cheers, >>>>>>> Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid-6 won't boot 2020-03-30 16:13 ` Alexander Shenkin @ 2020-03-30 17:21 ` Roger Heflin 2020-03-30 20:05 ` Alexander Shenkin 0 siblings, 1 reply; 22+ messages in thread From: Roger Heflin @ 2020-03-30 17:21 UTC (permalink / raw) To: Alexander Shenkin; +Cc: antlists, Linux-RAID Do this against each partition that had it: mdadm --examine /dev/sd*** It seems like it is not seeing it as an md-raid. On Mon, Mar 30, 2020 at 11:13 AM Alexander Shenkin <al@shenkin.org> wrote: > > Thanks Roger, > > The only line that isn't commented out in /etc/mdadm.conf is "DEVICE > partitions"... > > Thanks, > > Allie > > On 3/30/2020 4:53 PM, Roger Heflin wrote: > > That seems really odd. Is the raid456 module loaded? > > > > On mine I see messages like this for each disk it scanned and > > considered as maybe possibly being an array member. > > kernel: [ 83.468700] md/raid:md13: device sdi3 operational as raid disk 5 > > and messages like this: > > md/raid:md14: not clean -- starting background reconstruction > > > > You might look at /etc/mdadm.conf on the rescue cd and see if it has a > > DEVICE line that limits what is being scanned. > > > > On Mon, Mar 30, 2020 at 10:13 AM Alexander Shenkin <al@shenkin.org> wrote: > >> Thanks Roger, > >> > >> that grep just returns the detection of the raid1 (md127). See dmesg > >> and mdadm --detail results attached. > >> > >> Many thanks, > >> allie > >> > >> On 3/28/2020 1:36 PM, Roger Heflin wrote: > >>> Try this grep: > >>> dmesg | grep "md/raid", if that returns nothing if you can just send > >>> the entire dmesg. > >>> > >>> On Sat, Mar 28, 2020 at 2:47 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>> Thanks Roger. dmesg has nothing in it referring to md126 or md127.... > >>>> any other thoughts on how to investigate? > >>>> > >>>> thanks, > >>>> allie > >>>> > >>>> On 3/27/2020 3:55 PM, Roger Heflin wrote: > >>>>> A non-assembled array always reports raid1. 
> >>>>> > >>>>> I would run "dmesg | grep md126" to start with and see what it reports it saw. > >>>>> > >>>>> On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>>>> Thanks Wol, > >>>>>> > >>>>>> Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are > >>>>>> reported. The first (md126) in reported as inactive with all 7 disks > >>>>>> listed as spares. The second (md127) is reported as active > >>>>>> auto-read-only with all 7 disks operational. Also, the only > >>>>>> "personality" reported is Raid1. I could go ahead with your suggestion > >>>>>> of mdadm --stop array and then mdadm --assemble, but I thought the > >>>>>> reporting of just the Raid1 personality was a bit strange, so wanted to > >>>>>> check in before doing that... > >>>>>> > >>>>>> Thanks, > >>>>>> Allie > >>>>>> > >>>>>> On 3/26/2020 10:00 PM, antlists wrote: > >>>>>>> On 26/03/2020 17:07, Alexander Shenkin wrote: > >>>>>>>> I surely need to boot with a rescue disk of some sort, but from there, > >>>>>>>> I'm not sure exactly when I should do. Any suggestions are very welcome! > >>>>>>> Okay. Find a liveCD that supports raid (hopefully something like > >>>>>>> SystemRescueCD). Make sure it has a very recent kernel and the latest > >>>>>>> mdadm. > >>>>>>> > >>>>>>> All being well, the resync will restart, and when it's finished your > >>>>>>> system will be fine. If it doesn't restart on its own, do an "mdadm > >>>>>>> --stop array", followed by an "mdadm --assemble" > >>>>>>> > >>>>>>> If that doesn't work, then > >>>>>>> > >>>>>>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn > >>>>>>> > >>>>>>> Cheers, > >>>>>>> Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
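The per-partition --examine pass can be written as a loop. A sketch assuming the seven members sit on /dev/sd[a-g]3 as in this thread (adjust the glob for your layout); DRY_RUN=1, the default here, only lists the commands:

```shell
#!/bin/sh
# Run "mdadm --examine" over every candidate member partition.
# The /dev/sd[a-g]3 glob is an assumption from this thread's layout.
DRY_RUN=${DRY_RUN:-1}

examine_all() {
    for part in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "would run: mdadm --examine $part"
        else
            mdadm --examine "$part"
        fi
    done
}

examine_all /dev/sd[a-g]3
```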
* Re: Raid-6 won't boot 2020-03-30 17:21 ` Roger Heflin @ 2020-03-30 20:05 ` Alexander Shenkin 2020-03-30 20:45 ` Roger Heflin 0 siblings, 1 reply; 22+ messages in thread From: Alexander Shenkin @ 2020-03-30 20:05 UTC (permalink / raw) To: Roger Heflin; +Cc: antlists, Linux-RAID [-- Attachment #1: Type: text/plain, Size: 3611 bytes --] See attached. I should mention that the last drive i added is on a new controller that is separate from the other drives, but seemed to work fine for a bit, so kinda doubt that's the issue... thanks, allie On 3/30/2020 6:21 PM, Roger Heflin wrote: > do this against each partition that had it: > > mdadm --examine /dev/sd*** > > It seems like it is not seeing it as a md-raid. > > On Mon, Mar 30, 2020 at 11:13 AM Alexander Shenkin <al@shenkin.org> wrote: >> Thanks Roger, >> >> The only line that isn't commented out in /etc/mdadm.conf is "DEVICE >> partitions"... >> >> Thanks, >> >> Allie >> >> On 3/30/2020 4:53 PM, Roger Heflin wrote: >>> That seems really odd. Is the raid456 module loaded? >>> >>> On mine I see messages like this for each disk it scanned and >>> considered as maybe possibly being an array member. >>> kernel: [ 83.468700] md/raid:md13: device sdi3 operational as raid disk 5 >>> and messages like this: >>> md/raid:md14: not clean -- starting background reconstruction >>> >>> You might look at /etc/mdadm.conf on the rescue cd and see if it has a >>> DEVICE line that limits what is being scanned. >>> >>> On Mon, Mar 30, 2020 at 10:13 AM Alexander Shenkin <al@shenkin.org> wrote: >>>> Thanks Roger, >>>> >>>> that grep just returns the detection of the raid1 (md127). See dmesg >>>> and mdadm --detail results attached. >>>> >>>> Many thanks, >>>> allie >>>> >>>> On 3/28/2020 1:36 PM, Roger Heflin wrote: >>>>> Try this grep: >>>>> dmesg | grep "md/raid", if that returns nothing if you can just send >>>>> the entire dmesg. >>>>> >>>>> On Sat, Mar 28, 2020 at 2:47 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>> Thanks Roger. 
dmesg has nothing in it referring to md126 or md127.... >>>>>> any other thoughts on how to investigate? >>>>>> >>>>>> thanks, >>>>>> allie >>>>>> >>>>>> On 3/27/2020 3:55 PM, Roger Heflin wrote: >>>>>>> A non-assembled array always reports raid1. >>>>>>> >>>>>>> I would run "dmesg | grep md126" to start with and see what it reports it saw. >>>>>>> >>>>>>> On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>>>> Thanks Wol, >>>>>>>> >>>>>>>> Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are >>>>>>>> reported. The first (md126) in reported as inactive with all 7 disks >>>>>>>> listed as spares. The second (md127) is reported as active >>>>>>>> auto-read-only with all 7 disks operational. Also, the only >>>>>>>> "personality" reported is Raid1. I could go ahead with your suggestion >>>>>>>> of mdadm --stop array and then mdadm --assemble, but I thought the >>>>>>>> reporting of just the Raid1 personality was a bit strange, so wanted to >>>>>>>> check in before doing that... >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Allie >>>>>>>> >>>>>>>> On 3/26/2020 10:00 PM, antlists wrote: >>>>>>>>> On 26/03/2020 17:07, Alexander Shenkin wrote: >>>>>>>>>> I surely need to boot with a rescue disk of some sort, but from there, >>>>>>>>>> I'm not sure exactly when I should do. Any suggestions are very welcome! >>>>>>>>> Okay. Find a liveCD that supports raid (hopefully something like >>>>>>>>> SystemRescueCD). Make sure it has a very recent kernel and the latest >>>>>>>>> mdadm. >>>>>>>>> >>>>>>>>> All being well, the resync will restart, and when it's finished your >>>>>>>>> system will be fine. 
If it doesn't restart on its own, do an "mdadm >>>>>>>>> --stop array", followed by an "mdadm --assemble" >>>>>>>>> >>>>>>>>> If that doesn't work, then >>>>>>>>> >>>>>>>>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Wol [-- Attachment #2: sda3.txt --] [-- Type: text/plain, Size: 984 bytes --] /dev/sda3: Magic : a92b4efc Version : 1.2 Feature Map : 0x5 Array UUID : c7303f62:d848d424:269581c8:83a045ec Name : ubuntu:2 Creation Time : Sun Feb 5 23:39:58 2017 Raid Level : raid6 Raid Devices : 7 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB) Array Size : 14600944640 (13924.55 GiB 14951.37 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : active Device UUID : 10bdbed5:cb70c8a9:566c384d:ec4c926e Internal Bitmap : 8 sectors from superblock Reshape pos'n : 0 Delta Devices : 1 (6->7) Update Time : Tue Feb 25 09:21:27 2020 Bad Block Log : 512 entries available at offset 72 sectors Checksum : 300f6945 - correct Events : 316485 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 0 Array State : AAAAAAA ('A' == active, '.' 
== missing, 'R' == replacing) [-- Attachment #3: sdb3.txt --] [-- Type: text/plain, Size: 984 bytes --] /dev/sdb3: Magic : a92b4efc Version : 1.2 Feature Map : 0x5 Array UUID : c7303f62:d848d424:269581c8:83a045ec Name : ubuntu:2 Creation Time : Sun Feb 5 23:39:58 2017 Raid Level : raid6 Raid Devices : 7 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB) Array Size : 14600944640 (13924.55 GiB 14951.37 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : active Device UUID : cf70dad5:0c9ff5f6:ede689f2:ccee2eb0 Internal Bitmap : 8 sectors from superblock Reshape pos'n : 0 Delta Devices : 1 (6->7) Update Time : Tue Feb 25 09:21:27 2020 Bad Block Log : 512 entries available at offset 72 sectors Checksum : 644667bb - correct Events : 316485 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 1 Array State : AAAAAAA ('A' == active, '.' == missing, 'R' == replacing) [-- Attachment #4: sdc3.txt --] [-- Type: text/plain, Size: 984 bytes --] /dev/sdc3: Magic : a92b4efc Version : 1.2 Feature Map : 0x5 Array UUID : c7303f62:d848d424:269581c8:83a045ec Name : ubuntu:2 Creation Time : Sun Feb 5 23:39:58 2017 Raid Level : raid6 Raid Devices : 7 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB) Array Size : 14600944640 (13924.55 GiB 14951.37 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : active Device UUID : f8839952:eaba2e9c:c2c401d4:3e0592a5 Internal Bitmap : 8 sectors from superblock Reshape pos'n : 0 Delta Devices : 1 (6->7) Update Time : Tue Feb 25 09:21:27 2020 Bad Block Log : 512 entries available at offset 72 sectors Checksum : 5d198b06 - correct Events : 316485 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 2 Array State : AAAAAAA ('A' == active, '.' 
== missing, 'R' == replacing) [-- Attachment #5: sdd3.txt --] [-- Type: text/plain, Size: 984 bytes --] /dev/sdd3: Magic : a92b4efc Version : 1.2 Feature Map : 0x5 Array UUID : c7303f62:d848d424:269581c8:83a045ec Name : ubuntu:2 Creation Time : Sun Feb 5 23:39:58 2017 Raid Level : raid6 Raid Devices : 7 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB) Array Size : 14600944640 (13924.55 GiB 14951.37 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : active Device UUID : 875a0dbd:965a9986:1b78eb3d:e15fee50 Internal Bitmap : 8 sectors from superblock Reshape pos'n : 0 Delta Devices : 1 (6->7) Update Time : Tue Feb 25 09:21:27 2020 Bad Block Log : 512 entries available at offset 72 sectors Checksum : c73e0f3f - correct Events : 316485 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 3 Array State : AAAAAAA ('A' == active, '.' == missing, 'R' == replacing) [-- Attachment #6: sde3.txt --] [-- Type: text/plain, Size: 984 bytes --] /dev/sde3: Magic : a92b4efc Version : 1.2 Feature Map : 0x5 Array UUID : c7303f62:d848d424:269581c8:83a045ec Name : ubuntu:2 Creation Time : Sun Feb 5 23:39:58 2017 Raid Level : raid6 Raid Devices : 7 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB) Array Size : 14600944640 (13924.55 GiB 14951.37 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : active Device UUID : 635ef71b:e4add925:30ae4f0a:f6b46611 Internal Bitmap : 8 sectors from superblock Reshape pos'n : 0 Delta Devices : 1 (6->7) Update Time : Tue Feb 25 09:21:27 2020 Bad Block Log : 512 entries available at offset 72 sectors Checksum : 5244f196 - correct Events : 316485 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 6 Array State : AAAAAAA ('A' == active, '.' 
== missing, 'R' == replacing) [-- Attachment #7: sdf3.txt --] [-- Type: text/plain, Size: 984 bytes --] /dev/sdf3: Magic : a92b4efc Version : 1.2 Feature Map : 0x5 Array UUID : c7303f62:d848d424:269581c8:83a045ec Name : ubuntu:2 Creation Time : Sun Feb 5 23:39:58 2017 Raid Level : raid6 Raid Devices : 7 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB) Array Size : 14600944640 (13924.55 GiB 14951.37 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : active Device UUID : dc0bda8c:2457fb4c:f87a4bec:8d5b58ed Internal Bitmap : 8 sectors from superblock Reshape pos'n : 0 Delta Devices : 1 (6->7) Update Time : Tue Feb 25 09:21:27 2020 Bad Block Log : 512 entries available at offset 72 sectors Checksum : a836bbae - correct Events : 316485 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 5 Array State : AAAAAAA ('A' == active, '.' == missing, 'R' == replacing) [-- Attachment #8: sdg3.txt --] [-- Type: text/plain, Size: 984 bytes --] /dev/sdg3: Magic : a92b4efc Version : 1.2 Feature Map : 0x5 Array UUID : c7303f62:d848d424:269581c8:83a045ec Name : ubuntu:2 Creation Time : Sun Feb 5 23:39:58 2017 Raid Level : raid6 Raid Devices : 7 Avail Dev Size : 5840377856 (2784.91 GiB 2990.27 GB) Array Size : 14600944640 (13924.55 GiB 14951.37 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : active Device UUID : dc842dc3:09c910c7:c351c307:e2383d13 Internal Bitmap : 8 sectors from superblock Reshape pos'n : 0 Delta Devices : 1 (6->7) Update Time : Tue Feb 25 09:21:27 2020 Bad Block Log : 512 entries available at offset 72 sectors Checksum : 99fc5ab3 - correct Events : 316485 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 4 Array State : AAAAAAA ('A' == active, '.' == missing, 'R' == replacing) ^ permalink raw reply [flat|nested] 22+ messages in thread
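All seven superblocks in the attachments report the same Events count (316485), which is what makes a plain --assemble look safe here. A quick way to confirm that from saved --examine outputs (like the sda3.txt ... sdg3.txt files above) is to count the distinct Events values; one distinct value means the members agree, more than one means some members are stale:

```shell
#!/bin/sh
# Count distinct "Events :" values across saved mdadm --examine
# output files. 1 => members in sync; >1 => stale members present.
count_event_values() {
    grep -h 'Events :' "$@" | sort -u | wc -l
}
# example (hypothetical filenames matching the attachments):
#   count_event_values sd?3.txt
```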
* Re: Raid-6 won't boot 2020-03-30 20:05 ` Alexander Shenkin @ 2020-03-30 20:45 ` Roger Heflin 2020-03-31 0:16 ` antlists 2020-03-31 10:08 ` Alexander Shenkin 0 siblings, 2 replies; 22+ messages in thread From: Roger Heflin @ 2020-03-30 20:45 UTC (permalink / raw) To: Alexander Shenkin; +Cc: antlists, Linux-RAID They all seem to be there, all seem to report all 7 disks active, so it does not appear to be degraded. All event counters are the same. Something has to be causing them to not be scanned and assembled at all. Is the rescue disk a similar OS to what you have installed? If it is you might try a random say fedora livecd and see if it acts any different. what does fdisk -l /dev/sda look like? Is the raid456 module loaded (lsmod | grep raid)? what does cat /proc/cmdline look like? you might also run this: file -s /dev/sd*3 But I think it is going to show us the same thing as what the mdadm --examine is reporting. On Mon, Mar 30, 2020 at 3:05 PM Alexander Shenkin <al@shenkin.org> wrote: > > See attached. I should mention that the last drive i added is on a new > controller that is separate from the other drives, but seemed to work > fine for a bit, so kinda doubt that's the issue... > > thanks, > > allie > > On 3/30/2020 6:21 PM, Roger Heflin wrote: > > do this against each partition that had it: > > > > mdadm --examine /dev/sd*** > > > > It seems like it is not seeing it as a md-raid. > > > > On Mon, Mar 30, 2020 at 11:13 AM Alexander Shenkin <al@shenkin.org> wrote: > >> Thanks Roger, > >> > >> The only line that isn't commented out in /etc/mdadm.conf is "DEVICE > >> partitions"... > >> > >> Thanks, > >> > >> Allie > >> > >> On 3/30/2020 4:53 PM, Roger Heflin wrote: > >>> That seems really odd. Is the raid456 module loaded? > >>> > >>> On mine I see messages like this for each disk it scanned and > >>> considered as maybe possibly being an array member. 
> >>> kernel: [ 83.468700] md/raid:md13: device sdi3 operational as raid disk 5 > >>> and messages like this: > >>> md/raid:md14: not clean -- starting background reconstruction > >>> > >>> You might look at /etc/mdadm.conf on the rescue cd and see if it has a > >>> DEVICE line that limits what is being scanned. > >>> > >>> On Mon, Mar 30, 2020 at 10:13 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>> Thanks Roger, > >>>> > >>>> that grep just returns the detection of the raid1 (md127). See dmesg > >>>> and mdadm --detail results attached. > >>>> > >>>> Many thanks, > >>>> allie > >>>> > >>>> On 3/28/2020 1:36 PM, Roger Heflin wrote: > >>>>> Try this grep: > >>>>> dmesg | grep "md/raid", if that returns nothing if you can just send > >>>>> the entire dmesg. > >>>>> > >>>>> On Sat, Mar 28, 2020 at 2:47 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>>>> Thanks Roger. dmesg has nothing in it referring to md126 or md127.... > >>>>>> any other thoughts on how to investigate? > >>>>>> > >>>>>> thanks, > >>>>>> allie > >>>>>> > >>>>>> On 3/27/2020 3:55 PM, Roger Heflin wrote: > >>>>>>> A non-assembled array always reports raid1. > >>>>>>> > >>>>>>> I would run "dmesg | grep md126" to start with and see what it reports it saw. > >>>>>>> > >>>>>>> On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>>>>>> Thanks Wol, > >>>>>>>> > >>>>>>>> Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are > >>>>>>>> reported. The first (md126) in reported as inactive with all 7 disks > >>>>>>>> listed as spares. The second (md127) is reported as active > >>>>>>>> auto-read-only with all 7 disks operational. Also, the only > >>>>>>>> "personality" reported is Raid1. I could go ahead with your suggestion > >>>>>>>> of mdadm --stop array and then mdadm --assemble, but I thought the > >>>>>>>> reporting of just the Raid1 personality was a bit strange, so wanted to > >>>>>>>> check in before doing that... 
> >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Allie > >>>>>>>> > >>>>>>>> On 3/26/2020 10:00 PM, antlists wrote: > >>>>>>>>> On 26/03/2020 17:07, Alexander Shenkin wrote: > >>>>>>>>>> I surely need to boot with a rescue disk of some sort, but from there, > >>>>>>>>>> I'm not sure exactly when I should do. Any suggestions are very welcome! > >>>>>>>>> Okay. Find a liveCD that supports raid (hopefully something like > >>>>>>>>> SystemRescueCD). Make sure it has a very recent kernel and the latest > >>>>>>>>> mdadm. > >>>>>>>>> > >>>>>>>>> All being well, the resync will restart, and when it's finished your > >>>>>>>>> system will be fine. If it doesn't restart on its own, do an "mdadm > >>>>>>>>> --stop array", followed by an "mdadm --assemble" > >>>>>>>>> > >>>>>>>>> If that doesn't work, then > >>>>>>>>> > >>>>>>>>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn > >>>>>>>>> > >>>>>>>>> Cheers, > >>>>>>>>> Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid-6 won't boot 2020-03-30 20:45 ` Roger Heflin @ 2020-03-31 0:16 ` antlists 2020-03-31 10:08 ` Alexander Shenkin 1 sibling, 0 replies; 22+ messages in thread From: antlists @ 2020-03-31 0:16 UTC (permalink / raw) To: Roger Heflin, Alexander Shenkin; +Cc: Linux-RAID On 30/03/2020 21:45, Roger Heflin wrote: > Is the raid456 module loaded (lsmod | grep raid)? Or "cat /proc/mdstat" iirc ... Cheers, Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
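Wol's point, sketched: the Personalities line of /proc/mdstat shows what the running kernel can assemble right now, built in or modular, while lsmod only shows modules.

```shell
# First line of /proc/mdstat lists the available raid personalities,
# e.g. "Personalities : [raid1]" in this thread; raid6 is absent.
personalities=$(grep -s '^Personalities' /proc/mdstat || echo 'md not loaded')
echo "$personalities"
lsmod 2>/dev/null | grep -i raid || true   # modular personalities only
```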
* Re: Raid-6 won't boot 2020-03-30 20:45 ` Roger Heflin 2020-03-31 0:16 ` antlists @ 2020-03-31 10:08 ` Alexander Shenkin 2020-03-31 13:53 ` Roger Heflin 2020-03-31 16:13 ` Alexander Shenkin 1 sibling, 2 replies; 22+ messages in thread From: Alexander Shenkin @ 2020-03-31 10:08 UTC (permalink / raw) To: Roger Heflin; +Cc: antlists, Linux-RAID Thanks Roger, It seems only the Raid1 module is loaded. I didn't find a straightforward way to get that module loaded... any suggestions? Or, will I have to find another livecd that contains raid456? Thanks, Allie On 3/30/2020 9:45 PM, Roger Heflin wrote: > They all seem to be there, all seem to report all 7 disks active, so > it does not appear to be degraded. All event counters are the same. > Something has to be causing them to not be scanned and assembled at > all. > > Is the rescue disk a similar OS to what you have installed? If it is > you might try a random say fedora livecd and see if it acts any > different. > > what does fdisk -l /dev/sda look like? > > Is the raid456 module loaded (lsmod | grep raid)? > > what does cat /proc/cmdline look like? > > you might also run this: > file -s /dev/sd*3 > But I think it is going to show us the same thing as what the mdadm > --examine is reporting. > > On Mon, Mar 30, 2020 at 3:05 PM Alexander Shenkin <al@shenkin.org> wrote: >> >> See attached. I should mention that the last drive i added is on a new >> controller that is separate from the other drives, but seemed to work >> fine for a bit, so kinda doubt that's the issue... >> >> thanks, >> >> allie >> >> On 3/30/2020 6:21 PM, Roger Heflin wrote: >>> do this against each partition that had it: >>> >>> mdadm --examine /dev/sd*** >>> >>> It seems like it is not seeing it as a md-raid. >>> >>> On Mon, Mar 30, 2020 at 11:13 AM Alexander Shenkin <al@shenkin.org> wrote: >>>> Thanks Roger, >>>> >>>> The only line that isn't commented out in /etc/mdadm.conf is "DEVICE >>>> partitions"... 
>>>> >>>> Thanks, >>>> >>>> Allie >>>> >>>> On 3/30/2020 4:53 PM, Roger Heflin wrote: >>>>> That seems really odd. Is the raid456 module loaded? >>>>> >>>>> On mine I see messages like this for each disk it scanned and >>>>> considered as maybe possibly being an array member. >>>>> kernel: [ 83.468700] md/raid:md13: device sdi3 operational as raid disk 5 >>>>> and messages like this: >>>>> md/raid:md14: not clean -- starting background reconstruction >>>>> >>>>> You might look at /etc/mdadm.conf on the rescue cd and see if it has a >>>>> DEVICE line that limits what is being scanned. >>>>> >>>>> On Mon, Mar 30, 2020 at 10:13 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>> Thanks Roger, >>>>>> >>>>>> that grep just returns the detection of the raid1 (md127). See dmesg >>>>>> and mdadm --detail results attached. >>>>>> >>>>>> Many thanks, >>>>>> allie >>>>>> >>>>>> On 3/28/2020 1:36 PM, Roger Heflin wrote: >>>>>>> Try this grep: >>>>>>> dmesg | grep "md/raid", if that returns nothing if you can just send >>>>>>> the entire dmesg. >>>>>>> >>>>>>> On Sat, Mar 28, 2020 at 2:47 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>>>> Thanks Roger. dmesg has nothing in it referring to md126 or md127.... >>>>>>>> any other thoughts on how to investigate? >>>>>>>> >>>>>>>> thanks, >>>>>>>> allie >>>>>>>> >>>>>>>> On 3/27/2020 3:55 PM, Roger Heflin wrote: >>>>>>>>> A non-assembled array always reports raid1. >>>>>>>>> >>>>>>>>> I would run "dmesg | grep md126" to start with and see what it reports it saw. >>>>>>>>> >>>>>>>>> On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>>>>>> Thanks Wol, >>>>>>>>>> >>>>>>>>>> Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are >>>>>>>>>> reported. The first (md126) in reported as inactive with all 7 disks >>>>>>>>>> listed as spares. The second (md127) is reported as active >>>>>>>>>> auto-read-only with all 7 disks operational. Also, the only >>>>>>>>>> "personality" reported is Raid1. 
I could go ahead with your suggestion >>>>>>>>>> of mdadm --stop array and then mdadm --assemble, but I thought the >>>>>>>>>> reporting of just the Raid1 personality was a bit strange, so wanted to >>>>>>>>>> check in before doing that... >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Allie >>>>>>>>>> >>>>>>>>>> On 3/26/2020 10:00 PM, antlists wrote: >>>>>>>>>>> On 26/03/2020 17:07, Alexander Shenkin wrote: >>>>>>>>>>>> I surely need to boot with a rescue disk of some sort, but from there, >>>>>>>>>>>> I'm not sure exactly when I should do. Any suggestions are very welcome! >>>>>>>>>>> Okay. Find a liveCD that supports raid (hopefully something like >>>>>>>>>>> SystemRescueCD). Make sure it has a very recent kernel and the latest >>>>>>>>>>> mdadm. >>>>>>>>>>> >>>>>>>>>>> All being well, the resync will restart, and when it's finished your >>>>>>>>>>> system will be fine. If it doesn't restart on its own, do an "mdadm >>>>>>>>>>> --stop array", followed by an "mdadm --assemble" >>>>>>>>>>> >>>>>>>>>>> If that doesn't work, then >>>>>>>>>>> >>>>>>>>>>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid-6 won't boot 2020-03-31 10:08 ` Alexander Shenkin @ 2020-03-31 13:53 ` Roger Heflin 2020-03-31 14:28 ` Alexander Shenkin 2020-03-31 16:13 ` Alexander Shenkin 1 sibling, 1 reply; 22+ messages in thread From: Roger Heflin @ 2020-03-31 13:53 UTC (permalink / raw) To: Alexander Shenkin; +Cc: antlists, Linux-RAID the fedora live cds I think used to have it. It could be built into the kernel or it could be loaded as a module. See if there is a config* file on /boot and if so do a "grep -i raid456 configfilename". If it is =y it is built into the kernel; if =m it is a module and you should see it in lsmod, so if you don't see it there, the module was built but is not loaded. If =m then try "modprobe raid456"; that should load it if it is on the livecd. If that fails do a find /lib/modules -name "raid456*" -ls and see if it exists in the modules directory. If it is built into the kernel (=y) then something is probably wrong with the udev rules not triggering and building and enabling the raid6 array on the livecd. There is a reasonable chance that whatever this is is also the problem with your booting os, as it would need the right parts in the initramfs. What does cat /proc/cmdline look like? There are some options on there that can cause md's to get ignored at boot time. On Tue, Mar 31, 2020 at 5:08 AM Alexander Shenkin <al@shenkin.org> wrote: > > Thanks Roger, > > It seems only the Raid1 module is loaded. I didn't find a > straightforward way to get that module loaded... any suggestions? Or, > will I have to find another livecd that contains raid456? > > Thanks, > Allie > > On 3/30/2020 9:45 PM, Roger Heflin wrote: > > They all seem to be there, all seem to report all 7 disks active, so > > it does not appear to be degraded. All event counters are the same. > > Something has to be causing them to not be scanned and assembled at > > all. > > > > Is the rescue disk a similar OS to what you have installed?
If it is > > you might try a random say fedora livecd and see if it acts any > > different. > > > > what does fdisk -l /dev/sda look like? > > > > Is the raid456 module loaded (lsmod | grep raid)? > > > > what does cat /proc/cmdline look like? > > > > you might also run this: > > file -s /dev/sd*3 > > But I think it is going to show us the same thing as what the mdadm > > --examine is reporting. > > > > On Mon, Mar 30, 2020 at 3:05 PM Alexander Shenkin <al@shenkin.org> wrote: > >> > >> See attached. I should mention that the last drive i added is on a new > >> controller that is separate from the other drives, but seemed to work > >> fine for a bit, so kinda doubt that's the issue... > >> > >> thanks, > >> > >> allie > >> > >> On 3/30/2020 6:21 PM, Roger Heflin wrote: > >>> do this against each partition that had it: > >>> > >>> mdadm --examine /dev/sd*** > >>> > >>> It seems like it is not seeing it as a md-raid. > >>> > >>> On Mon, Mar 30, 2020 at 11:13 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>> Thanks Roger, > >>>> > >>>> The only line that isn't commented out in /etc/mdadm.conf is "DEVICE > >>>> partitions"... > >>>> > >>>> Thanks, > >>>> > >>>> Allie > >>>> > >>>> On 3/30/2020 4:53 PM, Roger Heflin wrote: > >>>>> That seems really odd. Is the raid456 module loaded? > >>>>> > >>>>> On mine I see messages like this for each disk it scanned and > >>>>> considered as maybe possibly being an array member. > >>>>> kernel: [ 83.468700] md/raid:md13: device sdi3 operational as raid disk 5 > >>>>> and messages like this: > >>>>> md/raid:md14: not clean -- starting background reconstruction > >>>>> > >>>>> You might look at /etc/mdadm.conf on the rescue cd and see if it has a > >>>>> DEVICE line that limits what is being scanned. > >>>>> > >>>>> On Mon, Mar 30, 2020 at 10:13 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>>>> Thanks Roger, > >>>>>> > >>>>>> that grep just returns the detection of the raid1 (md127). 
See dmesg > >>>>>> and mdadm --detail results attached. > >>>>>> > >>>>>> Many thanks, > >>>>>> allie > >>>>>> > >>>>>> On 3/28/2020 1:36 PM, Roger Heflin wrote: > >>>>>>> Try this grep: > >>>>>>> dmesg | grep "md/raid", if that returns nothing if you can just send > >>>>>>> the entire dmesg. > >>>>>>> > >>>>>>> On Sat, Mar 28, 2020 at 2:47 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>>>>>> Thanks Roger. dmesg has nothing in it referring to md126 or md127.... > >>>>>>>> any other thoughts on how to investigate? > >>>>>>>> > >>>>>>>> thanks, > >>>>>>>> allie > >>>>>>>> > >>>>>>>> On 3/27/2020 3:55 PM, Roger Heflin wrote: > >>>>>>>>> A non-assembled array always reports raid1. > >>>>>>>>> > >>>>>>>>> I would run "dmesg | grep md126" to start with and see what it reports it saw. > >>>>>>>>> > >>>>>>>>> On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>>>>>>>> Thanks Wol, > >>>>>>>>>> > >>>>>>>>>> Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are > >>>>>>>>>> reported. The first (md126) in reported as inactive with all 7 disks > >>>>>>>>>> listed as spares. The second (md127) is reported as active > >>>>>>>>>> auto-read-only with all 7 disks operational. Also, the only > >>>>>>>>>> "personality" reported is Raid1. I could go ahead with your suggestion > >>>>>>>>>> of mdadm --stop array and then mdadm --assemble, but I thought the > >>>>>>>>>> reporting of just the Raid1 personality was a bit strange, so wanted to > >>>>>>>>>> check in before doing that... > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Allie > >>>>>>>>>> > >>>>>>>>>> On 3/26/2020 10:00 PM, antlists wrote: > >>>>>>>>>>> On 26/03/2020 17:07, Alexander Shenkin wrote: > >>>>>>>>>>>> I surely need to boot with a rescue disk of some sort, but from there, > >>>>>>>>>>>> I'm not sure exactly when I should do. Any suggestions are very welcome! > >>>>>>>>>>> Okay. Find a liveCD that supports raid (hopefully something like > >>>>>>>>>>> SystemRescueCD). 
Make sure it has a very recent kernel and the latest > >>>>>>>>>>> mdadm. > >>>>>>>>>>> > >>>>>>>>>>> All being well, the resync will restart, and when it's finished your > >>>>>>>>>>> system will be fine. If it doesn't restart on its own, do an "mdadm > >>>>>>>>>>> --stop array", followed by an "mdadm --assemble" > >>>>>>>>>>> > >>>>>>>>>>> If that doesn't work, then > >>>>>>>>>>> > >>>>>>>>>>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn > >>>>>>>>>>> > >>>>>>>>>>> Cheers, > >>>>>>>>>>> Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid-6 won't boot 2020-03-31 13:53 ` Roger Heflin @ 2020-03-31 14:28 ` Alexander Shenkin 2020-03-31 14:43 ` Roger Heflin 0 siblings, 1 reply; 22+ messages in thread From: Alexander Shenkin @ 2020-03-31 14:28 UTC (permalink / raw) To: Roger Heflin; +Cc: antlists, Linux-RAID Thanks Roger, modprobe raid456 did the trick. md126 is still showing up as inactive though. Do I need to bring it online after I activate the raid456 module? I could copy the results of /proc/cmdline over here if still necessary, but I figure it's likely not now that we've found raid456... It's just a single line specifying the BOOT_IMAGE... thanks, allie On 3/31/2020 2:53 PM, Roger Heflin wrote: > the fedora live cds I think used to have it. It could be build into > the kernel or it could be loaded as a module. > > See if there is a config* file on /boot and if so do a "grep -i > raid456 configfilename" if it is =y it is build into the kernel, if > =m it is a module and you should see it in lsmod so if you don't the > module is not loaded, but it was built as a module. > > if=m then Try "modprobe raid456" that should load it if it is on the livecd. > > if that fails do a find /lib/modules -name "raid456*" -ls and see if > it exists in the modules directory. > > If it is built into the kernel =y then something is probably wrong > with the udev rules not triggering and building and enabling the raid6 > array on the livecd. THere is a reasonable chance that whatever this > is is also the problem with your booting os as it would need the right > parts in the initramfs. > > What does cat /proc/cmdline look like? There are some options on > there that can cause md's to get ignored at boot time. > > > > On Tue, Mar 31, 2020 at 5:08 AM Alexander Shenkin <al@shenkin.org> wrote: >> >> Thanks Roger, >> >> It seems only the Raid1 module is loaded. I didn't find a >> straightforward way to get that module loaded... any suggestions? Or, >> will I have to find another livecd that contains raid456? 
>> >> Thanks, >> Allie >> >> On 3/30/2020 9:45 PM, Roger Heflin wrote: >>> They all seem to be there, all seem to report all 7 disks active, so >>> it does not appear to be degraded. All event counters are the same. >>> Something has to be causing them to not be scanned and assembled at >>> all. >>> >>> Is the rescue disk a similar OS to what you have installed? If it is >>> you might try a random say fedora livecd and see if it acts any >>> different. >>> >>> what does fdisk -l /dev/sda look like? >>> >>> Is the raid456 module loaded (lsmod | grep raid)? >>> >>> what does cat /proc/cmdline look like? >>> >>> you might also run this: >>> file -s /dev/sd*3 >>> But I think it is going to show us the same thing as what the mdadm >>> --examine is reporting. >>> >>> On Mon, Mar 30, 2020 at 3:05 PM Alexander Shenkin <al@shenkin.org> wrote: >>>> >>>> See attached. I should mention that the last drive i added is on a new >>>> controller that is separate from the other drives, but seemed to work >>>> fine for a bit, so kinda doubt that's the issue... >>>> >>>> thanks, >>>> >>>> allie >>>> >>>> On 3/30/2020 6:21 PM, Roger Heflin wrote: >>>>> do this against each partition that had it: >>>>> >>>>> mdadm --examine /dev/sd*** >>>>> >>>>> It seems like it is not seeing it as a md-raid. >>>>> >>>>> On Mon, Mar 30, 2020 at 11:13 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>> Thanks Roger, >>>>>> >>>>>> The only line that isn't commented out in /etc/mdadm.conf is "DEVICE >>>>>> partitions"... >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Allie >>>>>> >>>>>> On 3/30/2020 4:53 PM, Roger Heflin wrote: >>>>>>> That seems really odd. Is the raid456 module loaded? >>>>>>> >>>>>>> On mine I see messages like this for each disk it scanned and >>>>>>> considered as maybe possibly being an array member. 
>>>>>>> kernel: [ 83.468700] md/raid:md13: device sdi3 operational as raid disk 5 >>>>>>> and messages like this: >>>>>>> md/raid:md14: not clean -- starting background reconstruction >>>>>>> >>>>>>> You might look at /etc/mdadm.conf on the rescue cd and see if it has a >>>>>>> DEVICE line that limits what is being scanned. >>>>>>> >>>>>>> On Mon, Mar 30, 2020 at 10:13 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>>>> Thanks Roger, >>>>>>>> >>>>>>>> that grep just returns the detection of the raid1 (md127). See dmesg >>>>>>>> and mdadm --detail results attached. >>>>>>>> >>>>>>>> Many thanks, >>>>>>>> allie >>>>>>>> >>>>>>>> On 3/28/2020 1:36 PM, Roger Heflin wrote: >>>>>>>>> Try this grep: >>>>>>>>> dmesg | grep "md/raid", if that returns nothing if you can just send >>>>>>>>> the entire dmesg. >>>>>>>>> >>>>>>>>> On Sat, Mar 28, 2020 at 2:47 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>>>>>> Thanks Roger. dmesg has nothing in it referring to md126 or md127.... >>>>>>>>>> any other thoughts on how to investigate? >>>>>>>>>> >>>>>>>>>> thanks, >>>>>>>>>> allie >>>>>>>>>> >>>>>>>>>> On 3/27/2020 3:55 PM, Roger Heflin wrote: >>>>>>>>>>> A non-assembled array always reports raid1. >>>>>>>>>>> >>>>>>>>>>> I would run "dmesg | grep md126" to start with and see what it reports it saw. >>>>>>>>>>> >>>>>>>>>>> On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>>>>>>>> Thanks Wol, >>>>>>>>>>>> >>>>>>>>>>>> Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are >>>>>>>>>>>> reported. The first (md126) in reported as inactive with all 7 disks >>>>>>>>>>>> listed as spares. The second (md127) is reported as active >>>>>>>>>>>> auto-read-only with all 7 disks operational. Also, the only >>>>>>>>>>>> "personality" reported is Raid1. 
I could go ahead with your suggestion >>>>>>>>>>>> of mdadm --stop array and then mdadm --assemble, but I thought the >>>>>>>>>>>> reporting of just the Raid1 personality was a bit strange, so wanted to >>>>>>>>>>>> check in before doing that... >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Allie >>>>>>>>>>>> >>>>>>>>>>>> On 3/26/2020 10:00 PM, antlists wrote: >>>>>>>>>>>>> On 26/03/2020 17:07, Alexander Shenkin wrote: >>>>>>>>>>>>>> I surely need to boot with a rescue disk of some sort, but from there, >>>>>>>>>>>>>> I'm not sure exactly when I should do. Any suggestions are very welcome! >>>>>>>>>>>>> Okay. Find a liveCD that supports raid (hopefully something like >>>>>>>>>>>>> SystemRescueCD). Make sure it has a very recent kernel and the latest >>>>>>>>>>>>> mdadm. >>>>>>>>>>>>> >>>>>>>>>>>>> All being well, the resync will restart, and when it's finished your >>>>>>>>>>>>> system will be fine. If it doesn't restart on its own, do an "mdadm >>>>>>>>>>>>> --stop array", followed by an "mdadm --assemble" >>>>>>>>>>>>> >>>>>>>>>>>>> If that doesn't work, then >>>>>>>>>>>>> >>>>>>>>>>>>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
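Once the personality is loaded, the inactive array still has to be stopped and re-assembled so the kernel re-reads the superblocks. A sketch of Wol's earlier stop/assemble suggestion; md126 follows the numbering in this thread, so substitute whatever your own /proc/mdstat shows:

```shell
reassemble() {
    # tear down the half-assembled device, then let mdadm rescan
    mdadm --stop /dev/md126 &&
    mdadm --assemble --scan --verbose
    cat /proc/mdstat          # the array should now show as raid6
}
# only attempt this on the rescue system itself
[ -b /dev/md126 ] && reassemble || true
```

If --assemble refuses because the interrupted reshape needs a backup file, stop and ask the list before forcing anything.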
* Re: Raid-6 won't boot 2020-03-31 14:28 ` Alexander Shenkin @ 2020-03-31 14:43 ` Roger Heflin 2020-03-31 16:03 ` Alexander Shenkin 0 siblings, 1 reply; 22+ messages in thread From: Roger Heflin @ 2020-03-31 14:43 UTC (permalink / raw) To: Alexander Shenkin; +Cc: antlists, Linux-RAID Yes, you would have to activate it. Since raid456 was not loaded when the udev triggers happened at device creation, it would have failed to be able to assemble it. Do this: "lsinitrd /yourrealboot/initr*-younormalbootkernel | grep -i raid456". If that returns nothing then that module is not in the initrd, and that would produce a failure to find the rootfs when the rootfs is on a raid4/5/6 device. You probably need to look at /etc/dracut.conf and/or /etc/dracut.conf.d and make sure the mdraid modules are being installed, and rebuild the initrd. After rebuilding it, rerun the above test; if it still does not show raid456 then you will need to add explicit options to include that specific module. There should be instructions on how to rebuild an initrd from a livecd boot. I have a pretty messy way to do it, but my way may not be necessary when the livecd is very similar to the boot os. Most of the times I rebuild it, the livecd is much newer than the actual host os, so to get a clean boot you have to mount the system at say /mnt (and any others if you have separate fs on root) and /boot at /mnt/boot, do a few bind mounts to get /proc /sys /dev visible under /mnt, chroot /mnt, and run the commands from the install to rebuild the initrd using the config from the actual install. On Tue, Mar 31, 2020 at 9:28 AM Alexander Shenkin <al@shenkin.org> wrote: > > Thanks Roger, > > modprobe raid456 did the trick. md126 is still showing up as inactive > though. Do I need to bring it online after I activate the raid456 module? > > I could copy the results of /proc/cmdline over here if still necessary, > but I figure it's likely not now that we've found raid456... It's just > a single line specifying the BOOT_IMAGE...
> > thanks, > allie > > On 3/31/2020 2:53 PM, Roger Heflin wrote: > > the fedora live cds I think used to have it. It could be build into > > the kernel or it could be loaded as a module. > > > > See if there is a config* file on /boot and if so do a "grep -i > > raid456 configfilename" if it is =y it is build into the kernel, if > > =m it is a module and you should see it in lsmod so if you don't the > > module is not loaded, but it was built as a module. > > > > if=m then Try "modprobe raid456" that should load it if it is on the livecd. > > > > if that fails do a find /lib/modules -name "raid456*" -ls and see if > > it exists in the modules directory. > > > > If it is built into the kernel =y then something is probably wrong > > with the udev rules not triggering and building and enabling the raid6 > > array on the livecd. THere is a reasonable chance that whatever this > > is is also the problem with your booting os as it would need the right > > parts in the initramfs. > > > > What does cat /proc/cmdline look like? There are some options on > > there that can cause md's to get ignored at boot time. > > > > > > > > On Tue, Mar 31, 2020 at 5:08 AM Alexander Shenkin <al@shenkin.org> wrote: > >> > >> Thanks Roger, > >> > >> It seems only the Raid1 module is loaded. I didn't find a > >> straightforward way to get that module loaded... any suggestions? Or, > >> will I have to find another livecd that contains raid456? > >> > >> Thanks, > >> Allie > >> > >> On 3/30/2020 9:45 PM, Roger Heflin wrote: > >>> They all seem to be there, all seem to report all 7 disks active, so > >>> it does not appear to be degraded. All event counters are the same. > >>> Something has to be causing them to not be scanned and assembled at > >>> all. > >>> > >>> Is the rescue disk a similar OS to what you have installed? If it is > >>> you might try a random say fedora livecd and see if it acts any > >>> different. > >>> > >>> what does fdisk -l /dev/sda look like? 
> >>> > >>> Is the raid456 module loaded (lsmod | grep raid)? > >>> > >>> what does cat /proc/cmdline look like? > >>> > >>> you might also run this: > >>> file -s /dev/sd*3 > >>> But I think it is going to show us the same thing as what the mdadm > >>> --examine is reporting. > >>> > >>> On Mon, Mar 30, 2020 at 3:05 PM Alexander Shenkin <al@shenkin.org> wrote: > >>>> > >>>> See attached. I should mention that the last drive i added is on a new > >>>> controller that is separate from the other drives, but seemed to work > >>>> fine for a bit, so kinda doubt that's the issue... > >>>> > >>>> thanks, > >>>> > >>>> allie > >>>> > >>>> On 3/30/2020 6:21 PM, Roger Heflin wrote: > >>>>> do this against each partition that had it: > >>>>> > >>>>> mdadm --examine /dev/sd*** > >>>>> > >>>>> It seems like it is not seeing it as a md-raid. > >>>>> > >>>>> On Mon, Mar 30, 2020 at 11:13 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>>>> Thanks Roger, > >>>>>> > >>>>>> The only line that isn't commented out in /etc/mdadm.conf is "DEVICE > >>>>>> partitions"... > >>>>>> > >>>>>> Thanks, > >>>>>> > >>>>>> Allie > >>>>>> > >>>>>> On 3/30/2020 4:53 PM, Roger Heflin wrote: > >>>>>>> That seems really odd. Is the raid456 module loaded? > >>>>>>> > >>>>>>> On mine I see messages like this for each disk it scanned and > >>>>>>> considered as maybe possibly being an array member. > >>>>>>> kernel: [ 83.468700] md/raid:md13: device sdi3 operational as raid disk 5 > >>>>>>> and messages like this: > >>>>>>> md/raid:md14: not clean -- starting background reconstruction > >>>>>>> > >>>>>>> You might look at /etc/mdadm.conf on the rescue cd and see if it has a > >>>>>>> DEVICE line that limits what is being scanned. > >>>>>>> > >>>>>>> On Mon, Mar 30, 2020 at 10:13 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>>>>>> Thanks Roger, > >>>>>>>> > >>>>>>>> that grep just returns the detection of the raid1 (md127). See dmesg > >>>>>>>> and mdadm --detail results attached. 
> >>>>>>>> > >>>>>>>> Many thanks, > >>>>>>>> allie > >>>>>>>> > >>>>>>>> On 3/28/2020 1:36 PM, Roger Heflin wrote: > >>>>>>>>> Try this grep: > >>>>>>>>> dmesg | grep "md/raid", if that returns nothing if you can just send > >>>>>>>>> the entire dmesg. > >>>>>>>>> > >>>>>>>>> On Sat, Mar 28, 2020 at 2:47 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>>>>>>>> Thanks Roger. dmesg has nothing in it referring to md126 or md127.... > >>>>>>>>>> any other thoughts on how to investigate? > >>>>>>>>>> > >>>>>>>>>> thanks, > >>>>>>>>>> allie > >>>>>>>>>> > >>>>>>>>>> On 3/27/2020 3:55 PM, Roger Heflin wrote: > >>>>>>>>>>> A non-assembled array always reports raid1. > >>>>>>>>>>> > >>>>>>>>>>> I would run "dmesg | grep md126" to start with and see what it reports it saw. > >>>>>>>>>>> > >>>>>>>>>>> On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: > >>>>>>>>>>>> Thanks Wol, > >>>>>>>>>>>> > >>>>>>>>>>>> Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are > >>>>>>>>>>>> reported. The first (md126) in reported as inactive with all 7 disks > >>>>>>>>>>>> listed as spares. The second (md127) is reported as active > >>>>>>>>>>>> auto-read-only with all 7 disks operational. Also, the only > >>>>>>>>>>>> "personality" reported is Raid1. I could go ahead with your suggestion > >>>>>>>>>>>> of mdadm --stop array and then mdadm --assemble, but I thought the > >>>>>>>>>>>> reporting of just the Raid1 personality was a bit strange, so wanted to > >>>>>>>>>>>> check in before doing that... > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> Allie > >>>>>>>>>>>> > >>>>>>>>>>>> On 3/26/2020 10:00 PM, antlists wrote: > >>>>>>>>>>>>> On 26/03/2020 17:07, Alexander Shenkin wrote: > >>>>>>>>>>>>>> I surely need to boot with a rescue disk of some sort, but from there, > >>>>>>>>>>>>>> I'm not sure exactly when I should do. Any suggestions are very welcome! > >>>>>>>>>>>>> Okay. 
Find a liveCD that supports raid (hopefully something like > >>>>>>>>>>>>> SystemRescueCD). Make sure it has a very recent kernel and the latest > >>>>>>>>>>>>> mdadm. > >>>>>>>>>>>>> > >>>>>>>>>>>>> All being well, the resync will restart, and when it's finished your > >>>>>>>>>>>>> system will be fine. If it doesn't restart on its own, do an "mdadm > >>>>>>>>>>>>> --stop array", followed by an "mdadm --assemble" > >>>>>>>>>>>>> > >>>>>>>>>>>>> If that doesn't work, then > >>>>>>>>>>>>> > >>>>>>>>>>>>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn > >>>>>>>>>>>>> > >>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>> Wol ^ permalink raw reply [flat|nested] 22+ messages in thread
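One caveat on the commands above: lsinitrd and dracut belong to Fedora/RHEL-style systems, while the installed system in this thread is Ubuntu 14.04, where the equivalents are lsinitramfs and update-initramfs. A hedged sketch of the same check, assuming the installed root is mounted at /mnt from the rescue environment:

```shell
# Check whether the installed system's initramfs carries raid456,
# using the Ubuntu-side tools (initramfs-tools rather than dracut).
INITRD=$(ls /mnt/boot/initrd.img-* 2>/dev/null | head -n 1)
if [ -n "$INITRD" ]; then
    lsinitramfs "$INITRD" | grep -i raid456 ||
        echo "raid456 missing from $INITRD"
    # To rebuild from the rescue environment, chroot into the install:
    #   for d in /proc /sys /dev; do mount --bind "$d" "/mnt$d"; done
    #   chroot /mnt update-initramfs -u -k all
fi
```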
* Re: Raid-6 won't boot 2020-03-31 14:43 ` Roger Heflin @ 2020-03-31 16:03 ` Alexander Shenkin 0 siblings, 0 replies; 22+ messages in thread From: Alexander Shenkin @ 2020-03-31 16:03 UTC (permalink / raw) To: Roger Heflin; +Cc: antlists, Linux-RAID Hi Roger, Ok, thanks, I've now managed to boot into an Ubuntu 18 server environment from a USB that has all the raid personalities loaded up. /proc/mdstat shows the same thing the systemrescuecd was showing: i.e. md126 is raid1, and md127 is inactive with all disks being spares. Just somehow not recognizing it as a raid6... thoughts from here? thanks, allie On 3/31/2020 3:43 PM, Roger Heflin wrote: > Yes, you would have to activate it. Since raid456 was not loaded > when the udev triggers happened at device creation then it would have > failed to be able to assemble it. > > Do this: "lsinitrd /yourrealboot/initr*-younormalbootkernel | grep -i > raid456" if that returns nothing then that module is not in the initrd > and that would produce a failure to find the rootfs when the rootfs is > on a raid4/5/6 device. > > You probably need to look at /etc/dracut.conf and/or > /etc/dracut.conf.d and make sure mdraid modules is being installed, > and rebuild the initrd, after rebuilding it then rerun the above test, > if it does not show raid456 then you will need to add explicit options > to include that specific module. > > There should be instructions on how to rebuild an initrd from a livecd > boot, I have a pretty messy way to do it but my way may not be > necessary when livecd is very similar to boot os. Most of the ones I > rebuild it, the livecd is much newer than the actual host os, so to > get a clean boot you have to mount the system at say /mnt (and any > others if you separate fs on root) and boot at /mnt/boot and do a few > bind mounts to get /proc /sys /dev visable under /mnt and chroot /mnt > and run the commands from the install to rebuild init and use the > config from the actual install. 
> > On Tue, Mar 31, 2020 at 9:28 AM Alexander Shenkin <al@shenkin.org> wrote: >> >> Thanks Roger, >> >> modprobe raid456 did the trick. md126 is still showing up as inactive >> though. Do I need to bring it online after I activate the raid456 module? >> >> I could copy the results of /proc/cmdline over here if still necessary, >> but I figure it's likely not now that we've found raid456... It's just >> a single line specifying the BOOT_IMAGE... >> >> thanks, >> allie >> >> On 3/31/2020 2:53 PM, Roger Heflin wrote: >>> The fedora live cds I think used to have it. It could be built into >>> the kernel or it could be loaded as a module. >>> >>> See if there is a config* file on /boot and if so do a "grep -i >>> raid456 configfilename"; if it is =y it is built into the kernel; if >>> =m it was built as a module and you should see it in lsmod, so if you >>> don't, the module is not loaded. >>> >>> If =m, then try "modprobe raid456"; that should load it if it is on the livecd. >>> >>> If that fails, do a find /lib/modules -name "raid456*" -ls and see if >>> it exists in the modules directory. >>> >>> If it is built into the kernel (=y) then something is probably wrong >>> with the udev rules not triggering and building and enabling the raid6 >>> array on the livecd. There is a reasonable chance that whatever this >>> is is also the problem with your booting os, as it would need the right >>> parts in the initramfs. >>> >>> What does cat /proc/cmdline look like? There are some options on >>> there that can cause md's to get ignored at boot time. >>> >>> >>> >>> On Tue, Mar 31, 2020 at 5:08 AM Alexander Shenkin <al@shenkin.org> wrote: >>>> >>>> Thanks Roger, >>>> >>>> It seems only the Raid1 module is loaded. I didn't find a >>>> straightforward way to get that module loaded... any suggestions? Or, >>>> will I have to find another livecd that contains raid456? 
>>>> >>>> Thanks, >>>> Allie >>>> >>>> On 3/30/2020 9:45 PM, Roger Heflin wrote: >>>>> They all seem to be there, all seem to report all 7 disks active, so >>>>> it does not appear to be degraded. All event counters are the same. >>>>> Something has to be causing them to not be scanned and assembled at >>>>> all. >>>>> >>>>> Is the rescue disk a similar OS to what you have installed? If it is >>>>> you might try a random say fedora livecd and see if it acts any >>>>> different. >>>>> >>>>> what does fdisk -l /dev/sda look like? >>>>> >>>>> Is the raid456 module loaded (lsmod | grep raid)? >>>>> >>>>> what does cat /proc/cmdline look like? >>>>> >>>>> you might also run this: >>>>> file -s /dev/sd*3 >>>>> But I think it is going to show us the same thing as what the mdadm >>>>> --examine is reporting. >>>>> >>>>> On Mon, Mar 30, 2020 at 3:05 PM Alexander Shenkin <al@shenkin.org> wrote: >>>>>> >>>>>> See attached. I should mention that the last drive i added is on a new >>>>>> controller that is separate from the other drives, but seemed to work >>>>>> fine for a bit, so kinda doubt that's the issue... >>>>>> >>>>>> thanks, >>>>>> >>>>>> allie >>>>>> >>>>>> On 3/30/2020 6:21 PM, Roger Heflin wrote: >>>>>>> do this against each partition that had it: >>>>>>> >>>>>>> mdadm --examine /dev/sd*** >>>>>>> >>>>>>> It seems like it is not seeing it as a md-raid. >>>>>>> >>>>>>> On Mon, Mar 30, 2020 at 11:13 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>>>> Thanks Roger, >>>>>>>> >>>>>>>> The only line that isn't commented out in /etc/mdadm.conf is "DEVICE >>>>>>>> partitions"... >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Allie >>>>>>>> >>>>>>>> On 3/30/2020 4:53 PM, Roger Heflin wrote: >>>>>>>>> That seems really odd. Is the raid456 module loaded? >>>>>>>>> >>>>>>>>> On mine I see messages like this for each disk it scanned and >>>>>>>>> considered as maybe possibly being an array member. 
>>>>>>>>> kernel: [ 83.468700] md/raid:md13: device sdi3 operational as raid disk 5 >>>>>>>>> and messages like this: >>>>>>>>> md/raid:md14: not clean -- starting background reconstruction >>>>>>>>> >>>>>>>>> You might look at /etc/mdadm.conf on the rescue cd and see if it has a >>>>>>>>> DEVICE line that limits what is being scanned. >>>>>>>>> >>>>>>>>> On Mon, Mar 30, 2020 at 10:13 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>>>>>> Thanks Roger, >>>>>>>>>> >>>>>>>>>> that grep just returns the detection of the raid1 (md127). See dmesg >>>>>>>>>> and mdadm --detail results attached. >>>>>>>>>> >>>>>>>>>> Many thanks, >>>>>>>>>> allie >>>>>>>>>> >>>>>>>>>> On 3/28/2020 1:36 PM, Roger Heflin wrote: >>>>>>>>>>> Try this grep: >>>>>>>>>>> dmesg | grep "md/raid"; if that returns nothing then you can just send >>>>>>>>>>> the entire dmesg. >>>>>>>>>>> >>>>>>>>>>> On Sat, Mar 28, 2020 at 2:47 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>>>>>>>> Thanks Roger. dmesg has nothing in it referring to md126 or md127.... >>>>>>>>>>>> any other thoughts on how to investigate? >>>>>>>>>>>> >>>>>>>>>>>> thanks, >>>>>>>>>>>> allie >>>>>>>>>>>> >>>>>>>>>>>> On 3/27/2020 3:55 PM, Roger Heflin wrote: >>>>>>>>>>>>> A non-assembled array always reports raid1. >>>>>>>>>>>>> >>>>>>>>>>>>> I would run "dmesg | grep md126" to start with and see what it reports it saw. >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Mar 27, 2020 at 10:29 AM Alexander Shenkin <al@shenkin.org> wrote: >>>>>>>>>>>>>> [snip] ^ permalink raw reply [flat|nested] 22+ messages in thread
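Roger's initrd check above is written for dracut-based distros (Fedora and friends). A hedged sketch of both that check and the Ubuntu equivalent, since the original poster is on Ubuntu, which uses initramfs-tools rather than dracut; paths and the kernel version are examples:

```shell
# Is the raid456 module inside the initramfs the system boots from?

# Dracut-based systems (Roger's suggestion), then rebuild if missing:
lsinitrd /boot/initramfs-"$(uname -r)".img | grep -i raid456
dracut --force /boot/initramfs-"$(uname -r)".img "$(uname -r)"

# Ubuntu / initramfs-tools equivalent:
lsinitramfs /boot/initrd.img-"$(uname -r)" | grep -i raid456
update-initramfs -u -k "$(uname -r)"   # rebuild the initramfs
```

If the grep still comes back empty after a rebuild, the module has to be forced in (dracut's `add_drivers` in /etc/dracut.conf.d, or a line in /etc/initramfs-tools/modules on Ubuntu).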
* Re: Raid-6 won't boot 2020-03-31 10:08 ` Alexander Shenkin 2020-03-31 13:53 ` Roger Heflin @ 2020-03-31 16:13 ` Alexander Shenkin 2020-03-31 16:16 ` Roger Heflin 1 sibling, 1 reply; 22+ messages in thread From: Alexander Shenkin @ 2020-03-31 16:13 UTC (permalink / raw) To: Roger Heflin; +Cc: antlists, Linux-RAID quick followup: trying a stop and assemble results in the message that it "Failed to restore critical section for reshape, sorry". On 3/31/2020 11:08 AM, Alexander Shenkin wrote: > [snip] ^ permalink raw reply [flat|nested] 22+ messages in thread
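The "Failed to restore critical section for reshape" error means mdadm wants the reshape backup data before it will restart the array. A hedged sketch of the relevant assemble options from recent mdadm man pages; the backup-file path and device names are hypothetical, and --invalid-backup can lose data if a backup really was needed, so read `man mdadm` and ask the list before running the second form:

```shell
# If a backup file was given when the --grow was started:
mdadm --assemble /dev/md126 \
      --backup-file=/root/md126-grow.bak /dev/sd[a-g]3

# If no backup file ever existed (common when growing onto a newly
# added disk, where mdadm needs no backup for most of the reshape),
# newer mdadm accepts --invalid-backup to proceed without one:
mdadm --assemble /dev/md126 \
      --invalid-backup --backup-file=/tmp/nonexistent /dev/sd[a-g]3
```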
* Re: Raid-6 won't boot 2020-03-31 16:13 ` Alexander Shenkin @ 2020-03-31 16:16 ` Roger Heflin 2020-03-31 16:20 ` Alexander Shenkin 0 siblings, 1 reply; 22+ messages in thread From: Roger Heflin @ 2020-03-31 16:16 UTC (permalink / raw) To: Alexander Shenkin; +Cc: antlists, Linux-RAID Were you doing a reshape when it was rebooted? And if so, did you have to use an external file when doing the reshape, and where was that file? I think there is a command to restart a reshape using an external file. On Tue, Mar 31, 2020 at 11:13 AM Alexander Shenkin <al@shenkin.org> wrote: > > quick followup: trying a stop and assemble results in the message that > it "Failed to restore critical section for reshape, sorry". > > On 3/31/2020 11:08 AM, Alexander Shenkin wrote: > > [snip] ^ permalink raw reply [flat|nested] 22+ messages in thread
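Before restarting anything, the reshape state Roger asks about can be read straight off the member superblocks. A hedged sketch; the member partitions /dev/sd[a-g]3 are examples from this thread, and the grep pattern just selects the superblock fields that matter for a stalled reshape:

```shell
# Inspect each member's superblock for reshape progress before
# touching the array; --examine is read-only and safe.
for d in /dev/sd[a-g]3; do
    echo "== $d =="
    mdadm --examine "$d" | grep -Ei 'reshape|delta devices|new layout|events'
done
```

All members should report the same event count and the same reshape position; a mismatch is worth reporting to the list before any assemble attempt.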
* Re: Raid-6 won't boot 2020-03-31 16:16 ` Roger Heflin @ 2020-03-31 16:20 ` Alexander Shenkin 2020-03-31 16:30 ` Roger Heflin 0 siblings, 1 reply; 22+ messages in thread From: Alexander Shenkin @ 2020-03-31 16:20 UTC (permalink / raw) To: Roger Heflin; +Cc: antlists, Linux-RAID Yes, I had added a drive and it was busy copying data to the new drive when the reshape slowed down gradually, and eventually the system locked up. I didn't change raid configurations or anything like that - just added a drive. I didn't use any external files, so not sure if i'd be able to recover any... i suspect not... thanks, allie On 3/31/2020 5:16 PM, Roger Heflin wrote: > Were you doing a reshape when it was rebooted? And if so, did you > have to use an external file when doing the reshape, and where was that > file? I think there is a command to restart a reshape using an > external file. > > On Tue, Mar 31, 2020 at 11:13 AM Alexander Shenkin <al@shenkin.org> wrote: >> [snip] ^ permalink raw reply [flat|nested] 22+ messages in thread
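The "modprobe raid456 did the trick" step earlier in the thread, collected into one hedged check-and-load sketch (config file path follows the usual /boot convention; adjust the kernel version if the live environment names it differently):

```shell
# Make the raid6 personality available before assembling.
grep -i raid456 "/boot/config-$(uname -r)"   # =y built in, =m module
lsmod | grep raid456 || modprobe raid456     # load it if built as a module
cat /proc/mdstat   # the "Personalities" line should now list [raid6]
```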
* Re: Raid-6 won't boot 2020-03-31 16:20 ` Alexander Shenkin @ 2020-03-31 16:30 ` Roger Heflin 0 siblings, 0 replies; 22+ messages in thread From: Roger Heflin @ 2020-03-31 16:30 UTC (permalink / raw) To: Alexander Shenkin; +Cc: antlists, Linux-RAID If you did not use external files, you are better off; it only requires the external files in a subset of cases. I think there is a way to start a reshape; someone should know how to do that now that we can at least find your raid and see it with activation failing. You may want to start a new thread with how to resume a reshape after it aborted (with a summary of the original and the error you are now getting). You may want to use the newest fedora livecd you can find, as the original issue may have been a bug in your old kernel. If you can get the reshape going I would let it finish on that livecd so that the old system does not have to do a reshape with what may be a buggy kernel. I also have typically avoided the rescue cd's and stayed with full livecd's because of the limited tool sets and functionality on the dedicated rescue ones. Usually I pick a random fedora livecd to use as a rescue disk and that in general has worked very well in a wide variety of ancient OS'es (compared to the really new fedora livecd). On Tue, Mar 31, 2020 at 11:20 AM Alexander Shenkin <al@shenkin.org> wrote: > > Yes, I had added a drive and it was busy copying data to the new drive > when the reshape slowed down gradually, and eventually the system locked > up. I didn't change raid configurations or anything like that - just > added a drive. I didn't use any external files, so not sure if i'd be > able to recover any... i suspect not... > > thanks, > allie > > On 3/31/2020 5:16 PM, Roger Heflin wrote: > > [snip] ^ permalink raw reply [flat|nested] 22+ messages in thread
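Roger's earlier description of rebuilding the installed system's initrd from a live environment, sketched as commands. A hedged sketch only: the md device names and mount points are examples (in the rescue environment the raid6 root appeared as /dev/md126 and the raid1 /boot as /dev/md127), and the final rebuild command is the Ubuntu one since that is the poster's distro:

```shell
# Mount the real system under /mnt, bind the virtual filesystems,
# then chroot and rebuild the initramfs with the install's own config.
mount /dev/md126 /mnt          # the real root fs (raid6)
mount /dev/md127 /mnt/boot     # the real /boot (raid1)
for fs in proc sys dev; do
    mount --bind /$fs /mnt/$fs
done
chroot /mnt /bin/bash
# now inside the chroot, on Ubuntu:
update-initramfs -u
```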
* Re: Raid-6 won't boot
  2020-03-27 15:27   ` Alexander Shenkin
  2020-03-27 15:55     ` Roger Heflin
@ 2020-03-28 10:47     ` antlists
  1 sibling, 0 replies; 22+ messages in thread
From: antlists @ 2020-03-28 10:47 UTC (permalink / raw)
  To: Alexander Shenkin, Linux-RAID

On 27/03/2020 15:27, Alexander Shenkin wrote:
> Thanks Wol,
>
> Booting in SystemRescueCD and looking in /proc/mdstat, two arrays are
> reported. The first (md126) is reported as inactive with all 7 disks
> listed as spares. The second (md127) is reported as active
> auto-read-only with all 7 disks operational. Also, the only
> "personality" reported is Raid1. I could go ahead with your suggestion
> of mdadm --stop array and then mdadm --assemble, but I thought the
> reporting of just the Raid1 personality was a bit strange, so wanted to
> check in before doing that...

Always remember - provided you don't use a --force, it won't do any damage.

Given that booting into a rescue CD didn't assemble correctly, it looks
like --stop then --assemble won't work. You need to follow the
instructions in "when things go wrogn".

Cheers,
Wol

^ permalink raw reply	[flat|nested] 22+ messages in thread
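[Editor's note: the state Wol is reasoning about — md126 inactive with every member flagged as a spare — is readable straight out of /proc/mdstat. The sketch below parses a fabricated mdstat snapshot shaped like the one Allie describes; the device names and block counts are invented for illustration.]

```shell
# Report each md array's state from a /proc/mdstat-style snapshot.
# The snapshot is fabricated to match the symptom in the thread:
# md126 inactive with members marked (S) = spare, md127 an active raid1.
cat > /tmp/mdstat.sample <<'EOF'
Personalities : [raid1]
md126 : inactive sda3[0](S) sdb3[1](S) sdc3[2](S) sdd3[3](S) sde3[4](S) sdf3[5](S) sdg3[6](S)
      13674196608 blocks super 1.2
md127 : active (auto-read-only) raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] sde2[4] sdf2[5] sdg2[6]
      975296 blocks super 1.2 [7/7] [UUUUUUU]
EOF

# On each "mdN : ..." line, field 1 is the array name and field 3 its state.
awk '/^md/ { print $1 ": " $3 }' /tmp/mdstat.sample
# → md126: inactive
# → md127: active
```

[Note that "Personalities" only lists raid1 because, as Roger explained, an unassembled array contributes no personality of its own; on a live system you would read /proc/mdstat directly instead of a sample file.]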
end of thread, other threads:[~2020-03-31 16:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-26 17:07 Raid-6 won't boot Alexander Shenkin
2020-03-26 22:00 ` antlists
2020-03-27 15:27   ` Alexander Shenkin
2020-03-27 15:55     ` Roger Heflin
2020-03-28  7:47       ` Alexander Shenkin
2020-03-28 13:36         ` Roger Heflin
     [not found]           ` <c8185f80-837e-9654-ee19-611a030a0d54@shenkin.org>
2020-03-30 15:53             ` Roger Heflin
2020-03-30 16:13               ` Alexander Shenkin
2020-03-30 17:21                 ` Roger Heflin
2020-03-30 20:05                   ` Alexander Shenkin
2020-03-30 20:45                     ` Roger Heflin
2020-03-31  0:16                       ` antlists
2020-03-31 10:08                         ` Alexander Shenkin
2020-03-31 13:53                           ` Roger Heflin
2020-03-31 14:28                             ` Alexander Shenkin
2020-03-31 14:43                               ` Roger Heflin
2020-03-31 16:03                                 ` Alexander Shenkin
2020-03-31 16:13                                   ` Alexander Shenkin
2020-03-31 16:16                                     ` Roger Heflin
2020-03-31 16:20                                       ` Alexander Shenkin
2020-03-31 16:30                                         ` Roger Heflin
2020-03-28 10:47     ` antlists