* Trying to rescue a RAID-1 array @ 2022-03-31 16:44 Bruce Korb 2022-03-31 17:06 ` Wols Lists 0 siblings, 1 reply; 18+ messages in thread From: Bruce Korb @ 2022-03-31 16:44 UTC (permalink / raw) To: linux-raid I moved the two disks from a cleanly shut down system that could not reboot and could not be upgraded to a new OS release. So, I put them in a new box and did an install. The installation recognized them as a RAID and decided that the partitions needed a new superblock of type RAID-0. Since these data have never been remounted since the shutdown on the original machine, I am hoping I can change the RAID type and mount it so as to recover my .ssh and .thunderbird (email) directories. The bulk of the data are backed up (assuming no issues with the full backup of my critical data), but rebuilding and redistributing the .ssh directory would be a particular nuisance. SO: what are my options? I can't find any advice on how to tell mdadm that the RAID-0 partitions really are RAID-1 partitions. Last gasp might be to "mdadm --create" the RAID-1 again, but there's a lot of advice out there saying that it really is the last gasp before giving up. :) Thank you! - Bruce ^ permalink raw reply [flat|nested] 18+ messages in thread
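[For readers following along: the "last gasp" recreate mentioned above looks roughly like the sketch below. This is a hedged illustration, not a recommendation. The device names, member order, and --metadata=1.0 are assumptions here; every parameter must be verified against `mdadm --examine` output on both members before anything is written, because --create rewrites the superblocks and one wrong parameter destroys the data.]

```shell
# LAST RESORT ONLY: --create rewrites md superblocks in place.
# Device names and --metadata=1.0 are assumptions; confirm both
# with `mdadm --examine /dev/sdX1` before running anything.
mdadm --stop /dev/md1                       # stop the wrongly-assembled array
mdadm --create /dev/md1 --level=1 --raid-devices=2 \
      --metadata=1.0 --assume-clean \
      /dev/sdc1 /dev/sde1                   # --assume-clean skips the resync
mount -o ro /dev/md1 /mnt/rescue            # mount read-only, copy data off
```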
* Re: Trying to rescue a RAID-1 array 2022-03-31 16:44 Trying to rescue a RAID-1 array Bruce Korb @ 2022-03-31 17:06 ` Wols Lists 2022-03-31 18:14 ` Bruce Korb 0 siblings, 1 reply; 18+ messages in thread From: Wols Lists @ 2022-03-31 17:06 UTC (permalink / raw) To: bruce.korb+reply, linux-raid On 31/03/2022 17:44, Bruce Korb wrote: > I moved the two disks from a cleanly shut down system that could not > reboot and could not > be upgraded to a new OS release. So, I put them in.a new box and did an install. > The installation recognized them as a RAID and decided that the > partitions needed a > new superblock of type RAID-0. That's worrying, did it really write a superblock? > Since these data have never been > remounted since the > shutdown on the original machine, I am hoping I can change the RAID > type and mount it > so as to recover my. .ssh and .thunderbird (email) directories. The > bulk of the data are > backed up (assuming no issues with the full backup of my critical > data), but rebuilding > and redistributing the .ssh directory would be a particular nuisance. > > SO: what are my options? I can't find any advice on how to tell mdadm > that the RAID-0 partitions > really are RAID-1 partitions. Last gasp might be to "mdadm --create" > the RAID-1 again, but there's > a lot of advice out there saying that it really is the last gasp > before giving up. :) > https://raid.wiki.kernel.org/index.php/Asking_for_help Especially lsdrv. That tells us a LOT about your system. What was the filesystem on your raid? Hopefully it's as simple as moving the "start of partition", breaking the raid completely, and you can just mount the filesystem. What really worries me is how and why it both recognised it as a raid, then thought it needed to be converted to raid-0. That just sounds wrong on so many levels. Did you let it mess with your superblocks? I hope you said "don't touch those drives"? Cheers, Wol ^ permalink raw reply [flat|nested] 18+ messages in thread
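[On the filesystem question: with mdadm metadata 1.0 or 0.90, the md superblock sits at the end of the member, so the filesystem begins at byte 0 of the partition and can be identified from its magic bytes alone: XFS stores the ASCII magic "XFSB" at offset 0, while ext2/3/4 keep the little-endian magic 0xEF53 at byte 1080 (offset 56 into the superblock at byte 1024). A small sketch against an image file; `disk.img` is a stand-in for a real member such as /dev/sdc1:]

```shell
# "disk.img" is a stand-in; in practice point at a member partition.
img=disk.img
# Build a toy image whose first four bytes are the XFS magic, for illustration.
printf 'XFSB' > "$img"
dd if=/dev/zero bs=4096 count=1 >> "$img" 2>/dev/null

if [ "$(dd if="$img" bs=1 count=4 2>/dev/null)" = "XFSB" ]; then
    echo "XFS filesystem starts at offset 0"
elif [ "$(dd if="$img" bs=1 skip=1080 count=2 2>/dev/null | od -An -tx1 | tr -d ' ')" = "53ef" ]; then
    echo "ext2/3/4 filesystem starts at offset 0"
else
    echo "no known filesystem magic at offset 0"
fi
# prints: XFS filesystem starts at offset 0
```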
* Re: Trying to rescue a RAID-1 array 2022-03-31 17:06 ` Wols Lists @ 2022-03-31 18:14 ` Bruce Korb 2022-03-31 21:34 ` Wols Lists 0 siblings, 1 reply; 18+ messages in thread From: Bruce Korb @ 2022-03-31 18:14 UTC (permalink / raw) To: Wols Lists; +Cc: brucekorbreply, linux-raid On Thu, Mar 31, 2022 at 10:06 AM Wols Lists <antlists@youngman.org.uk> wrote: > > On 31/03/2022 17:44, Bruce Korb wrote: > > I moved the two disks from a cleanly shut down system that could not > > reboot and could not > > be upgraded to a new OS release. So, I put them in.a new box and did an install. > > The installation recognized them as a RAID and decided that the > > partitions needed a > > new superblock of type RAID-0. > > That's worrying, did it really write a superblock? Yep. That worried me, too. I did the command to show the RAID status of the two partitions and, sure enough, both partitions were now listed as RAID0. > > Since these data have never been > > remounted since the > > shutdown on the original machine, I am hoping I can change the RAID > > type and mount it > > so as to recover my. .ssh and .thunderbird (email) directories. The > > bulk of the data are > > backed up (assuming no issues with the full backup of my critical > > data), but rebuilding > > and redistributing the .ssh directory would be a particular nuisance. > > > > SO: what are my options? I can't find any advice on how to tell mdadm > > that the RAID-0 partitions > > really are RAID-1 partitions. Last gasp might be to "mdadm --create" > > the RAID-1 again, but there's > > a lot of advice out there saying that it really is the last gasp > > before giving up. :) > > > > https://raid.wiki.kernel.org/index.php/Asking_for_help Sorry about that. I have two systems: the one I'm typing on and the one I am trying to bring up. At the moment, I'm in single user mode building out a new /home file system. mdadm --create is 15% done after an hour :(. 
It'll be mid/late afternoon before /home is rebuilt, mounted and I'll be able to run display commands on the "old" RAID1 (or 0) partitions. > Especially lsdrv. That tells us a LOT about your system. Expect email in about 6 hours or so. :) But openSUSE doesn't know about any "lsdrv" command. "cat /proc/mdstat" shows /dev/md1 (the RAID device I'm fretting over) to be active, raid-0 using /dev/sdc1 and sde1. > What was the filesystem on your raid? Hopefully it's as simple as moving > the "start of partition", breaking the raid completely, and you can just > mount the filesystem. I *think* it was EXT4, but it might be the XFS one. I think I let it default and openSUSE appears to prefer the XFS file system for RAID devices. Definitely one of those two. I built it close to a decade ago, so I'll be moving the data to the new /home array. > What really worries me is how and why it both recognised it as a raid, > then thought it needed to be converted to raid-0. That just sounds wrong > on so many levels. Did you let it mess with your superblocks? I hope you > said "don't touch those drives"? In retrospect, I ought to have left the drives unplugged until the install was done. The installer saw that they were RAID so it RAID-ed them. Only it seems to have decided on type 0 over type 1. I wasn't attentive because I've upgraded Linux so many times and it was "just done correctly" without having to give it a lot of thought. "If only" I'd thought to back up email and ssh. (1.5TB of photos are likely okay.) Thank you so much for your reply and potential help :) Regards, Bruce ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Trying to rescue a RAID-1 array 2022-03-31 18:14 ` Bruce Korb @ 2022-03-31 21:34 ` Wols Lists 2022-04-01 17:59 ` Bruce Korb 0 siblings, 1 reply; 18+ messages in thread From: Wols Lists @ 2022-03-31 21:34 UTC (permalink / raw) To: bruce.korb+reply; +Cc: linux-raid On 31/03/2022 19:14, Bruce Korb wrote: > On Thu, Mar 31, 2022 at 10:06 AM Wols Lists <antlists@youngman.org.uk> wrote: >> >> On 31/03/2022 17:44, Bruce Korb wrote: >>> I moved the two disks from a cleanly shut down system that could not >>> reboot and could not >>> be upgraded to a new OS release. So, I put them in.a new box and did an install. >>> The installation recognized them as a RAID and decided that the >>> partitions needed a >>> new superblock of type RAID-0. >> >> That's worrying, did it really write a superblock? > > Yep. That worried me, too. I did the command to show the RAID status of the two > partitions and, sure enough, both partitions were now listed as RAID0. > >>> Since these data have never been >>> remounted since the >>> shutdown on the original machine, I am hoping I can change the RAID >>> type and mount it >>> so as to recover my. .ssh and .thunderbird (email) directories. The >>> bulk of the data are >>> backed up (assuming no issues with the full backup of my critical >>> data), but rebuilding >>> and redistributing the .ssh directory would be a particular nuisance. >>> >>> SO: what are my options? I can't find any advice on how to tell mdadm >>> that the RAID-0 partitions >>> really are RAID-1 partitions. Last gasp might be to "mdadm --create" >>> the RAID-1 again, but there's >>> a lot of advice out there saying that it really is the last gasp >>> before giving up. :) >>> >> >> https://raid.wiki.kernel.org/index.php/Asking_for_help > > Sorry about that. I have two systems: the one I'm typing on and the one > I am trying to bring up. At the moment, I'm in single user mode building > out a new /home file system. mdadm --create is 15% done after an hour :(. 
> It'll be mid/late afternoon before /home is rebuilt, mounted and I'll be > able to run display commands on the "old" RAID1 (or 0) partitions. > >> Especially lsdrv. That tells us a LOT about your system. > > Expect email in about 6 hours or so. :) But openSUSE doesn't know > about any "lsdrv" command. "cat /proc/mdstat" shows /dev/md1 (the > RAID device I'm fretting over) to be active, raid-0 using /dev/sdc1 and sde1. Well, the webpage does tell you where to download it from - it's not part of the official tools, and it's a personal thing that's damn useful. > >> What was the filesystem on your raid? Hopefully it's as simple as moving >> the "start of partition", breaking the raid completely, and you can just >> mount the filesystem. > > I *think* it was EXT4, but. it might be the XFS one. I think I let it default > and openSUSE appears to prefer the XFS file system for RAID devices. > Definitely one of those two. I built it close to a decade ago, so I'll be moving > the data to the new /home array. > >> What really worries me is how and why it both recognised it as a raid, >> then thought it needed to be converted to raid-0. That just sounds wrong >> on so many levels. Did you let it mess with your superblocks? I hope you >> said "don't touch those drives"? > > In retrospect, I ought to have left the drives unplugged until the install was > done. The installer saw that they were RAID so it RAID-ed them. Only it > seems to have decided on type 0 over type 1. I wasn't attentive because > I've upgraded Linux so many times and it was "just done correctly" without > having to give it a lot of thought. "If only" I'd thought to back up > email and ssh. > (1.5TB of photos are likely okay.) > > Thank you so much for your reply and potentially help :) > If it says the drive is active ... When you get and run lsdrv, see if it finds a filesystem on the raid-0 - I suspect it might! There's a bug, which should be well fixed, but it might have bitten you. 
It breaks raid arrays. But if the drive is active, it might well mount, and you will be running a degraded mirror. Mount it read-only, back it up, and then see whether you can force-assemble the two bits back together :-) But don't do anything if you have any trouble whatsoever mounting and backing up. Cheers, Wol ^ permalink raw reply [flat|nested] 18+ messages in thread
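[The sequence Wol describes above (mount read-only, back up, only then attempt reassembly) might look roughly like this. The device names, mount point, and backup path are assumptions; with the md superblock at the end of the member, a RAID-1 member can often be mounted directly:]

```shell
# Assumed member devices /dev/sdc1 and /dev/sde1; adjust to your system.
mkdir -p /mnt/rescue
mount -o ro /dev/sdc1 /mnt/rescue            # one RAID-1 member, read-only
rsync -a /mnt/rescue/ /backup/old-home/      # back everything up FIRST
umount /mnt/rescue
# Only after a verified backup, try putting the mirror back together:
mdadm --assemble --force /dev/md1 /dev/sdc1 /dev/sde1
```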
* Re: Trying to rescue a RAID-1 array 2022-03-31 21:34 ` Wols Lists @ 2022-04-01 17:59 ` Bruce Korb 2022-04-01 18:21 ` Bruce Korb 0 siblings, 1 reply; 18+ messages in thread From: Bruce Korb @ 2022-04-01 17:59 UTC (permalink / raw) To: Wols Lists; +Cc: brucekorbreply, linux-raid [-- Attachment #1: Type: text/plain, Size: 6039 bytes --] Hi, Thank you again. I've attached a typescript of the commands. Here are the line numbers where the commands get issued. The relevant partitions are on /dev/sdc1 and /dev/sde1: 1:>3> uname -a 3:>4> mdadm --version 5:>5> for d in /dev/sd[ce] 6:>7> smartctl --xall /dev/sdc 252:>8> mdadm --examine /dev/sdc 256:>9> mdadm --examine /dev/sdc1 281:>5> for d in /dev/sd[ce] 282:>7> smartctl --xall /dev/sde 556:>8> mdadm --examine /dev/sde 560:>9> mdadm --examine /dev/sde1 585:>11> mdadm --detail /dev/md1 614:>12> /home/bkorb/bin/lsdrv/lsdrv-master/lsdrv Right after line 256, you'll see the fateful info: 263 Creation Time : Tue Mar 29 11:02:09 2022 264 Raid Level : raid0 The first block of /dev/sdc1 contains: bkorb@bach:~> sudo od -Ax -N 4096 -tx1 /dev/sdc1 000000 58 46 53 42 00 00 10 00 00 00 00 00 33 33 32 d0 000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000020 fe 71 6d a2 b5 15 4f d6 8e a6 f4 4f 48 03 8b 78 000030 00 00 00 00 20 00 00 07 00 00 00 00 00 00 00 60 000040 00 00 00 00 00 00 00 61 00 00 00 00 00 00 00 62 000050 00 00 00 01 0c cc cc b4 00 00 00 04 00 00 00 00 000060 00 06 66 66 bc b5 10 00 02 00 00 08 55 73 65 72 000070 00 00 00 00 00 00 00 00 0c 0c 09 03 1c 00 00 05 000080 00 00 00 00 00 15 9c 80 00 00 00 00 00 00 2b 48 000090 00 00 00 00 27 92 36 06 00 00 00 00 00 00 00 00 0000a0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0000b0 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 00 0000c0 00 0c 10 00 00 00 10 00 00 00 01 8a 00 00 01 8a 0000d0 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 0000e0 6f 7c f9 cc 00 00 00 00 ff ff ff ff ff ff ff ff 0000f0 00 00 00 7a 00 31 11 20 00 00 00 00 00 00 00 00 000100 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 I wouldn't know how to find the file system start. :) - Bruce On Thu, Mar 31, 2022 at 2:34 PM Wols Lists <antlists@youngman.org.uk> wrote: > > On 31/03/2022 19:14, Bruce Korb wrote: > > On Thu, Mar 31, 2022 at 10:06 AM Wols Lists <antlists@youngman.org.uk> wrote: > >> > >> On 31/03/2022 17:44, Bruce Korb wrote: > >>> I moved the two disks from a cleanly shut down system that could not > >>> reboot and could not > >>> be upgraded to a new OS release. So, I put them in.a new box and did an install. > >>> The installation recognized them as a RAID and decided that the > >>> partitions needed a > >>> new superblock of type RAID-0. > >> > >> That's worrying, did it really write a superblock? > > > > Yep. That worried me, too. I did the command to show the RAID status of the two > > partitions and, sure enough, both partitions were now listed as RAID0. > > > >>> Since these data have never been > >>> remounted since the > >>> shutdown on the original machine, I am hoping I can change the RAID > >>> type and mount it > >>> so as to recover my. .ssh and .thunderbird (email) directories. The > >>> bulk of the data are > >>> backed up (assuming no issues with the full backup of my critical > >>> data), but rebuilding > >>> and redistributing the .ssh directory would be a particular nuisance. > >>> > >>> SO: what are my options? I can't find any advice on how to tell mdadm > >>> that the RAID-0 partitions > >>> really are RAID-1 partitions. Last gasp might be to "mdadm --create" > >>> the RAID-1 again, but there's > >>> a lot of advice out there saying that it really is the last gasp > >>> before giving up. :) > >>> > >> > >> https://raid.wiki.kernel.org/index.php/Asking_for_help > > > > Sorry about that. I have two systems: the one I'm typing on and the one > > I am trying to bring up. At the moment, I'm in single user mode building > > out a new /home file system. mdadm --create is 15% done after an hour :(. 
> > It'll be mid/late afternoon before /home is rebuilt, mounted and I'll be > > able to run display commands on the "old" RAID1 (or 0) partitions. > > > >> Especially lsdrv. That tells us a LOT about your system. > > > > Expect email in about 6 hours or so. :) But openSUSE doesn't know > > about any "lsdrv" command. "cat /proc/mdstat" shows /dev/md1 (the > > RAID device I'm fretting over) to be active, raid-0 using /dev/sdc1 and sde1. > > Well, the webpage does tell you where to download it from - it's not > part of the official tools, and it's a personal thing that's damn useful. > > > >> What was the filesystem on your raid? Hopefully it's as simple as moving > >> the "start of partition", breaking the raid completely, and you can just > >> mount the filesystem. > > > > I *think* it was EXT4, but. it might be the XFS one. I think I let it default > > and openSUSE appears to prefer the XFS file system for RAID devices. > > Definitely one of those two. I built it close to a decade ago, so I'll be moving > > the data to the new /home array. > > > >> What really worries me is how and why it both recognised it as a raid, > >> then thought it needed to be converted to raid-0. That just sounds wrong > >> on so many levels. Did you let it mess with your superblocks? I hope you > >> said "don't touch those drives"? > > > > In retrospect, I ought to have left the drives unplugged until the install was > > done. The installer saw that they were RAID so it RAID-ed them. Only it > > seems to have decided on type 0 over type 1. I wasn't attentive because > > I've upgraded Linux so many times and it was "just done correctly" without > > having to give it a lot of thought. "If only" I'd thought to back up > > email and ssh. > > (1.5TB of photos are likely okay.) > > > > Thank you so much for your reply and potentially help :) > > > If it says the drive is active ... > > When you get and run lsdrv, see if it finds a filesystem on the raid-0 - > I suspect it might! 
> > There's a bug, which should be well fixed, but it might have bitten you. > It breaks raid arrays. But if the drive is active, it might well mount, > and you will be running a degraded mirror. Mount it read-only, back it > up, and then see whether you can force-assemble the two bits back > together :-) > > But don't do anything if you have any trouble whatsoever mounting and > backing up. > > Cheers, > Wol [-- Attachment #2: MD-Data.txt --] [-- Type: text/plain, Size: 30412 bytes --] >3> uname -a Linux bach 5.14.21-150400.11-default #1 SMP PREEMPT_DYNAMIC Wed Mar 2 08:27:22 UTC 2022 (0cc030f) x86_64 x86_64 x86_64 GNU/Linux >4> mdadm --version mdadm - v4.1 - 2018-10-01 >5> for d in /dev/sd[ce] >7> smartctl --xall /dev/sdc smartctl 7.2 2021-09-14 r5237 [x86_64-linux-5.14.21-150400.11-default] (SUSE RPM) Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: HGST MegaScale 4000 Device Model: HGST HMS5C4040ALE640 Serial Number: PL1331LAHEZZ5H LU WWN Device Id: 5 000cca 22ed47499 Firmware Version: MPAOA580 User Capacity: 4,000,787,030,016 bytes [4.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 5700 rpm Form Factor: 3.5 inches Device is: In smartctl database [for details use: -P show] ATA Version is: ATA8-ACS T13/1699-D revision 4 SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Fri Apr 1 10:29:34 2022 PDT SMART support is: Available - device has SMART capability. SMART support is: Enabled AAM feature is: Unavailable APM feature is: Disabled Rd look-ahead is: Enabled Write cache is: Enabled DSN feature is: Unavailable ATA Security is: Disabled, frozen [SEC2] Wt Cache Reorder: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x80) Offline data collection activity was never started. Auto Offline Data Collection: Enabled. 
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 28) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 705) minutes. SCT capabilities: (0x003d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE 1 Raw_Read_Error_Rate PO-R-- 100 100 016 - 0 2 Throughput_Performance P-S--- 132 132 054 - 109 3 Spin_Up_Time POS--- 132 132 024 - 503 (Average 552) 4 Start_Stop_Count -O--C- 100 100 000 - 3497 5 Reallocated_Sector_Ct PO--CK 100 100 005 - 121 7 Seek_Error_Rate PO-R-- 100 100 067 - 0 8 Seek_Time_Performance P-S--- 113 113 020 - 42 9 Power_On_Hours -O--C- 098 098 000 - 16330 10 Spin_Retry_Count PO--C- 100 100 060 - 0 12 Power_Cycle_Count -O--CK 100 100 000 - 3495 192 Power-Off_Retract_Count -O--CK 097 097 000 - 4535 193 Load_Cycle_Count -O--C- 097 097 000 - 4535 194 Temperature_Celsius -O---- 222 222 000 - 27 (Min/Max 13/52) 196 Reallocated_Event_Count -O--CK 100 100 000 - 131 197 Current_Pending_Sector -O---K 100 100 000 - 0 198 Offline_Uncorrectable ---R-- 100 100 000 - 0 199 UDMA_CRC_Error_Count -O-R-- 200 200 000 - 0 ||||||_ K auto-keep |||||__ C event count ||||___ R 
error rate |||____ S speed/performance ||_____ O updated online |______ P prefailure warning General Purpose Log Directory Version 1 SMART Log Directory Version 1 [multi-sector log support] Address Access R/W Size Description 0x00 GPL,SL R/O 1 Log Directory 0x01 SL R/O 1 Summary SMART error log 0x03 GPL R/O 1 Ext. Comprehensive SMART error log 0x04 GPL R/O 7 Device Statistics log 0x06 SL R/O 1 SMART self-test log 0x07 GPL R/O 1 Extended self-test log 0x08 GPL R/O 2 Power Conditions log 0x09 SL R/W 1 Selective self-test log 0x10 GPL R/O 1 NCQ Command Error log 0x11 GPL R/O 1 SATA Phy Event Counters log 0x20 GPL R/O 1 Streaming performance log [OBS-8] 0x21 GPL R/O 1 Write stream error log 0x22 GPL R/O 1 Read stream error log 0x80-0x9f GPL,SL R/W 16 Host vendor specific log 0xe0 GPL,SL R/W 1 SCT Command/Status 0xe1 GPL,SL R/W 1 SCT Data Transfer SMART Extended Comprehensive Error Log Version: 1 (1 sectors) No Errors Logged SMART Extended Self-test Log Version: 1 (1 sectors) Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 15941 - # 2 Short offline Completed without error 00% 15934 - # 3 Short offline Completed without error 00% 15928 - # 4 Short offline Completed without error 00% 15917 - # 5 Short offline Completed without error 00% 15912 - # 6 Short offline Completed without error 00% 15906 - # 7 Short offline Completed without error 00% 15903 - # 8 Short offline Completed without error 00% 15894 - # 9 Short offline Completed without error 00% 15887 - #10 Short offline Completed without error 00% 15876 - #11 Short offline Completed without error 00% 15870 - #12 Short offline Completed without error 00% 15863 - #13 Short offline Completed without error 00% 15857 - #14 Short offline Completed without error 00% 15851 - #15 Short offline Completed without error 00% 15841 - #16 Short offline Completed without error 00% 15835 - #17 Short offline Completed without error 00% 15826 - #18 Short offline 
Completed without error 00% 15819 - #19 Short offline Completed without error 00% 15815 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. SCT Status Version: 3 SCT Version (vendor specific): 256 (0x0100) Device State: Active (0) Current Temperature: 27 Celsius Power Cycle Min/Max Temperature: 19/27 Celsius Lifetime Min/Max Temperature: 13/52 Celsius Under/Over Temperature Limit Count: 0/0 SCT Temperature History Version: 2 Temperature Sampling Period: 1 minute Temperature Logging Interval: 1 minute Min/Max recommended Temperature: 0/60 Celsius Min/Max Temperature Limit: -40/70 Celsius Temperature History Size (Index): 128 (37) Index Estimated Time Temperature Celsius 38 2022-04-01 08:22 34 *************** ... ..( 95 skipped). .. *************** 6 2022-04-01 09:58 34 *************** 7 2022-04-01 09:59 35 **************** ... ..( 12 skipped). .. **************** 20 2022-04-01 10:12 35 **************** 21 2022-04-01 10:13 ? 
- 22 2022-04-01 10:14 20 * 23 2022-04-01 10:15 21 ** 24 2022-04-01 10:16 21 ** 25 2022-04-01 10:17 22 *** 26 2022-04-01 10:18 22 *** 27 2022-04-01 10:19 23 **** 28 2022-04-01 10:20 23 **** 29 2022-04-01 10:21 24 ***** 30 2022-04-01 10:22 24 ***** 31 2022-04-01 10:23 25 ****** 32 2022-04-01 10:24 25 ****** 33 2022-04-01 10:25 25 ****** 34 2022-04-01 10:26 26 ******* 35 2022-04-01 10:27 26 ******* 36 2022-04-01 10:28 27 ******** 37 2022-04-01 10:29 27 ******** SCT Error Recovery Control: Read: Disabled Write: Disabled Device Statistics (GP Log 0x04) Page Offset Size Value Flags Description 0x01 ===== = = === == General Statistics (rev 2) == 0x01 0x008 4 3495 --- Lifetime Power-On Resets 0x01 0x018 6 51502617485 --- Logical Sectors Written 0x01 0x020 6 757543510 --- Number of Write Commands 0x01 0x028 6 37112141904 --- Logical Sectors Read 0x01 0x030 6 98430763 --- Number of Read Commands 0x03 ===== = = === == Rotating Media Statistics (rev 1) == 0x03 0x008 4 16309 --- Spindle Motor Power-on Hours 0x03 0x010 4 16309 --- Head Flying Hours 0x03 0x018 4 4535 --- Head Load Events 0x03 0x020 4 121 --- Number of Reallocated Logical Sectors 0x03 0x028 4 0 --- Read Recovery Attempts 0x03 0x030 4 0 --- Number of Mechanical Start Failures 0x04 ===== = = === == General Errors Statistics (rev 1) == 0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors 0x04 0x010 4 2 --- Resets Between Cmd Acceptance and Completion 0x05 ===== = = === == Temperature Statistics (rev 1) == 0x05 0x008 1 27 --- Current Temperature 0x05 0x010 1 33 N-- Average Short Term Temperature 0x05 0x018 1 38 N-- Average Long Term Temperature 0x05 0x020 1 52 --- Highest Temperature 0x05 0x028 1 13 --- Lowest Temperature 0x05 0x030 1 43 N-- Highest Average Short Term Temperature 0x05 0x038 1 25 N-- Lowest Average Short Term Temperature 0x05 0x040 1 39 N-- Highest Average Long Term Temperature 0x05 0x048 1 25 N-- Lowest Average Long Term Temperature 0x05 0x050 4 0 --- Time in Over-Temperature 0x05 0x058 1 60 --- 
Specified Maximum Operating Temperature 0x05 0x060 4 0 --- Time in Under-Temperature 0x05 0x068 1 0 --- Specified Minimum Operating Temperature 0x06 ===== = = === == Transport Statistics (rev 1) == 0x06 0x008 4 32502 --- Number of Hardware Resets 0x06 0x010 4 6197 --- Number of ASR Events 0x06 0x018 4 0 --- Number of Interface CRC Errors |||_ C monitored condition met ||__ D supports DSN |___ N normalized value Pending Defects log (GP Log 0x0c) not supported SATA Phy Event Counters (GP Log 0x11) ID Size Value Description 0x0001 2 0 Command failed due to ICRC error 0x0002 2 0 R_ERR response for data FIS 0x0003 2 0 R_ERR response for device-to-host data FIS 0x0004 2 0 R_ERR response for host-to-device data FIS 0x0005 2 0 R_ERR response for non-data FIS 0x0006 2 0 R_ERR response for device-to-host non-data FIS 0x0007 2 0 R_ERR response for host-to-device non-data FIS 0x0009 2 4 Transition from drive PhyRdy to drive PhyNRdy 0x000a 2 1 Device-to-host register FISes sent due to a COMRESET 0x000b 2 0 CRC errors within host-to-device FIS 0x000d 2 0 Non-CRC errors within host-to-device FIS >8> mdadm --examine /dev/sdc /dev/sdc: MBR Magic : aa55 Partition[0] : 4294967295 sectors at 1 (type ee) >9> mdadm --examine /dev/sdc1 /dev/sdc1: Magic : a92b4efc Version : 1.0 Feature Map : 0x0 Array UUID : f624aab2:afc18758:5c20d349:55b9b36f Name : any:1 Creation Time : Tue Mar 29 11:02:09 2022 Raid Level : raid0 Raid Devices : 2 Avail Dev Size : 6871947240 sectors (3.20 TiB 3.52 TB) Super Offset : 6871947248 sectors State : active Device UUID : 0f94cfbf:af8375b8:5379c38e:3a1e3e79 Update Time : Tue Mar 29 11:02:09 2022 Bad Block Log : 512 entries available at offset -8 sectors Checksum : 2066f14f - correct Events : 0 Chunk Size : 64K Device Role : Active device 0 Array State : AA ('A' == active, '.' 
== missing, 'R' == replacing) >5> for d in /dev/sd[ce] >7> smartctl --xall /dev/sde smartctl 7.2 2021-09-14 r5237 [x86_64-linux-5.14.21-150400.11-default] (SUSE RPM) Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: HGST MegaScale 4000 Device Model: HGST HMS5C4040ALE640 Serial Number: PL1331LAHGEP7H LU WWN Device Id: 5 000cca 22ed4a812 Firmware Version: MPAOA580 User Capacity: 4,000,787,030,016 bytes [4.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 5700 rpm Form Factor: 3.5 inches Device is: In smartctl database [for details use: -P show] ATA Version is: ATA8-ACS T13/1699-D revision 4 SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Fri Apr 1 10:29:37 2022 PDT SMART support is: Available - device has SMART capability. SMART support is: Enabled AAM feature is: Unavailable APM feature is: Disabled Rd look-ahead is: Enabled Write cache is: Enabled DSN feature is: Unavailable ATA Security is: Disabled, frozen [SEC2] Wt Cache Reorder: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x80) Offline data collection activity was never started. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 28) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. 
General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 712) minutes. SCT capabilities: (0x003d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE 1 Raw_Read_Error_Rate PO-R-- 100 100 016 - 0 2 Throughput_Performance P-S--- 134 134 054 - 102 3 Spin_Up_Time POS--- 134 134 024 - 509 (Average 529) 4 Start_Stop_Count -O--C- 100 100 000 - 3499 5 Reallocated_Sector_Ct PO--CK 100 100 005 - 0 7 Seek_Error_Rate PO-R-- 100 100 067 - 0 8 Seek_Time_Performance P-S--- 113 113 020 - 42 9 Power_On_Hours -O--C- 098 098 000 - 16331 10 Spin_Retry_Count PO--C- 100 100 060 - 0 12 Power_Cycle_Count -O--CK 100 100 000 - 3497 192 Power-Off_Retract_Count -O--CK 098 098 000 - 3522 193 Load_Cycle_Count -O--C- 098 098 000 - 3522 194 Temperature_Celsius -O---- 230 230 000 - 26 (Min/Max 13/52) 196 Reallocated_Event_Count -O--CK 100 100 000 - 0 197 Current_Pending_Sector -O---K 100 100 000 - 0 198 Offline_Uncorrectable ---R-- 100 100 000 - 0 199 UDMA_CRC_Error_Count -O-R-- 200 200 000 - 1 ||||||_ K auto-keep |||||__ C event count ||||___ R error rate |||____ S speed/performance ||_____ O updated online |______ P prefailure warning General Purpose Log Directory Version 1 SMART Log Directory Version 1 [multi-sector log support] Address Access R/W Size Description 0x00 GPL,SL R/O 1 Log Directory 0x01 SL R/O 1 Summary SMART error log 0x03 GPL R/O 1 Ext. 
Comprehensive SMART error log 0x04 GPL R/O 7 Device Statistics log 0x06 SL R/O 1 SMART self-test log 0x07 GPL R/O 1 Extended self-test log 0x08 GPL R/O 2 Power Conditions log 0x09 SL R/W 1 Selective self-test log 0x10 GPL R/O 1 NCQ Command Error log 0x11 GPL R/O 1 SATA Phy Event Counters log 0x20 GPL R/O 1 Streaming performance log [OBS-8] 0x21 GPL R/O 1 Write stream error log 0x22 GPL R/O 1 Read stream error log 0x80-0x9f GPL,SL R/W 16 Host vendor specific log 0xe0 GPL,SL R/W 1 SCT Command/Status 0xe1 GPL,SL R/W 1 SCT Data Transfer SMART Extended Comprehensive Error Log Version: 1 (1 sectors) Device Error Count: 1 CR = Command Register FEATR = Features Register COUNT = Count (was: Sector Count) Register LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8 LH = LBA High (was: Cylinder High) Register ] LBA LM = LBA Mid (was: Cylinder Low) Register ] Register LL = LBA Low (was: Sector Number) Register ] DV = Device (was: Device/Head) Register DC = Device Control Register ER = Error register ST = Status register Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 1 [0] occurred at disk power-on lifetime: 3554 hours (148 days + 2 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 84 -- 51 09 fa 00 01 aa 12 f3 26 0a 00 Error: ICRC, ABRT at LBA = 0x1aa12f326 = 7148335910 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 61 06 00 00 70 00 01 aa 12 fd 20 40 08 00:16:09.067 WRITE FPDMA QUEUED 61 0a 00 00 68 00 01 aa 12 f3 20 40 08 00:16:09.058 WRITE FPDMA QUEUED 61 06 00 00 60 00 01 aa 12 ed 20 40 08 00:16:09.040 WRITE FPDMA QUEUED 61 0a 00 00 58 00 01 aa 12 e3 20 40 08 00:16:09.031 WRITE FPDMA QUEUED 61 06 00 00 50 00 01 aa 12 dd 20 40 08 00:16:09.013 WRITE FPDMA QUEUED SMART Extended Self-test Log Version: 1 (1 sectors) Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 15942 - # 2 Short offline Completed without error 00% 15935 - # 3 Short offline Completed without error 00% 15928 - # 4 Short offline Completed without error 00% 15918 - # 5 Short offline Completed without error 00% 15913 - # 6 Short offline Completed without error 00% 15907 - # 7 Short offline Completed without error 00% 15904 - # 8 Short offline Completed without error 00% 15895 - # 9 Short offline Completed without error 00% 15888 - #10 Short offline Completed without error 00% 15877 - #11 Short offline Completed without error 00% 15871 - #12 Short offline Completed without error 00% 15864 - #13 Short offline Completed without error 00% 15858 - #14 Short offline Completed without error 00% 15851 - #15 Short offline Completed without error 00% 15842 - #16 Short offline Completed without error 00% 15836 - #17 Short offline Completed without error 00% 15827 - #18 Short offline Completed without error 00% 15820 - #19 Short offline Completed without error 00% 15815 - SMART Selective self-test log data structure revision number 1 
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. SCT Status Version: 3 SCT Version (vendor specific): 256 (0x0100) Device State: Active (0) Current Temperature: 26 Celsius Power Cycle Min/Max Temperature: 19/26 Celsius Lifetime Min/Max Temperature: 13/52 Celsius Under/Over Temperature Limit Count: 0/0 SCT Temperature History Version: 2 Temperature Sampling Period: 1 minute Temperature Logging Interval: 1 minute Min/Max recommended Temperature: 0/60 Celsius Min/Max Temperature Limit: -40/70 Celsius Temperature History Size (Index): 128 (106) Index Estimated Time Temperature Celsius 107 2022-04-01 08:22 34 *************** ... ..(109 skipped). .. *************** 89 2022-04-01 10:12 34 *************** 90 2022-04-01 10:13 ? - 91 2022-04-01 10:14 20 * 92 2022-04-01 10:15 20 * 93 2022-04-01 10:16 21 ** 94 2022-04-01 10:17 21 ** 95 2022-04-01 10:18 22 *** 96 2022-04-01 10:19 22 *** 97 2022-04-01 10:20 23 **** 98 2022-04-01 10:21 23 **** 99 2022-04-01 10:22 23 **** 100 2022-04-01 10:23 24 ***** 101 2022-04-01 10:24 24 ***** 102 2022-04-01 10:25 25 ****** 103 2022-04-01 10:26 25 ****** 104 2022-04-01 10:27 25 ****** 105 2022-04-01 10:28 26 ******* 106 2022-04-01 10:29 26 ******* SCT Error Recovery Control: Read: Disabled Write: Disabled Device Statistics (GP Log 0x04) Page Offset Size Value Flags Description 0x01 ===== = = === == General Statistics (rev 2) == 0x01 0x008 4 3497 --- Lifetime Power-On Resets 0x01 0x018 6 38118689395 --- Logical Sectors Written 0x01 0x020 6 753091092 --- Number of Write Commands 0x01 0x028 6 48929739739 --- Logical Sectors Read 0x01 0x030 6 139079999 --- Number of Read Commands 0x03 ===== = = === == Rotating Media Statistics (rev 1) == 0x03 0x008 4 16310 --- Spindle Motor Power-on Hours 0x03 
0x010 4 16310 --- Head Flying Hours 0x03 0x018 4 3522 --- Head Load Events 0x03 0x020 4 0 --- Number of Reallocated Logical Sectors 0x03 0x028 4 0 --- Read Recovery Attempts 0x03 0x030 4 0 --- Number of Mechanical Start Failures 0x04 ===== = = === == General Errors Statistics (rev 1) == 0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors 0x04 0x010 4 6 --- Resets Between Cmd Acceptance and Completion 0x05 ===== = = === == Temperature Statistics (rev 1) == 0x05 0x008 1 26 --- Current Temperature 0x05 0x010 1 33 N-- Average Short Term Temperature 0x05 0x018 1 39 N-- Average Long Term Temperature 0x05 0x020 1 52 --- Highest Temperature 0x05 0x028 1 13 --- Lowest Temperature 0x05 0x030 1 43 N-- Highest Average Short Term Temperature 0x05 0x038 1 25 N-- Lowest Average Short Term Temperature 0x05 0x040 1 40 N-- Highest Average Long Term Temperature 0x05 0x048 1 25 N-- Lowest Average Long Term Temperature 0x05 0x050 4 0 --- Time in Over-Temperature 0x05 0x058 1 60 --- Specified Maximum Operating Temperature 0x05 0x060 4 0 --- Time in Under-Temperature 0x05 0x068 1 0 --- Specified Minimum Operating Temperature 0x06 ===== = = === == Transport Statistics (rev 1) == 0x06 0x008 4 35049 --- Number of Hardware Resets 0x06 0x010 4 6205 --- Number of ASR Events 0x06 0x018 4 1 --- Number of Interface CRC Errors |||_ C monitored condition met ||__ D supports DSN |___ N normalized value Pending Defects log (GP Log 0x0c) not supported SATA Phy Event Counters (GP Log 0x11) ID Size Value Description 0x0001 2 0 Command failed due to ICRC error 0x0002 2 0 R_ERR response for data FIS 0x0003 2 0 R_ERR response for device-to-host data FIS 0x0004 2 0 R_ERR response for host-to-device data FIS 0x0005 2 0 R_ERR response for non-data FIS 0x0006 2 0 R_ERR response for device-to-host non-data FIS 0x0007 2 0 R_ERR response for host-to-device non-data FIS 0x0009 2 4 Transition from drive PhyRdy to drive PhyNRdy 0x000a 2 1 Device-to-host register FISes sent due to a COMRESET 0x000b 2 0 CRC 
errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS

>8> mdadm --examine /dev/sde
/dev/sde:
   MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)

>9> mdadm --examine /dev/sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : f624aab2:afc18758:5c20d349:55b9b36f
           Name : any:1
  Creation Time : Tue Mar 29 11:02:09 2022
     Raid Level : raid0
   Raid Devices : 2

 Avail Dev Size : 6871947240 sectors (3.20 TiB 3.52 TB)
   Super Offset : 6871947248 sectors
          State : active
    Device UUID : d4132121:d05ed348:83f7d542:fcbb7deb

    Update Time : Tue Mar 29 11:02:09 2022
  Bad Block Log : 512 entries available at offset -8 sectors
       Checksum : 38686827 - correct
         Events : 0

     Chunk Size : 64K

    Device Role : Active device 1
    Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

>11> mdadm --detail /dev/md1
/dev/md1:
           Version : 1.0
     Creation Time : Tue Mar 29 11:02:09 2022
        Raid Level : raid0
        Array Size : 6871947136 (6.40 TiB 7.04 TB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Tue Mar 29 11:02:09 2022
             State : clean
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

            Layout : -unknown-
        Chunk Size : 64K

Consistency Policy : none

              Name : any:1
              UUID : f624aab2:afc18758:5c20d349:55b9b36f
            Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       65        1      active sync   /dev/sde1

>12> /home/bkorb/bin/lsdrv/lsdrv-master/lsdrv
bin/gather-md-info: /home/bkorb/bin/lsdrv/lsdrv-master/lsdrv: /usr/bin/python: bad interpreter: No such file or directory
bkorb@bach:~>

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Trying to rescue a RAID-1 array 2022-04-01 17:59 ` Bruce Korb @ 2022-04-01 18:21 ` Bruce Korb 2022-04-01 19:45 ` Wol 2022-04-20 8:40 ` Need to move RAID1 with mounted partition Leslie Rhorer 0 siblings, 2 replies; 18+ messages in thread From: Bruce Korb @ 2022-04-01 18:21 UTC (permalink / raw) To: Wols Lists; +Cc: brucekorbreply, linux-raid Um, I forgot that with a fresh install, I have to remember what all tools I had installed and re-install 'em. bach:/home/bkorb # /home/bkorb/bin/lsdrv/lsdrv-master/lsdrv PCI [ahci] 00:11.4 SATA controller: Intel Corporation C610/X99 series chipset sSATA Controller [AHCI mode] (rev 05) ├scsi 0:0:0:0 ATA TOSHIBA HDWE160 {487OK01XFB8G} │└sda 5.46t [8:0] Partitioned (gpt) │ ├sda1 128.00g [8:1] btrfs 'OPT-USR' {649826e5-7406-49fb-ad4a-a35a0077a325} │ │└Mounted as /dev/sda1 @ /opt │ ├sda3 64.00g [8:3] Empty/Unknown │ ├sda4 16.00g [8:4] Empty/Unknown │ └sda5 5.25t [8:5] MD raid1 (0/2) (w/ sdb5) in_sync 'bach:0' {0e2cb19c-b567-5fcc-2982-c38e81e42a71} │ └md0 5.25t [9:0] MD v1.2 raid1 (2) clean {0e2cb19c:-b567-5f:cc-2982-:c38e81e42a71} │ │ ext4 'HOME' {a6551143-65ab-40ff-82b6-8cc809a1a856} │ └Mounted as /dev/md0 @ /home ├scsi 1:0:0:0 ATA TOSHIBA HDWE160 {487OK01SFB8G} │└sdb 5.46t [8:16] Partitioned (gpt) │ ├sdb1 192.00g [8:17] btrfs 'VAR-TMP' {c1304823-0b3b-4655-bfbb-a7f064ec59f5} │ │└Mounted as /dev/sdb1 @ /var │ ├sdb2 16.00g [8:18] Empty/Unknown │ └sdb5 5.25t [8:21] MD raid1 (1/2) (w/ sda5) in_sync 'bach:0' {0e2cb19c-b567-5fcc-2982-c38e81e42a71} │ └md0 5.25t [9:0] MD v1.2 raid1 (2) clean {0e2cb19c:-b567-5f:cc-2982-:c38e81e42a71} │ ext4 'HOME' {a6551143-65ab-40ff-82b6-8cc809a1a856} ├scsi 2:0:0:0 ATA HGST HMS5C4040AL {PL1331LAHEZZ5H} │└sdc 3.64t [8:32] Partitioned (gpt) │ ├sdc1 3.20t [8:33] MD raid0 (0/2) (w/ sde1) in_sync 'any:1' {f624aab2-afc1-8758-5c20-d34955b9b36f} │ │└md1 6.40t [9:1] MD v1.0 raid0 (2) clean, 64k Chunk, None (None) None {f624aab2:-afc1-87:58-5c20-:d34955b9b36f} │ │ xfs 'User' 
{fe716da2-b515-4fd6-8ea6-f44f48038b78} │ ├sdc2 320.00g [8:34] ext4 'PHOTOS-B' {4ab1a2c2-dbee-4f4d-b491-8652ea7a24d7} │ └sdc3 65.22g [8:35] ext4 'TEMP' {c18c28d3-dafd-4f1b-aa9f-b7a462139073} └scsi 3:0:0:0 ATA WDC WDS250G2B0A- {181202806197} └sdd 232.89g [8:48] Partitioned (gpt) ├sdd1 901.00m [8:49] vfat 'BOOT-EFI' {AF1B-15D7} │└Mounted as /dev/sdd1 @ /boot/efi ├sdd2 116.00g [8:50] Partitioned (dos) 'ROOT1' {63e24f52-2f8f-4ad1-a1e6-cb5537efcf6f} │├Mounted as /dev/sdd2 @ / │├Mounted as /dev/sdd2 @ /.snapshots │├Mounted as /dev/sdd2 @ /boot/grub2/i386-pc │├Mounted as /dev/sdd2 @ /boot/grub2/x86_64-efi │├Mounted as /dev/sdd2 @ /srv │├Mounted as /dev/sdd2 @ /usr/local │├Mounted as /dev/sdd2 @ /tmp │└Mounted as /dev/sdd2 @ /root └sdd3 116.01g [8:51] xfs 'ROOT2' {69178c35-15ea-4f04-8f29-bf4f1f6f890a} └Mounted as /dev/sdd3 @ /root2 PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] (rev 05) ├scsi 4:0:0:0 ATA HGST HMS5C4040AL {PL1331LAHGEP7H} │└sde 3.64t [8:64] Partitioned (gpt) │ ├sde1 3.20t [8:65] MD raid0 (1/2) (w/ sdc1) in_sync 'any:1' {f624aab2-afc1-8758-5c20-d34955b9b36f} │ │└md1 6.40t [9:1] MD v1.0 raid0 (2) clean, 64k Chunk, None (None) None {f624aab2:-afc1-87:58-5c20-:d34955b9b36f} │ │ xfs 'User' {fe716da2-b515-4fd6-8ea6-f44f48038b78} │ ├sde2 64.00g [8:66] swap {dbd52b6f-fc65-42e9-948b-33d9c3834c3c} │ └sde3 385.22g [8:67] ext4 'PHOTO-A' {c84250ab-6563-4832-a919-632a34486bf1} └scsi 5:0:0:0 HL-DT-ST BD-RE WH14NS40 {SIK9TH8SE163} └sr0 1.00g [11:0] Empty/Unknown USB [usb-storage] Bus 002 Device 007: ID 05e3:0745 Genesys Logic, Inc. Logilink CR0012 {000000000903} └scsi 10:0:0:0 Generic STORAGE DEVICE {000000000503} └sdf 0.00k [8:80] Empty/Unknown USB [usb-storage] Bus 002 Device 008: ID 058f:6387 Alcor Micro Corp. Flash Drive {A3A1458D} └scsi 11:0:0:0 Generic Flash Disk {A} └sdg 28.91g [8:96] Partitioned (dos) └sdg1 28.91g [8:97] vfat '32GB' {1D6B-D5DB} └Mounted as /dev/sdg1 @ /run/media/bkorb/32GB Hmm. 
Interesting. Dunno what that /dev/sdf thingy is. I only have one thumb drive plugged in and mounted as /dev/sdg1. - Bruce On Fri, Apr 1, 2022 at 10:59 AM Bruce Korb <bruce.korb@gmail.com> wrote: > > Hi, > Thank you again. I've attached a typescript of the commands. Here are > the line numbers where the commands get issued. The relevant > partitions are on /dev/sdc1 and /dev/sde1: > > 1:>3> uname -a > 3:>4> mdadm --version > 5:>5> for d in /dev/sd[ce] > 6:>7> smartctl --xall /dev/sdc > 252:>8> mdadm --examine /dev/sdc > 256:>9> mdadm --examine /dev/sdc1 > 281:>5> for d in /dev/sd[ce] > 282:>7> smartctl --xall /dev/sde > 556:>8> mdadm --examine /dev/sde > 560:>9> mdadm --examine /dev/sde1 > 585:>11> mdadm --detail /dev/md1 > 614:>12> /home/bkorb/bin/lsdrv/lsdrv-master/lsdrv > > Right after line 256, you'll see the fateful info: > 263 Creation Time : Tue Mar 29 11:02:09 2022 > 264 Raid Level : raid0 > > The first block of /dev/sdc1 contains: > bkorb@bach:~> sudo od -Ax -N 4096 -tx1 /dev/sdc1 > 000000 58 46 53 42 00 00 10 00 00 00 00 00 33 33 32 d0 > 000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 000020 fe 71 6d a2 b5 15 4f d6 8e a6 f4 4f 48 03 8b 78 > 000030 00 00 00 00 20 00 00 07 00 00 00 00 00 00 00 60 > 000040 00 00 00 00 00 00 00 61 00 00 00 00 00 00 00 62 > 000050 00 00 00 01 0c cc cc b4 00 00 00 04 00 00 00 00 > 000060 00 06 66 66 bc b5 10 00 02 00 00 08 55 73 65 72 > 000070 00 00 00 00 00 00 00 00 0c 0c 09 03 1c 00 00 05 > 000080 00 00 00 00 00 15 9c 80 00 00 00 00 00 00 2b 48 > 000090 00 00 00 00 27 92 36 06 00 00 00 00 00 00 00 00 > 0000a0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 0000b0 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 00 > 0000c0 00 0c 10 00 00 00 10 00 00 00 01 8a 00 00 01 8a > 0000d0 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 > 0000e0 6f 7c f9 cc 00 00 00 00 ff ff ff ff ff ff ff ff > 0000f0 00 00 00 7a 00 31 11 20 00 00 00 00 00 00 00 00 > 000100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > I wouldn't know how to 
find the file system start. :) > > - Bruce > > On Thu, Mar 31, 2022 at 2:34 PM Wols Lists <antlists@youngman.org.uk> wrote: > > > > On 31/03/2022 19:14, Bruce Korb wrote: > > > On Thu, Mar 31, 2022 at 10:06 AM Wols Lists <antlists@youngman.org.uk> wrote: > > >> > > >> On 31/03/2022 17:44, Bruce Korb wrote: > > >>> I moved the two disks from a cleanly shut down system that could not > > >>> reboot and could not > > >>> be upgraded to a new OS release. So, I put them in.a new box and did an install. > > >>> The installation recognized them as a RAID and decided that the > > >>> partitions needed a > > >>> new superblock of type RAID-0. > > >> > > >> That's worrying, did it really write a superblock? > > > > > > Yep. That worried me, too. I did the command to show the RAID status of the two > > > partitions and, sure enough, both partitions were now listed as RAID0. > > > > > >>> Since these data have never been > > >>> remounted since the > > >>> shutdown on the original machine, I am hoping I can change the RAID > > >>> type and mount it > > >>> so as to recover my. .ssh and .thunderbird (email) directories. The > > >>> bulk of the data are > > >>> backed up (assuming no issues with the full backup of my critical > > >>> data), but rebuilding > > >>> and redistributing the .ssh directory would be a particular nuisance. > > >>> > > >>> SO: what are my options? I can't find any advice on how to tell mdadm > > >>> that the RAID-0 partitions > > >>> really are RAID-1 partitions. Last gasp might be to "mdadm --create" > > >>> the RAID-1 again, but there's > > >>> a lot of advice out there saying that it really is the last gasp > > >>> before giving up. :) > > >>> > > >> > > >> https://raid.wiki.kernel.org/index.php/Asking_for_help > > > > > > Sorry about that. I have two systems: the one I'm typing on and the one > > > I am trying to bring up. At the moment, I'm in single user mode building > > > out a new /home file system. mdadm --create is 15% done after an hour :(. 
> > > It'll be mid/late afternoon before /home is rebuilt, mounted and I'll be > > > able to run display commands on the "old" RAID1 (or 0) partitions. > > > > > >> Especially lsdrv. That tells us a LOT about your system. > > > > > > Expect email in about 6 hours or so. :) But openSUSE doesn't know > > > about any "lsdrv" command. "cat /proc/mdstat" shows /dev/md1 (the > > > RAID device I'm fretting over) to be active, raid-0 using /dev/sdc1 and sde1. > > > > Well, the webpage does tell you where to download it from - it's not > > part of the official tools, and it's a personal thing that's damn useful. > > > > > >> What was the filesystem on your raid? Hopefully it's as simple as moving > > >> the "start of partition", breaking the raid completely, and you can just > > >> mount the filesystem. > > > > > > I *think* it was EXT4, but. it might be the XFS one. I think I let it default > > > and openSUSE appears to prefer the XFS file system for RAID devices. > > > Definitely one of those two. I built it close to a decade ago, so I'll be moving > > > the data to the new /home array. > > > > > >> What really worries me is how and why it both recognised it as a raid, > > >> then thought it needed to be converted to raid-0. That just sounds wrong > > >> on so many levels. Did you let it mess with your superblocks? I hope you > > >> said "don't touch those drives"? > > > > > > In retrospect, I ought to have left the drives unplugged until the install was > > > done. The installer saw that they were RAID so it RAID-ed them. Only it > > > seems to have decided on type 0 over type 1. I wasn't attentive because > > > I've upgraded Linux so many times and it was "just done correctly" without > > > having to give it a lot of thought. "If only" I'd thought to back up > > > email and ssh. > > > (1.5TB of photos are likely okay.) > > > > > > Thank you so much for your reply and potentially help :) > > > > > If it says the drive is active ... 
> > > > When you get and run lsdrv, see if it finds a filesystem on the raid-0 - > > I suspect it might! > > > > There's a bug, which should be well fixed, but it might have bitten you. > > It breaks raid arrays. But if the drive is active, it might well mount, > > and you will be running a degraded mirror. Mount it read-only, back it > > up, and then see whether you can force-assemble the two bits back > > together :-) > > > > But don't do anything if you have any trouble whatsoever mounting and > > backing up. > > > > Cheers, > > Wol ^ permalink raw reply [flat|nested] 18+ messages in thread
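[Editor's note] The hex dump Bruce posted begins 58 46 53 42, which is ASCII "XFSB", the XFS superblock magic — so an XFS filesystem really does start at the first byte of /dev/sdc1. A small hedged helper to script that check (device names are the ones from this thread; it reads only 4 bytes, but running it against a raw device still needs root):

```shell
# Report whether a device or file starts with the XFS superblock
# magic "XFSB" (hex 58 46 53 42). Reads only the first 4 bytes.
check_xfs_magic() {
    magic=$(od -An -N4 -tx1 "$1" | tr -d ' ')
    if [ "$magic" = "58465342" ]; then
        echo "XFS superblock at start of $1"
    else
        echo "no XFS magic at start of $1 (found: $magic)"
    fi
}

# e.g., as root, on the members discussed in this thread:
# check_xfs_magic /dev/sdc1
# check_xfs_magic /dev/sde1
```

With a v1.0 superblock (stored at the end of the member), a positive result here is exactly the "filesystem at the start of the partition" situation Wol describes.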
* Re: Trying to rescue a RAID-1 array 2022-04-01 18:21 ` Bruce Korb @ 2022-04-01 19:45 ` Wol 2022-04-01 20:23 ` Bruce Korb 2022-04-20 8:40 ` Need to move RAID1 with mounted partition Leslie Rhorer 1 sibling, 1 reply; 18+ messages in thread From: Wol @ 2022-04-01 19:45 UTC (permalink / raw) To: bruce.korb+reply; +Cc: linux-raid Hmmm... what drives are the damaged array on? There's an intact raid1 there ... On 01/04/2022 19:21, Bruce Korb wrote: > Um, I forgot that with a fresh install, I have to remember what all > tools I had installed and re-install 'em. > > bach:/home/bkorb # /home/bkorb/bin/lsdrv/lsdrv-master/lsdrv > > PCI [ahci] 00:11.4 SATA controller: Intel Corporation C610/X99 series > chipset sSATA Controller [AHCI mode] (rev 05) > ├scsi 0:0:0:0 ATA TOSHIBA HDWE160 {487OK01XFB8G} > │└sda 5.46t [8:0] Partitioned (gpt) > │ ├sda1 128.00g [8:1] btrfs 'OPT-USR' {649826e5-7406-49fb-ad4a-a35a0077a325} > │ │└Mounted as /dev/sda1 @ /opt > │ ├sda3 64.00g [8:3] Empty/Unknown > │ ├sda4 16.00g [8:4] Empty/Unknown > │ └sda5 5.25t [8:5] MD raid1 (0/2) (w/ sdb5) in_sync 'bach:0' So sda5 has a raid1 on it ... > {0e2cb19c-b567-5fcc-2982-c38e81e42a71} > │ └md0 5.25t [9:0] MD v1.2 raid1 (2) clean called md0 > {0e2cb19c:-b567-5f:cc-2982-:c38e81e42a71} > │ │ ext4 'HOME' {a6551143-65ab-40ff-82b6-8cc809a1a856} > │ └Mounted as /dev/md0 @ /home and mounted as /home. > ├scsi 1:0:0:0 ATA TOSHIBA HDWE160 {487OK01SFB8G} > │└sdb 5.46t [8:16] Partitioned (gpt) > │ ├sdb1 192.00g [8:17] btrfs 'VAR-TMP' {c1304823-0b3b-4655-bfbb-a7f064ec59f5} > │ │└Mounted as /dev/sdb1 @ /var > │ ├sdb2 16.00g [8:18] Empty/Unknown > │ └sdb5 5.25t [8:21] MD raid1 (1/2) (w/ sda5) in_sync 'bach:0' > {0e2cb19c-b567-5fcc-2982-c38e81e42a71} > │ └md0 5.25t [9:0] MD v1.2 raid1 (2) clean and sdb5 is the other half. 
> {0e2cb19c:-b567-5f:cc-2982-:c38e81e42a71} > │ ext4 'HOME' {a6551143-65ab-40ff-82b6-8cc809a1a856} > ├scsi 2:0:0:0 ATA HGST HMS5C4040AL {PL1331LAHEZZ5H} > │└sdc 3.64t [8:32] Partitioned (gpt) > │ ├sdc1 3.20t [8:33] MD raid0 (0/2) (w/ sde1) in_sync 'any:1' > {f624aab2-afc1-8758-5c20-d34955b9b36f} > │ │└md1 6.40t [9:1] MD v1.0 raid0 (2) clean, 64k Chunk, None (None) > None {f624aab2:-afc1-87:58-5c20-:d34955b9b36f} > │ │ xfs 'User' {fe716da2-b515-4fd6-8ea6-f44f48038b78} This looks promising ... dunno what on earth it thought it was doing, but it's telling me that on sdc1 we have a raid 0, version 1.0, with an xfs on it. Is there any chance your install formatted the new raid? Because if it did your data is probably toast, but if it didn't we might be home and dry. > │ ├sdc2 320.00g [8:34] ext4 'PHOTOS-B' {4ab1a2c2-dbee-4f4d-b491-8652ea7a24d7} > │ └sdc3 65.22g [8:35] ext4 'TEMP' {c18c28d3-dafd-4f1b-aa9f-b7a462139073} > └scsi 3:0:0:0 ATA WDC WDS250G2B0A- {181202806197} > └sdd 232.89g [8:48] Partitioned (gpt) > ├sdd1 901.00m [8:49] vfat 'BOOT-EFI' {AF1B-15D7} > │└Mounted as /dev/sdd1 @ /boot/efi > ├sdd2 116.00g [8:50] Partitioned (dos) 'ROOT1' > {63e24f52-2f8f-4ad1-a1e6-cb5537efcf6f} > │├Mounted as /dev/sdd2 @ / > │├Mounted as /dev/sdd2 @ /.snapshots > │├Mounted as /dev/sdd2 @ /boot/grub2/i386-pc > │├Mounted as /dev/sdd2 @ /boot/grub2/x86_64-efi > │├Mounted as /dev/sdd2 @ /srv > │├Mounted as /dev/sdd2 @ /usr/local > │├Mounted as /dev/sdd2 @ /tmp > │└Mounted as /dev/sdd2 @ /root > └sdd3 116.01g [8:51] xfs 'ROOT2' {69178c35-15ea-4f04-8f29-bf4f1f6f890a} > └Mounted as /dev/sdd3 @ /root2 > PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C610/X99 series > chipset 6-Port SATA Controller [AHCI mode] (rev 05) > ├scsi 4:0:0:0 ATA HGST HMS5C4040AL {PL1331LAHGEP7H} > │└sde 3.64t [8:64] Partitioned (gpt) > │ ├sde1 3.20t [8:65] MD raid0 (1/2) (w/ sdc1) in_sync 'any:1' > {f624aab2-afc1-8758-5c20-d34955b9b36f} > │ │└md1 6.40t [9:1] MD v1.0 raid0 (2) clean, 64k Chunk, None (None) > 
None {f624aab2:-afc1-87:58-5c20-:d34955b9b36f} > │ │ xfs 'User' {fe716da2-b515-4fd6-8ea6-f44f48038b78} And the other half of the raid. > │ ├sde2 64.00g [8:66] swap {dbd52b6f-fc65-42e9-948b-33d9c3834c3c} > │ └sde3 385.22g [8:67] ext4 'PHOTO-A' {c84250ab-6563-4832-a919-632a34486bf1} > └scsi 5:0:0:0 HL-DT-ST BD-RE WH14NS40 {SIK9TH8SE163} > └sr0 1.00g [11:0] Empty/Unknown > USB [usb-storage] Bus 002 Device 007: ID 05e3:0745 Genesys Logic, Inc. > Logilink CR0012 {000000000903} > └scsi 10:0:0:0 Generic STORAGE DEVICE {000000000503} > └sdf 0.00k [8:80] Empty/Unknown > USB [usb-storage] Bus 002 Device 008: ID 058f:6387 Alcor Micro Corp. > Flash Drive {A3A1458D} > └scsi 11:0:0:0 Generic Flash Disk {A} > └sdg 28.91g [8:96] Partitioned (dos) > └sdg1 28.91g [8:97] vfat '32GB' {1D6B-D5DB} > └Mounted as /dev/sdg1 @ /run/media/bkorb/32GB > > Hmm. Interesting. Dunno what that /dev/sdf thingy is. I only have one > thumb drive plugged in and mounted as /dev/sdg1. > Can you mount the raid? This just looks funny to me though, so make sure it's read only. Seeing as it made it v1.0, that means the raid superblock is at the end of the device and will not have done much if any damage ... It's probably a good idea to create a loopback device and mount it via that because it will protect the filesystem. Does any of this feel like it's right? Cheers, Wol ^ permalink raw reply [flat|nested] 18+ messages in thread
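[Editor's note] One way to do the read-only mount Wol suggests, sketched with the device and mount-point names from this thread (an illustration, not a tested recipe; run as root):

```shell
# Attach the raid1 member behind a *read-only* loop device, then mount
# that: the block layer refuses every write to the real partition.
LOOP=$(losetup --find --show --read-only /dev/sdc1)
mkdir -p /mnt/rescue
mount -o ro,norecovery "$LOOP" /mnt/rescue  # norecovery: skip XFS log replay
ls /mnt/rescue                              # look for the old /home contents
```

The `norecovery` option matters for XFS: a plain `ro` mount may still try to replay the journal, which writes to the device; with a read-only loop that replay would simply fail, so skipping it is the safer path.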
* Re: Trying to rescue a RAID-1 array
2022-04-01 19:45 ` Wol
@ 2022-04-01 20:23 ` Bruce Korb
2022-04-01 21:02 ` Wol
0 siblings, 1 reply; 18+ messages in thread
From: Bruce Korb @ 2022-04-01 20:23 UTC (permalink / raw)
To: Wol; +Cc: brucekorbreply, linux-raid

- Bruce

On Fri, Apr 1, 2022 at 12:45 PM Wol <antlists@youngman.org.uk> wrote:
>
> Hmmm... what drives are the damaged array on? There's an intact raid1
> there ...

In my script, I focused on /dev/sd[ce]1 because those partitions have
(or had) the data. I'm guessing they're toast, but I never intentionally
had a new RAID formatted, let alone a new FS installed. Given that the
installer did stuff I did not intend for it to do, I cannot really
guarantee anything anymore.

> On 01/04/2022 19:21, Bruce Korb wrote:
>
> So sda5 has a raid1 on it ...

new disk, no data -- same with sdb5.

> and mounted as /home.

New disks, new /home hierarchy.

> > ├scsi 2:0:0:0 ATA HGST HMS5C4040AL {PL1331LAHEZZ5H}
> > │└sdc 3.64t [8:32] Partitioned (gpt)
> > │ ├sdc1 3.20t [8:33] MD raid0 (0/2) (w/ sde1) in_sync 'any:1'
> > {f624aab2-afc1-8758-5c20-d34955b9b36f}
> > │ │└md1 6.40t [9:1] MD v1.0 raid0 (2) clean, 64k Chunk, None (None)
> > None {f624aab2:-afc1-87:58-5c20-:d34955b9b36f}
> > │ │ xfs 'User' {fe716da2-b515-4fd6-8ea6-f44f48038b78}
>
> This looks promising ... dunno what on earth it thought it was doing,
> but it's telling me that on sdc1 we have a raid 0, version 1.0, with an
> xfs on it. Is there any chance your install formatted the new raid?

Chance? Sure, because it didn't do what I was expecting. It was never,
ever mounted. During the install, I made certain that no mount point
was associated with it. Once the install was done, I did do a manual
"mount /dev/md1 /mnt", but it said I needed to run the "xfs_recover"
program. I started it, but then I realized that it was looking at 7TB
of striped data, whereas it was actually 3.5TB of redundant data. That
wasn't going to work.
> Because if it did your data is probably toast, but if it didn't we might > be home and dry. If I can figure out how to mount it (read only), then I can see if a new FS was installed. If it wasn't, then I've got my data. > Can you mount the raid? This just looks funny to me though, so make sure > it's read only. > > Seeing as it made it v1.0, that means the raid superblock is at the end > of the device and will not have done much if any damage ... +1 !! > It's probably a good idea to create a loopback device and mount it via > that because it will protect the filesystem. > > Does any of this feel like it's right? I don't remember back all those years ago about which file system openSUSE decided to use as default. I'm pretty sure it was either XFS or EXT4. I did file systems many years ago, but that was back in the mid-80s. IOW, I don't know how to look for the file system layout data to figure out how to mount this beast. I thought I could rely on the install detecting the RAID and doing the right thing. Obviously not. :( ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Trying to rescue a RAID-1 array
2022-04-01 20:23 ` Bruce Korb
@ 2022-04-01 21:02 ` Wol
2022-04-01 21:24 ` Bruce Korb
0 siblings, 1 reply; 18+ messages in thread
From: Wol @ 2022-04-01 21:02 UTC (permalink / raw)
To: bruce.korb+reply; +Cc: linux-raid

On 01/04/2022 21:23, Bruce Korb wrote:
>> Because if it did your data is probably toast, but if it didn't we might
>> be home and dry.
> If I can figure out how to mount it (read only), then I can see if a
> new FS was installed.
> If it wasn't, then I've got my data.
>
That looks promising then. It looks like your original array may have
been v1.0 too ... a very good sign.

Read up on loopback devices - it's in the wiki somewhere on recovering
your raid ... What that does is you stick a file between the file
system and whatever's running on top, so linux caches all your writes
into the file and doesn't touch the disk.

Let's hope xfs_recover didn't actually write anything or we could be in
trouble.

The whole point about v1.0 is - hallelujah - the file system starts at
the start of the partition! So now you've got loopback sorted, FORGET
ABOUT THE RAID. Put the loopback over sdc1, and mount it. If it needs
xfs_recover, because you've got the loopback, you can let it run, and
hopefully it will not do very much.

IFF the wind is blowing in the right direction (and there's at least a
decent chance), you've got your data back!

If it all goes pear shaped, it may well still be recoverable, but I'll
probably be out of ideas. But the loopback will have saved your data so
you'll be able to try again.

Oh - and if it looks sort-of-okay but xfs_recover has trashed sdc1, at
least try the same thing with sde1. You stand a chance that xfs_recover
only trashed one drive and, the other being an exact copy, it could
have survived and be okay.

Cheers,
Wol

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Trying to rescue a RAID-1 array
2022-04-01 21:02 ` Wol
@ 2022-04-01 21:24 ` Bruce Korb
2022-04-01 21:33 ` Wol
0 siblings, 1 reply; 18+ messages in thread
From: Bruce Korb @ 2022-04-01 21:24 UTC (permalink / raw)
To: Wol; +Cc: brucekorbreply, linux-raid

On Fri, Apr 1, 2022 at 2:02 PM Wol <antlists@youngman.org.uk> wrote:
> Read up on loopback devices - it's in the wiki somewhere on recovering
> your raid ...

Not in the index nor findable by google. :(

> What that does is you stick a file between the file system and
> whatever's running on top, so linux caches all your writes into the file
> and doesn't touch the disk.

Yeah, I'll need some guidance there. :( What commands do I need to run
to get sdc1 or sde1 mounted under a loopback?

> Let's hope xfs_recover didn't actually write anything or we could be in
> trouble.

All it ever did was print dots and say that something that might have
been a superblock wasn't really. It didn't sound like it did anything.

> The whole point about v1.0 is - hallelujah - the file system starts at
> the start of the partition!

With the RAID superblock at the end and the file system layout at the
start, it should be good. Fingers crossed. Please point me to where I
can learn how to loopback mount an XFS file system within a RAID
partition. :)

Thank you so much!!

Regards,
Bruce

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Trying to rescue a RAID-1 array 2022-04-01 21:24 ` Bruce Korb @ 2022-04-01 21:33 ` Wol 0 siblings, 0 replies; 18+ messages in thread From: Wol @ 2022-04-01 21:33 UTC (permalink / raw) To: bruce.korb+reply; +Cc: linux-raid On 01/04/2022 22:24, Bruce Korb wrote: > with the RAID superblock at the end and the file system layout at the start, > it should be good. Fingers crossed. Please point me to where I can learn > how to loopback mount an XFS file system within a RAID partition.:) https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID Note I've never done this myself (fortunately never needed it!), but a fair few people have needed it, and have done it ... Cheers, Wol ^ permalink raw reply [flat|nested] 18+ messages in thread
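[Editor's note] The core of what that wiki page describes is a device-mapper copy-on-write snapshot: every write is diverted into a scratch overlay file, so even tools that want to write (like XFS log recovery) never touch the real partition. A rough sketch, assuming the device names used in this thread and an illustrative overlay size (run as root):

```shell
# Copy-on-write overlay: writes land in the sparse file; /dev/sdc1
# itself stays pristine. (Names and the 4G size are illustrative.)
DEV=/dev/sdc1
truncate -s 4G /tmp/sdc1-overlay              # sparse scratch file
SECTORS=$(blockdev --getsz "$DEV")            # device size in 512-byte sectors
OVL=$(losetup --find --show /tmp/sdc1-overlay)
dmsetup create sdc1-cow \
    --table "0 $SECTORS snapshot $DEV $OVL P 8"
mount -o ro /dev/mapper/sdc1-cow /mnt/rescue  # or point xfs_repair at it

# when finished:
# umount /mnt/rescue; dmsetup remove sdc1-cow; losetup -d "$OVL"
```

If the mount (or a repair run against /dev/mapper/sdc1-cow) works, the data can be copied off; if it goes wrong, only the overlay file is dirtied and the member can be tried again from scratch.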
* Need to move RAID1 with mounted partition 2022-04-01 18:21 ` Bruce Korb 2022-04-01 19:45 ` Wol @ 2022-04-20 8:40 ` Leslie Rhorer 2022-04-20 8:55 ` Roman Mamedov 2022-04-20 11:08 ` Andy Smith 1 sibling, 2 replies; 18+ messages in thread From: Leslie Rhorer @ 2022-04-20 8:40 UTC (permalink / raw) To: Linux RAID Hello all, I have run into a little problem. I know of a couple of ways to fix it by shutting down the system and physically taking it apart, but for various reasons I don't wish to take that route. I want to be able to re-arrange the system with it running. The latest version (bullseye) of Debian will not complete its upgrade properly because my /boot file system is a little too small. I have two bootable drives with three partitions on them. The first partition on each drive is assembled into a RAID1 as /dev/md1 mounted as /boot. Once the system is booted, these can of course easily be umounted, the RAID1 stopped, and there is then no problem increasing the size of the partitions if there were space to be had. The third partition on each drive is assigned as swap, and of course it was easy to resize those partitions, leaving an additional 512MB between the second and third partitions on each drive. All I need to do is move the second partition on each drive up by 512MB. The problem is the second partition on both drives is also assembled into a RAID1 array on /dev/md2, formatted as ext4 and mounted as /. Is there a way I can move the RAID1 array up without shutting down the system? I don't need to resize the array, just move it. Is there a way to umount the root without halting the system? Note the system is headless, so access is via ssh over the network. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Need to move RAID1 with mounted partition
From: Roman Mamedov @ 2022-04-20 8:55 UTC
To: Leslie Rhorer; +Cc: Linux RAID

On Wed, 20 Apr 2022 03:40:12 -0500
Leslie Rhorer <lesrhorer@att.net> wrote:

> I have run into a little problem. I know of a couple of ways to fix it
> by shutting down the system and physically taking it apart, but for
> various reasons I don't wish to take that route. I want to be able to
> re-arrange the system with it running.
>
> The latest version (bullseye) of Debian will not complete its upgrade
> properly because my /boot file system is a little too small. I have two
> bootable drives with three partitions on them. The first partition on
> each drive is assembled into a RAID1 as /dev/md1 mounted as /boot. Once
> the system is booted, these can of course easily be umounted, the RAID1
> stopped, and there is then no problem increasing the size of the
> partitions if there were space to be had. The third partition on each
> drive is assigned as swap, and of course it was easy to resize those
> partitions, leaving an additional 512MB between the second and third
> partitions on each drive. All I need to do is move the second partition
> on each drive up by 512MB.
>
> The problem is the second partition on both drives is also assembled
> into a RAID1 array on /dev/md2, formatted as ext4 and mounted as /. Is
> there a way I can move the RAID1 array up without shutting down the
> system? I don't need to resize the array, just move it.

You could fail one half of the RAID1, remove it, recreate the partition at the required offset, add the new partition into the array, and let it rebuild. Then repeat with the other half.

However, during that process you do not have redundancy protection, so if the remaining array drive fails or has a bad sector, recovery could become tricky.

Maybe run a bad block scan, or "smartctl -t long", on the disks first. And of course have a backup.

--
With respect,
Roman
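The fail/recreate/re-add cycle Roman describes might look like the following sketch. The device names (/dev/sda2, /dev/md2) and the start sector are illustrative assumptions, not values from this thread, and the disk-modifying commands are left commented since they act on live hardware:

```shell
# Illustrative geometry: current start of the second partition, to be
# shifted up by 512 MiB on a 512-byte-sector disk.
OLD_START=1050624
SHIFT=$((512 * 1024 * 1024 / 512))   # 512 MiB expressed in sectors
NEW_START=$((OLD_START + SHIFT))
echo "new start sector: $NEW_START"
# Per drive (sda shown; repeat for sdb only AFTER the first rebuild finishes):
#   mdadm /dev/md2 --fail /dev/sda2
#   mdadm /dev/md2 --remove /dev/sda2
#   (delete sda2 and recreate it starting at $NEW_START with fdisk/parted,
#    keeping the original end sector so the array size is unchanged)
#   mdadm /dev/md2 --add /dev/sda2
#   cat /proc/mdstat    # watch the resync; do not touch sdb until it completes
```

Waiting for the resync between the two drives is the critical part: during each pass the array is running on a single disk, which is exactly the exposure window Roman warns about below.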
* Re: Need to move RAID1 with mounted partition
From: Leslie Rhorer @ 2022-04-20 9:21 UTC
To: Roman Mamedov; +Cc: Linux RAID

On 4/20/2022 3:55 AM, Roman Mamedov wrote:
> On Wed, 20 Apr 2022 03:40:12 -0500
> Leslie Rhorer <lesrhorer@att.net> wrote:
>
>> I have run into a little problem. I know of a couple of ways to fix it
>> by shutting down the system and physically taking it apart, but for
>> various reasons I don't wish to take that route. I want to be able to
>> re-arrange the system with it running.
>>
>> The latest version (bullseye) of Debian will not complete its upgrade
>> properly because my /boot file system is a little too small. I have two
>> bootable drives with three partitions on them. The first partition on
>> each drive is assembled into a RAID1 as /dev/md1 mounted as /boot. Once
>> the system is booted, these can of course easily be umounted, the RAID1
>> stopped, and there is then no problem increasing the size of the
>> partitions if there were space to be had. The third partition on each
>> drive is assigned as swap, and of course it was easy to resize those
>> partitions, leaving an additional 512MB between the second and third
>> partitions on each drive. All I need to do is move the second partition
>> on each drive up by 512MB.
>>
>> The problem is the second partition on both drives is also assembled
>> into a RAID1 array on /dev/md2, formatted as ext4 and mounted as /. Is
>> there a way I can move the RAID1 array up without shutting down the
>> system? I don't need to resize the array, just move it.
>
> You could fail one half of the RAID1, remove it, recreate the partition at
> the required offset, add the new partition into the array and let it
> rebuild. Then repeat with the other half.
>
> However during that process you do not have the redundancy protection, so
> in case the remaining array drive fails or has a bad sector, it could
> become tricky to recover.
>
> Maybe run a bad block scan, or "smartctl -t long" on the disks first. And
> of course have a backup.

Hmm. I hadn't thought of that. Well, of course I thought of the backup; I'm not insane. The boot drives are SSDs, and they are not all that big. The /dev/md2 array is only 88G, so there isn't much exposure to drive failure. Rebuilding won't take long. Thanks.
* Re: Need to move RAID1 with mounted partition
From: Andy Smith @ 2022-04-20 11:08 UTC
To: Linux RAID

Hello,

On Wed, Apr 20, 2022 at 03:40:12AM -0500, Leslie Rhorer wrote:
> The third partition on each drive is assigned as swap, and of
> course it was easy to resize those partitions, leaving an
> additional 512MB between the second and third partitions on each
> drive. All I need to do is move the second partition on each
> drive up by 512MB.

I'd be tempted to just make these two new 512M spaces into new partitions for a RAID-1 and move your /boot to that, abandoning the RAID-1 you have for the /boot that is using the partitions at the start of the disk. What would you lose? A couple of hundred MB? Exchanged for a much easier life.

You could do away with the swap partitions entirely and use swap files instead.

You could recycle the first partitions as swap, too.

Cheers,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting
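The swap-file route Andy mentions takes only a few commands. This is a generic sketch (the path /swapfile and the 2 GiB size are illustrative, not from the thread), with the root-only, disk-writing steps commented out:

```shell
SIZE_MIB=$((2 * 1024))   # a 2 GiB swap file, for illustration
echo "swap file size: ${SIZE_MIB} MiB"
# As root, on an ext4 filesystem:
#   dd if=/dev/zero of=/swapfile bs=1M count=${SIZE_MIB}  # swap files must not be sparse
#   chmod 600 /swapfile
#   mkswap /swapfile
#   swapon /swapfile
# and make it permanent with an /etc/fstab line:
#   /swapfile none swap sw 0 0
```

The non-sparse requirement is why plain dd is used here rather than truncate; as Pascal notes in the next message, not every filesystem supports swap files at all.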
* Re: Need to move RAID1 with mounted partition
From: Pascal Hambourg @ 2022-04-20 12:07 UTC
To: Linux RAID, Leslie Rhorer

On 20/04/2022 at 13:08, Andy Smith wrote:
> On Wed, Apr 20, 2022 at 03:40:12AM -0500, Leslie Rhorer wrote:
>> The third partition on each drive is assigned as swap, and of
>> course it was easy to resize those partitions, leaving an
>> additional 512MB between the second and third partitions on each
>> drive. All I need to do is move the second partition on each
>> drive up by 512MB.
>
> I'd be tempted to just make these two new 512M spaces into new
> partitions for a RAID-1 and move your /boot to that, abandoning the
> RAID-1 you have for the /boot that is using the partitions at the
> start of the disk.

I agree, unless the BIOS cannot read sectors at that offset.

Or you could create a RAID10 array with the 4 partitions if they have similar sizes.

Or you could move /boot back into the / filesystem. In either case, the BIOS restriction applies and you may need to reinstall the boot loader on both drives.

Or you could try to reduce the required space in /boot:
- remove old kernels
- reduce the initramfs size with MODULES=dep instead of MODULES=most in /etc/initramfs-tools/initramfs.conf
- remove plymouth if installed

> You could do away with the swap partitions entirely and use swap
> files instead.

Swap files are an ugly hack, and not all filesystems support them.
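Pascal's MODULES=dep suggestion is a one-line config change plus a rebuild. The sketch below demonstrates the edit on a temporary copy so it can run anywhere; on a real Debian system the file is /etc/initramfs-tools/initramfs.conf and you would follow up, as root, with `update-initramfs -u -k all`:

```shell
# Work on a throwaway copy of the config (the sample contents are illustrative).
conf=$(mktemp)
printf 'MODULES=most\nBUSYBOX=auto\n' > "$conf"
# Switch from shipping most modules to only those the running hardware needs.
sed -i 's/^MODULES=most$/MODULES=dep/' "$conf"
RESULT=$(grep '^MODULES=' "$conf")
echo "$RESULT"
rm -f "$conf"
# On the real system, then regenerate the images and compare sizes:
#   update-initramfs -u -k all
#   ls -lh /boot/initrd.img-*
```

The caveat with MODULES=dep is that the initramfs is tailored to the current hardware, so it may not boot if the disks are later moved to a different machine.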
* Re: Need to move RAID1 with mounted partition
From: Leslie Rhorer @ 2022-04-20 12:24 UTC
To: Pascal Hambourg, Linux RAID

On 4/20/2022 7:07 AM, Pascal Hambourg wrote:
> On 20/04/2022 at 13:08, Andy Smith wrote:
>> On Wed, Apr 20, 2022 at 03:40:12AM -0500, Leslie Rhorer wrote:
>>> The third partition on each drive is assigned as swap, and of
>>> course it was easy to resize those partitions, leaving an
>>> additional 512MB between the second and third partitions on each
>>> drive. All I need to do is move the second partition on each
>>> drive up by 512MB.
>>
>> I'd be tempted to just make these two new 512M spaces into new
>> partitions for a RAID-1 and move your /boot to that, abandoning the
>> RAID-1 you have for the /boot that is using the partitions at the
>> start of the disk.
>
> I agree, unless the BIOS cannot read sectors at that offset.
>
> Or you could create a RAID10 array with the 4 partitions if they have
> similar sizes.

They don't. Not even close.

> Or you could move /boot back into the / filesystem.

I would rather not.

> In either case, the BIOS restriction applies and you may need to
> reinstall the boot loader on both drives.

I did so just for safety. Whether it was actually needed or not, who knows?

> Or you could try to reduce the required space in /boot:
> - remove old kernels

I did that. It wasn't enough. Eventually I just moved the kernel files and created symlinks before completing the upgrade on one system. The other got the resize treatment.

> - reduce initramfs size with MODULES=dep instead of MODULES=most in
> /etc/initramfs-tools/initramfs.conf

I didn't try that. It probably would have worked.

> - remove plymouth if installed

Nope.
* Re: Need to move RAID1 with mounted partition
From: Leslie Rhorer @ 2022-04-20 12:16 UTC
To: Linux RAID

On 4/20/2022 6:08 AM, Andy Smith wrote:
> Hello,
>
> On Wed, Apr 20, 2022 at 03:40:12AM -0500, Leslie Rhorer wrote:
>> The third partition on each drive is assigned as swap, and of
>> course it was easy to resize those partitions, leaving an
>> additional 512MB between the second and third partitions on each
>> drive. All I need to do is move the second partition on each
>> drive up by 512MB.
>
> I'd be tempted to just make these two new 512M spaces into new
> partitions for a RAID-1 and move your /boot to that, abandoning the
> RAID-1 you have for the /boot that is using the partitions at the
> start of the disk. What would you lose? A couple of hundred MB?
> Exchanged for a much easier life.
>
> You could do away with the swap partitions entirely and use swap
> files instead.
>
> You could recycle the first partitions as swap, too.
>
> Cheers,
> Andy

Well, the idea is moot, since I have already moved the partitions. (It took less than 10 minutes.) Of course, I could do as you say, but I don't really see the advantages. How would my life be easier? I looked into swap files rather than partitions some time ago. I don't recall all the reasons, but I decided to keep the swap partitions.
end of thread, other threads: [~2022-04-20 12:24 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-31 16:44 Trying to rescue a RAID-1 array Bruce Korb
2022-03-31 17:06 ` Wols Lists
2022-03-31 18:14 ` Bruce Korb
2022-03-31 21:34 ` Wols Lists
2022-04-01 17:59 ` Bruce Korb
2022-04-01 18:21 ` Bruce Korb
2022-04-01 19:45 ` Wol
2022-04-01 20:23 ` Bruce Korb
2022-04-01 21:02 ` Wol
2022-04-01 21:24 ` Bruce Korb
2022-04-01 21:33 ` Wol
2022-04-20  8:40 ` Need to move RAID1 with mounted partition Leslie Rhorer
2022-04-20  8:55 ` Roman Mamedov
2022-04-20  9:21 ` Leslie Rhorer
2022-04-20 11:08 ` Andy Smith
2022-04-20 12:07 ` Pascal Hambourg
2022-04-20 12:24 ` Leslie Rhorer
2022-04-20 12:16 ` Leslie Rhorer