* “root account locked” after removing one RAID1 hard disc
@ 2020-11-30  8:44 c.buhtz
  2020-11-30  9:27 ` antlists
  2020-11-30 20:05 ` partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc") David T-G
  0 siblings, 2 replies; 31+ messages in thread
From: c.buhtz @ 2020-11-30  8:44 UTC (permalink / raw)
  To: linux-raid

X-Post: https://serverfault.com/q/1044339/374973

I tried this out in a virtual machine, hoping I can learn something.

**Problem**

The RAID1 does not contain any system-relevant data - the OS is on
another drive. My Debian 10 does not boot anymore; it drops me into
emergency mode and tells me "Cannot open access to console, the root
account is locked." I had removed one of the two RAID1 discs beforehand.

And systemd tells me while booting "A start job is running for 
/dev/md127".

**Details**

The virtual machine contains three hard disks. /dev/sda1 uses the full
size of the disc and contains the Debian 10 installation. /dev/sdb and
/dev/sdc (as discs without partitions) are configured as RAID1 /dev/md127,
formatted with ext4 and mounted to /Daten. I can read and write to the
RAID without any problems.

I did a regular shutdown and then removed /dev/sdc. After that the system
does not boot anymore and shows me the error about the locked root account.

**Question 1**

Why is the system so sensitive about one RAID device that does not
contain data essential for the boot process? I would understand if there
were an error message somewhere, but blocking the whole boot process is
too much in my understanding.

**Question 2**

I read that a single RAID1 device (the second is missing) can be 
accessed without any problems. How can I do that?

**More details**

Here is the output of my fdisk -l. Interesting here is that /dev/md127
is shown, but without its filesystem.

> Disk /dev/sda: 128 GiB, 137438953472 bytes, 268435456 sectors
> Disk model: VBOX HARDDISK
> Disklabel type: dos
> Disk identifier: 0xe3add51d
> 
> Device     Boot Start       End   Sectors  Size Id Type
> /dev/sda1  *     2048 266338303 266336256  127G 83 Linux
> 
> 
> Disk /dev/sdb: 8 GiB, 8589934592 bytes, 16777216 sectors
> Disk model: VBOX HARDDISK
> 
> Disk /dev/sdc: 8 GiB, 8589934592 bytes, 16777216 sectors
> Disk model: VBOX HARDDISK
> 
> Disk /dev/md127: 8 GiB, 8580497408 bytes, 16758784 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes

Here is mount output:

> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> /dev/sda1 on / type ext4 (rw,relatime,errors=remount-ro)
> /dev/md127 on /Daten type ext4 (rw,relatime)

This is /etc/fstab:

> # <file system> <mount point>   <type>  <options>       <dump>  <pass>
> # / was on /dev/sda1 during installation
> UUID=65ec95df-f83f-454e-b7bd-7008d8055d23 /               ext4    
> errors=remount-ro 0       1
> 
> /dev/md127  /Daten      ext4    defaults    0   0



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30  8:44 “root account locked” after removing one RAID1 hard disc c.buhtz
@ 2020-11-30  9:27 ` antlists
  2020-11-30 10:29   ` c.buhtz
  2020-11-30 10:31   ` Reindl Harald
  2020-11-30 20:05 ` partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc") David T-G
  1 sibling, 2 replies; 31+ messages in thread
From: antlists @ 2020-11-30  9:27 UTC (permalink / raw)
  To: linux-raid

On 30/11/2020 08:44, c.buhtz@posteo.jp wrote:
> X-Post: https://serverfault.com/q/1044339/374973
> 
> I tried this out in a virtual machine, hoping I can learn something.
> 
> **Problem**
> 
> The RAID1 does not contain any system-relevant data - the OS is on
> another drive. My Debian 10 does not boot anymore; it drops me into
> emergency mode and tells me "Cannot open access to console, the root
> account is locked." I had removed one of the two RAID1 discs beforehand.

I don't think this is specific to raid ...
> 
> And systemd tells me while booting "A start job is running for /dev/md127".
> 
> **Details**
> 
> The virtual machine contains three hard disks. /dev/sda1 uses the full
> size of the disc and contains the Debian 10 installation. /dev/sdb and
> /dev/sdc (as discs without partitions) are configured as RAID1 /dev/md127,
> formatted with ext4 and mounted to /Daten. I can read and write to the
> RAID without any problems.
> 
> I did a regular shutdown and then removed /dev/sdc. After that the system
> does not boot anymore and shows me the error about the locked root account.
> 
> **Question 1**
> 
> Why is the system so sensitive about one RAID device that does not
> contain data essential for the boot process? I would understand if there
> were an error message somewhere, but blocking the whole boot process is
> too much in my understanding.

It's not. It's sensitive to the fact that ANY disk is missing.
> 
> **Question 2**
> 
> I read that a single RAID1 device (the second is missing) can be 
> accessed without any problems. How can I do that?

When a component of a raid disappears without warning, the raid will 
refuse to assemble properly on next boot. You need to get at a command 
line and force-assemble it.
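
For illustration only, a minimal sketch of what that manual step can look
like from the initramfs/rescue shell, assuming the OP's device names
(/dev/md127 built from /dev/sdb, with /dev/sdc now absent); adjust the
names to your own setup:

  # stop the half-assembled, inactive array (if mdadm already grabbed it)
  mdadm --stop /dev/md127

  # force assembly from the single remaining member and start it degraded
  mdadm --assemble --force --run /dev/md127 /dev/sdb

  # check that it is active (degraded), then mount it
  cat /proc/mdstat
  mount /dev/md127 /Daten

Once the array has been started degraded once, the superblock reflects
the new state and later boots assemble it without manual help.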
> 
> **More details**
> 
> Here is the output of my fdisk -l. Interesting here is that /dev/md127
> is shown, but without its filesystem.
> 
>> Disk /dev/sda: 128 GiB, 137438953472 bytes, 268435456 sectors
>> Disk model: VBOX HARDDISK
>> Disklabel type: dos
>> Disk identifier: 0xe3add51d
>>
>> Device     Boot Start       End   Sectors  Size Id Type
>> /dev/sda1  *     2048 266338303 266336256  127G 83 Linux
>>
>>
>> Disk /dev/sdb: 8 GiB, 8589934592 bytes, 16777216 sectors
>> Disk model: VBOX HARDDISK
>>
>> Disk /dev/sdc: 8 GiB, 8589934592 bytes, 16777216 sectors
>> Disk model: VBOX HARDDISK
>>
>> Disk /dev/md127: 8 GiB, 8580497408 bytes, 16758784 sectors
>> Units: sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 512 bytes
>> I/O size (minimum/optimal): 512 bytes / 512 bytes
> 
> Here is mount output:
> 
>> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
>> /dev/sda1 on / type ext4 (rw,relatime,errors=remount-ro)
>> /dev/md127 on /Daten type ext4 (rw,relatime)

And here is at least part of your problem. If the mount fails, systemd 
will halt and chuck you into a recovery console. I had exactly the same 
problem with an NTFS partition on a dual-boot system.
> 
> This is /etc/fstab:
> 
>> # <file system> <mount point>   <type>  <options>       <dump>  <pass>
>> # / was on /dev/sda1 during installation
>> UUID=65ec95df-f83f-454e-b7bd-7008d8055d23 /               ext4 
>> errors=remount-ro 0       1
>>
>> /dev/md127  /Daten      ext4    defaults    0   0
> 
> 
Is root's home on /Daten? It shouldn't be.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30  9:27 ` antlists
@ 2020-11-30 10:29   ` c.buhtz
  2020-11-30 11:40     ` Wols Lists
  2020-11-30 10:31   ` Reindl Harald
  1 sibling, 1 reply; 31+ messages in thread
From: c.buhtz @ 2020-11-30 10:29 UTC (permalink / raw)
  To: linux-raid

Thanks for your answer. It tells me that the observed behaviour is
usual, even though I think it should not be. ;)

Am 30.11.2020 10:27 schrieb antlists:
>> Why is the system so sensitive about one RAID device that does not
>> contain data essential for the boot process? I would understand if
>> there were an error message somewhere, but blocking the whole boot
>> process is too much in my understanding.
> 
> It's not. It's sensitive to the fact that ANY disk is missing.

But the system does not need this disc to boot or to run. This IMO 
should not happen.

> When a component of a raid disappears without warning, the raid will
> refuse to assemble properly on next boot. You need to get at a command
> line and force-assemble it.

This is logical to me.

>>> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
>>> /dev/sda1 on / type ext4 (rw,relatime,errors=remount-ro)
>>> /dev/md127 on /Daten type ext4 (rw,relatime)
> 
> And here is at least part of your problem. If the mount fails, systemd
> will halt and chuck you into a recovery console.

btw: I am not able to open the recovery console. I am not able to enter 
the shell.
In the Unix/Linux world all things have reasons - even if I do not know
or understand all of them.
But this is systemd. ;)

I see no reason to stop the boot process just because an unneeded
data-only partition/drive is not available.

> Is root's home on /Daten? It shouldn't be.

No, it is not. As you can see in the second line of the fstab, / (and
all its sub-content like home dirs, boot, etc.) is on /dev/sda1. The RAID
is built of sdb and sdc.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30  9:27 ` antlists
  2020-11-30 10:29   ` c.buhtz
@ 2020-11-30 10:31   ` Reindl Harald
  2020-11-30 11:10     ` Rudy Zijlstra
  2020-11-30 12:00     ` “root account locked” after removing one RAID1 hard disc Wols Lists
  1 sibling, 2 replies; 31+ messages in thread
From: Reindl Harald @ 2020-11-30 10:31 UTC (permalink / raw)
  To: antlists, linux-raid



Am 30.11.20 um 10:27 schrieb antlists:
>> I read that a single RAID1 device (the second is missing) can be 
>> accessed without any problems. How can I do that?
> 
> When a component of a raid disappears without warning, the raid will 
> refuse to assemble properly on next boot. You need to get at a command 
> line and force-assemble it

since when is it broken that way?

where should that command line come from when the operating system
itself is on the RAID that is, for no valid reason, not assembling?

luckily no disks have died in the past few years, but on the office
server 300 kilometers from here, with /boot, OS and /data on RAID1, this
was not true for at least 10 years:

* disk died
* boss replaced it and made sure
   the remaining disk is on the first
   SATA port
* power on
* machine booted
* I partitioned and added the new drive

hell, it's an ordinary situation for a RAID that a disk disappears
without warning, because disks tend to die from one moment to the next

hell, it's expected behavior to boot from the remaining disks, no matter
whether RAID1, RAID10 or RAID5, as long as enough are present for the
whole dataset

the only thing I expect in that case is that it takes a little longer to
boot while something waits for a timeout on the missing device /
component



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30 10:31   ` Reindl Harald
@ 2020-11-30 11:10     ` Rudy Zijlstra
  2020-11-30 11:18       ` Reindl Harald
  2020-11-30 12:00     ` “root account locked” after removing one RAID1 hard disc Wols Lists
  1 sibling, 1 reply; 31+ messages in thread
From: Rudy Zijlstra @ 2020-11-30 11:10 UTC (permalink / raw)
  To: Reindl Harald, antlists, linux-raid



On 30-11-2020 11:31, Reindl Harald wrote:
>
>
> Am 30.11.20 um 10:27 schrieb antlists:
>>> I read that a single RAID1 device (the second is missing) can be 
>>> accessed without any problems. How can I do that?
>>
>> When a component of a raid disappears without warning, the raid will 
>> refuse to assemble properly on next boot. You need to get at a 
>> command line and force-assemble it
>
> since when is it broken that way?
>
> where should that command line come from when the operating system
> itself is on the RAID that is, for no valid reason, not assembling?
>
> luckily the past few years no disks died but on the office server 300 
> kilometers from here with /boot, os and /data on RAID1 this was not 
> true at least 10 years
>
> * disk died
> * boss replaced it and made sure
>   the remaining is on the first SATA
>   port
> * power on
> * machine booted
> * me partitioned and added the new drive
>
> hell it's and ordinary situation for a RAID that a disk disappears 
> without warning because they tend to die from one moment to the next
>
> hell it's expected behavior to boot from the remaining disks, no 
> matter RAID1, RAID10, RAID5 as long as there are enough present for 
> the whole dataset
>
> the only thing I expect in that case is that it takes a little longer
> to boot while something waits for a timeout on the missing device /
> component
>
>
The behavior here in the post is rather Debian-specific. The initrd from
Debian refuses to continue if it cannot get all partitions mentioned in
the fstab. On top of that, I suspect an error in the initrd that the OP
is using, which leads to the raid not coming up with a single disk.

The problems of the OP have IMHO not much to do with raid, and a lot with
Debian-specific issues, or perhaps a mistake by the OP.

Cheers

Rudy

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30 11:10     ` Rudy Zijlstra
@ 2020-11-30 11:18       ` Reindl Harald
  2020-11-30 20:06         ` ???root account locked??? " David T-G
  0 siblings, 1 reply; 31+ messages in thread
From: Reindl Harald @ 2020-11-30 11:18 UTC (permalink / raw)
  To: Rudy Zijlstra, antlists, linux-raid


Am 30.11.20 um 12:10 schrieb Rudy Zijlstra:
> 
> On 30-11-2020 11:31, Reindl Harald wrote:
>>
>>
>> Am 30.11.20 um 10:27 schrieb antlists:
>>>> I read that a single RAID1 device (the second is missing) can be 
>>>> accessed without any problems. How can I do that?
>>>
>>> When a component of a raid disappears without warning, the raid will 
>>> refuse to assemble properly on next boot. You need to get at a 
>>> command line and force-assemble it
>>
>> since when is it broken that way?
>>
>> where should that command line come from when the operating system
>> itself is on the RAID that is, for no valid reason, not assembling?
>>
>> luckily the past few years no disks died but on the office server 300 
>> kilometers from here with /boot, os and /data on RAID1 this was not 
>> true at least 10 years
>>
>> * disk died
>> * boss replaced it and made sure
>>   the remaining is on the first SATA
>>   port
>> * power on
>> * machine booted
>> * me partitioned and added the new drive
>>
>> hell it's and ordinary situation for a RAID that a disk disappears 
>> without warning because they tend to die from one moment to the next
>>
>> hell it's expected behavior to boot from the remaining disks, no 
>> matter RAID1, RAID10, RAID5 as long as there are enough present for 
>> the whole dataset
>>
>> the only thing I expect in that case is that it takes a little longer
>> to boot while something waits for a timeout on the missing device /
>> component
>>
>>
> The behavior here in the post is rather debian specific. The initrd from 
> debian refuses to continue  if it cannot get all partitions mentioned in 
> the fstab. 

that is normal behavior, but it doesn't apply to a RAID with a missing
device; that's what the R in RAID is about :-)

> On top i suspect an error in the initrd that the OP is using 
> which leads to the raid not coming up with a single disk.
> 
> The problems from the OP have imho not much to do with raid, and a lot 
> with debian specific issues/perhaps a mistake from the OP

good to know; on Fedora I am used to not caring about missing RAID
devices as long as there are enough remaining

there is some timeout which makes the boot take longer than usual, but in
the end the machines come up as usual; mdmonitor fires off a mail whining
about the degraded RAID and that's it

that behavior makes the difference between a trained monkey replacing the
dead disk with the rest done by me via SSH, and real trouble needing
physical presence

typically I fire up my "raid-repair.sh", telling the script the source
and target disk for cloning the partition table and MBR, and finally it
adds the new partitions to start the rebuild

[root@srv-rhsoft:~]$ df
Filesystem     Type Size  Used    Avail Use%  Mounted on
/dev/md1       ext4   29G    7,8G   21G   28% /
/dev/md2       ext4  3,6T    1,2T  2,4T   34% /mnt/data
/dev/md0       ext4  485M     48M  433M   10% /boot

[root@srv-rhsoft:~]$ cat /proc/mdstat
Personalities : [raid10] [raid1]
md1 : active raid10 sdc2[6] sdd2[5] sdb2[7] sda2[4]
       30716928 blocks super 1.1 256K chunks 2 near-copies [4/4] [UUUU]
       bitmap: 0/1 pages [0KB], 65536KB chunk

md2 : active raid10 sdd3[5] sdb3[7] sdc3[6] sda3[4]
       3875222528 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
       bitmap: 2/29 pages [8KB], 65536KB chunk

md0 : active raid1 sdc1[6] sdd1[5] sdb1[7] sda1[4]
       511988 blocks super 1.0 [4/4] [UUUU]

unused devices: <none>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30 10:29   ` c.buhtz
@ 2020-11-30 11:40     ` Wols Lists
  0 siblings, 0 replies; 31+ messages in thread
From: Wols Lists @ 2020-11-30 11:40 UTC (permalink / raw)
  To: c.buhtz, linux-raid

On 30/11/20 10:29, c.buhtz@posteo.jp wrote:
>> And here is at least part of your problem. If the mount fails, systemd
>> will halt and chuck you into a recovery console.
> 
> btw: I am not able to open the recovery console. I am not able to enter
> the shell.
> In the Unix/Linux world all things have reasons - no matter that I do
> not know or understand all of them.
> But this is systemd. ;)
> 
> I see no reason to stop the boot process just because an unneeded
> data-only partition/drive is not available.

You haven't told systemd it's not needed. How else is it supposed to know?

There's some option you put in fstab which says "don't worry if you
can't mount this disk". My NTFS drive wasn't needed, but it still caused
the boot to crash ...
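
The fstab option Wol presumably means is "nofail" (optionally combined
with a shorter device timeout). A minimal sketch of the OP's /Daten line
with it, purely as an illustration:

  # mark /Daten as non-essential so a missing array does not abort the
  # boot; stop waiting for the device after 10 seconds
  /dev/md127  /Daten  ext4  defaults,nofail,x-systemd.device-timeout=10s  0  0

With nofail the mount is only "wanted" rather than "required" by
local-fs.target, so a missing /dev/md127 results in a failed or skipped
mount instead of the boot dropping into emergency mode.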

Cheers,
Wol

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30 10:31   ` Reindl Harald
  2020-11-30 11:10     ` Rudy Zijlstra
@ 2020-11-30 12:00     ` Wols Lists
  2020-11-30 12:13       ` Reindl Harald
  1 sibling, 1 reply; 31+ messages in thread
From: Wols Lists @ 2020-11-30 12:00 UTC (permalink / raw)
  To: Reindl Harald, linux-raid

On 30/11/20 10:31, Reindl Harald wrote:
> since when is it broken that way?
> 
> where should that command line come from when the operating system
> itself is on the RAID that is, for no valid reason, not assembling?
> 
> luckily the past few years no disks died but on the office server 300
> kilometers from here with /boot, os and /data on RAID1 this was not true
> at least 10 years
> 
> * disk died
> * boss replaced it and made sure
>   the remaining is on the first SATA
>   port
> * power on
> * machine booted
> * me partitioned and added the new drive
> 
> hell it's and ordinary situation for a RAID that a disk disappears
> without warning because they tend to die from one moment to the next
> 
> hell it's expected behavior to boot from the remaining disks, no matter
> RAID1, RAID10, RAID5 as long as there are enough present for the whole
> dataset
> 
> the only thing I expect in that case is that it takes a little longer to
> boot while something waits for a timeout on the missing device /
> component
> 
So what happened? The disk failed, you shut down the server, the boss
replaced it, and you rebooted?

In that case I would EXPECT the system to come back - the superblock
matches the disks, the system says "everything is as it was", and your
degraded array boots fine.

EXCEPT THAT'S NOT WHAT IS HAPPENING HERE.

The - fully functional - array is shut down.

A disk is removed.

On boot, reality and the superblock DISAGREE. In which case the system
takes the only sensible route, screams "help!", and waits for MANUAL
INTERVENTION.

That's why you only have to force a degraded array to boot once - once
the disks and superblock are back in sync, the system assumes the ops
know about it.
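
A quick way to see that disagreement for yourself (a sketch, not taken
from the thread; device names are the OP's): compare what each member's
superblock recorded with what the array currently looks like.

  # metadata on the remaining member: event count and expected members
  mdadm --examine /dev/sdb | grep -E 'Events|Array State'

  # what the (possibly inactive) assembled array thinks
  mdadm --detail /dev/md127

If they no longer agree with reality, one forced assembly (as described
earlier in the thread) rewrites the superblock and later boots proceed
normally.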

Cheers,
Wol

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30 12:00     ` “root account locked” after removing one RAID1 hard disc Wols Lists
@ 2020-11-30 12:13       ` Reindl Harald
  2020-11-30 13:11         ` antlists
  0 siblings, 1 reply; 31+ messages in thread
From: Reindl Harald @ 2020-11-30 12:13 UTC (permalink / raw)
  To: Wols Lists, linux-raid



Am 30.11.20 um 13:00 schrieb Wols Lists:
> On 30/11/20 10:31, Reindl Harald wrote:
>> since when is it broken that way?
>>
>> where should that command line come from when the operating system
>> itself is on the RAID that is, for no valid reason, not assembling?
>>
>> luckily the past few years no disks died but on the office server 300
>> kilometers from here with /boot, os and /data on RAID1 this was not true
>> at least 10 years
>>
>> * disk died
>> * boss replaced it and made sure
>>    the remaining is on the first SATA
>>    port
>> * power on
>> * machine booted
>> * me partitioned and added the new drive
>>
>> hell it's and ordinary situation for a RAID that a disk disappears
>> without warning because they tend to die from one moment to the next
>>
>> hell it's expected behavior to boot from the remaining disks, no matter
>> RAID1, RAID10, RAID5 as long as there are enough present for the whole
>> dataset
>>
>> the only thing I expect in that case is that it takes a little longer to
>> boot while something waits for a timeout on the missing device /
>> component
>>
> So what happened? The disk failed, you shut down the server, the boss
> replaced it, and you rebooted?

in most cases smartd shouts a warning and the machine is powered down
*without* removing the partitions from the RAID devices

the disk with SMART alerts is replaced by a blank, unpartitioned one

the remaining disk is made sure to be on the first SATA port so that the
first disk found by the BIOS is not the new blank one

> In that case I would EXPECT the system to come back - the superblock
> matches the disks, the system says "everything is as it was", and your
> degraded array boots fine.

correct, RAID comes up degraded

> EXCEPT THAT'S NOT WHAT IS HAPPENING HERE.
> 
> The - fully functional - array is shut down.
> 
> A disk is removed.
> 
> On boot, reality and the superblock DISAGREE. In which case the system
> takes the only sensible route, screams "help!", and waits for MANUAL
> INTERVENTION.

but I fail to see the difference and to understand why reality and the
superblock disagree; it shouldn't matter how and when a disk is removed -
it's not there, so what, as long as there are enough disks to bring the
array up

in my case the fully functional array is shut down too, by shutting down
the machine; after that one disk is replaced, and when the RAID comes up
a disk is logically missing because in its place is a blank one without
any partitions

> That's why you only have to force a degraded array to boot once - once
> the disks and superblock are back in sync, the system assumes the ops
> know about it.
still don't get how that happens and why

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30 12:13       ` Reindl Harald
@ 2020-11-30 13:11         ` antlists
  2020-11-30 13:16           ` Reindl Harald
  0 siblings, 1 reply; 31+ messages in thread
From: antlists @ 2020-11-30 13:11 UTC (permalink / raw)
  To: Reindl Harald, linux-raid

On 30/11/2020 12:13, Reindl Harald wrote:
> 
> 
> Am 30.11.20 um 13:00 schrieb Wols Lists:
>> On 30/11/20 10:31, Reindl Harald wrote:
>>> since when is it broken that way?
>>>
>>> where should that command line come from when the operating system
>>> itself is on the RAID that is, for no valid reason, not assembling?
>>>
>>> luckily the past few years no disks died but on the office server 300
>>> kilometers from here with /boot, os and /data on RAID1 this was not true
>>> at least 10 years
>>>
>>> * disk died
>>> * boss replaced it and made sure
>>>    the remaining is on the first SATA
>>>    port
>>> * power on
>>> * machine booted
>>> * me partitioned and added the new drive
>>>
>>> hell it's and ordinary situation for a RAID that a disk disappears
>>> without warning because they tend to die from one moment to the next
>>>
>>> hell it's expected behavior to boot from the remaining disks, no matter
>>> RAID1, RAID10, RAID5 as long as there are enough present for the whole
>>> dataset
>>>
>>> the only thing I expect in that case is that it takes a little longer to
>>> boot while something waits for a timeout on the missing device /
>>> component
>>>
>> So what happened? The disk failed, you shut down the server, the boss
>> replaced it, and you rebooted?
> 
> in most cases smartd shouts a warning and the machine is powered down
> *without* removing the partitions from the RAID devices

And? The partitions have nothing to do with it.

The disk failed, the system was shut down, THE SUPERBLOCK WAS UPDATED!
> 
> the disk with SMART alerts is replaced by a blank, unpartitioned one
> 
> the remaining disk is made to be sure on the first SATA so that the 
> first disk found by the BIOS is not the new blank one
> 
>> In that case I would EXPECT the system to come back - the superblock
>> matches the disks, the system says "everything is as it was", and your
>> degraded array boots fine.
> 
> correct, RAID comes up degraded
> 
>> EXCEPT THAT'S NOT WHAT IS HAPPENING HERE.
>>
>> The - fully functional - array is shut down.
>>
>> A disk is removed.
>>
>> On boot, reality and the superblock DISAGREE. In which case the system
>> takes the only sensible route, screams "help!", and waits for MANUAL
>> INTERVENTION.
> 
> but i fail to see the difference and to understand why reality and 
> superblock disagree,

In YOUR case the array was degraded BEFORE shutdown. In the OP's case, 
the array was degraded AFTER shutdown.

> it shouldn't matter how and when a disk is removed, 
> it's not there, so what as long as there are enough disks to bring the 
> array up

FFS - how on earth is the system supposed to update the superblock if
it's SWITCHED OFF!?!?
> 
> in my case the fully functional array is shutdown too by shutdown the 
> machine and after that one disk is replaced and when the RAID comes up 
> there is a disk logically missing because on it's place is a blank one 
> without any partitions
> 
>> That's why you only have to force a degraded array to boot once - once
>> the disks and superblock are back in sync, the system assumes the ops
>> know about it.
> still don't get how that happens and why

Just ask yourself this simple question. "Did the array change state 
BETWEEN SHUTDOWN AND BOOT?". In *your* case the answer is "no", in the 
OP's case it is "yes". And THAT is what matters - if the array is 
degraded at boot, but was fully functional at shutdown, the raid system 
screams for help.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30 13:11         ` antlists
@ 2020-11-30 13:16           ` Reindl Harald
  2020-11-30 13:47             ` antlists
  0 siblings, 1 reply; 31+ messages in thread
From: Reindl Harald @ 2020-11-30 13:16 UTC (permalink / raw)
  To: antlists, linux-raid



Am 30.11.20 um 14:11 schrieb antlists:
> On 30/11/2020 12:13, Reindl Harald wrote:
>> but i fail to see the difference and to understand why reality and 
>> superblock disagree,
> 
> In YOUR case the array was degraded BEFORE shutdown. In the OP's case, 
> the array was degraded AFTER shutdown

no, no and no again!

* the array is fully operational
* smartd fires a warning
* the machine is shut down
* after that the drive is replaced
* so the array gets degraded AFTER shutdown
* at power-on, RAID partitions are missing


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30 13:16           ` Reindl Harald
@ 2020-11-30 13:47             ` antlists
  2020-11-30 13:53               ` Reindl Harald
  0 siblings, 1 reply; 31+ messages in thread
From: antlists @ 2020-11-30 13:47 UTC (permalink / raw)
  To: Reindl Harald, linux-raid

On 30/11/2020 13:16, Reindl Harald wrote:
> 
> 
> Am 30.11.20 um 14:11 schrieb antlists:
>> On 30/11/2020 12:13, Reindl Harald wrote:
>>> but i fail to see the difference and to understand why reality and 
>>> superblock disagree,
>>
>> In YOUR case the array was degraded BEFORE shutdown. In the OP's case, 
>> the array was degraded AFTER shutdown
> 
> no, no and no again!
> 
> * the array is fully operational
> * smartd fires a warning

Ahhh ... but you said in your previous post(s) "the disk died". Not that 
it was just a warning.

> * the machine is shut down
> * after that the drive is replaced
> * so the array get degraded AFTER shutdown
> * at power-on RAID partitions are missing
> 
But we've had a post in the last week or so from someone whose array
behaved exactly as I described. So I wonder what's going on ...

I need to get my test system up so I can play with these sorts of things...

Cheers,
Wol

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30 13:47             ` antlists
@ 2020-11-30 13:53               ` Reindl Harald
  2020-11-30 14:46                 ` Rudy Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Reindl Harald @ 2020-11-30 13:53 UTC (permalink / raw)
  To: antlists, linux-raid



Am 30.11.20 um 14:47 schrieb antlists:
> On 30/11/2020 13:16, Reindl Harald wrote:
>>
>>
>> Am 30.11.20 um 14:11 schrieb antlists:
>>> On 30/11/2020 12:13, Reindl Harald wrote:
>>>> but i fail to see the difference and to understand why reality and 
>>>> superblock disagree,
>>>
>>> In YOUR case the array was degraded BEFORE shutdown. In the OP's 
>>> case, the array was degraded AFTER shutdown
>>
>> no, no and no again!
>>
>> * the array is fully operational
>> * smartd fires a warning
> 
> Ahhh ... but you said in your previous post(s) "the disk died". Not that 
> it was just a warning.
> 
>> * the machine is shut down
>> * after that the drive is replaced
>> * so the array get degraded AFTER shutdown
>> * at power-on RAID partitions are missing
>>
> But we've had a post in the last week or so of someone who's array 
> behaved exactly as I described. So I wonder what's going on ...
> 
> I need to get my test system up so I can play with these sort of things...

and that's why I asked since when it is broken that way

I expect a RAID to simply come up as if nothing happened, as long as
there are enough disks remaining to have the complete dataset

it's also not uncommon that a disk dies between power cycles, i.e.
simply doesn't come up again, which is the same as replacing it while the
machine is powered off

I replaced a ton of disks in Linux RAID1/RAID10 setups over the years
that way, and in some cases I cloned machines by putting 2 out of 4
RAID10 disks in a new machine and inserting 2 blank disks in both:

* spread the disks between both machines
* power on
* login via SSH
* start rebuild the array on both
* change hostname and network config of one

for me it's an ordinary event that a RAID has to cope with, without
interaction, *before* booting to a normal OS state - and if it doesn't,
it's a serious bug





^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: “root account locked” after removing one RAID1 hard disc
  2020-11-30 13:53               ` Reindl Harald
@ 2020-11-30 14:46                 ` Rudy Zijlstra
  0 siblings, 0 replies; 31+ messages in thread
From: Rudy Zijlstra @ 2020-11-30 14:46 UTC (permalink / raw)
  To: Reindl Harald, antlists, linux-raid



On 30-11-2020 14:53, Reindl Harald wrote:
>
>
> Am 30.11.20 um 14:47 schrieb antlists:
>> On 30/11/2020 13:16, Reindl Harald wrote:
>>>
>>>
>>> Am 30.11.20 um 14:11 schrieb antlists:
>>>> On 30/11/2020 12:13, Reindl Harald wrote:
>>>>> but i fail to see the difference and to understand why reality and 
>>>>> superblock disagree,
>>>>
>>>> In YOUR case the array was degraded BEFORE shutdown. In the OP's 
>>>> case, the array was degraded AFTER shutdown
>>>
>>> no, no and no again!
>>>
>>> * the array is fully operational
>>> * smartd fires a warning
>>
>> Ahhh ... but you said in your previous post(s) "the disk died". Not 
>> that it was just a warning.
>>
>>> * the machine is shut down
>>> * after that the drive is replaced
>>> * so the array get degraded AFTER shutdown
>>> * at power-on RAID partitions are missing
>>>
>> But we've had a post in the last week or so of someone who's array 
>> behaved exactly as I described. So I wonder what's going on ...
>>
>> I need to get my test system up so I can play with these sort of 
>> things...
>
> and that's why i asked since when it's that broken
>
> I expect a RAID to simply come up as if nothing happened, as long as
> there are enough disks remaining to have the complete dataset
>
> it's also not uncommon that a disk dies between power-cycles aka 
> simply don't come up again which is the same as replace it when the 
> machine is powered off
>
> i replaced a ton of disks in Linux RAID1/RAID10 setups over the years 
> that way and in some cases i cloed machines by put 2 out of 4 RAID10 
> disks in a new machine and insert 2 blank disks in both
>
> * spread the disks between both machines
> * power on
> * login via SSH
> * start rebuild the array on both
> * change hostname and network config of one
>
> for me it's an ordinary event a RAID has to cope without interaction 
> *before* boot to a normal OS state and if it doesn't it's a serious bug
>
Same thing here...

which is also why I am saying that, in addition to the normal behavior
of the Debian initrd, I think the OP has made another mistake. This
should just work.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-11-30  8:44 “root account locked” after removing one RAID1 hard disc c.buhtz
  2020-11-30  9:27 ` antlists
@ 2020-11-30 20:05 ` David T-G
  2020-11-30 20:51   ` antlists
                     ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: David T-G @ 2020-11-30 20:05 UTC (permalink / raw)
  To: linux-raid

Hello!

...and then c.buhtz@posteo.jp said...
% 
...
% /dev/sdc (as discs without partitions) are configured as RAID1

This is fine, although it makes me queasy; I always create a partition
table and use a partition as a RAID device, and I leave a sliver at the
back end to hold useful information when the array is completely toasted.
But that's just me :-)
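
For what it's worth, a sketch of that layout (not from the thread; sgdisk,
the 100 MiB sliver size and the device names are only examples, and this
is for blank discs - it destroys whatever is on them):

  # one big RAID partition per disc, leaving ~100 MiB free at the end
  sgdisk --new=1:0:-100M --typecode=1:FD00 /dev/sdb
  sgdisk --new=1:0:-100M --typecode=1:FD00 /dev/sdc

  # build the mirror from the partitions instead of the bare discs
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1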


% /dev/md127 and formatted with ext4 and mounted to /Daten. I can read
% and write without any problems to the RAID.

On the other hand, this part is interesting ...


% 
...
% 
% Here is the output of my fdisk -l. Interesting here is that
% /dev/md127 is shown but without its filesystem.
% 
...
% >Disk /dev/md127: 8 GiB, 8580497408 bytes, 16758784 sectors
% >Units: sectors of 1 * 512 = 512 bytes
% >Sector size (logical/physical): 512 bytes / 512 bytes
% >I/O size (minimum/optimal): 512 bytes / 512 bytes
% 
...
% >/dev/md127 on /Daten type ext4 (rw,relatime)
% 
...
% >UUID=65ec95df-f83f-454e-b7bd-7008d8055d23 /               ext4
% >errors=remount-ro 0       1
% >
% >/dev/md127  /Daten      ext4    defaults    0   0

You don't see any "filesystem" or, more correctly, partition in your

  fdisk -l

output because you have apparently created your filesystem on the entire
device (hey, I didn't know one could do that!).  That conclusion is
supported by your mount point (/dev/md127 rather than /dev/md127p1 or
similar) and your fstab entry (same).

So the display isn't interesting, although the logic behind that approach
certainly is to me.
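
A quick way to confirm that, as a sketch rather than something from the
thread (both tools read the signature directly off the md device):

  # blkid prints the ext4 signature sitting on the whole md device
  blkid /dev/md127

  # lsblk -f shows the same thing in the device tree
  lsblk -f /dev/md127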


HTH & HAND

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ???root account locked??? after removing one RAID1 hard disc
  2020-11-30 11:18       ` Reindl Harald
@ 2020-11-30 20:06         ` David T-G
  2020-11-30 21:57           ` Reindl Harald
  0 siblings, 1 reply; 31+ messages in thread
From: David T-G @ 2020-11-30 20:06 UTC (permalink / raw)
  To: linux-raid

Reindl, et al --

...and then Reindl Harald said...
% 
...
% 
% typically fire up my "raid-repair.sh" telling the script source and
% target disk for cloning partition table, mbr and finally add the new
% partitions to start the rebuild
[snip]

Oooh!  How handy :-)  Share, please!


HAND

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-11-30 20:05 ` partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc") David T-G
@ 2020-11-30 20:51   ` antlists
  2020-11-30 21:03     ` Rudy Zijlstra
                       ` (3 more replies)
  2020-12-01  8:41   ` buhtz
  2020-12-01  8:42   ` c.buhtz
  2 siblings, 4 replies; 31+ messages in thread
From: antlists @ 2020-11-30 20:51 UTC (permalink / raw)
  To: David T-G, linux-raid

On 30/11/2020 20:05, David T-G wrote:
> You don't see any "filesystem" or, more correctly, partition in your
> 
>    fdisk -l
> 
> output because you have apparently created your filesystem on the entire
> device (hey, I didn't know one could do that!). 

That, actually, is the norm. It is NOT normal to partition a raid array.

It's also not usual (which the OP has done) to create a raid array on 
top of raw devices rather than partitions - although this is down to the 
fact that various *other* utilities seem to assume that an unpartitioned 
device is free space that can be trampled on. Every now and then people 
seem to lose their arrays because an MBR or GPT has mysteriously 
appeared on the disk.

> That conclusion is
> supported by your mount point (/dev/md127 rather than /dev/md127p1 or
> similar) and your fstab entry (same).
> 
> So the display isn't interesting, although the logic behind that approach
> certainly is to me.

Your approach seems to be at odds with *normal* practice, although there 
is nothing wrong with it. At the end of the day, as far as linux is 
concerned, one block device is much the same as any other.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-11-30 20:51   ` antlists
@ 2020-11-30 21:03     ` Rudy Zijlstra
  2020-11-30 21:49     ` Reindl Harald
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Rudy Zijlstra @ 2020-11-30 21:03 UTC (permalink / raw)
  To: antlists, David T-G, linux-raid



Op 30-11-20 om 21:51 schreef antlists:
> On 30/11/2020 20:05, David T-G wrote:
>> You don't see any "filesystem" or, more correctly, partition in your
>>
>>    fdisk -l
>>
>> output because you have apparently created your filesystem on the entire
>> device (hey, I didn't know one could do that!). 
>
> That, actually, is the norm. It is NOT normal to partition a raid array.
>
> It's also not usual (which the OP has done) to create a raid array on
> top of raw devices rather than partitions - although this is down to
> the fact that various *other* utilities seem to assume that an
> unpartitioned device is free space that can be trampled on. Every now
> and then people seem to lose their arrays because an MBR or GPT has
> mysteriously appeared on the disk.

And as long as you take care that the mdadm.conf in the initrd is
correct and reflects the true status, all should be well.

If this is not the case, you will almost certainly get the initrd
barfing at you - if not at the first boot in that condition, then once an
error has occurred.
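
On Debian that would look roughly like this (a sketch, assuming the stock
mdadm and initramfs-tools packages; check mdadm.conf afterwards for
duplicate ARRAY lines):

  # append the current array definition(s) to mdadm.conf
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf

  # rebuild the initrd so it carries the updated configuration
  update-initramfs -u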

Cheers

Rudy

P.S. Doing a raid as the OP has done is not what the Debian installer
will do.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-11-30 20:51   ` antlists
  2020-11-30 21:03     ` Rudy Zijlstra
@ 2020-11-30 21:49     ` Reindl Harald
  2020-11-30 22:31       ` antlists
  2020-11-30 22:04     ` partitions & filesystems David T-G
  2020-12-01  8:45     ` partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc") c.buhtz
  3 siblings, 1 reply; 31+ messages in thread
From: Reindl Harald @ 2020-11-30 21:49 UTC (permalink / raw)
  To: antlists, David T-G, linux-raid



Am 30.11.20 um 21:51 schrieb antlists:
> On 30/11/2020 20:05, David T-G wrote:
>> You don't see any "filesystem" or, more correctly, partition in your
>>
>>    fdisk -l
>>
>> output because you have apparently created your filesystem on the entire
>> device (hey, I didn't know one could do that!). 
> 
> That, actually, is the norm. It is NOT normal to partition a raid array

it IS normal for several reasons

* you may end up with a replacement disk which is subtly
   smaller, and hence you normally make the partition
   a little smaller than the device

* in case of SSD you reserve some space for
   better wear-levelling

* in case you ever need to reinstall your OS you
   want your data not in the same filesystem

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ???root account locked??? after removing one RAID1 hard disc
  2020-11-30 20:06         ` ???root account locked??? " David T-G
@ 2020-11-30 21:57           ` Reindl Harald
  2020-11-30 22:06             ` RAID repair script (was "Re: ???root account locked??? after removing one RAID1 hard disc" David T-G
  0 siblings, 1 reply; 31+ messages in thread
From: Reindl Harald @ 2020-11-30 21:57 UTC (permalink / raw)
  To: David T-G, linux-raid



Am 30.11.20 um 21:06 schrieb David T-G:
> Reindl, et al --
> 
> ...and then Reindl Harald said...
> %
> ...
> %
> % typically fire up my "raid-repair.sh" telling the script source and
> % target disk for cloning partition table, mbr and finally add the new
> % partitions to start the rebuild
> [snip]
> 
> Oooh!  How handy :-)  Share, please!

just make sure GOOD_DISK is one of the remaining drives and BAD_DISK is
the replacement drive before uncommenting the "exit"

and yeah, adjust how many raid partitions there are

the first is my homeserver with 3 filesystems (boot, system, data), the
second one is a RAID10 on an HP microserver with the OS on an SD card

---------------------------------------------------------------------

DOS:

[root@srv-rhsoft:~]$ cat /scripts/raid-recovery.sh
#!/usr/bin/bash

GOOD_DISK="/dev/sda"
BAD_DISK="/dev/sdd"

echo "NOT NOW"
exit 1

# clone MBR
dd if=$GOOD_DISK of=$BAD_DISK bs=512 count=1

# force OS to read partition tables
partprobe $BAD_DISK

# start RAID recovery
mdadm /dev/md0 --add ${BAD_DISK}1
mdadm /dev/md1 --add ${BAD_DISK}2
mdadm /dev/md2 --add ${BAD_DISK}3

# print RAID status on screen
sleep 5
cat /proc/mdstat

# install bootloader on replacement disk
grub2-install "$BAD_DISK"

---------------------------------------------------------------------

GPT:

[root@nfs:~]$ cat /scripts/raid-recovery.sh
#!/usr/bin/bash

GOOD_DISK="/dev/sda"
BAD_DISK="/dev/sde"

echo "NOT NOW"
exit 1

echo "sgdisk $GOOD_DISK -R $BAD_DISK"
sgdisk $GOOD_DISK -R $BAD_DISK

echo "sgdisk -G $BAD_DISK"
sgdisk -G $BAD_DISK

echo "sleep 5"
sleep 5

echo "partprobe $BAD_DISK"
partprobe $BAD_DISK

echo "sleep 5"
sleep 5

echo "mdadm /dev/md0 --add ${BAD_DISK}1"
mdadm /dev/md0 --add ${BAD_DISK}1

echo "sleep 5"
sleep 5

echo "cat /proc/mdstat"
cat /proc/mdstat

---------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: partitions & filesystems
  2020-11-30 20:51   ` antlists
  2020-11-30 21:03     ` Rudy Zijlstra
  2020-11-30 21:49     ` Reindl Harald
@ 2020-11-30 22:04     ` David T-G
  2020-12-01  8:45     ` partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc") c.buhtz
  3 siblings, 0 replies; 31+ messages in thread
From: David T-G @ 2020-11-30 22:04 UTC (permalink / raw)
  To: linux-raid

Wol, et al --

...and then antlists said...
% 
% On 30/11/2020 20:05, David T-G wrote:
% >
% >output because you have apparently created your filesystem on the entire
% >device (hey, I didn't know one could do that!).
% 
% That, actually, is the norm. It is NOT normal to partition a raid array.

Oh!  That's two things I've learned today :-)


% 
...
% >So the display isn't interesting, although the logic behind that approach
% >certainly is to me.
% 
% Your approach seems to be at odds with *normal* practice, although

Well, that isn't surprising; I haven't been a real^Wprofessional Sys Admin
for a decade or more now.  I don't by any means pretend to even think that
I know what's what :-)  I was happy to just be able to explain why the OP's
"filesystem" was missing.


% there is nothing wrong with it. At the end of the day, as far as
% linux is concerned, one block device is much the same as any other.

True.  TMTOWTDI :-)


% 
% Cheers,
% Wol


Thanks & HAND

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RAID repair script (was "Re: ???root account locked??? after removing one RAID1 hard disc"
  2020-11-30 21:57           ` Reindl Harald
@ 2020-11-30 22:06             ` David T-G
  0 siblings, 0 replies; 31+ messages in thread
From: David T-G @ 2020-11-30 22:06 UTC (permalink / raw)
  To: linux-raid

Reindl, et al --

...and then Reindl Harald said...
% 
% Am 30.11.20 um 21:06 schrieb David T-G:
% >
% >Oooh!  How handy :-)  Share, please!
% 
% just make sure GOOD_DISK is one of the remaining and BAD_DISK is the
% repalcement drive before uncomment the "exit"
% 
% and yeah, adjust how many raid-partitions are there
% 
% the first is my homeserver with 3 filesystems (boot, system, data),
% the second one is a RAID10 on a HP microserver with the OS on a
% sd-card
[snip]

Thanks!


HANN

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-11-30 21:49     ` Reindl Harald
@ 2020-11-30 22:31       ` antlists
  2020-11-30 23:21         ` Reindl Harald
  0 siblings, 1 reply; 31+ messages in thread
From: antlists @ 2020-11-30 22:31 UTC (permalink / raw)
  To: Reindl Harald, David T-G, linux-raid

On 30/11/2020 21:49, Reindl Harald wrote:
> 
> 
> Am 30.11.20 um 21:51 schrieb antlists:
>> On 30/11/2020 20:05, David T-G wrote:
>>> You don't see any "filesystem" or, more correctly, partition in your
>>>
>>>    fdisk -l
>>>
>>> output because you have apparently created your filesystem on the entire
>>> device (hey, I didn't know one could do that!). 
>>
>> That, actually, is the norm. It is NOT normal to partition a raid array
> 
> it IS normal for several reasons
> 
> * you may end with a replacement disk which is subtle
>    smaller and hence you normally make the partition
>    a little smaller then the device

And what does that have to do with the price of tea in China?
Partitioning your array doesn't have any effect on whether or not said
array uses the whole disk.
> 
> * in case of SSD you reserve some space for
>    better wear-levelling

And putting a partition table in your array does exactly what to help that?
> 
> * in case you ever need to reinstall your OS you
>    want your data not in the same filesystem

Which is why I partition my disk. I use one raid for the OS, and another 
for my data.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-11-30 22:31       ` antlists
@ 2020-11-30 23:21         ` Reindl Harald
  2020-11-30 23:59           ` antlists
  0 siblings, 1 reply; 31+ messages in thread
From: Reindl Harald @ 2020-11-30 23:21 UTC (permalink / raw)
  To: antlists, David T-G, linux-raid



Am 30.11.20 um 23:31 schrieb antlists:
> On 30/11/2020 21:49, Reindl Harald wrote:
>>
>>
>> Am 30.11.20 um 21:51 schrieb antlists:
>>> On 30/11/2020 20:05, David T-G wrote:
>>>> You don't see any "filesystem" or, more correctly, partition in your
>>>>
>>>>    fdisk -l
>>>>
>>>> output because you have apparently created your filesystem on the 
>>>> entire
>>>> device (hey, I didn't know one could do that!). 
>>>
>>> That, actually, is the norm. It is NOT normal to partition a raid array
>>
>> it IS normal for several reasons
>>
>> * you may end with a replacement disk which is subtle
>>    smaller and hence you normally make the partition
>>    a little smaller then the device
> 
> And what does that have to do with the price of tea in China? 
> Partitioning your array doesn't have any effect on whether or not said 
> array uses the whole disk or not.
>>
>> * in case of SSD you reserve some space for
>>    better wear-levelling
> 
> And putting a partition table in your array does exactly what to help that?
>>
>> * in case you ever need to reinstall your OS you
>>    want your data not in the same filesystem
> 
> Which is why I partition my disk. I use one raid for the OS, and another 
> for my data.
but than "fdisk -l" shows partitions - what the hell.....
that above you responded to has no partitions but the whole drive for 
the RAID

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-11-30 23:21         ` Reindl Harald
@ 2020-11-30 23:59           ` antlists
  0 siblings, 0 replies; 31+ messages in thread
From: antlists @ 2020-11-30 23:59 UTC (permalink / raw)
  To: Reindl Harald, David T-G, linux-raid

On 30/11/2020 23:21, Reindl Harald wrote:
>> And putting a partition table in your array does exactly what to help 
>> that?
>>>
>>> * in case you ever need to reinstall your OS you
>>>    want your data not in the same filesystem
>>
>> Which is why I partition my disk. I use one raid for the OS, and 
>> another for my data.
> but than "fdisk -l" shows partitions - what the hell.....
> that above you responded to has no partitions but the whole drive for 
> the RAID

Yup - the raid sits directly on top of the hard drive - AND IS ITSELF 
PARTITIONED.

Both of those are not normal - it is not normal for a raid to sit on 
bare drives, and it is not normal for a raid to contain partitions.

(Although there's nothing actually wrong with either :-)

Cheers,
Wol

^ permalink raw reply	[flat|nested] 31+ messages in thread

* re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-11-30 20:05 ` partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc") David T-G
  2020-11-30 20:51   ` antlists
@ 2020-12-01  8:41   ` buhtz
  2020-12-01  9:13     ` Reindl Harald
  2020-12-01  8:42   ` c.buhtz
  2 siblings, 1 reply; 31+ messages in thread
From: buhtz @ 2020-12-01  8:41 UTC (permalink / raw)
  To: David T-G; +Cc: linux-raid

Dear David and others,

thanks a lot for so much discussion and details. I am learning a lot.
Following your discussions I see there is still some basic knowledge
missing on my side.

Am 30.11.2020 21:05 schrieb David T-G:
> You don't see any "filesystem" or, more correctly, partition in your
> 
>   fdisk -l

I do not see the partition in the output of "fdisk -l".

But I can (when both discs are present) mount /dev/md127 (manually via
mount and via fstab) to /Daten and create files on it.

> So the display isn't interesting, although the logic behind that 
> approach
> certainly is to me.

I plugged in the naked hard discs and they appear as /dev/sdb and
/dev/sdc. After that:
  mdadm --create /dev/md/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
Then I did
  ls -l /dev/md/md0
and found out this is just a link to /dev/md127. I formatted the raid with
  mkfs.ext4 /dev/md127
Then I mounted (first manually via mount and after success via fstab)
/dev/md127 to /Daten.

Is this unusual?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-11-30 20:05 ` partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc") David T-G
  2020-11-30 20:51   ` antlists
  2020-12-01  8:41   ` buhtz
@ 2020-12-01  8:42   ` c.buhtz
  2 siblings, 0 replies; 31+ messages in thread
From: c.buhtz @ 2020-12-01  8:42 UTC (permalink / raw)
  To: David T-G; +Cc: linux-raid

Dear David and others,

thanks a lot for so much discussion and details. I learn a lot.
Following your discussions I see there still is some basic knowledge 
missing on my side.

Am 30.11.2020 21:05 schrieb David T-G:
> You don't see any "filesystem" or, more correctly, partition in your
> 
>   fdisk -l

I do not see the partition in the output of "fdisk -l".

But I can (when both discs are present) mount /dev/md127 (manualy via 
mount and via fstab) to /Daten and create files on it.

> So the display isn't interesting, although the logic behind that 
> approach
> certainly is to me.

I plugged in the nacked hard discs and they appear as /dev/sdb and 
/dev/sdc. After that
   mdadm --create /dev/md/md0 --level=1 --raid-devices=2 /dev/sdb 
/dev/sdc
Then I did
  ls -l /dev/md/md0 and found out this is just a link to /dev/md127.
I formated the raid with
  mkdfs.ext4 /dev/md127
Then I mounted (first manually via mount and after sucess via fstab) 
/dev/md127 to /Daten

Is this unusual?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-11-30 20:51   ` antlists
                       ` (2 preceding siblings ...)
  2020-11-30 22:04     ` partitions & filesystems David T-G
@ 2020-12-01  8:45     ` c.buhtz
  2020-12-01  9:18       ` Rudy Zijlstra
  2020-12-01 10:00       ` Wols Lists
  3 siblings, 2 replies; 31+ messages in thread
From: c.buhtz @ 2020-12-01  8:45 UTC (permalink / raw)
  To: antlists; +Cc: David T-G, linux-raid

I think my misunderstanding is also due to my bad English?

Am 30.11.2020 21:51 schrieb antlists:
> On 30/11/2020 20:05, David T-G wrote:
>> You don't see any "filesystem" or, more correctly, partition in your
>> 
>>    fdisk -l
>> 
>> output because you have apparently created your filesystem on the 
>> entire
>> device (hey, I didn't know one could do that!).
> 
> That, actually, is the norm. It is NOT normal to partition a raid 
> array.

In my understanding you are contradicting yourself here.
Is there a difference between
   "create filesystem on the entire device"
and
   "partition a raid array"
?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-12-01  8:41   ` buhtz
@ 2020-12-01  9:13     ` Reindl Harald
  0 siblings, 0 replies; 31+ messages in thread
From: Reindl Harald @ 2020-12-01  9:13 UTC (permalink / raw)
  To: buhtz, David T-G; +Cc: linux-raid



Am 01.12.20 um 09:41 schrieb buhtz@posteo.de:
> Dear David and others,
> 
> thanks a lot for so much discussion and details. I learn a lot.
> Following your discussions I see there still is some basic knowledge 
> missing on my side.
> 
> Am 30.11.2020 21:05 schrieb David T-G:
>> You don't see any "filesystem" or, more correctly, partition in your
>>
>>   fdisk -l
> 
> I do not see the partition in the output of "fdisk -l".
> 
> But I can (when both discs are present) mount /dev/md127 (manually via
> mount and via fstab) to /Daten and create files on it.
> 
>> So the display isn't interesting, although the logic behind that approach
>> certainly is to me.
> 
> I plugged in the naked hard discs and they appear as /dev/sdb and
> /dev/sdc. After that:
>   mdadm --create /dev/md/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
> Then I did
>   ls -l /dev/md/md0
> and found out this is just a link to /dev/md127. I formatted the raid with
>   mkfs.ext4 /dev/md127
> Then I mounted (first manually via mount and after success via fstab)
> /dev/md127 to /Daten.
> 
> Is this unusual?

that's normal, the RAID itself is a virtual device backed by the 
underlying disks

You can place a filesystem or even LVM on top of the RAID device and 
then put the filesystem on the LVM volume, combining the redundancy of 
the lower layer with the flexibility of LVM (at the cost of another 
layer of complexity).
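
A minimal sketch of that layering, assuming the array is /dev/md127 and 
using made-up volume names (data_vg, data_lv):

   pvcreate /dev/md127                      # the whole array becomes one LVM physical volume
   vgcreate data_vg /dev/md127              # volume group on top of the redundant device
   lvcreate -n data_lv -l 100%FREE data_vg  # one logical volume using all the space
   mkfs.ext4 /dev/data_vg/data_lv           # the filesystem goes on the LV, not on md127
   mount /dev/data_vg/data_lv /Daten

Later you can shrink data_lv and add further logical volumes without 
touching the raid underneath.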

What I would normally recommend is not adding /dev/sda and /dev/sdb 
directly, but creating a partition of identical size (with some free 
space at the end) on both of them and adding those partitions to the raid.
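
For the two 8 GiB discs from this thread that would look roughly like 
the following sketch (the 7900MiB end point is only an illustration of 
leaving a little slack so a replacement disc that is a few megabytes 
smaller still fits; this of course wipes the discs):

   parted /dev/sdb --script mklabel gpt mkpart raid1 1MiB 7900MiB set 1 raid on
   parted /dev/sdc --script mklabel gpt mkpart raid1 1MiB 7900MiB set 1 raid on
   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
   mkfs.ext4 /dev/md0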


[root@srv-rhsoft:~]$ df -hT
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/md1       ext4   29G  7.8G   21G  28% /
/dev/md2       ext4  3.6T  1.2T  2.4T  34% /mnt/data
/dev/md0       ext4  485M   48M  433M  10% /boot


[root@srv-rhsoft:~]$ cat /proc/mdstat
Personalities : [raid10] [raid1]
md1 : active raid10 sdc2[6] sdd2[5] sdb2[7] sda2[4]
       30716928 blocks super 1.1 256K chunks 2 near-copies [4/4] [UUUU]
       bitmap: 0/1 pages [0KB], 65536KB chunk

md2 : active raid10 sdd3[5] sdb3[7] sdc3[6] sda3[4]
       3875222528 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
       bitmap: 6/29 pages [24KB], 65536KB chunk

md0 : active raid1 sdc1[6] sdd1[5] sdb1[7] sda1[4]
       511988 blocks super 1.0 [4/4] [UUUU]


[root@srv-rhsoft:~]$ fdisk -l /dev/sda
Disk /dev/sda: 1.84 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000d9ef2

Device     Boot    Start        End    Sectors  Size Id Type
/dev/sda1  *        2048    1026047    1024000  500M fd Linux raid autodetect
/dev/sda2        1026048   31746047   30720000 14.7G fd Linux raid autodetect
/dev/sda3       31746048 3906971647 3875225600  1.8T fd Linux raid autodetect


[root@srv-rhsoft:~]$ fdisk -l /dev/sdb
Disk /dev/sdb: 1.84 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000d9ef2

Device     Boot    Start        End    Sectors  Size Id Type
/dev/sdb1  *        2048    1026047    1024000  500M fd Linux raid autodetect
/dev/sdb2        1026048   31746047   30720000 14.7G fd Linux raid autodetect
/dev/sdb3       31746048 3906971647 3875225600  1.8T fd Linux raid autodetect


[root@srv-rhsoft:~]$ fdisk -l /dev/sdc
Disk /dev/sdc: 1.84 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: Samsung SSD 850
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000d9ef2

Device     Boot    Start        End    Sectors  Size Id Type
/dev/sdc1  *        2048    1026047    1024000  500M fd Linux raid autodetect
/dev/sdc2        1026048   31746047   30720000 14.7G fd Linux raid autodetect
/dev/sdc3       31746048 3906971647 3875225600  1.8T fd Linux raid autodetect


[root@srv-rhsoft:~]$ fdisk -l /dev/sdd
Disk /dev/sdd: 1.84 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: Samsung SSD 850
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000d9ef2

Device     Boot    Start        End    Sectors  Size Id Type
/dev/sdd1  *        2048    1026047    1024000  500M fd Linux raid autodetect
/dev/sdd2        1026048   31746047   30720000 14.7G fd Linux raid autodetect
/dev/sdd3       31746048 3906971647 3875225600  1.8T fd Linux raid autodetect



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-12-01  8:45     ` partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc") c.buhtz
@ 2020-12-01  9:18       ` Rudy Zijlstra
  2020-12-01 10:00       ` Wols Lists
  1 sibling, 0 replies; 31+ messages in thread
From: Rudy Zijlstra @ 2020-12-01  9:18 UTC (permalink / raw)
  To: c.buhtz, antlists; +Cc: David T-G, linux-raid



On 01-12-2020 09:45, c.buhtz@posteo.jp wrote:
> I think my misunderstanding also comes from my bad English.
>
> Am 30.11.2020 21:51 schrieb antlists:
>> On 30/11/2020 20:05, David T-G wrote:
>>> You don't see any "filesystem" or, more correctly, partition in your
>>>
>>>    fdisk -l
>>>
>>> output because you have apparently created your filesystem on the 
>>> entire
>>> device (hey, I didn't know one could do that!).
>>
>> That, actually, is the norm. It is NOT normal to partition a raid array.
>
> In my understanding you are contradicting yourself here.
> Is there a difference between
>   "create filesystem on the entire device"
> and
>   "partition a raid array"
> ?
Yes, they are not the same.

   "create filesystem on the entire device"
you create a filesystem on the raw device, like with "mkfs.xfs /dev/sdc" 
or "mkfs.xfs /dev/md0"

A lot of people do this with raid devices, but it is rarely if ever done 
directly on an HDD. AFAIK most installers will do this with raid 
devices, but never with a plain HDD.

"partition a raid array"
this means creating a partition table on the raid array, i.e. "fdisk 
/dev/md0" and creating partitions on it.
Here opinions differ. I am someone who often does this, and has 
production machines with partitioned raid :)
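
For example (sizes made up), partitions created on an md device show up 
with a "p" suffix:

   parted /dev/md0 --script mklabel gpt mkpart part1 1MiB 50% mkpart part2 50% 100%
   mkfs.xfs /dev/md0p1   # partitions on an array appear as md0p1, md0p2, ...
   mkfs.xfs /dev/md0p2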

Wols thinks it should not be done.

Cheers

Rudy

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc")
  2020-12-01  8:45     ` partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc") c.buhtz
  2020-12-01  9:18       ` Rudy Zijlstra
@ 2020-12-01 10:00       ` Wols Lists
  1 sibling, 0 replies; 31+ messages in thread
From: Wols Lists @ 2020-12-01 10:00 UTC (permalink / raw)
  To: c.buhtz; +Cc: David T-G, linux-raid

On 01/12/20 08:45, c.buhtz@posteo.jp wrote:
> I think my misunderstanding also comes from my bad English.
> 
> Am 30.11.2020 21:51 schrieb antlists:
>> On 30/11/2020 20:05, David T-G wrote:
>>> You don't see any "filesystem" or, more correctly, partition in your
>>>
>>>    fdisk -l
>>>
>>> output because you have apparently created your filesystem on the entire
>>> device (hey, I didn't know one could do that!).
>>
>> That, actually, is the norm. It is NOT normal to partition a raid array.
> 
> In my understanding you are contradicting yourself here.
> Is there a difference between
>   "create filesystem on the entire device"
> and
>   "partition a raid array"
> ?
Yes.

Creating a raid array on an entire device means something like

"mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb"

Partitioning a raid array means

"fdisk /dev/md127"

Remember, Linux doesn't care what sits behind a block device; a block
device is a block device. So you can put a raid array directly on top of
physical disks, and you can put a GPT on top of a raid array.

Neither is recommended.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2020-12-01 10:01 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-30  8:44 “root account locked” after removing one RAID1 hard disc c.buhtz
2020-11-30  9:27 ` antlists
2020-11-30 10:29   ` c.buhtz
2020-11-30 11:40     ` Wols Lists
2020-11-30 10:31   ` Reindl Harald
2020-11-30 11:10     ` Rudy Zijlstra
2020-11-30 11:18       ` Reindl Harald
2020-11-30 20:06         ` ???root account locked??? " David T-G
2020-11-30 21:57           ` Reindl Harald
2020-11-30 22:06             ` RAID repair script (was "Re: ???root account locked??? after removing one RAID1 hard disc" David T-G
2020-11-30 12:00     ` “root account locked” after removing one RAID1 hard disc Wols Lists
2020-11-30 12:13       ` Reindl Harald
2020-11-30 13:11         ` antlists
2020-11-30 13:16           ` Reindl Harald
2020-11-30 13:47             ` antlists
2020-11-30 13:53               ` Reindl Harald
2020-11-30 14:46                 ` Rudy Zijlstra
2020-11-30 20:05 ` partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc") David T-G
2020-11-30 20:51   ` antlists
2020-11-30 21:03     ` Rudy Zijlstra
2020-11-30 21:49     ` Reindl Harald
2020-11-30 22:31       ` antlists
2020-11-30 23:21         ` Reindl Harald
2020-11-30 23:59           ` antlists
2020-11-30 22:04     ` partitions & filesystems David T-G
2020-12-01  8:45     ` partitions & filesystems (was "Re: ???root account locked??? after removing one RAID1 hard disc") c.buhtz
2020-12-01  9:18       ` Rudy Zijlstra
2020-12-01 10:00       ` Wols Lists
2020-12-01  8:41   ` buhtz
2020-12-01  9:13     ` Reindl Harald
2020-12-01  8:42   ` c.buhtz
