* how to not lose all production when one disk fails in a raid1 btrfs pool
@ 2022-02-28 14:48 Ghislain Adnet
  2022-03-03  7:04 ` Forza
  0 siblings, 1 reply; 5+ messages in thread
From: Ghislain Adnet @ 2022-02-28 14:48 UTC (permalink / raw)
  To: linux-btrfs

hi,

   All the RAID systems I have known before Btrfs are made so that you don't lose data and don't lose uptime when just one drive in a RAID1 array fails.
   You can then diagnose the failure and replace the drive.

   I just had a Btrfs RAID1 that lost an SSD, and the system immediately stopped functioning (it was the root FS). That seems to be the way it works.


   As far as I can see from searching the net, Btrfs is not made for that: it only protects against data loss, but lets the system fail and waits for you to change the disk.

   After some googling I found no way to make it work like other RAID implementations do, protecting uptime with transparent failure handling and recovery; it seems that running in degraded mode all the time is neither usable nor safe.


   Is there a way to make Btrfs function like other RAID systems, or is it a special case?


https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices


-- 
regards,
Ghislain ADNET.
AQUEOS.


* Re: how to not lose all production when one disk fails in a raid1 btrfs pool
  2022-02-28 14:48 how to not lose all production when one disk fails in a raid1 btrfs pool Ghislain Adnet
@ 2022-03-03  7:04 ` Forza
  2022-03-06  9:41   ` Ghislain Adnet
  0 siblings, 1 reply; 5+ messages in thread
From: Forza @ 2022-03-03  7:04 UTC (permalink / raw)
  To: Ghislain Adnet, linux-btrfs

On 2/28/22 15:48, Ghislain Adnet wrote:
> hi,
> 
>    All the RAID systems I have known before Btrfs are made so that you
> don't lose data and don't lose uptime when just one drive in a RAID1
> array fails.
>    You can then diagnose the failure and replace the drive.
> 
>    I just had a Btrfs RAID1 that lost an SSD, and the system immediately
> stopped functioning (it was the root FS). That seems to be the way it works.
> 

Hi,

I do not believe that this is how it should work. Btrfs RAID1 should 
survive a complete device failure as well as data corruption on one device.

Can you explain a little more about what happened when the SSD failed?

One possible explanation for a failure is that you had mixed block 
groups. This means that you had some SINGLE block groups in addition to 
RAID1 block groups. If those are on the failed SSD, the filesystem would 
turn RO on a device failure.

Mixed block groups can happen for many reasons. You need to check your 
current setup with `btrfs filesystem usage /mnt/`
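
For example, a minimal check could look like this (a rough sketch; /mnt/
is a placeholder for your actual mount point, and the exact output format
depends on your btrfs-progs version):

  # every Data/Metadata/System section should report the RAID1 profile;
  # any "single" or "DUP" block groups are not mirrored across devices
  btrfs filesystem usage /mnt/

  # or flag non-RAID1 block groups directly
  btrfs filesystem usage /mnt/ | grep -iE 'single|dup'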

> 
>    As far as I can see from searching the net, Btrfs is not made for
> that: it only protects against data loss, but lets the system fail and
> waits for you to change the disk.
> 
>    After some googling I found no way to make it work like other RAID
> implementations do, protecting uptime with transparent failure handling
> and recovery; it seems that running in degraded mode all the time is
> neither usable nor safe.
> 

Running in degraded mode is not recommended. It can also lead to mixed 
block groups as I mentioned above.

> 
>    Is there a way to make Btrfs function like other RAID systems, or is
> it a special case?
> 

Can you elaborate a little on what you mean here?

> 
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices 
> 
> 

The Btrfs wiki does not mention that you should check the chunk 
allocation for SINGLE block groups after replacing a disk. This is 
important or you may not actually have full redundancy even after 
replacing a disk. I wrote about that over at 
https://wiki.tnonline.net/w/Btrfs/Replacing_a_disk#Restoring_redundancy_after_a_replaced_disk
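
In practice (a sketch of one common approach, not a substitute for the
article; adjust the mount point), restoring redundancy boils down to a
convert balance after the replace has finished:

  # convert any leftover single-profile block groups back to RAID1;
  # the "soft" modifier skips chunks that already have the target profile
  btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/

  # then re-check that only RAID1 profiles remain
  btrfs filesystem usage /mnt/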

> 

Thanks,
Forza


* Re: how to not lose all production when one disk fails in a raid1 btrfs pool
  2022-03-03  7:04 ` Forza
@ 2022-03-06  9:41   ` Ghislain Adnet
  2022-03-06 14:30     ` Forza
  0 siblings, 1 reply; 5+ messages in thread
From: Ghislain Adnet @ 2022-03-06  9:41 UTC (permalink / raw)
  To: Forza, linux-btrfs

hi !

> 
> I do not believe that this is how it should work. Btrfs RAID1 should survive a complete device failure as well as data corruption on one device.
> 
> Can you explain a little more about what happened when the SSD failed?


The /dev/sdb SSD failed in the night and disappeared instantly. The next morning the computer had crashed. I have lots of:


BTRFS error (device sda4): bdev /dev/sdb4 errs: wr 25186451, rd 5822, flush 3878537, corrupt 0, gen 0
BTRFS error (device sda4): error writing primary super block to device 2
BTRFS warning (device sda4): lost page write due to IO error on /dev/sdb4 (-5)

But it seems I got the order wrong: the disk failed around 23:00 and the server crashed later, in the early morning. So the RAID was working for part of the night.



> One possible explanation for a failure is that you had mixed block groups. This means that you had some SINGLE block groups in addition to RAID1 block groups. If those are on the failed SSD, the filesystem would turn RO on a device failure.
> 
> Mixed block groups can happen for many reasons. You need to check your current setup with `btrfs filesystem usage /mnt/`

The situation after the disk replacement is below; unfortunately I don't have the output from before the sdb breakdown.

Overall:
     Device size:		 796.16GiB
     Device allocated:		  22.06GiB
     Device unallocated:		 774.10GiB
     Device missing:		     0.00B
     Used:			  19.38GiB
     Free (estimated):		 387.40GiB	(min: 387.40GiB)
     Data ratio:			      2.00
     Metadata ratio:		      2.00
     Global reserve:		  25.00MiB	(used: 0.00B)

Data,RAID1: Size:10.00GiB, Used:9.65GiB (96.48%)
    /dev/sda4	  10.00GiB
    /dev/sdb4	  10.00GiB

Metadata,RAID1: Size:1.00GiB, Used:43.48MiB (4.25%)
    /dev/sda4	   1.00GiB
    /dev/sdb4	   1.00GiB

System,RAID1: Size:32.00MiB, Used:16.00KiB (0.05%)
    /dev/sda4	  32.00MiB
    /dev/sdb4	  32.00MiB

Unallocated:
    /dev/sda4	 387.05GiB
    /dev/sdb4	 387.05GiB

----------------------
Overall:
     Device size:		 796.16GiB
     Device allocated:		  22.06GiB
     Device unallocated:		 774.10GiB
     Device missing:		     0.00B
     Used:			  19.38GiB
     Free (estimated):		 387.40GiB	(min: 387.40GiB)
     Data ratio:			      2.00
     Metadata ratio:		      2.00
     Global reserve:		  25.00MiB	(used: 0.00B)

              Data     Metadata System
Id Path      RAID1    RAID1    RAID1    Unallocated
-- --------- -------- -------- -------- -----------
  1 /dev/sda4 10.00GiB  1.00GiB 32.00MiB   402.98GiB
  2 /dev/sdb4 10.00GiB  1.00GiB 32.00MiB   402.98GiB
-- --------- -------- -------- -------- -----------
    Total     10.00GiB  1.00GiB 32.00MiB   805.97GiB
    Used       9.65GiB 43.48MiB 16.00KiB


Do you think that before the failure some data was on a single disk, and the system crashed when that part was hit?


> 
> Running in degraded mode is not recommended. It can also lead to mixed block groups as I mentioned above.

OK, that's what I saw in various posts on the net. Thanks!
> 
>>
>>    Is there a way to make Btrfs function like all other raid system or is it a special case   ?
>>
> 
> Can you elaborate a little on what you mean here?
> 
>>
>> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
>>
> 
> The Btrfs wiki does not mention that you should check the chunk allocation for SINGLE block groups after replacing a disk. This is important or you may not actually have full redundancy even after replacing a disk. I wrote about that over at https://wiki.tnonline.net/w/Btrfs/Replacing_a_disk#Restoring_redundancy_after_a_replaced_disk

After that incident I did some googling, and there were articles saying the RAID would not work again until you mount the filesystem in degraded mode; with my experience in this incident, it seemed to me that the expected behaviour was to fail and wait for a replacement.
Also, all the 'tutorials' I found talk about rebooting and mounting in degraded mode.
Well, it seems there was something else in play here.

Thanks for your help; it made me realise I had reversed the order of some events in the logs in the rush to get back on our feet.

-- 
regards,
Ghislain



* Re: how to not lose all production when one disk fails in a raid1 btrfs pool
  2022-03-06  9:41   ` Ghislain Adnet
@ 2022-03-06 14:30     ` Forza
  2022-03-09 10:26       ` Ghislain Adnet
  0 siblings, 1 reply; 5+ messages in thread
From: Forza @ 2022-03-06 14:30 UTC (permalink / raw)
  To: Ghislain Adnet, linux-btrfs




On 2022-03-06 10:41, Ghislain Adnet wrote:
> hi !
> 
>>
>> I do not believe that this is how it should work. Btrfs RAID1 should 
>> survive a complete device failure as well as data corruption on one 
>> device.
>>
>> Can you explain a little more about what happened when the SSD failed?
> 
> 
> The /dev/sdb SSD failed in the night and disappeared instantly. The next
> morning the computer had crashed. I have lots of:
> 
> 
> BTRFS error (device sda4): bdev /dev/sdb4 errs: wr 25186451, rd 5822, 
> flush 3878537, corrupt 0, gen 0
> BTRFS error (device sda4): error writing primary super block to device 2
> BTRFS warning (device sda4): lost page write due to IO error on 
> /dev/sdb4 (-5)
> 
> But it seems I got the order wrong: the disk failed around 23:00 and the
> server crashed later, in the early morning. So the RAID was working for
> part of the night.
> 

It is conceivable that the number of errors accumulated over time 
triggered some other bug that led to the crash.

> 
> 
>> One possible explanation for a failure is that you had mixed block 
>> groups. This means that you had some SINGLE block groups in addition 
>> to RAID1 block groups. If those are on the failed SSD, the filesystem 
>> would turn RO on a device failure.
>>
>> Mixed block groups can happen for many reasons. You need to check your 
>> current setup with `btrfs filesystem usage /mnt/`
> 
> The situation after the disk replacement is below; unfortunately I don't
> have the output from before the sdb breakdown.
> 
> Overall:
>      Device size:         796.16GiB
>      Device allocated:          22.06GiB
>      Device unallocated:         774.10GiB
>      Device missing:             0.00B
>      Used:              19.38GiB
>      Free (estimated):         387.40GiB    (min: 387.40GiB)
>      Data ratio:                  2.00
>      Metadata ratio:              2.00
>      Global reserve:          25.00MiB    (used: 0.00B)
> 
> Data,RAID1: Size:10.00GiB, Used:9.65GiB (96.48%)
>     /dev/sda4      10.00GiB
>     /dev/sdb4      10.00GiB
> 
> Metadata,RAID1: Size:1.00GiB, Used:43.48MiB (4.25%)
>     /dev/sda4       1.00GiB
>     /dev/sdb4       1.00GiB
> 
> System,RAID1: Size:32.00MiB, Used:16.00KiB (0.05%)
>     /dev/sda4      32.00MiB
>     /dev/sdb4      32.00MiB
> 
> Unallocated:
>     /dev/sda4     387.05GiB
>     /dev/sdb4     387.05GiB
> 
> ----------------------
> Overall:
>      Device size:         796.16GiB
>      Device allocated:          22.06GiB
>      Device unallocated:         774.10GiB
>      Device missing:             0.00B
>      Used:              19.38GiB
>      Free (estimated):         387.40GiB    (min: 387.40GiB)
>      Data ratio:                  2.00
>      Metadata ratio:              2.00
>      Global reserve:          25.00MiB    (used: 0.00B)
> 
>               Data     Metadata System
> Id Path      RAID1    RAID1    RAID1    Unallocated
> -- --------- -------- -------- -------- -----------
>   1 /dev/sda4 10.00GiB  1.00GiB 32.00MiB   402.98GiB
>   2 /dev/sdb4 10.00GiB  1.00GiB 32.00MiB   402.98GiB
> -- --------- -------- -------- -------- -----------
>     Total     10.00GiB  1.00GiB 32.00MiB   805.97GiB
>     Used       9.65GiB 43.48MiB 16.00KiB
> 
> 
> Do you think that before the failure some data was on a single disk, and
> the system crashed when that part was hit?
> 

No, that output looks alright.

> 
>>
>> Running in degraded mode is not recommended. It can also lead to mixed 
>> block groups as I mentioned above.
> 
> OK, that's what I saw in various posts on the net. Thanks!

There are two different situations (see the command sketch below):
1) A device fails while the filesystem is mounted. You do not need to
mount degraded to continue operating; a `btrfs replace start -r
/dev/broken /dev/new /mnt` can be used to fix it online.
2) The device failed and the filesystem is unmounted. In this case you
need to use `mount -o degraded` because of the missing device.
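
In command form, a rough sketch of both cases (device names, devid and
mount point are placeholders for your own setup):

  # case 1: device is failing but the filesystem is still mounted;
  # -r avoids reading from the failing device where possible
  btrfs replace start -r /dev/broken /dev/new /mnt
  btrfs replace status /mnt

  # case 2: the filesystem is unmounted and the device is gone;
  # mount degraded, then replace the missing device by its devid
  # (the one reported as missing by `btrfs filesystem show`)
  mount -o degraded /dev/good /mnt
  btrfs replace start -r <devid-of-missing> /dev/new /mnt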

However, having "degraded" in fstab or on the kernel command line can 
cause problems of its own, for example the filesystem being mounted 
degraded simply because it was mounted before all devices had been found.

>>
>>>
>>>    Is there a way to make Btrfs function like all other raid system 
>>> or is it a special case   ?
>>>
>>
>> Can you elaborate a little on what you mean here?
>>
>>>
>>> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices 
>>>
>>>
>>
>> The Btrfs wiki does not mention that you should check the chunk 
>> allocation for SINGLE block groups after replacing a disk. This is 
>> important or you may not actually have full redundancy even after 
>> replacing a disk. I wrote about that over at 
>> https://wiki.tnonline.net/w/Btrfs/Replacing_a_disk#Restoring_redundancy_after_a_replaced_disk 
>>
> 
> After that incident I did some googling, and there were articles saying
> the RAID would not work again until you mount the filesystem in degraded
> mode; with my experience in this incident, it seemed to me that the
> expected behaviour was to fail and wait for a replacement.
> Also, all the 'tutorials' I found talk about rebooting and mounting in
> degraded mode.
> Well, it seems there was something else in play here.
> 
> Thanks for your help; it made me realise I had reversed the order of some
> events in the logs in the rush to get back on our feet.
> 

Yes, there are several articles and pieces of information "out there" 
that don't get everything right. When in doubt, it is best to ask on the 
mailing list or check the #btrfs IRC channel[*] for help.

It is possible to run a Btrfs RAID1 with only one device in degraded 
mode while waiting for a replacement drive. It is, however, not 
recommended to do so for any extended periods because any errors on the 
remaining device could not be corrected.
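
If you do end up running degraded for a while, a small sketch of how to
keep an eye on things (plain btrfs-progs commands; the mount point is a
placeholder):

  # per-device error counters -- the same wr/rd/flush/corrupt/gen numbers
  # that show up in the kernel log
  btrfs device stats /mnt

  # once the replacement is in and all block groups are RAID1 again,
  # verify both copies (-B waits for the scrub to finish)
  btrfs scrub start -B /mnt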

Thanks,
Forza


[*] https://web.libera.chat/#btrfs


* Re: how to not lose all production when one disk fails in a raid1 btrfs pool
  2022-03-06 14:30     ` Forza
@ 2022-03-09 10:26       ` Ghislain Adnet
  0 siblings, 0 replies; 5+ messages in thread
From: Ghislain Adnet @ 2022-03-09 10:26 UTC (permalink / raw)
  To: Forza; +Cc: linux-btrfs


>> But it seems I got the order wrong: the disk failed around 23:00 and the server crashed later, in the early morning. So the RAID was working for part of the night.
>>
> 
> It is conceivable that the number of errors accumulated over time triggered some other bug that led to the crash.

Yes, I never had any such issue with mdadm, but there are so many variables here... it could be Btrfs, or it could be another part of the kernel.


> Yes, there are several articles and pieces of information "out there" that don't get everything right. When in doubt, it is best to ask on the mailing list or check the #btrfs IRC channel[*] for help.
> 
> It is possible to run a Btrfs RAID1 with only one device in degraded mode while waiting for a replacement drive. It is, however, not recommended to do so for any extended periods because any errors on the remaining device could not be corrected.
Thanks for your very valuable input on this!


regards,
Ghislain


