* BTRFS and power loss ~= corruption?
@ 2011-08-24 13:11 Berend Dekens
  2011-08-24 13:31 ` Arne Jansen
  2011-08-25 23:01 ` Gregory Maxwell
  0 siblings, 2 replies; 16+ messages in thread
From: Berend Dekens @ 2011-08-24 13:11 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I have followed the progress made in the btrfs filesystem over time and 
while I have experimented with it a little in a VM, I have not yet used 
it in a production machine.

While the lack of a complete fsck was a major issue (I read the update 
that the first working version is about to be released), I am still 
worried about an issue I see popping up.

How is it possible that a copy-on-write filesystem becomes corrupted if 
a power failure occurs? I assume this means that even (hard) resetting a 
computer can result in a corrupt filesystem.

I thought the idea of COW was that whatever happens, you can always 
mount in a semi-consistent state?

As far as I can see, you wind up with one of these:
- No outstanding writes at power-down
- A file write completes and the tree structure is updated. Since 
everything is hashed and duplicated, unless the update propagates to the 
highest level, the write will simply disappear upon failure. While this 
might be rectified with a fsck, there should be no problems mounting the 
filesystem (read-only if need be)
- Writes are not completed on all disks/partitions at the same time. The 
checksums will detect these errors and once again, the write disappears 
unless it is salvaged by a fsck.

Am I missing something? How come there seem to be plenty of people with 
a corrupt btrfs after a power failure? And why haven't I experienced 
similar issues where a filesystem becomes unmountable with, say, NTFS or 
Ext3/4?

-- 
Regards,
Berend Dekens


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BTRFS and power loss ~= corruption?
  2011-08-24 13:11 BTRFS and power loss ~= corruption? Berend Dekens
@ 2011-08-24 13:31 ` Arne Jansen
  2011-08-24 15:01   ` Berend Dekens
  2011-08-25 23:01 ` Gregory Maxwell
  1 sibling, 1 reply; 16+ messages in thread
From: Arne Jansen @ 2011-08-24 13:31 UTC (permalink / raw)
  To: Berend Dekens; +Cc: linux-btrfs

On 24.08.2011 15:11, Berend Dekens wrote:
> [snip]
> Am I missing something? How come there seem to be plenty of people with a
> corrupt btrfs after a power failure? And why haven't I experienced similar
> issues where a filesystem becomes unmountable with, say, NTFS or Ext3/4?
> 

Problems arise when, in your scenario, writes from higher levels in the
tree hit the disk earlier than updates to lower levels. In that case
the tree is broken and the fs is unmountable.
Of course btrfs takes care of the order in which it writes, but problems
arise when the disk lies about whether a write is stable on disk, i.e.
about cache flushes or barriers.
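
That failure mode is easy to model (a toy sketch in Python, not btrfs code; the block names and the 50% destage probability are invented): the commit below writes in the correct order, so only a disk that lies about flushes can leave the superblock pointing at a tree that never reached the platter.

```python
import random

class Disk:
    """A disk with a volatile write cache. An honest disk destages the whole
    cache on flush; a lying disk acknowledges the flush while any cached
    write may still be sitting only in volatile memory."""
    def __init__(self, honest, seed):
        self.media = {}                    # blocks durably on the platter
        self.cache = {}                    # blocks only in the volatile cache
        self.honest = honest
        self.rng = random.Random(seed)

    def write(self, block, data):
        self.cache[block] = data           # all writes land in the cache first

    def flush(self):
        for block in list(self.cache):
            if self.honest or self.rng.random() < 0.5:
                self.media[block] = self.cache.pop(block)

    def power_loss(self):
        self.cache.clear()                 # the volatile cache is gone

def commit(disk):
    """COW-style commit: lower tree levels first, barrier, then the root."""
    disk.write("leaf", "new data")
    disk.write("node", "checksum of leaf")
    disk.flush()                           # the tree must be stable before...
    disk.write("super", "root -> new tree")
    disk.flush()                           # ...the super points at it

def trial(honest, seed):
    disk = Disk(honest, seed)
    commit(disk)
    disk.power_loss()
    new_root = disk.media.get("super") == "root -> new tree"
    tree_there = "node" in disk.media and "leaf" in disk.media
    return new_root and not tree_there     # the root points into thin air

honest_breaks = sum(trial(True, s) for s in range(200))
lying_breaks = sum(trial(False, s) for s in range(200))
print("honest disk:", honest_breaks, "broken filesystems in 200 power cuts")
print("lying disk: ", lying_breaks, "broken filesystems in 200 power cuts")
```

The honest disk never produces a dangling root; the lying disk does so in a sizeable fraction of the simulated power cuts.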



* Re: BTRFS and power loss ~= corruption?
  2011-08-24 13:31 ` Arne Jansen
@ 2011-08-24 15:01   ` Berend Dekens
  2011-08-24 15:04     ` Arne Jansen
  0 siblings, 1 reply; 16+ messages in thread
From: Berend Dekens @ 2011-08-24 15:01 UTC (permalink / raw)
  To: Arne Jansen; +Cc: linux-btrfs

On 24/08/11 15:31, Arne Jansen wrote:
> [snip]
> Problems arise when, in your scenario, writes from higher levels in the
> tree hit the disk earlier than updates to lower levels. In that case
> the tree is broken and the fs is unmountable.
> Of course btrfs takes care of the order in which it writes, but problems
> arise when the disk lies about whether a write is stable on disk, i.e.
> about cache flushes or barriers.
Ah, I see. So the issue is not with the software implementation at all 
but only arises when hardware acknowledges flushes and barriers before 
they actually complete?

Is this a common problem of hard disks?

Regards,
Berend Dekens


* Re: BTRFS and power loss ~= corruption?
  2011-08-24 15:01   ` Berend Dekens
@ 2011-08-24 15:04     ` Arne Jansen
  2011-08-24 15:13       ` Berend Dekens
  0 siblings, 1 reply; 16+ messages in thread
From: Arne Jansen @ 2011-08-24 15:04 UTC (permalink / raw)
  To: Berend Dekens; +Cc: linux-btrfs

On 24.08.2011 17:01, Berend Dekens wrote:
> [snip]
> Ah, I see. So the issue is not with the software implementation at all
> but only arises when hardware acknowledges flushes and barriers before
> they actually complete?

It doesn't mean there aren't any bugs left in the software stack ;)
> 
> Is this a common problem of hard disks?

Only of very cheap ones. USB enclosures might add to the problem, too.
Also some SSDs are rumored to be bad in this regard.
Another problem is the extra layers between btrfs and the hardware,
such as encryption.




* Re: BTRFS and power loss ~= corruption?
  2011-08-24 15:04     ` Arne Jansen
@ 2011-08-24 15:13       ` Berend Dekens
  2011-08-24 17:06         ` Mitch Harder
  0 siblings, 1 reply; 16+ messages in thread
From: Berend Dekens @ 2011-08-24 15:13 UTC (permalink / raw)
  To: Arne Jansen; +Cc: linux-btrfs

On 24/08/11 17:04, Arne Jansen wrote:
> [snip]
> It doesn't mean there aren't any bugs left in the software stack ;)
Naturally, but the fact that it's very likely that the corruption 
stories I've been reading about are caused by misbehaving hardware sets 
my mind at ease about experimenting further with btrfs (although I will 
await the fsck before attempting things in production).
>> Is this a common problem of hard disks?
> Only of very cheap ones. USB enclosures might add to the problem, too.
> Also some SSDs are rumored to be bad in this regard.
> Another problem is the extra layers between btrfs and the hardware,
> such as encryption.
I am - and will be - using btrfs straight on hard disks, no lvm, 
(soft)raid, encryption or other layers.

My hard drives are not that fancy (no 15k Raptors here); I usually buy 
hardware from the major suppliers (WD, Maxtor, Seagate, Hitachi, etc.). 
Also, until the fast cache mode for SSDs in combination with rotating 
hardware becomes stable, I'll stick to ordinary hard drives.

Thank you for clarifying things.

Regards,
Berend Dekens


* Re: BTRFS and power loss ~= corruption?
  2011-08-24 15:13       ` Berend Dekens
@ 2011-08-24 17:06         ` Mitch Harder
  2011-08-24 21:00           ` Ahmed Kamal
  2011-08-25  3:31           ` Anand Jain
  0 siblings, 2 replies; 16+ messages in thread
From: Mitch Harder @ 2011-08-24 17:06 UTC (permalink / raw)
  To: Berend Dekens; +Cc: Arne Jansen, linux-btrfs

On Wed, Aug 24, 2011 at 10:13 AM, Berend Dekens <btrfs@cyberwizzard.nl> wrote:
> [snip]
> Naturally, but the fact that it's very likely that the corruption stories
> I've been reading about are caused by misbehaving hardware sets my mind at
> ease about experimenting further with btrfs (although I will await the fsck
> before attempting things in production).
> [snip]

I have to admit I've begun to wonder if we picked up a
regression somewhere along the way with respect to corruption after
power outages.

I'm lucky enough to have very unreliable power.  Btrfs was always
robust for me on power outages until recently.  Now I've had
two corrupted volumes on unclean shutdowns and power outages.


* Re: BTRFS and power loss ~= corruption?
  2011-08-24 17:06         ` Mitch Harder
@ 2011-08-24 21:00           ` Ahmed Kamal
  2011-08-25  3:31           ` Anand Jain
  1 sibling, 0 replies; 16+ messages in thread
From: Ahmed Kamal @ 2011-08-24 21:00 UTC (permalink / raw)
  To: Mitch Harder; +Cc: Berend Dekens, Arne Jansen, linux-btrfs

AFAIK, ZFS combats lying disks by rolling back to the latest mountable
uberblock (i.e. the latest tree that was completely and successfully
written to disk). Does btrfs do something similar today?
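
The rollback idea can be sketched as follows (a toy model with made-up block names and a CRC32 standing in for the real checksums; not the actual ZFS or btrfs on-disk format): keep several generations of root pointers and mount the newest one whose entire tree is present and checksums cleanly.

```python
import zlib

def make_block(disk, bid, data, children=()):
    # Store a block with its payload checksum and child pointers.
    disk[bid] = {"data": data, "csum": zlib.crc32(data),
                 "children": list(children)}

def tree_ok(disk, bid):
    """A tree is usable iff every reachable block exists and checksums."""
    blk = disk.get(bid)
    if blk is None or zlib.crc32(blk["data"]) != blk["csum"]:
        return False
    return all(tree_ok(disk, child) for child in blk["children"])

def mount(disk, roots):
    """Roll back, uberblock-style: take the newest generation whose whole
    tree survived; roots is a list of (generation, root_block_id) pairs."""
    for generation, root in sorted(roots, reverse=True):
        if tree_ok(disk, root):
            return generation, root
    raise IOError("no consistent root found")

disk = {}
make_block(disk, "leaf-1", b"old data")
make_block(disk, "root-1", b"gen 1 root", ["leaf-1"])
make_block(disk, "root-2", b"gen 2 root", ["leaf-2"])  # leaf-2 never hit disk
print(mount(disk, [(1, "root-1"), (2, "root-2")]))     # -> (1, 'root-1')
```

Since generation 2's tree is incomplete, the mount quietly falls back to the last fully written generation.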

On Wed, Aug 24, 2011 at 7:06 PM, Mitch Harder
<mitch.harder@sabayonlinux.org> wrote:
> [snip]
> I'm lucky enough to have very unreliable power.  Btrfs was always
> robust for me on power outages until recently.  Now I've had
> two corrupted volumes on unclean shutdowns and power outages.


* Re: BTRFS and power loss ~= corruption?
  2011-08-24 17:06         ` Mitch Harder
  2011-08-24 21:00           ` Ahmed Kamal
@ 2011-08-25  3:31           ` Anand Jain
  2011-08-25 17:55             ` Martin Steigerwald
  1 sibling, 1 reply; 16+ messages in thread
From: Anand Jain @ 2011-08-25  3:31 UTC (permalink / raw)
  To: Mitch Harder; +Cc: Berend Dekens, Arne Jansen, linux-btrfs



  We have a bit of documentation on disk power failure and
  corruption here:
  https://btrfs.wiki.kernel.org/index.php/FAQ
  See the 2nd FAQ in the list.

  Things would have been a lot easier for the filesystem (in terms
  of maintaining its consistency) if disks had some kind of atomic
  write (between disk-cache and disk) for a given block size.

  Anyway, solutions that disable the disk write cache and use SSDs
  are quite popular nowadays, and in terms of random synchronous
  write performance they are awesome.

HTH
Cheers, Anand


On 08/25/2011 01:06 AM, Mitch Harder wrote:
> [snip]
> I'm lucky enough to have very unreliable power.  Btrfs was always
> robust for me on power outages until recently.  Now I've had
> two corrupted volumes on unclean shutdowns and power outages.


* Re: BTRFS and power loss ~= corruption?
  2011-08-25  3:31           ` Anand Jain
@ 2011-08-25 17:55             ` Martin Steigerwald
  2011-08-25 22:16               ` Maciej Marcin Piechotka
  0 siblings, 1 reply; 16+ messages in thread
From: Martin Steigerwald @ 2011-08-25 17:55 UTC (permalink / raw)
  To: Anand Jain; +Cc: Mitch Harder, Berend Dekens, Arne Jansen, linux-btrfs

On Thursday, 25 August 2011, Anand Jain wrote:
>   Anyway, solutions that disable the disk write cache and use SSDs
>   are quite popular nowadays, and in terms of random synchronous
>   write performance they are awesome.

There are SSDs with capacitors, such as the Intel SSD 320. According to
the vendor, these should write out all remaining writes that made it to
the disk cache should a power loss occur.

I did not have any issues with BTRFS on / with a ThinkPad T520 and an
Intel SSD 320. /home is still Ext4, as I want a fsck first. That's with
the Linux 3.0.0-2 amd64 Debian package.

That said, I also do not have any issues with BTRFS on a ThinkPad T23
for / and /home. But then the machine has a hibernate-to-disk-and-resume
uptime of almost 120 days, so it didn't see a power loss for a long
time. That's still with 2.6.38.4.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: BTRFS and power loss ~= corruption?
  2011-08-25 17:55             ` Martin Steigerwald
@ 2011-08-25 22:16               ` Maciej Marcin Piechotka
  2011-11-09 20:15                 ` Martin Steigerwald
  0 siblings, 1 reply; 16+ messages in thread
From: Maciej Marcin Piechotka @ 2011-08-25 22:16 UTC (permalink / raw)
  To: linux-btrfs


On Thu, 2011-08-25 at 19:55 +0200, Martin Steigerwald wrote:
> That said, I also do not have any issues with BTRFS on a ThinkPad T23
> for / and /home. But then the machine has a hibernate-to-disk-and-resume
> uptime of almost 120 days, so it didn't see a power loss for a long
> time. That's still with 2.6.38.4.

Which method of hibernation are you using?

I got enormous problems with btrfs+toi, including:

 - Freezes resulting in an unmountable partition (twice so far; two
   results on Google, including one message I sent to the list)
 - Sometimes a random program (Skype, Firefox) cannot be frozen, and the
   stack trace includes btrfs AIO.

regards



* Re: BTRFS and power loss ~= corruption?
  2011-08-24 13:11 BTRFS and power loss ~= corruption? Berend Dekens
  2011-08-24 13:31 ` Arne Jansen
@ 2011-08-25 23:01 ` Gregory Maxwell
  2011-08-26  6:37   ` Arne Jansen
  2011-11-09 17:33   ` Stefan Behrens
  1 sibling, 2 replies; 16+ messages in thread
From: Gregory Maxwell @ 2011-08-25 23:01 UTC (permalink / raw)
  To: Berend Dekens; +Cc: linux-btrfs

On Wed, Aug 24, 2011 at 9:11 AM, Berend Dekens <btrfs@cyberwizzard.nl> wrote:
[snip]
> I thought the idea of COW was that whatever happens, you can always mount in
> a semi-consistent state?
[snip]


It seems to me that if someone created a block device which recorded
all write operations, a rather excellent test could be constructed
where a btrfs filesystem is recorded under load, and then every partial
replay is mounted and checked for corruption/data loss.

This would result in high confidence that no power loss event could
destroy data given the offered load, assuming well-behaved
(non-reordering) hardware.  If it recorded barrier operations, then a
tool could also try many (but probably not all) permissible
reorderings at every truncation offset.

It seems to me that the existence of this kind of testing is something
that should be expected of a modern filesystem before it sees
widescale production use.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BTRFS and power loss ~= corruption?
  2011-08-25 23:01 ` Gregory Maxwell
@ 2011-08-26  6:37   ` Arne Jansen
  2011-08-26  7:48     ` Mike Fleetwood
  2011-11-09 17:33   ` Stefan Behrens
  1 sibling, 1 reply; 16+ messages in thread
From: Arne Jansen @ 2011-08-26  6:37 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Berend Dekens, linux-btrfs

On 26.08.2011 01:01, Gregory Maxwell wrote:
> On Wed, Aug 24, 2011 at 9:11 AM, Berend Dekens <btrfs@cyberwizzard.nl> wrote:
> [snip]
>> I thought the idea of COW was that whatever happens, you can always mount in
>> a semi-consistent state?
> [snip]
> 
> 
> It seems to me that if someone created a block device which recorded
> all write operations a rather excellent test could be constructed
> where a btrfs filesystem is recorded under load and then every partial
> replay is mounted and checked for corruption/data loss.
> 
> This would result in high confidence that no power loss event could
> destroy data given the offered load, assuming well-behaved
> (non-reordering) hardware.  If it recorded barrier operations then a
> tool could also try many (but probably not all) permissible
> reorderings at every truncation offset.
> 

I like the idea. Some more thoughts:
 - instead of trying all reorderings it might be enough to just always
   deliver the oldest possible copy
 - the order in which btrfs writes the data probably depends on the
   order in which the device acknowledges the requests. You might need
   to add some reordering there, too
 - you need to produce a wide variety of workloads, as problems might
   only show up with a particular kind of workload (direct I/O, fsync,
   snapshots...)
 - if there really is a regression somewhere, it would be good to also
   include the full block layer in the test, as the regression might
   not be in btrfs at all
 - as a first small step one could just use blktrace to record the write
   order and analyze the order on mount as well

> It seems to me that the existence of this kind of testing is something
> that should be expected of a modern filesystem before it sees
> widescale production use.
> --

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BTRFS and power loss ~= corruption?
  2011-08-26  6:37   ` Arne Jansen
@ 2011-08-26  7:48     ` Mike Fleetwood
  2011-08-26  9:30       ` Arne Jansen
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Fleetwood @ 2011-08-26  7:48 UTC (permalink / raw)
  To: linux-btrfs

On 26 August 2011 07:37, Arne Jansen <sensille@gmx.net> wrote:
> On 26.08.2011 01:01, Gregory Maxwell wrote:
>> On Wed, Aug 24, 2011 at 9:11 AM, Berend Dekens <btrfs@cyberwizzard.nl> wrote:
>>
>> It seems to me that if someone created a block device which recorded
>> all write operations a rather excellent test could be constructed
>> where a btrfs filesystem is recorded under load and then every partial
>> replay is mounted and checked for corruption/data loss.
>>
>> This would result in high confidence that no power loss event could
>> destroy data given the offered load, assuming well-behaved
>> (non-reordering) hardware.  If it recorded barrier operations then a
>> tool could also try many (but probably not all) permissible
>> reorderings at every truncation offset.
>>
>
> I like the idea. Some more thoughts:
>  - instead of trying all reorderings it might be enough to just always
>    deliver the oldest possible copy
>  - the order in which btrfs writes the data probably depends on the
>    order in which the device acknowledges the requests. You might need
>    to add some reordering there, too
>  - you need to produce a wide variety of workloads, as problems might
>    only show up with a particular kind of workload (direct I/O, fsync,
>    snapshots...)
>  - if there really is a regression somewhere, it would be good to also
>    include the full block layer in the test, as the regression might
>    not be in btrfs at all
>  - as a first small step one could just use blktrace to record the write
>    order and analyze the order on mount as well
>
>> It seems to me that the existence of this kind of testing is something
>> that should be expected of a modern filesystem before it sees
>> widescale production use.

This article describes evaluating ext3, reiserfs and jfs using fault
injection using a custom Linux block device driver.
"Model-Based Failure Analysis of Journaling File Systems"
http://www.cs.wisc.edu/adsl/Publications/sfa-dsn05.pdf
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BTRFS and power loss ~= corruption?
  2011-08-26  7:48     ` Mike Fleetwood
@ 2011-08-26  9:30       ` Arne Jansen
  0 siblings, 0 replies; 16+ messages in thread
From: Arne Jansen @ 2011-08-26  9:30 UTC (permalink / raw)
  To: Mike Fleetwood; +Cc: linux-btrfs

On 26.08.2011 09:48, Mike Fleetwood wrote:
>> On 26.08.2011 01:01, Gregory Maxwell wrote:
> 
> This article describes evaluating ext3, reiserfs and jfs using fault
> injection using a custom Linux block device driver.
> "Model-Based Failure Analysis of Journaling File Systems"
> http://www.cs.wisc.edu/adsl/Publications/sfa-dsn05.pdf
> --

While the article is interesting, it describes a quite different
failure mode. I/O-error handling is not very sophisticated in
btrfs yet. The tests Gregory was talking about assume completely
well-behaved hardware. Fault injection would be the second step.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BTRFS and power loss ~= corruption?
  2011-08-25 23:01 ` Gregory Maxwell
  2011-08-26  6:37   ` Arne Jansen
@ 2011-11-09 17:33   ` Stefan Behrens
  1 sibling, 0 replies; 16+ messages in thread
From: Stefan Behrens @ 2011-11-09 17:33 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Berend Dekens, linux-btrfs

On 8/26/2011 1:01 AM, Gregory Maxwell wrote:
> On Wed, Aug 24, 2011 at 9:11 AM, Berend Dekens <btrfs@cyberwizzard.nl> wrote:
> [snip]
>> I thought the idea of COW was that whatever happens, you can always mount in
>> a semi-consistent state?
> [snip]
> 
> 
> It seems to me that if someone created a block device which recorded
> all write operations a rather excellent test could be constructed
> where a btrfs filesystem is recorded under load and then every partial
> replay is mounted and checked for corruption/data loss.
> 
> This would result in high confidence that no power loss event could
> destroy data given the offered load, assuming well-behaved
> (non-reordering) hardware.  If it recorded barrier operations then a
> tool could also try many (but probably not all) permissible
> reorderings at every truncation offset.
> 
> It seems to me that the existence of this kind of testing is something
> that should be expected of a modern filesystem before it sees
> widescale production use.

Gregory, thank you for the idea of implementing a tool that verifies
file system consistency.
Following your idea, I have just written a runtime tool for this
purpose; refer to the message-id
<cover.1320849129.git.sbehrens@giantdisaster.de> on the btrfs mailing
list. The tool examines all btrfs disk write operations at runtime. It
verifies that the on-disk data is always in a consistent state, in order
to create confidence that power loss (or kernel panics) cannot cause
corrupted file systems.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BTRFS and power loss ~= corruption?
  2011-08-25 22:16               ` Maciej Marcin Piechotka
@ 2011-11-09 20:15                 ` Martin Steigerwald
  0 siblings, 0 replies; 16+ messages in thread
From: Martin Steigerwald @ 2011-11-09 20:15 UTC (permalink / raw)
  To: linux-btrfs, uzytkownik2

Hi Maciej,

On Friday, 26 August 2011, Maciej Marcin Piechotka wrote:
> On Thu, 2011-08-25 at 19:55 +0200, Martin Steigerwald wrote:
> > That said I also do not have any issues with BTRFS on a ThinkPad T23
> > for / and /home. But then the machine has a hibernate-to-disk-and-resume
> > uptime of almost 120 days, so it didn't see a power loss for a long time.
> > That's still with 2.6.38.4.
> 
> Which method of hibernation are you using?
> 
> I got enormous problems with btrfs+toi, including:
> 
>  - Freezes resulting in an unmountable partition (twice so far; two results
> in Google, including one message I sent to the list)
>  - Sometimes a random program (Skype, Firefox) cannot be frozen, and the
> stack trace includes btrfs AIO.

I have not used TOI for quite some time, since I had some issues with it
that I no longer remember; I think it didn't work reliably back then. I
would like to try it again, but have not gotten around to it.

I use in-kernel suspend, now with 3.0, Debian-packaged, so not even
self-compiled anymore, and it is just rock solid on the T23. It is not
quite as rock solid on my new ThinkPad T520, but that might also be
driver issues, and I do not use BTRFS on the new one, only on the T23.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-11-09 20:15 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-24 13:11 BTRFS and power loss ~= corruption? Berend Dekens
2011-08-24 13:31 ` Arne Jansen
2011-08-24 15:01   ` Berend Dekens
2011-08-24 15:04     ` *** GMX Spamverdacht *** " Arne Jansen
2011-08-24 15:13       ` Berend Dekens
2011-08-24 17:06         ` Mitch Harder
2011-08-24 21:00           ` Ahmed Kamal
2011-08-25  3:31           ` Anand Jain
2011-08-25 17:55             ` Martin Steigerwald
2011-08-25 22:16               ` Maciej Marcin Piechotka
2011-11-09 20:15                 ` Martin Steigerwald
2011-08-25 23:01 ` Gregory Maxwell
2011-08-26  6:37   ` Arne Jansen
2011-08-26  7:48     ` Mike Fleetwood
2011-08-26  9:30       ` Arne Jansen
2011-11-09 17:33   ` Stefan Behrens
