* Recovery advice
From: Sandy McArthur @ 2013-07-26 20:31 UTC (permalink / raw)
  To: linux-btrfs

I have a 4 disk RAID1 setup that fails to {mount,btrfsck} when disk 4
is connected.

With disk 4 attached btrfsck errors with:
btrfsck: root-tree.c:46: btrfs_find_last_root: Assertion
`!(path->slots[0] == 0)' failed
(I'd have to reboot in a non-functioning state to get the full output.)

I can mount the filesystem in a degraded state with the 4th drive
removed. I believe there is some data corruption as I see lines in
/var/log/messages from the degraded,ro filesystem like this:

BTRFS info (device sdd1): csum failed ino 4433 off 3254538240 csum
1033749897 private 2248083221

I'm at the point where all I can think to do is wipe disk 4 and then
add it back in. Is there anything else I should try first? I have
booted btrfs-next with the latest btrfs-progs.

Thanks.
--
Sandy McArthur

"He who dares not offend cannot be honest."
- Thomas Paine


* Re: Recovery advice
From: Kai Krakow @ 2013-08-04 12:41 UTC (permalink / raw)
  To: linux-btrfs

Sandy McArthur <sandymac@gmail.com> schrieb:

> I have a 4 disk RAID1 setup that fails to {mount,btrfsck} when disk 4
> is connected.
> 
> With disk 4 attached btrfsck errors with:
> btrfsck: root-tree.c:46: btrfs_find_last_root: Assertion
> `!(path->slots[0] == 0)' failed
> (I'd have to reboot in a non-functioning state to get the full output.)
> 
> I can mount the filesystem in a degraded state with the 4th drive
> removed. I believe there is some data corruption as I see lines in
> /var/log/messages from the degraded,ro filesystem like this:
> 
> BTRFS info (device sdd1): csum failed ino 4433 off 3254538240 csum
> 1033749897 private 2248083221
> 
> I'm at the point where all I can think to do is wipe disk 4 and then
> add it back in. Is there anything else I should try first. I have
> booted btrfs-next with the latest btrfs-progs.

It is a RAID-1 so why bother with the faulty drive? Just wipe it, put it 
back in, then run a btrfs balance... There should be no data loss because 
all data is stored twice (two-way mirroring).
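
For what it's worth, a minimal sketch of what I mean, untested and with
placeholder names (/dev/sdX1 for the wiped disk, /dev/sdd1 for a healthy
one, /mnt for the mount point):

  # mount the surviving devices read-write in degraded mode
  mount -o degraded /dev/sdd1 /mnt

  # clear the old btrfs signature on the suspect disk
  wipefs -a /dev/sdX1

  # add it back as a fresh device, drop the stale device record,
  # then rebalance so every chunk is mirrored again
  btrfs device add /dev/sdX1 /mnt
  btrfs device delete missing /mnt
  btrfs balance start /mnt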

Regards,
Kai



* Re: Recovery advice
From: Duncan @ 2013-08-04 22:19 UTC (permalink / raw)
  To: linux-btrfs

Kai Krakow posted on Sun, 04 Aug 2013 14:41:54 +0200 as excerpted:

> It is a RAID-1 so why bother with the faulty drive? Just wipe it, put it
> back in, then run a btrfs balance... There should be no data loss
> because all data is stored twice (two-way mirroring).

The caveat would be if it didn't start as btrfs raid1, and there's still 
some data (or possibly metadata if it was the single drive at one point 
or they're ssds, as btrfs defaults to metadata single in ssd mode) that 
hasn't been duped elsewhere.
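
An easy way to check for that while the filesystem is mounted (mount
point is just an example) is the per-profile breakdown:

  # look at the profile on each Data/Metadata/System line
  # (RAID1 vs. single/DUP)
  btrfs filesystem df /mnt

If anything still shows single, a conversion balance along these lines
(assuming a reasonably current kernel and btrfs-progs) should bring it
up to raid1:

  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt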

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Recovery advice
From: Kai Krakow @ 2013-08-04 23:05 UTC (permalink / raw)
  To: linux-btrfs

Duncan <1i5t5.duncan@cox.net> schrieb:

>> It is a RAID-1 so why bother with the faulty drive? Just wipe it, put it
>> back in, then run a btrfs balance... There should be no data loss
>> because all data is stored twice (two-way mirroring).
> 
> The caveat would be if it didn't start as btrfs raid1, and there's still
> some data (or possibly metadata if it was the single drive at one point
> or they're ssds, as btrfs defaults to metadata single in ssd mode) that
> hasn't been duped elsewhere.

Oh... That's actually a pitfall... :-\

Note to myself: Ensure balance has been run successfully and completely.
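
Something like this should do as a sanity check (mount point is just an
example):

  # reports whether a balance is still running or was left paused
  btrfs balance status /mnt

  # and confirm every allocation line shows the profile you expect
  btrfs filesystem df /mnt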



* Re: Recovery advice
From: Chris Murphy @ 2013-08-04 23:13 UTC (permalink / raw)
  To: Btrfs BTRFS


On Aug 4, 2013, at 4:19 PM, Duncan <1i5t5.duncan@cox.net> wrote:

> Kai Krakow posted on Sun, 04 Aug 2013 14:41:54 +0200 as excerpted:
> 
>> It is a RAID-1 so why bother with the faulty drive? Just wipe it, put it
>> back in, then run a btrfs balance... There should be no data loss
>> because all data is stored twice (two-way mirroring).
> 
> The caveat would be if it didn't start as btrfs raid1, and there's still 
> some data (or possibly metadata if it was the single drive at one point 
> or they're ssds, as btrfs defaults to metadata single in ssd mode) that 
> hasn't been duped elsewhere.

I agree. I think tossing the data on the problematic device is a bit of a hammer. It may be necessary, but I don't think enough information has been provided to conclusively determine all other options have been explored.

What kernel versions have been used? What does dmesg record beginning at the time of a normal mount attempt with all four devices available? What does btrfsck (without repair) report? Are there any prior kernel messages related to the controller or libata messages related to the suspect drive? What's the smartctl -x output for the suspect drive? Has mounting with -o recovery been attempted, and if so what were the messages recorded?
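
Roughly the kind of thing I'd want to see, with device names as
examples only:

  uname -r                           # kernel version in use
  dmesg                              # messages from a normal 4-device mount attempt
  btrfsck /dev/sdd1                  # read-only check (no --repair)
  smartctl -x /dev/sdX1              # full SMART and error-log dump for the suspect drive
  mount -o recovery /dev/sdd1 /mnt   # recovery mount attempt, then check dmesg again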

Chris Murphy


* Re: Recovery advice
From: Sandy McArthur @ 2013-08-05 15:44 UTC (permalink / raw)
  To: linux-btrfs

FYI: I ended up wipefs'ing the drive and adding it back in. I also had
to abort the residual balance process to get the filesystem back to a
state where I could add the disk. I didn't realize this until after
wiping the drive; maybe if I'd known to look for that, I could have
recovered the drive before the wipe. Anyway, all seems fine now, and
I'm no longer mixing and matching connection types.
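
In case it helps anyone searching later, the commands involved were
roughly these (device name and mount point are placeholders, not my
exact ones):

  btrfs balance cancel /mnt         # abort the residual balance left from the crash
  wipefs -a /dev/sdX1               # clear the old btrfs signature on the problem disk
  btrfs device add /dev/sdX1 /mnt   # add it back as a fresh device
  btrfs balance start /mnt          # spread data back across all four disks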

More History:
The filesystem came to a failed state during a balance just after
adding the problem disk. That disk had also been installed inside the
case on SATA instead of inside an external multi-drive enclosure; my
thinking at the time (now known to be semi-faulty) was that it would
be faster to push data onto the disk that way. When the machine
hard-locked, this one drive was different enough from the other three
that I simply could not get btrfs to work with all four disks at once.


On Sun, Aug 4, 2013 at 7:05 PM, Kai Krakow <hurikhan77+btrfs@gmail.com> wrote:
> Duncan <1i5t5.duncan@cox.net> schrieb:
>
>>> It is a RAID-1 so why bother with the faulty drive? Just wipe it, put it
>>> back in, then run a btrfs balance... There should be no data loss
>>> because all data is stored twice (two-way mirroring).
>>
>> The caveat would be if it didn't start as btrfs raid1, and there's still
>> some data (or possibly metadata if it was the single drive at one point
>> or they're ssds, as btrfs defaults to metadata single in ssd mode) that
>> hasn't been duped elsewhere.
>
> Oh... That's actually a pitfall... :-\
>
> Note to myself: Ensure balance has been run successfully and completely.
>



-- 
Sandy McArthur

"He who dares not offend cannot be honest."
- Thomas Paine

