From: Robert White <rwhite@pobox.com>
To: Chris Murphy <lists@colorremedies.com>,
	Zygo Blaxell <zblaxell@furryterror.org>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: filesystem corruption
Date: Tue, 04 Nov 2014 14:19:36 -0800
Message-ID: <545950F8.1050505@pobox.com>
In-Reply-To: <BCBD4D38-1631-418A-8F3B-16497BDEB300@colorremedies.com>

On 11/04/2014 10:28 AM, Chris Murphy wrote:
> On Nov 3, 2014, at 9:31 PM, Zygo Blaxell <zblaxell@furryterror.org> wrote:
>> Now we have two disks with equal generation numbers.  Generations 6..9
>> on sda are not the same as generations 6..9 on sdb, so if we mix the
>> two disks' metadata we get bad confusion.
>>
>> It needs to be more than a sequential number.  If one of the disks
>> disappears we need to record this fact on the surviving disks, and also
>> cope with _both_ disks claiming to be the "surviving" one.
>
> I agree this is also a problem. But the most common case is where we know that sda generation is newer (larger value) and most recently modified, and sdb has not since been modified but needs to be caught up. As far as I know the only way to do that on Btrfs right now is a full balance, it doesn't catch up just by being reconnected with a normal mount.


I would think that any time a filesystem, or any fraction of one, is
mounted both "degraded" and "rw", a degraded flag should be set
somewhere/somehow in the superblock etc.

The only way to clear this flag would be to reach a "reconciled" state.
That state could be reached in one of several ways. Removing the missing
mirror element would be a fast reconcile; doing a balance or scrub would
be a slow reconcile for a filesystem where all the media are returned to
service (e.g. the missing volume of a RAID 1 is returned).
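
To make the flag's life cycle concrete, here is a minimal sketch in Python. It is not Btrfs code: the `Superblock` class, the `mount` and `reconcile` helpers, and the method names are all hypothetical, chosen only to mirror the rule described above (set on a degraded,rw mount; cleared only by a reconcile).

```python
from dataclasses import dataclass

# Hypothetical reconcile methods, named after the options in the text.
FAST_RECONCILE = {"remove_missing"}
SLOW_RECONCILE = {"balance", "scrub"}


@dataclass
class Superblock:
    """Toy stand-in for a superblock; only the proposed flag is modelled."""
    degraded_flag: bool = False


def mount(sb: Superblock, all_devices_present: bool, read_write: bool) -> Superblock:
    """Set the flag only when the mount is both degraded and read-write."""
    if not all_devices_present and read_write:
        sb.degraded_flag = True
    # Note: a later healthy mount does NOT clear the flag by itself.
    return sb


def reconcile(sb: Superblock, method: str) -> Superblock:
    """Clear the flag once a recognised reconcile has completed."""
    if method not in FAST_RECONCILE | SLOW_RECONCILE:
        raise ValueError(f"unknown reconcile method: {method}")
    sb.degraded_flag = False
    return sb
```

The point of the sketch is that a normal mount with every device back in place leaves the flag set; only an explicit reconcile clears it.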

Generation numbers are pretty good, but I'd put on a rider that any
generation number or equivalent incremented while the system is degraded
should have a unique quantum (say a GUID) generated and stored along with
the generation number. The mere existence of this quantum would act as
the degraded flag.

Any check/compare/access related to the generation number would know to 
notice that the GUID is in place and do the necessary resolution. If 
successful the GUID would be discarded.
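
A rough sketch of the idea, in Python rather than anything resembling the real on-disk format. `GenStamp`, `next_stamp` and `needs_resolution` are hypothetical names; the only assumption is that a generation number travels with an optional GUID that is present exactly when the increment happened on a degraded mount.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenStamp:
    """A generation number plus an optional GUID (hypothetical layout)."""
    number: int
    degraded_id: Optional[uuid.UUID] = None


def next_stamp(prev: GenStamp, degraded: bool) -> GenStamp:
    """Increment the generation; attach a fresh GUID only when degraded."""
    return GenStamp(prev.number + 1, uuid.uuid4() if degraded else None)


def needs_resolution(a: GenStamp, b: GenStamp) -> bool:
    """The mere presence of a GUID on either side acts as the degraded flag."""
    return a.degraded_id is not None or b.degraded_id is not None


def same_history(a: GenStamp, b: GenStamp) -> bool:
    """Equal numbers are not enough; the GUIDs (or their absence) must match."""
    return a == b
```

In the case quoted above, sda and sdb each advance to the same generation number while mounted degraded on their own. Their numbers compare equal, but their GUIDs differ, so `same_history` is false and `needs_resolution` is true.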

As to how this could be implemented, I'm not fully conversant with the
internal layout.

One possibility would be to add a block reference, or indeed to replace
the current storage for generation numbers completely with a block
reference to a block containing the generation number and the potential
GUID. The main value of having an out-of-structure reference is that its
content is less space constrained, and it could be shared by multiple
usages. In the case, for instance, where the block is added (as opposed
to replacing the generation number), only one such block would be needed
per degraded,rw mount, and it could be attached to as many filesystem
structures as needed.


Just as metadata under DUP is divergent after a degraded mount, a
generation block would be divergent, and likely in a different location
than its peers on a subsequent restored geometry.

A generation block could have other niceties like the date/time and the
devices present (or absent); such information could conceivably be used
to intelligently disambiguate references. For instance, if one degraded
mount had sda and sdb, and a second had sdb and sdc, then it'd be known
that sdb was dominant for having been present every time.
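
The sda/sdb/sdc example can be sketched as a set intersection over the device lists recorded in each generation block. Again this is only an illustration in Python: `GenerationBlock` and its fields are hypothetical, and "dominant" is taken to mean "present in every recorded degraded mount", which is one reading of the paragraph above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class GenerationBlock:
    """Hypothetical out-of-structure block written once per degraded,rw mount."""
    number: int
    written_at: datetime
    devices_present: FrozenSet[str]


def dominant_devices(blocks: Sequence[GenerationBlock]) -> FrozenSet[str]:
    """Return the devices that were present in every recorded degraded mount."""
    if not blocks:
        return frozenset()
    result = blocks[0].devices_present
    for block in blocks[1:]:
        result = result & block.devices_present
    return result
```

With one block recording {sda, sdb} and a second recording {sdb, sdc}, the result is {sdb}. An empty result would mean no single device saw every degraded mount, and some other tie-breaker (the timestamps, say) would be needed.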

