All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs-progs: check metadata redundancy
@ 2015-05-02 16:03 sam tygier
  2015-05-05 14:54 ` David Sterba
  0 siblings, 1 reply; 7+ messages in thread
From: sam tygier @ 2015-05-02 16:03 UTC (permalink / raw)
  To: linux-btrfs

Currently BTRFS allows you to make bad choices of data and 
metadata levels. For example -d raid1 -m raid0 means you can
only use half your total disk space, but will loose everything
if 1 disk fails. This patch prevents you creating the situation
another will be need to prevent rebalancing in to it. 

When making a filesystem check that metadata mode is at least
as redundant as the data mode. For example don't allow:
	-d raid1 -m raid0

Signed-off-by: Sam Tygier <samtygier@yahoo.co.uk>
---
 utils.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/utils.c b/utils.c
index b175b01..1136a78 100644
--- a/utils.c
+++ b/utils.c
@@ -2391,6 +2391,24 @@ static int group_profile_devs_min(u64 flag)
 	}
 }
 
+static int group_profile_max_safe_loss(u64 flag)
+{
+	switch (flag & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+	case 0: /* single */
+	case BTRFS_BLOCK_GROUP_DUP:
+	case BTRFS_BLOCK_GROUP_RAID0:
+		return 0;
+	case BTRFS_BLOCK_GROUP_RAID1:
+	case BTRFS_BLOCK_GROUP_RAID5:
+	case BTRFS_BLOCK_GROUP_RAID10:
+		return 1;
+	case BTRFS_BLOCK_GROUP_RAID6:
+		return 2;
+	default:
+		return -1;
+	}
+}
+
 int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
 	u64 dev_cnt, int mixed, char *estr)
 {
@@ -2439,6 +2457,13 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
 			"dup for data is allowed only in mixed mode");
 		return 1;
 	}
+
+	if (group_profile_max_safe_loss(metadata_profile) <
+		group_profile_max_safe_loss(data_profile)){
+		snprintf(estr, sz,
+			"metatdata has lower redundancy than data");
+		return 1;
+	}
 	return 0;
 }
 
-- 2.1.4



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs-progs: check metadata redundancy
  2015-05-02 16:03 [PATCH] btrfs-progs: check metadata redundancy sam tygier
@ 2015-05-05 14:54 ` David Sterba
  2015-05-05 21:18   ` sam tygier
  0 siblings, 1 reply; 7+ messages in thread
From: David Sterba @ 2015-05-05 14:54 UTC (permalink / raw)
  To: sam tygier; +Cc: linux-btrfs

On Sat, May 02, 2015 at 05:03:31PM +0100, sam tygier wrote:
> Currently BTRFS allows you to make bad choices of data and 
> metadata levels. For example -d raid1 -m raid0 means you can
> only use half your total disk space, but will loose everything
> if 1 disk fails. This patch prevents you creating the situation
> another will be need to prevent rebalancing in to it. 
> 
> When making a filesystem check that metadata mode is at least
> as redundant as the data mode. For example don't allow:
> 	-d raid1 -m raid0

This is enforcing some policty that makes sense for some usecases, but I
think that the tool should be flexible enough to create any kind of raid
profiles. It's up to the user. I'm willing to add a warning that the
profiles seem fishy, but failing mkfs without any way to override that
is IMHO not a good thing.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs-progs: check metadata redundancy
  2015-05-05 14:54 ` David Sterba
@ 2015-05-05 21:18   ` sam tygier
  2015-05-06  3:40     ` Paul Jones
  2015-05-12 15:02     ` David Sterba
  0 siblings, 2 replies; 7+ messages in thread
From: sam tygier @ 2015-05-05 21:18 UTC (permalink / raw)
  To: linux-btrfs

On 05/05/15 15:54, David Sterba wrote:
> On Sat, May 02, 2015 at 05:03:31PM +0100, sam tygier wrote:
>> Currently BTRFS allows you to make bad choices of data and
>> metadata levels. For example -d raid1 -m raid0 means you can
>> only use half your total disk space, but will loose everything
>> if 1 disk fails. This patch prevents you creating the situation
>> another will be need to prevent rebalancing in to it.
>>
>> When making a filesystem check that metadata mode is at least
>> as redundant as the data mode. For example don't allow:
>> 	-d raid1 -m raid0
> 
> This is enforcing some policty that makes sense for some usecases, but I
> think that the tool should be flexible enough to create any kind of raid
> profiles. It's up to the user. I'm willing to add a warning that the
> profiles seem fishy, but failing mkfs without any way to override that
> is IMHO not a good thing.

There already seems to be policy in test_num_disk_vs_raid() disallowing
DUP for multiple devices. Is there really a useful case better protected
data than metadata?

In btrfs_balance() fs/btrfs/volumes.c, operations that reduce integrity
require a 'force' option. Would that be a good way of handling
questionable data/metadata combinations? If so should it overload the
existing for option, or additional one, e.g. --force-raid-level?

Otherwise I could redo it as just a warning.

If wrote a similar check for rebalancing is there a way to share the
group_profile_max_safe_loss() function between the kernel and btrfs-progs?

Thanks,
Sam


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] btrfs-progs: check metadata redundancy
  2015-05-05 21:18   ` sam tygier
@ 2015-05-06  3:40     ` Paul Jones
  2015-05-06 10:14       ` Duncan
  2015-05-12 15:04       ` David Sterba
  2015-05-12 15:02     ` David Sterba
  1 sibling, 2 replies; 7+ messages in thread
From: Paul Jones @ 2015-05-06  3:40 UTC (permalink / raw)
  To: sam tygier, linux-btrfs

> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> owner@vger.kernel.org] On Behalf Of sam tygier
> Sent: Wednesday, 6 May 2015 7:18 AM
> To: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH] btrfs-progs: check metadata redundancy
> 
> On 05/05/15 15:54, David Sterba wrote:
> > On Sat, May 02, 2015 at 05:03:31PM +0100, sam tygier wrote:
> >> Currently BTRFS allows you to make bad choices of data and metadata
> >> levels. For example -d raid1 -m raid0 means you can only use half
> >> your total disk space, but will loose everything if 1 disk fails.
> >> This patch prevents you creating the situation another will be need
> >> to prevent rebalancing in to it.
> >>
> >> When making a filesystem check that metadata mode is at least as
> >> redundant as the data mode. For example don't allow:
> >> 	-d raid1 -m raid0
> >
> > This is enforcing some policty that makes sense for some usecases, but
> > I think that the tool should be flexible enough to create any kind of
> > raid profiles. It's up to the user. I'm willing to add a warning that
> > the profiles seem fishy, but failing mkfs without any way to override
> > that is IMHO not a good thing.
> 
> There already seems to be policy in test_num_disk_vs_raid() disallowing
> DUP for multiple devices. Is there really a useful case better protected data
> than metadata?

I would appreciate being able to use the DUP profile for data on a single disk - at the moment I just resort to partitioning the disk in two and creating a raid1.
Usecase is portable disk backups - I take a master backup at site A with slow internet, and upload it to backup server from site B. Then I use rsync and snapshots. I start over every 2 months as one of the backups is an incremental backup of a few windows system images.

Paul.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs-progs: check metadata redundancy
  2015-05-06  3:40     ` Paul Jones
@ 2015-05-06 10:14       ` Duncan
  2015-05-12 15:04       ` David Sterba
  1 sibling, 0 replies; 7+ messages in thread
From: Duncan @ 2015-05-06 10:14 UTC (permalink / raw)
  To: linux-btrfs

Paul Jones posted on Wed, 06 May 2015 03:40:12 +0000 as excerpted:

> I would appreciate being able to use the DUP profile for data on a
> single disk - at the moment I just resort to partitioning the disk in
> two and creating a raid1. Usecase is portable disk backups

Rabbit-trailing a bit from the original discussion to address this 
point...

First of all, DUP data has been on the wishlist for some users for some 
time, generally for those who wish to capitalize on btrfs data 
checksumming and validation and actually be able to correct and scrub 
invalid data when using btrfs on a single device, which would seem to be 
what your use-case is about, at its root.

Meanwhile, second, good news!  There is actually one way to achieve DUP 
data on a single device, without resorting to partitioning a single 
device to create a raid1 on it:

When doing the mkfs.btrfs, set the -M/--mixed chunk mode option.  

Mixed-chunk mode (aka mixed-block-group aka mixed-bg) is used by default 
on btrfs under 1 GiB, and there have been discussions about upping that 
to say 16 GiB or so, but the reason it isn't the default on normal larger 
btrfs is because (as the mkfs.btrfs manpage states) mixed-chunk mode 
incurs a performance penalty on larger btrfs.

But partitioning up a single physical device to make it two logical 
devices, just to be able to setup raid1 data, surely incurs a larger 
penalty, at least on spinning rust, where the single set of write heads 
will have to seek back and forth between partitions.  And obviously, if 
people are going to that extreme (and you and others have demonstrated 
that they are and do!), they're willing to pay the relatively smaller 
mixed-bg penalty.

Since mixed-bg mode mixes data and metadata in the same block-groups 
(chunks), DUP mode was made an option there (and I think the default, tho 
I always set it specifically for both data and metadata, setting just one 
in the case of mixed-bg is an error), in ordered to maintain metadata DUP 
protection.  But since as the name suggests they're mixed-bgs, that also 
has the effect of setting data DUP mode! =:^)

When asked specifically about this, Chris Mason confirmed that wasn't 
intentional, simply an implementation accident, with the purpose of mixed-
bg mode being, as the manpage and sub-1-GiB-default suggests, to allow 
more efficient usage of space on small btrfs, which was really needed as 
pre-mixed-bg, chunk allocation on small devices /was/ really inefficient, 
and mixed-bg really did solve that problem.  However, accident or not, 
it's not something that can really be dropped now, and I've seen 
absolutely no suggestions to do so.

So mixed-bg does seem to be the workaround for lack of DUP mode data, and 
even with its inefficiencies compared to separate data/metadata, compared 
to "virtual" raid1 on a single partitioned physical device, it should be 
/quite/ efficient indeed, at least on spinning rust.

Which just leaves some loose ends to tie up...

* I don't know that anyone has benchmarked, or even made claims based on 
logic, of performance on single SSD, partitioned raid1 mode against mixed-
bg dup mode.

* If the concern is failing media itself, again on spinning rust (ssd 
firmwares do weird things with consecutive sector addresses anyway), 
raid1 mode should still be slightly more reliable, simply because the two 
copies are going to be in partitions on opposite ends of the disc.  Mixed-
bg mode will tend to allocate chunks closer to each other, such that at 
least in theory, it's more likely that a flaky media section will damage 
both copies.

* Many people would still like to have true DUP data chunks as an option, 
and personally, I think it'll likely eventually happen, because I think 
the reason people want that option is valid and it shouldn't be hard to 
implement -- I think mostly just letting that be a data option too, tho 
there's likely a few corner-case races that might expose that have been 
hidden so far as it's not an option.  However, I also believe that as a 
practical matter, the existence of the mixed-bg workaround has more or 
less silenced the requests, and there have been bigger fish (well, bugs, 
and features like raid56 mode, not fish) to fry, so it hasn't been the 
issue that it would have been otherwise, and that has probably delayed 
the lifting of the DUP mode data restriction in the more general case.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs-progs: check metadata redundancy
  2015-05-05 21:18   ` sam tygier
  2015-05-06  3:40     ` Paul Jones
@ 2015-05-12 15:02     ` David Sterba
  1 sibling, 0 replies; 7+ messages in thread
From: David Sterba @ 2015-05-12 15:02 UTC (permalink / raw)
  To: sam tygier; +Cc: linux-btrfs

On Tue, May 05, 2015 at 10:18:07PM +0100, sam tygier wrote:
> On 05/05/15 15:54, David Sterba wrote:
> > On Sat, May 02, 2015 at 05:03:31PM +0100, sam tygier wrote:
> >> Currently BTRFS allows you to make bad choices of data and
> >> metadata levels. For example -d raid1 -m raid0 means you can
> >> only use half your total disk space, but will loose everything
> >> if 1 disk fails. This patch prevents you creating the situation
> >> another will be need to prevent rebalancing in to it.
> >>
> >> When making a filesystem check that metadata mode is at least
> >> as redundant as the data mode. For example don't allow:
> >> 	-d raid1 -m raid0
> > 
> > This is enforcing some policty that makes sense for some usecases, but I
> > think that the tool should be flexible enough to create any kind of raid
> > profiles. It's up to the user. I'm willing to add a warning that the
> > profiles seem fishy, but failing mkfs without any way to override that
> > is IMHO not a good thing.
> 
> There already seems to be policy in test_num_disk_vs_raid() disallowing
> DUP for multiple devices. Is there really a useful case better protected
> data than metadata?

In case of DUP/data and single device it's not a policy but lack of
implementation. And not a simple change to make it work AFAIK.

DUP/metadata on multiple devices can exist only if a new device is added
to an existing filesystem until it's balanced. Here it is a policy that
multiple devices need RAID1.

> In btrfs_balance() fs/btrfs/volumes.c, operations that reduce integrity
> require a 'force' option. Would that be a good way of handling
> questionable data/metadata combinations? If so should it overload the
> existing for option, or additional one, e.g. --force-raid-level?

I think changing the integrity is something different than the mkfs
profile setup.

The force flag prevents irreversible changes (overwriting an existing
filesystem). Overloading it for the raid profiles does not sound good to
me, it would have to be another flag. But, I still think that the user
hould be aware of the properties of the respective raid levels, so the
warning is IMHO enough.

> Otherwise I could redo it as just a warning.

Yes please.

> If wrote a similar check for rebalancing is there a way to share the
> group_profile_max_safe_loss() function between the kernel and btrfs-progs?

No, the source code is not shared now, both have to be patched
separately.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs-progs: check metadata redundancy
  2015-05-06  3:40     ` Paul Jones
  2015-05-06 10:14       ` Duncan
@ 2015-05-12 15:04       ` David Sterba
  1 sibling, 0 replies; 7+ messages in thread
From: David Sterba @ 2015-05-12 15:04 UTC (permalink / raw)
  To: Paul Jones; +Cc: sam tygier, linux-btrfs

On Wed, May 06, 2015 at 03:40:12AM +0000, Paul Jones wrote:
> I would appreciate being able to use the DUP profile for data on a
> single disk - at the moment I just resort to partitioning the disk in
> two and creating a raid1.

As you can read elsewhere, it's possible with the mixed profiles and
AFAICT this will be the only way for the forseeable future.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-05-12 15:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-02 16:03 [PATCH] btrfs-progs: check metadata redundancy sam tygier
2015-05-05 14:54 ` David Sterba
2015-05-05 21:18   ` sam tygier
2015-05-06  3:40     ` Paul Jones
2015-05-06 10:14       ` Duncan
2015-05-12 15:04       ` David Sterba
2015-05-12 15:02     ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.