* Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Adam Brenner @ 2014-04-20 17:27 UTC (permalink / raw)
  To: linux-btrfs

Howdy,

I recently set up a new BTRFS filesystem with BTRFS version 3.12 on
Linux kernel 3.13-1, running Debian Jessie.

The BTRFS volume spans 3x 4TB disks, two of which use the entire raw
block device, while the third uses a partition (it is also the OS disk).
The setup is like so:


     root@gra-dfs:/data/tmp# btrfs filesystem show
     Label: none  uuid: 63d51c9b-f851-404f-b0f2-bf84d07df163
	    Total devices 3 FS bytes used 3.03TiB
	    devid    1 size 3.61TiB used 1.01TiB path /dev/sda3
	    devid    2 size 3.64TiB used 1.04TiB path /dev/sdb
	    devid    3 size 3.64TiB used 1.04TiB path /dev/sdc

     Btrfs v3.12
     root@gra-dfs:/data/tmp# btrfs filesystem df /data
     Data, single: total=3.07TiB, used=3.03TiB
     System, RAID1: total=8.00MiB, used=352.00KiB
     System, single: total=4.00MiB, used=0.00
     Metadata, RAID1: total=5.00GiB, used=3.60GiB
     Metadata, single: total=8.00MiB, used=0.00
     root@gra-dfs:/data/tmp#
     root@gra-dfs:/home# mount | grep /data
     /dev/sda3 on /data type btrfs (rw,noatime,space_cache)
     root@gra-dfs:/home#


The setup is supposed to be "RAID-0 like", but since the drives in the
volume are different sizes, I created the BTRFS filesystem using the
following command, based on the wiki[1]:

     mkfs.btrfs -d single /dev/sda3 /dev/sdb /dev/sdc -f

Once set up, I transferred roughly 3.1TB of data and noticed the write
speed was limited to 200MB/s. This is the same write speed that I would
see on a single device. I also tested with dd using oflag=direct, a block
size of 1M, and a count of 1024 from /dev/zero. Both the transfer and the
dd test showed the same speed.
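
For reference, the dd invocation was along these lines (the output path
under /data is illustrative):

     dd if=/dev/zero of=/data/tmp/ddtest bs=1M count=1024 oflag=direct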

So my question is, should I have setup the BTRFS filesystem with -d 
raid0? Would this have worked with multiple devices with different sizes?

[1]: 
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices


-- 
Adam Brenner <adam@aeb.io>


* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Chris Murphy @ 2014-04-20 20:54 UTC (permalink / raw)
  To: Adam Brenner; +Cc: linux-btrfs


On Apr 20, 2014, at 11:27 AM, Adam Brenner <adam@aeb.io> wrote:
> 
>    mkfs.btrfs -d single /dev/sda3 /dev/sdb /dev/sdc -f
> 
> Once set up, I transferred roughly 3.1TB of data and noticed the write speed was limited to 200MB/s. This is the same write speed that I would see on a single device. I also tested with dd using oflag=direct, a block size of 1M, and a count of 1024 from /dev/zero. Both the transfer and the dd test showed the same speed.

This is expected. And although I haven't tested it, I think you'd get the same results with multiple threads writing at the same time: the allocation would aggregate the threads to one chunk at a time until full, which means writing to one device at a time, then writing a new chunk on a different device until full, and so on in round robin fashion.

I also haven't tested this, so I'm not sure if different behavior happens for reading files located in different chunks on different devices, if those are effectively single threaded reads, or if the files can be read simultaneously.

> 
> So my question is, should I have setup the BTRFS filesystem with -d raid0? Would this have worked with multiple devices with different sizes?

raid0 does work with multiple devices of different sizes, but it won't use the full capacity of the last drive with the most space.

For example: 2GB, 3GB, and 4GB devices as raid0.

The first 2GB copies using 3 stripes, one per device, until the 2GB device is full. The next 1GB copies using 2 stripes, one per remaining device (the 3GB and 4GB ones) until the 3GB device is full. Additional copying results in "cp: error writing ‘./IMG_2892.dng’: No space left on device"

Label: none  uuid: 7dfde9eb-04a8-4920-95c0-51253b2483f8
	Total devices 3 FS bytes used 7.33GiB
	devid    1 size 2.00GiB used 2.00GiB path /dev/sdb
	devid    2 size 3.00GiB used 3.00GiB path /dev/sdc
	devid    3 size 4.00GiB used 3.00GiB path /dev/sdd
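
For anyone who wants to reproduce a mixed-size raid0 test like this
without spare disks, a sketch using loop devices (file names and the
mount point are illustrative):

	truncate -s 2G /tmp/d1.img
	truncate -s 3G /tmp/d2.img
	truncate -s 4G /tmp/d3.img
	losetup -f --show /tmp/d1.img    # prints e.g. /dev/loop0
	losetup -f --show /tmp/d2.img
	losetup -f --show /tmp/d3.img
	mkfs.btrfs -d raid0 -m raid1 /dev/loop0 /dev/loop1 /dev/loop2
	mount /dev/loop0 /mnt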


Ergo, there is no such thing as single-device raid0, so at the point at which all but one drive is full, writes fail.


Chris Murphy



* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Chris Murphy @ 2014-04-20 21:04 UTC (permalink / raw)
  To: Btrfs BTRFS


On Apr 20, 2014, at 2:54 PM, Chris Murphy <lists@colorremedies.com> wrote:

> 
> Ergo, there is no such thing as single-device raid0, so at the point at which all but one drive is full, writes fail.

Correction. Data writes fail. Metadata writes apparently still succeed, as zero length files are created. I now have several hundred such files. But no implosion, and the file system continues to work.

-rwxr-x---. 1 root root      0 Apr 20 15:01 IMG_3328.dng

I suppose this has pros and cons.


Chris Murphy



* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Adam Brenner @ 2014-04-21  4:56 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-btrfs

On 04/20/2014 01:54 PM, Chris Murphy wrote:
>
> This is expected. And although I haven't tested it, I think you'd get
> the same results with multiple threads writing at the same time: the
> allocation would aggregate the threads to one chunk at a time until
> full, which means writing to one device at a time, then writing a new
> chunk on a different device until full, and so on in round robin
> fashion.

Interesting, and somewhat shocking -- if I am reading this correctly!

So ... BTRFS at this point in time, does not actually "stripe" the data 
across N number of devices/blocks for aggregated performance increase 
(both read and write)?

Essentially, running mdadm with ext4 or XFS would offer better 
performance than BTRFS right now (and possibly the ZFS on Linux project)?

I think I may be missing a key point here (or did not RTFM)?



The wiki page[1] clearly shows that the command I used to create my 
current setup (-d single) will *not* stripe the data, which should have 
been the first _red_ flag for me! However, if I go ahead and create using

    mkfs.btrfs -d raid0 /dev/sda3 /dev/sdb /dev/sdc

This *should* stripe the data and improve read and write performance? 
But according to what Chris wrote above, this is not true? I just want 
some clarification on this.

>
>>
>> So my question is, should I have setup the BTRFS filesystem with -d
>> raid0? Would this have worked with multiple devices with different
>> sizes?
>
> raid0 does work with multiple devices of different sizes, but it
> won't use the full capacity of the last drive with the most space.
>
> For example: 2GB, 3GB, and 4GB devices as raid0.
>
> The first 2GB copies using 3 stripes, one per device, until the 2GB
> device is full. The next 1GB copies using 2 stripes, one per
> remaining device (the 3GB and 4GB ones) until the 3GB device is full.
> Additional copying results in "cp: error writing ‘./IMG_2892.dng’: No
> space left on device"

I am sorry, I do not quite understand this. If I read this correctly, we 
are copying a file that is larger than the total raid0 filesystem 
(9GB?). The point at which writes fail is at the magic number of 5GB -- 
which is where the two devices are full?

So going back to the setup I currently have:

     Label: none  uuid: 63d51c9b-f851-404f-b0f2-bf84d07df163
         Total devices 3 FS bytes used 3.03TiB
         devid    1 size 3.61TiB used 1.01TiB path /dev/sda3
         devid    2 size 3.64TiB used 1.04TiB path /dev/sdb
         devid    3 size 3.64TiB used 1.04TiB path /dev/sdc

If /dev/sda3 and /dev/sdb are full, but room is still left on /dev/sdc, 
writes fail -- but the metadata will continue to succeed, taking up 
inodes and creating zero-length files?

[1]: 
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices

-- 
Adam Brenner <adam@aeb.io>


* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Chris Murphy @ 2014-04-21  5:32 UTC (permalink / raw)
  To: Adam Brenner; +Cc: linux-btrfs


On Apr 20, 2014, at 10:56 PM, Adam Brenner <adam@aeb.io> wrote:

> On 04/20/2014 01:54 PM, Chris Murphy wrote:
>> 
>> This is expected. And although I haven't tested it, I think you'd get
>> the same results with multiple threads writing at the same time: the
>> allocation would aggregate the threads to one chunk at a time until
>> full, which means writing to one device at a time, then writing a new
>> chunk on a different device until full, and so on in round robin
>> fashion.
> 
> Interesting, and somewhat shocking -- if I am reading this correctly!
> 
> So ... BTRFS at this point in time, does not actually "stripe" the data across N number of devices/blocks for aggregated performance increase (both read and write)?

Not for the single profile.

> The wiki page[1] clearly shows that the command I used to create my current setup (-d single) will *not* stripe the data, which should have been the first _red_ flag for me! However, if I go ahead and create using
> 
>   mkfs.btrfs -d raid0 /dev/sda3 /dev/sdb /dev/sdc
> 
> This *should* stripe the data and improve read and write performance?

It does.


> But according to what Chris wrote above, this is not true?

What I wrote above applies to single profile because I had just quoted your mkfs command which used -d single.

>>> 
>>> So my question is, should I have setup the BTRFS filesystem with -d
>>> raid0? Would this have worked with multiple devices with different
>>> sizes?
>> 
>> raid0 does work with multiple devices of different sizes, but it
>> won't use the full capacity of the last drive with the most space.
>> 
>> For example: 2GB, 3GB, and 4GB devices as raid0.
>> 
>> The first 2GB copies using 3 stripes, one per device, until the 2GB
>> device is full. The next 1GB copies using 2 stripes, one per
>> remaining device (the 3GB and 4GB ones) until the 3GB device is full.
>> Additional copying results in "cp: error writing ‘./IMG_2892.dng’: No
>> space left on device"
> 
> I am sorry, I do not quite understand this. If I read this correctly, we are copying a file that is larger than the total raid0 filesystem (9GB?).

The size of the file doesn't matter. I was copying a bunch of DNGs that are ~20MB each.

> The point at which writes fail is at the magic number of 5GB -- which is where the two devices are full?

No, actually I described it incorrectly. The first 6GB copies using 3 stripes. The smallest device is 2GB. Each device can accept at least 2GB. So that's 6GB. Anything after 6GB is striped across two devices until the 2nd device is full, and at that point there's a failure to write data, since single-stripe raid0 is apparently disallowed. And as the btrfs fi show output I posted indicates, it became full at ~8GB (7.33GiB of data): 3GB + 3GB + 2GB = 8GB. Thus the extra 1GB on sdd was not usable (except for some metadata).


> So going back to the setup I currently have:
> 
>    Label: none  uuid: 63d51c9b-f851-404f-b0f2-bf84d07df163
>        Total devices 3 FS bytes used 3.03TiB
>        devid    1 size 3.61TiB used 1.01TiB path /dev/sda3
>        devid    2 size 3.64TiB used 1.04TiB path /dev/sdb
>        devid    3 size 3.64TiB used 1.04TiB path /dev/sdc
> 
> If /dev/sda3 and /dev/sdb are full, but room is still left on /dev/sdc, writes fail -- but the metadata will continue to succeed, taking up inodes and creating zero-length files?

I don't see how your example can happen. sda3 will become full before either sdb or sdc because it's the smallest device in the volume. sdb and sdc are the same size, but might end up with slightly different amounts of data chunks if the raid1 metadata allocation differs between the two. So yeah, one of them probably will fill up before the other, but from a practical standpoint they will fill up at the same time. At that point, user space reports write errors, but yes, I do see zero-length files being created. Even that is limited by the fact that metadata is raid1: with all devices except one full, no new raid1 metadata chunks can be allocated, so once the existing metadata chunks fill up, metadata writes would fail too.

And just FYI, if you care about your data or time, I wouldn't try filling up the volume to the brim. Btrfs still gets pretty fussy when that happens, with varying degrees of success in unwinding it.


Chris Murphy



* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Duncan @ 2014-04-21 21:09 UTC (permalink / raw)
  To: linux-btrfs

Adam Brenner posted on Sun, 20 Apr 2014 21:56:10 -0700 as excerpted:

> So ... BTRFS at this point in time, does not actually "stripe" the data
> across N number of devices/blocks for aggregated performance increase
> (both read and write)?

What Chris says is correct, but just in case it's unclear as written, let 
me try a reworded version, perhaps addressing a few uncaught details in 
the process.

1) Btrfs treats data and metadata separately, so unless they're both 
set up the same way (both raid0 or both single or whatever), different 
rules will apply to each.

2) Btrfs separately allocates data and metadata chunks, then fills them 
in until it needs to allocate more.  So as the filesystem fills, there 
will come a point at which all space is allocated to either data or 
metadata chunks and no more chunk allocations can be made.  At this 
point, you can still write to the filesystem, filling up the chunks that 
are there, but one or the other will fill up first, and then you'll get 
errors.

2a) By default, data chunks are 1 GiB in size and metadata chunks are 
256 MiB, although the last ones written can be smaller to fill the 
available space.  Note that except for single mode, all chunks must be 
written in
multiples: pairs for dup, raid1, a minimum of pairs for raid0, a minimum 
of triplets for raid5, a minimum of quads for raid6, raid10.  Thus, when 
using unequal sized devices or a number of devices that doesn't evenly 
match the minimum multiple, it's very likely that depending on the size 
of the individual devices, some space may not actually be allocatable.  
This is what Chris was seeing with his 3-device raid0 (2G, 3G, 4G).  The 
first two fill up, leaving no room to allocate in pairs or more, with a 
gig of space left unused on the 4G device.

2b) For various reasons, it is usually the metadata that fills up first.  
When that happens, further operations (even attempting to delete files, 
since on a COW filesystem deletions require room to rewrite the metadata) 
return ENOSPC.  There are various tricks that can be tried when this 
happens (balance, etc) to recover some likely not yet full data chunks to 
unallocated and thus have more room to write metadata, but ideally, you 
watch the btrfs filesystem df and btrfs filesystem show stats and 
rebalance before you start getting ENOSPC errors.
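
As a concrete illustration (the mount point is the one from the original 
post, and the usage threshold is just an example value), a filtered 
balance that only rewrites mostly-empty data chunks looks like:

     btrfs balance start -dusage=5 /data

That returns nearly empty data chunks to the unallocated pool, where the 
space can be reallocated as metadata if needed.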

It's also worth noting that btrfs reserves some metadata space, typically 
around 200 MiB, for its own usage.  Since metadata chunks are normally 
256 MiB in size, an easy way to look at it is to simply say you always 
need a spare metadata chunk allocated.  Once the filesystem cannot 
allocate more and you're on your last one, you run into ENOSPC trouble 
pretty quickly.

2c) Chris has reported the opposite situation in his test.  With no more 
space to allocate, he filled up his data chunks first.  At that point 
there's metadata space still available, thus the zero-length files he was 
reporting.  (Technically, he could probably write really small files too, 
because if they're small enough, likely something under 16 KiB and 
possibly something under 4 KiB, depending on the metadata node size (4 KiB 
by default until recently, 16 KiB from IIRC kernel 3.13), btrfs will 
write them directly into the metadata node and not actually allocate a 
data extent for them.  But the ~20 MiB files he was trying were too big 
for that, so he was getting the metadata allocation but not the data, 
thus zero-length files.)
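
One way to see the inline case (a sketch; the paths are illustrative) is 
to create a tiny file and check its extent map, since files small enough 
to be inlined report an "inline" extent flag:

     echo tiny > /data/tmp/tiny.txt
     sync
     filefrag -v /data/tmp/tiny.txt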

Again, a rebalance might be able to return some unused metadata chunks to 
the unallocated pool, allowing a little more data to be written.

2d)  Still, if you keep adding more, there comes a point at which no more 
can be written using the current data and metadata modes and there are no 
further partially written chunks to free using balance either, at which 
point the filesystem is full, even if there's still space left unused on 
one device.

With those basics in mind, we're now equipped to answer the question 
above.

On a multi-device filesystem, in default data allocation "single" mode, 
btrfs can sort of be said to stripe in theory, since it'll allocate 
chunks from all available devices, but since it's allocating and using 
only a single data chunk at a time and they're a GiB in size, the 
"stripes" are effectively a GiB in size, far too large to get any 
practical speedup from them.

But single mode does allow using that last bit of space on unevenly sized 
devices, and if a device goes bad, you can still recover files written to 
the other devices.

OTOH, raid0 mode will allocate in gig chunks per device across all 
available devices (minimum two) at once and will then write in much 
smaller stripes (IIRC 64 KiB, since that's the normal device read-ahead 
size) in the pre-allocated chunks, giving you far faster single-thread 
access.

But raid0 mode does require pair-minimum chunk allocation, so if the 
devices are uneven in size, depending on exact device sizes you'll likely 
end up with some unusable space on the last device.  Also, as is normally 
the case with raid0, if a device dies, consider the entire filesystem 
toast.  (In theory you can often still recover some files smaller than 
the stripe size, particularly if the metadata was raid1 as it is by 
default so it's still available, but in practice, if you're storing 
anything but throwaway data on a raid0 and/or you don't have current/
tested backups, you're abusing raid0 and playing Russian roulette with 
your data.  Just don't put valuable data on raid0 in the first place and/
or keep current/tested backups, and you can simply scrap the raid0 when a 
device dies without worry.)
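
For completeness, an existing single-profile volume can be converted to 
raid0 data in place with a balance.  A sketch, using the mount point from 
the original post (the conversion rewrites every data chunk, so it can 
take a long time on terabytes of data):

     btrfs balance start -dconvert=raid0 /data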

OTOH, I vastly prefer raid1 here, both for the traditional device-fail 
redundancy and to take advantage of btrfs' data integrity features should 
one copy of the data go bad for some reason.  My biggest gripe is that 
currently btrfs raid1 only does pair-mirroring regardless of the number 
of devices thrown at it, and my sweet-spot is triplet-mirroring, which 
I'd really *REALLY* like to have available, just in case.  Oh, well...  
Anyway, for multi-threaded primarily read-based IO, raid1 mode is the 
better choice, since you get N-thread access in parallel, with N=number-
of-mirrors.  (Again, I'd really REALLY like N=3, but oh, well... it's on 
the roadmap.  I'll have to wait...)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Chris Murphy @ 2014-04-22 17:42 UTC (permalink / raw)
  To: Btrfs BTRFS


On Apr 21, 2014, at 3:09 PM, Duncan <1i5t5.duncan@cox.net> wrote:

> Adam Brenner posted on Sun, 20 Apr 2014 21:56:10 -0700 as excerpted:
> 
>> So ... BTRFS at this point in time, does not actually "stripe" the data
>> across N number of devices/blocks for aggregated performance increase
>> (both read and write)?
> 
> What Chris says is correct, but just in case it's unclear as written, let 
> me try a reworded version, perhaps addressing a few uncaught details in 
> the process.

Another likely problem is terminology. It's 2014 and still we don't have consistency in basic RAID terminology. We're functionally in a 19th-century state of uncoordinated weights and measures, except maybe worse, because we sometimes have multiple words that mean the same thing; as if there were multiple words for the term gram or meter. It's just nonsensical and selfish that this continues to persist across various file system projects.

It's not immediately obvious to the btrfs newcomer that the md raid chunk isn't the same thing as the btrfs chunk, for example.

And strip, chunk, stripe unit, and stripe size get used interchangeably to mean the same thing, while just as often stripe size means something different. The best definition I've found so far is IBM's stripe unit definition: "granularity at which data is stored on one drive of the array before subsequent data is stored on the next drive of the array" which is in bytes. So that's the smallest raid unit we find on a drive, therefore it is a base unit in RAID, and yet we have no agreement on what word to use.

And it's not as if SNIA, the storage industry trade association, which published a dictionary of terms in 2013, really helps in this area. I'll argue they make it worse, because they deprecate the term chunk in favor of the terms strip and stripe element. No kidding, two terms that mean the same thing. Yet strip and stripe are NOT the same thing.

strip = stripe element
stripe = set of strips
strip size = stripe depth
stripe size = strip size * extents not including parity extents
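
To make those concrete with a hypothetical layout (not an example from 
the SNIA text): a four-drive RAID5 with a 64 KiB strip has a stripe depth 
of 64 KiB and a stripe size of 3 x 64 KiB = 192 KiB, since the one parity 
extent per stripe is excluded.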

Also the units are in blocks (sectors, not fs blocks and not bytes). The terms stripe unit, stripe width, and stride aren't found in the SNIA dictionary at all although they are found as terms in other file system projects.

So no matter how we look at it, everyone else is doing it wrong.



Chris Murphy


* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Hugo Mills @ 2014-04-22 17:56 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS


On Tue, Apr 22, 2014 at 11:42:09AM -0600, Chris Murphy wrote:
> 
> On Apr 21, 2014, at 3:09 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> 
> > Adam Brenner posted on Sun, 20 Apr 2014 21:56:10 -0700 as excerpted:
> > 
> >> So ... BTRFS at this point in time, does not actually "stripe" the data
> >> across N number of devices/blocks for aggregated performance increase
> >> (both read and write)?
> > 
> > What Chris says is correct, but just in case it's unclear as written, let 
> > me try a reworded version, perhaps addressing a few uncaught details in 
> > the process.
> 
> Another likely problem is terminology. It's 2014 and still we don't have consistency in basic RAID terminology. We're functionally in a 19th-century state of uncoordinated weights and measures, except maybe worse, because we sometimes have multiple words that mean the same thing; as if there were multiple words for the term gram or meter. It's just nonsensical and selfish that this continues to persist across various file system projects.
> 
> It's not immediately obvious to the btrfs newcomer that the md raid chunk isn't the same thing as the btrfs chunk, for example.
> 
> And strip, chunk, stripe unit, and stripe size get used interchangeably to mean the same thing, while just as often stripe size means something different. The best definition I've found so far is IBM's stripe unit definition: "granularity at which data is stored on one drive of the array before subsequent data is stored on the next drive of the array" which is in bytes. So that's the smallest raid unit we find on a drive, therefore it is a base unit in RAID, and yet we have no agreement on what word to use.
> 
> And it's not as if SNIA, the storage industry trade association, which published a dictionary of terms in 2013, really helps in this area. I'll argue they make it worse, because they deprecate the term chunk in favor of the terms strip and stripe element. No kidding, two terms that mean the same thing. Yet strip and stripe are NOT the same thing.
> 
> strip = stripe element
> stripe = set of strips
> strip size = stripe depth
> stripe size = strip size * extents not including parity extents
> 
> Also the units are in blocks (sectors, not fs blocks and not bytes). The terms stripe unit, stripe width, and stride aren't found in the SNIA dictionary at all although they are found as terms in other file system projects.
> 
> So no matter how we look at it, everyone else is doing it wrong.

   Also not helped by btrfs's co-option of the term "RAID-1" to mean
something that's not traditional RAID-1, and (internally) "stripe" and
"chunk" to mean things that don't match (I think) any of the
definitions above...

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
        --- A clear conscience.  Where did you get this taste ---        
                         for luxuries,  Bernard?                         



* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Chris Murphy @ 2014-04-22 18:41 UTC (permalink / raw)
  To: Hugo Mills; +Cc: Btrfs BTRFS


On Apr 22, 2014, at 11:56 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
> 
>   Also not helped by btrfs's co-option of the term "RAID-1" to mean
> something that's not traditional RAID-1, and (internally) "stripe" and
> "chunk" to mean things that don't match (I think) any of the
> definitions above…

Right. Although in btrfs's defense, if there's a single term that no one agrees on it's stripe. If any word ought to be deprecated as functionally useless beyond repair, it's that one.

But no, what SNIA did was deprecate chunk, leaving it in some sense fair game for anyone to use as a generic and otherwise meaningless term, as btrfs does. Except for the fact that almost everywhere else, including md/mdadm, it is very much in use, means a very specific thing, and should not have been usurped for that reason alone.

I have an idea, let's have a gram and grame. Gram is gram, but grame is kilogram. The e will be silent, and makes the a change from a short sound to a long sound. And we'll make grame element equal to a gram, just in case people need a compound word when a single word for the base unit should be sufficient. And then we'll deprecate kilogram. Perfect! It's not confusing AT ALL! Everyone will love and adopt this right away instead of completely and totally ignoring it.

Truly, we are apes that just happen to wear pants.

Chris Murphy


* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Duncan @ 2014-04-23  3:18 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Tue, 22 Apr 2014 11:42:09 -0600 as excerpted:


> On Apr 21, 2014, at 3:09 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> 
>> Adam Brenner posted on Sun, 20 Apr 2014 21:56:10 -0700 as excerpted:
>> 
>>> So ... BTRFS at this point in time, does not actually "stripe" the
>>> data across N number of devices/blocks for aggregated performance
>>> increase (both read and write)?
>> 
>> What Chris says is correct, but just in case it's unclear as written,
>> let me try a reworded version, perhaps addressing a few uncaught
>> details in the process.
> 
> Another likely problem is terminology. It's 2014 and still we don't have
> consistency in basic RAID terminology.

> It's not immediately obvious to the btrfs newcomer that the md raid
> chunk isn't the same thing as the btrfs chunk, for example.
> 
> And strip, chunk, stripe unit, and stripe size get used interchangeably
> to mean the same thing, while just as often stripe size means something
> different.

FWIW, I did hesitate at one point, then used "stripe" for what I guess 
should have been strip or stripe-unit, after considering and rejecting 
"chunk" as already in use.

But in any case, while btrfs single mode is distinct from btrfs raid0 
mode, on multiple devices single mode does in fact end up in a sort of 
raid0-like layout; it's just that the minimum single-mode unit is 1 GiB, 
which is too big a "strip" to work as raid0 in practice.

IOW, btrfs single mode layout is one 1 GiB chunk on one device at a time, 
but btrfs will alternate devices with those 1 GiB chunks (choosing the 
one with the least usage from those available), *NOT* use one device 
until it's full, then another until it's full, etc., like md/raid linear 
mode does.  In that way, the layout is raid0-like, even if the chunks are 
too big to be practical raid0.

Btrfs raid0 mode, however, *DOES* work as raid0 in practice.  It still 
allocates 1 GiB chunks per device, but does so in parallel across all 
available devices, and then stripes at a unit far smaller than the 1 GiB 
chunk, using I believe a 64 or 128 KiB strip/stripe-unit/whatever, with 
the full stripe-size thus being that times the number of devices in 
parallel in the stripe.

<sigh>  It's all clear in my head, anyway! =:^(

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


