* Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Adam Brenner @ 2014-04-20 17:27 UTC (permalink / raw)
To: linux-btrfs
Howdy,
I recently set up a new btrfs filesystem based on btrfs version 3.12 on
Linux kernel 3.13-1, running Debian Jessie.
The btrfs volume spans 3x 4TB disks, two of which use the entire raw
block device, and one of which uses a partition (the OS disk). The
setup is like so:
root@gra-dfs:/data/tmp# btrfs filesystem show
Label: none uuid: 63d51c9b-f851-404f-b0f2-bf84d07df163
Total devices 3 FS bytes used 3.03TiB
devid 1 size 3.61TiB used 1.01TiB path /dev/sda3
devid 2 size 3.64TiB used 1.04TiB path /dev/sdb
devid 3 size 3.64TiB used 1.04TiB path /dev/sdc
Btrfs v3.12
root@gra-dfs:/data/tmp# btrfs filesystem df /data
Data, single: total=3.07TiB, used=3.03TiB
System, RAID1: total=8.00MiB, used=352.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=5.00GiB, used=3.60GiB
Metadata, single: total=8.00MiB, used=0.00
root@gra-dfs:/data/tmp#
root@gra-dfs:/home# mount | grep /data
/dev/sda3 on /data type btrfs (rw,noatime,space_cache)
root@gra-dfs:/home#
The setup is supposed to be "RAID-0 like", but with different-size drives
within the volume. I created the btrfs filesystem using the following
command, based on the wiki[1]:
mkfs.btrfs -d single /dev/sda3 /dev/sdb /dev/sdc -f
Once it was set up, I transferred roughly 3.1TB of data and noticed the
write speed was limited to 200MB/s. This is the same write speed that I
would see on a single device. I also tested with dd, using oflag=direct,
a block size of 1M, and a count of 1024 from /dev/zero. Both showed the
same speeds.
So my question is: should I have set up the btrfs filesystem with -d
raid0? Would that have worked with multiple devices of different sizes?
[1]:
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
--
Adam Brenner <adam@aeb.io>
* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Chris Murphy @ 2014-04-20 20:54 UTC (permalink / raw)
To: Adam Brenner; +Cc: linux-btrfs
On Apr 20, 2014, at 11:27 AM, Adam Brenner <adam@aeb.io> wrote:
>
> mkfs.btrfs -d single /dev/sda3 /dev/sdb /dev/sdc -f
>
> Once setup, I transferred roughly 3.1TB of data and noticed the write speed was limited to 200MB/s. This is the same write speed that I would see across a single device. I used dd with oflag=direct and a block size of 1M and a count of 1024 from /dev/zero. Both showed the same speeds.
This is expected. And although I haven't tested it, I think you'd get the same results with multiple threads writing at the same time: the allocation would aggregate the threads to one chunk at a time until full, which means writing to one device at a time, then writing a new chunk on a different device until full, and so on in round robin fashion.
I also haven't tested this, so I'm not sure if different behavior happens for reading files located in different chunks on different devices, if those are effectively single threaded reads, or if the files can be read simultaneously.
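The chunk-at-a-time behavior described above can be sketched in a few lines. This is purely an illustration, NOT btrfs source code: each single-profile data chunk lives entirely on one device (chosen, per later discussion in this thread, as the device with the most unallocated space), so sequential writes only ever hit one device at a time. The device sizes below are made-up GiB figures.

```python
# Hypothetical sketch of -d single allocation, NOT btrfs's allocator.
CHUNK = 1  # GiB, the default data chunk size

def allocate_single(free_gib, chunks_needed):
    """Return the device index that receives each successive 1 GiB chunk,
    picking the device with the most unallocated space each time."""
    free = list(free_gib)
    order = []
    for _ in range(chunks_needed):
        dev = max(range(len(free)), key=lambda i: free[i])
        free[dev] -= CHUNK
        order.append(dev)
    return order

# Equal devices rotate round-robin; a smaller device (like a partition
# on an OS disk) is simply chosen less often until the others catch up.
print(allocate_single([4, 4, 4], 3))           # -> [0, 1, 2]
print(allocate_single([3689, 3727, 3727], 6))  # -> [1, 2, 1, 2, 1, 2]
```

Note how in the second call the smaller first device is skipped entirely while the two larger devices alternate, which matches the uneven "used" figures in the btrfs fi show output above.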
>
> So my question is, should I have setup the BTRFS filesystem with -d raid0? Would this have worked with multiple devices with different sizes?
raid0 does work with multiple devices of different sizes, but it won't use the full capacity of the last drive with the most space.
For example: 2GB, 3GB, and 4GB devices as raid0.
The first 2GB copies using 3 stripes, one per device, until the 2GB device is full. The next 1GB copies using 2 stripes, one per remaining device (the 3GB and 4GB ones) until the 3GB device is full. Additional copying results in "cp: error writing ‘./IMG_2892.dng’: No space left on device"
Label: none uuid: 7dfde9eb-04a8-4920-95c0-51253b2483f8
Total devices 3 FS bytes used 7.33GiB
devid 1 size 2.00GiB used 2.00GiB path /dev/sdb
devid 2 size 3.00GiB used 3.00GiB path /dev/sdc
devid 3 size 4.00GiB used 3.00GiB path /dev/sdd
Ergo, there is no such thing as a single-device raid0, so at the point where all but one drive is full, writes fail.
Chris Murphy
* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Chris Murphy @ 2014-04-20 21:04 UTC (permalink / raw)
To: Btrfs BTRFS
On Apr 20, 2014, at 2:54 PM, Chris Murphy <lists@colorremedies.com> wrote:
>
> Ergo, there is no such thing as single device raid0, so the point at which all but 1 drive is full, writes fail.
Correction. Data writes fail. Metadata writes apparently still succeed, as zero length files are created. I now have several hundred such files. But no implosion, and the file system continues to work.
-rwxr-x---. 1 root root 0 Apr 20 15:01 IMG_3328.dng
I suppose this has pros and cons.
Chris Murphy
* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Adam Brenner @ 2014-04-21 4:56 UTC (permalink / raw)
To: Chris Murphy; +Cc: linux-btrfs
On 04/20/2014 01:54 PM, Chris Murphy wrote:
>
> This is expected. And although I haven't tested it, I think you'd get
> the same results with multiple threads writing at the same time: the
> allocation would aggregate the threads to one chunk at a time until
> full, which means writing to one device at a time, then writing a new
> chunk on a different device until full, and so on in round robin
> fashion.
Interesting, and I'm somewhat shocked -- if I am reading this correctly!
So ... BTRFS at this point in time, does not actually "stripe" the data
across N number of devices/blocks for aggregated performance increase
(both read and write)?
Essentially, running mdadm with ext4 or XFS would offer better
performance than btrfs right now (and possibly the ZFS on Linux project)?
I think I may be missing a key point here (or didn't RTFM)?
The wiki page[1] clearly shows that the command I used to create my
current setup (-d single) will *not* stripe the data, which should have
been the first _red_ flag for me! However, if I go ahead and create using
mkfs.btrfs -d raid0 /dev/sda3 /dev/sdb /dev/sdc
This *should* stripe the data and improve read and write performance?
But according to what Chris wrote above, that is not true? I just want
some clarification on this.
>
>>
>> So my question is, should I have setup the BTRFS filesystem with -d
>> raid0? Would this have worked with multiple devices with different
>> sizes?
>
> raid0 does work with multiple devices of different sizes, but it
> won't use the full capacity of the last drive with the most space.
>
> For example: 2GB, 3GB, and 4GB devices as raid0.
>
> The first 2GB copies using 3 stripes, one per device, until the 2GB
> device is full. The next 1GB copies using 2 stripes, one per
> remaining device (the 3GB and 4GB ones) until the 3GB device is full.
> Additional copying results in "cp: error writing ‘./IMG_2892.dng’: No
> space left on device"
I am sorry, I do not quite understand this. If I read this correctly, we
are copying a file that is larger than the total raid0 filesystem
(9GB?). The point at which writes fail is at the magic number of 5GB --
which is where the two devices are full?
So going back to the setup I currently have:
Label: none uuid: 63d51c9b-f851-404f-b0f2-bf84d07df163
Total devices 3 FS bytes used 3.03TiB
devid 1 size 3.61TiB used 1.01TiB path /dev/sda3
devid 2 size 3.64TiB used 1.04TiB path /dev/sdb
devid 3 size 3.64TiB used 1.04TiB path /dev/sdc
If /dev/sda3 and /dev/sdb are full, but room is still left on /dev/sdc,
writes fail -- but metadata writes will continue to succeed, taking up
inodes and creating zero-length files?
[1]:
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
--
Adam Brenner <adam@aeb.io>
* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Chris Murphy @ 2014-04-21 5:32 UTC (permalink / raw)
To: Adam Brenner; +Cc: linux-btrfs
On Apr 20, 2014, at 10:56 PM, Adam Brenner <adam@aeb.io> wrote:
> On 04/20/2014 01:54 PM, Chris Murphy wrote:
>>
>> This is expected. And although I haven't tested it, I think you'd get
>> the same results with multiple threads writing at the same time: the
>> allocation would aggregate the threads to one chunk at a time until
>> full, which means writing to one device at a time, then writing a new
>> chunk on a different device until full, and so on in round robin
>> fashion.
>
> Interesting and some what shocked -- if I am reading this correctly!
>
> So ... BTRFS at this point in time, does not actually "stripe" the data across N number of devices/blocks for aggregated performance increase (both read and write)?
Not for the single profile.
> The wiki page[1] clearly shows that the command I used to create my current setup (-d single) will *not* stripe the data, which should have been the first _red_ flag for me! However, if I go ahead and create using
>
> mkfs.btrfs -d raid0 /dev/sda3 /dev/sdb /dev/sdc
>
> This *should* stripe the data and improve read and write performance?
It does.
> But according to what Chris wrote above, this is not true?
What I wrote above applies to single profile because I had just quoted your mkfs command which used -d single.
>>>
>>> So my question is, should I have setup the BTRFS filesystem with -d
>>> raid0? Would this have worked with multiple devices with different
>>> sizes?
>>
>> raid0 does work with multiple devices of different sizes, but it
>> won't use the full capacity of the last drive with the most space.
>>
>> For example: 2GB, 3GB, and 4GB devices as raid0.
>>
>> The first 2GB copies using 3 stripes, one per device, until the 2GB
>> device is full. The next 1GB copies using 2 stripes, one per
>> remaining device (the 3GB and 4GB ones) until the 3GB device is full.
>> Additional copying results in "cp: error writing ‘./IMG_2892.dng’: No
>> space left on device"
>
> I am sorry, I do not quite understand this. If I read this correctly, we are copying a file that is larger than the total raid0 filesystem (9GB?).
The size of the file doesn't matter. I was copying a bunch of DNGs that are ~20MB each.
> The point at which writes fail is at the magic number of 5GB -- which is where the two devices are full?
No, actually I described it incorrectly. The first 6GB copies using 3 stripes. The smallest device is 2GB. Each device can accept at least 2GB. So that's 6GB. Anything after 6GB is striped across two devices until the 2nd device is full, and then at that point there's failure to write data since single stripe raid0 is apparently disallowed. And as I reported btrfs fi show, it became full at ~8GB (7.33GB of data): 3GB + 3GB + 2GB = 8GB. Thus the extra 1GB on sdd was not usable (except for some metadata).
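The corrected description breaks down into phases, each phase striping across however many devices still have free space. A short sketch (illustrative only, not btrfs code) writes that out:

```python
# Illustrative only: the per-phase breakdown of raid0 over unequal
# devices, stopping once striping would need fewer than two devices.
def raid0_phases(sizes_gib):
    """Return [(stripe_count, gib_written)] for each striping phase."""
    free = sorted(sizes_gib)
    phases, prev, n = [], 0, len(free)
    for i, size in enumerate(free):
        remaining = n - i
        if remaining < 2:
            break
        if size > prev:
            phases.append((remaining, (size - prev) * remaining))
        prev = size
    return phases

print(raid0_phases([2, 3, 4]))  # -> [(3, 6), (2, 2)]
```

For the 2/3/4 GB example this yields [(3, 6), (2, 2)]: 6 GB striped three ways, then 2 GB striped two ways, 8 GB total, with the extra 1 GB on the largest device unusable for data.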
> So going back to the setup I currently have:
>
> Label: none uuid: 63d51c9b-f851-404f-b0f2-bf84d07df163
> Total devices 3 FS bytes used 3.03TiB
> devid 1 size 3.61TiB used 1.01TiB path /dev/sda3
> devid 2 size 3.64TiB used 1.04TiB path /dev/sdb
> devid 3 size 3.64TiB used 1.04TiB path /dev/sdc
>
> If /dev/sda3 and /dev/sdb are full, but room is still left on /dev/sdc, writes fail -- but metadata writes will continue to succeed, taking up inodes and creating zero-length files?
I don't see how your example can happen. sda3 will become full before either sdb or sdc because it's the smallest device in the volume. sdb and sdc are the same size, but might have slightly different amounts of data chunks if the raid1 metadata allocation differs between the two. So yeah, one of them will probably fill up before the other, but from a practical standpoint they will fill up at the same time. At that point, user space reports write errors, but yes, I do see zero-length files being created. Even that is limited by the fact that metadata is raid1: with all devices except one full, once all but one metadata chunk is full, metadata writes will fail too.
And just FYI, if you care about your data or time, I wouldn't try filling up the volume to the brim. Btrfs still gets pretty fussy when that happens, with varying degrees of success in unwinding it.
Chris Murphy
* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Duncan @ 2014-04-21 21:09 UTC (permalink / raw)
To: linux-btrfs
Adam Brenner posted on Sun, 20 Apr 2014 21:56:10 -0700 as excerpted:
> So ... BTRFS at this point in time, does not actually "stripe" the data
> across N number of devices/blocks for aggregated performance increase
> (both read and write)?
What Chris says is correct, but just in case it's unclear as written, let
me try a reworded version, perhaps addressing a few uncaught details in
the process.
1) Btrfs treats data and metadata separately, so unless they're both
setup the same way (both raid0 or both single or whatever), different
rules will apply to each.
2) Btrfs separately allocates data and metadata chunks, then fills them
in until it needs to allocate more. So as the filesystem fills, there
will come a point at which all space is allocated to either data or
metadata chunks and no more chunk allocations can be made. At this
point, you can still write to the filesystem, filling up the chunks that
are there, but one or the other will fill up first, and then you'll get
errors.
2a) By default, data chunks are 1 GiB in size and metadata chunks are 256
MiB, although the last ones written can be smaller, to fill the available
space. Note that except for single mode, all chunks must be written in
multiples: pairs for dup and raid1, a minimum of pairs for raid0, a
minimum of triplets for raid5, a minimum of quads for raid6 and raid10.
Thus, when using unequal-sized devices, or a number of devices that
doesn't evenly match the minimum multiple, it's very likely that some
space will not actually be allocatable.
This is what Chris was seeing with his 3 device raid0, 2G, 3G, 4G. The
first two fill up, leaving no room to allocate in pairs+, with a gig of
space left unused on the 4G device.
2b) For various reasons, it is usually the metadata that fills up first.
When that happens, further operations (even attempting to delete files,
since on a COW filesystem deletions require room to rewrite the metadata)
return ENOSPC. There are various tricks that can be tried when this
happens (balance, etc) to recover some likely not yet full data chunks to
unallocated and thus have more room to write metadata, but ideally, you
watch the btrfs filesystem df and btrfs filesystem show stats and
rebalance before you start getting ENOSPC errors.
It's also worth noting that btrfs reserves some metadata space, typically
around 200 MiB, for its own usage. Since metadata chunks are normally
256 MiB in size, an easy way to look at it is to simply say you always
need a spare metadata chunk allocated. Once the filesystem cannot
allocate more and you're on your last one, you run into ENOSPC trouble
pretty quickly.
2c) Chris has reported the opposite situation in his test. With no more
space to allocate, he filled up his data chunks first. At that point
there's metadata space still available, thus the zero-length files he was
reporting. (Technically, he could probably write really small files too,
because if they're small enough, likely something under 16 KiB and
possibly something under 4 KiB, depending on the metadata node size (4 KiB
by default until recently, 16 KiB from IIRC kernel 3.13), btrfs will
write them directly into the metadata node and not actually allocate a
data extent for them. But the ~20 MiB files he was trying were too big
for that, so he was getting the metadata allocation but not the data,
thus zero-length files.)
Again, a rebalance might be able to return some unused metadata chunks to
the unallocated pool, allowing a little more data to be written.
2d) Still, if you keep adding more, there comes a point at which no more
can be written using current data and metadata modes and there's no
further partially written chunks to free using balance either, at which
point the filesystem is full, even if there's still space left unused on
one device.
With those basics in mind, we're now equipped to answer the question
above.
On a multi-device filesystem, in default data allocation "single" mode,
btrfs can sort of be said to stripe in theory, since it'll allocate
chunks from all available devices, but since it's allocating and using
only a single data chunk at a time and they're a GiB in size, the
"stripes" are effectively a GiB in size, far too large to get any
practical speedup from them.
But single mode does allow using that last bit of space on unevenly sized
devices, and if a device goes bad, you can still recover files written to
the other devices.
OTOH, raid0 mode will allocate in gig chunks per device across all
available devices (minimum two) at once and will then write in much
smaller stripes (IIRC 64 KiB, since that's the normal device read-ahead
size) in the pre-allocated chunks, giving you far faster single-thread
access.
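The small-stripe-unit round-robin within pre-allocated chunks described above corresponds to a mapping like the following. This is a hedged sketch with an assumed 64 KiB stripe unit, not btrfs's actual chunk-mapping code:

```python
# Hypothetical sketch: map a logical byte offset inside a raid0 block
# group to (device_index, offset within that device's chunk), assuming
# a 64 KiB stripe unit round-robined across ndevs parallel chunks.
STRIPE_UNIT = 64 * 1024  # bytes; the unit recalled above, an assumption

def raid0_map(logical, ndevs, stripe_unit=STRIPE_UNIT):
    stripe_nr, within = divmod(logical, stripe_unit)
    dev = stripe_nr % ndevs                      # round-robin device choice
    dev_off = (stripe_nr // ndevs) * stripe_unit + within
    return dev, dev_off

# Consecutive 64 KiB units land on consecutive devices, which is why a
# single large sequential write keeps all devices busy at once:
print(raid0_map(0, 3))                   # -> (0, 0)
print(raid0_map(64 * 1024, 3))           # -> (1, 0)
print(raid0_map(3 * 64 * 1024 + 10, 3))  # -> (0, 65546)
```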
But raid0 mode does require pair-minimum chunk allocation, so if the
devices are uneven in size, depending on exact device sizes you'll likely
end up with some unusable space on the last device. Also, as is normally
the case with raid0, if a device dies, consider the entire filesystem
toast. (In theory you can often still recover some files smaller than
the stripe size, particularly if the metadata was raid1 as it is by
default so it's still available, but in practice, if you're storing
anything but throwaway data on a raid0 and/or you don't have current/
tested backups, you're abusing raid0 and playing Russian roulette with
your data. Just don't put valuable data on raid0 in the first place and/
or keep current/tested backups, and you can simply scrap the raid0 when a
device dies without worry.)
OTOH, I vastly prefer raid1 here, both for the traditional device-fail
redundancy and to take advantage of btrfs' data integrity features should
one copy of the data go bad for some reason. My biggest gripe is that
currently btrfs raid1 only does pair-mirroring regardless of the number
of devices thrown at it, and my sweet-spot is triplet-mirroring, which
I'd really *REALLY* like to have available, just in case. Oh, well...
Anyway, for multi-threaded primarily read-based IO, raid1 mode is the
better choice, since you get N-thread access in parallel, with N=number-
of-mirrors. (Again, I'd really REALLY like N=3, but oh, well... it's on
the roadmap. I'll have to wait...)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Chris Murphy @ 2014-04-22 17:42 UTC (permalink / raw)
To: Btrfs BTRFS
On Apr 21, 2014, at 3:09 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> Adam Brenner posted on Sun, 20 Apr 2014 21:56:10 -0700 as excerpted:
>
>> So ... BTRFS at this point in time, does not actually "stripe" the data
>> across N number of devices/blocks for aggregated performance increase
>> (both read and write)?
>
> What Chris says is correct, but just in case it's unclear as written, let
> me try a reworded version, perhaps addressing a few uncaught details in
> the process.
Another likely problem is terminology. It's 2014 and we still don't have consistency in basic RAID terminology. We're functionally back in the 19th century's uncoordinated disagreement over weights and measures, except maybe worse, because we sometimes have multiple words that mean the same thing; as if there were multiple words for the gram or the meter. It's just nonsensical and selfish that this continues to persist across various file system projects.
It's not immediately obvious to the btrfs newcomer that the md raid chunk isn't the same thing as the btrfs chunk, for example.
And strip, chunk, stripe unit, and stripe size get used interchangeably to mean the same thing, while just as often stripe size means something different. The best definition I've found so far is IBM's stripe unit definition: "granularity at which data is stored on one drive of the array before subsequent data is stored on the next drive of the array" which is in bytes. So that's the smallest raid unit we find on a drive, therefore it is a base unit in RAID, and yet we have no agreement on what word to use.
And it's not really as if the storage industry trade association, SNIA, which published a dictionary of terms in 2013, helps in this area. I'll argue they make it worse, because they deprecate the term chunk in favor of the terms strip and stripe element. No kidding: two terms mean the same thing, yet strip and stripe are NOT the same thing.
strip = stripe element
stripe = set of strips
strip size = stripe depth
stripe size = strip size * extents not including parity extents
Also the units are in blocks (sectors, not fs blocks and not bytes). The terms stripe unit, stripe width, and stride aren't found in the SNIA dictionary at all although they are found as terms in other file system projects.
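Taking the SNIA definitions above at face value, the relations amount to simple arithmetic. Bytes are used here for readability, even though SNIA itself counts in blocks; this is purely a restatement of the glossary, not any vendor's implementation:

```python
# SNIA-style relation: stripe size = strip size * extents, not counting
# parity extents. A tiny sanity check on the glossary above.
def stripe_size(strip_size, extents, parity_extents=0):
    """Bytes covered by one full stripe of data (parity excluded)."""
    return strip_size * (extents - parity_extents)

# e.g. six 64 KiB strips with one parity strip: the stripe holds five
# data strips.
print(stripe_size(64 * 1024, 6, parity_extents=1))  # -> 327680
```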
So no matter how we look at it, everyone else is doing it wrong.
Chris Murphy
* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Hugo Mills @ 2014-04-22 17:56 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Tue, Apr 22, 2014 at 11:42:09AM -0600, Chris Murphy wrote:
>
> On Apr 21, 2014, at 3:09 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>
> > Adam Brenner posted on Sun, 20 Apr 2014 21:56:10 -0700 as excerpted:
> >
> >> So ... BTRFS at this point in time, does not actually "stripe" the data
> >> across N number of devices/blocks for aggregated performance increase
> >> (both read and write)?
> >
> > What Chris says is correct, but just in case it's unclear as written, let
> > me try a reworded version, perhaps addressing a few uncaught details in
> > the process.
>
> Another likely problem is terminology. It's 2014 and we still don't have consistency in basic RAID terminology. We're functionally back in the 19th century's uncoordinated disagreement over weights and measures, except maybe worse, because we sometimes have multiple words that mean the same thing; as if there were multiple words for the gram or the meter. It's just nonsensical and selfish that this continues to persist across various file system projects.
>
> It's not immediately obvious to the btrfs newcomer that the md raid chunk isn't the same thing as the btrfs chunk, for example.
>
> And strip, chunk, stripe unit, and stripe size get used interchangeably to mean the same thing, while just as often stripe size means something different. The best definition I've found so far is IBM's stripe unit definition: "granularity at which data is stored on one drive of the array before subsequent data is stored on the next drive of the array" which is in bytes. So that's the smallest raid unit we find on a drive, therefore it is a base unit in RAID, and yet we have no agreement on what word to use.
>
> And it's not really as if the storage industry trade association, SNIA, which published a dictionary of terms in 2013, helps in this area. I'll argue they make it worse, because they deprecate the term chunk in favor of the terms strip and stripe element. No kidding: two terms mean the same thing, yet strip and stripe are NOT the same thing.
>
> strip = stripe element
> stripe = set of strips
> strip size = stripe depth
> stripe size = strip size * extents not including parity extents
>
> Also the units are in blocks (sectors, not fs blocks and not bytes). The terms stripe unit, stripe width, and stride aren't found in the SNIA dictionary at all although they are found as terms in other file system projects.
>
> So no matter how we look at it, everyone else is doing it wrong.
Also not helped by btrfs's co-option of the term "RAID-1" to mean
something that's not traditional RAID-1, and (internally) "stripe" and
"chunk" to mean things that don't match (I think) any of the
definitions above...
Hugo.
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- A clear conscience. Where did you get this taste ---
for luxuries, Bernard?
* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Chris Murphy @ 2014-04-22 18:41 UTC (permalink / raw)
To: Hugo Mills; +Cc: Btrfs BTRFS
On Apr 22, 2014, at 11:56 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
>
> Also not helped by btrfs's co-option of the term "RAID-1" to mean
> something that's not traditional RAID-1, and (internally) "stripe" and
> "chunk" to mean things that don't match (I think) any of the
> definitions above…
Right. Although in btrfs's defense, if there's a single term that no one agrees on it's stripe. If any word ought to be deprecated as functionally useless beyond repair, it's that one.
But no, what SNIA did was deprecate chunk leaving it in some sense fair game for anyone to use a generic and otherwise meaningless term like btrfs does. Except for the fact that most everywhere else including md/mdadm it is very much in-use, means a very specific thing, and should not have been usurped for that reason alone.
I have an idea, let's have a gram and grame. Gram is gram, but grame is kilogram. The e will be silent, and makes the a change from a short sound to a long sound. And we'll make grame element equal to a gram, just in case people need a compound word when a single word for the base unit should be sufficient. And then we'll deprecate kilogram. Perfect! It's not confusing AT ALL! Everyone will love and adopt this right away instead of completely totally ignore it.
Truly, we are apes that just happen to wear pants.
Chris Murphy
* Re: Slow Write Performance w/ No Cache Enabled and Different Size Drives
From: Duncan @ 2014-04-23 3:18 UTC (permalink / raw)
To: linux-btrfs
Chris Murphy posted on Tue, 22 Apr 2014 11:42:09 -0600 as excerpted:
> On Apr 21, 2014, at 3:09 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>
>> Adam Brenner posted on Sun, 20 Apr 2014 21:56:10 -0700 as excerpted:
>>
>>> So ... BTRFS at this point in time, does not actually "stripe" the
>>> data across N number of devices/blocks for aggregated performance
>>> increase (both read and write)?
>>
>> What Chris says is correct, but just in case it's unclear as written,
>> let me try a reworded version, perhaps addressing a few uncaught
>> details in the process.
>
> Another likely problem is terminology. It's 2014 and still we don't have
> consistency in basic RAID terminology.
> It's not immediately obvious to the btrfs newcomer that the md raid
> chunk isn't the same thing as the btrfs chunk, for example.
>
> And strip, chunk, stripe unit, and stripe size get used interchangeably
> to mean the same thing, while just as often stripe size means something
> different.
FWIW, I did hesitate at one point, then used "stripe" for what I guess
should have been strip or stripe-unit, after considering and rejecting
"chunk" as already in use.
But in any case, while btrfs single mode is distinct from btrfs raid0
mode, on multiple devices single mode does in fact end up in a sort of
raid0 layout -- just with too big a "strip" (the minimum single-mode
unit is a 1 GiB chunk) to work as raid0 in practice.
IOW, btrfs single mode layout is one 1 GiB chunk on one device at a time,
but btrfs will alternate devices with those 1 GiB chunks (choosing the
one with the least usage from those available), *NOT* use one device
until it's full, then another until its full, etc, like md/raid linear
mode does. In that way, the layout is raid0-like, even if the chunks are
too big to be practical raid0.
Btrfs raid0 mode, however, *DOES* work as raid0 in practice. It still
allocates 1 GiB chunks per device, but does so in parallel across all
available devices, and then stripes at a unit far smaller than the 1 GiB
chunk, using I believe a 64 or 128 KiB strip/stripe-unit/whatever, with
the full stripe size thus being that times the number of devices in
parallel in the stripe.
<sigh> It's all clear in my head, anyway! =:^(
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman