* write-behind has no measurable effect?
From: Andras Korn @ 2011-02-14 21:38 UTC
  To: linux-raid

Hi,

I experimented a bit with write-mostly and write-behind and found that
write-mostly provides a very significant benefit (see below) but
write-behind seems to have no effect whatsoever.

This is not what I expected and I wonder if I missed something.

I built a RAID1 array from a 64GB Corsair SSD and two 7200rpm SATA hard
disks. I created xfs on the array, then benchmarked it with bonnie++ and
iozone, and by compiling Linux 2.6.37 (with allyesconfig).
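
My invocations were along these lines (the flags and mountpoint shown here
are illustrative assumptions, not the exact command lines):

  bonnie++ -d /mnt/ssdraid -s 16g -n 0        # sequential I/O plus seeks
  iozone -i 0 -i 1 -i 2 -r 16k -s 4g -f /mnt/ssdraid/iozone.tmp
  make allyesconfig && time make -j8          # the compile benchmark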

Some interesting benchmark results follow. I used a 2.6.38-rc2 kernel for
these measurements.

First, the stats that were identical (within a reasonable margin of error)
across all measurements:

bonnie++ blockwise sequential write: ~110MB/s
bonnie++ blockwise sequential rewrite: ~60MB/s
bonnie++ blockwise sequential read: ~160-175MB/s
iozone read, 16k block size: ~135MB/s
kernel compilation time, user: ~5450s (*)
kernel compilation time, system: 570s (*)

(*) I didn't measure kernel compilation times without write-mostly; I expect
they would've been worse.

Now for some of the measurements that resulted in (to me) surprising
differences:

Using just the SSD (so no RAID), xfs mounted with
"noatime,noikeep,attr2,logbufs=8,logbsize=256k":

bonnie++ seeks/s: 7791
iozone random read, 16k block size: ~46MB/s
iozone random write, 16k block size: ~44MB/s
iozone random read, 512k block size: ~130MB/s
iozone random write, 512k block size: ~140MB/s
wall clock kernel compile time: 887s

RAID1 from two disks and one SSD, the disks set to write-behind:

mdadm --create /dev/md/ssdraid --force --assume-clean --level=1 \
--raid-devices=3 --bitmap=internal --bitmap-chunk=262144 \
/dev/sdo2 --write-behind=16383 -W /dev/sd[nm]2

xfs mount options:
noatime,logbsize=256k,logbufs=8,noikeep,attr2,nodiratime,delaylog

bonnie++ seeks/s: 2087
iozone random read, 16k block size: ~43MB/s
iozone random write, 16k block size: ~3.7MB/s
iozone random read, 512k block size: ~126MB/s
iozone random write, 512k block size: ~69MB/s
wall clock kernel compile time: 936s

(Note the drastically reduced random write performance.)

Now the same setup, but with write-behind=0:

bonnie++ seeks/s: 1843
iozone random read, 16k block size: ~48MB/s
iozone random write, 16k block size: ~3.7MB/s
iozone random read, 512k block size: ~126MB/s
iozone random write, 512k block size: ~69MB/s
wall clock kernel compile time: 935s

So, the difference between write-behind=0 and write-behind=16383 (which
seems to be the maximum) is negligible (if not imaginary).

For reference, some results with write-mostly disabled as well:

bonnie++ seeks/s: 487.4
iozone random read, 16k block size: ~3.7MB/s
iozone random write, 16k block size: ~3.7MB/s
iozone random read, 512k block size: ~58MB/s
iozone random write, 512k block size: ~69MB/s

(The full result set is available from
<http://elan.rulez.org/~korn/tmp/iobench.ods>, 27k.)

It's easy to see from the results that write-mostly works as advertised:
reads are mostly served by the SSD, so random reads are approximately as
fast as when I used only the SSD.
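
Incidentally, md lets you toggle write-mostly at runtime through each
member's sysfs state attribute (array and member names below are
placeholders, not my actual device names):

  echo writemostly > /sys/block/md127/md/dev-sdn2/state    # set the flag
  echo -writemostly > /sys/block/md127/md/dev-sdn2/state   # clear it
  cat /proc/mdstat    # write-mostly members carry a (W) suffix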

I'd have expected write-behind to increase the apparent random write
performance though, and this didn't happen (there was no measurable
difference).

I thought maybe the iozone benchmark was too synthetic (too many writes in
too short a time, so that the buffering effect of write-behind is lost);
that's why I tried the kernel compilation, but the RAID array was as slow
with write-behind as without it.

Any idea why write-behind doesn't seem to have an effect?

Thanks

Andras

-- 
                     Andras Korn <korn at elan.rulez.org>
                 Keep your ears open - but your legs crossed.

* Re: write-behind has no measurable effect?
From: NeilBrown @ 2011-02-14 22:50 UTC
  To: Andras Korn; +Cc: linux-raid

On Mon, 14 Feb 2011 22:38:17 +0100 Andras Korn <korn@raidlist.elan.rulez.org>
wrote:

> Hi,
> 
> I experimented a bit with write-mostly and write-behind and found that
> write-mostly provides a very significant benefit (see below) but
> write-behind seems to have no effect whatsoever.

The use-case where write-behind can be expected to have an effect is when the
throughput is low enough to be well within the capacity of all devices, but
the latency of the write-behind device is higher than desired.
write-behind will allow that high latency to be hidden (as long as the
throughput limit is not exceeded).

I suspect your tests did not test for low latency in a low-throughput
scenario.
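
One way to probe that scenario (paths assumed; fio option names can vary
by version) is single-threaded synchronous random writes, where the
completion-latency statistics matter rather than MB/s:

  fio --name=synclat --directory=/mnt/ssdraid --rw=randwrite \
      --bs=4k --size=512m --ioengine=sync --sync=1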

NeilBrown

* Re: write-behind has no measurable effect?
From: Doug Dumitru @ 2011-02-14 22:56 UTC
  To: Andras Korn; +Cc: linux-raid

Not to be too cute, but the man page for mdadm says that
--write-behind is only attempted on drives marked --write-mostly.  I
did not see a --write-mostly in your array create statement.

Also, are you trying to create a three-way mirror, or to mirror the one
SSD to two HDDs striped together? If you want the latter, you need to
create a RAID0 array first and then build the RAID1 array on top of it.

For testing, two drives might produce fewer anomalies.

Doug Dumitru
EasyCo LLC

* Re: write-behind has no measurable effect?
From: Andras Korn @ 2011-02-14 22:57 UTC
  To: linux-raid

On Tue, Feb 15, 2011 at 09:50:42AM +1100, NeilBrown wrote:

> > I experimented a bit with write-mostly and write-behind and found that
> > write-mostly provides a very significant benefit (see below) but
> > write-behind seems to have no effect whatsoever.
> 
> The use-case where write-behind can be expected to have an effect is when the
> throughput is low enough to be well within the capacity of all devices, but
> the latency of the write-behind device is higher than desired.
> write-behind will allow that high latency to be hidden (as long as the
> throughput limit is not exceeded).
> 
> I suspect your tests did not test for low latency in a low-throughput
> scenario.

I thought they did. "High latency" was, in my case, caused by the high seek
times (compared to the SSD) of the spinning disks. Throughput-wise, they
certainly could have kept up (their sequential read/write performance even
exceeds that of the SSD).

But maybe I misunderstand how write-behind works. I thought/hoped it would
commit writes to the fast drive(s) and mark affected areas dirty in the
intent map, then lazily sync the dirty areas over to the slow disk(s).

What does it actually do? md(4) isn't very forthcoming, and the wiki has no
relevant hits either.

Thanks.

-- 
                     Andras Korn <korn at elan.rulez.org>
                  Baroque: (def.) When you are out of Monet.

* Re: write-behind has no measurable effect?
  2011-02-14 22:56 ` Doug Dumitru
@ 2011-02-14 23:03   ` Andras Korn
  0 siblings, 0 replies; 15+ messages in thread
From: Andras Korn @ 2011-02-14 23:03 UTC (permalink / raw)
  To: linux-raid

On Mon, Feb 14, 2011 at 02:56:37PM -0800, Doug Dumitru wrote:

> Not to be too cute, but the man page for mdadm says that
> --write-behind is only attempted on drives marked --write-mostly.  I
> did not see a --write-mostly in your array create statement.

Yeah, I abbreviated it as -W:

> > mdadm --create /dev/md/ssdraid --force --assume-clean --level=1 \
> > --raid-devices=3 --bitmap=internal --bitmap-chunk=262144 \
> > /dev/sdo2 --write-behind=16383 -W /dev/sd[nm]2

> Also, are you trying to create a three-way mirror, or to mirror the one

A three-way mirror.

-- 
                     Andras Korn <korn at elan.rulez.org>
                     Caesar si viveret, ad remum dareris.

* Re: write-behind has no measurable effect?
From: NeilBrown @ 2011-02-14 23:41 UTC
  To: Andras Korn; +Cc: linux-raid

On Mon, 14 Feb 2011 23:57:54 +0100 Andras Korn <korn@raidlist.elan.rulez.org>
wrote:

> On Tue, Feb 15, 2011 at 09:50:42AM +1100, NeilBrown wrote:
> 
> > > I experimented a bit with write-mostly and write-behind and found that
> > > write-mostly provides a very significant benefit (see below) but
> > > write-behind seems to have no effect whatsoever.
> > 
> > The use-case where write-behind can be expected to have an effect is when the
> > throughput is low enough to be well within the capacity of all devices, but
> > the latency of the write-behind device is higher than desired.
> > write-behind will allow that high latency to be hidden (as long as the
> > throughput limit is not exceeded).
> > 
> > I suspect your tests did not test for low latency in a low-throughput
> > scenario.
> 
> I thought they did. "High latency" was, in my case, caused by the high seek
> times (compared to the SSD) of the spinning disks. Throughput-wise, they
> certainly could have kept up (their sequential read/write performance even
> exceeds that of the SSD).

A "MB/s" number is not going to show a difference with write-behind as it is
fundamentally about throughput.  We cannot turn random writes into sequential
writes just be doing 'write-behind' as the same locations on disk still have
to be written to.

You need a number like transactions-per-second to see a different.
If you write with O_SYNC, the write-behind will probably show a difference.
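
For example, something as simple as this (path assumed; oflag=dsync makes
every 4k write synchronous) should expose the latency gap if there is one:

  dd if=/dev/zero of=/mnt/ssdraid/syncprobe bs=4k count=2000 oflag=dsync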


> 
> But maybe I misunderstand how write-behind works. I thought/hoped it would
> commit writes to the fast drive(s) and mark affected areas dirty in the
> intent map, then lazily sync the dirty areas over to the slow disk(s).
> 
> What does it actually do? md(4) isn't very forthcoming, and the wiki has no
> relevant hits either.

write-behind makes a copy of the data, submits writes to all devices in
parallel, and reports success to the upper layer as soon as all the
non-write-behind writes have finished.

The approach you suggest could be synthesised by:

 - add a write-intent bitmap with fairly small chunks.  This should be
   an external bitmap and should be directly on the fastest drive
 - have some daemon that fails the 'slow' device, waits 30 seconds, re-adds
   it, waits for recovery to complete, and loops back.
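
A rough shell sketch of that loop (array and member names are assumptions;
note the array has no redundancy on the slow disk between re-adds):

  #!/bin/sh
  MD=/dev/md/ssdraid     # assumed array name
  SLOW=/dev/sdn2         # assumed slow member
  while :; do
      mdadm $MD --fail $SLOW --remove $SLOW
      sleep 30
      mdadm $MD --re-add $SLOW
      # the bitmap limits recovery to chunks dirtied while it was out
      while grep -q recovery /proc/mdstat; do sleep 5; done
  done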

Actually I just realised another reason why you don't see any improvement.
You are using an internal bitmap.  This requires a synchronous write to all
devices.  The use-case for which write-behind was developed involved an
external bitmap.
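
With mdadm that looks roughly like this - the bitmap file must live on a
filesystem that is not on the array itself, and the path is an assumption:

  mdadm --create /dev/md/ssdraid --level=1 --raid-devices=3 \
        --bitmap=/ssd/md/ssdraid.bitmap --bitmap-chunk=262144 \
        /dev/sdo2 --write-behind=16383 --write-mostly /dev/sd[nm]2
  # later assemblies must name the same file: mdadm -A --bitmap=<path> ...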


Maybe I should disable bitmap updates to write-behind devices .....


NeilBrown

* Re: write-behind has no measurable effect?
From: Andras Korn @ 2011-02-15  1:00 UTC
  To: linux-raid

On Tue, Feb 15, 2011 at 10:41:09AM +1100, NeilBrown wrote:

> > > I suspect your tests did not test for low latency in a low-throughput
> > > scenario.
> > 
> > I thought they did. "High latency" was, in my case, caused by the high seek
> > times (compared to the SSD) of the spinning disks. Throughput-wise, they
> > certainly could have kept up (their sequential read/write performance even
> > exceeds that of the SSD).
> 
> A "MB/s" number is not going to show a difference with write-behind as it is
> fundamentally about throughput.  We cannot turn random writes into sequential
> writes just be doing 'write-behind' as the same locations on disk still have
> to be written to.

Thanks, I understand now; I had hoped write-behind would in fact re-order
the writes to the slow devices. In retrospect, I'm not sure what gave me
that notion. (Reckless optimism, probably. :)

> > What does it actually do? md(4) isn't very forthcoming, and the wiki has no
> > relevant hits either.
> 
> write-behind makes a copy of the data, submits writes to all devices in
> parallel, and reports success to the upper layer as soon as all the
> non-write-behind writes have finished.

So this really only makes a difference for synchronous writes (because
otherwise success would be reported as soon as the write is buffered),
right?

> The approach you suggest could be synthesised by:
> 
>  - add a write-intent bitmap with fairly small chunks.  This should be
>    an external bitmap and should be directly on the fastest drive
>  - have some daemon that fails the 'slow' device, waits 30 seconds, re-adds
>    it, waits for recovery to complete, and loops back.

Ewww. :)

> Actually I just realised another reason why you don't see any improvement.
> You are using an internal bitmap.  This requires a synchronous write to all
> devices.

Yes, that was something I actually wanted to ask. Since it's write-_behind_,
it wouldn't need to be a synchronous write though - you could at least allow
the write-mostly disk to reorder it, couldn't you?

>  The use-case for which write-behind was developed involved an external
> bitmap.

My use case, fwiw, is that I have a single SSD and would like to exploit its
close-to-zero seek time while also providing redundancy (using spinning
disks) with eventual consistency. It's not for databases or anything
irreplaceable, just things like logs, svn working copies, vserver system
files... and an external jfs journal. (I know journal i/o is very nearly
sequential, but I don't have a spinning disk to dedicate to it, and if I use
the same disk for other purposes as well, seeking would definitely occur,
decreasing performance.)

> Maybe I should disable bitmap updates to write-behind devices .....

Or make them asynchronous, or lazy (like, update the bitmap whenever you
must seek into the vicinity anyway), or just infrequent. But yes, this
sounds like a very good idea.

Another approach would be to mark as dirty, on the fast devices, all areas
being written to, and continuously sync them to the slow devices in the
background, sequentially (marking as clean areas that have been synced and
not written to since); the array would then be resyncing continually, but
be very fast for random writes. This would of course also require the
bitmap to be synchronously updated only on the fast devices.

Otoh, this is really a different mechanism from the current write-behind,
aimed at a different use-case, so maybe it could be implemented
orthogonally. (Patches welcome, I'm sure; it's times like these I hate not
being a coder.)

-- 
                     Andras Korn <korn at elan.rulez.org>
                    Take my advice, I don't use it anyway.

* Re: write-behind has no measurable effect?
From: John Robinson @ 2011-02-15  1:19 UTC
  To: Andras Korn; +Cc: linux-raid

On 15/02/2011 01:00, Andras Korn wrote:
[...]
> Another approach would be to mark as dirty, on the fast devices, all areas
> being written to, and continuously sync them to the slow devices in the
> background, sequentially (marking as clean areas that have been synced and
> not written to since); the array would then be resyncing continually, but
> be very fast for random writes. This would of course also require the
> bitmap to be synchronously updated only on the fast devices.
>
> Otoh, this is really a different mechanism from the current write-behind,
> aimed at a different use-case, so maybe it could be implemented
> orthogonally. (Patches welcome, I'm sure; it's times like these I hate not
> being a coder.)

I wonder whether bcache might do roughly what you want? I haven't tried 
it myself but it does sound interesting: "Hard drives are cheap and big, 
SSDs are fast but small and expensive. Wouldn't it be nice if you could 
transparently get the advantages of both? With Bcache, you can have your 
cake and eat it too." See http://bcache.evilpiepirate.org/

Cheers,

John.


* Re: write-behind has no measurable effect?
From: Andras Korn @ 2011-02-15  2:19 UTC
  To: linux-raid

On Tue, Feb 15, 2011 at 01:19:33AM +0000, John Robinson wrote:

> [...]
> 
> I wonder whether bcache might do roughly what you want? I haven't

It only does very roughly what I want (the idea there is to _cache_ a much
larger spinning disk using a relatively small SSD, whereas I basically want
them both to be the same size, with the disk eventually mirroring the
contents of the SSD); also, development of bcache has stalled (it doesn't
even compile with recent kernels and the developer has stated that he's
taking a break).

I also know of flashcache, which is similar to bcache and is more actively
developed, but is still lagging quite a few versions behind (the latest
kernel it works with is 2.6.32, I think; it certainly doesn't compile with
2.6.38).

So, while both of these may actually be good at what they do, neither of
them does what I have in mind and I also can't use either of them because I
need a newer kernel than what they support.

But thanks anyway.

-- 
                     Andras Korn <korn at elan.rulez.org>
                 I'm not nearly as think as you confused I am.

* Re: write-behind has no measurable effect?
From: Roberto Spadim @ 2011-02-15  9:10 UTC
  To: Andras Korn; +Cc: linux-raid

Andras, could you run some benchmarks on raid1 with round-robin read
balancing? It's at this site:
www.spadim.com.br/raid1/

It's based on kernel 2.6.37.


--
Roberto Spadim
Spadim Technology / SPAEmpresarial

* Re: write-behind has no measurable effect?
From: Andras Korn @ 2011-02-15 12:40 UTC
  To: linux-raid

On Tue, Feb 15, 2011 at 06:10:17AM -0300, Roberto Spadim wrote:

> Andras, could you run some benchmarks on raid1 with round-robin read
> balancing? It's at this site:
> www.spadim.com.br/raid1/
>
> It's based on kernel 2.6.37.

Yes, I can do that. Can you give me some hints on what specific
configuration to try? I see you have some sysfs tunables. My raid1 array
consists of two spinning disks and an SSD, all local.

Do you expect this patch to make a difference in my case? With the spinning
disks marked as write-mostly, I'm getting close to the read performance of
the SSD (except for very small random reads, for some reason).

It's random writes that are much slower than with only the SSD.

-- 
                     Andras Korn <korn at elan.rulez.org>
                  There is no spoon(). But there is a fork().

* Re: write-behind has no measurable effect?
From: Roberto Spadim @ 2011-02-15 13:26 UTC
  To: Andras Korn; +Cc: linux-raid

Writes will run at the speed of the slowest mirror (SSD or HD);
reads can see a speed improvement with this patch.

There are several options for

/sys/block/md0/md/read_balance_mode

near_head => today's implementation
round_robin => could be useful if you only have SSDs, since
round_robin assumes the access time is the same for every drive (a hard
disk's access time differs between random and sequential I/O)
stripe => I didn't get good benchmarks, but it's nice to have, since
we could spread it over the network, with some sectors on one disk and
others on another (nbd)
time_based => here you should send me some information about your disks,
for example:
    access time of the mirrors (check your drive's spec sheet; SSDs are
normally <0.1ms, hard disks around 10-20ms)
    sequential read speed (use dd if=/dev/sda of=/dev/null bs=4096, and
change the block size to the one you will use with your filesystem; for
a Vertex2 SSD I'm using bs=4096, and it's a good value for disks too,
since a disk's block size roughly tracks its number of heads (2, 4, 8))

Send me the access times and sequential read speeds and I'll work out
the values to tune your sysfs (/sys/block/md0/md/read_balance_config).

You will need to:

echo "time_based" > /sys/block/md0/md/read_balance_mode
echo "disk information" > /sys/block/md0/md/read_balance_config

You can't use a separate sysfs file to configure each mirror; maybe a
bash script is a better way to configure it. In a future version I will
change this and put it under /sys/block/md0/md/dev-xxxx/.

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

* Re: write-behind has no measurable effect?
From: Roberto Spadim @ 2011-02-15 17:46 UTC
  To: Andras Korn; +Cc: linux-raid

In raid1.c there's an example with an SSD and an HD of different speeds.

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

* Re: write-behind has no measurable effect?
From: Andras Korn @ 2011-02-16 12:00 UTC
  To: linux-raid

On Tue, Feb 15, 2011 at 06:10:17AM -0300, Roberto Spadim wrote:

> Andras, could you run some benchmarks on raid1 with round-robin read
> balancing? It's at this site:
> www.spadim.com.br/raid1/

For the record: we did some benchmarks and while the patch shows promise and
seems to cause no problems, it resulted in no measurable performance
increase for a RAID1 array composed of an SSD and two 7200rpm HDDs.

Andras

-- 
                     Andras Korn <korn at elan.rulez.org>
                 If it ain't broken, play with it until it is.

* Re: write-behind has no measurable effect?
From: Roberto Spadim @ 2011-02-16 15:00 UTC
  To: Andras Korn; +Cc: linux-raid

A question here...
What happens if all disks are write-mostly and just the SSD is write-behind?

Why?
write-behind is an async feature (md only returns OK to the filesystem once
the non-write-behind disks have been written synchronously).
write-mostly is a read_balance optimization (only read from that device if
all non-write-mostly devices fail).

Making all disks write-mostly could allow us to use write-behind on the
slowest disk(s).

Another idea...
Could we change the raid1 write code? How?
Once a total of X writes have completed, return OK to the filesystem and
mark all other devices as write-behind (automatic write-behind); after the
writes have been synced, mark the disks as non-write-behind again.
Maybe an optimization is: which disks MUST be sync (never write-behind),
which MUST be async (only write-behind), and which can be either
(any write-behind type).

Another question...
Can read balancing use a write-mostly device in a very busy system with no
failed devices (all mirrors in sync)?

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial