* It is possible to put write cache on ssd?
@ 2010-06-04  8:52 Mario
  2010-06-07 19:14 ` Bill Davidsen
  0 siblings, 1 reply; 12+ messages in thread
From: Mario @ 2010-06-04  8:52 UTC (permalink / raw)
  To: linux-raid

Hello,
I have seen that the only hardware RAID controllers that can go faster
than Linux md RAID are the ones with a BBU (battery backup unit).

In fact, thanks to the battery, those controllers can do more aggressive
write caching without the risk of losing data.

Obviously a standard PC has no BBU for Linux software RAID to use.

Now I see that the latest hardware RAID controllers replace the battery
AND the RAM with a small flash disk.

So I ask: if I add a fast (but small) SSD to a Linux server, is there a
way for Linux md RAID to use it as a cache, to get safer writes and a
faster array?

Thanks in advance for your interest.

Mario  




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: It is possible to put write cache on ssd?
  2010-06-04  8:52 It is possible to put write cache on ssd? Mario
@ 2010-06-07 19:14 ` Bill Davidsen
  2010-06-08  4:54   ` Ian Dall
  2010-06-08  7:31   ` Mario
  0 siblings, 2 replies; 12+ messages in thread
From: Bill Davidsen @ 2010-06-07 19:14 UTC (permalink / raw)
  To: Mario; +Cc: linux-raid

Mario wrote:
> Hello,
> I have seen that the only hardware raid controllers that can go faster than
> linux md raid are the controllers with BBU unit.
>
> Infact thanks to the battery the controllers can have a more aggressive write
> caching without the risk of losing data.
>
> Obviously in a standard pc there is not BBU to use with linux software raid.
>
> Now I see that latest hardware raid controllers exchange battery AND ram with a
> little flash disk.
>
> So I ask: if I add a fast (with little size) ssd to a linux server is there a
> way for linux md raid to use it as a cache to have safer writes and faster raid?
>
> Thanks in advance for interest.
>
>   
Actually playing with that now. I got an Intel SATA 40GB SSD, and I am
trying various combinations of things to put on it. One thing I hoped
would help was putting the filesystem journal on the SSD and then using
the data=journal option to push everything through the journal, in the
hope that it would free the RAM needed for cache and thus speed things up.

Since none of that has delivered the performance I hoped for, I'm now
looking at a kernel patch to overflow the RAM cache into the SSD,
borrowing code from mmap to create some address space on the SSD. At
the moment that works poorly (OK, it doesn't work), and I'm going to
have to rethink my approach and probably write a whole lot of code to
do it. I'm not sure I want to do that; it's unlikely to be a candidate
for mainline unless I put a ton of time into learning the corner cases.

I also played with mirroring and write mostly, etc. Does provide a 
general solution, at least in my tests.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein



* Re: It is possible to put write cache on ssd?
  2010-06-07 19:14 ` Bill Davidsen
@ 2010-06-08  4:54   ` Ian Dall
  2010-06-08 19:28     ` Bill Davidsen
  2010-06-08 22:48     ` David Rees
  2010-06-08  7:31   ` Mario
  1 sibling, 2 replies; 12+ messages in thread
From: Ian Dall @ 2010-06-08  4:54 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Mario, linux-raid

On Mon, 2010-06-07 at 15:14 -0400, Bill Davidsen wrote:
> Mario wrote:
> > [...]
> > So I ask: if I add a fast (with little size) ssd to a linux server is there a
> > way for linux md raid to use it as a cache to have safer writes and faster raid?
> >
> > Thanks in advance for interest.
> >
> Actually playing with that now. I got an Intel SATA 40GB SSD, and I am 
> trying various combinations of things to put on it. One thing which I 
> hoped would benefit was to put a f/s journal on SSD and then use the 
> option to push all through the journal (data=journal) in hopes that it 
> would then free the RAM needed for cache and thus speed operation.
> 
> Since none of that has generated the performance I hoped, 

Interesting. If it's the X25-V that you have, write performance is
nothing to write home about, even compared to a single hard drive, let
alone a RAID. By journaling data as well as metadata, you just add
extra write overhead, and possibly even a new bottleneck.

What happens if you journal only the metadata? The hoped-for advantage
would be to avoid seeks between the areas used for the journal and the
data.

The characteristic of these SSD devices seems to be that they get
faster as they get bigger (as if the chips were effectively in a kind
of RAID).

> I'm now 
> looking at a kernel patch to overflow the cache in RAM into the SSD, 
> stealing code from the mmap to make some address space on the SSD.

Again, I wonder whether write performance is good enough for this to
pay off. How does it compare with just using the SSD for swap and
possibly tweaking some parameters to encourage the kernel to use swap
more? That would effectively free up more RAM for buffers.
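Something like this, say (the partition name and the swappiness value
are just assumptions for illustration):

```shell
# Sketch only -- /dev/sdb2 is an assumed SSD partition.
mkswap /dev/sdb2
# Activate it with a higher priority than any existing swap.
swapon -p 10 /dev/sdb2
# Encourage the kernel to swap sooner, leaving more RAM for buffers.
sysctl -w vm.swappiness=80
```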

>  At 
> the moment that works poorly (ok, doesn't work) and I'm going to have to 
> rethink the way I do things and probably write a whole bunch of code to 
> do it. Not sure if I want to do that, it's unlikely to be a candidate 
> for mainline unless I put a ton of time into learning the corner cases.
> 
> I also played with mirroring and write mostly, etc. Does provide a 
> general solution, at least in my tests.

Do you mean "does NOT"?


-- 
Ian Dall <ian@beware.dropbear.id.au>



* Re: It is possible to put write cache on ssd?
  2010-06-07 19:14 ` Bill Davidsen
  2010-06-08  4:54   ` Ian Dall
@ 2010-06-08  7:31   ` Mario
  2010-06-08 12:23     ` CoolCold
  1 sibling, 1 reply; 12+ messages in thread
From: Mario @ 2010-06-08  7:31 UTC (permalink / raw)
  To: linux-raid

Bill Davidsen <davidsen <at> tmr.com> writes:


> Actually playing with that now. I got an Intel SATA 40GB SSD, and I am 
> trying various combinations of things to put on it. One thing which I 
> hoped would benefit was to put a f/s journal on SSD and then use the 
> option to push all through the journal (data=journal) in hopes that it 
> would then free the RAM needed for cache and thus speed operation.
> 
>

Probably it is due to two things:

1) To see the advantages you have to test in a server situation where a
lot of threads are writing AND others are reading. In that case, writes
to the hard disk can be delayed a lot to give the reads lower latency.
2) A lot of tuning is needed to force Linux to keep data in the journal
as long as it can.
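By tuning in point 2 I mean the VM writeback knobs; a rough sketch (the
values are only illustrative, not recommendations):

```shell
# Sketch only -- illustrative values; tune for your workload.
# Let dirty pages fill a larger share of RAM before forced writeback.
sysctl -w vm.dirty_ratio=40
sysctl -w vm.dirty_background_ratio=20
# Let dirty data age longer before the flusher pushes it to disk.
sysctl -w vm.dirty_expire_centisecs=3000
```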





* Re: It is possible to put write cache on ssd?
  2010-06-08  7:31   ` Mario
@ 2010-06-08 12:23     ` CoolCold
  2010-06-09  7:49       ` Mario
  2010-06-09 11:06       ` MRK
  0 siblings, 2 replies; 12+ messages in thread
From: CoolCold @ 2010-06-08 12:23 UTC (permalink / raw)
  To: Mario; +Cc: linux-raid

Maybe something like
http://github.com/facebook/flashcache/blob/master/doc/flashcache-doc.txt
will be interesting for you.
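Setup is roughly like this, if I read the doc correctly (device names
are assumptions, and the exact flags may differ between versions):

```shell
# Sketch only -- per the flashcache doc linked above; device names
# are assumptions.
flashcache_create cachedev /dev/sdb1 /dev/md0
# The combined cached device then appears under device-mapper:
mount /dev/mapper/cachedev /mnt/data
```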

On Tue, Jun 8, 2010 at 11:31 AM, Mario <mgiammarco@gmail.com> wrote:
> Bill Davidsen <davidsen <at> tmr.com> writes:
>
>
>> Actually playing with that now. I got an Intel SATA 40GB SSD, and I am
>> trying various combinations of things to put on it. One thing which I
>> hoped would benefit was to put a f/s journal on SSD and then use the
>> option to push all through the journal (data=journal) in hopes that it
>> would then free the RAM needed for cache and thus speed operation.
>>
>>
>
> Probably it is due to two things:
>
> 1) to see advantages you have to test in a server situation where a
> lot of thread are writing AND others are reading. So in this case
>  writes to hard disk can be delayed a lot to give less latency to reads.
> 2) A lot of tuning is due to force linux to keep data on journal
>  as long as it can.
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Best regards,
[COOLCOLD-RIPN]


* Re: It is possible to put write cache on ssd?
  2010-06-08  4:54   ` Ian Dall
@ 2010-06-08 19:28     ` Bill Davidsen
  2010-06-08 22:48     ` David Rees
  1 sibling, 0 replies; 12+ messages in thread
From: Bill Davidsen @ 2010-06-08 19:28 UTC (permalink / raw)
  To: Ian Dall; +Cc: Mario, linux-raid

Ian Dall wrote:
> On Mon, 2010-06-07 at 15:14 -0400, Bill Davidsen wrote:
>   
>> Mario wrote:
>>     
>>> [...]
>>> So I ask: if I add a fast (with little size) ssd to a linux server is there a
>>> way for linux md raid to use it as a cache to have safer writes and faster raid?
>>>
>>> Thanks in advance for interest.
>>>
>>>       
>> Actually playing with that now. I got an Intel SATA 40GB SSD, and I am 
>> trying various combinations of things to put on it. One thing which I 
>> hoped would benefit was to put a f/s journal on SSD and then use the 
>> option to push all through the journal (data=journal) in hopes that it 
>> would then free the RAM needed for cache and thus speed operation.
>>
>> Since none of that has generated the performance I hoped, 
>>     
>
> Interesting. If its the X25-V that you have, write performance is
> nothing to write home about even compared to a single hard drive, let
> alone a raid. By journaling data as well (as metadata), you just add
> extra write overhead, possibly even a new bottleneck. 
>
>   
There was a claim that if you use journaled data the memory buffers
would be released after the journal was written. Looking at the code I
didn't think so, but the idea was that a burst of less than 10GB or so
would get out of memory to the SSD and then be pulled back more slowly,
without blowing everything out of the memory cache. It's always better
to actually try stuff than to look at the code and pontificate about
what it will do under dynamic conditions.

The best thing I found was some code I was playing with around 2.6.27,
which limited the cache used by any one fd, so that there was cache
left for other programs. This shortened the initial fast write phase
(writes were going to the buffer, not the disk) but didn't hurt the
10GB write time, and left the system usable for other programs.
> What happens if you journal only the metadata? The hoped for advantage
> would be to avoid seeks between the areas used for the journal and the
> data.
>
>   
I've tried putting the journal (and bitmap) on other devices, even on a
ramdisk; it only helps for certain loads.
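If you want to try the bitmap part yourself, it goes roughly like this
(the path is an assumption, and the bitmap file must NOT live on the
array it describes):

```shell
# Sketch only -- path is an assumption; the bitmap file has to sit on
# a filesystem outside the array itself.
mdadm --grow /dev/md0 --bitmap=/ssd/md0.bitmap
```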
> The characteristics of these SSD devices seems to be that they get
> faster as they get bigger (like the chips are effectively in a kind of
> raid).
>
>   
>> I'm now 
>> looking at a kernel patch to overflow the cache in RAM into the SSD, 
>> stealing code from the mmap to make some address space on the SSD.
>>     
>
> Again, I wonder if write performance is good enough for this to pay off.
> How does that compare with just using the ssd for swap and possibly
> tweaking some parameters to encourage the kernel to use swap more? This
> would effectively free up more ram for buffers.
>
>   
>>  At 
>> the moment that works poorly (ok, doesn't work) and I'm going to have to 
>> rethink the way I do things and probably write a whole bunch of code to 
>> do it. Not sure if I want to do that, it's unlikely to be a candidate 
>> for mainline unless I put a ton of time into learning the corner cases.
>>
>> I also played with mirroring and write mostly, etc. Does provide a 
>> general solution, at least in my tests.
>>     
>
> Do you mean "does NOT"?
>
>
>   


-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein



* Re: It is possible to put write cache on ssd?
  2010-06-08  4:54   ` Ian Dall
  2010-06-08 19:28     ` Bill Davidsen
@ 2010-06-08 22:48     ` David Rees
  2010-06-09  9:31       ` Ian Dall
  1 sibling, 1 reply; 12+ messages in thread
From: David Rees @ 2010-06-08 22:48 UTC (permalink / raw)
  To: Ian Dall; +Cc: Bill Davidsen, Mario, linux-raid

On Mon, Jun 7, 2010 at 9:54 PM, Ian Dall <ian@beware.dropbear.id.au> wrote:
> On Mon, 2010-06-07 at 15:14 -0400, Bill Davidsen wrote:
>> Actually playing with that now. I got an Intel SATA 40GB SSD, and I am
>> trying various combinations of things to put on it. One thing which I
>> hoped would benefit was to put a f/s journal on SSD and then use the
>> option to push all through the journal (data=journal) in hopes that it
>> would then free the RAM needed for cache and thus speed operation.
>>
>> Since none of that has generated the performance I hoped,
>
> Interesting. If its the X25-V that you have, write performance is
> nothing to write home about even compared to a single hard drive, let
> alone a raid. By journaling data as well (as metadata), you just add
> extra write overhead, possibly even a new bottleneck.

Depends on whether you are talking about small, seeky writes or large
writes.  Even the X25-V will kill any rotating drive at small seeky
writes, but if you are trying to write big files faster than ~40MB/s,
the rotating disk might win, depending on the exact drive you are
comparing it to.

>> I also played with mirroring and write mostly, etc. Does provide a
>> general solution, at least in my tests.
>
> Do you mean "does NOT"?

write-mostly DOES work well in my tests...
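A setup like the one I tested can be sketched as follows (the device
names and the write-behind depth here are assumptions, not my exact
values):

```shell
# Sketch only -- device names and numbers are assumptions.
# Mirror an SSD with an HDD: the HDD is marked write-mostly so reads
# go to the SSD, and write-behind lets the HDD lag on writes.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal --write-behind=256 \
      /dev/sdb1 --write-mostly /dev/sda1
```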

-Dave


* Re: It is possible to put write cache on ssd?
  2010-06-08 12:23     ` CoolCold
@ 2010-06-09  7:49       ` Mario
  2010-06-09 11:06       ` MRK
  1 sibling, 0 replies; 12+ messages in thread
From: Mario @ 2010-06-09  7:49 UTC (permalink / raw)
  To: linux-raid

CoolCold <coolthecold <at> gmail.com> writes:

> 
> May be something like
> http://github.com/facebook/flashcache/blob/master/doc/flashcache-doc.txt
> will be interesting for you.
> 
Great stuff! Is it working, or is it still in the alpha stage?



* Re: It is possible to put write cache on ssd?
  2010-06-08 22:48     ` David Rees
@ 2010-06-09  9:31       ` Ian Dall
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Dall @ 2010-06-09  9:31 UTC (permalink / raw)
  To: David Rees; +Cc: Bill Davidsen, Mario, linux-raid

On Tue, 2010-06-08 at 15:48 -0700, David Rees wrote:
> On Mon, Jun 7, 2010 at 9:54 PM, Ian Dall <ian@beware.dropbear.id.au> wrote:
> > On Mon, 2010-06-07 at 15:14 -0400, Bill Davidsen wrote:
> >> Actually playing with that now. I got an Intel SATA 40GB SSD, and I am
> >> trying various combinations of things to put on it. One thing which I
> >> hoped would benefit was to put a f/s journal on SSD and then use the
> >> option to push all through the journal (data=journal) in hopes that it
> >> would then free the RAM needed for cache and thus speed operation.
> >>
> >> Since none of that has generated the performance I hoped,
> >
> > Interesting. If its the X25-V that you have, write performance is
> > nothing to write home about even compared to a single hard drive, let
> > alone a raid. By journaling data as well (as metadata), you just add
> > extra write overhead, possibly even a new bottleneck.
> 
> Depends on whether you are talking about small, seeky writes or large
> writes.  Even the X25-V will kill any rotating drive in small seeky
> writes,

Of course, low (almost zero) seek time is the forte of SSDs, which is
why, it seems to me, swap would be an ideal application. I may be wrong
about that, but I imagine that paging is a kind of semi-random pattern.
 
> >> I also played with mirroring and write mostly, etc. Does provide a
> >> general solution, at least in my tests.
> >
> > Do you mean "does NOT"?
> 
> write-mostly DOES work well in my tests...

Ah! My understanding of "write-mostly" is that it is used in a mirror
(RAID 1), which means that you need enough SSD to store your entire
filesystem, and the rotating disk is just redundancy. So:
capacity = capacity of the SSD, speed ~= speed of the SSD (until the
write-behind queue is full), probability of failure ~= P(SSD fails) *
P(rotating media fails). If the reliability of SSDs is good enough
(I don't know), is this much of a win?

Regards,
Ian

-- 
Ian Dall <ian@beware.dropbear.id.au>



* Re: It is possible to put write cache on ssd?
  2010-06-08 12:23     ` CoolCold
  2010-06-09  7:49       ` Mario
@ 2010-06-09 11:06       ` MRK
  2010-06-09 16:21         ` Aryeh Gregor
  1 sibling, 1 reply; 12+ messages in thread
From: MRK @ 2010-06-09 11:06 UTC (permalink / raw)
  To: CoolCold; +Cc: Mario, linux-raid

On 06/08/2010 02:23 PM, CoolCold wrote:
> May be something like
> http://github.com/facebook/flashcache/blob/master/doc/flashcache-doc.txt
> will be interesting for you.

There is another one: bcache

http://lkml.indiana.edu/hypermail//linux/kernel/1004.0/01051.html
http://lkml.org/lkml/2010/4/23/376

It seems to be in beta but nearing completion.
I have not yet investigated all the differences from flashcache. At
first sight, bcache seems to cache reads, while flashcache seems to
cache writes.

But be warned that if you use a flash disk as a cache for a big RAID, I
believe it is going to wear out very quickly.

Consider, e.g., that bcache caches reads, so on every cache miss it
reads through the RAID and writes to the flash. If you have indexing
programs that scrub the whole array (which is clearly larger than the
SSD), they are going to transform all reads into writes to the flash
disk. This is going to burn it out quickly. I am not sure an SSD is a
good medium for caching purposes; try to buy RAM for this instead.

OTOH, caching writes has another big problem:
http://www.legitreviews.com/news/7225/
Read the grey box: it says that, due to the internal workings of SSDs,
without a supercapacitor it is not possible to guarantee data integrity
upon power loss. This is true EVEN if you are running the drive with
its cache off (so I suppose it is true even if you are using cache
flushes or barriers).
Do you really want to lose data upon power loss? Caching writes on an
SSD is risky...
At least bcache caches only reads; that should be safe, I think.

Using an SSD for the filesystem journal when the filesystem is on an
HDD would, I suppose, not yield big improvements, because the
bottleneck will always be the HDD writes, which won't lag far behind
the journal commits. At most you are going to get the speed you would
have without a journal (like ext2).

Same problem with write-mostly/write-behind, I think. I don't know how
long the queue is that holds data already committed to the SSD but not
yet committed to the HDD, but it can't be too long. I'm reading "man
md" right now and it's not extremely clear on this. I have the
impression the queue between the two is either
/sys/block/<hdd-device>/queue/nr_requests or the write-intent bitmap
(if set). In the nr_requests case it is going to be very short, so the
SSD can give you quick bursts but sustained performance will be that of
the HDD. In the write-intent bitmap case, the delay between the two can
probably be unbounded, but be warned that if the HDD is even just a bit
behind the SSD, it's like not having the RAID at all: if the SSD fails,
you lose data (it might be a lot of data, and you will also probably
need fsck).

What do you think?


* Re: It is possible to put write cache on ssd?
  2010-06-09 11:06       ` MRK
@ 2010-06-09 16:21         ` Aryeh Gregor
  2010-06-10 12:08           ` MRK
  0 siblings, 1 reply; 12+ messages in thread
From: Aryeh Gregor @ 2010-06-09 16:21 UTC (permalink / raw)
  To: MRK; +Cc: CoolCold, Mario, linux-raid

On Wed, Jun 9, 2010 at 7:06 AM, MRK <mrk@shiftmail.org> wrote:
> Same problem with write-mostly/write-behind I think. I don't know how long
> is the queue that holds data already committed to the SSD and not yet
> committed to the HDD but it can't be too long. I'm reading the "man md"
> right now and it's not extremely clear on this. I have the impression the
> queue between the two it's either the /sys/block/hdddevice/queue/nr_requests
> or it uses the write-intent bitmap (if set). In case of the nr_requests,
> it's gonna be very short so the SSD can give you quick bursts but continuous
> performance will be that of the HDD.

I tried this once and posted some bonnie++ results:

https://kerneltrap.org/mailarchive/linux-raid/2010/1/31/6742263


* Re: It is possible to put write cache on ssd?
  2010-06-09 16:21         ` Aryeh Gregor
@ 2010-06-10 12:08           ` MRK
  0 siblings, 0 replies; 12+ messages in thread
From: MRK @ 2010-06-10 12:08 UTC (permalink / raw)
  To: Aryeh Gregor; +Cc: CoolCold, Mario, linux-raid

On 06/09/2010 06:21 PM, Aryeh Gregor wrote:
> On Wed, Jun 9, 2010 at 7:06 AM, MRK<mrk@shiftmail.org>  wrote:
>    
>> Same problem with write-mostly/write-behind I think. I don't know how long
>> is the queue that holds data already committed to the SSD and not yet
>> committed to the HDD but it can't be too long. I'm reading the "man md"
>> right now and it's not extremely clear on this. I have the impression the
>> queue between the two it's either the /sys/block/hdddevice/queue/nr_requests
>> or it uses the write-intent bitmap (if set). In case of the nr_requests,
>> it's gonna be very short so the SSD can give you quick bursts but continuous
>> performance will be that of the HDD.
>>      
> I tried this once and posted some bonnie++ results:
>
> https://kerneltrap.org/mailarchive/linux-raid/2010/1/31/6742263
>    

Thanks for your tests. The write-mostly array seems to go roughly as
fast as the SSD itself, if I interpret your tests correctly (did you
really saturate the write-behind queue?). An HDD-only test would have
been interesting, though (with the SSDs failed and removed).

Secondly:
I now realize that the write-behind distance is settable (see
--write-behind= in man mdadm). However, it says it needs the
write-intent bitmap to work. This makes me think that it is not really
safe upon SSD failure. Is the data in the write-behind queue also kept
in RAM, or does it exist only on the SSD device (pointed to by the
bitmap)? In the second case, if the SSD dies, the HDD will likely be
corrupt; it's not really like having a RAID. In the first case, I don't
understand why it should need the write-intent bitmap active.
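For anyone experimenting with this, the distance can apparently be set
when (re)adding the bitmap (the value here is an assumption):

```shell
# Sketch only -- per man mdadm, --write-behind requires a
# write-intent bitmap; remove and re-add the bitmap to change it.
mdadm --grow /dev/md0 --bitmap=none
mdadm --grow /dev/md0 --bitmap=internal --write-behind=1024
```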




end of thread, other threads:[~2010-06-10 12:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-04  8:52 It is possible to put write cache on ssd? Mario
2010-06-07 19:14 ` Bill Davidsen
2010-06-08  4:54   ` Ian Dall
2010-06-08 19:28     ` Bill Davidsen
2010-06-08 22:48     ` David Rees
2010-06-09  9:31       ` Ian Dall
2010-06-08  7:31   ` Mario
2010-06-08 12:23     ` CoolCold
2010-06-09  7:49       ` Mario
2010-06-09 11:06       ` MRK
2010-06-09 16:21         ` Aryeh Gregor
2010-06-10 12:08           ` MRK
